Chapter 19. Mapping Domain Objects to the Index Structure
19.1. Basic Mapping
19.1.1. Basic Mapping
In Red Hat JBoss Data Grid, the identifier for all @Indexed objects is the key used to store the value. How the key is indexed can still be customized by using a combination of @Transformable, @ProvidedId, custom types, and custom FieldBridge implementations.
The @DocumentId identifier does not apply to JBoss Data Grid values.
The Lucene-based Query API uses the following common annotations to map entities:
- @Indexed
- @Field
- @NumericField
19.1.2. @Indexed
The @Indexed annotation declares a cached entry indexable. All entries not annotated with @Indexed are ignored.
Making a class indexable with @Indexed
@Indexed
public class Essay {
}
Optionally, specify the index attribute of the @Indexed annotation to change the default name of the index.
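For example, the following hypothetical mapping stores Essay entries in an index named "essays" instead of the default:
@Indexed(index = "essays")
public class Essay {
}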
19.1.3. @Field
Each property or attribute of an entity can be indexed. Properties and attributes are not annotated by default, and therefore are ignored by the indexing process. The @Field annotation declares a property as indexed and allows the configuration of several aspects of the indexing process by setting one or more of the following attributes:
- name: The name under which the property will be stored in the Lucene Document. By default, this attribute is the same as the property name, following the JavaBeans convention.
- store: Specifies if the property is stored in the Lucene index. When a property is stored, it can be retrieved in its original value from the Lucene Document, regardless of whether or not the element is indexed. Valid options are:
  - Store.YES: Consumes more index space but allows projection. See Projection.
  - Store.COMPRESS: Stores the property as compressed. This attribute consumes more CPU.
  - Store.NO: No storage. This is the default setting for the store attribute.
- index: Describes if the property is indexed or not. The following values are applicable:
  - Index.NO: No indexing is applied; the property cannot be found by querying. This setting is used for properties that are not required to be searchable, but are able to be projected.
  - Index.YES: The element is indexed and is searchable. This is the default setting for the index attribute.
- analyze: Determines if the property is analyzed. The analyze attribute allows a property to be searched by its contents. For example, it may be worthwhile to analyze a text field, whereas a date field does not need to be analyzed. Enable or disable analysis using the following:
  - Analyze.YES
  - Analyze.NO
The analyze attribute is enabled by default. The Analyze.YES setting requires the property to be indexed via the Index.YES attribute.
It is not possible to use relational operators on properties analyzed with the @Field(analyze=Analyze.YES) annotation.
Fields that are used for sorting must not be analyzed.
- norms: Determines whether or not to store index time boosting information. Valid settings are:
  - Norms.YES
  - Norms.NO
The default for this attribute is Norms.YES. Disabling norms conserves memory; however, no index time boosting information will be available.
- termVector: Describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value is TermVector.NO. Available settings for this attribute are:
  - TermVector.YES: Stores the term vectors of each document. This produces two synchronized arrays: one contains document terms and the other contains the term's frequency.
  - TermVector.NO: Does not store term vectors.
  - TermVector.WITH_OFFSETS: Stores the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms.
  - TermVector.WITH_POSITIONS: Stores the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document.
  - TermVector.WITH_POSITION_OFFSETS: Stores the term vector, token position, and offset information. This is a combination of YES, WITH_OFFSETS, and WITH_POSITIONS.
- indexNullAs: Provides a replacement value used to index null properties. The value must conform to the following format requirements:
  - String values have no format requirement.
  - Numeric values must use formats accepted by Double.parseDouble(), Integer.parseInt(), and other primitive parsing methods, depending on the field type.
  - Boolean values must be either true or false.
  - Date values, such as java.util.Calendar, java.util.Date, and java.time.*, must use the ISO-8601 format.
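A minimal sketch combining several of these attributes on hypothetical Essay properties (the field names and the null replacement token are illustrative):
@Indexed
public class Essay {

    @Field(name = "body", store = Store.YES, analyze = Analyze.YES,
           termVector = TermVector.WITH_POSITIONS)
    private String text;

    @Field(analyze = Analyze.NO, indexNullAs = "unspecified")
    private String category;
}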
19.1.4. @NumericField
The @NumericField annotation can be specified in the same scope as @Field.
The @NumericField annotation can be specified for Integer, Long, Float, and Double properties. At index time the value will be indexed using a Trie structure. When a property is indexed as a numeric field, it enables efficient range queries and sorting, orders of magnitude faster than doing the same query on standard @Field properties. The @NumericField annotation accepts the following optional parameters:
- forField: Specifies the name of the related @Field that will be indexed as numeric. It is mandatory when a property contains more than one @Field declaration.
- precisionStep: Changes the way that the Trie structure is stored in the index. Smaller precisionStep values lead to more disk space usage, and faster range and sort queries. Larger values lead to less space used, and range query performance closer to the range query in standard @Field properties. The default value for precisionStep is 4.
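A minimal sketch, assuming a hypothetical grade property; the field name and precisionStep value are illustrative:
@Indexed
public class Essay {

    @Field(name = "grade", analyze = Analyze.NO)
    @NumericField(forField = "grade", precisionStep = 6)
    private Double grade;
}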
@NumericField supports only Double, Long, Integer, and Float. It is not possible to take advantage of similar functionality in Lucene for the other numeric types, therefore the remaining types must use string encoding via the default or a custom TwoWayFieldBridge.
A custom NumericFieldBridge can also be used. Custom configurations require approximation during type transformation. The following example defines a custom NumericFieldBridge.
Defining a custom NumericFieldBridge
public class BigDecimalNumericFieldBridge extends NumericFieldBridge {

    private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if (value != null) {
            BigDecimal decimalValue = (BigDecimal) value;
            Long indexedValue = Long.valueOf(
                decimalValue
                    .multiply(storeFactor)
                    .longValue());
            luceneOptions.addNumericFieldToDocument(name, indexedValue, document);
        }
    }

    @Override
    public Object get(String name, Document document) {
        String fromLucene = document.get(name);
        BigDecimal storedBigDecimal = new BigDecimal(fromLucene);
        return storedBigDecimal.divide(storeFactor);
    }
}
19.2. Mapping Properties Multiple Times
Properties may need to be mapped multiple times per index, using different indexing strategies. For example, sorting a query by field requires that the field is not analyzed. To search by words in this property and also sort it, the property needs to be indexed twice: once analyzed and once un-analyzed. @Fields can be used to achieve this. For example:
Using @Fields to map a property multiple times
@Indexed(index = "Book")
public class Book {

    @Fields({
        @Field,
        @Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES)
    })
    public String getSummary() {
        return summary;
    }
}
In the example above, the field summary is indexed twice: once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports two attributes that are useful when @Fields is used:
- analyzer: defines an @Analyzer annotation per field rather than per property
- bridge: defines a @FieldBridge annotation per field rather than per property
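For example, a per-field analyzer can be declared inside @Fields. This is a minimal sketch assuming Lucene's EnglishAnalyzer and illustrative field names:
@Fields({
    @Field(name = "title", analyzer = @Analyzer(impl = EnglishAnalyzer.class)),
    @Field(name = "title_forSort", analyze = Analyze.NO)
})
public String getTitle() {
    return title;
}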
19.3. Embedded and Associated Objects
19.3.1. Embedded and Associated Objects
Associated objects and embedded objects can be indexed as part of the root entity index. This allows searches of an entity based on properties of associated objects.
19.3.2. Indexing Associated Objects
The aim of the following example is to return places where the associated city is Atlanta, via the Lucene query address.city:Atlanta. The place fields are indexed in the Place index. The Place index documents also contain the following fields:
- address.street
- address.city
These fields can also be queried.
Indexing associations
@Indexed
public class Place {

    @Field
    private String name;

    @IndexedEmbedded
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    private Address address;
}

public class Address {

    @Field
    private String street;

    @Field
    private String city;

    @ContainedIn
    @OneToMany(mappedBy = "address")
    private Set<Place> places;
}
19.3.3. @IndexedEmbedded
When using the @IndexedEmbedded technique, data is denormalized in the Lucene index. As a result, the Lucene-based Query API must be updated with any changes in the Place and Address objects to keep the index up to date. Ensure the Place Lucene document is updated when its Address changes by marking the other side of the bidirectional relationship with @ContainedIn. @ContainedIn can be used both for associations pointing to entities and on embedded objects.
The @IndexedEmbedded annotation can be nested. Attributes can be annotated with @IndexedEmbedded. The attributes of the associated class are then added to the main entity index. In the following example, the index will contain the following fields:
- name
- address.street
- address.city
- address.ownedBy_name
Nested usage of @IndexedEmbedded and @ContainedIn
@Indexed
public class Place {

    @Field
    private String name;

    @IndexedEmbedded
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    private Address address;
}

public class Address {

    @Field
    private String street;

    @Field
    private String city;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
    private Owner ownedBy;

    @ContainedIn
    @OneToMany(mappedBy = "address")
    private Set<Place> places;
}

public class Owner {
    @Field
    private String name;
}
The default prefix is propertyName. (the property name followed by a dot), following the traditional object navigation convention. This can be overridden using the prefix attribute, as shown on the ownedBy property.
The prefix cannot be set to the empty string.
The depth property is used when the object graph contains a cyclic dependency of classes, for example if Owner points to Place. The Query Module stops including attributes after reaching the expected depth, or the object graph boundaries. A self-referential class is an example of cyclic dependency. In the provided example, because depth is set to 1, any @IndexedEmbedded attribute in Owner is ignored.
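For instance, a self-referential association can be bounded with depth; this is a minimal sketch with hypothetical class and field names:
@Indexed
public class Person {

    @Field
    private String name;

    // With depth = 2, the index contains manager.name and manager.manager.name,
    // then traversal stops, avoiding infinite recursion on the cyclic reference.
    @IndexedEmbedded(depth = 2)
    private Person manager;
}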
Using @IndexedEmbedded for object associations allows queries to be expressed using Lucene's query syntax. For example:
- Return places where name contains JBoss and where address city is Atlanta. In Lucene query syntax this is:
  +name:jboss +address.city:atlanta
- Return places where name contains JBoss and where the owner's name contains Joe. In Lucene query syntax this is:
  +name:jboss +address.ownedBy_name:joe
This operation is similar to the relational join operation, without data duplication. Out of the box, Lucene indexes have no notion of association; the join operation does not exist. It may be beneficial to maintain the normalized relational model while benefiting from the full text index speed and feature richness.
An associated object can itself be @Indexed. When @IndexedEmbedded points to an entity, the association must be directional and the other side must be annotated using @ContainedIn. If not, the Lucene-based Query API cannot update the root index when the associated entity is updated. In the provided example, a Place index document is updated when the associated Address instance updates.
19.3.4. The targetElement Property
It is possible to override the object type targeted using the targetElement parameter. This method can be used when the object type annotated by @IndexedEmbedded is not the object type targeted by the data grid and the Lucene-based Query API. This occurs when interfaces are used instead of their implementation.
Using the targetElement property of @IndexedEmbedded
@Indexed
public class Address {

    @Field
    private String street;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
    private Person ownedBy;

    ...
}

public class Owner implements Person {
    ...
}
19.4. Boosting
19.4.1. Boosting
Lucene uses boosting to attach more importance to specific fields or documents over others. Lucene differentiates between index-time and search-time boosting.
19.4.2. Static Index Time Boosting
The @Boost annotation is used to define a static boost value for an indexed class or property. This annotation can be used within @Field, or can be specified directly at the method or class level.
In the following example:
- the probability of Essay reaching the top of the search list will be multiplied by 1.7.
- @Field.boost and @Boost on a property are cumulative, therefore the summary field will be boosted by 3.0 (2 x 1.5), making it more important than the ISBN field.
- the text field is 1.2 times more important than the ISBN field.
Different ways of using @Boost
@Indexed
@Boost(1.7f)
public class Essay {

    @Field(name = "Abstract", store = Store.YES, boost = @Boost(2f))
    @Boost(1.5f)
    public String getSummary() {
        return summary;
    }

    @Field(boost = @Boost(1.2f))
    public String getText() {
        return text;
    }

    @Field
    public String getISBN() {
        return isbn;
    }
}
19.4.3. Dynamic Index Time Boosting
The @Boost annotation defines a static boost factor that is independent of the state of the indexed entity at runtime. However, in some cases the boost factor may depend on the actual state of the entity. In this case, use the @DynamicBoost annotation together with an accompanying custom BoostStrategy.
@Boost and @DynamicBoost annotations can both be used in relation to an entity, and all defined boost factors are cumulative. The @DynamicBoost can be placed at either class or field level.
In the following example, a dynamic boost is defined at class level, specifying VIPBoostStrategy as the implementation of the BoostStrategy interface to be used at indexing time. Depending on the annotation placement, either the whole entity or only the annotated field/property value is passed to the defineBoost method. The passed object must be cast to the correct type.
Dynamic boost example
public enum PersonType {
    NORMAL,
    VIP
}

@Indexed
@DynamicBoost(impl = VIPBoostStrategy.class)
public class Person {
    private PersonType type;
}

public class VIPBoostStrategy implements BoostStrategy {
    public float defineBoost(Object value) {
        Person person = (Person) value;
        if (person.getType().equals(PersonType.VIP)) {
            return 2.0f;
        } else {
            return 1.0f;
        }
    }
}
In the provided example, all indexed values of a VIP would have twice the importance of the values of a non-VIP.
The specified BoostStrategy implementation must define a public no-argument constructor.
19.5. Analysis
Analysis is the process of converting text strings into single terms that you can index and then query.
19.5.1. Default Analyzer and Analyzer by Class
The default analyzer class is used to index tokenized fields, and is configurable through the default.analyzer property. The default value for this property is org.apache.lucene.analysis.standard.StandardAnalyzer.
The analyzer class can be defined per entity, per property, and per @Field, which is useful when multiple fields are indexed from a single property.
In the following example, EntityAnalyzer is used to index all tokenized properties, such as name, except summary and body, which are indexed with PropertyAnalyzer and FieldAnalyzer respectively.
Different ways of using @Analyzer
@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {

    @Field
    private String name;

    @Field
    @Analyzer(impl = PropertyAnalyzer.class)
    private String summary;

    @Field(analyzer = @Analyzer(impl = FieldAnalyzer.class))
    private String body;
}
Avoid using different analyzers on a single entity. Doing so can create complications in building queries, and make results less predictable, particularly if using a QueryParser. Use the same analyzer for indexing and querying on any field.
19.5.2. Named Analyzers
The Query Module uses analyzer definitions to deal with the complexity of the Analyzer function. Analyzer definitions are reusable by multiple @Analyzer declarations and include the following:
- a name: the unique string used to refer to the definition.
- a list of CharFilters: each CharFilter is responsible for pre-processing input characters before tokenization. CharFilters can add, change, or remove characters. One common usage is for character normalization.
- a Tokenizer: responsible for tokenizing the input stream into individual words.
- a list of filters: each filter is responsible for removing, modifying, or sometimes adding words into the stream provided by the Tokenizer.
The Analyzer separates these components into multiple tasks, allowing individual components to be reused and components to be built with flexibility using the following procedure:
The Analyzer Process
- The CharFilters process the character input.
- The Tokenizer converts the character input into tokens.
- The tokens are then processed by the TokenFilters.
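As an illustrative sketch of this pipeline (hypothetical code, not part of the JBoss Data Grid API; exception handling omitted), the following snippet runs Lucene's StandardAnalyzer directly and prints the tokens it produces. With the default English stop words, the input "The Quick BROWN fox" yields quick, brown, fox:
Analyzer analyzer = new StandardAnalyzer();
try (TokenStream stream = analyzer.tokenStream("field", "The Quick BROWN fox")) {
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
    stream.reset();
    while (stream.incrementToken()) {
        System.out.println(term.toString()); // lowercased, stop words removed
    }
    stream.end();
}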
The Lucene-based Query API supports this infrastructure by utilizing the Solr analyzer framework.
JBoss Data Grid provides a set of default analyzers as follows:
Definition | Description |
---|---|
standard | Splits text fields into tokens, treating whitespace and punctuation as delimiters. |
simple | Tokenizes input streams by delimiting at non-letters and then converting all letters to lowercase characters. Whitespace and non-letters are discarded. |
whitespace | Splits text streams on whitespace and returns sequences of non-whitespace characters as tokens. |
keyword | Treats entire text fields as single tokens. |
stemmer | Stems English words using the Snowball Porter filter. |
ngram | Generates n-gram tokens that are 3 grams in size by default. |
These analyzer definitions are based on Apache Lucene and are provided "as-is". For more information about tokenizers, filters, and CharFilters, see the appropriate Lucene documentation.
If you require custom analyzer definitions, create an implementation of the ProgrammaticSearchMappingProvider interface, package it in a JAR, and deploy it to JBoss Data Grid. You must also specify the JAR in the cache container configuration, for example:
<cache-container name="mycache" default-cache="default">
   <modules>
      <module name="my.analyzers.jar"/>
   </modules>
   ...
</cache-container>
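A minimal sketch of such a provider, assuming the org.infinispan.query.spi.ProgrammaticSearchMappingProvider contract with a defineMappings(Cache, SearchMapping) callback and Hibernate Search's programmatic SearchMapping API (the analyzer name and factory choices are illustrative):
public class MyAnalyzerProvider implements ProgrammaticSearchMappingProvider {

    @Override
    public void defineMappings(Cache cache, SearchMapping searchMapping) {
        // Register a named analyzer definition that entities can reference
        // via @Analyzer(definition = "standard-lowercase").
        searchMapping
            .analyzerDef("standard-lowercase", StandardTokenizerFactory.class)
            .filter(LowerCaseFilterFactory.class);
    }
}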
19.5.3. Referencing Analyzer Definitions
Use the @Analyzer annotation to reference an analyzer definition.
Referencing an analyzer by name
@Indexed
@AnalyzerDef(name = "standard")
public class Team {

    @Field
    private String name;

    @Field
    private String location;

    @Field
    @Analyzer(definition = "standard")
    private String description;
}
Analyzer instances declared by @AnalyzerDef are also available by their name in the SearchFactory, which is useful when building queries.
Analyzer analyzer = Search.getSearchManager(cache).getAnalyzer("standard");
When querying, fields must use the same analyzer that has been used to index the field. The same tokens are reused between the query and the indexing process.
19.5.4. @AnalyzerDef for Solr
When using Maven, all required Apache Solr dependencies are now defined as dependencies of the artifact org.hibernate:hibernate-search-analyzers. Add the following dependency:
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-analyzers</artifactId>
   <version>${version.hibernate.search}</version>
</dependency>
In the following example, a CharFilter is defined by its factory. In this example, a mapping char filter is used, which will replace characters in the input based on the rules specified in the mapping file. Finally, a list of filters is defined by their factories. In this example, the StopFilter filter is built reading the dedicated words property file. The filter will ignore case.
@AnalyzerDef and the Solr framework
Configure the CharFilter
Define a CharFilter by factory. In this example, a mapping CharFilter is used, which will replace characters in the input based on the rules specified in the mapping file.
@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
Define the Tokenizer
A Tokenizer is then defined using the StandardTokenizerFactory.class.
@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class)
List of Filters
Define a list of filters by their factories. In this example, the StopFilter filter is built reading the dedicated words property file. The filter will ignore case.
@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words",
                value = "org/hibernate/search/test/analyzer/solr/stoplist.properties"),
            @Parameter(name = "ignoreCase", value = "true")
        })
    })
public class Team {
}
Filters and CharFilters are applied in the order they are defined in the @AnalyzerDef annotation.
19.5.5. Loading Analyzer Resources
Tokenizers, TokenFilters, and CharFilters can load resources such as configuration or metadata files; the StopFilterFactory.class and the synonym filter are examples of this. By default, resources are loaded using the virtual machine default charset; a different charset can be explicitly specified by adding a resource_charset parameter.
Use a specific charset to load the property file
@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words",
                value = "org/hibernate/search/test/analyzer/solr/stoplist.properties"),
            @Parameter(name = "resource_charset", value = "UTF-16BE"),
            @Parameter(name = "ignoreCase", value = "true")
        })
    })
public class Team {
}
19.5.6. Dynamic Analyzer Selection
The Query Module uses the @AnalyzerDiscriminator annotation to enable dynamic analyzer selection.
An analyzer can be selected based on the current state of an entity that is to be indexed. This is particularly useful in multilingual applications. For example, when using the BlogEntry class, the analyzer can depend on the language property of the entry. Depending on this property, the correct language-specific stemmer can then be chosen to index the text.
An implementation of the Discriminator interface must return the name of an existing analyzer definition, or null if the default analyzer is not overridden.
The following example assumes that the language parameter is either 'de' or 'en', which is specified in the @AnalyzerDefs.
Configure the @AnalyzerDiscriminator
Predefine Dynamic Analyzers
The @AnalyzerDiscriminator requires that all analyzers that are to be used dynamically are predefined via @AnalyzerDef. The @AnalyzerDiscriminator annotation can then be placed either on the class, or on a specific property of the entity, in order to dynamically select an analyzer. An implementation of the Discriminator interface can be specified using the @AnalyzerDiscriminator impl parameter.
@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "en",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = EnglishPorterFilterFactory.class)
        }),
    @AnalyzerDef(name = "de",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = GermanStemFilterFactory.class)
        })
})
public class BlogEntry {

    @Field
    @AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
    private String language;

    @Field
    private String text;

    private Set<BlogEntry> references;

    // standard getter/setter
}
Implement the Discriminator Interface
Implement the getAnalyzerDefinitionName() method, which is called for each field added to the Lucene document. The entity being indexed is also passed to the interface method.
The value parameter is set if the @AnalyzerDiscriminator is placed on the property level instead of the class level. In this example, the value represents the current value of this property.
public class LanguageDiscriminator implements Discriminator {
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        if (value == null || !(entity instanceof BlogEntry)) {
            return null;
        }
        return (String) value;
    }
}
19.5.7. Retrieving an Analyzer
Retrieving an analyzer is useful when multiple analyzers have been used in a domain model, in order to benefit from stemming or phonetic approximation, etc. In this case, use the same analyzers to build the query. Alternatively, use the Lucene-based Query API, which selects the correct analyzer automatically. See Building a Lucene Query.
The scoped analyzer for a given entity can be retrieved using either the Lucene programmatic API or the Lucene query parser. A scoped analyzer applies the right analyzers depending on the field indexed. Multiple analyzers can be defined on a given entity, each working on an individual field. A scoped analyzer unifies these analyzers into a context-aware analyzer.
In the following example, the song title is indexed in two fields:
- Standard analyzer: used in the title field.
- Stemming analyzer: used in the title_stemmed field.
Using the analyzer provided by the search factory, the query uses the appropriate analyzer depending on the field targeted.
Using the scoped analyzer when building a full-text query
SearchManager manager = Search.getSearchManager(cache);

org.apache.lucene.queryparser.classic.QueryParser parser = new QueryParser(
    "title", manager.getAnalyzer(Song.class));

org.apache.lucene.search.Query luceneQuery =
    parser.parse("title:sky OR title_stemmed:diamond");

// wrap the Lucene query in an org.infinispan.query.CacheQuery
CacheQuery cacheQuery = manager.getQuery(luceneQuery, Song.class);

List result = cacheQuery.list(); // return the list of matching objects
Analyzers defined via @AnalyzerDef can also be retrieved by their definition name using searchManager.getAnalyzer(String).
19.6. Bridge
19.6.1. Bridges
When mapping entities, Lucene represents all index fields as strings. All entity properties annotated with @Field are converted to strings to be indexed. Built-in bridges automatically translate properties for the Lucene-based Query API. The bridges can be customized to gain control over the translation process.
19.6.2. Built-in Bridges
The Lucene-based Query API includes a set of built-in bridges between a Java property type and its full text representation.
- null
By default, null elements are not indexed. Lucene does not support null elements. However, in some situations it can be useful to insert a custom token representing the null value. See @Field for more information.
- java.lang.String
Strings are indexed, as are:
  - short, Short
  - int, Integer
  - long, Long
  - float, Float
  - double, Double
  - BigInteger
  - BigDecimal
Numbers are converted into their string representation. Note that numbers cannot be compared by Lucene, or used in ranged queries out of the box, and must be padded.
Using a Range query has disadvantages. An alternative approach is to use a Filter query, which will filter the result query to the appropriate range.
The Query Module supports using a custom StringBridge. See Custom Bridges.
- java.util.Date
Dates are stored as yyyyMMddHHmmssSSS in GMT time (200611072203012 for Nov 7th of 2006 4:03PM and 12ms EST). When using a TermRangeQuery, dates are expressed in GMT. @DateBridge defines the appropriate resolution to store in the index, for example @DateBridge(resolution=Resolution.DAY). The date pattern will then be truncated accordingly.
@Indexed
public class Meeting {
    @Field(analyze = Analyze.NO)
    @DateBridge(resolution = Resolution.MINUTE)
    private Date date;
}
The default Date bridge uses Lucene's DateTools to convert from and to String. All dates are expressed in GMT time. Implement a custom date bridge in order to store dates in a fixed time zone.
- java.net.URI, java.net.URL
URI and URL are converted to their string representation.
- java.lang.Class
Classes are converted to their fully qualified class name. The thread context classloader is used when the class is rehydrated.
19.6.3. Custom Bridges
19.6.3.1. Custom Bridges
Custom bridges are available in situations where built-in bridges, or the bridge’s String representation, do not sufficiently address the required property types.
19.6.3.2. FieldBridge
For improved flexibility, a bridge can be implemented as a FieldBridge. The FieldBridge interface provides a property value, which can then be mapped in the Lucene Document. For example, a property can be stored in two different document fields.
Implementing the FieldBridge Interface
public class DateSplitBridge implements FieldBridge {
    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");

    public void set(String name, Object value, Document document,
                    LuceneOptions luceneOptions) {
        Date date = (Date) value;
        Calendar cal = GregorianCalendar.getInstance(GMT);
        cal.setTime(date);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1;
        int day = cal.get(Calendar.DAY_OF_MONTH);

        // set year
        luceneOptions.addFieldToDocument(
            name + ".year", String.valueOf(year), document);

        // set month and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".month",
            (month < 10 ? "0" : "") + String.valueOf(month), document);

        // set day and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".day",
            (day < 10 ? "0" : "") + String.valueOf(day), document);
    }
}

//property
@FieldBridge(impl = DateSplitBridge.class)
private Date date;
In the example above, the fields are not added directly to the Lucene Document. Instead, the addition is delegated to the LuceneOptions helper. The helper will apply the options selected on @Field, such as Store or TermVector, or apply the chosen @Boost value.
It is recommended to delegate the addition of fields to the Document to LuceneOptions; however, the Document can also be edited directly, ignoring the LuceneOptions.
LuceneOptions shields the application from changes in the Lucene API and simplifies the code.
19.6.3.3. StringBridge
Use the org.infinispan.query.bridge.StringBridge interface to provide the Lucene-based Query API with an implementation of the expected Object to String bridge, or StringBridge. All implementations are used concurrently, and therefore must be thread-safe.
Custom StringBridge implementation
/**
 * Padding Integer bridge.
 * All numbers will be padded with 0 to match 5 digits
 *
 * @author Emmanuel Bernard
 */
public class PaddedIntegerBridge implements StringBridge {

    private int PADDING = 5;

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > PADDING)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < PADDING; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }
}
The @FieldBridge annotation allows any property or field to use the bridge, as in the provided example:
@FieldBridge(impl = PaddedIntegerBridge.class)
private Integer length;
19.6.3.4. Two-Way Bridge
A TwoWayStringBridge is an extended version of a StringBridge that can be used when the bridge implementation is used on an ID property. The Lucene-based Query API reads the string representation of the identifier and uses it to generate an object. The @FieldBridge annotation is used in the same way.
Implementing a TwoWayStringBridge for ID Properties
public class PaddedIntegerBridge implements TwoWayStringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; // default

    public void setParameterValues(Map parameters) {
        Object padding = parameters.get(PADDING_PROPERTY);
        if (padding != null)
            this.padding = (Integer) padding;
    }

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < padding; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }

    public Object stringToObject(String stringValue) {
        return new Integer(stringValue);
    }
}

@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name = "padding", value = "10"))
private Integer id;
The two-way process must be idempotent (i.e., object = stringToObject(objectToString(object))).
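As an illustrative check of this round-trip property (hypothetical test code, not part of the API):
TwoWayStringBridge bridge = new PaddedIntegerBridge();
Integer id = Integer.valueOf(42);
// objectToString(42) yields "00042"; stringToObject parses it back to 42
assert id.equals(bridge.stringToObject(bridge.objectToString(id)));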
19.6.3.5. Parameterized Bridge
The ParameterizedBridge interface passes parameters to the bridge implementation, making it more flexible. The ParameterizedBridge interface can be implemented by StringBridge, TwoWayStringBridge, and FieldBridge implementations. All implementations must be thread-safe.
The following example implements the ParameterizedBridge interface, with parameters passed through the @FieldBridge annotation.
Configure the ParameterizedBridge Interface
public class PaddedIntegerBridge implements StringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; // default

    public void setParameterValues(Map<String, String> parameters) {
        String padding = parameters.get(PADDING_PROPERTY);
        if (padding != null)
            this.padding = Integer.parseInt(padding);
    }

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < padding; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }
}

//property
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name = "padding", value = "10"))
private Integer length;
19.6.3.6. Type Aware Bridge
Any bridge implementing AppliedOnTypeAwareBridge will have the type on which it is applied injected. For example:
- the return type of the property for field/getter-level bridges.
- the class type for class-level bridges.
The type injected does not have any specific thread-safety requirements.
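A minimal sketch of a type-aware bridge, assuming the AppliedOnTypeAwareBridge contract exposes a setAppliedOnType(Class<?>) callback (the class name and prefixing behavior are illustrative):
public class TypePrefixingBridge implements StringBridge, AppliedOnTypeAwareBridge {

    private Class<?> appliedOnType;

    @Override
    public void setAppliedOnType(Class<?> returnType) {
        // The type the bridge is applied on is injected before the bridge is used.
        this.appliedOnType = returnType;
    }

    @Override
    public String objectToString(Object object) {
        // Prefix the indexed value with the simple name of the applied type.
        return appliedOnType.getSimpleName() + ":" + object;
    }
}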
19.6.3.7. ClassBridge
More than one property of an entity can be combined and indexed in a specific way into the Lucene index using the @ClassBridge annotation. @ClassBridge can be defined at class level, and supports the termVector attribute.
In the following example, the custom FieldBridge implementation receives the entity instance as the value parameter, rather than a particular property. The particular CatFieldsClassBridge is applied to the department instance. The FieldBridge then concatenates both branch and network, and indexes the concatenation.
Implementing a ClassBridge
@Indexed
@ClassBridge(name = "branchnetwork",
             store = Store.YES,
             impl = CatFieldsClassBridge.class,
             params = @Parameter(name = "sepChar", value = ""))
public class Department {
    private int id;
    private String network;
    private String branchHead;
    private String branch;
    private Integer maxEmployees;
}

public class CatFieldsClassBridge implements FieldBridge, ParameterizedBridge {

    private String sepChar;

    public void setParameterValues(Map parameters) {
        this.sepChar = (String) parameters.get("sepChar");
    }

    public void set(String name, Object value, Document document,
                    LuceneOptions luceneOptions) {
        Department dep = (Department) value;
        String fieldValue1 = dep.getBranch();
        if (fieldValue1 == null) {
            fieldValue1 = "";
        }
        String fieldValue2 = dep.getNetwork();
        if (fieldValue2 == null) {
            fieldValue2 = "";
        }
        String fieldValue = fieldValue1 + sepChar + fieldValue2;
        Field field = new Field(name, fieldValue, luceneOptions.getStore(),
            luceneOptions.getIndex(), luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        document.add(field);
    }
}