
Chapter 16. Mapping Domain Objects to the Index Structure


16.1. Basic Mapping

16.1.1. Basic Mapping

In Red Hat JBoss Data Grid, the identifier for all @Indexed objects is the key used to store the value. How the key is indexed can still be customized by using a combination of @Transformable, @ProvidedId, custom types and custom FieldBridge implementations.

The @DocumentId identifier does not apply to JBoss Data Grid values.

The Lucene-based Query API uses the following common annotations to map entities:

  • @Indexed
  • @Field
  • @NumericField

16.1.2. @Indexed

The @Indexed annotation declares a cached entry indexable. All entries not annotated with @Indexed are ignored.

Making a class indexable with @Indexed

@Indexed
public class Essay {
}

Optionally, specify the index attribute of the @Indexed annotation to change the default name of the index.

16.1.3. @Field

Each property or attribute of an entity can be indexed. Properties and attributes are not annotated by default, and therefore are ignored by the indexing process. The @Field annotation declares a property as indexed and allows the configuration of several aspects of the indexing process by setting one or more of the following attributes:

name
The name under which the property will be stored in the Lucene Document. By default, this attribute is the same as the property name, following the JavaBeans convention.
store

Specifies if the property is stored in the Lucene index. When a property is stored it can be retrieved in its original value from the Lucene Document. This is regardless of whether or not the element is indexed. Valid options are:

  • Store.YES: Consumes more index space but allows projection. See Projection.
  • Store.COMPRESS: Stores the property as compressed. This attribute consumes more CPU.
  • Store.NO: No storage. This is the default setting for the store attribute.
index

Specifies whether the property is indexed. The following values are applicable:

  • Index.NO: No indexing is applied; cannot be found by querying. This setting is used for properties that are not required to be searchable, but are able to be projected.
  • Index.YES: The element is indexed and is searchable. This is the default setting for the index attribute.
analyze

Determines if the property is analyzed. The analyze attribute allows a property to be searched by its contents. For example, it may be worthwhile to analyze a text field, whereas a date field does not need to be analyzed. Enable or disable the Analyze attribute using the following:

  • Analyze.YES
  • Analyze.NO

The analyze attribute is enabled by default. The Analyze.YES setting requires the property to be indexed via the Index.YES attribute.

Fields that are used for sorting must not be analyzed.

norms

Determines whether or not to store index time boosting information. Valid settings are:

  • Norms.YES
  • Norms.NO

The default for this attribute is Norms.YES. Disabling norms conserves memory, however no index time boosting information will be available.

termVector

Describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value is TermVector.NO. Available settings for this attribute are:

  • TermVector.YES: Stores the term vectors of each document. This produces two synchronized arrays, one contains document terms and the other contains the term’s frequency.
  • TermVector.NO: Does not store term vectors.
  • TermVector.WITH_OFFSETS: Stores the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms.
  • TermVector.WITH_POSITIONS: Stores the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document.
  • TermVector.WITH_POSITION_OFFSETS: Stores the term vector, token position and offset information. This is a combination of the YES, WITH_OFFSETS, and WITH_POSITIONS.
indexNullAs

By default, null values are ignored and not indexed. However, indexNullAs permits specifying a string to be inserted as a token for the null value. When using the indexNullAs parameter, use the same token in the search query to search for null values. Use this feature only with Analyze.NO. Valid settings for this attribute are:

  • Field.DO_NOT_INDEX_NULL: This is the default value for this attribute. This setting indicates that null values will not be indexed.
  • Field.DEFAULT_NULL_TOKEN: Indicates that a default null token is used. This default null token can be specified in the configuration using the default_null_token property. If this property is not set and Field.DEFAULT_NULL_TOKEN is specified, the string "null" will be used as default.
Warning

When implementing a custom FieldBridge or TwoWayFieldBridge it is up to the developer to handle the indexing of null values (see JavaDocs of LuceneOptions.indexNullAs()).
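The effect of indexNullAs can be illustrated with a small plain-Java model. This is only a sketch of the idea, not the Query API: the class NullTokenIndex and its methods are hypothetical, and the token "null" is taken from the default described above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: maps a single un-analyzed token to the cache keys that contain it.
class NullTokenIndex {
    // Assumption: the default null token described above.
    static final String NULL_TOKEN = "null";

    private final Map<String, Set<String>> postings = new HashMap<>();

    // A null value is replaced by the marker token, so the entry stays searchable.
    void index(String key, String value) {
        String token = (value == null) ? NULL_TOKEN : value;
        postings.computeIfAbsent(token, t -> new TreeSet<>()).add(key);
    }

    // The query must use the same token that was used at index time.
    Set<String> search(String token) {
        return postings.getOrDefault(token, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        NullTokenIndex index = new NullTokenIndex();
        index.index("place1", "Atlanta");
        index.index("place2", null);
        System.out.println(index.search(NULL_TOKEN));
    }
}
```

In this model a real value equal to the token would be indistinguishable from a null value, which is one reason to choose the token carefully.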

16.1.4. @NumericField

The @NumericField annotation can be specified in the same scope as @Field.

The @NumericField annotation can be specified for Integer, Long, Float, and Double properties. At index time the value is indexed using a Trie structure. Indexing a property as a numeric field enables efficient range queries and sorting, orders of magnitude faster than the same query on standard @Field properties. The @NumericField annotation accepts the following optional parameters:

  • forField: Specifies the name of the related @Field that will be indexed as numeric. It is mandatory when a property contains more than one @Field declaration.
  • precisionStep: Changes the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage, and faster range and sort queries. Larger values lead to less space used, and range query performance closer to the range query in normal @Fields. The default value for precisionStep is 4.
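The space trade-off of precisionStep can be estimated with simple arithmetic. The sketch below assumes the rule given in Lucene's numeric range documentation, where the number of terms indexed per value is the value size in bits divided by precisionStep, rounded up. The class and method names are illustrative only.

```java
// Rough estimate of how many index terms one numeric value produces.
class PrecisionStepEstimate {

    // Assumption: termsPerValue = ceil(bitsPerValue / precisionStep).
    static int termsPerValue(int bitsPerValue, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be at least 1");
        }
        // Integer ceiling division.
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        for (int step : new int[] {2, 4, 8}) {
            System.out.println("precisionStep " + step
                + ": long=" + termsPerValue(64, step)
                + " terms, int=" + termsPerValue(32, step) + " terms");
        }
    }
}
```

Under this assumption, the default precisionStep of 4 yields 16 terms for each Long or Double value and 8 terms for each Integer or Float value.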

@NumericField supports only Double, Long, Integer, and Float. Lucene offers no comparable functionality for other numeric types, so the remaining types must use string encoding via the default or a custom TwoWayFieldBridge.

A custom NumericFieldBridge can also be used. Custom configurations require approximation during type transformation. The following example defines a custom NumericFieldBridge.

Defining a custom NumericFieldBridge

public class BigDecimalNumericFieldBridge extends NumericFieldBridge {
    private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

    @Override
    public void set(String name,
                    Object value,
                    Document document,
                    LuceneOptions luceneOptions) {
        if (value != null) {
            BigDecimal decimalValue = (BigDecimal) value;
            Long indexedValue = Long.valueOf(
                decimalValue
                .multiply(storeFactor)
                .longValue());
            luceneOptions.addNumericFieldToDocument(name, indexedValue, document);
        }
    }

    @Override
    public Object get(String name, Document document) {
        String fromLucene = document.get(name);
        BigDecimal storedBigDecimal = new BigDecimal(fromLucene);
        return storedBigDecimal.divide(storeFactor);
    }
}
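The approximation mentioned above can be seen in isolation with plain Java. This sketch reuses the factor of 100 from the bridge example but is otherwise independent of the Query API; the class ScaledDecimalCodec and its methods are hypothetical names.

```java
import java.math.BigDecimal;

// Stand-alone model of the scaling performed by the bridge above.
class ScaledDecimalCodec {
    // Same factor as the bridge: two decimal places survive the conversion.
    private static final BigDecimal STORE_FACTOR = BigDecimal.valueOf(100);

    // Encodes a decimal as a long; digits beyond the second decimal place are dropped.
    static long encode(BigDecimal value) {
        return value.multiply(STORE_FACTOR).longValue();
    }

    // Decodes the long back into a decimal with a scale of 2.
    static BigDecimal decode(long indexedValue) {
        return BigDecimal.valueOf(indexedValue, 2);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(new BigDecimal("19.99"))));   // exact round trip
        System.out.println(decode(encode(new BigDecimal("19.999"))));  // third decimal is lost
    }
}
```

Values with more than two decimal places do not survive the round trip, so the factor should match the precision the application needs.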

16.2. Mapping Properties Multiple Times

Properties may need to be mapped multiple times per index, using different indexing strategies. For example, sorting a query by field requires that the field is not analyzed. To search by words in this property and also sort by it, the property must be indexed twice: once analyzed and once un-analyzed. Use @Fields to map the property multiple times. For example:

Using @Fields to map a property multiple times

@Indexed(index = "Book")
public class Book {
    @Fields( {
        @Field,
        @Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES)
    })
    public String getSummary() {
        return summary;
    }
}

In the example above, the field summary is indexed twice: once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports two attributes that are useful when @Fields is used:

  • analyzer: defines a @Analyzer annotation per field rather than per property
  • bridge: defines a @FieldBridge annotation per field rather than per property

16.3. Embedded and Associated Objects

16.3.1. Embedded and Associated Objects

Associated objects and embedded objects can be indexed as part of the root entity index. This allows searches of an entity based on properties of associated objects.

16.3.2. Indexing Associated Objects

The aim of the following example is to return places where the associated city is Atlanta via the Lucene query address.city:Atlanta. The place fields are indexed in the Place index. The Place index documents also contain the following fields:

  • address.street
  • address.city

These fields are also able to be queried.

Indexing associations

@Indexed
public class Place {

    @Field
    private String name;

    @IndexedEmbedded
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    private Address address;
}

public class Address {

    @Field
    private String street;

    @Field
    private String city;

    @ContainedIn
    @OneToMany(mappedBy = "address")
    private Set<Place> places;
}

16.3.3. @IndexedEmbedded

When using the @IndexedEmbedded technique, data is denormalized in the Lucene index. As a result, the Lucene-based Query API must be updated with any changes in the Place and Address objects to keep the index up to date. Ensure the Place Lucene document is updated when its Address changes by marking the other side of the bidirectional relationship with @ContainedIn. @ContainedIn can be used for both associations pointing to entities and on embedded objects.

The @IndexedEmbedded annotation can be nested. Attributes can be annotated with @IndexedEmbedded. The attributes of the associated class are then added to the main entity index. In the following example, the index will contain the following fields:

  • name
  • address.street
  • address.city
  • address.ownedBy_name

Nested usage of @IndexedEmbedded and @ContainedIn

@Indexed
public class Place {
    @Field
    private String name;

    @IndexedEmbedded
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    private Address address;
}

public class Address {
    @Field
    private String street;

    @Field
    private String city;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
    private Owner ownedBy;

    @ContainedIn
    @OneToMany(mappedBy = "address")
    private Set<Place> places;
}

public class Owner {
    @Field
    private String name;
}

The default prefix is propertyName, following the traditional object navigation convention. This can be overridden using the prefix attribute as it is shown on the ownedBy property.

Note

The prefix cannot be set to the empty string.

The depth property is used when the object graph contains a cyclic dependency of classes, for example if Owner points to Place. The Query Module stops including attributes after reaching the expected depth or the object graph boundaries. A self-referential class is an example of a cyclic dependency. In the provided example, because depth is set to 1, any @IndexedEmbedded attribute in Owner is ignored.
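How prefixes combine into index field names can be modeled in a few lines of plain Java. This is a toy model and not how the Query Module is implemented: the Node class is hypothetical, and a single depth budget for the whole traversal stands in for the per-annotation depth attribute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: derives index field names from nested objects and their prefixes.
class EmbeddedFieldNames {

    // Hypothetical description of a mapped class: plain fields plus embedded children keyed by prefix.
    static final class Node {
        final List<String> fields = new ArrayList<>();
        final Map<String, Node> embedded = new LinkedHashMap<>();

        Node field(String name) {
            fields.add(name);
            return this;
        }

        Node embed(String prefix, Node child) {
            embedded.put(prefix, child);
            return this;
        }
    }

    // Collects field names, descending into embedded objects while depth remains.
    static List<String> flatten(Node node, String path, int depth) {
        List<String> names = new ArrayList<>();
        for (String field : node.fields) {
            names.add(path + field);
        }
        if (depth > 0) {
            for (Map.Entry<String, Node> entry : node.embedded.entrySet()) {
                names.addAll(flatten(entry.getValue(), path + entry.getKey(), depth - 1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Node owner = new Node().field("name");
        // "ownedBy_" overrides the default prefix, which would be "ownedBy."
        Node address = new Node().field("street").field("city").embed("ownedBy_", owner);
        Node place = new Node().field("name").embed("address.", address);
        System.out.println(flatten(place, "", 2));
    }
}
```

With a budget of 2 the model produces the four field names listed for the nested example; with a budget of 1 the owner's name is left out.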

Using @IndexedEmbedded for object associations allows queries to be expressed using Lucene’s query syntax. For example:

  • Return places where name contains JBoss and where address city is Atlanta. In Lucene query this is:

    +name:jboss +address.city:atlanta
  • Return places where name contains JBoss and where the owner’s name contains Joe. In Lucene query this is:

    +name:jboss +address.ownedBy_name:joe

This behavior resembles a relational join, but the join is resolved at index time at the cost of data duplication. Out of the box, Lucene indexes have no notion of association; the join operation does not exist. This approach makes it possible to keep the relational model normalized while benefiting from the speed and feature richness of the full-text index.

An associated object can also be @Indexed. When @IndexedEmbedded points to an entity, the association must be bidirectional and the other side must be annotated with @ContainedIn. If not, the Lucene-based Query API cannot update the root index when the associated entity is updated. In the provided example, a Place index document is updated when the associated Address instance updates.

16.3.4. The targetElement Property

It is possible to override the object type targeted using the targetElement parameter. This method can be used when the object type annotated by @IndexedEmbedded is not the object type targeted by the data grid and the Lucene-based Query API. This occurs when interfaces are used instead of their implementation.

Using the targetElement property of @IndexedEmbedded

@Indexed
public class Address {

    @Field
    private String street;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
    private Person ownedBy;

    ...
}

public class Owner implements Person { ... }

16.4. Boosting

16.4.1. Boosting

Lucene uses boosting to attach more importance to specific fields or documents over others. Lucene differentiates between index and search-time boosting.

16.4.2. Static Index Time Boosting

The @Boost annotation is used to define a static boost value for an indexed class or property. This annotation can be used within @Field, or can be specified directly on the method or class level.

In the following example:

  • The probability of Essay reaching the top of the search list will be multiplied by 1.7.
  • @Field.boost and @Boost on a property are cumulative, therefore the summary field will be 3.0 (2 x 1.5), and more important than the ISBN field.
  • The text field is 1.2 times more important than the ISBN field.

Different ways of using @Boost

@Indexed
@Boost(1.7f)
public class Essay {

    @Field(name = "Abstract", store=Store.YES, boost = @Boost(2f))
    @Boost(1.5f)
    public String getSummary() { return summary; }

    @Field(boost = @Boost(1.2f))
    public String getText() { return text; }

    @Field
    public String getISBN() { return isbn; }

}

16.4.3. Dynamic Index Time Boosting

The @Boost annotation defines a static boost factor that is independent of the state of the indexed entity at runtime. However, in some cases the boost factor may depend on the actual state of the entity. In this case, use the @DynamicBoost annotation together with an accompanying custom BoostStrategy.

@Boost and @DynamicBoost annotations can both be used in relation to an entity, and all defined boost factors are cumulative. The @DynamicBoost can be placed at either class or field level.

In the following example, a dynamic boost is defined on class level specifying VIPBoostStrategy as implementation of the BoostStrategy interface used at indexing time. Depending on the annotation placement, either the whole entity is passed to the defineBoost method or only the annotated field/property value. The passed object must be cast to the correct type.

Dynamic boost example

public enum PersonType {
    NORMAL,
    VIP
}

@Indexed
@DynamicBoost(impl = VIPBoostStrategy.class)
public class Person {
    private PersonType type;
}

public class VIPBoostStrategy implements BoostStrategy {
    public float defineBoost(Object value) {
        Person person = (Person) value;
        if (person.getType().equals(PersonType.VIP)) {
            return 2.0f;
        }
        else {
            return 1.0f;
        }
    }
}

In the provided example all indexed values of a VIP would be twice the importance of the values of a non-VIP.

Note

The specified BoostStrategy implementation must define a public no argument constructor.
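The way static and dynamic boost factors accumulate can be checked with a small stand-alone sketch. It assumes that cumulative factors are multiplied, as in the 2 x 1.5 example earlier in this section. The Strategy interface is a local stand-in and not the library's BoostStrategy type.

```java
// Stand-alone model of cumulative boosting; none of these types belong to the Query API.
class BoostSketch {
    enum PersonType { NORMAL, VIP }

    // Hypothetical stand-in with the same method shape as BoostStrategy.
    interface Strategy {
        float defineBoost(Object value);
    }

    static final class Person {
        final PersonType type;

        Person(PersonType type) {
            this.type = type;
        }
    }

    // Same decision as VIPBoostStrategy above: VIPs count twice as much.
    static final Strategy VIP_STRATEGY =
        value -> ((Person) value).type == PersonType.VIP ? 2.0f : 1.0f;

    // Assumption: a static boost and a dynamic boost combine by multiplication.
    static float effectiveBoost(float staticBoost, Strategy strategy, Object entity) {
        return staticBoost * strategy.defineBoost(entity);
    }

    public static void main(String[] args) {
        System.out.println(effectiveBoost(1.5f, VIP_STRATEGY, new Person(PersonType.VIP)));
        System.out.println(effectiveBoost(1.5f, VIP_STRATEGY, new Person(PersonType.NORMAL)));
    }
}
```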

16.5. Analysis

16.5.1. Analysis

16.5.2. Default Analyzer and Analyzer by Class

The default analyzer class is used to index tokenized fields, and is configurable through the default.analyzer property. The default value for this property is org.apache.lucene.analysis.standard.StandardAnalyzer.

The analyzer class can be defined per entity, property, and per @Field, which is useful when multiple fields are indexed from a single property.

In the following example, EntityAnalyzer is used to index all tokenized properties, such as name, except summary and body, which are indexed with PropertyAnalyzer and FieldAnalyzer respectively.

Different ways of using @Analyzer

@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {

    @Field
    private String name;

    @Field
    @Analyzer(impl = PropertyAnalyzer.class)
    private String summary;

    @Field(analyzer = @Analyzer(impl = FieldAnalyzer.class))
    private String body;
}

Note

Avoid using different analyzers on a single entity. Doing so can create complications in building queries, and make results less predictable, particularly if using a QueryParser. Use the same analyzer for indexing and querying on any field.

16.5.3. Named Analyzers

The Query Module uses analyzer definitions to deal with the complexity of the Analyzer function. Analyzer definitions are reusable by multiple @Analyzer declarations and include the following:

  • a name: the unique string used to refer to the definition.
  • a list of CharFilters: each CharFilter is responsible for pre-processing input characters before tokenization. CharFilters can add, change, or remove characters. One common usage is character normalization.
  • a Tokenizer: responsible for tokenizing the input stream into individual words.
  • a list of filters: each filter is responsible for removing, modifying, or sometimes adding words in the stream provided by the Tokenizer.

The Analyzer separates these components into multiple tasks, allowing individual components to be reused and components to be built with flexibility using the following procedure:

The Analyzer Process

  1. The CharFilters process the character input.
  2. The Tokenizer converts the character input into tokens.
  3. The tokens are then processed by the TokenFilters.

The Lucene-based Query API supports this infrastructure by utilizing the Solr analyzer framework.
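The three stages of the analyzer process can be sketched in plain Java. This is only an illustration of the order of operations, not the Solr or Lucene implementation: the class MiniAnalyzer, its character mappings, and its stop word list are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy pipeline: char filter, then tokenizer, then token filters.
class MiniAnalyzer {
    // Illustrative stop words; a real definition would load them from a resource file.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    // Step 1: the char filter normalizes characters before tokenization.
    static String charFilter(String input) {
        return input.replace('\u00e1', 'a').replace('\u00f1', 'n').replace('\u00f8', 'o');
    }

    // Step 2: the tokenizer splits the character stream into words.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String token : input.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 3: token filters modify or remove tokens, here lowercasing and stop word removal.
    static List<String> filter(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }

    static List<String> analyze(String input) {
        return filter(tokenize(charFilter(input)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Se\u00f1or of M\u00e1laga"));
    }
}
```

Keeping each stage as a separate function mirrors the reuse described above: any stage can be swapped without touching the others.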

16.5.4. Analyzer Definitions

Once defined, an analyzer definition can be reused by an @Analyzer annotation.

Referencing an analyzer by name

@Indexed
@AnalyzerDef(name = "customanalyzer")
public class Team {

    @Field
    private String name;

    @Field
    private String location;

    @Field
    @Analyzer(definition = "customanalyzer")
    private String description;
}

Analyzer instances declared by @AnalyzerDef are also available by name from the SearchManager, which is useful when building queries.

Analyzer analyzer = Search.getSearchManager(cache).getAnalyzer("customanalyzer");

When querying, fields must use the same analyzer that has been used to index the field. The same tokens are reused between the query and the indexing process.

16.5.5. @AnalyzerDef for Solr

When using Maven all required Apache Solr dependencies are now defined as dependencies of the artifact org.hibernate:hibernate-search-analyzers. Add the following dependency:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-analyzers</artifactId>
    <version>${version.hibernate.search}</version>
</dependency>

The following procedure builds an analyzer definition in three steps: a CharFilter, a Tokenizer, and a list of filters, each declared by its factory.

@AnalyzerDef and the Solr framework

  1. Configure the CharFilter

    Define a CharFilter by factory. In this example, a mapping CharFilter is used, which will replace characters in the input based on the rules specified in the mapping file.

    @AnalyzerDef(name = "customanalyzer",
        charFilters = {
            @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
                @Parameter(name = "mapping",
                    value =
                        "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
            })
        },
  2. Define the Tokenizer

    A Tokenizer is then defined using the StandardTokenizerFactory.class.

    @AnalyzerDef(name = "customanalyzer",
        charFilters = {
            @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
                @Parameter(name = "mapping",
                    value =
                        "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
            })
        },
    
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class)
  3. List of Filters

    Define a list of filters by their factories. In this example, the StopFilter filter is built reading the dedicated words property file. The filter will ignore case.

    @AnalyzerDef(name = "customanalyzer",
        charFilters = {
            @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
                @Parameter(name = "mapping",
                    value =
                        "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
            })
        },
    
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
    
            @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class, params = {
                @Parameter(name = "words",
                    value= "org/hibernate/search/test/analyzer/solr/stoplist.properties" ),
                @Parameter(name = "ignoreCase", value = "true")
            })
        })
    public class Team {
    }
Note

Filters and CharFilters are applied in the order they are defined in the @AnalyzerDef annotation.

16.5.6. Loading Analyzer Resources

Tokenizers, TokenFilters, and CharFilters can load resources such as configuration or metadata files; the StopFilterFactory and the synonym filter are examples. If a resource file does not use the virtual machine's default charset, specify its charset explicitly by adding a resource_charset parameter.

Use a specific charset to load the property file

@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value =
                    "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name="words",
                value= "org/hibernate/search/test/analyzer/solr/stoplist.properties"),
            @Parameter(name = "resource_charset", value = "UTF-16BE"),
            @Parameter(name = "ignoreCase", value = "true")
        })
    })
public class Team {
}

16.5.7. Dynamic Analyzer Selection

The Query Module uses the @AnalyzerDiscriminator annotation to enable the dynamic analyzer selection.

An analyzer can be selected based on the current state of an entity that is to be indexed. This is particularly useful in multilingual applications. For example, when using the BlogEntry class, the analyzer can depend on the language property of the entry. Depending on this property, the correct language-specific stemmer can then be chosen to index the text.

An implementation of the Discriminator interface must return the name of an existing Analyzer definition, or null if the default analyzer is not overridden.

The following example assumes that the language parameter is either 'de' or 'en', which is specified in the @AnalyzerDefs.

Configure the @AnalyzerDiscriminator

  1. Predefine Dynamic Analyzers

    The @AnalyzerDiscriminator requires that all analyzers that are to be used dynamically are predefined via @AnalyzerDef. The @AnalyzerDiscriminator annotation can then be placed either on the class, or on a specific property of the entity, in order to dynamically select an analyzer. An implementation of the Discriminator interface can be specified using the impl parameter of @AnalyzerDiscriminator.

    @Indexed
    @AnalyzerDefs({
        @AnalyzerDef(name = "en",
            tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
            filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = EnglishPorterFilterFactory.class)
            }),
        @AnalyzerDef(name = "de",
            tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
            filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = GermanStemFilterFactory.class)
            })
        })
    public class BlogEntry {
    
        @Field
        @AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
        private String language;
    
        @Field
        private String text;
    
        private Set<BlogEntry> references;
    
        // standard getter/setter
    }
  2. Implement the Discriminator Interface

    Implement the getAnalyzerDefinitionName() method, which is called for each field added to the Lucene document. The entity being indexed is also passed to the interface method.

    The value parameter is set if the @AnalyzerDiscriminator is placed on the property level instead of the class level. In this example, the value represents the current value of this property.

    public class LanguageDiscriminator implements Discriminator {
        public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
            if (value == null || !(entity instanceof BlogEntry)) {
                return null;
            }
            return (String) value;
        }
    }
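The discriminator above returns the language value as the analyzer definition name without checking it. One possible refinement, sketched here in plain Java outside the Discriminator interface, is to fall back to the default analyzer when no definition exists for the language. The class LanguageSelection is a hypothetical helper, and the set of names is copied by hand from the @AnalyzerDefs example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that a Discriminator implementation could delegate to.
class LanguageSelection {
    // Names of the analyzer definitions declared in the example above.
    private static final Set<String> DEFINED = new HashSet<>(Arrays.asList("en", "de"));

    // Returns the analyzer definition name, or null to keep the default analyzer.
    static String analyzerNameFor(Object languageValue) {
        if (!(languageValue instanceof String)) {
            return null;
        }
        String language = (String) languageValue;
        return DEFINED.contains(language) ? language : null;
    }

    public static void main(String[] args) {
        System.out.println(analyzerNameFor("de"));
        System.out.println(analyzerNameFor("fr"));
    }
}
```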

16.5.8. Retrieving an Analyzer

Retrieving an analyzer is useful when multiple analyzers have been used in a domain model, for example to benefit from stemming or phonetic approximation. In this case, use the same analyzers to build a query. Alternatively, use the Lucene-based Query API, which selects the correct analyzer automatically. See Building a Lucene Query.

The scoped analyzer for a given entity can be retrieved using either the Lucene programmatic API or the Lucene query parser. A scoped analyzer applies the right analyzers depending on the field indexed. Multiple analyzers can be defined on a given entity, each working on an individual field. A scoped analyzer unifies these analyzers into a context-aware analyzer.

In the following example, the song title is indexed in two fields:

  • Standard analyzer: used in the title field.
  • Stemming analyzer: used in the title_stemmed field.

Using the analyzer provided by the search factory, the query uses the appropriate analyzer depending on the field targeted.

Using the scoped analyzer when building a full-text query

SearchManager manager = Search.getSearchManager(cache);

org.apache.lucene.queryparser.classic.QueryParser parser = new QueryParser(
        org.apache.lucene.util.Version.LUCENE_5_5_1,
        "title",
        manager.getAnalyzer(Song.class)
);

org.apache.lucene.search.Query luceneQuery =
        parser.parse("title:sky OR title_stemmed:diamond");

// wrap Lucene query in a org.infinispan.query.CacheQuery
CacheQuery cacheQuery = manager.getQuery(luceneQuery, Song.class);

List result = cacheQuery.list();
//return the list of matching objects

Note

Analyzers defined via @AnalyzerDef can also be retrieved by their definition name using searchManager.getAnalyzer(String).
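The idea of a scoped analyzer, one object that picks the right analyzer per field, can be modeled in plain Java. This sketch is not the analyzer returned by the search manager: the class ScopedAnalyzerSketch is hypothetical, and its "stemmer" only strips a trailing "s", which is far cruder than a real stemming filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy scoped analyzer: dispatches to a per-field analyzer, with a fallback for other fields.
class ScopedAnalyzerSketch {
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    ScopedAnalyzerSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void register(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Applies the analyzer registered for the field, or the fallback if there is none.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    // Lowercases and splits on non-word characters.
    static List<String> standard(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Naive stand-in for stemming: drops a trailing "s" from longer words.
    static List<String> stemmed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : standard(text)) {
            boolean plural = token.length() > 3 && token.endsWith("s");
            tokens.add(plural ? token.substring(0, token.length() - 1) : token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        ScopedAnalyzerSketch scoped = new ScopedAnalyzerSketch(ScopedAnalyzerSketch::standard);
        scoped.register("title_stemmed", ScopedAnalyzerSketch::stemmed);
        System.out.println(scoped.analyze("title", "Diamonds"));
        System.out.println(scoped.analyze("title_stemmed", "Diamonds"));
    }
}
```

The same query text therefore produces different terms depending on the field it targets, which is why the query must be analyzed with the analyzer that was used at index time.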

16.5.9. Available Analyzers

Apache Solr and Lucene ship with a number of default CharFilters, tokenizers, and filters. A complete list of CharFilter, tokenizer, and filter factories is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. The following tables provide some example CharFilters, tokenizers, and filters.

Table 16.1. Example of available CharFilters

Factory | Description | Parameters | Additional dependencies
MappingCharFilterFactory | Replaces one or more characters with one or more characters, based on mappings specified in the resource file | mapping: points to a resource file containing the mappings using the format "á" => "a", "ñ" => "n", "ø" => "o" (one mapping per line) | none
HTMLStripCharFilterFactory | Removes standard HTML tags, keeping the text | none | none

Table 16.2. Example of available tokenizers

Factory | Description | Parameters | Additional dependencies
StandardTokenizerFactory | Uses the Lucene StandardTokenizer | none | none
HTMLStripStandardTokenizerFactory | Removes HTML tags, keeps the text, and passes it to a StandardTokenizer | none | solr-core
PatternTokenizerFactory | Breaks text at the specified regular expression pattern | pattern: the regular expression to use for tokenizing; group: specifies which pattern group to extract into tokens | solr-core

Table 16.3. Examples of available filters

Factory | Description | Parameters | Additional dependencies
StandardFilterFactory | Removes dots from acronyms and 's from words | none | solr-core
LowerCaseFilterFactory | Lowercases all words | none | solr-core
StopFilterFactory | Removes words (tokens) matching a list of stop words | words: points to a resource file containing the stop words; ignoreCase: true if case should be ignored when comparing stop words, false otherwise | solr-core
SnowballPorterFilterFactory | Reduces a word to its root in a given language (for example, protect, protects, and protection share the same root). Using such a filter allows searches to match related words. | language: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish and a few more | solr-core
ISOLatin1AccentFilterFactory | Removes accents for languages like French | none | solr-core
PhoneticFilterFactory | Inserts phonetically similar tokens into the token stream | encoder: one of DoubleMetaphone, Metaphone, Soundex or RefinedSoundex; inject: true will add tokens to the stream, false will replace the existing token; maxCodeLength: sets the maximum length of the code to be generated, supported only for Metaphone and DoubleMetaphone encodings | solr-core and commons-codec
CollationKeyFilterFactory | Converts each token into its java.text.CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term | custom, language, country, variant, strength, decomposition; see Lucene’s CollationKeyFilter javadocs for more info | solr-core and commons-io

16.6. Bridge

16.6.1. Bridges

When mapping entities, Lucene represents all index fields as strings. All entity properties annotated with @Field are converted to strings to be indexed. Built-in bridges automatically translate properties for the Lucene-based Query API. The bridges can be customized to gain control over the translation process.

16.6.2. Built-in Bridges

The Lucene-based Query API includes a set of built-in bridges between a Java property type and its full text representation.

null
By default, null elements are not indexed. Lucene does not support null elements. However, in some situations it can be useful to insert a custom token representing the null value. See @Field for more information.
java.lang.String

Strings are indexed, as are:

  • short, Short
  • int, Integer
  • long, Long
  • float, Float
  • double, Double
  • BigInteger
  • BigDecimal

Numbers are converted into their string representation. Note that Lucene cannot compare numbers or use them in range queries out of the box; they must be padded first.

Note

Using a Range query has disadvantages. An alternative approach is to use a Filter query, which restricts the query results to the appropriate range.

The Query Module supports using a custom StringBridge. See Custom Bridges.
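
The need for padding follows from how strings are ordered. The sketch below uses only the JDK to show that unpadded number strings sort lexicographically rather than numerically, and that zero-padding to a fixed width restores numeric order. The class PaddedNumberOrder and its pad helper are hypothetical names for illustration and are not part of the Query API.

```java
import java.util.Arrays;

final class PaddedNumberOrder {

    // Left-pads a non-negative integer with zeros to a fixed width.
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String raw = Integer.toString(value);
        if (raw.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }

    public static void main(String[] args) {
        String[] raw = {"9", "10", "200"};
        Arrays.sort(raw);
        System.out.println(Arrays.toString(raw));    // string order, not numeric order

        String[] padded = {pad(9, 5), pad(10, 5), pad(200, 5)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded)); // string order now matches numeric order
    }
}
```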

java.util.Date

Dates are stored as yyyyMMddHHmmssSSS in GMT (20061107210300012 for Nov 7th of 2006, 4:03:00 PM and 12 ms EST). When using a TermRangeQuery, dates are expressed in GMT.

@DateBridge defines the appropriate resolution to store in the index, for example: @DateBridge(resolution=Resolution.DAY). The date pattern will then be truncated accordingly.

@Indexed
public class Meeting {
    @Field(analyze=Analyze.NO)
    @DateBridge(resolution=Resolution.MINUTE)
    private Date date;
}

The default Date bridge uses Lucene’s DateTools to convert from and to String. All dates are expressed in GMT time. Implement a custom date bridge in order to store dates in a fixed time zone.
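
The stored format and the effect of a coarser resolution can be approximated with the JDK alone. The following sketch formats one instant in GMT with the full pattern and with shorter patterns that model minute and day resolution. It does not call Lucene's DateTools; the class DateIndexFormatSketch, its methods, and the assumption that each resolution corresponds to a truncated pattern are illustrative only.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class DateIndexFormatSketch {

    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // Formats a date in GMT; a shorter pattern models a coarser resolution.
    static String toIndexString(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(GMT);
        return format.format(date);
    }

    // Nov 7th 2006, 4:03:00.012 PM at a fixed offset of GMT-05:00 (EST).
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT-05:00"));
        cal.clear();
        cal.set(2006, Calendar.NOVEMBER, 7, 16, 3, 0);
        cal.set(Calendar.MILLISECOND, 12);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        System.out.println(toIndexString(date, "yyyyMMddHHmmssSSS")); // millisecond resolution
        System.out.println(toIndexString(date, "yyyyMMddHHmm"));      // minute resolution
        System.out.println(toIndexString(date, "yyyyMMdd"));          // day resolution
    }
}
```

Since the formatter is fixed to GMT, the local hour of 16 is written as 21. A custom bridge that stores dates in another fixed time zone would change only the zone passed to the formatter.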

java.net.URI, java.net.URL
URI and URL are converted to their string representation.
java.lang.Class
Classes are converted to their fully qualified class name. The thread context classloader is used when the class is rehydrated.

16.6.3. Custom Bridges

16.6.3.1. Custom Bridges

Custom bridges are available in situations where built-in bridges, or the bridge’s String representation, do not sufficiently address the required property types.

16.6.3.2. FieldBridge

For improved flexibility, a bridge can be implemented as a FieldBridge. The FieldBridge interface provides a property value, which can then be mapped in the Lucene Document. For example, a property can be stored in two different document fields.

Implementing the FieldBridge Interface

import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import org.apache.lucene.document.Document;

public class DateSplitBridge implements FieldBridge {
    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");

    public void set(String name,
                    Object value,
                    Document document,
                    LuceneOptions luceneOptions) {
        Date date = (Date) value;
        Calendar cal = GregorianCalendar.getInstance(GMT);
        cal.setTime(date);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1; // Calendar months start at 0
        int day = cal.get(Calendar.DAY_OF_MONTH);

        // set year
        luceneOptions.addFieldToDocument(
            name + ".year",
            String.valueOf(year),
            document);

        // set month and pad it if needed
        // (the parentheses are required: + binds tighter than ?:)
        luceneOptions.addFieldToDocument(
            name + ".month",
            (month < 10 ? "0" : "") + String.valueOf(month),
            document);

        // set day and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".day",
            (day < 10 ? "0" : "") + String.valueOf(day),
            document);
    }
}

//property
@FieldBridge(impl = DateSplitBridge.class)
private Date date;

In the preceding example, the fields are not added directly to the Lucene Document. Instead, the addition is delegated to the LuceneOptions helper. The helper applies the options selected on @Field, such as Store or TermVector, and applies the chosen @Boost value.

Delegating to LuceneOptions is the recommended way to add fields to the Document. The Document can also be edited directly, in which case the LuceneOptions are ignored.

Note

LuceneOptions shields the application from changes in Lucene API and simplifies the code.
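
The splitting logic of such a bridge can be exercised without Lucene on the classpath. In the sketch below, a Map stands in for the Lucene Document so that the generated field names and padded values can be inspected directly. The class DateSplitSketch and its methods are hypothetical and exist only for this illustration.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

final class DateSplitSketch {

    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // Returns the fields that would be added for one Date property.
    // The Map is a stand-in for the Lucene Document.
    static Map<String, String> split(String name, Date date) {
        Calendar cal = new GregorianCalendar(GMT);
        cal.setTime(date);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(name + ".year", String.valueOf(cal.get(Calendar.YEAR)));
        fields.put(name + ".month", twoDigits(cal.get(Calendar.MONTH) + 1)); // Calendar months start at 0
        fields.put(name + ".day", twoDigits(cal.get(Calendar.DAY_OF_MONTH)));
        return fields;
    }

    // Parentheses matter: without them, + binds tighter than ?: and the digit is lost.
    static String twoDigits(int value) {
        return (value < 10 ? "0" : "") + value;
    }

    public static void main(String[] args) {
        // The epoch: 1 January 1970, 00:00 GMT
        System.out.println(split("date", new Date(0L)));
    }
}
```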

16.6.3.3. StringBridge

Use the org.infinispan.query.bridge.StringBridge interface to provide the Lucene-based Query API with an implementation of the expected Object to String bridge, or StringBridge. All implementations are used concurrently, and therefore must be thread-safe.

Custom StringBridge implementation

/**
 * Padding Integer bridge.
 * All numbers will be padded with 0 to match 5 digits
 *
 * @author Emmanuel Bernard
 */
public class PaddedIntegerBridge implements StringBridge {

    private static final int PADDING = 5;

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > PADDING)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length() ; padIndex < PADDING ; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }
}

Any property or field can then use this bridge through the @FieldBridge annotation:

@FieldBridge(impl = PaddedIntegerBridge.class)
private Integer length;

16.6.3.4. Two-Way Bridge

A TwoWayStringBridge is an extended version of a StringBridge, which can be used when the bridge implementation is used on an ID property. The Lucene-based Query API reads the string representation of the identifier and uses it to generate an object. The @FieldBridge annotation is used in the same way.

Implementing a TwoWayStringBridge for ID Properties

public class PaddedIntegerBridge implements TwoWayStringBridge, ParameterizedBridge {

    public static final String PADDING_PROPERTY = "padding";
    private int padding = 5; //default

    public void setParameterValues(Map parameters) {
        // Parameter values arrive as strings (for example "10"), so parse rather than cast.
        Object padding = parameters.get(PADDING_PROPERTY);
        if (padding != null) this.padding = Integer.parseInt(padding.toString());
    }

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < padding; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }

    public Object stringToObject(String stringValue) {
        return Integer.valueOf(stringValue);
    }
}


@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name = "padding", value = "10"))
private Integer id;

Important

The two-way process must be idempotent, that is, object = stringToObject(objectToString(object)).
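
This requirement can be checked with a small round-trip test. The sketch below declares a local TwoWay interface that mirrors the two methods named in the text, so that it runs without the query module. The interface, the PaddedInteger class, and the roundTrips helper are hypothetical stand-ins rather than the library types.

```java
final class RoundTripCheck {

    // Local stand-in mirroring the two conversion methods; not the library interface.
    interface TwoWay {
        String objectToString(Object object);
        Object stringToObject(String stringValue);
    }

    // Zero-padding bridge with a fixed width. It is immutable, so sharing it between threads is safe.
    static final class PaddedInteger implements TwoWay {
        private final int padding;

        PaddedInteger(int padding) {
            this.padding = padding;
        }

        public String objectToString(Object object) {
            String raw = ((Integer) object).toString();
            if (raw.length() > padding) {
                throw new IllegalArgumentException("number too big to pad");
            }
            StringBuilder padded = new StringBuilder();
            for (int i = raw.length(); i < padding; i++) {
                padded.append('0');
            }
            return padded.append(raw).toString();
        }

        public Object stringToObject(String stringValue) {
            return Integer.valueOf(stringValue); // leading zeros are ignored when parsing
        }
    }

    // True when converting to a string and back yields an equal object.
    static boolean roundTrips(TwoWay bridge, Object value) {
        return value.equals(bridge.stringToObject(bridge.objectToString(value)));
    }

    public static void main(String[] args) {
        TwoWay bridge = new PaddedInteger(10);
        for (Integer id : new Integer[] {0, 7, 42, 123456}) {
            System.out.println(id + " -> " + bridge.objectToString(id)
                + " round-trips: " + roundTrips(bridge, id));
        }
    }
}
```

Negative identifiers would not survive this particular padding scheme, because the zeros are placed before the minus sign and the result can no longer be parsed. A bridge used on identifiers should be tested against the full range of values it is expected to handle.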

16.6.3.5. Parameterized Bridge

The ParameterizedBridge interface passes parameters to the bridge implementation, making it more flexible. The ParameterizedBridge interface can be implemented by StringBridge, TwoWayStringBridge, and FieldBridge implementations. All implementations must be thread-safe.

The following example implements a ParameterizedBridge interface, with parameters passed through the @FieldBridge annotation.

Configure the ParameterizedBridge Interface

public class PaddedIntegerBridge implements StringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default

    public void setParameterValues(Map <String,String> parameters) {
        String padding = parameters.get(PADDING_PROPERTY);
        if (padding != null) this.padding = Integer.parseInt(padding);
    }

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }
}

//property
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name = "padding", value = "10")
            )
private Integer length;

16.6.3.6. Type Aware Bridge

Any bridge implementing AppliedOnTypeAwareBridge has the type that the bridge is applied on injected. The injected type is:

  • the return type of the property for field/getter-level bridges.
  • the class type for class-level bridges.

The type injected does not have any specific thread-safety requirements.
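
A typical reason to want the applied type is converting a string back into an enum constant, where the bridge must know which enum class to use. The sketch below illustrates the idea with a local TypeAware interface. The interface, its setAppliedOnType method name, the Status enum, and the EnumNameBridge class are all assumptions made for this illustration and should be checked against the actual AppliedOnTypeAwareBridge interface before use.

```java
final class TypeAwareSketch {

    // Local stand-in for the type-aware callback; the method name is an assumption.
    interface TypeAware {
        void setAppliedOnType(Class<?> type);
    }

    // Hypothetical enum used as the type of the indexed property.
    enum Status { DRAFT, PUBLISHED }

    // Converts enum constants to their names and back, using the injected type
    // to decide which enum class the string belongs to.
    static final class EnumNameBridge implements TypeAware {
        private Class<?> enumType;

        public void setAppliedOnType(Class<?> type) {
            if (!type.isEnum()) {
                throw new IllegalArgumentException(type.getName() + " is not an enum");
            }
            this.enumType = type;
        }

        public String objectToString(Object object) {
            return ((Enum<?>) object).name();
        }

        public Object stringToObject(String name) {
            for (Object constant : enumType.getEnumConstants()) {
                if (((Enum<?>) constant).name().equals(name)) {
                    return constant;
                }
            }
            throw new IllegalArgumentException("no constant " + name + " in " + enumType.getName());
        }
    }

    public static void main(String[] args) {
        EnumNameBridge bridge = new EnumNameBridge();
        bridge.setAppliedOnType(Status.class); // in a real setup the framework would inject this
        System.out.println(bridge.stringToObject(bridge.objectToString(Status.PUBLISHED)));
    }
}
```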

16.6.3.7. ClassBridge

More than one property of an entity can be combined and indexed in a specific way to the Lucene index using the @ClassBridge annotation. @ClassBridge can be defined at class level, and supports the termVector attribute.

In the following example, the custom FieldBridge implementation receives the entity instance as the value parameter, rather than a particular property. CatFieldsClassBridge is applied to the Department instance. The FieldBridge then concatenates both branch and network, and indexes the concatenation.

Implementing a ClassBridge

@Indexed
@ClassBridge(name = "branchnetwork",
             store = Store.YES,
             impl = CatFieldsClassBridge.class,
             params = @Parameter(name = "sepChar", value = ""))
public class Department {
    private int id;
    private String network;
    private String branchHead;
    private String branch;
    private Integer maxEmployees;
}

public class CatFieldsClassBridge implements FieldBridge, ParameterizedBridge {
    private String sepChar;

    public void setParameterValues(Map parameters) {
        this.sepChar = (String) parameters.get("sepChar");
    }

    public void set(String name,
                    Object value,
                    Document document,
                    LuceneOptions luceneOptions) {

        Department dep = (Department) value;
        String fieldValue1 = dep.getBranch();
        if (fieldValue1 == null) {
            fieldValue1 = "";
        }
        String fieldValue2 = dep.getNetwork();
        if (fieldValue2 == null) {
            fieldValue2 = "";
        }
        String fieldValue = fieldValue1 + sepChar + fieldValue2;
        Field field = new Field(name, fieldValue, luceneOptions.getStore(),
            luceneOptions.getIndex(), luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        document.add(field);
    }
}
