14.6. Advanced Features

14.6.1. Accessing the SearchFactory

The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:

Example 14.70. Accessing the SearchFactory

FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();

Report a bug

14.6.2. Using an IndexReader

Queries in Lucene are executed on an IndexReader. Hibernate Search might cache index readers to maximize performance, or provide other efficient strategies to retrieve an updated IndexReader minimizing I/O operations. Your code can access these cached resources, but there are several requirements.

Example 14.71. Accessing an IndexReader

IndexReader reader = searchFactory.getIndexReaderAccessor().open(Order.class);
try {
   //perform read-only operations on the reader
}
finally {
   searchFactory.getIndexReaderAccessor().close(reader);
}

In this example the SearchFactory determines which indexes are needed to query this entity (considering a Sharding strategy). Using the configured ReaderProvider on each index, it returns a compound IndexReader on top of all involved indexes. Because this IndexReader is shared amongst several clients, you must adhere to the following rules:

Never call indexReader.close(), instead use readerProvider.closeReader(reader) when necessary, preferably in a finally block.
Don not use this IndexReader for modification operations (it is a readonly IndexReader, and any such attempt will result in an exception).

Aside from those rules, you can use the IndexReader freely, especially to do native Lucene queries. Using the shared IndexReaders will make most queries more efficient than by opening one directly from, for example, the filesystem.

As an alternative to the method open(Class... types) you can use open(String... indexNames), allowing you to pass in one or more index names. Using this strategy you can also select a subset of the indexes for any indexed type if sharding is used.

Example 14.72. Accessing an IndexReader by index names

IndexReader reader = searchFactory.getIndexReaderAccessor().open("Products.1", "Products.3");

Report a bug

14.6.3. Accessing a Lucene Directory

A Directory is the most common abstraction used by Lucene to represent the index storage; Hibernate Search doesn't interact directly with a Lucene Directory but abstracts these interactions via an IndexManager: an index does not necessarily need to be implemented by a Directory.

If you know your index is represented as a Directory and need to access it, you can get a reference to the Directory via the IndexManager. Cast the IndexManager to a DirectoryBasedIndexManager and then use getDirectoryProvider().getDirectory() to get a reference to the underlying Directory. This is not recommended, we would encourage to use the IndexReader instead.

Report a bug

14.6.4. Sharding Indexes

In some cases it can be useful to split (shard) the indexed data of a given entity into several Lucene indexes.

Warning

Sharding should only be implemented if the advantages outweigh the disadvantages. Searching sharded indexes will typically be slower as all shards have to be opened for a single search.

Possible use cases for sharding are:

A single index is so large that index update times are slowing the application down.
A typical search will only hit a subset of the index, such as when data is naturally segmented by customer, region or application.

By default sharding is not enabled unless the number of shards is configured. To do this use the hibernate.search.<indexName>.sharding_strategy.nbr_of_shards property.

Example 14.73. Enabling Index Sharding

In this example 5 shards are enabled.

hibernate.search.<indexName>.sharding_strategy.nbr_of_shards = 5

Responsible for splitting the data into sub-indexes is the IndexShardingStrategy. The default sharding strategy splits the data according to the hash value of the ID string representation (generated by the FieldBridge). This ensures a fairly balanced sharding. You can replace the default strategy by implementing a custom IndexShardingStrategy. To use your custom strategy you have to set the hibernate.search.<indexName>.sharding_strategy property.

Example 14.74. Specifying a Custom Sharding Strategy

hibernate.search.<indexName>.sharding_strategy = my.shardingstrategy.Implementation

The IndexShardingStrategy property also allows for optimizing searches by selecting which shard to run the query against. By activating a filter a sharding strategy can select a subset of the shards used to answer a query (IndexShardingStrategy.getIndexManagersForQuery) and thus speed up the query execution.

Each shard has an independent IndexManager and so can be configured to use a different directory provider and back end configuration. The IndexManager index names for the Animal entity in Example 14.75, “Sharding Configuration for Entity Animal” are Animal.0 to Animal.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number.

Example 14.75. Sharding Configuration for Entity Animal

hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Animal.sharding_strategy.nbr_of_shards = 5
hibernate.search.Animal.directory_provider = filesystem
hibernate.search.Animal.0.indexName = Animal00 
hibernate.search.Animal.3.indexBase = /usr/lucene/sharded
hibernate.search.Animal.3.indexName = Animal03

In Example 14.75, “Sharding Configuration for Entity Animal”, the configuration uses the default id string hashing strategy and shards the Animal index into 5 sub-indexes. All sub-indexes are filesystem instances and the directory where each sub-index is stored is as followed:

for sub-index 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)
for sub-index 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)
for sub-index 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)
for sub-index 3: /usr/lucene/shared/Animal03 (overridden indexBase, overridden indexName)
for sub-index 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)

When implementing a IndexShardingStrategy any field can be used to determine the sharding selection. Consider that to handle deletions, purge and purgeAll operations, the implementation might need to return one or more indexes without being able to read all the field values or the primary identifier. In that case the information is not enough to pick a single index, all indexes should be returned, so that the delete operation will be propagated to all indexes potentially containing the documents to be deleted.

Report a bug

14.6.5. Customizing Lucene's Scoring Formula

Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:

score(q,d) = coord(q,d) · queryNorm(q) · ∑ _{t in q} ( tf(t in d) · idf(t) ² · t.getBoost() · norm(t,d) )

Factor	Description
tf(t ind)	Term frequency factor for the term (t) in the document (d).
idf(t)	Inverse document frequency of the term.
coord(q,d)	Score factor based on how many of the query terms are found in the specified document.
queryNorm(q)	Normalizing factor used to make scores between queries comparable.
t.getBoost()	Field boost.
norm(t,d)	Encapsulates a few (indexing time) boost and length factors.

It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity's Javadocs for more information.

Hibernate Search provides three ways to modify Lucene's similarity calculation.

First you can set the default similarity by specifying the fully specified classname of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.

You can also override the similarity used for a specific index by setting the similarity property

hibernate.search.default.similarity = my.custom.Similarity

Finally you can override the default similarity on class level using the @Similarity annotation.

@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}

As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method tf(float freq) should return 1.0.

Warning

When two entities share the same index they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so it's not allowed to override the Similarity implementation in a subtype.

Likewise, it does not make sense to define the similarity via the index setting and the class-level setting as they would conflict. Such a configuration will be rejected.

Report a bug

14.6.6. Exception Handling Configuration

Hibernate Search allows you to configure how exceptions are handled during the indexing process. If no configuration is provided then exceptions are logged to the log output by default. It is possible to explicitly declare the exception logging mechanism as follows:

hibernate.search.error_handler = log

The default exception handling occurs for both synchronous and asynchronous indexing. Hibernate Search provides an easy mechanism to override the default error handling implementation.

In order to provide your own implementation you must implement the ErrorHandler interface, which provides the handle(ErrorContext context) method. ErrorContext provides a reference to the primary LuceneWork instance, the underlying exception and any subsequent LuceneWork instances that could not be processed due to the primary exception.

public interface ErrorContext  {
   List<LuceneWork> getFailingOperations();
   LuceneWork getOperationAtFault();
   Throwable getThrowable();
   boolean hasErrors();
}

To register this error handler with Hibernate Search you must declare the fully qualified classname of your ErrorHandler implementation in the configuration properties:

hibernate.search.error_handler = CustomerErrorHandler

Report a bug

14.6.7. Disable Hibernate Search

Hibernate Search can be partially or completely disabled as required. Hibernate Search's indexing can be disabled, for example, if the index is read-only, or you prefer to perform indexing manually, rather than automatically. It is also possible to completely disable Hibernate Search, preventing indexing and searching.

Disable Indexing

To disable Hibernate Search indexing, change the indexing_strategy configuration option to manual, then restart JBoss EAP.

hibernate.search.indexing_strategy = manual

Disable Hibernate Search Completely

To disable Hibernate Search completely, disable all listeners by changing the autoregister_listeners configuration option to false, then restart JBoss EAP.

hibernate.search.autoregister_listeners = false

Report a bug

14.6. Advanced Features

14.6.1. Accessing the SearchFactory

14.6.2. Using an IndexReader

14.6.3. Accessing a Lucene Directory

14.6.4. Sharding Indexes

14.6.5. Customizing Lucene's Scoring Formula

14.6.6. Exception Handling Configuration

14.6.7. Disable Hibernate Search

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Red Hat legal and privacy links

Red Hat legal and privacy links