14.6. Advanced Features
14.6.1. Accessing the SearchFactory
The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:
Example 14.70. Accessing the SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
14.6.2. Using an IndexReader
Queries in Lucene are executed on an IndexReader. Hibernate Search might cache index readers to maximize performance, or use other efficient strategies to retrieve an updated IndexReader while minimizing I/O operations. Your code can access these cached resources, but there are several requirements.
Example 14.71. Accessing an IndexReader
IndexReader reader = searchFactory.getIndexReaderAccessor().open(Order.class);
try {
    // perform read-only operations on the reader
}
finally {
    searchFactory.getIndexReaderAccessor().close(reader);
}
In this example, the SearchFactory determines which indexes are needed to query this entity (taking the sharding strategy into account). Using the configured ReaderProvider on each index, it returns a compound IndexReader on top of all involved indexes. Because this IndexReader is shared among several clients, you must adhere to the following rules:
- Never call indexReader.close(). Instead, release the reader with searchFactory.getIndexReaderAccessor().close(reader), as in Example 14.71, preferably in a finally block.
- Do not use this IndexReader for modification operations (it is a read-only IndexReader, and any such attempt will result in an exception).
Aside from these rules, you can use the IndexReader freely, especially for native Lucene queries. Using the shared IndexReaders makes most queries more efficient than opening a reader directly from, for example, the filesystem.
As an alternative to the method open(Class... types), you can use open(String... indexNames), which lets you pass in one or more index names. With this strategy you can also select a subset of the indexes for any indexed type if sharding is used.
Example 14.72. Accessing an IndexReader by index names
IndexReader reader = searchFactory.getIndexReaderAccessor().open("Products.1", "Products.3");
14.6.3. Accessing a Lucene Directory
A Directory is the most common abstraction used by Lucene to represent the index storage. Hibernate Search does not interact directly with a Lucene Directory but abstracts these interactions via an IndexManager, because an index does not necessarily need to be implemented by a Directory.
If you know that your index is represented as a Directory and need to access it, you can get a reference to the Directory via the IndexManager. Cast the IndexManager to a DirectoryBasedIndexManager and then use getDirectoryProvider().getDirectory() to get a reference to the underlying Directory. This is not recommended; use the IndexReader instead.
14.6.4. Sharding Indexes
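Warning
Shard an index only if the advantages outweigh the disadvantages: a search that cannot be restricted to a subset of the shards has to open every shard, which is typically slower than searching a single index.
Possible use cases for sharding are: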
- A single index is so large that index update times are slowing the application down.
- A typical search will only hit a subset of the index, such as when data is naturally segmented by customer, region or application.
By default, sharding is not enabled. To enable it, set the number of shards with the hibernate.search.<indexName>.sharding_strategy.nbr_of_shards property.
Example 14.73. Enabling Index Sharding
hibernate.search.<indexName>.sharding_strategy.nbr_of_shards = 5
The IndexShardingStrategy is responsible for splitting the data into sub-indexes. The default sharding strategy splits the data according to the hash value of the ID string representation (generated by the FieldBridge). This ensures a fairly balanced sharding. You can replace the default strategy by implementing a custom IndexShardingStrategy. To use your custom strategy, set the hibernate.search.<indexName>.sharding_strategy property.
Example 14.74. Specifying a Custom Sharding Strategy
hibernate.search.<indexName>.sharding_strategy = my.shardingstrategy.Implementation
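To make the default behavior more concrete, the following is a minimal sketch of hash-based shard selection. It is not Hibernate Search's actual implementation: the class name HashShardSketch, the method shardFor, and the use of String.hashCode() are assumptions made for illustration, and the real strategy may use a different hash function.

```java
// Illustrative sketch only: HashShardSketch and shardFor are hypothetical names,
// and String.hashCode() stands in for whatever hash the real strategy uses.
final class HashShardSketch {

    // Maps the string form of an entity ID to a shard number in [0, nbrOfShards).
    static int shardFor(String idAsString, int nbrOfShards) {
        if (nbrOfShards <= 0) {
            throw new IllegalArgumentException("nbrOfShards must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(idAsString.hashCode(), nbrOfShards);
    }
}
```

Because the shard depends only on the ID string, the same entity always lands in the same shard, and IDs spread roughly evenly as long as the hash values do.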
The IndexShardingStrategy also allows for optimizing searches by selecting which shards to run the query against. When a filter is activated, a sharding strategy can select a subset of the shards used to answer a query (IndexShardingStrategy.getIndexManagersForQuery) and thus speed up query execution.
Each shard has an independent IndexManager and so can be configured to use a different directory provider and back end configuration. The IndexManager index names for the Animal entity in Example 14.75, “Sharding Configuration for Entity Animal” are Animal.0 to Animal.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number.
Example 14.75. Sharding Configuration for Entity Animal
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Animal.sharding_strategy.nbr_of_shards = 5
hibernate.search.Animal.directory_provider = filesystem
hibernate.search.Animal.0.indexName = Animal00
hibernate.search.Animal.3.indexBase = /usr/lucene/sharded
hibernate.search.Animal.3.indexName = Animal03
This configuration shards the Animal index into 5 sub-indexes. All sub-indexes are filesystem instances, and each sub-index is stored in the following directory:
- for sub-index 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)
- for sub-index 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)
- for sub-index 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)
- for sub-index 3: /usr/lucene/sharded/Animal03 (overridden indexBase, overridden indexName)
- for sub-index 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)
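The way these directories follow from the properties in Example 14.75 can be sketched in plain Java. This is only an illustration: ShardPathSketch is a hypothetical helper, and the lookup order it uses (shard-level setting, then index-level setting, then the default setting) is an assumption inferred from the list above rather than Hibernate Search code.

```java
import java.util.Properties;

// Illustrative sketch only: ShardPathSketch is a hypothetical helper that mimics
// the property lookup order described above; it is not Hibernate Search code.
final class ShardPathSketch {

    static final String PREFIX = "hibernate.search.";

    // Returns the directory for one shard: shard-level settings win over
    // index-level settings, which win over the "default" settings.
    static String directoryFor(Properties props, String index, int shard) {
        String shardKey = PREFIX + index + "." + shard;
        String indexBase = firstNonNull(
                props.getProperty(shardKey + ".indexBase"),
                props.getProperty(PREFIX + index + ".indexBase"),
                props.getProperty(PREFIX + "default.indexBase"));
        // Without an override, the shard is named <index>.<shard number>.
        String indexName = props.getProperty(shardKey + ".indexName", index + "." + shard);
        return indexBase + "/" + indexName;
    }

    private static String firstNonNull(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        throw new IllegalStateException("no indexBase configured");
    }

    // Builds the configuration from Example 14.75.
    static Properties animalExample() {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.indexBase", "/usr/lucene/indexes");
        props.setProperty("hibernate.search.Animal.sharding_strategy.nbr_of_shards", "5");
        props.setProperty("hibernate.search.Animal.directory_provider", "filesystem");
        props.setProperty("hibernate.search.Animal.0.indexName", "Animal00");
        props.setProperty("hibernate.search.Animal.3.indexBase", "/usr/lucene/sharded");
        props.setProperty("hibernate.search.Animal.3.indexName", "Animal03");
        return props;
    }
}
```

Calling ShardPathSketch.directoryFor(ShardPathSketch.animalExample(), "Animal", n) for n from 0 to 4 yields one directory per sub-index.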
When implementing an IndexShardingStrategy, any field can be used to determine the sharding selection. Keep in mind that to handle deletions, purge and purgeAll operations, the implementation might need to return one or more indexes without being able to read all the field values or the primary identifier. If the available information is not enough to pick a single index, all indexes should be returned, so that the delete operation is propagated to every index that potentially contains the documents to be deleted.
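The fallback for deletions can be sketched as follows. This is a simplified model, not an implementation of the real IndexShardingStrategy interface: RegionShardSketch, its methods, and the choice of a region field as the sharding key are all hypothetical, and shards are represented by plain integers instead of IndexManager instances.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: RegionShardSketch and its methods are hypothetical and
// do not implement the real IndexShardingStrategy interface.
final class RegionShardSketch {

    private final int nbrOfShards;

    RegionShardSketch(int nbrOfShards) {
        this.nbrOfShards = nbrOfShards;
    }

    // Adding a document: the sharding field is available, so one shard is chosen.
    int shardForAddition(String region) {
        return Math.floorMod(region.hashCode(), nbrOfShards);
    }

    // Deleting or purging: the sharding field may be unknown (null here).
    // If it is unknown, every shard is returned so the delete reaches all candidates.
    List<Integer> shardsForDeletion(String regionOrNull) {
        if (regionOrNull != null) {
            return Collections.singletonList(shardForAddition(regionOrNull));
        }
        List<Integer> all = new ArrayList<>();
        for (int shard = 0; shard < nbrOfShards; shard++) {
            all.add(shard);
        }
        return all;
    }
}
```

Returning every shard is slower than targeting one, but it guarantees that a document is removed even when the strategy cannot tell where it was stored.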
14.6.5. Customizing Lucene's Scoring Formula
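Lucene allows you to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:
score(q,d) = coord(q,d) · queryNorm(q) · ∑ t in q ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )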
Factor | Description |
---|---|
tf(t in d) | Term frequency factor for the term (t) in the document (d). |
idf(t) | Inverse document frequency of the term. |
coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
t.getBoost() | Field boost. |
norm(t,d) | Encapsulates a few (indexing time) boost and length factors. |
Refer to the Similarity Javadocs for more information.
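As a rough illustration of how the factors in the table combine, the sketch below multiplies already-computed factor values in the way Lucene's classic TF-IDF scoring does. ScoreSketch and TermFactors are hypothetical names, the numeric inputs are made up, and the code does not call Lucene.

```java
// Illustrative sketch only: ScoreSketch is a hypothetical helper that combines
// already-computed factor values; it does not call Lucene.
final class ScoreSketch {

    // Per-term factors from the table above, already evaluated for one document.
    static final class TermFactors {
        final double tf, idf, boost, norm;

        TermFactors(double tf, double idf, double boost, double norm) {
            this.tf = tf;
            this.idf = idf;
            this.boost = boost;
            this.norm = norm;
        }
    }

    // score(q,d) = coord(q,d) * queryNorm(q) * sum over terms of tf * idf^2 * boost * norm
    static double score(double coord, double queryNorm, TermFactors... terms) {
        double sum = 0.0;
        for (TermFactors t : terms) {
            sum += t.tf * t.idf * t.idf * t.boost * t.norm;
        }
        return coord * queryNorm * sum;
    }
}
```

Note that idf enters the sum squared, so rare terms influence the score more strongly than the other per-term factors.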
You can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.
You can also override the similarity used for a specific index by setting the similarity property for that index:
hibernate.search.default.similarity = my.custom.Similarity
Finally, you can override the default similarity at class level using the @Similarity annotation:
@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}
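For example, if it does not matter how often a term appears in a document, so that a single occurrence should score the same as many occurrences, your custom implementation of the method tf(float freq) should return 1.0.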
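The effect of a constant term frequency can be seen by comparing it with the square-root formula that Lucene's DefaultSimilarity uses for tf. The sketch below is self-contained and does not extend Similarity; TfSketch and its method names are hypothetical.

```java
// Illustrative sketch only: TfSketch is hypothetical and does not extend Lucene's Similarity.
final class TfSketch {

    // Square root of the frequency, which is how Lucene's DefaultSimilarity computes tf.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Constant tf: one occurrence scores the same as many occurrences.
    static float constantTf(float freq) {
        return 1.0f;
    }
}
```

With the default formula, a term that occurs nine times contributes three times as much as a term that occurs once; with the constant version both contribute equally.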
Warning
When two entities share the same index, they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so overriding the Similarity implementation in a subtype is not allowed.
14.6.6. Exception Handling Configuration
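By default, exceptions raised during indexing are written to the log. To declare this logging behavior explicitly, set:
hibernate.search.error_handler = log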
To override the default error handling, implement the ErrorHandler interface, which provides the handle(ErrorContext context) method. ErrorContext provides a reference to the primary LuceneWork instance, the underlying exception and any subsequent LuceneWork instances that could not be processed due to the primary exception.
public interface ErrorContext {
    List<LuceneWork> getFailingOperations();
    LuceneWork getOperationAtFault();
    Throwable getThrowable();
    boolean hasErrors();
}
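To register your error handler with Hibernate Search, declare the fully qualified class name of the ErrorHandler implementation in the configuration properties: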
hibernate.search.error_handler = CustomerErrorHandler
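A custom handler might look like the following sketch. To keep the example compilable on its own, it declares minimal stand-ins for LuceneWork, ErrorContext and ErrorHandler that mirror only the methods described in this section; a real handler would implement the ErrorHandler interface shipped with Hibernate Search, which may declare additional methods. RecordingErrorHandler and its message format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only. LuceneWork, ErrorContext and ErrorHandler below are
// minimal stand-ins declared here so the example compiles on its own; a real
// handler would implement the ErrorHandler interface shipped with Hibernate Search.
final class ErrorHandlerSketch {

    interface LuceneWork { }

    interface ErrorContext {
        List<LuceneWork> getFailingOperations();
        LuceneWork getOperationAtFault();
        Throwable getThrowable();
        boolean hasErrors();
    }

    interface ErrorHandler {
        void handle(ErrorContext context);
    }

    // Hypothetical handler that records a one-line summary of each failure.
    static final class RecordingErrorHandler implements ErrorHandler {
        private final List<String> messages = new ArrayList<>();

        @Override
        public void handle(ErrorContext context) {
            if (!context.hasErrors()) {
                return;
            }
            int failing = context.getFailingOperations().size();
            messages.add(context.getThrowable().getMessage()
                    + " (" + failing + " failing operations)");
        }

        // Read-only view of the summaries collected so far.
        List<String> messages() {
            return Collections.unmodifiableList(messages);
        }
    }
}
```

Collecting summaries in memory keeps the sketch short; a production handler would more likely forward the failing operations to a retry queue or an alerting system.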
14.6.7. Disable Hibernate Search
To disable Hibernate Search indexing, change the indexing_strategy configuration option to manual, then restart JBoss EAP.
hibernate.search.indexing_strategy = manual
To disable Hibernate Search completely, disable all listeners by changing the autoregister_listeners configuration option to false, then restart JBoss EAP.
hibernate.search.autoregister_listeners = false