5.3. Filters
Typical use cases for filters include:
- security
- temporal data (example, view only last month's data)
- population filter (example, search limited to a given category)
- and many more
5.3.1. Defining and Implementing a Filter
Example 5.21. Enabling Fulltext Filters for a Query
cacheQuery = Search.getSearchManager(cache).getQuery(query, Driver.class);
cacheQuery.enableFullTextFilter("bestDriver");
cacheQuery.enableFullTextFilter("security").setParameter("login", "andre");
cacheQuery.list(); // returns only best drivers where andre has credentials
Declare filters using the @FullTextFilterDef annotation. This annotation applies to @Indexed entities irrespective of the filter's query. Filter definitions are global, so each filter must have a unique name. If two @FullTextFilterDef annotations with the same name are defined, a SearchException is thrown. Each named filter must specify its filter implementation.
Example 5.22. Defining and Implementing a Filter
@FullTextFilterDefs( {
    @FullTextFilterDef(name = "bestDriver", impl = BestDriversFilter.class),
    @FullTextFilterDef(name = "security", impl = SecurityFilterFactory.class)
})
public class Driver { ... }

public class BestDriversFilter extends org.apache.lucene.search.Filter {

    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        OpenBitSet bitSet = new OpenBitSet( reader.maxDoc() );
        TermDocs termDocs = reader.termDocs( new Term( "score", "5" ) );
        while ( termDocs.next() ) {
            bitSet.set( termDocs.doc() );
        }
        return bitSet;
    }
}
BestDriversFilter is a Lucene filter that reduces the result set to drivers where the score is 5. In the example, the filter extends org.apache.lucene.search.Filter directly and has a no-arg constructor.
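The bit-set approach used by BestDriversFilter can be illustrated without Lucene. The following is a minimal sketch using only the JDK: the class ScoreBitSetSketch and its method are hypothetical names, and a plain list of score strings stands in for the index. It shows only the idea of marking matching document positions in a bit set, not the Lucene API.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a bit-set based filter; not part of Lucene.
final class ScoreBitSetSketch {

    // Returns a bit set in which bit i is set when scores.get(i) equals the wanted score.
    static BitSet matchingDocs(List<String> scores, String wanted) {
        BitSet bits = new BitSet(scores.size());
        for (int docId = 0; docId < scores.size(); docId++) {
            if (wanted.equals(scores.get(docId))) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

For the scores "5", "3", "5", "4" and the wanted score "5", bits 0 and 2 are set and all others stay clear.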
5.3.2. The @Factory Filter
Example 5.23. Creating a filter using the factory pattern
@FullTextFilterDef(name = "bestDriver", impl = BestDriversFilterFactory.class)
public class Driver { ... }

public class BestDriversFilterFactory {

    @Factory
    public Filter getFilter() {
        // some additional steps to cache the filter results per IndexReader
        Filter bestDriversFilter = new BestDriversFilter();
        return new CachingWrapperFilter(bestDriversFilter);
    }
}
The Query Module uses the @Factory annotated method to build the filter instance. The factory must have a no-argument constructor.
Example 5.24. Passing parameters to a defined filter
cacheQuery = Search.getSearchManager(cache).getQuery(query, Driver.class);
cacheQuery.enableFullTextFilter("security").setParameter("level", 5);
Example 5.25. Using parameters in the actual filter implementation
public class SecurityFilterFactory {

    private Integer level;

    /**
     * injected parameter
     */
    public void setLevel(Integer level) {
        this.level = level;
    }

    @Key
    public FilterKey getKey() {
        StandardFilterKey key = new StandardFilterKey();
        key.addParameter( level );
        return key;
    }

    @Factory
    public Filter getFilter() {
        Query query = new TermQuery( new Term("level", level.toString() ) );
        return new CachingWrapperFilter( new QueryWrapperFilter(query) );
    }
}
The method annotated with @Key returns a FilterKey object. The returned object has a special contract: the key object must implement equals() and hashCode() so that two keys are equal if and only if the given Filter types are the same and the sets of parameters are the same. In other words, two filter keys are equal if and only if the filters from which the keys are generated can be interchanged. The key object is used as a key in the cache mechanism.
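The equality contract for key objects can be sketched in plain Java. The class SimpleFilterKey below is hypothetical and is not the library's FilterKey or StandardFilterKey. It assumes parameters are compared in the order they are supplied, and it only illustrates how equals() and hashCode() make interchangeable filters map to the same cache entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical key: equal only when the filter type and all parameter values match.
final class SimpleFilterKey {
    private final Class<?> filterType;
    private final List<Object> parameters;

    SimpleFilterKey(Class<?> filterType, Object... parameters) {
        this.filterType = Objects.requireNonNull(filterType);
        // defensive copy so later changes to the caller's array cannot alter the key
        this.parameters = new ArrayList<>(Arrays.asList(parameters));
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SimpleFilterKey)) return false;
        SimpleFilterKey that = (SimpleFilterKey) other;
        // delegates to each parameter's own equals() through List.equals()
        return filterType.equals(that.filterType) && parameters.equals(that.parameters);
    }

    @Override
    public int hashCode() {
        return 31 * filterType.hashCode() + parameters.hashCode();
    }
}
```

Two keys built for the same filter type with the parameter 5 compare equal, so a map lookup with the second key finds the entry stored under the first. A key built with the parameter 6, or for a different filter type, does not match.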
5.3.3. Key Objects
@Key methods are needed only if:
- the filter caching system is enabled (enabled by default)
- the filter has parameters
StandardFilterKey delegates the equals() and hashCode() implementation to the equals() and hashCode() methods of each parameter.
Filters are cached by default. The cache holds hard references to filters and converts them to SoftReferences when needed. Once the limit of the hard reference cache is reached, additional filters are cached as SoftReferences. To adjust the size of the hard reference cache, use default.filter.cache_strategy.size (defaults to 128). For advanced use of filter caching, you can implement your own FilterCachingStrategy. The class name is defined by default.filter.cache_strategy.
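The combination of a bounded hard reference cache and SoftReferences can be sketched with JDK classes only. HardSoftCache below is a hypothetical illustration, not the library's FilterCachingStrategy. It assumes a least-recently-used policy for the hard references and demotes evicted entries to SoftReferences, which the garbage collector may reclaim under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-level cache: bounded LRU map of hard references,
// with evicted entries demoted to SoftReferences.
final class HardSoftCache<K, V> {
    private final Map<K, SoftReference<V>> soft = new HashMap<>();
    private final LinkedHashMap<K, V> hard;

    HardSoftCache(final int hardSize) {
        // accessOrder = true keeps the least recently used entry first
        this.hard = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > hardSize) {
                    soft.put(eldest.getKey(), new SoftReference<V>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(K key, V value) {
        soft.remove(key);
        hard.put(key, value);
    }

    // Returns the cached value, or null if it is absent or was reclaimed.
    synchronized V get(K key) {
        V value = hard.get(key);
        if (value != null) return value;
        SoftReference<V> ref = soft.remove(key);
        value = (ref == null) ? null : ref.get();
        if (value != null) hard.put(key, value); // promote back to the hard cache
        return value;
    }

    synchronized int hardCount() {
        return hard.size();
    }
}
```

With a hard size of 2, inserting three entries leaves the two most recent ones hard-referenced and the oldest one reachable only through a SoftReference.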
In Lucene, it is common practice to wrap a filter in a CachingWrapperFilter. The wrapper caches the DocIdSet returned from the getDocIdSet(IndexReader reader) method to avoid expensive recomputation. The computed DocIdSet is only cacheable for the same IndexReader instance, because the reader effectively represents the state of the index at the moment it was opened. The document list cannot change within an opened IndexReader. A different or new IndexReader instance, however, potentially works on a different set of Documents (either from a different index or simply because the index has changed), so the cached DocIdSet has to be recomputed.
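The per-reader caching rule can be sketched as follows. PerReaderCache is a hypothetical class, not Lucene's CachingWrapperFilter: it assumes the reader type uses identity-based equals() and hashCode(), uses a java.util.BitSet as a stand-in for a DocIdSet, and keeps entries in a WeakHashMap so that a result can be discarded once its reader is no longer referenced.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical wrapper: caches one computed result per reader instance.
final class PerReaderCache<R> {
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    private final Function<R, BitSet> compute;
    private int computations = 0;

    PerReaderCache(Function<R, BitSet> compute) {
        this.compute = compute;
    }

    // Reuses the result for a known reader; recomputes for any other reader.
    synchronized BitSet get(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = compute.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    synchronized int computations() {
        return computations;
    }
}
```

Calling get() twice with the same reader instance runs the computation once. Passing a second reader instance triggers a new computation, mirroring the recomputation required when the index state may have changed.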
5.3.4. Full Text Filter
The Query Module uses the cache flag of @FullTextFilterDef, set by default to FilterCacheModeType.INSTANCE_AND_DOCIDSETRESULTS, which automatically caches the filter instance and wraps the filter in a Hibernate-specific implementation of CachingWrapperFilter. Unlike Lucene's version of this class, SoftReferences are used together with a hard reference count (see the discussion of the filter cache). The hard reference count is adjusted using default.filter.cache_docidresults.size (defaults to 5). Wrapping is controlled using the @FullTextFilterDef.cache parameter. There are three different values for this parameter:
Value | Definition |
---|---|
FilterCacheModeType.NONE | No filter instance and no result is cached by the Query Module. For every filter call, a new filter instance is created. This setting addresses rapidly changing data sets or heavily memory constrained environments. |
FilterCacheModeType.INSTANCE_ONLY | The filter instance is cached and reused across concurrent Filter.getDocIdSet() calls. DocIdSet results are not cached. This setting is useful when a filter uses its own specific caching mechanism or the filter results change dynamically due to application specific events making DocIdSet caching in both cases unnecessary. |
FilterCacheModeType.INSTANCE_AND_DOCIDSETRESULTS | Both the filter instance and the DocIdSet results are cached. This is the default value. |
Caching filters is most useful when:
- The system does not update the targeted entity index often (in other words, the IndexReader is reused a lot).
- The Filter's DocIdSet is expensive to compute (compared to the time spent to execute the query).
5.3.5. Using Filters in a Sharded Environment
To run a query against a subset of the available shards:
- Create a sharding strategy to select a subset of IndexManagers depending on filter configurations.
- Activate the filter when running the query.
Example 5.26. Querying a Specific Shard
public class CustomerShardingStrategy implements IndexShardingStrategy {

    // stored IndexManagers in an array indexed by customerID
    private IndexManager[] indexManagers;

    public void initialize(Properties properties, IndexManager[] indexManagers) {
        this.indexManagers = indexManagers;
    }

    public IndexManager[] getIndexManagersForAllShards() {
        return indexManagers;
    }

    public IndexManager getIndexManagerForAddition(
            Class<?> entity, Serializable id, String idInString, Document document) {
        Integer customerID = Integer.parseInt(document.getFieldable("customerID").stringValue());
        return indexManagers[customerID];
    }

    public IndexManager[] getIndexManagersForDeletion(
            Class<?> entity, Serializable id, String idInString) {
        return getIndexManagersForAllShards();
    }

    /**
     * Optimization; don't search ALL shards and union the results; in this case, we
     * can be certain that all the data for a particular customer Filter is in a single
     * shard; return that shard by customerID.
     */
    public IndexManager[] getIndexManagersForQuery(FullTextFilterImplementor[] filters) {
        FullTextFilter filter = getCustomerFilter(filters, "customer");
        if (filter == null) {
            return getIndexManagersForAllShards();
        }
        else {
            return new IndexManager[] {
                indexManagers[Integer.parseInt(filter.getParameter("customerID").toString())]
            };
        }
    }

    private FullTextFilter getCustomerFilter(FullTextFilterImplementor[] filters, String name) {
        for (FullTextFilterImplementor filter : filters) {
            if (filter.getName().equals(name)) return filter;
        }
        return null;
    }
}
In the example, if the customer filter is present, the query uses only the shard dedicated to that customer. If the customer filter is not found, the query uses all shards. The sharding strategy reacts to each filter depending on the provided parameters.
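The selection logic can be isolated from the indexing API for illustration. ShardSelectionSketch below is a hypothetical, JDK-only sketch in which shard names stand in for IndexManagers and a map stands in for the filter's parameters. As an added assumption that is not part of the example above, it falls back to all shards when the customerID parameter is missing, is not a number, or is out of range.

```java
import java.util.Map;

// Hypothetical stand-in for the shard selection step of a sharding strategy.
final class ShardSelectionSketch {

    // Returns the single shard for the customer when the filter parameter is
    // present and valid; otherwise returns all shards.
    static String[] shardsForQuery(String[] allShards, Map<String, Object> customerFilterParams) {
        if (customerFilterParams == null) {
            return allShards; // filter not enabled for this query
        }
        Object id = customerFilterParams.get("customerID");
        if (id == null) {
            return allShards;
        }
        int customerId;
        try {
            customerId = Integer.parseInt(id.toString());
        } catch (NumberFormatException e) {
            return allShards;
        }
        if (customerId < 0 || customerId >= allShards.length) {
            return allShards;
        }
        return new String[] { allShards[customerId] };
    }
}
```

Falling back to all shards trades speed for completeness: an unexpected parameter value slows the query down but does not hide results or raise an array index error.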
Use the ShardSensitiveOnlyFilter class to declare the filter.
Example 5.27. Using the ShardSensitiveOnlyFilter Class
@Indexed
@FullTextFilterDef(name = "customer", impl = ShardSensitiveOnlyFilter.class)
public class Customer { ... }

CacheQuery cacheQuery = Search.getSearchManager(cache).getQuery(query, Customer.class);
cacheQuery.enableFullTextFilter("customer").setParameter("customerID", 5);
@SuppressWarnings("unchecked")
List results = cacheQuery.list();
If the ShardSensitiveOnlyFilter filter is used, Lucene filters do not need to be implemented. Use filters and sharding strategies that react to these filters for faster query execution in a sharded environment.