Chapter 11. Indexing and Searching
Data Grid provides a search API that lets you index and search cache values stored as Java POJOs or as objects encoded as Protocol Buffers.
11.1. Overview
Searching is possible both in library and client/server mode (for Java, C#, Node.js and other clients), and Data Grid can index data using Apache Lucene, offering an efficient full-text capable search engine in order to cover a wide range of data retrieval use cases.
Indexing configuration relies on a schema definition, and for that Data Grid can use annotated Java classes when in library mode, and protobuf schemas for remote clients. By standardizing on protobuf, Data Grid allows full interoperability between Java and non-Java clients.
Data Grid has its own query language called Ickle, which is string-based and adds support for full-text searching. Ickle supports searches over indexed data, partially indexed data, and non-indexed data.
Finally, Data Grid has support for Continuous Queries, which work in a reverse manner to the other APIs: instead of creating and executing a query and obtaining the results, they allow a client to register queries that will be evaluated continuously as data in the cluster changes, generating notifications whenever the changed data matches the queries.
11.2. Indexing Entry Values
Indexing entry values in Data Grid caches dramatically improves search performance and allows you to perform full-text queries. However, indexing can degrade write throughput for Data Grid clusters. For this reason you should plan strategies to optimize query performance, depending on the cache mode and your use case. See the query performance guide for more information.
11.2.1. Configuration
To enable indexing via XML, you need to add the <indexing>
element to your cache configuration, specify the entities that are indexed and optionally pass additional properties.
The presence of an <indexing>
element which omits the enabled
attribute will auto-enable indexing for your convenience, even though the default value of the enabled
attribute is defined as "false"
in the XSD schema. In the programmatic config, enabled()
must be used.
Declaratively
Programmatically
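For instance, a minimal programmatic sketch (the Book entity is illustrative; the enable() call mirrors the one used later in this chapter):

import org.infinispan.configuration.cache.ConfigurationBuilder;

ConfigurationBuilder cacheCfg = new ConfigurationBuilder();
cacheCfg.indexing()
        // enables indexing, equivalent to enabled="true" in the XML configuration
        .enable()
        // declaring indexed types is recommended (see the next section)
        .addIndexedEntity(Book.class);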
11.2.2. Specifying Indexed Entities
It is recommended to declare the indexed types, as they will be mandatory in the next Data Grid version.
Declaratively
Programmatically
cacheCfg.indexing()
.addIndexedEntity(Car.class)
   .addIndexedEntity(Truck.class);
When the cache is storing protobuf, the indexed types should be the Message types declared in the protobuf schema. For example, for the schema below, the config should be:
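A hedged sketch (the book_sample.Book message and schema are illustrative, not taken from this guide):

// Illustrative protobuf schema (book.proto):
//
//   package book_sample;
//
//   message Book {
//       optional string title = 1;
//       optional string description = 2;
//   }

// The indexed entity is referenced by its fully qualified Message name:
cacheCfg.indexing()
        .enable()
        .addIndexedEntity("book_sample.Book");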
11.2.3. Index Storage
Data Grid can store indexes in the file system or in memory (local-heap
). File system is the recommended and the default configuration, and memory indexes should only be used for small to medium indexes that don’t need to survive restart.
Configuration for file system indexes:
Configuration for memory indexes:
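Both configurations can be sketched programmatically as follows (hedged; the default.directory_provider and default.indexBase property names follow the legacy Hibernate Search configuration and are assumptions to verify against your version):

// file system storage (recommended, default)
ConfigurationBuilder fsIndexCfg = new ConfigurationBuilder();
fsIndexCfg.indexing().enable()
        .addIndexedEntity(Book.class)
        .addProperty("default.directory_provider", "filesystem")
        .addProperty("default.indexBase", "/opt/datagrid/books-index");

// in-memory storage; the index does not survive restarts
ConfigurationBuilder memIndexCfg = new ConfigurationBuilder();
memIndexCfg.indexing().enable()
        .addIndexedEntity(Book.class)
        .addProperty("default.directory_provider", "local-heap");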
11.2.4. Index Manager
Data Grid internally uses a component called the "Index Manager" to control how new data is applied to the index and when that data is visible to searches.
The default Index Manager, directory-based, writes to the index as soon as the data is written to the cache. The downside is that it can considerably slow down cache writes, especially under heavy write scenarios, since it needs to perform constant, costly operations called "flushes" on the index.
The near-real-time
index manager is similar to the default index manager but takes advantage of the Near-Real-Time features of Lucene. It has better write performance because it flushes the index to the underlying store less often. The drawback is that unflushed index changes can be lost in case of a non-clean shutdown. It can be used in conjunction with local-heap or filesystem.
Example with local-heap
:
Example with filesystem
:
<replicated-cache name="default"> <indexing> <property name="default.indexmanager">near-real-time</property> </indexing> </replicated-cache>
<replicated-cache name="default">
<indexing>
<property name="default.indexmanager">near-real-time</property>
</indexing>
</replicated-cache>
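A hedged programmatic sketch combining the near-real-time index manager with local-heap storage (the default.directory_provider property name is an assumption based on the legacy configuration):

ConfigurationBuilder nrtCfg = new ConfigurationBuilder();
nrtCfg.indexing().enable()
        .addIndexedEntity(Book.class)
        .addProperty("default.indexmanager", "near-real-time")
        .addProperty("default.directory_provider", "local-heap");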
11.2.5. Rebuilding Indexes
Rebuilding an index reconstructs it from the data stored in the cache. You need to rebuild indexes if you change things like the definitions of indexed types or analyzers. Likewise, you might need to rebuild indexes if they are deleted for some reason. Be aware that rebuilding might take some time, as it needs to reprocess all data in the grid.
Indexer indexer = Search.getIndexer(cache);
CompletionStage<Void> future = indexer.run();
11.3. Searching
Create relational and full-text queries in both Library and Remote Client-Server mode with the Ickle query language.
To use the API, first obtain a QueryFactory for the cache and then call the .create()
method, passing in the string to use in the query. Each QueryFactory
instance is bound to the same Cache
instance as the Search
, but it is otherwise a stateless and thread-safe object that can be used for creating multiple queries in parallel.
For instance:
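A minimal sketch, using the com.acme.Book entity referenced throughout this chapter:

import org.infinispan.query.Search;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;

// obtain the factory for the cache being queried
QueryFactory queryFactory = Search.getQueryFactory(cache);

// create an Ickle query targeting a single entity type
Query<Book> query = queryFactory.create("FROM com.acme.Book WHERE title LIKE '%Java%'");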
A query will always target a single entity type and is evaluated over the contents of a single cache. Running a query over multiple caches or creating queries that target several entity types (joins) is not supported.
Executing the query and fetching the results is as simple as invoking the execute() method of the Query object. Once executed, calling execute() on the same instance will re-execute the query.
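For example, continuing the previous sketch (requires org.infinispan.query.dsl.QueryResult and java.util.List):

QueryResult<Book> result = query.execute();
List<Book> books = result.list();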
11.3.1. Pagination
You can limit the number of returned results by using the Query.maxResults(int maxResults)
. This can be used in conjunction with Query.startOffset(long startOffset)
to achieve pagination of the result set.
// sorted by year and match all books that have "clustering" in their title
// and return the third page of 10 results
Query<Book> query = queryFactory.create("FROM com.acme.Book WHERE title like '%clustering%' ORDER BY year").startOffset(20).maxResults(10);
11.3.2. Number of Hits
The QueryResult
object has the .hitCount()
method to return the total number of results of the query, regardless of any pagination parameter. The hit count is only available for indexed queries for performance reasons.
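A brief sketch, assuming hitCount() exposes the value as a java.util.OptionalLong in this version:

QueryResult<Book> result = query.execute();
// total matches, ignoring startOffset/maxResults; only available for indexed queries
result.hitCount().ifPresent(total -> System.out.println("Total hits: " + total));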
11.3.3. Iteration
The Query
object has the .iterator()
method to obtain the results lazily. It returns an instance of CloseableIterator
that must be closed after usage.
The iteration support for Remote Queries is currently limited, as it will first fetch all entries to the client before iterating.
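A minimal sketch using try-with-resources so the iterator is always closed (CloseableIterator is org.infinispan.commons.util.CloseableIterator):

try (CloseableIterator<Book> it = query.iterator()) {
    while (it.hasNext()) {
        Book book = it.next();
        // process each result lazily
    }
}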
11.3.4. Using Named Query Parameters
Instead of building a new Query object for every execution it is possible to include named parameters in the query which can be substituted with actual values before execution. This allows a query to be defined once and be efficiently executed many times. Parameters can only be used on the right-hand side of an operator and are defined when the query is created by supplying an object produced by the org.infinispan.query.dsl.Expression.param(String paramName)
method to the operator instead of the usual constant value. Once the parameters have been defined they can be set by invoking either Query.setParameter(parameterName, value)
or Query.setParameters(parameterMap)
as shown in the examples below.
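Setting named parameters one at a time (a hedged sketch; the :authorName and :publicationYear placeholders and the entity fields are illustrative):

QueryFactory queryFactory = Search.getQueryFactory(cache);
Query<Book> query = queryFactory.create(
        "FROM com.acme.Book WHERE author.name = :authorName AND publicationYear > :publicationYear");

query.setParameter("authorName", "Doe");
query.setParameter("publicationYear", 2010);

// the same Query instance can be re-executed later with different parameter values
List<Book> results = query.execute().list();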
Alternatively, you can supply a map of actual parameter values to set multiple parameters at once:
Setting multiple named parameters at once
Map<String, Object> parameterMap = new HashMap<>();
parameterMap.put("authorName", "Doe");
parameterMap.put("publicationYear", 2010);
query.setParameters(parameterMap);
A significant portion of the query parsing, validation and execution planning effort is performed during the first execution of a query with parameters. This effort is not repeated during subsequent executions leading to better performance compared to a similar query using constant values instead of query parameters.
11.3.5. Ickle Query Language Parser Syntax
The Ickle query language is a small subset of the JPQL query language, with some extensions for full-text.
The parser syntax has some notable rules:
- Whitespace is not significant.
- Wildcards are not supported in field names.
- A field name or path must always be specified, as there is no default field.
- && and || are accepted instead of AND or OR in both full-text and JPA predicates.
- ! may be used instead of NOT.
- A missing boolean operator is interpreted as OR.
- String terms must be enclosed with either single or double quotes.
- Fuzziness and boosting are not accepted in arbitrary order; fuzziness always comes first.
- != is accepted instead of <>.
- Boosting cannot be applied to >, >=, <, <= operators. Ranges may be used to achieve the same result.
11.3.5.1. Filtering operators
Ickle supports many filtering operators that can be used on both indexed and non-indexed fields.
Operator | Description | Example |
---|---|---|
in | Checks that the left operand is equal to one of the elements from the Collection of values given as argument. | FROM Book WHERE isbn IN ('ZZ', 'X1234') |
like | Checks that the left argument (which is expected to be a String) matches a wildcard pattern that follows the JPA rules. | FROM Book WHERE title LIKE '%Java%' |
= | Checks that the left argument is an exact match of the given value | FROM Book WHERE name = 'Programming Java' |
!= | Checks that the left argument is different from the given value | FROM Book WHERE language != 'English' |
> | Checks that the left argument is greater than the given value. | FROM Book WHERE price > 20 |
>= | Checks that the left argument is greater than or equal to the given value. | FROM Book WHERE price >= 20 |
< | Checks that the left argument is less than the given value. | FROM Book WHERE year < 2012 |
<= | Checks that the left argument is less than or equal to the given value. | FROM Book WHERE price <= 50 |
between | Checks that the left argument is between the given range limits. | FROM Book WHERE price BETWEEN 50 AND 100 |
11.3.5.2. Boolean conditions
Combining multiple attribute conditions with logical conjunction (and) and disjunction (or) operators in order to create more complex conditions is demonstrated in the following example. The well known operator precedence rules for boolean operators apply here, so the textual order of the operators is irrelevant: the and operator still has higher priority than or, even though or appears first.
match all books that have "Data Grid" in their title or have an author named "Manik" and their description contains "clustering"
# match all books that have "Data Grid" in their title
# or have an author named "Manik" and their description contains "clustering"
FROM com.acme.Book WHERE title LIKE '%Data Grid%' OR author.name = 'Manik' AND description like '%clustering%'
Boolean negation has highest precedence among logical operators and applies only to the next simple attribute condition.
match all books that do not have "Data Grid" in their title and are authored by "Manik"
# match all books that do not have "Data Grid" in their title and are authored by "Manik"
FROM com.acme.Book WHERE title != 'Data Grid' AND author.name = 'Manik'
11.3.5.3. Nested conditions
Changing the precedence of logical operators is achieved with parentheses:
match all books that have an author named "Manik" and their title contains "Data Grid" or their description contains "clustering"
# match all books that have an author named "Manik" and their title contains
# "Data Grid" or their description contains "clustering"
FROM com.acme.Book WHERE author.name = 'Manik' AND (title like '%Data Grid%' OR description like '%clustering%')
11.3.5.4. Selecting attributes
In some use cases returning the whole domain object is overkill if only a small subset of the attributes are actually used by the application, especially if the domain entity has embedded entities. The query language allows you to specify a subset of attributes (or attribute paths) to return - the projection. If projections are used then the QueryResult.list()
will not return the whole domain entity but will return a List
of Object[]
, each slot in the array corresponding to a projected attribute.
match all books that have "Data Grid" in their title or description and return only their title and publication year
# match all books that have "Data Grid" in their title or description
# and return only their title and publication year
SELECT title, publicationYear FROM com.acme.Book WHERE title like '%Data Grid%' OR description like '%Data Grid%'
11.3.5.5. Sorting
Ordering the results based on one or more attributes or attribute paths is done with the ORDER BY
clause. If multiple sorting criteria are specified, then the order will dictate their precedence.
match all books that have "Data Grid" in their title or description and return them sorted by the publication year and title
# match all books that have "Data Grid" in their title or description
# and return them sorted by the publication year and title
FROM com.acme.Book WHERE title like '%Data Grid%' ORDER BY publicationYear DESC, title ASC
11.3.5.6. Grouping and Aggregation
Data Grid has the ability to group query results according to a set of grouping fields and construct aggregations of the results from each group by applying an aggregation function to the set of values that fall into each group. Grouping and aggregation can only be applied to projection queries (queries with one or more fields in the SELECT clause).
The supported aggregations are: avg, sum, count, max, min.
The set of grouping fields is specified with the GROUP BY
clause and the order used for defining grouping fields is not relevant. All fields selected in the projection must either be grouping fields or else they must be aggregated using one of the grouping functions described below. A projection field can be aggregated and used for grouping at the same time. A query that selects only grouping fields but no aggregation fields is legal. Example: Grouping Books by author and counting them.
SELECT author, COUNT(title) FROM com.acme.Book WHERE title LIKE '%engine%' GROUP BY author
A projection query in which all selected fields have an aggregation function applied and no fields are used for grouping is allowed. In this case the aggregations will be computed globally as if there was a single global group.
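For example, a query of the following form (illustrative) computes its aggregations over all matching entries as one global group:

SELECT COUNT(title), AVG(publicationYear) FROM com.acme.Book WHERE title LIKE '%engine%'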
11.3.5.7. Aggregations
The following aggregation functions can be applied to a field:
- avg() - Computes the average of a set of numbers. Accepted values are primitive numbers and instances of java.lang.Number. The result is represented as java.lang.Double. If there are no non-null values the result is null instead.
- count() - Counts the number of non-null rows and returns a java.lang.Long. If there are no non-null values the result is 0 instead.
- max() - Returns the greatest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead.
- min() - Returns the smallest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead.
- sum() - Computes the sum of a set of Numbers. If there are no non-null values the result is null instead.

The following table indicates the return type based on the specified field.
Field Type | Return Type |
---|---|
Integral (other than BigInteger) | Long |
Float or Double | Double |
BigInteger | BigInteger |
BigDecimal | BigDecimal |
11.3.5.8. Evaluation of queries with grouping and aggregation
Aggregation queries can include filtering conditions, like usual queries. Filtering can be performed in two stages: before and after the grouping operation. All filter conditions defined before invoking the groupBy()
method will be applied before the grouping operation is performed, directly to the cache entries (not to the final projection). These filter conditions can reference any fields of the queried entity type, and are meant to restrict the data set that is going to be the input for the grouping stage. All filter conditions defined after invoking the groupBy()
method will be applied to the projection that results from the projection and grouping operation. These filter conditions can either reference any of the groupBy()
fields or aggregated fields. Referencing aggregated fields that are not specified in the select clause is allowed; however, referencing non-aggregated and non-grouping fields is forbidden. Filtering in this phase will reduce the amount of groups based on their properties. Sorting can also be specified similar to usual queries. The ordering operation is performed after the grouping operation and can reference any of the groupBy()
fields or aggregated fields.
11.3.5.9. Using Full-text search
11.3.5.9.1. Fuzzy Queries
To execute a fuzzy query add ~
along with an integer, representing the distance from the term used, after the term. For instance
FROM sample_bank_account.Transaction WHERE description : 'cofee'~2
11.3.5.9.2. Range Queries
To execute a range query, define the given boundaries within a pair of square brackets, as seen in the following example:
FROM sample_bank_account.Transaction WHERE amount : [20 to 50]
11.3.5.9.3. Phrase Queries
A group of words can be searched by surrounding them in quotation marks, as seen in the following example:
FROM sample_bank_account.Transaction WHERE description : 'bus fare'
11.3.5.9.4. Proximity Queries
To execute a proximity query, finding two terms within a specific distance, add a ~
along with the distance after the phrase. For instance, the following example will find the words canceling and fee provided they are not more than 3 words apart:
FROM sample_bank_account.Transaction WHERE description : 'canceling fee'~3
11.3.5.9.5. Wildcard Queries
To search for "text" or "test", use the ?
single-character wildcard search:
FROM sample_bank_account.Transaction where description : 'te?t'
To search for "test", "tests", or "tester", use the *
multi-character wildcard search:
FROM sample_bank_account.Transaction where description : 'test*'
11.3.5.9.6. Regular Expression Queries
Regular expression queries can be performed by specifying a pattern between /
. Ickle uses Lucene’s regular expression syntax, so to search for the words moat
or boat
the following could be used:
FROM sample_library.Book where title : /[mb]oat/
11.3.5.9.7. Boosting Queries
Terms can be boosted by adding a ^
after the term to increase their relevance in a given query; the higher the boost factor, the more relevant the term will be. For instance, to search for titles containing beer and wine with a higher relevance on beer, by a factor of 3, the following could be used:
FROM sample_library.Book WHERE title : beer^3 OR wine
11.4. Embedded Search
Embedded searching is available when Data Grid is used as a library. No protobuf mapping is required, and both indexing and searching are done on top of Java objects.
11.4.1. Quick example
We’re going to store Book
instances in a Data Grid cache called "books". Book
instances will be indexed, so we enable indexing for the cache:
Data Grid configuration:
infinispan.xml
Obtaining the cache:
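A minimal sketch of obtaining the cache (assuming the configuration above defines a "books" cache with indexing enabled):

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;

// may throw java.io.IOException if the configuration file cannot be read
DefaultCacheManager cacheManager = new DefaultCacheManager("infinispan.xml");
Cache<String, Book> cache = cacheManager.getCache("books");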
Each Book
will be defined as in the following example; we have to choose which properties are indexed, and for each property we can optionally choose advanced indexing options using the annotations defined in the Hibernate Search project.
Book.java
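A hedged sketch of such a class (field names are illustrative), using the Hibernate Search annotations mentioned above:

import java.util.HashSet;
import java.util.Set;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

@Indexed
public class Book {

   @Field String title;

   @Field String description;

   @Field int publicationYear;

   @IndexedEmbedded Set<Author> authors = new HashSet<>();

   // hashCode() and equals() omitted
}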
Author.java
public class Author {
@Field String name;
@Field String surname;
// hashCode() and equals() omitted
}
Now assuming we stored several Book
instances in our Data Grid Cache
, we can search them for any matching field as in the following example.
QueryExample.java
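A hedged sketch (the full-text : operator assumes the queried field is analyzed; the query API follows the Searching section above):

import java.util.List;

import org.infinispan.query.Search;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;

// obtain a query factory for the "books" cache
QueryFactory queryFactory = Search.getQueryFactory(cache);

// full-text match on the indexed title field; the entity name must match the indexed type
Query<Book> query = queryFactory.create("FROM Book WHERE title : 'java'");

List<Book> matches = query.execute().list();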
Apart from list()
you have the option of obtaining an iterator(), or using pagination.
11.4.2. Mapping Entities
Data Grid relies on the rich API of Hibernate Search in order to define fine grained configuration for indexing at entity level. This configuration includes which fields are annotated, which analyzers should be used, how to map nested objects and so on. Detailed documentation is available at the Hibernate Search manual.
11.4.2.1. @DocumentId
Unlike Hibernate Search, using @DocumentId
to mark a field as identifier does not apply to Data Grid values; in Data Grid the identifier for all @Indexed
objects is the key used to store the value. You can still customize how the key is indexed using a combination of @Transformable
, custom types and custom FieldBridge
implementations.
11.4.2.2. @Transformable keys
The key for each value needs to be indexed as well, and the key instance must be transformed into a String
. Data Grid includes some default transformation routines to encode common primitives, but to use a custom key you must provide an implementation of org.infinispan.query.Transformer
.
Registering a key Transformer via annotations
You can annotate your key class with org.infinispan.query.Transformable
and your custom transformer implementation will be picked up automatically:
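A hedged sketch (class names are illustrative; the Transformer contract converts the key to and from its String representation):

import org.infinispan.query.Transformable;
import org.infinispan.query.Transformer;

@Transformable(transformer = CustomKeyTransformer.class)
public class CustomKey {
   final String id;

   public CustomKey(String id) {
      this.id = id;
   }
}

class CustomKeyTransformer implements Transformer {

   @Override
   public Object fromString(String s) {
      // rebuild the key from its indexed String form
      return new CustomKey(s);
   }

   @Override
   public String toString(Object key) {
      // produce the String form stored in the index
      return ((CustomKey) key).id;
   }
}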
Registering a key Transformer via the cache indexing configuration
Use the key-transformers
xml element in both embedded and server config:
Alternatively, use the Java configuration API (embedded mode):
ConfigurationBuilder builder = ...
builder.indexing().enable()
.addKeyTransformer(CustomKey.class, CustomTransformer.class);
11.4.2.3. Programmatic mapping
Instead of using annotations to map an entity to the index, it’s also possible to configure it programmatically.
In the following example we map an object Author
which is to be stored in the grid and made searchable on two properties but without annotating the class.
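A hedged sketch using the Hibernate Search 5 programmatic mapping API; the wiring through withProperties() and Environment.MODEL_MAPPING is an assumption to verify against your version:

import java.lang.annotation.ElementType;
import java.util.Properties;

import org.hibernate.search.cfg.Environment;
import org.hibernate.search.cfg.SearchMapping;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

// define the index mapping for Author without annotating the class
SearchMapping mapping = new SearchMapping();
mapping.entity(Author.class).indexed()
       .property("name", ElementType.FIELD).field()
       .property("surname", ElementType.FIELD).field();

// pass the mapping to the indexing engine through its properties (assumed wiring)
Properties properties = new Properties();
properties.put(Environment.MODEL_MAPPING, mapping);

Configuration cfg = new ConfigurationBuilder()
      .indexing().enable()
         .addIndexedEntity(Author.class)
         .withProperties(properties)
      .build();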
11.5. Remote Search
Remote search is very similar to embedded with the notable difference that data must use Google Protocol Buffers as an encoding for both over-the-wire and storage. Furthermore, it’s necessary to write (or generate from Java classes) a protobuf schema defining the data structure and indexing elements instead of relying on Hibernate Search annotations.
The usage of protobuf allows remote query to work not only for Java, but for REST, C# and Node.js clients.
11.5.1. A remote query example
We are going to revisit the Book Sample from embedded query, but this time using the Java Hot Rod client and the Infinispan server. An object called Book
will be stored in an Infinispan cache called "books". Book instances will be indexed, so we enable indexing for the cache:
infinispan.xml
Alternatively, if the cache is not indexed, we configure the <encoding>
as application/x-protostream
to make sure the storage is queryable:
infinispan.xml
Each Book
will be defined as in the following example: we use @ProtoField
annotations to identify the protocol buffers message fields and the @ProtoDoc
annotation on the fields to configure indexing attributes:
Book.java
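A hedged sketch of such a class (field names and indexing attributes are illustrative):

import org.infinispan.protostream.annotations.ProtoDoc;
import org.infinispan.protostream.annotations.ProtoField;

@ProtoDoc("@Indexed")
public class Book {

   @ProtoDoc("@Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)")
   @ProtoField(number = 1)
   String title;

   @ProtoDoc("@Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)")
   @ProtoField(number = 2)
   String description;

   @ProtoDoc("@Field(index = Index.YES, analyze = Analyze.NO, store = Store.NO)")
   @ProtoField(number = 3, defaultValue = "0")
   int publicationYear;
}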
The annotations above will generate during compilation the artifacts necessary to read, write and query Book
instances. To enable this generation, use the @AutoProtoSchemaBuilder
annotation in a newly created class with empty constructor or interface:
RemoteQueryInitializer.java
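A hedged sketch (package and file names are illustrative):

import org.infinispan.protostream.SerializationContextInitializer;
import org.infinispan.protostream.annotations.AutoProtoSchemaBuilder;

@AutoProtoSchemaBuilder(
      includeClasses = { Book.class },
      schemaFileName = "book.proto",
      schemaFilePath = "proto/",
      schemaPackageName = "book_sample")
public interface RemoteQueryInitializer extends SerializationContextInitializer {
}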
After compilation, a book.proto
file will be created in the configured schemaFilePath
, along with an implementation RemoteQueryInitializerImpl.java
of the annotated interface. This concrete class can be used directly in the Hot Rod client code to initialize the serialization context.
Putting it all together:
RemoteQuery.java
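A hedged sketch of the client wiring (host, port, and entity names are illustrative; the query API follows the Searching section above):

import java.util.List;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.Search;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;

ConfigurationBuilder clientBuilder = new ConfigurationBuilder();
clientBuilder.addServer().host("127.0.0.1").port(11222)
             // registers the generated schema and marshallers with the client
             .addContextInitializer(new RemoteQueryInitializerImpl());

RemoteCacheManager remoteCacheManager = new RemoteCacheManager(clientBuilder.build());
RemoteCache<String, Book> booksCache = remoteCacheManager.getCache("books");

// store some data
booksCache.put("isbn-1", new Book());

// query it with a full-text match on the analyzed title field
QueryFactory queryFactory = Search.getQueryFactory(booksCache);
Query<Book> query = queryFactory.create("FROM book_sample.Book WHERE title : 'java'");
List<Book> results = query.execute().list();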
11.5.2. Indexing of Protobuf encoded entries
As seen in Remote Query example, one step necessary to query protobuf entities is to provide the client and server with the relevant metadata about entities (.proto file).
The descriptors are stored in a dedicated cache on the server named ___protobuf_metadata
. Both keys and values in this cache are plain strings. Registering a new schema is therefore as simple as performing a put()
operation on this cache using the schema’s name as key and the schema file itself as the value.
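A hedged sketch of registering a schema from a Hot Rod client (the constant and file path are assumptions; the metadata cache name is the one given above):

import java.nio.file.Files;
import java.nio.file.Paths;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.query.remote.client.ProtobufMetadataManagerConstants;

RemoteCache<String, String> metadataCache =
      remoteCacheManager.getCache(ProtobufMetadataManagerConstants.PROTOBUF_METADATA_CACHE_NAME);

// key = schema name, value = schema contents (may throw java.io.IOException)
metadataCache.put("book.proto", new String(Files.readAllBytes(Paths.get("proto/book.proto"))));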
Alternatively you can use the CLI (via the cache-container=*:register-proto-schemas()
operation), the Management Console, the REST endpoint /rest/v2/schemas
or the ProtobufMetadataManager
MBean via JMX. Be aware that, when security is enabled, access to the schema cache via the remote protocols requires that the user belongs to the '___schema_manager' role.
Even if indexing is enabled for a cache, no fields of Protobuf encoded entries will be indexed unless you use the @Indexed
and @Field
inside protobuf schema documentation annotations (@ProtoDoc)
to specify what fields need to get indexed.
11.5.3. Analysis
Analysis is a process that converts input data into one or more terms that you can index and query. While in Embedded Query mapping is done through Hibernate Search annotations, which support a rich set of Lucene based analyzers, in client-server mode the analyzer definitions are declared in a platform neutral way.
11.5.3.1. Default Analyzers
Data Grid provides a set of default analyzers for remote query as follows:
Definition | Description |
---|---|
standard | Splits text fields into tokens, treating whitespace and punctuation as delimiters. |
simple | Tokenizes input streams by delimiting at non-letters and then converting all letters to lowercase characters. Whitespace and non-letters are discarded. |
whitespace | Splits text streams on whitespace and returns sequences of non-whitespace characters as tokens. |
keyword | Treats entire text fields as single tokens. |
stemmer | Stems English words using the Snowball Porter filter. |
ngram | Generates n-gram tokens that are 3 grams in size by default. |
filename | Splits text fields into larger size tokens than the standard analyzer, treating whitespace as a delimiter and converting all letters to lowercase characters. |
These analyzer definitions are based on Apache Lucene and are provided "as-is". For more information about tokenizers, filters, and CharFilters, see the appropriate Lucene documentation.
11.5.3.2. Using Analyzer Definitions
To use analyzer definitions, reference them by name in the .proto
schema file.
- Include the Analyze.YES attribute to indicate that the property is analyzed.
- Specify the analyzer definition with the @Analyzer annotation.
The following example shows referenced analyzer definitions:
If using Java classes annotated with @ProtoField
, the declaration is similar:
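A hedged sketch (the field and analyzer choice are illustrative):

@ProtoDoc("@Field(index = Index.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = \"standard\"))")
@ProtoField(number = 1)
String title;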
11.5.3.3. Creating Custom Analyzer Definitions
If you require custom analyzer definitions, do the following:
- Create an implementation of the ProgrammaticSearchMappingProvider interface packaged in a JAR file.
- Provide a file named org.infinispan.query.spi.ProgrammaticSearchMappingProvider in the META-INF/services/ directory of your JAR. This file should contain the fully qualified class name of your implementation.
- Copy the JAR to the lib/ directory of your Data Grid installation.

Important: Your JAR must be available to the Data Grid server during startup. You cannot add it if the server is already running.

The following is an example implementation of the ProgrammaticSearchMappingProvider interface:
11.6. Continuous Query
Continuous Queries allow an application to register a listener which will receive the entries that currently match a query filter, and will be continuously notified of any changes to the queried data set that result from further cache operations. This includes incoming matches, for values that have joined the set, updated matches, for matching values that were modified and continue to match, and outgoing matches, for values that have left the set. By using a Continuous Query the application receives a steady stream of events instead of having to repeatedly execute the same query to discover changes, resulting in a more efficient use of resources. For instance, all of the following use cases could utilize Continuous Queries:
- Return all persons with an age between 18 and 25 (assuming the Person entity has an age property and is updated by the user application).
- Return all transactions higher than $2000.
- Return all times where the lap speed of F1 racers was less than 1:45.00s (assuming the cache contains Lap entries and that laps are entered live during the race).
11.6.1. Continuous Query Execution
A continuous query uses a listener that is notified when:
- An entry starts matching the specified query, represented by a Join event.
- A matching entry is updated and continues to match the query, represented by an Update event.
- An entry stops matching the query, represented by a Leave event.
When a client registers a continuous query listener it immediately begins to receive the results currently matching the query, received as Join
events as described above. In addition, it will receive subsequent notifications when other entries begin matching the query, as Join
events, or stop matching the query, as Leave
events, as a consequence of any cache operations that would normally generate creation, modification, removal, or expiration events. Updated cache entries will generate Update
events if the entry matches the query filter before and after the operation. To summarize, the logic used to determine if the listener receives a Join
, Update
or Leave
event is:
- If the query on both the old and new values evaluate false, then the event is suppressed.
- If the query on the old value evaluates false and on the new value evaluates true, then a Join event is sent.
- If the query on both the old and new values evaluate true, then an Update event is sent.
- If the query on the old value evaluates true and on the new value evaluates false, then a Leave event is sent.
- If the query on the old value evaluates true and the entry is removed or expired, then a Leave event is sent.
Continuous Queries can use all query capabilities except: grouping, aggregation, and sorting operations.
11.6.2. Running Continuous Queries
To create a continuous query, do the following:
- Create a Query object. See the searching section.
- Obtain the ContinuousQuery (org.infinispan.query.api.continuous.ContinuousQuery) object of your cache by calling the appropriate method:
  - org.infinispan.client.hotrod.Search.getContinuousQuery(RemoteCache<K, V> cache) for remote mode
  - org.infinispan.query.Search.getContinuousQuery(Cache<K, V> cache) for embedded mode
- Register the query and a continuous query listener (org.infinispan.query.api.continuous.ContinuousQueryListener) as follows:
continuousQuery.addContinuousQueryListener(query, listener);
The following example demonstrates a simple continuous query use case in embedded mode:
Registering a Continuous Query
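A hedged sketch in embedded mode (the Person entity is assumed to have an age property; the listener callback names should be verified against the ContinuousQueryListener interface):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.infinispan.query.Search;
import org.infinispan.query.api.continuous.ContinuousQuery;
import org.infinispan.query.api.continuous.ContinuousQueryListener;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;

Map<Object, Person> matches = new ConcurrentHashMap<>();

QueryFactory queryFactory = Search.getQueryFactory(cache);
Query<Person> query = queryFactory.create("FROM Person WHERE age < 21");

ContinuousQuery<Object, Person> continuousQuery = Search.getContinuousQuery(cache);

ContinuousQueryListener<Object, Person> listener = new ContinuousQueryListener<Object, Person>() {

   @Override
   public void resultJoining(Object key, Person value) {
      // the entry started matching the query
      matches.put(key, value);
   }

   @Override
   public void resultLeaving(Object key) {
      // the entry stopped matching the query, or was removed/expired
      matches.remove(key);
   }
};

continuousQuery.addContinuousQueryListener(query, listener);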
As Person instances having an age less than 21 are added to the cache, they will be received by the listener and placed into the matches map; when these entries are removed from the cache or their age is modified to be greater than or equal to 21, they will be removed from matches.
11.6.3. Removing Continuous Queries
To stop the query from further execution just remove the listener:
continuousQuery.removeContinuousQueryListener(listener);
11.6.4. Notes on performance of Continuous Queries
Continuous queries are designed to provide a constant stream of updates to the application, potentially resulting in a very large number of events being generated for particularly broad queries. A new temporary memory allocation is made for each event. This behavior may result in memory pressure, potentially leading to OutOfMemoryErrors
(especially in remote mode) if queries are not carefully designed. To prevent such issues it is strongly recommended to ensure that each query captures the minimal information needed both in terms of number of matched entries and size of each match (projections can be used to capture the interesting properties), and that each ContinuousQueryListener
is designed to quickly process all received events without blocking and to avoid performing actions that will lead to the generation of new matching events from the cache it listens to.
11.7. Statistics
Query Statistics can be obtained from the SearchManager
, as demonstrated in the following code snippet.
SearchManager searchManager = Search.getSearchManager(cache);
org.hibernate.search.stat.Statistics statistics = searchManager.getStatistics();
This data is also available via JMX through the Hibernate Search StatisticsInfoMBean registered under the name org.infinispan:type=Query,manager="{name-of-cache-manager}",cache="{name-of-cache}",component=Statistics
. Please note this MBean is always registered by Data Grid but the statistics are collected only if statistics collection is enabled at cache level.
Hibernate Search has its own configuration properties hibernate.search.jmx_enabled
and hibernate.search.generate_statistics
for JMX statistics as explained here. Using them with Data Grid Query is forbidden as it will only lead to duplicated MBeans and unpredictable results.
11.8. Performance Tuning
11.8.1. Batch writing in SYNC mode
By default, the Index Managers work in sync mode, meaning that when data is written to Data Grid, it will perform the indexing operations synchronously. This synchronicity guarantees indexes are always consistent with the data (and thus visible in searches), but can slow down write operations since it will also perform a commit to the index. Committing is an extremely expensive operation in Lucene, and for that reason, multiple writes from different nodes can be automatically batched into a single commit to reduce the impact.
So, when doing data loads to Data Grid with indexing enabled, try to use multiple threads to take advantage of this batching.
If using multiple threads does not result in the required performance, an alternative is to load data with indexing temporarily disabled and run a re-indexing operation afterwards. This can be done by writing data with the SKIP_INDEXING
flag:
cache.getAdvancedCache().withFlags(Flag.SKIP_INDEXING).put("key","value");
11.8.2. Writing using async mode
If a small delay between data writes and when that data is visible in queries is acceptable, an index manager can be configured to work in async mode. The async mode offers much better writing performance, since in this mode commits happen at a configurable interval.
Configuration:
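A hedged programmatic sketch (the default.worker.execution and default.index_flush_interval property names follow the legacy index manager configuration and are assumptions to verify against your version):

ConfigurationBuilder asyncCfg = new ConfigurationBuilder();
asyncCfg.indexing().enable()
        .addIndexedEntity(Book.class)
        // index writes are applied asynchronously
        .addProperty("default.worker.execution", "async")
        // assumed property: flush index changes every 500 ms
        .addProperty("default.index_flush_interval", "500");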
11.8.3. Index reader async strategy
Lucene internally works with snapshots of the index: once an IndexReader
is opened, it will only see the index changes up to the point it was opened; further index changes will not be visible until the IndexReader
is refreshed. The Index Managers used in Data Grid by default will check the freshness of the index readers before every query and refresh them if necessary.
It is possible to tune this strategy to relax this freshness checking to a pre-configured interval by using the reader.strategy
configuration set as async
:
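A hedged sketch (the reader property names are assumptions based on the legacy configuration):

cacheCfg.indexing().enable()
        .addProperty("default.reader.strategy", "async")
        // assumed property: refresh the index reader every 5 seconds
        .addProperty("default.reader.async_refresh_period_ms", "5000");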
11.8.4. Lucene Options
It is possible to apply tuning options in Lucene directly. For more details, see the Hibernate Search manual.