Chapter 4. Querying Values in Caches


Data Grid lets you perform queries to efficiently and quickly look up values in your data set, both with embedded Data Grid clusters or remote Data Grid server clusters.

Note

You can index and query cache values stored as Plain Old Java Objects (POJOs) or objects encoded as Protocol Buffers only.

4.1. Configuring Data Grid to Index Caches

Create indexes of values in your caches to improve query performance and use full-text search capabilities.

Note

Data Grid uses Apache Lucene technology to index values in caches.

Procedure

  1. Enable indexing in your cache configuration.

    <distributed-cache name="my-cache">
      <indexing>
        <!-- Indexing configuration goes here. -->
      </indexing>
    </distributed-cache>

    Adding an <indexing> element automatically enables indexing. You do not need to include the enabled attribute, even though the default value of the enabled attribute is false in the configuration schema.

  2. Specify each entity that you want to index as a value for the indexed-entity element.

    <distributed-cache name="my-cache">
      <indexing>
        <indexed-entities>
          <indexed-entity>...</indexed-entity>
        </indexed-entities>
      </indexing>
    </distributed-cache>

Plain Old Java Objects

For caches that store POJOs, you specify the fully qualified class name that is annotated with @Indexed, for example:

<indexed-entities>
  <indexed-entity>org.infinispan.sample.Car</indexed-entity>
  <indexed-entity>org.infinispan.sample.Truck</indexed-entity>
</indexed-entities>

Protobuf

For caches that store Protobuf-encoded entries, you specify the Message declared in the Protobuf schema.

For example, you use the following Protobuf schema:

package book_sample;

/* @Indexed */
message Book {

    /* @Field(store = Store.YES, analyze = Analyze.YES */
    optional string title = 1;

    /* @Field(store = Store.YES, analyze = Analyze.YES */
    optional string description = 2;
    optional int32 publicationYear = 3; // no native Date type available in Protobuf

    repeated Author authors = 4;
}

message Author {
    optional string name = 1;
    optional string surname = 2;
}

You should then specify the following value for the indexed-entity element:

<indexed-entities>
  <indexed-entity>book_sample.Book</indexed-entity>
</indexed-entities>

4.1.1. Enabling Cache Indexing Programmatically

Configure indexing for caches programmatically via Data Grid APIs.

Procedure

  • When using Data Grid as an embedded library, enable and configure indexing for caches with the IndexingConfigurationBuilder class, as in the following example:
import org.infinispan.configuration.cache.*;

ConfigurationBuilder config=new ConfigurationBuilder();
config.indexing().enable().storage(FILESYSTEM).path("/some/folder").addIndexedEntity(Book.class);

4.1.2. Index Annotations

When you enable indexing in Data Grid caches, you use the following annotations:

  • @Indexed denotes a Java object that you want to index.
  • @Field controls how fields within objects are indexed.

For Data Grid as an embedded library, you add these annotations your Java classes.

For Data Grid Server, you define Protobuf schemas, .proto files, that contain these annotations.

4.1.3. Index Configuration

Data Grid configuration controls how indexes are stored and constructed.

4.1.3.1. Index Storage

You can configure how Data Grid stores indexes:

  • On the host file system, which is the default and persists indexes between restarts.
  • In JVM heap memory, which means that indexes do not survive restarts.
    You should store indexes in JVM heap memory only for small datasets.

File system

<distributed-cache name="my-cache">
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <!-- Indexing configuration goes here. -->
  </indexing>
</distributed-cache>

JVM heap memory

<distributed-cache name="my-cache">
  <indexing storage="local-heap">
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>

4.1.3.2. Index Reader

The index reader is an internal component that provides access to the indexes to perform queries. As the index content changes, Data Grid needs to refresh the reader so that search results are up to date. You can configure the refresh interval for the index reader. By default Data Grid reads the index before each query if the index changed since the last refresh.

<distributed-cache name="my-cache">
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <!-- Sets an interval of one second for the index reader. -->
    <index-reader refresh-interval="1000"/>
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>

4.1.3.3. Index Writer

The index writer is an internal component that constructs an index composed of one or more segments (sub-indexes) that can be merged over time to improve performance. Fewer segments usually means less overhead during a query because index reader operations need to take into account all segments.

Data Grid uses Apache Lucene internally and indexes entries in two tiers: memory and storage. New entries go to the memory index first and then, when a flush happens, to the configured index storage. Periodic commit operations occur that create segments from the previously flushed data and make all the index changes permanent.

Note

The index-writer configuration is optional. The defaults should work for most cases and custom configurations should only be used to tune performance.

<distributed-cache name="my-cache">
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <index-writer commit-interval="2000"
                  low-level-trace="false"
                  max-buffered-entries="32"
                  queue-count="1"
                  queue-size="10000"
                  ram-buffer-size="400"
                  thread-pool-size="2">
      <index-merge calibrate-by-deletes="true"
                   factor="3"
                   max-entries="2000"
                   min-size="10"
                   max-size="20"/>
    </index-writer>
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>
Table 4.1. Index writer configuration attributes
AttributeDescription

commit-interval

Amount of time, in milliseconds, that index changes that are buffered in memory are flushed to the index storage and a commit is performed. Because operation is costly, small values should be avoided. The default is 1000 ms (1 second).

max-buffered-entries

Maximum number of entries that can be buffered in-memory before they are flushed to the index storage. Large values result in faster indexing but use more memory. When used in combination with the ram-buffer-size attribute, a flush occurs for whichever event happens first.

ram-buffer-size

Maximum amount of memory that can be used for buffering added entries and deletions before they are flushed to the index storage. Large values result in faster indexing but use more memory. For faster indexing performance you should set this attribute instead of max-buffered-entries. When used in combination with the max-buffered-entries attribute, a flush occurs for whichever event happens first.

thread-pool-size

Number of threads that execute write operations to the index.

queue-count

Number of internal queues to use for each indexed type. Each queue holds a batch of modifications that is applied to the index and queues are processed in parallel. Increasing the number of queues will lead to an increase of indexing throughput, but only if the bottleneck is CPU. For optimum results, do not set a value for queue-count that is larger than the value for thread-pool-size.

queue-size

Maximum number of elements each queue can hold. Increasing the queue-size value increases the amount of memory that is used during indexing operations. Setting a value that is too small can block indexing operations.

low-level-trace

Enables low-level trace information for indexing operations. Enabling this attribute substantially degrades performance. You should use this low-level tracing only as a last resource for troubleshooting.

To configure how Data Grid merges index segments, you use the index-merge sub-element.

Table 4.2. Index merge configuration attributes
AttributeDescription

max-entries

Maximum number of entries that an index segment can have before merging. Segments with more than this number of entries are not merged. Smaller values perform better on frequently changing indexes, larger values provide better search performance if the index does not change often.

factor

Number of segments that are merged at once. With smaller values, merging happens more often, which uses more resources, but the total number of segments will be lower on average, increasing search performance. Larger values (greater than 10) are best for heavy writing scenarios.

min-size

Minimum target size of segments, in MB, for background merges. Segments smaller than this size are merged more aggressively. Setting a value that is too large might result in expensive merge operations, even though they are less frequent.

max-size

Maximum size of segments, in MB, for background merges. Segments larger than this size are never merged in the background. Settings this to a lower value helps reduce memory requirements and avoids some merging operations at the cost of optimal search speed. This attribute is ignored when forcefully merging an index and max-forced-size applies instead.

max-forced-size

Maximum size of segments, in MB, for forced merges and overrides the max-size attribute. Set this to the same value as max-size or lower. However setting the value too low degrades search performance because documents are deleted.

calibrate-by-deletes

Whether the number of deleted entries in an index should be taken into account when counting the entries in the segment. Setting false will lead to more frequent merges caused by max-entries, but will more aggressively merge segments with many deleted documents, improving query performance.

Reference

For more information about indexing elements and attributes, refer to the Data Grid Configuration Schema.

4.1.4. Rebuilding Indexes

Rebuilding an index reconstructs it from the data stored in the cache. You rebuild indexes when you change things like the definitions of indexed types or analyzers. Likewise, you might also need to rebuild indexes if they are deleted for some reason.

Important

Rebuilding indexes can take a long time to complete because the process takes place for all data in the grid. While the rebuild operation is in progress, queries might also return fewer results.

Data Grid logs the following warning message when you rebuild an index:

WARN: Rebuilding indexes also affect queries, that can return less results than expected.

Procedure

  • Rebuild indexes on remote Data Grid servers by calling the reindexCache() method as follows:

    remoteCacheManager.administration().reindexCache("MyCache");
  • Rebuild indexes when using Data Grid as an embedded library as follows:

    Indexer indexer = Search.getIndexer(cache);
    CompletionStage<Void> future = index.run();

4.2. Creating Ickle Queries

Data Grid provides an Ickle query language that lets you create relational and full-text queries.

4.2.1. Ickle Query Example

To use the API, first obtain a QueryFactory to the cache and then call the .create() method, passing in the string to use in the query. Each QueryFactory instance is bound to the same Cache instance as the Search, but it is otherwise a stateless and thread-safe object that can be used for creating multiple queries in parallel.

For instance:

// Remote Query, using protobuf
QueryFactory qf = org.infinispan.client.hotrod.Search.getQueryFactory(remoteCache);
Query<Transaction> q = qf.create("from sample_bank_account.Transaction where amount > 20");

// Embedded Query using Java Objects
QueryFactory qf = org.infinispan.query.Search.getQueryFactory(cache);
Query<Transaction> q = qf.create("from org.infinispan.sample.Book where price > 20");

// Execute the query
QueryResult<Book> queryResult = q.execute();
Note

A query will always target a single entity type and is evaluated over the contents of a single cache. Running a query over multiple caches or creating queries that target several entity types (joins) is not supported.

Executing the query and fetching the results is as simple as invoking the execute() method of the Query object. Once executed, calling execute() on the same instance will re-execute the query.

Pagination

You can limit the number of returned results by using the Query.maxResults(int maxResults). This can be used in conjunction with Query.startOffset(long startOffset) to achieve pagination of the result set.

// sorted by year and match all books that have "clustering" in their title
// and return the third page of 10 results
Query<Book> query = queryFactory.create("FROM org.infinispan.sample.Book WHERE title like '%clustering%' ORDER BY year").startOffset(20).maxResults(10)

Number of Hits

The QueryResult object has the .hitCount() method to return the total number of results of the query, regardless of any pagination parameter. The hit count is only available for indexed queries for performance reasons.

Iteration

The Query object has the .iterator() method to obtain the results lazily. It returns an instance of CloseableIterator that must be closed after usage.

Note

The iteration support for Remote Queries is currently limited, as it will first fetch all entries to the client before iterating.

Named Query Parameters

Instead of building a new Query object for every execution it is possible to include named parameters in the query which can be substituted with actual values before execution. This allows a query to be defined once and be efficiently executed many times. Parameters can only be used on the right-hand side of an operator and are defined when the query is created by supplying an object produced by the org.infinispan.query.dsl.Expression.param(String paramName) method to the operator instead of the usual constant value. Once the parameters have been defined they can be set by invoking either Query.setParameter(parameterName, value) or Query.setParameters(parameterMap) as shown in the examples below. ⁠

QueryFactory queryFactory = Search.getQueryFactory(cache);
// Defining a query to search for various authors and publication years
Query<Book> query = queryFactory.create("SELECT title FROM org.infinispan.sample.Book WHERE author = :authorName AND publicationYear = :publicationYear").build();

// Set actual parameter values
query.setParameter("authorName", "Doe");
query.setParameter("publicationYear", 2010);

// Execute the query
List<Book> found = query.execute.list();

Alternatively, you can supply a map of actual parameter values to set multiple parameters at once: ⁠

Setting multiple named parameters at once

Map<String, Object> parameterMap = new HashMap<>();
parameterMap.put("authorName", "Doe");
parameterMap.put("publicationYear", 2010);

query.setParameters(parameterMap);

Note

A significant portion of the query parsing, validation and execution planning effort is performed during the first execution of a query with parameters. This effort is not repeated during subsequent executions leading to better performance compared to a similar query using constant values instead of query parameters.

4.2.2. Ickle Query Language Parser Syntax

The Ickle query language is small subset of the JPQL query language, with some extensions for full-text.

The parser syntax has some notable rules:

  • Whitespace is not significant.
  • Wildcards are not supported in field names.
  • A field name or path must always be specified, as there is no default field.
  • && and || are accepted instead of AND or OR in both full-text and JPA predicates.
  • ! may be used instead of NOT.
  • A missing boolean operator is interpreted as OR.
  • String terms must be enclosed with either single or double quotes.
  • Fuzziness and boosting are not accepted in arbitrary order; fuzziness always comes first.
  • != is accepted instead of <>.
  • Boosting cannot be applied to >,>=,<, operators. Ranges may be used to achieve the same result.
Filtering operators

Ickle support many filtering operators that can be used for both indexed and non-indexed fields.

OperatorDescriptionExample

in

Checks that the left operand is equal to one of the elements from the Collection of values given as argument.

FROM Book WHERE isbn IN ('ZZ', 'X1234')

like

Checks that the left argument (which is expected to be a String) matches a wildcard pattern that follows the JPA rules.

FROM Book WHERE title LIKE '%Java%'

=

Checks that the left argument is an exact match of the given value

FROM Book WHERE name = 'Programming Java'

!=

Checks that the left argument is different from the given value

FROM Book WHERE language != 'English'

>

Checks that the left argument is greater than the given value.

FROM Book WHERE price > 20

>=

Checks that the left argument is greater than or equal to the given value.

FROM Book WHERE price >= 20

<

Checks that the left argument is less than the given value.

FROM Book WHERE year < 2012

Checks that the left argument is less than or equal to the given value.

FROM Book WHERE price ⇐ 50

between

Checks that the left argument is between the given range limits.

FROM Book WHERE price BETWEEN 50 AND 100

Boolean conditions

Combining multiple attribute conditions with logical conjunction (and) and disjunction (or) operators in order to create more complex conditions is demonstrated in the following example. The well known operator precedence rule for boolean operators applies here, so the order of the operators is irrelevant. Here and operator still has higher priority than or even though or was invoked first.

# match all books that have "Data Grid" in their title
# or have an author named "Manik" and their description contains "clustering"

FROM org.infinispan.sample.Book WHERE title LIKE '%Data Grid%' OR author.name = 'Manik' AND description like '%clustering%'

Boolean negation has highest precedence among logical operators and applies only to the next simple attribute condition.

# match all books that do not have "Data Grid" in their title and are authored by "Manik"
FROM org.infinispan.sample.Book WHERE title != 'Data Grid' AND author.name = 'Manik'
Nested conditions

Changing the precedence of logical operators is achieved with parenthesis:

# match all books that have an author named "Manik" and their title contains
# "Data Grid" or their description contains "clustering"
FROM org.infinispan.sample.Book WHERE author.name = 'Manik' AND ( title like '%Data Grid%' OR description like '% clustering%')
Selecting attributes

In some use cases returning the whole domain object is overkill if only a small subset of the attributes are actually used by the application, especially if the domain entity has embedded entities. The query language allows you to specify a subset of attributes (or attribute paths) to return - the projection. If projections are used then the QueryResult.list() will not return the whole domain entity but will return a List of Object[], each slot in the array corresponding to a projected attribute.

# match all books that have "Data Grid" in their title or description
# and return only their title and publication year
SELECT title, publicationYear FROM org.infinispan.sample.Book WHERE title like '%Data Grid%' OR description like '%Data Grid%'
Sorting

Ordering the results based on one or more attributes or attribute paths is done with the ORDER BY clause. If multiple sorting criteria are specified, then the order will dictate their precedence.

# match all books that have "Data Grid" in their title or description
# and return them sorted by the publication year and title
FROM org.infinispan.sample.Book WHERE title like '%Data Grid%' ORDER BY publicationYear DESC, title ASC
Grouping and Aggregation

Data Grid has the ability to group query results according to a set of grouping fields and construct aggregations of the results from each group by applying an aggregation function to the set of values that fall into each group. Grouping and aggregation can only be applied to projection queries (queries with one or more field in the SELECT clause).

The supported aggregations are: avg, sum, count, max, and min.

The set of grouping fields is specified with the GROUP BY clause and the order used for defining grouping fields is not relevant. All fields selected in the projection must either be grouping fields or else they must be aggregated using one of the grouping functions described below. A projection field can be aggregated and used for grouping at the same time. A query that selects only grouping fields but no aggregation fields is legal. ⁠ Example: Grouping Books by author and counting them.

SELECT author, COUNT(title) FROM org.infinispan.sample.Book WHERE title LIKE '%engine%' GROUP BY author
Note

A projection query in which all selected fields have an aggregation function applied and no fields are used for grouping is allowed. In this case the aggregations will be computed globally as if there was a single global group.

Aggregations

You can apply the following aggregation functions to a field:

Table 4.3. Index merge attributes
Aggregation functionDescription

avg()

Computes the average of a set of numbers. Accepted values are primitive numbers and instances of java.lang.Number. The result is represented as java.lang.Double. If there are no non-null values the result is null instead.

count()

Counts the number of non-null rows and returns a java.lang.Long. If there are no non-null values the result is 0 instead.

max()

Returns the greatest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead.

min()

Returns the smallest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead.

sum()

Computes the sum of a set of Numbers. If there are no non-null values the result is null instead. The following table indicates the return type based on the specified field.

Table 4.4. Table sum return type
Field TypeReturn Type

Integral (other than BigInteger)

Long

Float or Double

Double

BigInteger

BigInteger

BigDecimal

BigDecimal

Evaluation of queries with grouping and aggregation

Aggregation queries can include filtering conditions, like usual queries. Filtering can be performed in two stages: before and after the grouping operation. All filter conditions defined before invoking the groupBy() method will be applied before the grouping operation is performed, directly to the cache entries (not to the final projection). These filter conditions can reference any fields of the queried entity type, and are meant to restrict the data set that is going to be the input for the grouping stage. All filter conditions defined after invoking the groupBy() method will be applied to the projection that results from the projection and grouping operation. These filter conditions can either reference any of the groupBy() fields or aggregated fields. Referencing aggregated fields that are not specified in the select clause is allowed; however, referencing non-aggregated and non-grouping fields is forbidden. Filtering in this phase will reduce the amount of groups based on their properties. Sorting can also be specified similar to usual queries. The ordering operation is performed after the grouping operation and can reference any of the groupBy() fields or aggregated fields.

4.3. Embedded Queries

Use embedded queries when you add Data Grid as a library to custom applications.

Protobuf mapping is not required with embedded queries. Indexing and querying are both done on top of Java objects.

4.3.1. Embedded Query Example

We’re going to store Book instances in an Data Grid cache called "books". Book instances will be indexed, so we enable indexing for the cache configuration:

infinispan.xml

<distributed-cache name="books">
  <indexing path="${user.home}/index">
    <indexed-entities>
      <indexed-entity>org.infinispan.sample.Book</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>

Obtaining the cache:

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

EmbeddedCacheManager manager = new DefaultCacheManager("infinispan.xml");
Cache<String, Book> cache = manager.getCache("books");

Each Book will be defined as in the following example; we have to choose which properties are indexed, and for each property we can optionally choose advanced indexing options using the annotations defined in the Hibernate Search project.

Book.java

package org.infinispan.sample;

import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.*;

// Annotate values with @Indexed to add them to indexes
// Annotate each fields according to how you want to index it
@Indexed
public class Book {
   @FullTextField
   String title;

   @FullTextField
   String description;

   @KeywordField
   String isbn;

   @GenericField
   LocalDate publicationDate;

   @IndexedEmbedded
   Set<Author> authors = new HashSet<Author>();
}

Author.java

package org.infinispan.sample;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;

public class Author {
   @FullTextField
   String name;

   @FullTextField
   String surname;
}

Now assuming we stored several Book instances in our Data Grid Cache, we can search them for any matching field as in the following example.

QueryExample.java

// Get the query factory from the cache
QueryFactory queryFactory = org.infinispan.query.Search.getQueryFactory(cache);

// Create an Ickle query that performs a full-text search using the ':' operator on the 'title' and 'authors.name' fields
// You can perform full-text search only on indexed caches
Query<Book> fullTextQuery = queryFactory.create("FROM org.infinispan.sample.Book b WHERE b.title:'infinispan' AND b.authors.name:'sanne'");

// Use the '=' operator to query fields in caches that are indexed or not
// Non full-text operators apply only to fields that are not analyzed
Query<Book> exactMatchQuery=queryFactory.create("FROM org.infinispan.sample.Book b WHERE b.isbn = '12345678' AND b.authors.name : 'sanne'");

// You can use full-text and non-full text operators in the same query
Query<Book> query=queryFactory.create("FROM org.infinispan.sample.Book b where b.authors.name : 'Stephen' and b.description : (+'dark' -'tower')");

// Get the results
List<Book> found=query.execute().list();

4.3.2. Mapping Entities

Data Grid relies on the API of Hibernate Search in order to define fine grained configuration for indexing at entity level. This configuration includes which fields are annotated, which analyzers should be used, how to map nested objects and so on. Detailed documentation is available at the Hibernate Search manual.

@DocumentId

Unlike Hibernate Search, using @DocumentId to mark a field as identifier does not apply to Data Grid values; in Data Grid the identifier for all @Indexed objects is the key used to store the value. You can still customize how the key is indexed using a combination of @Transformable , custom types and custom FieldBridge implementations.

@Transformable keys

The key for each value needs to be indexed as well, and the key instance must be transformed in a String. Data Grid includes some default transformation routines to encode common primitives, but to use a custom key you must provide an implementation of org.infinispan.query.Transformer .

Registering a key Transformer via annotations

You can annotate your key class with org.infinispan.query.Transformable and your custom transformer implementation will be picked up automatically:

@Transformable(transformer = CustomTransformer.class)
public class CustomKey {
   ...
}

public class CustomTransformer implements Transformer {
   @Override
   public Object fromString(String s) {
      ...
      return new CustomKey(...);
   }

   @Override
   public String toString(Object customType) {
      CustomKey ck = (CustomKey) customType;
      return ...
   }
}

Registering a key Transformer via the cache indexing configuration

Use the key-transformers xml element in both embedded and server config:

<replicated-cache name="test">
  <indexing auto-config="true">
    <key-transformers>
      <key-transformer key="com.mycompany.CustomKey"
                       transformer="com.mycompany.CustomTransformer"/>
    </key-transformers>
  </indexing>
</replicated-cache>

Alternatively, use the Java configuration API (embedded mode):

   ConfigurationBuilder builder = ...
   builder.indexing().enable()
         .addKeyTransformer(CustomKey.class, CustomTransformer.class);
Programmatic mapping

Instead of using annotations to map an entity to the index, it’s also possible to configure it programmatically.

In the following example we map an object Author which is to be stored in the grid and made searchable on two properties but without annotating the class.

import org.apache.lucene.search.Query;
import org.hibernate.search.cfg.Environment;
import org.hibernate.search.cfg.SearchMapping;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.infinispan.Cache;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.Index;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.query.CacheQuery;
import org.infinispan.query.Search;
import org.infinispan.query.SearchManager;

import java.io.IOException;
import java.lang.annotation.ElementType;
import java.util.Properties;

SearchMapping mapping = new SearchMapping();
mapping.entity(Author.class).indexed()
       .property("name", ElementType.METHOD).field()
       .property("surname", ElementType.METHOD).field();

Properties properties = new Properties();
properties.put(Environment.MODEL_MAPPING, mapping);
properties.put("hibernate.search.[other options]", "[...]");

Configuration infinispanConfiguration = new ConfigurationBuilder()
        .indexing().index(Index.NONE)
        .withProperties(properties)
        .build();

DefaultCacheManager cacheManager = new DefaultCacheManager(infinispanConfiguration);

Cache<Long, Author> cache = cacheManager.getCache();
SearchManager sm = Search.getSearchManager(cache);

Author author = new Author(1, "Manik", "Surtani");
cache.put(author.getId(), author);

QueryBuilder qb = sm.buildQueryBuilderForClass(Author.class).get();
Query q = qb.keyword().onField("name").matching("Manik").createQuery();
CacheQuery cq = sm.getQuery(q, Author.class);
assert cq.getResultSize() == 1;

4.4. Remote Queries

Use remote queries when you set up Data Grid Sever clusters that you access from clients.

To perform remote queries, data in the cache must use Google Protocol Buffers as an encoding for both over-the-wire transmission and storage. Additionally, remote queries require Protobuf schemas (.proto files) to define the data structure and indexing elements.

Note

The benefit of using Protobuf with remote queries is that it is language neutral and works with Hot Rod Java clients as well as REST, C++, C#, and Node.js clients.

4.4.1. Remote Query Example

An object called Book will be stored in a Data Grid cache called "books". Book instances will be indexed, so we enable indexing for the cache:

infinispan.xml

<replicated-cache name="books">
  <indexing>
    <indexed-entities>
      <indexed-entity>book_sample.Book</indexed-entity>
    </indexed-entities>
  </indexing>
</replicated-cache>

Alternatively, if the cache is not indexed, we configure the <encoding> as application/x-protostream to make sure the storage is queryable:

infinispan.xml

<replicated-cache name="books">
  <encoding media-type="application/x-protostream"/>
</replicated-cache>

Each Book will be defined as in the following example: we use @Protofield annotations to identify the message fields and the @ProtoDoc annotation on the fields to configure indexing attributes:

Book.java

import org.infinispan.protostream.annotations.ProtoDoc;
import org.infinispan.protostream.annotations.ProtoFactory;
import org.infinispan.protostream.annotations.ProtoField;

@ProtoDoc("@Indexed")
public class Book {

   @ProtoDoc("@Field(index=Index.YES, analyze = Analyze.YES, store = Store.NO)")
   @ProtoField(number = 1)
   final String title;

   @ProtoDoc("@Field(index=Index.YES, analyze = Analyze.YES, store = Store.NO)")
   @ProtoField(number = 2)
   final String description;

   @ProtoDoc("@Field(index=Index.YES, analyze = Analyze.YES, store = Store.NO)")
   @ProtoField(number = 3, defaultValue = "0")
   final int publicationYear;


   @ProtoFactory
   Book(String title, String description, int publicationYear) {
      this.title = title;
      this.description = description;
      this.publicationYear = publicationYear;
   }
   // public Getter methods omitted for brevity
}

During compilation, the annotations in the preceding example generate the artifacts necessary to read, write and query Book instances. To enable this generation, use the @AutoProtoSchemaBuilder annotation in a newly created class with empty constructor or interface:

RemoteQueryInitializer.java

import org.infinispan.protostream.SerializationContextInitializer;
import org.infinispan.protostream.annotations.AutoProtoSchemaBuilder;

@AutoProtoSchemaBuilder(
      includeClasses = {
            Book.class
      },
      schemaFileName = "book.proto",
      schemaFilePath = "proto/",
      schemaPackageName = "book_sample")
public interface RemoteQueryInitializer extends SerializationContextInitializer {
}

After compilation, a file book.proto file will be created in the configured schemaFilePath, along with an implementation RemoteQueryInitializerImpl.java of the annotated interface. This concrete class can be used directly in the Hot Rod client code to initialize the serialization context.

Putting all together:

RemoteQuery.java

package org.infinispan;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.Search;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;
import org.infinispan.query.remote.client.ProtobufMetadataManagerConstants;

public class RemoteQuery {

   public static void main(String[] args) throws Exception {
      ConfigurationBuilder clientBuilder = new ConfigurationBuilder();
      // RemoteQueryInitializerImpl is generated
      clientBuilder.addServer().host("127.0.0.1").port(11222)
            .security().authentication().username("user").password("user")
            .addContextInitializers(new RemoteQueryInitializerImpl());

      RemoteCacheManager remoteCacheManager = new RemoteCacheManager(clientBuilder.build());

      // Grab the generated protobuf schema and registers in the server.
      Path proto = Paths.get(RemoteQuery.class.getClassLoader()
            .getResource("proto/book.proto").toURI());
      String protoBufCacheName = ProtobufMetadataManagerConstants.PROTOBUF_METADATA_CACHE_NAME;
      remoteCacheManager.getCache(protoBufCacheName).put("book.proto", Files.readString(proto));

      // Obtain the 'books' remote cache
      RemoteCache<Object, Object> remoteCache = remoteCacheManager.getCache("books");

      // Add some Books
      Book book1 = new Book("Infinispan in Action", "Learn Infinispan with using it", 2015);
      Book book2 = new Book("Cloud-Native Applications with Java and Quarkus", "Build robust and reliable cloud applications", 2019);

      remoteCache.put(1, book1);
      remoteCache.put(2, book2);

      // Execute a full-text query
      QueryFactory queryFactory = Search.getQueryFactory(remoteCache);
      Query<Book> query = queryFactory.create("FROM book_sample.Book WHERE title:'java'");

      List<Book> list = query.execute().list(); // Voila! We have our book back from the cache!
   }
}

4.4.2. Registering Protobuf Schemas

To query protobuf entities, you must provide the client and server with the relevant metadata about your entities in a Protobuf schema (.proto file).

The descriptors are stored in a dedicated ___protobuf_metadata cache on the server. Both keys and values in this cache are plain strings. Registering a new schema is therefore as simple as performing a put() operation on this cache using the schema name as key and the schema file itself as the value.

Note

If caches use authorization, users must have CREATE permissions to add entries in the ___protobuf_metadata cache. Assign users the deployer role at minimum if you use default authorization settings.

Alternatively you can use the schema command with the Data Grid CLI, Data Grid Console, REST endpoint /rest/v2/schemas, or the ProtobufMetadataManager MBean via JMX.

Note

Even if indexing is enabled for a cache no fields of Protobuf encoded entries will be indexed unless you use the @Indexed and @Field inside Protobuf schema documentation annotations (@ProtoDoc) to specify what fields need to get indexed.

4.4.3. Analysis

Analysis is a process that converts input data into one or more terms that you can index and query. While in Embedded Query mapping is done through Hibernate Search annotations, that supports a rich set of Lucene based analyzers, in client-server mode the analyzer definitions are declared in a platform neutral way.

Default Analyzers

Data Grid provides a set of default analyzers for remote query as follows:

DefinitionDescription

standard

Splits text fields into tokens, treating whitespace and punctuation as delimiters.

simple

Tokenizes input streams by delimiting at non-letters and then converting all letters to lowercase characters. Whitespace and non-letters are discarded.

whitespace

Splits text streams on whitespace and returns sequences of non-whitespace characters as tokens.

keyword

Treats entire text fields as single tokens.

stemmer

Stems English words using the Snowball Porter filter.

ngram

Generates n-gram tokens that are 3 grams in size by default.

filename

Splits text fields into larger size tokens than the standard analyzer, treating whitespace as a delimiter and converts all letters to lowercase characters.

These analyzer definitions are based on Apache Lucene and are provided "as-is". For more information about tokenizers, filters, and CharFilters, see the appropriate Lucene documentation.

Using Analyzer Definitions

To use analyzer definitions, reference them by name in the .proto schema file.

  1. Include the Analyze.YES attribute to indicate that the property is analyzed.
  2. Specify the analyzer definition with the @Analyzer annotation.

The following example shows referenced analyzer definitions:

/* @Indexed */
message TestEntity {

    /* @Field(store = Store.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = "keyword")) */
    optional string id = 1;

    /* @Field(store = Store.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = "simple")) */
    optional string name = 2;
}

If using Java classes annotated with @ProtoField, the declaration is similar:

@ProtoDoc("@Field(store = Store.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = \"keyword\"))")
@ProtoField(number = 1)
final String id;

@ProtoDoc("@Field(store = Store.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = \"simple\"))")
@ProtoField(number = 2)
final String description;
Creating Custom Analyzer Definitions

If you require custom analyzer definitions, do the following:

  1. Create an implementation of the ProgrammaticSearchMappingProvider interface packaged in a JAR file.
  2. Provide a file named org.infinispan.query.spi.ProgrammaticSearchMappingProvider in the META-INF/services/ directory of your JAR. This file should contain the fully qualified class name of your implementation.
  3. Copy the JAR to the lib/ directory of your Data Grid Server installation.

    Important

    Your JAR must be available to Data Grid Server during startup. You cannot add it to an already running server.

The following is an example implementation of the ProgrammaticSearchMappingProvider interface:

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.standard.StandardFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.cfg.SearchMapping;
import org.infinispan.Cache;
import org.infinispan.query.spi.ProgrammaticSearchMappingProvider;

public final class MyAnalyzerProvider implements ProgrammaticSearchMappingProvider {

   @Override
   public void defineMappings(Cache cache, SearchMapping searchMapping) {
      searchMapping
            .analyzerDef("standard-with-stop", StandardTokenizerFactory.class)
               .filter(StandardFilterFactory.class)
               .filter(LowerCaseFilterFactory.class)
               .filter(StopFilterFactory.class);
   }
}

4.5. Continuous Queries

Continuous Queries allow an application to register a listener which will receive the entries that currently match a query filter, and will be continuously notified of any changes to the queried data set that result from further cache operations. This includes incoming matches, for values that have joined the set, updated matches, for matching values that were modified and continue to match, and outgoing matches, for values that have left the set. By using a Continuous Query the application receives a steady stream of events instead of having to repeatedly execute the same query to discover changes, resulting in a more efficient use of resources. For instance, all of the following use cases could utilize Continuous Queries:

  • Return all persons with an age between 18 and 25 (assuming the Person entity has an age property and is updated by the user application).
  • Return all transactions higher than $2000.
  • Return all times where the lap speed of F1 racers were less than 1:45.00s (assuming the cache contains Lap entries and that laps are entered live during the race).

4.5.1. Continuous Query Execution

A continuous query uses a listener that is notified when:

  • An entry starts matching the specified query, represented by a Join event.
  • A matching entry is updated and continues to match the query, represented by an Update vent.
  • An entry stops matching the query, represented by a Leave event.

When a client registers a continuous query listener it immediately begins to receive the results currently matching the query, received as Join events as described above. In addition, it will receive subsequent notifications when other entries begin matching the query, as Join events, or stop matching the query, as Leave events, as a consequence of any cache operations that would normally generate creation, modification, removal, or expiration events. Updated cache entries will generate Update events if the entry matches the query filter before and after the operation. To summarize, the logic used to determine if the listener receives a Join, Update or Leave event is:

  1. If the query on both the old and new values evaluate false, then the event is suppressed.
  2. If the query on the old value evaluates false and on the new value evaluates true, then a Join event is sent.
  3. If the query on both the old and new values evaluate true, then an Update event is sent.
  4. If the query on the old value evaluates true and on the new value evaluates false, then a Leave event is sent.
  5. If the query on the old value evaluates true and the entry is removed or expired, then a Leave event is sent.
Note

Continuous Queries can use all query capabilities except: grouping, aggregation, and sorting operations.

4.5.2. Creating Continuous Queries

To create a continuous query, do the following:

  1. Create a Query object.
  2. Obtain the ContinuousQuery (org.infinispan.query.api.continuous.ContinuousQuery object of your cache by calling the appropriate method:

    • org.infinispan.client.hotrod.Search.getContinuousQuery(RemoteCache<K, V> cache) for remote mode
    • org.infinispan.query.Search.getContinuousQuery(Cache<K, V> cache) for embedded mode
  3. Register the query and a continuous query listener (org.infinispan.query.api.continuous.ContinuousQueryListener) as follows:
continuousQuery.addContinuousQueryListener(query, listener);

The following example demonstrates a simple continuous query use case in embedded mode: ⁠

Registering a Continuous Query

import org.infinispan.query.api.continuous.ContinuousQuery;
import org.infinispan.query.api.continuous.ContinuousQueryListener;
import org.infinispan.query.Search;
import org.infinispan.query.dsl.QueryFactory;
import org.infinispan.query.dsl.Query;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

[...]

// We have a cache of Persons
Cache<Integer, Person> cache = ...

// We begin by creating a ContinuousQuery instance on the cache
ContinuousQuery<Integer, Person> continuousQuery = Search.getContinuousQuery(cache);

// Define our query. In this case we will be looking for any Person instances under 21 years of age.
QueryFactory queryFactory = Search.getQueryFactory(cache);
Query query = queryFactory.create("FROM Person p WHERE p.age < 21");

final Map<Integer, Person> matches = new ConcurrentHashMap<Integer, Person>();

// Define the ContinuousQueryListener
ContinuousQueryListener<Integer, Person> listener = new ContinuousQueryListener<Integer, Person>() {
    @Override
    public void resultJoining(Integer key, Person value) {
        matches.put(key, value);
    }

    @Override
    public void resultUpdated(Integer key, Person value) {
        // we do not process this event
    }

    @Override
    public void resultLeaving(Integer key) {
        matches.remove(key);
    }
};

// Add the listener and the query
continuousQuery.addContinuousQueryListener(query, listener);

[...]

// Remove the listener to stop receiving notifications
continuousQuery.removeContinuousQueryListener(listener);

As Person instances having an age less than 21 are added to the cache they will be received by the listener and will be placed into the matches map, and when these entries are removed from the cache or their age is modified to be greater or equal than 21 they will be removed from matches.

4.5.3. Removing Continuous Queries

To stop the query from further execution just remove the listener:

continuousQuery.removeContinuousQueryListener(listener);

4.5.4. Continuous Query Performance

Continuous queries are designed to provide a constant stream of updates to the application, potentially resulting in a very large number of events being generated for particularly broad queries. A new temporary memory allocation is made for each event. This behavior may result in memory pressure, potentially leading to OutOfMemoryErrors (especially in remote mode) if queries are not carefully designed. To prevent such issues it is strongly recommended to ensure that each query captures the minimal information needed both in terms of number of matched entries and size of each match (projections can be used to capture the interesting properties), and that each ContinuousQueryListener is designed to quickly process all received events without blocking and to avoid performing actions that will lead to the generation of new matching events from the cache it listens to.

4.6. Query Statistics

Data Grid exposes indexing and querying statistics through the Search entry point, which you can use programmatically as follows:

// Statistics for the local cluster member
SearchStatistics statistics = Search.getSearchStatistics(cache);

// Consolidated statistics for the whole cluster
CompletionStage<SearchStatisticsSnapshot> statistics = Search.getClusteredSearchStatistics(cache)

Statistics via REST

Data Grid Server also exposes index and query statistics through the REST endpoint.

Use the following invocation: GET /v2/caches/{cacheName}/search/stats

4.7. Query Performance Tuning

Data Grid exposes statistics for queries and provides configurable attributes so you can monitor and tune queries for optimal performance.

Check index usage statistics

Indexing caches improves query performance. However, in some cases, queries might only partially use an index such as when not all fields in the schema are annotated.

Start tuning query performance by checking the time it takes for each type of query to run. If your queries seem to be slow, you should make sure that queries are using the indexes for caches and that all entities and field mappings are indexed.

Indexing performance

Indexing can degrade write throughput for Data Grid clusters. The commit-interval attribute defines the interval, in milliseconds, between which index changes that are buffered in memory are flushed to the index storage and a commit is performed.

This operation is costly so you should avoid configuring an interval that is too small. The default is 1000 ms (1 second).

Querying performance

The refresh-interval attribute defines the interval, in milliseconds, between which the index reader is refreshed.

The default value is 0, which returns data in queries as soon as it is written to a cache.

A value greater than 0 results in some stale query results but substantially increases throughput, especially in write-heavy scenarios. If you do not need data returned in queries as soon as it is written, you should adjust the refresh interval to improve query performance.

Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.