Chapter 1. Indexing Data Grid caches

1.1. Configuring Data Grid to index caches

Enable indexing in your cache configuration and specify which entities Data Grid should include when creating indexes.

You should always configure Data Grid to index caches when using queries. Indexing provides a significant performance boost to your queries, allowing you to get faster insights into your data.

Procedure

Enable indexing in your cache configuration.
```
<distributed-cache>
  <indexing>
    
  </indexing>
</distributed-cache>
```
Tip
Adding an indexing element to your configuration enables indexing without the need to include the enabled=true attribute.
For remote caches, adding this element also implicitly configures encoding as ProtoStream.

Specify the entities to index with the indexed-entity element.

<distributed-cache>
  <indexing>
    <indexed-entities>
      <indexed-entity>...</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>

Protobuf messages

Specify the message declared in the schema as the value of the indexed-entity element, for example:

<distributed-cache>
  <indexing>
    <indexed-entities>
      <indexed-entity>book_sample.Book</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>

This configuration indexes the Book message in a schema with the book_sample package name.

package book_sample;

/* @Indexed */
message Book {

    /* @Field(store = Store.YES, analyze = Analyze.YES) */
    optional string title = 1;

    /* @Field(store = Store.YES, analyze = Analyze.YES) */
    optional string description = 2;
    optional int32 publicationYear = 3; // no native Date type available in Protobuf

    repeated Author authors = 4;
}

message Author {
    optional string name = 1;
    optional string surname = 2;
}

Java objects

Specify the fully qualified name (FQN) of each class that includes the @Indexed annotation.

XML

<distributed-cache>
  <indexing>
    <indexed-entities>
      <indexed-entity>org.infinispan.sample.Car</indexed-entity>
      <indexed-entity>org.infinispan.sample.Truck</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>

ConfigurationBuilder

import org.infinispan.configuration.cache.*;

// Enable indexing, store the index on the file system, and index the Book class.
ConfigurationBuilder config = new ConfigurationBuilder();
config.indexing().enable().storage(IndexStorage.FILESYSTEM).path("/some/folder").addIndexedEntity(Book.class);

1.1.1. Index configuration

Data Grid configuration controls how indexes are stored and constructed.

1.1.1.1. Index storage

You can configure how Data Grid stores indexes:

On the host file system, which is the default and persists indexes between restarts.
In JVM heap memory, which means that indexes do not survive restarts.
You should store indexes in JVM heap memory only for small datasets.

File system

<distributed-cache>
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <!-- Indexing configuration goes here. -->
  </indexing>
</distributed-cache>

JVM heap memory

<distributed-cache>
  <indexing storage="local-heap">
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>

1.1.1.2. Index reader

The index reader is an internal component that provides access to the indexes to perform queries. As the index content changes, Data Grid needs to refresh the reader so that search results are up to date. You can configure the refresh interval for the index reader, in milliseconds. By default, Data Grid refreshes the reader before each query if the index changed since the last refresh.

<distributed-cache>
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <!-- Sets an interval of one second for the index reader. -->
    <index-reader refresh-interval="1000"/>
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>
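
The refresh behavior of the index reader can be pictured with a small, self-contained Java sketch. This is a toy model and not Data Grid code: the `RefreshingReader` class, its version counters, and the injected clock are hypothetical names introduced for illustration. The sketch assumes that a positive interval lets the reader refresh at most once per interval, and that an interval of 0 stands for the default of refreshing before each query when the index changed.

```java
import java.util.function.LongSupplier;

/** Toy model of an index reader that refreshes lazily; not Data Grid code. */
final class RefreshingReader {
    private final long refreshIntervalMs; // hypothetical stand-in for refresh-interval
    private final LongSupplier clock;      // injected so the example is deterministic
    private long indexVersion;             // bumped on every index change
    private long visibleVersion;           // version that queries currently see
    private long lastRefreshMs;

    RefreshingReader(long refreshIntervalMs, LongSupplier clock) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.clock = clock;
        this.lastRefreshMs = clock.getAsLong();
    }

    /** Records a change to the index content. */
    void indexChanged() {
        indexVersion++;
    }

    /** Returns the index version a query sees, refreshing first when allowed. */
    long query() {
        long now = clock.getAsLong();
        boolean changed = indexVersion != visibleVersion;
        boolean intervalElapsed = now - lastRefreshMs >= refreshIntervalMs;
        if (changed && intervalElapsed) {
            visibleVersion = indexVersion;
            lastRefreshMs = now;
        }
        return visibleVersion;
    }

    public static void main(String[] args) {
        long[] now = {0};
        RefreshingReader reader = new RefreshingReader(1000, () -> now[0]);
        reader.indexChanged();
        now[0] = 500;
        System.out.println(reader.query()); // interval not elapsed, reader is stale
        now[0] = 1000;
        System.out.println(reader.query()); // interval elapsed, reader refreshed
    }
}
```

In this model a longer interval trades freshness of search results for fewer refresh operations, which is the trade-off the `refresh-interval` attribute exposes.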

1.1.1.3. Index writer

The index writer is an internal component that constructs an index composed of one or more segments (sub-indexes) that can be merged over time to improve performance. Fewer segments usually mean less overhead during a query because index reader operations need to take all segments into account.

Data Grid uses Apache Lucene internally and indexes entries in two tiers: memory and storage. New entries go to the memory index first and then, when a flush happens, to the configured index storage. Periodic commit operations create segments from the previously flushed data and make all index changes permanent.
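
The two tiers and the flush and commit steps can be sketched in a few lines of plain Java. This is a toy model for illustration only, not Data Grid or Lucene code; the `TwoTierIndex` class and its methods are hypothetical names, and real flushes and commits are triggered by the writer rather than called by hand.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the memory and storage tiers; not Data Grid or Lucene code. */
final class TwoTierIndex {
    private final List<String> memory = new ArrayList<>();         // entries not yet flushed
    private final List<String> flushed = new ArrayList<>();        // flushed, not yet committed
    private final List<List<String>> segments = new ArrayList<>(); // committed segments

    /** New entries always land in the memory tier first. */
    void add(String entry) {
        memory.add(entry);
    }

    /** A flush moves buffered entries from memory to the index storage. */
    void flush() {
        flushed.addAll(memory);
        memory.clear();
    }

    /** A commit turns previously flushed data into a permanent segment. */
    void commit() {
        if (!flushed.isEmpty()) {
            segments.add(new ArrayList<>(flushed));
            flushed.clear();
        }
    }

    int inMemory() {
        return memory.size();
    }

    int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        TwoTierIndex index = new TwoTierIndex();
        index.add("book-1");
        index.add("book-2");
        index.flush();
        index.commit();
        System.out.println(index.inMemory() + " in memory, " + index.segmentCount() + " segment(s)");
    }
}
```

Each commit in this model adds one segment, which is why frequent commits tend to produce many small segments that later need merging.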

Note

The index-writer configuration is optional. The defaults should work for most cases and custom configurations should only be used to tune performance.

<distributed-cache>
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <index-writer commit-interval="2000"
                  low-level-trace="false"
                  max-buffered-entries="32"
                  queue-count="1"
                  queue-size="10000"
                  ram-buffer-size="400"
                  thread-pool-size="2">
      <index-merge calibrate-by-deletes="true"
                   factor="3"
                   max-entries="2000"
                   min-size="10"
                   max-size="20"/>
    </index-writer>
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>

Table 1.1. Index writer configuration attributes
Attribute	Description
`commit-interval`	Interval, in milliseconds, at which index changes that are buffered in memory are flushed to the index storage and committed. Because this operation is costly, avoid small values. The default is 1000 ms (1 second).
`max-buffered-entries`	Maximum number of entries that can be buffered in-memory before they are flushed to the index storage. Large values result in faster indexing but use more memory. When used in combination with the `ram-buffer-size` attribute, a flush occurs for whichever event happens first.
`ram-buffer-size`	Maximum amount of memory that can be used for buffering added entries and deletions before they are flushed to the index storage. Large values result in faster indexing but use more memory. For faster indexing performance you should set this attribute instead of `max-buffered-entries`. When used in combination with the `max-buffered-entries` attribute, a flush occurs for whichever event happens first.
`thread-pool-size`	Number of threads that execute write operations to the index.
`queue-count`	Number of internal queues to use for each indexed type. Each queue holds a batch of modifications that is applied to the index and queues are processed in parallel. Increasing the number of queues will lead to an increase of indexing throughput, but only if the bottleneck is CPU. For optimum results, do not set a value for `queue-count` that is larger than the value for `thread-pool-size`.
`queue-size`	Maximum number of elements each queue can hold. Increasing the `queue-size` value increases the amount of memory that is used during indexing operations. Setting a value that is too small can block indexing operations.
`low-level-trace`	Enables low-level trace information for indexing operations. Enabling this attribute substantially degrades performance. Use low-level tracing only as a last resort for troubleshooting.
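
The interaction between `max-buffered-entries` and `ram-buffer-size`, where a flush occurs for whichever limit is reached first, can be expressed as a simple predicate. The following Java sketch is a hypothetical illustration and not the writer's actual logic: the `FlushPolicy` class and its parameter names are invented here, and the size unit is left arbitrary.

```java
/** Toy model of the two flush thresholds; not Data Grid or Lucene code. */
final class FlushPolicy {
    private final int maxBufferedEntries; // mirrors max-buffered-entries
    private final long ramBufferLimit;    // mirrors ram-buffer-size, unit is arbitrary here

    FlushPolicy(int maxBufferedEntries, long ramBufferLimit) {
        this.maxBufferedEntries = maxBufferedEntries;
        this.ramBufferLimit = ramBufferLimit;
    }

    /** A flush is due as soon as either limit is reached. */
    boolean shouldFlush(int bufferedEntries, long bufferedSize) {
        return bufferedEntries >= maxBufferedEntries || bufferedSize >= ramBufferLimit;
    }

    public static void main(String[] args) {
        FlushPolicy policy = new FlushPolicy(32, 400);
        System.out.println(policy.shouldFlush(10, 50));  // neither limit reached
        System.out.println(policy.shouldFlush(32, 50));  // entry limit reached first
        System.out.println(policy.shouldFlush(10, 400)); // size limit reached first
    }
}
```

Because either condition alone triggers a flush, a low entry limit can cause flushes long before the memory limit is used, which is one reason to tune the memory limit instead.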

To configure how Data Grid merges index segments, you use the index-merge sub-element.

Table 1.2. Index merge configuration attributes
Attribute	Description
`max-entries`	Maximum number of entries that an index segment can have before merging. Segments with more than this number of entries are not merged. Smaller values perform better on frequently changing indexes; larger values provide better search performance if the index does not change often.
`factor`	Number of segments that are merged at once. With smaller values, merging happens more often, which uses more resources, but the total number of segments will be lower on average, increasing search performance. Larger values (greater than 10) are best for heavy writing scenarios.
`min-size`	Minimum target size of segments, in MB, for background merges. Segments smaller than this size are merged more aggressively. Setting a value that is too large might result in expensive merge operations, even though they are less frequent.
`max-size`	Maximum size of segments, in MB, for background merges. Segments larger than this size are never merged in the background. Setting this to a lower value helps reduce memory requirements and avoids some merging operations at the cost of optimal search speed. This attribute is ignored when forcefully merging an index and `max-forced-size` applies instead.
`max-forced-size`	Maximum size of segments, in MB, for forced merges. Overrides the `max-size` attribute. Set this to the same value as `max-size` or lower. However, setting the value too low degrades search performance because documents are deleted.
`calibrate-by-deletes`	Whether the number of deleted entries in an index should be taken into account when counting the entries in the segment. Setting `false` will lead to more frequent merges caused by `max-entries`, but will more aggressively merge segments with many deleted documents, improving query performance.
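
To make the roles of `factor` and `max-entries` more concrete, the following Java sketch runs one merge round over a list of segment sizes. It is a deliberately simplified model and not Lucene's merge policy: the `MergeSketch` class is hypothetical, segment sizes are plain entry counts, and the size-based attributes and deleted entries are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of one merge round; not Lucene's actual merge policy. */
final class MergeSketch {
    /**
     * Segments above maxEntries are left alone. If at least `factor` other
     * segments exist, the `factor` smallest are merged into a single segment.
     */
    static List<Integer> mergeOnce(List<Integer> segmentSizes, int factor, int maxEntries) {
        List<Integer> eligible = new ArrayList<>();
        List<Integer> result = new ArrayList<>();
        for (int size : segmentSizes) {
            if (size > maxEntries) {
                result.add(size); // too large, never merged in this model
            } else {
                eligible.add(size);
            }
        }
        Collections.sort(eligible);
        if (eligible.size() >= factor) {
            int merged = 0;
            for (int i = 0; i < factor; i++) {
                merged += eligible.get(i);
            }
            result.add(merged);
            result.addAll(eligible.subList(factor, eligible.size()));
        } else {
            result.addAll(eligible); // not enough segments to merge yet
        }
        return result;
    }

    public static void main(String[] args) {
        // Four segments, one of them above the max-entries limit of 2000.
        System.out.println(mergeOnce(List.of(100, 200, 300, 5000), 3, 2000));
    }
}
```

In this model a smaller `factor` lets merges start sooner and keeps the segment count low, while a larger one postpones merging until more segments have accumulated.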

1.2. Indexing annotations

When you enable indexing in caches, you configure Data Grid to create indexes. You also need to provide Data Grid with a structured representation of the entities in your caches so it can actually index them.

There are two annotations that control the entities and fields that Data Grid indexes:

@Indexed

Indicates entities, or Protobuf message types, that Data Grid indexes.

@Field

Indicates fields that Data Grid indexes and has the following attributes:

Attribute	Description	Values
`index`	Controls if Data Grid includes fields in indexes.	`Index.YES` or `Index.NO`
`store`	Allows Data Grid to store fields in indexes so you can use them for projections.	`Store.YES` or `Store.NO`. Use `Store.YES` and set `sortable = true` for fields that need to be used for sorting.
`analyze`	Includes fields in full-text searches.	`Analyze.YES`, `Analyze.NO`, or an analyzer definition

Remote caches

You can provide Data Grid with indexing annotations for remote caches in two ways:

Annotate your Java classes directly with @ProtoDoc("@Indexed") and @ProtoDoc("@Field(…)").
You then generate Protobuf schema, .proto files, before uploading them to Data Grid Server.
Annotate Protobuf schema directly with @Indexed and @Field(…).
You then upload your Protobuf schema to Data Grid Server.
For example, the following schema uses the @Field annotation:
```
/**
 * @Field(analyze = Analyze.YES, store = Store.YES, sortable = true)
 */
required string street = 1;
```
By including store = Store.YES and sortable = true in the @Field annotation, you can use the street field for sorting queries without encountering warning messages or unexpected results.

Embedded caches

For embedded caches, you add indexing annotations to your Java classes according to how Data Grid stores your entries.

Use the @Indexed and @Field annotations, along with other Hibernate Search annotations such as @FullTextField.

1.3. Rebuilding indexes

Rebuilding an index reconstructs it from the data stored in the cache. You should rebuild indexes when you change the definitions of indexed types or analyzers. You can also rebuild indexes after you delete them.

Important

Rebuilding indexes can take a long time to complete because the process takes place for all data in the grid. While the rebuild operation is in progress, queries might also return fewer results.

Procedure

Rebuild indexes in one of the following ways:

Call the reindexCache() method to programmatically rebuild an index from a Hot Rod Java client:
```
remoteCacheManager.administration().reindexCache("MyCache");
```
Tip
For remote caches you can also rebuild indexes from Data Grid Console.

Call the indexer.run() method to rebuild indexes for embedded caches as follows:

Indexer indexer = Search.getIndexer(cache);
CompletionStage<Void> future = indexer.run();
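
Because the rebuild runs asynchronously and can take a long time, callers often want to wait for the returned `CompletionStage<Void>` with a timeout. The following sketch uses only the JDK; the `ReindexWait` class and `awaitRebuild` method are hypothetical helpers, and an already running `CompletableFuture` stands in for the stage that the rebuild call returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for waiting on an asynchronous rebuild. */
final class ReindexWait {
    /** Blocks until the stage completes or the timeout elapses; returns true on success. */
    static boolean awaitRebuild(CompletionStage<Void> rebuild, long timeout, TimeUnit unit) {
        try {
            rebuild.toCompletableFuture().get(timeout, unit);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false; // the rebuild failed or did not finish in time
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stage returned by the rebuild call.
        CompletionStage<Void> rebuild = CompletableFuture.runAsync(() -> { });
        System.out.println(awaitRebuild(rebuild, 5, TimeUnit.SECONDS));
    }
}
```

A non-blocking alternative is to attach a callback with `whenComplete` on the stage instead of calling `get`.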

1.4. Non-indexed queries

For the best query performance, you should index your caches. However, you can also query caches that are not indexed.

For embedded caches, you can perform non-indexed queries on Plain Old Java Objects (POJOs).
For remote caches, you must use ProtoStream encoding with the application/x-protostream media type to perform non-indexed queries.
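
The performance difference between the two query styles comes down to how much data each query has to inspect. The following plain Java sketch is a conceptual illustration only and does not use the Data Grid query API: the `ScanVersusIndex` class is hypothetical, a map of titles to publication years stands in for a cache, and a second map stands in for an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy comparison of a full scan and an index lookup; not Data Grid code. */
final class ScanVersusIndex {
    /** Non-indexed style: every entry is inspected for each query. */
    static List<String> scan(Map<String, Integer> yearByTitle, int year) {
        List<String> titles = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(yearByTitle).entrySet()) {
            if (entry.getValue() == year) {
                titles.add(entry.getKey());
            }
        }
        return titles;
    }

    /** Indexed style: build a year-to-titles structure once, up front. */
    static Map<Integer, List<String>> buildIndex(Map<String, Integer> yearByTitle) {
        Map<Integer, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(yearByTitle).entrySet()) {
            index.computeIfAbsent(entry.getValue(), y -> new ArrayList<>()).add(entry.getKey());
        }
        return index;
    }

    /** Answers the same question with a single lookup instead of a scan. */
    static List<String> lookup(Map<Integer, List<String>> index, int year) {
        return index.getOrDefault(year, List.of());
    }

    public static void main(String[] args) {
        Map<String, Integer> books = Map.of("Dune", 1965, "Emma", 1815, "Ubik", 1969);
        System.out.println(scan(books, 1965));
        System.out.println(lookup(buildIndex(books), 1965));
    }
}
```

Both methods return the same titles; the scan pays its cost on every query, while the index pays once when it is built and again whenever the data changes.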
