Chapter 1. Indexing Data Grid caches
Data Grid can create indexes of values in your caches to improve query performance, providing faster results than non-indexed queries. Indexing also lets you use full-text search capabilities in your queries.
Data Grid uses Apache Lucene technology to index values in caches.
1.1. Configuring Data Grid to index caches
Enable indexing in your cache configuration and specify which entities Data Grid should include when creating indexes.
You should always configure Data Grid to index caches when using queries. Indexing provides a significant performance boost to your queries, allowing you to get faster insights into your data.
Procedure
Enable indexing in your cache configuration.
<distributed-cache>
<indexing>
<!-- Indexing configuration goes here. -->
</indexing>
</distributed-cache>
Tip: Adding an indexing element to your configuration enables indexing without the need to include the enabled=true attribute. For remote caches, adding this element also implicitly configures encoding as ProtoStream.
Specify the entities to index with the indexed-entity element.
<distributed-cache>
<indexing>
<indexed-entities>
<indexed-entity>...</indexed-entity>
</indexed-entities>
</indexing>
</distributed-cache>
Protobuf messages
Specify the message declared in the schema as the value of the indexed-entity element, for example:
<distributed-cache>
<indexing>
<indexed-entities>
<indexed-entity>book_sample.Book</indexed-entity>
</indexed-entities>
</indexing>
</distributed-cache>
This configuration indexes the Book message in a schema with the book_sample package name.
package book_sample;

/* @Indexed */
message Book {

  /* @Text(projectable = true) */
  optional string title = 1;

  /* @Text(projectable = true) */
  optional string description = 2;

  // no native Date type available in Protobuf
  optional int32 publicationYear = 3;

  repeated Author authors = 4;
}

message Author {
  optional string name = 1;
  optional string surname = 2;
}
Java objects
- Specify the fully qualified name (FQN) of each class that includes the @Indexed annotation.
XML
<distributed-cache>
<indexing>
<indexed-entities>
<indexed-entity>org.infinispan.sample.Car</indexed-entity>
<indexed-entity>org.infinispan.sample.Truck</indexed-entity>
</indexed-entities>
</indexing>
</distributed-cache>
ConfigurationBuilder
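import org.infinispan.configuration.cache.*;

ConfigurationBuilder config = new ConfigurationBuilder();
// Store indexes on the host file system under /some/folder and index the Book class.
config.indexing().enable().storage(IndexStorage.FILESYSTEM).path("/some/folder").addIndexedEntity(Book.class);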
1.1.1. Index configuration
Data Grid configuration controls how indexes are stored and constructed.
Index storage
You can configure how Data Grid stores indexes:
- On the host file system, which is the default and persists indexes between restarts.
- In JVM heap memory, which means that indexes do not survive restarts.
You should store indexes in JVM heap memory only for small datasets.
File system
<distributed-cache>
<indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
<!-- Indexing configuration goes here. -->
</indexing>
</distributed-cache>
JVM heap memory
<distributed-cache>
<indexing storage="local-heap">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Index path
Specifies a filesystem path for the index when storage is 'filesystem'. The value can be a relative or absolute path. Relative paths are created relative to the configured global persistent location, or to the current working directory when global state is disabled.
By default, Data Grid uses the cache name as the relative index path.
When setting a custom value, ensure that there are no conflicts between caches using the same indexed entities.
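The path rules above can be modeled in a few lines of plain Java. This is a minimal sketch, not Data Grid code: IndexPathSketch and its parameters are hypothetical, and passing null for the global persistent location stands in for "global state is disabled". It also shows why the conflict warning matters: two caches configured with the same custom relative path resolve to the same directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical model of the index path rules described above; not Data Grid code.
final class IndexPathSketch {
    private IndexPathSketch() {}

    /**
     * @param cacheName        name of the indexed cache, used when no path is configured
     * @param configuredPath   value of the "path" attribute, or null if unset
     * @param globalPersistent global persistent location, or null when global state is disabled
     */
    static Path resolve(String cacheName, String configuredPath, String globalPersistent) {
        // Without an explicit path, the cache name is used as a relative path.
        Path path = Paths.get(configuredPath != null ? configuredPath : cacheName);
        if (path.isAbsolute()) {
            return path.normalize();
        }
        // Relative paths resolve against the global persistent location,
        // or against the current working directory when global state is disabled.
        Path base = globalPersistent != null
                ? Paths.get(globalPersistent)
                : Paths.get("").toAbsolutePath();
        return base.resolve(path).normalize();
    }
}
```

For example, resolve("books", null, "/var/lib/datagrid") yields /var/lib/datagrid/books, while an absolute custom path such as /tmp/idx is used unchanged.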
Index startup mode
When Data Grid starts caches it can perform operations to ensure the index is consistent with data in the cache. By default, it:
- Checks the existing index file format. If the index is incompatible or corrupt, Data Grid deletes it and automatically reindexes the cache.
- Automatically clears (purges) the index or reindexes the cache:
  - If data is volatile and the index is persistent, Data Grid purges the index when the cache starts.
  - If data is persistent and the index is volatile, Data Grid reindexes the cache when it starts.
The purge operation runs synchronously because it is usually very fast, so the cache becomes available only after the purge completes.
The reindex operation runs asynchronously because it can take longer to complete, depending on the size of the cache. If you perform an indexed query while a reindex is in progress, the results might be partial. You can check whether a reindex is in progress through the query statistics.
You can also configure the startup mode manually to:
- Purge the index when the cache starts.
- Reindex the cache when it starts.
- Perform no indexing operation when the cache starts.
If a manual configuration could lead to inconsistencies, Data Grid logs a warning message when the cache starts.
Clear the index when the cache starts
<distributed-cache>
<indexing storage="filesystem" startup-mode="purge">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Rebuild the index when the cache starts
<distributed-cache>
<indexing storage="local-heap" startup-mode="reindex">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
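The default startup rules above can be summarized as a small decision model. The following is a minimal sketch in plain Java, not Data Grid code: the StartupMode and StartupAction enums are hypothetical, and AUTO is an assumed name for the default behavior. It only encodes what this section states: a corrupt index is rebuilt, volatile data with a persistent index is purged, persistent data with a volatile index is reindexed, and only a purge delays cache availability.

```java
// Hypothetical enums for this sketch; AUTO stands for the default behavior described above.
enum StartupMode { AUTO, PURGE, REINDEX, NONE }

enum StartupAction { NOTHING, PURGE, REINDEX }

final class StartupModeSketch {
    private StartupModeSketch() {}

    static StartupAction decide(StartupMode mode, boolean dataPersistent,
                                boolean indexPersistent, boolean indexCorrupt) {
        switch (mode) {
            case PURGE:
                return StartupAction.PURGE;
            case REINDEX:
                return StartupAction.REINDEX;
            case NONE:
                return StartupAction.NOTHING;
            default:
                break; // AUTO: fall through to the default rules below
        }
        // An incompatible or corrupt index is deleted and rebuilt from the cache data.
        if (indexCorrupt) {
            return StartupAction.REINDEX;
        }
        // Volatile data with a persistent index: stale index entries are purged.
        if (!dataPersistent && indexPersistent) {
            return StartupAction.PURGE;
        }
        // Persistent data with a volatile index: the index is rebuilt.
        if (dataPersistent && !indexPersistent) {
            return StartupAction.REINDEX;
        }
        return StartupAction.NOTHING;
    }

    // A purge runs synchronously and delays cache availability; a reindex runs in the background.
    static boolean blocksStartup(StartupAction action) {
        return action == StartupAction.PURGE;
    }
}
```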
1.2. Data Grid native indexing annotations
When you enable indexing in caches, you configure Data Grid to create indexes. You also need to provide Data Grid with a structured representation of the entities in your caches so it can actually index them.
1.2.1. Overview of the Data Grid indexing annotations
- @Indexed: Indicates entities, or Protobuf message types, that Data Grid indexes.

To indicate which fields Data Grid indexes, use the indexing annotations. You can use these annotations the same way for both embedded and remote queries.

- @Basic: Supports any type of field. Use the @Basic annotation for numbers and short strings that don't require any transformation or processing.
- @Decimal: Use this annotation for fields that represent decimal values.
- @Keyword: Use this annotation for fields that are strings and intended for exact matching. Keyword fields are not analyzed or tokenized during indexing.
- @Text: Use this annotation for fields that contain textual data and are intended for full-text search capabilities. You can use the analyzer to process the text and to generate individual tokens.
- @Embedded: Use this annotation to mark a field as an embedded object within the parent entity. The NESTED structure preserves the original object relationship structure, while the FLATTENED structure turns the leaf fields into multivalued fields of the parent entity. The default structure used by @Embedded is NESTED. You can use NESTED embedded objects in nested object joins.
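The difference between NESTED and FLATTENED matters when a query combines conditions on the same embedded object. The following plain-Java sketch (hypothetical class and method names; not Data Grid code) models the Author objects from the Book schema shown earlier: with flattened fields, a condition on name and a condition on surname can be satisfied by different authors, while the nested structure requires both to match the same author.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of NESTED vs FLATTENED embedded indexing; not Data Grid code.
final class EmbeddedStructureSketch {
    private EmbeddedStructureSketch() {}

    record Author(String name, String surname) {}

    // FLATTENED: each leaf field becomes one multivalued field of the parent entity.
    static Map<String, List<String>> flatten(List<Author> authors) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("authors.name", new ArrayList<>());
        fields.put("authors.surname", new ArrayList<>());
        for (Author a : authors) {
            fields.get("authors.name").add(a.name());
            fields.get("authors.surname").add(a.surname());
        }
        return fields;
    }

    // Flattened matching: each condition is checked independently against all values.
    static boolean matchesFlattened(List<Author> authors, String name, String surname) {
        Map<String, List<String>> fields = flatten(authors);
        return fields.get("authors.name").contains(name)
                && fields.get("authors.surname").contains(surname);
    }

    // Nested matching: both conditions must hold for the same embedded object.
    static boolean matchesNested(List<Author> authors, String name, String surname) {
        return authors.stream()
                .anyMatch(a -> a.name().equals(name) && a.surname().equals(surname));
    }
}
```

With the authors John Smith and Jane Doe, a query for name Jane and surname Smith matches the flattened representation but not the nested one.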
Each of the annotations supports a set of attributes that you can use to further describe how the entity is indexed.
| Annotation | Supported attributes |
|---|---|
| @Basic | searchable, sortable, projectable, aggregable, indexNullAs |
| @Decimal | searchable, sortable, projectable, aggregable, indexNullAs, decimalScale |
| @Keyword | searchable, sortable, projectable, aggregable, indexNullAs, normalizer, norms |
| @Text | searchable, projectable, norms, analyzer, searchAnalyzer |
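To illustrate why @Keyword fields suit exact matching and @Text fields suit full-text search, the following sketch contrasts the terms each kind of field might produce. It is a simplified model, not Lucene or Data Grid code: the "analyzer" here only lowercases and splits on non-alphanumeric characters, and the "normalizer" only lowercases; configured analyzers and normalizers can behave differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Simplified model of @Keyword vs @Text terms; not Lucene or Data Grid code.
final class FieldIndexingSketch {
    private FieldIndexingSketch() {}

    // @Keyword: the whole value is one term; an optional normalizer (here: lowercase) may apply.
    static List<String> keywordTerms(String value, boolean lowercaseNormalizer) {
        return List.of(lowercaseNormalizer ? value.toLowerCase(Locale.ROOT) : value);
    }

    // @Text: a toy analyzer that lowercases the value and splits it into word tokens.
    static List<String> textTerms(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }
}
```

A single-word query such as "grid" finds the value "Data Grid" among the text terms but not among the keyword terms, which only contain the whole value.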
Using Data Grid annotations
You can provide Data Grid with indexing annotations in two ways:
- Annotate your Java classes or fields directly using the Data Grid annotations. You then generate or update your Protobuf schema (.proto files) before uploading them to Data Grid Server.
- Annotate your Protobuf schema directly with @Indexed and @Basic, @Keyword, or @Text. You then upload your Protobuf schema to Data Grid Server.
For example, the following schema uses the @Text annotation:
/**
 * @Text(projectable = true)
 */
required string street = 1;
1.3. Rebuilding indexes
Rebuilding an index reconstructs it from the data stored in the cache. You should rebuild indexes when you change things like the definitions of indexed types or analyzers. Likewise, you can rebuild indexes after you delete them for any reason.
Rebuilding indexes can take a long time to complete because the process covers all data in the grid. While the rebuild operation is in progress, queries might also return fewer results.
Procedure
Rebuild indexes in one of the following ways:
- Call the reindexCache() method to programmatically rebuild an index from a Hot Rod Java client:
remoteCacheManager.administration().reindexCache("MyCache");
Tip: For remote caches, you can also rebuild indexes from Data Grid Console.
- Call the indexer.run() method to rebuild indexes for embedded caches as follows:
import java.util.concurrent.CompletionStage;
import org.infinispan.query.*;
Indexer indexer = Search.getIndexer(cache);
CompletionStage<Void> future = indexer.run(); // completes when the rebuild finishes
Check the status of the reindexing operation with the reindexing attribute of the index statistics.
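Because the embedded-cache rebuild runs asynchronously and queries can return fewer results while it is in progress, a caller that needs complete results can wait on the CompletionStage<Void> that run() returns. The sketch below uses a hypothetical ReindexTask interface as a stand-in so that it compiles without Data Grid on the classpath; only the standard java.util.concurrent calls are real.

```java
import java.util.concurrent.CompletionStage;

// Hypothetical stand-in for an indexer whose run() returns CompletionStage<Void>.
// This interface is NOT the Data Grid Indexer API; it only keeps the sketch self-contained.
interface ReindexTask {
    CompletionStage<Void> run();
}

final class ReindexSketch {
    private ReindexSketch() {}

    // Blocks the calling thread until the rebuild completes.
    // Assumes the stage supports toCompletableFuture(); a failure surfaces as a CompletionException.
    static void runAndWait(ReindexTask task) {
        task.run().toCompletableFuture().join();
    }

    // Non-blocking alternative: onDone runs after the rebuild completes successfully,
    // and the returned stage completes after onDone has run.
    static CompletionStage<Void> runThen(ReindexTask task, Runnable onDone) {
        return task.run().thenRun(onDone);
    }
}
```

Blocking is simplest in batch jobs or tests; the callback form avoids tying up a thread in request-handling code.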
1.4. Updating index schema
The update index schema operation lets you apply schema changes with minimal downtime. Instead of removing previously indexed data and recreating the index schema, Data Grid adds new fields to the existing schema. Updating the index schema is much faster than rebuilding the index.
You can update the index schema only when your changes do not affect previously indexed fields. When you change index field definitions or when you delete fields, you must rebuild the index.
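The rule above can be expressed as a simple decision. In this minimal sketch, which is not Data Grid code, each indexed field is represented by a name and a definition string (an assumption for illustration, for example "@Text" or "@Keyword"). Any deleted or redefined field requires a rebuild, while pure additions allow an in-place schema update.

```java
import java.util.Map;

// Hypothetical model of choosing between "update index schema" and "rebuild"; not Data Grid code.
final class SchemaChangeSketch {
    private SchemaChangeSketch() {}

    enum Action { NONE, UPDATE_SCHEMA, REBUILD }

    /**
     * @param oldFields indexed field name -> definition currently in the index
     * @param newFields indexed field name -> definition in the new schema
     */
    static Action choose(Map<String, String> oldFields, Map<String, String> newFields) {
        for (Map.Entry<String, String> e : oldFields.entrySet()) {
            String updated = newFields.get(e.getKey());
            // A deleted or redefined field affects previously indexed data.
            if (updated == null || !updated.equals(e.getValue())) {
                return Action.REBUILD;
            }
        }
        // Every old field is unchanged, so any extra entries are new fields.
        return newFields.size() > oldFields.size() ? Action.UPDATE_SCHEMA : Action.NONE;
    }
}
```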
Procedure
Update index schema for a given cache:
Call the updateIndexSchema() method to programmatically update the index schema from a Hot Rod Java client:
remoteCacheManager.administration().updateIndexSchema("MyCache");
Tip: For remote caches, you can also update the index schema from the Data Grid Console or by using the REST API.
1.5. Non-indexed queries
For the best query performance, Data Grid recommends indexing caches. However, you can also query caches that are not indexed.
- For embedded caches, you can perform non-indexed queries on Plain Old Java Objects (POJOs).
- For remote caches, you must use ProtoStream encoding with the application/x-protostream media type to perform non-indexed queries.
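The performance gap comes from how matches are found. As a conceptual sketch only (plain Java, hypothetical names, and a cache simplified to entries with a single string value; Data Grid's actual query engine is not shown), a non-indexed query evaluates its condition against every entry, while an indexed query consults a prebuilt structure that maps values to keys.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual contrast between scanning and index lookup; not Data Grid code.
final class QueryStrategySketch {
    private QueryStrategySketch() {}

    // Non-indexed: the condition is evaluated against every entry in the cache.
    static Set<String> scan(Map<String, String> cache, String value) {
        Set<String> keys = new TreeSet<>();
        for (Map.Entry<String, String> e : cache.entrySet()) {
            if (e.getValue().equals(value)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Built once (and maintained on writes in a real system): value -> keys holding that value.
    static Map<String, Set<String>> buildIndex(Map<String, String> cache) {
        Map<String, Set<String>> index = new HashMap<>();
        cache.forEach((key, value) -> index.computeIfAbsent(value, v -> new TreeSet<>()).add(key));
        return index;
    }

    // Indexed: one hash lookup instead of a pass over every entry.
    static Set<String> lookup(Map<String, Set<String>> index, String value) {
        return index.getOrDefault(value, Collections.emptySet());
    }
}
```

Both strategies return the same keys; the difference is that the scan's cost grows with the number of entries in the cache, which is why indexing pays off as caches grow.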