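Chapter 4. Querying values in caches

With Data Grid, you can run queries to find values in data sets quickly and efficiently, whether you use an embedded Data Grid cluster or a remote Data Grid Server cluster.

Note

You can index and query cache values only if they are stored as Plain Old Java Objects (POJOs) or as objects encoded as Protocol Buffers (Protobuf).

4.1. Configuring Data Grid to index caches

Create indexes of the values in your caches to improve query performance and to use full-text search capabilities.

Note

Data Grid uses Apache Lucene technology to index values in caches.

Procedure

Enable indexing in your cache configuration.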
```
<distributed-cache name="my-cache">
  <indexing>
    
  </indexing>
</distributed-cache>
```
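Adding the <indexing> element enables indexing automatically. You do not need to set the enabled attribute, even though the configuration schema lists its default value as false.

Specify each entity that you want to index as a value of the indexed-entity element.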

<distributed-cache name="my-cache">
  <indexing>
    <indexed-entities>
      <indexed-entity>...</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>
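
As a quick sanity check, a fragment like the one above can be parsed with the JDK's built-in XML parser to confirm that it is well formed and to list the entities it declares. This is only a sketch under stated assumptions: the class name IndexedEntityLister is hypothetical and not part of Data Grid, and the code checks well-formedness only, not conformance to the Data Grid configuration schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Data Grid: lists <indexed-entity> values in a cache fragment.
class IndexedEntityLister {

    static List<String> indexedEntities(String xml) {
        try {
            // Parsing throws if the fragment is not well-formed XML.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("indexed-entity");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed cache fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<distributed-cache name=\"my-cache\">"
            + "  <indexing>"
            + "    <indexed-entities>"
            + "      <indexed-entity>org.infinispan.sample.Car</indexed-entity>"
            + "    </indexed-entities>"
            + "  </indexing>"
            + "</distributed-cache>";
        System.out.println(indexedEntities(xml));
    }
}
```

An empty result from a fragment that parses cleanly suggests that the indexed-entity elements are missing or misspelled.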

Plain Old Java Objects

For caches that store POJOs, specify the fully qualified name of each class that is annotated with @Indexed, for example:

<indexed-entities>
  <indexed-entity>org.infinispan.sample.Car</indexed-entity>
  <indexed-entity>org.infinispan.sample.Truck</indexed-entity>
</indexed-entities>

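Protobuf

For caches that store Protobuf-encoded entries, specify the messages that are declared in the Protobuf schema.

For example, suppose that you use the following Protobuf schema: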

package book_sample;

/* @Indexed */
message Book {

    /* @Field(store = Store.YES, analyze = Analyze.YES) */
    optional string title = 1;

    /* @Field(store = Store.YES, analyze = Analyze.YES) */
    optional string description = 2;
    optional int32 publicationYear = 3; // no native Date type available in Protobuf

    repeated Author authors = 4;
}

message Author {
    optional string name = 1;
    optional string surname = 2;
}

You would then specify the following value for the indexed-entity element:

<indexed-entities>
  <indexed-entity>book_sample.Book</indexed-entity>
</indexed-entities>

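4.1.1. Enabling cache indexing programmatically

Configure indexing for caches programmatically through the Data Grid API.

Procedure

When you use Data Grid as an embedded library, enable and configure indexing for caches with the IndexingConfigurationBuilder class, as in the following example:

import org.infinispan.configuration.cache.*;

ConfigurationBuilder config = new ConfigurationBuilder();
// Enable indexing, keep the index on the file system, and register the Book entity.
config.indexing().enable().storage(IndexStorage.FILESYSTEM).path("/some/folder").addIndexedEntity(Book.class);

Reference

org.infinispan.configuration.cache.IndexingConfigurationBuilder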

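4.1.2. Index annotations

When you enable indexing in Data Grid caches, you can use the following annotations:

@Indexed marks the Java objects that you want to index.
@Field controls how fields within the object are indexed.

When you use Data Grid as an embedded library, add these annotations to your Java classes.

For Data Grid Server, define Protobuf schema (.proto) files that contain these annotations.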

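4.1.3. Index configuration

Data Grid configuration controls how indexes are stored and built.

4.1.3.1. Index storage

You can configure where Data Grid stores indexes:

On the host file system, which is the default and persists indexes between restarts.
In JVM heap memory, which means that indexes do not survive restarts.
You should store indexes in JVM heap memory only for small data sets.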

File system

<distributed-cache name="my-cache">
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <!-- Indexing configuration goes here. -->
  </indexing>
</distributed-cache>

JVM heap memory

<distributed-cache name="my-cache">
  <indexing storage="local-heap">
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>

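4.1.3.2. Index reader

The index reader is an internal component that provides access to the indexes to perform queries. As the index content changes, Data Grid needs to refresh the reader so that search results are up to date. You can configure the refresh interval for the index reader. By default, Data Grid reads the index before each query if the index has changed since the last refresh.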

<distributed-cache name="my-cache">
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <!-- Sets an interval of one second for the index reader. -->
    <index-reader refresh-interval="1000"/>
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>
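
The trade-off behind refresh-interval can be illustrated with a toy model: a longer interval means fewer refreshes, but queries can see stale results until the interval elapses. The class RefreshingReader below is hypothetical and is not Data Grid code. It assumes that an interval of 0 models the default behavior described above, and that the interval is measured from the previous refresh.

```java
// Toy model of an index reader; an illustration only, not Data Grid code.
class RefreshingReader {
    // Assumption: 0 models the default, where the reader checks for changes before each query.
    private final long refreshIntervalMillis;
    private long lastRefreshMillis;
    private int indexVersion;   // bumped on every index change
    private int visibleVersion; // version that queries currently see

    RefreshingReader(long refreshIntervalMillis) {
        this.refreshIntervalMillis = refreshIntervalMillis;
    }

    void indexChanged() {
        indexVersion++;
    }

    // Returns the index version that a query issued at nowMillis would see.
    int query(long nowMillis) {
        boolean changed = indexVersion != visibleVersion;
        boolean due = nowMillis - lastRefreshMillis >= refreshIntervalMillis;
        if (changed && due) {
            visibleVersion = indexVersion;
            lastRefreshMillis = nowMillis;
        }
        return visibleVersion;
    }

    public static void main(String[] args) {
        RefreshingReader reader = new RefreshingReader(1000);
        reader.indexChanged();
        System.out.println(reader.query(100));  // 0: the change is not visible yet
        System.out.println(reader.query(1000)); // 1: interval elapsed, reader refreshed
    }
}
```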

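4.1.3.3. Index writer

The index writer is an internal component that constructs an index composed of one or more segments (sub-indexes) that can be merged over time to improve performance. Fewer segments usually means less overhead during a query because index reader operations need to take all segments into account.

Data Grid uses Apache Lucene internally and indexes entries in two tiers: memory and storage. New entries go to the memory index first and then, when a flush happens, to the configured index storage. Periodic commit operations create segments from the previously flushed data and make all the index changes permanent.

Note

The index-writer configuration is optional. The defaults should work for most cases, and custom configurations should be used only to tune performance.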

<distributed-cache name="my-cache">
  <indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
    <index-writer commit-interval="2000"
                  low-level-trace="false"
                  max-buffered-entries="32"
                  queue-count="1"
                  queue-size="10000"
                  ram-buffer-size="400"
                  thread-pool-size="2">
      <index-merge calibrate-by-deletes="true"
                   factor="3"
                   max-entries="2000"
                   min-size="10"
                   max-size="20"/>
    </index-writer>
    <!-- Additional indexing configuration goes here. -->
  </indexing>
</distributed-cache>

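Table 4.1. Index writer configuration attributes
Attribute	Description
`commit-interval`	Amount of time, in milliseconds, between flushes of buffered index changes to the index storage, after which a commit is performed. Because the operation is costly, avoid small values. The default is 1000 ms (1 second).
`max-buffered-entries`	Maximum number of entries that can be buffered in memory before they are flushed to the index storage. Larger values result in faster indexing but use more memory. When used in combination with the `ram-buffer-size` attribute, a flush occurs for whichever event happens first.
`ram-buffer-size`	Maximum amount of memory that can be used for buffering added entries and deletions before they are flushed to the index storage. Larger values result in faster indexing but use more memory. For faster indexing performance, set this attribute instead of `max-buffered-entries`. When used in combination with the `max-buffered-entries` attribute, a flush occurs for whichever event happens first.
`thread-pool-size`	Number of threads that execute write operations to the index.
`queue-count`	Number of internal queues to use for each indexed type. Each queue holds a batch of modifications that is applied to the index, and queues are processed in parallel. Increasing the number of queues increases indexing throughput, but only if the bottleneck is CPU. For optimum results, do not set a value for `queue-count` that is larger than the value of `thread-pool-size`.
`queue-size`	Maximum number of elements that each queue can hold. Increasing the `queue-size` value increases the amount of memory used during indexing operations. Setting a value that is too small can block indexing operations.
`low-level-trace`	Enables low-level trace information for indexing operations. Enabling this attribute substantially degrades performance. You should use this low-level tracing only as a last resort for troubleshooting.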
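
The "whichever event happens first" rule for max-buffered-entries and ram-buffer-size can be sketched with a small model. WriteBuffer is a hypothetical class, not Data Grid or Lucene code. It takes the memory limit in bytes purely for illustration, so check the configuration schema for the unit that ram-buffer-size actually uses.

```java
// Toy model of the in-memory tier of the index writer; illustration only.
class WriteBuffer {
    private final int maxBufferedEntries;
    private final long ramBufferBytes; // assumption: expressed in bytes for this model only
    private int entries;
    private long bytes;
    private int flushes;

    WriteBuffer(int maxBufferedEntries, long ramBufferBytes) {
        this.maxBufferedEntries = maxBufferedEntries;
        this.ramBufferBytes = ramBufferBytes;
    }

    // Buffers one entry and returns true if adding it triggered a flush.
    boolean add(long entryBytes) {
        entries++;
        bytes += entryBytes;
        // Flush as soon as either limit is reached, whichever comes first.
        if (entries >= maxBufferedEntries || bytes >= ramBufferBytes) {
            entries = 0;
            bytes = 0;
            flushes++;
            return true;
        }
        return false;
    }

    int flushes() {
        return flushes;
    }

    public static void main(String[] args) {
        WriteBuffer buffer = new WriteBuffer(3, 1000);
        System.out.println(buffer.add(10));   // false: both limits still have room
        System.out.println(buffer.add(10));   // false
        System.out.println(buffer.add(10));   // true: entry limit reached first
        System.out.println(buffer.add(2000)); // true: memory limit reached first
    }
}
```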

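To configure how Data Grid merges index segments, use the index-merge sub-element.

Table 4.2. Index merge configuration attributes
Attribute	Description
`max-entries`	Maximum number of entries that an index segment can have before merging. Segments with more than this number of entries are not merged. Smaller values perform better on frequently changing indexes. Larger values provide better search performance if the index does not change often.
`factor`	Number of segments that are merged at once. With smaller values, merging happens more often, which uses more resources, but the total number of segments is lower on average, which improves search performance. Larger values (greater than 10) are best for write-heavy scenarios.
`min-size`	Minimum target size of segments, in MB, for background merges. Segments smaller than this size are merged more aggressively. Setting a value that is too large might result in expensive merge operations, even though they are less frequent.
`max-size`	Maximum size of segments, in MB, for background merges. Segments larger than this size are never merged in the background. Setting this to a lower value helps reduce memory requirements and avoids some merge operations at the cost of optimal search speed. This attribute is ignored when an index is forcefully merged, in which case `max-forced-size` applies instead.
`max-forced-size`	Maximum size of segments, in MB, for forced merges. This attribute overrides the `max-size` attribute. Set it to the same value as `max-size` or lower. However, setting the value too low degrades search performance because segments that contain deleted documents are left unmerged.
`calibrate-by-deletes`	Whether the number of deleted entries in an index is taken into account when counting the entries in a segment. Setting `false` leads to more frequent merges caused by `max-entries`, but merges segments with many deleted documents more aggressively, which improves query performance.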
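
One way to read the size and entry limits in the table is as an eligibility check that runs before a segment is considered for merging. The sketch below is a simplified model, not the merge policy that Data Grid or Lucene implements. MergeLimits is a hypothetical class, the max-forced-size value of 15 is an assumed example, and the model assumes that calibrate-by-deletes subtracts deleted entries from the segment's entry count.

```java
// Simplified model of the index-merge limits; illustration only.
class MergeLimits {
    private final long maxEntries;
    private final double maxSizeMb;
    private final double maxForcedSizeMb;
    private final boolean calibrateByDeletes;

    MergeLimits(long maxEntries, double maxSizeMb, double maxForcedSizeMb, boolean calibrateByDeletes) {
        this.maxEntries = maxEntries;
        this.maxSizeMb = maxSizeMb;
        this.maxForcedSizeMb = maxForcedSizeMb;
        this.calibrateByDeletes = calibrateByDeletes;
    }

    // Returns true if a segment may take part in a background or forced merge.
    boolean eligible(long entries, long deletedEntries, double sizeMb, boolean forcedMerge) {
        // Assumption: with calibrate-by-deletes, deleted entries do not count toward max-entries.
        long counted = calibrateByDeletes ? entries - deletedEntries : entries;
        if (counted > maxEntries) {
            return false; // segments above max-entries are not merged
        }
        // Forced merges use max-forced-size in place of max-size.
        double sizeLimit = forcedMerge ? maxForcedSizeMb : maxSizeMb;
        return sizeMb <= sizeLimit;
    }

    public static void main(String[] args) {
        MergeLimits limits = new MergeLimits(2000, 20, 15, true);
        System.out.println(limits.eligible(2500, 600, 10, false)); // true: 1900 live entries
        System.out.println(limits.eligible(100, 0, 18, false));    // true: within max-size
        System.out.println(limits.eligible(100, 0, 18, true));     // false: above max-forced-size
    }
}
```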

Reference

For more information about indexing elements and attributes, see the Data Grid Configuration Schema.

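4.1.4. Rebuilding indexes

Rebuilding an index reconstructs it from the data stored in the cache. You should rebuild indexes when you change things like the definition of indexed types or analyzers. Likewise, you might also need to rebuild indexes if they are deleted for some reason.

Important

Rebuilding indexes can take a long time to complete because the process covers all data in the grid. While the rebuild operation is in progress, queries might return fewer results.

Data Grid logs the following warning message when you rebuild indexes: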

WARN: Rebuilding indexes also affect queries, that can return less results than expected.

Procedure

Rebuild indexes on remote Data Grid servers by calling the reindexCache() method, as follows:
```
remoteCacheManager.administration().reindexCache("MyCache");
```

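Rebuild indexes when using Data Grid as an embedded library, as follows:

Indexer indexer = Search.getIndexer(cache);
// run() starts the rebuild and returns a stage that completes when the rebuild finishes.
CompletionStage<Void> future = indexer.run();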
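
Because the rebuild is exposed as a CompletionStage, callers usually need to decide how to wait for it and how to react to failures. The following sketch uses only JDK classes. startRebuild() is a hypothetical stand-in for the indexer call above, and ReindexWaiter is an illustrative name, so treat this as one possible pattern, not the documented approach.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;

// Illustration of handling an asynchronous rebuild; not Data Grid code.
class ReindexWaiter {

    // Hypothetical stand-in for the indexer call: completes asynchronously.
    static CompletionStage<Void> startRebuild() {
        return CompletableFuture.runAsync(() -> {
            // A real rebuild would process every entry in the cache here.
        });
    }

    // Blocks until the stage completes and reports whether it succeeded.
    static boolean awaitRebuild(CompletionStage<Void> stage) {
        try {
            stage.toCompletableFuture().join();
            return true;
        } catch (CompletionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Attach a callback for logging, then wait on the resulting stage.
        CompletionStage<Void> reported = startRebuild().whenComplete((ignored, error) ->
                System.out.println(error == null ? "rebuild finished" : "rebuild failed: " + error));
        System.out.println("success = " + awaitRebuild(reported));
    }
}
```

Blocking with join() is convenient in tests and command-line tools. In request-handling code, chaining further stages avoids tying up a thread for the duration of the rebuild.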
