Este conteúdo não está disponível no idioma selecionado.

Chapter 6. Configuring persistent storage


Data Grid uses cache stores and loaders to interact with persistent storage.

Durability
Adding cache stores allows you to persist data to non-volatile storage so it survives restarts.
Write-through caching
Configuring Data Grid as a caching layer in front of persistent storage simplifies data access for applications because Data Grid handles all interactions with the external storage.
Data overflow
Using eviction and passivation techniques ensures that Data Grid keeps only frequently used data in-memory and writes older entries to persistent storage.

6.1. Segmented cache stores

Cache stores can organize data into hash space segments to which keys map. Stores are segmented by default.

Segmented stores increase read performance for bulk operations; for example, streaming over data (Cache.size, Cache.entrySet.stream), pre-loading the cache, and doing state transfer operations.

Important

If you change the numSegments parameter in the configuration after you add a segmented cache store, Data Grid cannot read data from that cache store.

6.2. Shared cache stores

Data Grid cache stores can be local to a given node or shared across all nodes in the cluster. By default, cache stores are local (shared="false").

  • Local cache stores are unique to each node; for example, a file-based cache store that persists data to the host filesystem.

    Local cache stores should use "purge on startup" to avoid loading stale entries from persistent storage.

  • Shared cache stores allow multiple nodes to use the same persistent storage; for example, a JDBC cache store that allows multiple nodes to access the same database.

    Shared cache stores ensure that only the primary owner write to persistent storage, instead of backup nodes performing write operations for every modification.

Important

Purging deletes data, which is not typically the desired behavior for persistent storage.

Local cache store

<persistence>
  <store shared="false"
         purge="true"/>
</persistence>
Copy to Clipboard Toggle word wrap

Shared cache store

<persistence>
  <store shared="true"
         purge="false"/>
</persistence>
Copy to Clipboard Toggle word wrap

6.3. Transactions with persistent cache stores

Data Grid supports transactional operations with JDBC-based cache stores only. To configure caches as transactional, you set transactional=true to keep data in persistent storage synchronized with data in memory.

For all other cache stores, Data Grid does not enlist cache loaders in transactional operations. This can result in data inconsistency if transactions succeed in modifying data in memory but do not completely apply changes to data in the cache store. In these cases manual recovery is not possible with cache stores.

6.4. Write-through cache stores

Write-through is a cache writing mode where writes to memory and writes to cache stores are synchronous. When a client application updates a cache entry, in most cases by invoking Cache.put(), Data Grid does not return the call until it updates the cache store. This cache writing mode results in updates to the cache store concluding within the boundaries of the client thread.

The primary advantage of write-through mode is that the cache and cache store are updated simultaneously, which ensures that the cache store is always consistent with the cache.

However, write-through mode can potentially decrease performance because the need to access and update cache stores directly adds latency to cache operations.

Write-through configuration

Data Grid uses write-through mode unless you explicitly add write-behind configuration to your caches. There is no separate element or method for configuring write-through mode.

For example, the following configuration adds a file-based store to the cache that implicitly uses write-through mode:

<distributed-cache>
  <persistence passivation="false">
    <file-store>
      <index path="path/to/index" />
      <data path="path/to/data" />
    </file-store>
  </persistence>
</distributed-cache>
Copy to Clipboard Toggle word wrap

6.5. Write-behind cache stores

Write-behind is a cache writing mode where writes to memory are synchronous and writes to cache stores are asynchronous.

When clients send write requests, Data Grid adds those operations to a modification queue. Data Grid processes operations as they join the queue so that the calling thread is not blocked and the operation completes immediately.

If the number of write operations in the modification queue increases beyond the size of the queue, Data Grid adds those additional operations to the queue. However, those operations do not complete until Data Grid processes operations that are already in the queue.

For example, calling Cache.putAsync returns immediately and the Stage also completes immediately if the modification queue is not full. If the modification queue is full, or if Data Grid is currently processing a batch of write operations, then Cache.putAsync returns immediately and the Stage completes later.

Write-behind mode provides a performance advantage over write-through mode because cache operations do not need to wait for updates to the underlying cache store to complete. However, data in the cache store remains inconsistent with data in the cache until the modification queue is processed. For this reason, write-behind mode is suitable for cache stores with low latency, such as unshared and local file-based cache stores, where the time between the write to the cache and the write to the cache store is as small as possible.

Write-behind configuration

XML

<distributed-cache>
  <persistence>
    <table-jdbc-store xmlns="urn:infinispan:config:store:sql:16.0"
                      dialect="H2"
                      shared="true"
                      table-name="books">
      <connection-pool connection-url="jdbc:h2:mem:infinispan"
                       username="sa"
                       password="changeme"
                       driver="org.h2.Driver"/>
      <write-behind modification-queue-size="2048"
                    fail-silently="true"/>
    </table-jdbc-store>
  </persistence>
</distributed-cache>
Copy to Clipboard Toggle word wrap

JSON

{
  "distributed-cache": {
    "persistence" : {
      "table-jdbc-store": {
        "dialect": "H2",
        "shared": "true",
        "table-name": "books",
        "connection-pool": {
          "connection-url": "jdbc:h2:mem:infinispan",
          "driver": "org.h2.Driver",
          "username": "sa",
          "password": "changeme"
        },
        "write-behind" : {
          "modification-queue-size" : "2048",
          "fail-silently" : true
        }
      }
    }
  }
}
Copy to Clipboard Toggle word wrap

YAML

distributedCache:
  persistence:
    tableJdbcStore:
      dialect: "H2"
      shared: "true"
      tableName: "books"
      connectionPool:
        connectionUrl: "jdbc:h2:mem:infinispan"
        driver: "org.h2.Driver"
        username: "sa"
        password: "changeme"
      writeBehind:
        modificationQueueSize: "2048"
        failSilently: "true"
Copy to Clipboard Toggle word wrap

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.persistence()
       .async()
       .modificationQueueSize(2048)
       .failSilently(true);
Copy to Clipboard Toggle word wrap

Failing silently

Write-behind configuration includes a fail-silently parameter that controls what happens when either the cache store is unavailable or the modification queue is full.

  • If fail-silently="true" then Data Grid logs WARN messages and rejects write operations.
  • If fail-silently="false" then Data Grid throws exceptions if it detects the cache store is unavailable during a write operation. Likewise if the modification queue becomes full, Data Grid throws an exception.

    In some cases, data loss can occur if Data Grid restarts and write operations exist in the modification queue. For example the cache store goes offline but, during the time it takes to detect that the cache store is unavailable, write operations are added to the modification queue because it is not full. If Data Grid restarts or otherwise becomes unavailable before the cache store comes back online, then the write operations in the modification queue are lost because they were not persisted.

6.6. Passivation

Passivation configures Data Grid to write entries to cache stores when it evicts those entries from memory. In this way, passivation prevents unnecessary and potentially expensive writes to persistent storage.

Activation is the process of restoring entries to memory from the cache store when there is an attempt to access passivated entries. For this reason, when you enable passivation, you must configure a cache store that implements the NonBlockingStore interface. The store’s characteristics() method must also indicate that it supports both read and write operations.

When Data Grid evicts an entry from the cache, it notifies cache listeners that the entry is passivated then stores the entry in the cache store. When Data Grid gets an access request for an evicted entry, it lazily loads the entry from the cache store into memory and then notifies cache listeners that the entry is activated while keeping the value still in the store.

Note
  • Passivation uses the first cache loader in the Data Grid configuration and ignores all others.
  • Passivation is not supported with:

    • Transactional stores. Passivation writes and removes entries from the store outside the scope of the actual Data Grid commit boundaries.
    • Shared stores. Shared cache stores require entries to always exist in the store for other owners. For this reason, passivation is not supported because entries cannot be removed.

If you enable passivation with transactional stores or shared stores, Data Grid throws an exception.

6.6.1. How passivation works

Passivation disabled

Writes to data in memory result in writes to persistent storage.

If Data Grid evicts data from memory, then data in persistent storage includes entries that are evicted from memory. In this way persistent storage is a superset of the in-memory cache. This is recommended when you require highest consistency as the store will be able to be read again after a crash.

If you do not configure eviction, then data in persistent storage provides a copy of data in memory.

Passivation enabled

Data Grid adds data to persistent storage only when it evicts data from memory, an entry is removed or upon shutting down the node.

When Data Grid activates entries, it restores data in memory but keeps the data in the store still. This allows for writes to be just as fast as without a store, and still maintains consistency. When an entry is created or updated only the in memory will be updated and thus the store will be outdated for the time being.

Note

Passivation is not supported when a store is also configured as shared. This is due to entries can become out of sync between nodes depending on when a write is evicted versus read.

To gurantee data consistency any store that is not shared should always have purgeOnStartup enabled. This is true for both passivation enabled or disabled since a store could hold an outdated entry while down and resurrect it at a later point.

The following table shows data in memory and in persistent storage after a series of operations:

Expand
OperationPassivation disabledPassivation enabled

Insert k1.

Memory: k1
Disk: k1

Memory: k1
Disk: -

Insert k2.

Memory: k1, k2
Disk: k1, k2

Memory: k1, k2
Disk: -

Eviction thread runs and evicts k1.

Memory: k2
Disk: k1, k2

Memory: k2
Disk: k1

Read k1.

Memory: k1, k2
Disk: k1, k2

Memory: k1, k2
Disk: k1

Eviction thread runs and evicts k2.

Memory: k1
Disk: k1, k2

Memory: k1
Disk: k1, k2

Remove k2.

Memory: k1
Disk: k1

Memory: k1
Disk: k1

6.7. Global persistent location

Data Grid preserves global state so that it can restore cluster topology and cached data after restart.

Data Grid uses file locking to prevent concurrent access to the global persistent location. The lock is acquired on startup and released on a node shutdown. The presence of a dangling lock file indicates that the node was not shutdown cleanly, either because of a crash or external termination. In the default configuration, Data Grid will refuse to start up to avoid data corruption with the following message:

ISPN000693: Dangling lock file '%s' in persistent global state, probably left behind by an unclean shutdown
Copy to Clipboard Toggle word wrap

The behavior can be changed by configuring the global state unclean-shutdown-action setting to one of the following:

  • FAIL: Prevents startup of the cache manager if a dangling lock file is found in the persistent global state. This is the default behavior.
  • PURGE: Clears the persistent global state if a dangling lock file is found in the persistent global state.
  • IGNORE: Ignores the presence of a dangling lock file in the persistent global state.

Remote caches

Data Grid Server saves cluster state to the $RHDG_HOME/server/data directory.

Important

You should never delete or modify the server/data directory or its content. Data Grid restores cluster state from this directory when you restart your server instances.

Changing the default configuration or directly modifying the server/data directory can cause unexpected behavior and lead to data loss.

Embedded caches

Data Grid defaults to the user.dir system property as the global persistent location. In most cases this is the directory where your application starts.

For clustered embedded caches, such as replicated or distributed, you should always enable and configure a global persistent location to restore cluster topology.

When using a file-based cache store, you should always configure a global persistent location. You should never configure an absolute path for a file-based cache store that is outside this location. For more details, see Configuring the Global Persistent Location. If you do, Data Grid writes the following exception to logs:

ISPN000558: "The store location 'foo' is not a child of the global persistent location 'bar'"
Copy to Clipboard Toggle word wrap

6.7.1. Configuring the global persistent location

Enable and configure the location where Data Grid stores global state for clustered embedded caches.

Note

Data Grid Server enables global persistence and configures a default location. You should not disable global persistence or change the default configuration for remote caches.

Prerequisites

  • Add Data Grid to your project.

Procedure

  1. Enable global state in one of the following ways:

    • Add the global-state element to your Data Grid configuration.
    • Call the globalState().enable() methods in the GlobalConfigurationBuilder API.
  2. Define whether the global persistent location is unique to each node or shared between the cluster.

    Expand
    Location typeConfiguration

    Unique to each node

    persistent-location element or persistentLocation() method

    Shared between the cluster

    shared-persistent-location element or sharedPersistentLocation(String) method

  3. Set the path where Data Grid stores cluster state.

    For example, file-based cache stores the path is a directory on the host filesystem.

    Values can be:

    • Absolute and contain the full location including the root.
    • Relative to a root location.
  4. If you specify a relative value for the path, you must also specify a system property that resolves to a root location.

    For example, on a Linux host system you set global/state as the path. You also set the my.data property that resolves to the /opt/data root location. In this case Data Grid uses /opt/data/global/state as the global persistent location.

Global persistent location configuration

XML

<infinispan>
  <cache-container>
    <global-state>
      <persistent-location path="global/state" relative-to="my.data"/>
    </global-state>
  </cache-container>
</infinispan>
Copy to Clipboard Toggle word wrap

JSON

{
  "infinispan" : {
    "cache-container" : {
      "global-state": {
        "persistent-location" : {
          "path" : "global/state",
          "relative-to" : "my.data"
        }
      }
    }
  }
}
Copy to Clipboard Toggle word wrap

YAML

cacheContainer:
  globalState:
      persistentLocation:
        path: "global/state"
        relativeTo : "my.data"
Copy to Clipboard Toggle word wrap

GlobalConfigurationBuilder

new GlobalConfigurationBuilder().globalState()
                                .enable()
                                .persistentLocation("global/state", "my.data");
Copy to Clipboard Toggle word wrap

6.8. File-based cache stores

File-based cache stores provide persistent storage on the local host filesystem where Data Grid is running. For clustered caches, file-based cache stores are unique to each Data Grid node.

Warning

Never use filesystem-based cache stores on shared file systems, such as an NFS or Samba share, because they do not provide file locking capabilities and data corruption can occur.

Additionally if you attempt to use transactional caches with shared file systems, unrecoverable failures can happen when writing to files during the commit phase.

Soft-Index File Stores

SoftIndexFileStore is the default implementation for file-based cache stores and stores data in a set of append-only files.

When append-only files:

  • Reach their maximum size, Data Grid creates a new file and starts writing to it.
  • Reach the compaction threshold of less than 50% usage, Data Grid overwrites the entries to a new file and then deletes the old file.
Note

Using SoftIndexFileStore in a clustered cache should enable purge on startup to ensure stale entries are not resurrected.

B+ trees

To improve performance, append-only files in a SoftIndexFileStore are indexed using a B+ Tree that can be stored both on disk and in memory. The in-memory index uses Java soft references to ensure it can be rebuilt if removed by Garbage Collection (GC) then requested again.

Because SoftIndexFileStore uses Java soft references to keep indexes in memory, it helps prevent out-of-memory exceptions. GC removes indexes before they consume too much memory while still falling back to disk.

SoftIndexFileStore creates a B+ tree per configured cache segment. This provides an additional "index" as it only has so many elements and provides additional parallelism for index updates.

Each entry in the B+ tree is a node. By default, the size of each node is limited to 4096 bytes. SoftIndexFileStore throws an exception if keys are longer after serialization occurs.

File limits

SoftIndexFileStore will use two plus the configured openFilesLimit amount of files at a given time. The two additional file pointers are reserved for the log appender for newly updated data and another for the compactor which writes compacted entries into a new file.

The amount of open allocated files allocated for indexing is one tenth of the total number of the configured openFilesLimit. This number has a minimum of 1 or the number of cache segments. Any number remaning from configured limit is allocated for open data files themselves.

Segmentation

Soft-index file stores by default are segmented. The append log(s) are not directly segmented and segmentation is handled directly by the index. It is possible to disable segmentation, which will effectively change the store to store a single index for all contents of the cache as if the number of cache segments was set to 1.

Note that this store can be segmented even when not running in a clustered cache such as distributed. In that case the clustering segments configuration value is still used to configure how many segments are to be used.

Expiration

The SoftIndexFileStore has full support for expired entries and their requirements.

6.8.1. Configuring file-based cache stores

Add file-based cache stores to Data Grid to persist data on the host filesystem.

Prerequisites

  • Enable global state and configure a global persistent location if you are configuring embedded caches.

Procedure

  1. Add the persistence element to your cache configuration.
  2. Optionally specify true as the value for the passivation attribute to write to the file-based cache store only when data is evicted from memory.
  3. Include the file-store element and configure attributes as appropriate.
  4. Specify false as the value for the shared attribute.

    File-based cache stores should always be unique to each Data Grid instance. If you want to use the same persistent across a cluster, configure shared storage such as a JDBC string-based cache store .

  5. Configure the index and data elements to specify the location where Data Grid creates indexes and stores data.
  6. Include the write-behind element if you want to configure the cache store with write-behind mode.
File-based cache store configuration

XML

<distributed-cache>
  <persistence passivation="true">
     <file-store shared="false">
        <data path="data"/>
        <index path="index"/>
        <write-behind modification-queue-size="2048" />
     </file-store>
  </persistence>
</distributed-cache>
Copy to Clipboard Toggle word wrap

JSON

{
  "distributed-cache": {
    "persistence": {
      "passivation": true,
      "file-store" : {
        "shared": false,
        "data": {
          "path": "data"
        },
        "index": {
          "path": "index"
        },
        "write-behind": {
          "modification-queue-size": "2048"
        }
      }
    }
  }
}
Copy to Clipboard Toggle word wrap

YAML

distributedCache:
  persistence:
    passivation: "true"
    fileStore:
      shared: "false"
      data:
        path: "data"
      index:
        path: "index"
      writeBehind:
        modificationQueueSize: "2048"
Copy to Clipboard Toggle word wrap

ConfigurationBuilder

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.persistence().passivation(true)
       .addSoftIndexFileStore()
          .shared(false)
          .dataLocation("data")
          .indexLocation("index")
          .modificationQueueSize(2048);
Copy to Clipboard Toggle word wrap

ò :leveloffset: +1

Red Hat logoGithubredditYoutubeTwitter

Aprender

Experimente, compre e venda

Comunidades

Sobre a documentação da Red Hat

Ajudamos os usuários da Red Hat a inovar e atingir seus objetivos com nossos produtos e serviços com conteúdo em que podem confiar. Explore nossas atualizações recentes.

Tornando o open source mais inclusivo

A Red Hat está comprometida em substituir a linguagem problemática em nosso código, documentação e propriedades da web. Para mais detalhes veja o Blog da Red Hat.

Sobre a Red Hat

Fornecemos soluções robustas que facilitam o trabalho das empresas em plataformas e ambientes, desde o data center principal até a borda da rede.

Theme

© 2026 Red Hat
Voltar ao topo