Chapter 6. Configuring persistent storage
Data Grid uses cache stores and loaders to interact with persistent storage.
- Durability
- Adding cache stores allows you to persist data to non-volatile storage so it survives restarts.
- Write-through caching
- Configuring Data Grid as a caching layer in front of persistent storage simplifies data access for applications because Data Grid handles all interactions with the external storage.
- Data overflow
- Using eviction and passivation techniques ensures that Data Grid keeps only frequently used data in-memory and writes older entries to persistent storage.
6.1. Segmented cache stores
Cache stores can organize data into hash space segments to which keys map. Stores are segmented by default.
Segmented stores increase read performance for bulk operations; for example, streaming over data (Cache.size, Cache.entrySet.stream), pre-loading the cache, and doing state transfer operations.
If you change the numSegments parameter in the configuration after you add a segmented cache store, Data Grid cannot read data from that cache store.
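As a hedged sketch (the cache name and segment count are illustrative), the number of segments is set on the cache, and stores are segmented by default:

```xml
<distributed-cache name="mycache" segments="256">
    <persistence>
        <!-- segmented="true" is the default; keys map to the cache's 256 hash-space segments -->
        <file-store segmented="true"/>
    </persistence>
</distributed-cache>
```

Because a segmented store lays data out per segment, changing the segment count after data is written renders the store unreadable.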
6.3. Transactions with persistent cache stores
Data Grid supports transactional operations with JDBC-based cache stores only. To configure caches as transactional, you set transactional=true to keep data in persistent storage synchronized with data in memory.
For all other cache stores, Data Grid does not enlist cache loaders in transactional operations. This can result in data inconsistency if transactions succeed in modifying data in memory but do not completely apply changes to data in the cache store. In these cases manual recovery is not possible with cache stores.
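As a sketch of a transactional JDBC-based store (the JDBC store schema namespace is omitted for brevity; the data source, table prefix, and column definitions are illustrative and must match your environment):

```xml
<local-cache name="transactional-cache">
    <!-- Make the cache transactional -->
    <transaction mode="NON_XA"/>
    <persistence>
        <!-- transactional="true" enlists the store in cache transactions,
             keeping persistent storage synchronized with data in memory -->
        <string-keyed-jdbc-store transactional="true">
            <data-source jndi-url="jdbc/datasource"/>
            <string-keyed-table prefix="ISPN">
                <id-column name="ID" type="VARCHAR(255)"/>
                <data-column name="DATA" type="BYTEA"/>
                <timestamp-column name="TS" type="BIGINT"/>
            </string-keyed-table>
        </string-keyed-jdbc-store>
    </persistence>
</local-cache>
```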
6.4. Write-through cache stores
Write-through is a cache writing mode where writes to memory and writes to cache stores are synchronous. When a client application updates a cache entry, in most cases by invoking Cache.put(), Data Grid does not return the call until it updates the cache store. This cache writing mode results in updates to the cache store concluding within the boundaries of the client thread.
The primary advantage of write-through mode is that the cache and cache store are updated simultaneously, which ensures that the cache store is always consistent with the cache.
However, write-through mode can potentially decrease performance because the need to access and update cache stores directly adds latency to cache operations.
Write-through configuration
Data Grid uses write-through mode unless you explicitly add write-behind configuration to your caches. There is no separate element or method for configuring write-through mode.
For example, the following configuration adds a file-based store to the cache that implicitly uses write-through mode:
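A minimal sketch of such a configuration (the cache name is illustrative):

```xml
<distributed-cache name="mycache">
    <persistence passivation="false">
        <!-- No write-behind element, so the store implicitly uses write-through mode -->
        <file-store/>
    </persistence>
</distributed-cache>
```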
6.5. Write-behind cache stores
Write-behind is a cache writing mode where writes to memory are synchronous and writes to cache stores are asynchronous.
When clients send write requests, Data Grid adds those operations to a modification queue. Data Grid processes operations as they join the queue so that the calling thread is not blocked and the operation completes immediately.
If the number of write operations exceeds the size of the modification queue, Data Grid still adds the additional operations to the queue. However, those operations do not complete until Data Grid processes operations that are already in the queue.
For example, calling Cache.putAsync returns immediately and the Stage also completes immediately if the modification queue is not full. If the modification queue is full, or if Data Grid is currently processing a batch of write operations, then Cache.putAsync returns immediately and the Stage completes later.
Write-behind mode provides a performance advantage over write-through mode because cache operations do not need to wait for updates to the underlying cache store to complete. However, data in the cache store remains inconsistent with data in the cache until the modification queue is processed. For this reason, write-behind mode is suitable for cache stores with low latency, such as unshared and local file-based cache stores, where the time between the write to the cache and the write to the cache store is as small as possible.
Write-behind configuration
ConfigurationBuilder
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.persistence()
.async()
.modificationQueueSize(2048)
.failSilently(true);
Failing silently
Write-behind configuration includes a fail-silently parameter that controls what happens when either the cache store is unavailable or the modification queue is full.

- If fail-silently="true", Data Grid logs WARN messages and rejects write operations.
- If fail-silently="false", Data Grid throws exceptions if it detects that the cache store is unavailable during a write operation. Likewise, if the modification queue becomes full, Data Grid throws an exception.

In some cases, data loss can occur if Data Grid restarts while write operations exist in the modification queue. For example, the cache store goes offline but, during the time it takes to detect that the cache store is unavailable, write operations are added to the modification queue because it is not full. If Data Grid restarts or otherwise becomes unavailable before the cache store comes back online, the write operations in the modification queue are lost because they were not persisted.
6.6. Passivation
Passivation configures Data Grid to write entries to cache stores when it evicts those entries from memory. In this way, passivation prevents unnecessary and potentially expensive writes to persistent storage.
Activation is the process of restoring entries to memory from the cache store when there is an attempt to access passivated entries. For this reason, when you enable passivation, you must configure a cache store that implements the NonBlockingStore interface. The store’s characteristics() method must also indicate that it supports both read and write operations.
When Data Grid evicts an entry from the cache, it notifies cache listeners that the entry is passivated then stores the entry in the cache store. When Data Grid gets an access request for an evicted entry, it lazily loads the entry from the cache store into memory and then notifies cache listeners that the entry is activated while keeping the value still in the store.
- Passivation uses the first cache loader in the Data Grid configuration and ignores all others.
Passivation is not supported with:
- Transactional stores. Passivation writes and removes entries from the store outside the scope of the actual Data Grid commit boundaries.
- Shared stores. Shared cache stores require entries to always exist in the store for other owners. For this reason, passivation is not supported because entries cannot be removed.
If you enable passivation with transactional stores or shared stores, Data Grid throws an exception.
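As a sketch, passivation pairs a non-shared store with eviction (the cache name and eviction threshold are illustrative):

```xml
<distributed-cache name="mycache">
    <!-- Evict entries from memory once the cache exceeds 1000 entries -->
    <memory max-count="1000"/>
    <persistence passivation="true">
        <!-- Non-shared store; purge="true" clears potentially stale entries on startup -->
        <file-store shared="false" purge="true"/>
    </persistence>
</distributed-cache>
```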
6.6.1. How passivation works
Passivation disabled
Writes to data in memory result in writes to persistent storage.
If Data Grid evicts data from memory, then data in persistent storage includes entries that are evicted from memory. In this way persistent storage is a superset of the in-memory cache. This configuration is recommended when you require the highest consistency, because the store can be read again after a crash.
If you do not configure eviction, then data in persistent storage provides a copy of data in memory.
Passivation enabled
Data Grid adds data to persistent storage only when it evicts data from memory, when an entry is removed, or upon node shutdown.

When Data Grid activates entries, it restores data in memory but keeps the data in the store. This allows writes to be as fast as without a store while still maintaining consistency. When an entry is created or updated, only the in-memory copy is updated, so the store is outdated for the time being.

Passivation is not supported when a store is also configured as shared, because entries can become out of sync between nodes depending on when a write is evicted versus read.

To guarantee data consistency, any store that is not shared should always have purgeOnStartup enabled. This is true whether passivation is enabled or disabled, since a store could hold an outdated entry while the node is down and resurrect it at a later point.
The following table shows data in memory and in persistent storage after a series of operations:
| Operation | Passivation disabled | Passivation enabled |
|---|---|---|
| Insert k1. | Memory: k1; Disk: k1 | Memory: k1; Disk: - |
| Insert k2. | Memory: k1, k2; Disk: k1, k2 | Memory: k1, k2; Disk: - |
| Eviction thread runs and evicts k1. | Memory: k2; Disk: k1, k2 | Memory: k2; Disk: k1 |
| Read k1. | Memory: k1, k2; Disk: k1, k2 | Memory: k1, k2; Disk: k1 |
| Eviction thread runs and evicts k2. | Memory: k1; Disk: k1, k2 | Memory: k1; Disk: k1, k2 |
| Remove k2. | Memory: k1; Disk: k1 | Memory: k1; Disk: k1 |
6.7. Global persistent location
Data Grid preserves global state so that it can restore cluster topology and cached data after restart.
Data Grid uses file locking to prevent concurrent access to the global persistent location. The lock is acquired on startup and released on clean node shutdown. A dangling lock file indicates that the node was not shut down cleanly, because of either a crash or external termination. In the default configuration, Data Grid refuses to start up to avoid data corruption, and logs the following message:
ISPN000693: Dangling lock file '%s' in persistent global state, probably left behind by an unclean shutdown
The behavior can be changed by configuring the global state unclean-shutdown-action setting to one of the following:

- FAIL: Prevents startup of the cache manager if a dangling lock file is found in the persistent global state. This is the default behavior.
- PURGE: Clears the persistent global state if a dangling lock file is found in the persistent global state.
- IGNORE: Ignores the presence of a dangling lock file in the persistent global state.
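For example, a hedged sketch that purges global state instead of failing on startup (the path is illustrative):

```xml
<infinispan>
    <cache-container>
        <!-- PURGE clears the persistent global state when a dangling lock file is found -->
        <global-state unclean-shutdown-action="PURGE">
            <persistent-location path="global/state"/>
        </global-state>
    </cache-container>
</infinispan>
```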
Remote caches
Data Grid Server saves cluster state to the $RHDG_HOME/server/data directory.
You should never delete or modify the server/data directory or its content. Data Grid restores cluster state from this directory when you restart your server instances.
Changing the default configuration or directly modifying the server/data directory can cause unexpected behavior and lead to data loss.
Embedded caches
Data Grid defaults to the user.dir system property as the global persistent location. In most cases this is the directory where your application starts.
For clustered embedded caches, such as replicated or distributed, you should always enable and configure a global persistent location to restore cluster topology.
When using a file-based cache store, you should always configure a global persistent location. You should never configure an absolute path for a file-based cache store that is outside this location; if you do, Data Grid writes the following exception to logs. For more details, see Configuring the global persistent location.
ISPN000558: "The store location 'foo' is not a child of the global persistent location 'bar'"
6.7.1. Configuring the global persistent location
Enable and configure the location where Data Grid stores global state for clustered embedded caches.
Data Grid Server enables global persistence and configures a default location. You should not disable global persistence or change the default configuration for remote caches.
Prerequisites
- Add Data Grid to your project.
Procedure
1. Enable global state in one of the following ways:
   - Add the global-state element to your Data Grid configuration.
   - Call the globalState().enable() methods in the GlobalConfigurationBuilder API.
2. Define whether the global persistent location is unique to each node or shared between the cluster.

   | Location type | Configuration |
   |---|---|
   | Unique to each node | persistent-location element or persistentLocation() method |
   | Shared between the cluster | shared-persistent-location element or sharedPersistentLocation(String) method |

3. Set the path where Data Grid stores cluster state.
For example, for file-based cache stores the path is a directory on the host filesystem.
Values can be:
- Absolute and contain the full location including the root.
- Relative to a root location.
If you specify a relative value for the path, you must also specify a system property that resolves to a root location.
For example, on a Linux host system you set global/state as the path. You also set the my.data property that resolves to the /opt/data root location. In this case Data Grid uses /opt/data/global/state as the global persistent location.
Global persistent location configuration
YAML
cacheContainer:
  globalState:
    persistentLocation:
      path: "global/state"
      relativeTo: "my.data"
GlobalConfigurationBuilder
new GlobalConfigurationBuilder().globalState()
.enable()
.persistentLocation("global/state", "my.data");
6.8. File-based cache stores
File-based cache stores provide persistent storage on the local host filesystem where Data Grid is running. For clustered caches, file-based cache stores are unique to each Data Grid node.
Never use filesystem-based cache stores on shared file systems, such as an NFS or Samba share, because they do not provide file locking capabilities and data corruption can occur.
Additionally if you attempt to use transactional caches with shared file systems, unrecoverable failures can happen when writing to files during the commit phase.
Soft-Index File Stores
SoftIndexFileStore is the default implementation for file-based cache stores and stores data in a set of append-only files.
When append-only files:
- Reach their maximum size, Data Grid creates a new file and starts writing to it.
- Reach the compaction threshold of less than 50% usage, Data Grid overwrites the entries to a new file and then deletes the old file.
When you use SoftIndexFileStore in a clustered cache, you should enable purge on startup to ensure that stale entries are not resurrected.
B+ trees
To improve performance, append-only files in a SoftIndexFileStore are indexed using a B+ Tree that can be stored both on disk and in memory. The in-memory index uses Java soft references to ensure it can be rebuilt if removed by Garbage Collection (GC) then requested again.
Because SoftIndexFileStore uses Java soft references to keep indexes in memory, it helps prevent out-of-memory exceptions. GC removes indexes before they consume too much memory while still falling back to disk.
SoftIndexFileStore creates a B+ tree per configured cache segment. Each per-segment tree holds only a fraction of the entries, and the separate trees allow index updates to proceed in parallel.
Each entry in the B+ tree is a node. By default, the size of each node is limited to 4096 bytes. SoftIndexFileStore throws an exception if a key is longer than this limit after serialization.
File limits
SoftIndexFileStore uses up to two more files than the configured openFilesLimit at any given time. The two additional file pointers are reserved for the log appender, which writes newly updated data, and for the compactor, which writes compacted entries into a new file.

The number of open files allocated for indexing is one tenth of the configured openFilesLimit, with a minimum of 1 or the number of cache segments. Any number remaining from the configured limit is allocated for the open data files themselves.
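The limit itself is set on the store; a hedged sketch (the value is illustrative):

```xml
<persistence>
    <!-- At most 1000 open data and index files, plus the two reserved file pointers -->
    <file-store open-files-limit="1000"/>
</persistence>
```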
Segmentation
Soft-index file stores are segmented by default. The append logs are not directly segmented; segmentation is handled by the index. It is possible to disable segmentation, which effectively changes the store to keep a single index for all contents of the cache, as if the number of cache segments were set to 1.

Note that this store can be segmented even when the cache is not clustered. In that case, the clustering segments configuration value is still used to determine how many segments the store uses.
Expiration
The SoftIndexFileStore has full support for expired entries and their requirements.
6.8.1. Configuring file-based cache stores
Add file-based cache stores to Data Grid to persist data on the host filesystem.
Prerequisites
- Enable global state and configure a global persistent location if you are configuring embedded caches.
Procedure
1. Add the persistence element to your cache configuration.
2. Optionally specify true as the value for the passivation attribute to write to the file-based cache store only when data is evicted from memory.
3. Include the file-store element and configure attributes as appropriate.
4. Specify false as the value for the shared attribute.

   File-based cache stores should always be unique to each Data Grid instance. If you want to use the same persistent data across a cluster, configure shared storage such as a JDBC string-based cache store.

5. Configure the index and data elements to specify the location where Data Grid creates indexes and stores data.
6. Include the write-behind element if you want to configure the cache store with write-behind mode.
File-based cache store configuration
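A hedged XML sketch that follows the steps above (the cache name, paths, and queue size are illustrative):

```xml
<distributed-cache name="mycache">
    <persistence passivation="true">
        <!-- shared="false": the store is unique to each Data Grid instance -->
        <file-store shared="false">
            <data path="data"/>
            <index path="index"/>
            <!-- Optional: enable write-behind mode -->
            <write-behind modification-queue-size="2048"/>
        </file-store>
    </persistence>
</distributed-cache>
```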