Data Grid Guide to Cross-Site Replication
Back up data across global Data Grid clusters
Abstract
Red Hat Data Grid
Data Grid is a high-performance, distributed in-memory data store.
- Schemaless data structure: flexibility to store different objects as key-value pairs.
- Grid-based data storage: designed to distribute and replicate data across clusters.
- Elastic scaling: dynamically adjust the number of nodes to meet demand without service disruption.
- Data interoperability: store, retrieve, and query data in the grid from different endpoints.
Data Grid documentation
Documentation for Data Grid is available on the Red Hat customer portal.
Data Grid downloads
Access the Data Grid Software Downloads on the Red Hat customer portal.
You must have a Red Hat account to access and download Data Grid software.
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Chapter 1. Data Grid Cross-Site Replication
Cross-site replication allows you to back up data from one Data Grid cluster to another. Learn the concepts to understand how Data Grid cross-site replication works before you configure your clusters.
1.1. Cross-Site Replication
Data Grid clusters running in different locations can discover and communicate with each other.
Sites are typically data centers in various geographic locations. Cross-site replication bridges Data Grid clusters in sites to form global clusters, as in the following diagram:
LON is a datacenter in London, England.
NYC is a datacenter in New York City, USA.
Data Grid can form global clusters across two or more sites.
For example, configure a third Data Grid cluster running in San Francisco, SFO, as a backup location for LON and NYC.
1.1.1. Site Masters
Site masters are the nodes in Data Grid clusters that are responsible for sending and receiving requests from backup locations.
If a node is not a site master, it must forward backup requests to a local site master. Only site masters can send requests to backup locations.
For optimal performance, you should configure all nodes as site masters. This increases the speed of backup requests because each node in the cluster can back up to remote sites directly without having to forward backup requests to site masters.
Diagrams in this document illustrate Data Grid clusters with one site master because this is the default for the JGroups RELAY2 protocol. Likewise, a single site master is easier to illustrate because each site master in a cluster communicates with each site master in the remote cluster.
1.2. Adding Backups to Caches
Name remote sites as backup locations in your cache definitions.
For example, the following diagram shows three caches, "customers", "eu-orders", and "us-orders":
- In LON, "customers" names NYC as a backup location.
- In NYC, "customers" names LON as a backup location.
- "eu-orders" and "us-orders" do not have backups and are local to the respective cluster.
1.3. Backup Strategies
Data Grid clusters can use different strategies for backing up data to remote sites.
Data Grid replicates across sites at the same time that writes to local caches occur. For example, if a client writes "k1" to LON, Data Grid backs up "k1" to NYC at the same time.
1.3.1. Synchronous Backups
When Data Grid replicates data to backup locations, it waits until the operation completes before writing to the local cache.
You can control how Data Grid handles writes to the local cache if backup operations fail. For example, you can configure Data Grid to attempt to abort local writes and throw exceptions if backups to remote sites fail.
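As an illustration, the following configuration sketch assumes the failure-policy attribute of the backup element; the cache name and values here are examples only:

```xml
<replicated-cache name="customers">
  <backups>
    <!-- FAIL aborts the local write and throws an exception
         when the backup to NYC does not succeed. -->
    <backup site="NYC" strategy="SYNC" failure-policy="FAIL"/>
  </backups>
</replicated-cache>
```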
Synchronous backups also support two-phase commits with caches that participate in optimistic transactions. The first phase of the backup acquires a lock. The second phase commits the modification.
Two-phase commit with cross-site replication has a significant performance impact because it requires two round-trips across the network.
1.3.2. Asynchronous Backups
When Data Grid replicates data to backup locations, it does not wait until the operation completes before writing to the local cache.
Asynchronous backup operations and writes to the local cache are independent of each other. If backup operations fail, write operations to the local cache continue and no exceptions occur.
1.3.3. Synchronous vs Asynchronous Backups
Synchronous backups offer the strongest guarantee of data consistency across sites. If strategy=sync, when cache.put() calls return you know the value is up to date in the local cache and in the backup locations.
The trade-off for this consistency is performance. Synchronous backups have much greater latency in comparison to asynchronous backups.
Asynchronous backups, on the other hand, do not add latency to client requests so they have no performance impact. However, if strategy=async, when cache.put() calls return you cannot be sure that the value in the backup locations is the same as in the local cache.
1.4. Automatically Taking Backups Offline
You can configure backup locations to go offline automatically when the remote sites become unavailable. This prevents Data Grid nodes from continuously attempting to replicate data to offline backup locations, which results in error messages and consumes resources.
Timeout for backup operations
Backup configurations include timeout values for operations to replicate data. If operations do not complete before the timeout occurs, Data Grid records them as failures.
<backup site="NYC" strategy="ASYNC" timeout="10000">
...
</backup>
1. Operations to replicate data to NYC are recorded as failures if they do not complete within 10 seconds.
Number of failures
You can specify the number of consecutive failures that can occur before backup locations go offline.
For example, the following configuration for NYC sets five as the number of failed operations before it goes offline:
<backup site="NYC" strategy="ASYNC" timeout="10000">
<take-offline after-failures="5"/>
</backup>
1. If a cluster attempts to replicate data to NYC and five consecutive operations fail, Data Grid automatically takes the backup offline.
Time to wait
You can also specify how long to wait before taking sites offline when backup operations fail. If a backup request succeeds before the wait time runs out, Data Grid does not take the site offline.
<backup site="NYC" strategy="ASYNC" timeout="10000">
<take-offline after-failures="5"
min-wait="15000"/>
</backup>
1. If a cluster attempts to replicate data to NYC, five consecutive operations fail, and 15 seconds elapse after the first failed operation, Data Grid automatically takes the backup offline.
Set a negative or zero value for the after-failures attribute if you want to use only a minimum time to wait to take sites offline.
<take-offline after-failures="-1" min-wait="10000"/>
1.5. State Transfer
State transfer is an administrative operation that synchronizes data between sites.
For example, LON goes offline and NYC starts handling client requests. When you bring LON back online, the Data Grid cluster in LON does not have the same data as the cluster in NYC.
To ensure the data is consistent between LON and NYC, you can push state from NYC to LON.
- State transfer is bidirectional. For example, you can push state from NYC to LON or from LON to NYC.
- Pushing state to offline sites brings them back online.
State transfer overwrites only data that exists on both sites, the originating site and the receiving site. Data Grid does not delete data.
For example, "k2" exists on LON and NYC. "k2" is removed from NYC while LON is offline. When you bring LON back online, "k2" still exists at that location. If you push state from NYC to LON, the transfer does not affect "k2" on LON.
Tip: To ensure the contents of the cache are identical after state transfer, remove all data from the cache on the receiving site before pushing state. Use the clear() method.
State transfer does not overwrite updates to data that occur after you initiate the push.
For example, "k1,v1" exists on LON and NYC. LON goes offline so you push state transfer to LON from NYC, which brings LON back online. Before state transfer completes, a client puts "k1,v2" on LON.
In this case the state transfer from NYC does not overwrite "k1,v2" because that modification happened after you initiated the push.
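Both rules can be sketched with plain Java maps. This is an illustrative model only, not the Data Grid API; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative model of state transfer semantics: pushed entries overwrite
// the receiving site, but nothing is deleted, and entries modified after
// the push began are left alone.
public class StateTransferModel {
    static Map<String, String> push(Map<String, String> sender,
                                    Map<String, String> receiver,
                                    Set<String> modifiedAfterPush) {
        Map<String, String> result = new HashMap<>(receiver);
        for (Map.Entry<String, String> e : sender.entrySet()) {
            // Do not overwrite entries the receiver updated after the push started.
            if (!modifiedAfterPush.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        // Keys absent from the sender (for example, removed while the
        // receiving site was offline) are never deleted from the receiver.
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> nyc = new HashMap<>();
        nyc.put("k1", "v1");                 // NYC still holds the old value
        Map<String, String> lon = new HashMap<>();
        lon.put("k1", "v2");                 // LON updated "k1" after the push began
        lon.put("k2", "v2");                 // "k2" was removed from NYC while LON was offline
        Map<String, String> merged = push(nyc, lon, Set.of("k1"));
        System.out.println(merged);          // "k1" keeps v2, "k2" is not deleted
    }
}
```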
Reference
- org.infinispan.Cache.clear()
- Tip: Run help clearcache from the CLI for command details and examples.
- Clearing Caches with the REST API
1.6. Client Connections Across Sites
Clients can write to Data Grid clusters in either an Active/Passive or Active/Active configuration.
Active/Passive
The following diagram illustrates Active/Passive where Data Grid handles client requests from one site only:
In the preceding image:
- Client connects to the Data Grid cluster at LON.
- Client writes "k1" to the cache.
- The site master at LON, "n1", sends the request to replicate "k1" to the site master at NYC, "nA".
With Active/Passive, NYC provides data redundancy. If the Data Grid cluster at LON goes offline for any reason, clients can start sending requests to NYC. When you bring LON back online you can synchronize data with NYC and then switch clients back to LON.
Active/Active
The following diagram illustrates Active/Active where Data Grid handles client requests at two sites:
In the preceding image:
- Client A connects to the Data Grid cluster at LON.
- Client A writes "k1" to the cache.
- Client B connects to the Data Grid cluster at NYC.
- Client B writes "k2" to the cache.
- Site masters at LON and NYC send requests so that "k1" is replicated to NYC and "k2" is replicated to LON.
With Active/Active both NYC and LON replicate data to remote caches while handling client requests. If either NYC or LON go offline, clients can start sending requests to the online site. You can then bring offline sites back online, push state to synchronize data, and switch clients as required.
1.6.1. Concurrent Writes and Conflicting Entries
Conflicting entries can occur with Active/Active site configurations if clients write to the same entries at the same time but at different sites.
For example, client A writes to "k1" in LON at the same time that client B writes to "k1" in NYC. In this case, "k1" has a different value in LON than in NYC. After replication occurs, there is no guarantee which value for "k1" exists at which site.
To ensure data consistency, Data Grid uses a vector clock algorithm to detect conflicting entries during backup operations, as in the following illustration:
Vector clocks are timestamp metadata that increment with each write to an entry. In the preceding example, 0,0 represents the initial value for the vector clock on "k1".
A client puts "k1=2" in LON and the vector clock is 1,0, which Data Grid replicates to NYC. A client then puts "k1=3" in NYC and the vector clock updates to 1,1, which Data Grid replicates to LON.
However if a client puts "k1=5" in LON at the same time that a client puts "k1=8" in NYC, Data Grid detects a conflicting entry because the vector value for "k1" is not strictly greater or less between LON and NYC.
When it finds conflicting entries, Data Grid uses the Java compareTo(String anotherString) method to compare site names. To determine which write takes priority, Data Grid selects the site name that is lexicographically less than the other. Entries from a site named AAA take priority over entries from a site named AAB, and so on.
Following the same example, to resolve the conflict for "k1", Data Grid uses the value for "k1" that originates from LON. This results in "k1=5" in both LON and NYC after Data Grid resolves the conflict and replicates the value.
Prepend site names with numbers as a simple way to represent the order of priority for resolving conflicting entries; for example, 1LON and 2NYC.
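The tie-break can be sketched in plain Java. The class and method names here are hypothetical, but the comparison mirrors the compareTo behavior described above:

```java
// Illustrative tie-break for conflicting entries: the site whose name is
// lexicographically smaller (String.compareTo) takes priority.
// This is a sketch, not the Data Grid API.
public class SiteConflictResolver {
    static String resolve(String siteA, String valueA, String siteB, String valueB) {
        return siteA.compareTo(siteB) < 0 ? valueA : valueB;
    }

    public static void main(String[] args) {
        // LON is lexicographically less than NYC, so the LON value wins.
        System.out.println(resolve("LON", "k1=5", "NYC", "k1=8"));
        // Numeric prefixes make the priority order explicit: 1LON beats 2NYC.
        System.out.println(resolve("2NYC", "k1=8", "1LON", "k1=5"));
    }
}
```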
1.7. Expiration and Cross-Site Replication
Data Grid expiration controls how long entries remain in the cache.
- lifespan expiration is suitable for cross-site replication. When entries reach the maximum lifespan, Data Grid expires them independently of the remote sites.
- max-idle expiration does not work with cross-site replication. Data Grid cannot determine when cache entries reach the idle timeout in remote sites.
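For example, a configuration sketch that combines lifespan expiration with a backup location; the cache name and values are illustrative:

```xml
<replicated-cache name="customers">
  <!-- Each site expires entries independently 5 minutes after creation. -->
  <expiration lifespan="300000"/>
  <backups>
    <backup site="NYC" strategy="ASYNC"/>
  </backups>
</replicated-cache>
```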
Chapter 2. Configuring Data Grid for Cross-Site Replication
To configure Data Grid to replicate data across sites, you first set up cluster transport so that Data Grid clusters can discover each other and site masters can communicate. You then add backup locations to cache definitions in your Data Grid configuration.
2.1. Configuring Cluster Transport for Cross-Site Replication
Add JGroups RELAY2 to your transport layer so that Data Grid clusters can communicate with backup locations.
Procedure
- Open infinispan.xml for editing.
- Add the RELAY2 protocol to a JGroups stack.
- Configure Data Grid cluster transport to use the stack, as in the following example:
  <cache-container name="default" statistics="true">
    <transport cluster="${cluster.name}" stack="xsite"/>
  </cache-container>
- Save and close infinispan.xml.
2.1.1. JGroups RELAY2 Stacks
Data Grid clusters use JGroups RELAY2 for inter-cluster discovery and communication.
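The numbered descriptions below refer to a stack definition along these lines. This is a sketch that assumes the inline JGroups stack syntax in infinispan.xml; adapt site names and values to your deployment:

```xml
<jgroups>
  <!-- (1) Stack named "xsite" for cluster transport;
       (2) extends the default UDP stack for intra-cluster traffic. -->
  <stack name="xsite" extends="udp">
    <!-- (3) RELAY2 for inter-cluster transport; (4) site names the local site;
         (5) max_site_masters should be >= the number of nodes. -->
    <relay.RELAY2 xmlns="urn:org:jgroups" site="LON" max_site_masters="1000"/>
    <!-- (6) all site names, using the default TCP stack between clusters;
         (7) each remote site is a backup location. -->
    <remote-sites default-stack="tcp">
      <remote-site name="LON"/>
      <remote-site name="NYC"/>
    </remote-sites>
  </stack>
</jgroups>
```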
1. Defines a stack named "xsite" that declares which protocols to use for your Data Grid cluster transport.
2. Uses the default JGroups UDP stack for intra-cluster traffic.
3. Adds RELAY2 to the stack for inter-cluster transport.
4. Names the local site. Data Grid replicates data in caches from this site to backup locations.
5. Configures a maximum of 1000 site masters for the local cluster. You should set max_site_masters to a value greater than or equal to the number of nodes in the Data Grid cluster for optimal performance with backup requests.
6. Specifies all site names and uses the default JGroups TCP stack for inter-cluster transport.
7. Names each remote site as a backup location.
2.1.2. Custom JGroups RELAY2 Stacks
You can also reference externally defined JGroups stack files as follows:
<stack-file name="relay-global" path="jgroups-relay.xml"/>
In the preceding configuration jgroups-relay.xml provides a JGroups stack such as this:
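The stack file itself is not shown above. A minimal sketch of such a file, assuming standard JGroups protocols (host names and ports are hypothetical placeholders), might look like:

```xml
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.2.xsd">
  <!-- TCP transport for the bridge between sites. -->
  <TCP bind_port="7200"/>
  <!-- Static discovery of relay nodes in remote sites (hosts are placeholders). -->
  <TCPPING initial_hosts="relay-host1[7200],relay-host2[7200]"/>
  <MERGE3/>
  <FD_ALL/>
  <VERIFY_SUSPECT/>
  <pbcast.NAKACK2 use_mcast_xmit="false"/>
  <UNICAST3/>
  <pbcast.STABLE/>
  <pbcast.GMS/>
</config>
```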
2.2. Adding Backup Locations to Caches
Specify the names of remote sites so Data Grid can back up data to those locations.
Procedure
- Add the backups element to your cache definition.
- Specify the name of each remote site with the backup element. As an example, in the LON configuration, specify NYC as the remote site.
- Repeat the preceding steps so that each site is a backup for all other sites. For example, you cannot add LON as a backup for NYC without adding NYC as a backup for LON.
Cache configurations can be different across sites and use different backup strategies. Data Grid replicates data based on cache names.
Example "customers" configuration in LON
<replicated-cache name="customers">
<backups>
<backup site="NYC" strategy="ASYNC" />
</backups>
</replicated-cache>
Example "customers" configuration in NYC
<distributed-cache name="customers">
<backups>
<backup site="LON" strategy="SYNC" />
</backups>
</distributed-cache>
2.3. Backing Up to Caches with Different Names
By default, Data Grid replicates data between caches that have the same name.
Procedure
- Use backup-for to replicate data from a remote site into a cache with a different name on the local site.
For example, the following configuration backs up the "customers" cache on LON to the "eu-customers" cache on NYC.
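A configuration sketch for the NYC side; the cache names follow the example above, while the cache type and strategy values are illustrative:

```xml
<distributed-cache name="eu-customers">
  <backups>
    <backup site="LON" strategy="ASYNC"/>
  </backups>
  <!-- Receive replicated data from the "customers" cache at LON. -->
  <backup-for remote-cache="customers" remote-site="LON"/>
</distributed-cache>
```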
2.4. Verifying Cross-Site Views
After you configure Data Grid for cross-site replication, you should verify that Data Grid clusters successfully form cross-site views.
Procedure
- Check log messages for ISPN000439: Received new x-site view messages.
For example, if the Data Grid cluster in LON has formed a cross-site view with the Data Grid cluster in NYC, it provides the following messages:
INFO [org.infinispan.XSITE] (jgroups-5,${server.hostname}) ISPN000439: Received new x-site view: [NYC]
INFO [org.infinispan.XSITE] (jgroups-7,${server.hostname}) ISPN000439: Received new x-site view: [NYC, LON]
2.5. Configuring Hot Rod Clients for Cross-Site Replication
Configure Hot Rod clients to use Data Grid clusters at different sites.
hotrod-client.properties
# Servers at the active site
infinispan.client.hotrod.server_list = LON_host1:11222,LON_host2:11222,LON_host3:11222
# Servers at the backup site
infinispan.client.hotrod.cluster.NYC = NYC_hostA:11222,NYC_hostB:11222,NYC_hostC:11222,NYC_hostD:11222
ConfigurationBuilder
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.addServers("LON_host1:11222;LON_host2:11222;LON_host3:11222")
.addCluster("NYC")
.addClusterNodes("NYC_hostA:11222;NYC_hostB:11222;NYC_hostC:11222;NYC_hostD:11222")
Use the following methods to switch Hot Rod clients to the default cluster or to a cluster at a different site:
- RemoteCacheManager.switchToDefaultCluster()
- RemoteCacheManager.switchToCluster(${site.name})
Chapter 3. Performing Cross-Site Replication Operations
Bring sites online and offline. Transfer cache state to remote sites.
3.1. Performing Cross-Site Operations with the CLI
The Data Grid command line interface lets you remotely connect to Data Grid servers, manage sites, and push state transfer to backup locations.
Prerequisites
- Start the Data Grid CLI.
- Connect to a running Data Grid cluster.
3.1.1. Bringing Backup Locations Offline and Online
Take backup locations offline manually and bring them back online.
Procedure
- Create a CLI connection to Data Grid.
- Check if backup locations are online or offline with the site status command:
  //containers/default]> site status --cache=cacheName --site=NYC
  Note: --site is an optional argument. If not set, the CLI returns all backup locations.
- Manage backup locations as follows:
  - Bring backup locations online with the bring-online command:
    //containers/default]> site bring-online --cache=customers --site=NYC
  - Take backup locations offline with the take-offline command:
    //containers/default]> site take-offline --cache=customers --site=NYC
For more information and examples, run the help site command.
3.1.2. Pushing State to Backup Locations
Transfer cache state to remote backup locations.
Procedure
- Create a CLI connection to Data Grid.
- Use the site push-site-state command to push state transfer, as in the following example:
  //containers/default]> site push-site-state --cache=cacheName --site=NYC
For more information and examples, run the help site command.
3.2. Performing Cross-Site Operations with the REST API
Data Grid servers provide a REST API that allows you to perform cross-site operations.
3.2.1. Getting Status of All Backup Locations
Retrieve the status of all backup locations with GET requests.
GET /v2/caches/{cacheName}/x-site/backups/
Data Grid responds with the status of each backup location in JSON format, as in the following example:
{
"NYC": "online",
"LON": "offline"
}
| Value | Description |
|---|---|
| online | All nodes in the local cluster have a cross-site view with the backup location. |
| offline | No nodes in the local cluster have a cross-site view with the backup location. |
| mixed | Some nodes in the local cluster have a cross-site view with the backup location, while other nodes do not. The response indicates status for each node. |
3.2.2. Getting Status of Specific Backup Locations
Retrieve the status of a backup location with GET requests.
GET /v2/caches/{cacheName}/x-site/backups/{siteName}
Data Grid responds with the status of each node in the site in JSON format, as in the following example:
{
"NodeA":"offline",
"NodeB":"online"
}
| Value | Description |
|---|---|
| online | The node is online. |
| offline | The node is offline. |
| failed | Not possible to retrieve status. The remote cache could be shutting down or a network error occurred during the request. |
3.2.3. Taking Backup Locations Offline
Take backup locations offline with POST requests and the ?action=take-offline parameter.
POST /v2/caches/{cacheName}/x-site/backups/{siteName}?action=take-offline
3.2.4. Bringing Backup Locations Online
Bring backup locations online with the ?action=bring-online parameter.
POST /v2/caches/{cacheName}/x-site/backups/{siteName}?action=bring-online
3.2.5. Pushing State to Backup Locations
Push cache state to a backup location with the ?action=start-push-state parameter.
POST /v2/caches/{cacheName}/x-site/backups/{siteName}?action=start-push-state
3.2.6. Canceling State Transfer
Cancel state transfer operations with the ?action=cancel-push-state parameter.
POST /v2/caches/{cacheName}/x-site/backups/{siteName}?action=cancel-push-state
3.2.7. Getting State Transfer Status
Retrieve status of state transfer operations with the ?action=push-state-status parameter.
GET /v2/caches/{cacheName}/x-site/backups?action=push-state-status
Data Grid responds with the status of state transfer for each backup location in JSON format, as in the following example:
{
"NYC":"CANCELED",
"LON":"OK"
}
| Value | Description |
|---|---|
| SENDING | State transfer to the backup location is in progress. |
| OK | State transfer completed successfully. |
| ERROR | An error occurred with state transfer. Check log files. |
| CANCELING | State transfer cancellation is in progress. |
3.2.8. Clearing State Transfer Status
Clear state transfer status for sending sites with the ?action=clear-push-state-status parameter.
POST /v2/caches/{cacheName}/x-site/local?action=clear-push-state-status
3.2.9. Modifying Take Offline Conditions
Sites go offline if certain conditions are met. Modify the take offline parameters to control when backup locations automatically go offline.
Procedure
- Check configured take offline parameters with GET requests and the take-offline-config parameter:
  GET /v2/caches/{cacheName}/x-site/backups/{siteName}/take-offline-config
  The Data Grid response includes after_failures and min_wait fields as follows:
  {
    "after_failures": 2,
    "min_wait": 1000
  }
- Modify take offline parameters in the body of PUT requests:
  PUT /v2/caches/{cacheName}/x-site/backups/{siteName}/take-offline-config
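For illustration, the PUT request body reuses the same JSON fields as the GET response; the values here are hypothetical:

```json
{
  "after_failures": 3,
  "min_wait": 10000
}
```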
3.2.10. Canceling State Transfer from Receiving Sites
If the connection between two backup locations breaks, you can cancel state transfer on the site that is receiving the push.
Cancel state transfer from a remote site and keep the current state of the local cache with the ?action=cancel-receive-state parameter.
POST /v2/caches/{cacheName}/x-site/backups/{siteName}?action=cancel-receive-state
3.2.11. Getting Status of Backup Locations
Retrieve the status of all backup locations from Cache Managers with GET requests.
GET /rest/v2/cache-managers/{cacheManagerName}/x-site/backups/
Data Grid responds with status in JSON format.
| Value | Description |
|---|---|
| online | All nodes in the local cluster have a cross-site view with the backup location. |
| offline | No nodes in the local cluster have a cross-site view with the backup location. |
| mixed | Some nodes in the local cluster have a cross-site view with the backup location, while other nodes do not. The response indicates status for each node. |
3.2.12. Taking Backup Locations Offline
Take backup locations offline with the ?action=take-offline parameter.
POST /rest/v2/cache-managers/{cacheManagerName}/x-site/backups/{siteName}?action=take-offline
3.2.13. Bringing Backup Locations Online
Bring backup locations online with the ?action=bring-online parameter.
POST /rest/v2/cache-managers/{cacheManagerName}/x-site/backups/{siteName}?action=bring-online
3.2.14. Starting State Transfer
Push state of all caches to remote sites with the ?action=start-push-state parameter.
POST /rest/v2/cache-managers/{cacheManagerName}/x-site/backups/{siteName}?action=start-push-state
3.2.15. Canceling State Transfer
Cancel ongoing state transfer operations with the ?action=cancel-push-state parameter.
POST /rest/v2/cache-managers/{cacheManagerName}/x-site/backups/{siteName}?action=cancel-push-state
3.3. Performing Cross-Site Operations with JMX
Data Grid provides JMX tooling to perform cross-site operations such as pushing state transfer and bringing sites online.
3.3.1. Configuring Data Grid to Register JMX MBeans
Data Grid can register JMX MBeans that you can use to collect statistics and perform administrative operations. However, you must enable statistics separately from JMX; otherwise Data Grid provides 0 values for all statistic attributes.
Procedure
- Enable JMX declaratively or programmatically.
Declaratively
<cache-container>
<jmx enabled="true" />
</cache-container>
1. Registers Data Grid JMX MBeans.
Programmatically
GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
.jmx().enable()
.build();
1. Registers Data Grid JMX MBeans.
3.3.2. Performing Cross-Site Operations
Perform cross-site operations via JMX clients.
Prerequisites
- Configure Data Grid to register JMX MBeans
Procedure
- Connect to Data Grid with any JMX client.
Invoke operations from the following MBeans:
- XSiteAdmin provides cross-site operations for caches.
- GlobalXSiteAdminOperations provides cross-site operations for Cache Managers.
For example, to bring sites back online, invoke bringSiteOnline(siteName).
See the Data Grid JMX Components documentation for details about available cross-site operations.
Chapter 4. Monitoring and Troubleshooting Global Data Grid Clusters
Data Grid provides statistics for cross-site replication operations via JMX or the /metrics endpoint for Data Grid server.
Cross-site replication statistics are available at the cache level, so you must explicitly enable statistics for your caches. Likewise, if you want to collect statistics via JMX you must configure Data Grid to register MBeans.
Data Grid also includes an org.infinispan.XSITE logging category so you can monitor and troubleshoot common issues with networking and state transfer operations.
4.1. Enabling Data Grid Statistics
Data Grid lets you enable statistics for Cache Managers and caches. However, enabling statistics for a Cache Manager does not enable statistics for the caches that it controls. You must explicitly enable statistics for your caches.
Data Grid server enables statistics for Cache Managers by default.
Procedure
- Enable statistics declaratively or programmatically.
Declaratively
<cache-container statistics="true">
<local-cache name="mycache" statistics="true"/>
</cache-container>
Programmatically
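The programmatic example below is a sketch that uses the same builder API as the metrics configuration later in this guide; it requires the Infinispan libraries on the classpath:

```java
// Enable statistics for the Cache Manager.
GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
   .statistics().enable()
   .build();

// Enable statistics for an individual cache.
Configuration config = new ConfigurationBuilder()
   .statistics().enable()
   .build();
```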
4.2. Enabling Data Grid Metrics
Configure Data Grid to export gauges and histograms.
Procedure
- Configure metrics declaratively or programmatically.
Declaratively
<cache-container statistics="true">
<metrics gauges="true" histograms="true" />
</cache-container>
Programmatically
GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
.statistics().enable()
.metrics().gauges(true).histograms(true)
.build();
4.2.1. Collecting Data Grid Metrics
Collect Data Grid metrics with monitoring tools such as Prometheus.
Prerequisites
- Enable statistics. If you do not enable statistics, Data Grid provides 0 and -1 values for metrics.
- Optionally enable histograms. By default Data Grid generates gauges but not histograms.
Procedure
Get metrics in Prometheus (OpenMetrics) format:

$ curl -v http://localhost:11222/metrics

Get metrics in MicroProfile JSON format:

$ curl --header "Accept: application/json" http://localhost:11222/metrics
Next steps
Configure monitoring applications to collect Data Grid metrics. For example, add the following to prometheus.yml:
static_configs:
  - targets: ['localhost:11222']
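For context, a static_configs entry sits inside a scrape job. The following is a fuller sketch of prometheus.yml, where the job name and scrape interval are illustrative assumptions rather than required values:

```yaml
# Sketch of a scrape job for Data Grid metrics.
# job_name and scrape_interval are assumed values; adjust as needed.
scrape_configs:
  - job_name: 'datagrid'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:11222']
```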
Reference
- Prometheus Configuration
- Enabling Data Grid Statistics
4.3. Configuring Data Grid to Register JMX MBeans
Data Grid can register JMX MBeans that you can use to collect statistics and perform administrative operations. However, you must enable statistics separately from JMX; otherwise Data Grid provides 0 values for all statistic attributes.
Procedure
- Enable JMX declaratively or programmatically.
Declaratively
<cache-container>
<jmx enabled="true" />
</cache-container>
Setting enabled="true" on the jmx element registers Data Grid JMX MBeans.
Programmatically
GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
.jmx().enable()
.build();
Calling jmx().enable() registers Data Grid JMX MBeans.
4.3.1. JMX MBeans for Cross-Site Replication
Data Grid provides JMX MBeans for cross-site replication that let you gather statistics and perform remote operations.
The org.infinispan:type=Cache component provides the following JMX MBeans:
- XSiteAdmin exposes cross-site operations that apply to specific cache instances.
- StateTransferManager provides statistics for state transfer operations.
- InboundInvocationHandler provides statistics and operations for asynchronous and synchronous cross-site requests.
The org.infinispan:type=CacheManager component includes the following JMX MBean:
- GlobalXSiteAdminOperations exposes cross-site operations that apply to all caches in a cache container.
For details about JMX MBeans along with descriptions of available operations and statistics, see the Data Grid JMX Components documentation.
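As an illustration of performing a remote operation through these MBeans, the following sketch connects over JMX and invokes bringSiteOnline on the XSiteAdmin component. The service URL, Cache Manager name ("default"), cache name, and the exact ObjectName pattern are assumptions for this example, based on the org.infinispan:type=Cache naming convention described above; verify the actual names in your deployment with a JMX browser.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class XSiteJmxClient {

    // Builds the ObjectName for the XSiteAdmin component of a cache.
    // The name pattern is an assumption based on the
    // org.infinispan:type=Cache naming convention; verify it in your
    // deployment with a JMX browser.
    static ObjectName xsiteAdminName(String cacheManager, String cacheName)
            throws Exception {
        return new ObjectName(String.format(
            "org.infinispan:type=Cache,manager=\"%s\",name=\"%s\",component=XSiteAdmin",
            cacheManager, cacheName));
    }

    public static void main(String[] args) throws Exception {
        // Pass the JMX service URL as the first argument, for example:
        // service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi (assumed endpoint).
        if (args.length == 0) {
            // No URL given: just print the ObjectName this sketch would target.
            System.out.println(xsiteAdminName("default", "customers(repl_sync)"));
            return;
        }
        JMXServiceURL url = new JMXServiceURL(args[0]);
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Bring the LON backup location online for the "customers" cache.
            conn.invoke(
                xsiteAdminName("default", "customers(repl_sync)"),
                "bringSiteOnline",
                new Object[] { "LON" },
                new String[] { "java.lang.String" });
        }
    }
}
```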
4.4. Collecting Logs and Troubleshooting Cross-Site Replication
Diagnose and resolve issues related to Data Grid cross-site replication. Use the Data Grid Command Line Interface (CLI) to adjust log levels at run-time and perform cross-site troubleshooting.
Procedure
- Open a terminal in $RHDG_HOME.
- Create a Data Grid CLI connection.
Adjust run-time logging levels to capture DEBUG messages if necessary.
For example, the following command enables DEBUG log messages for the org.infinispan.XSITE category:
[//containers/default]> logging set --level=DEBUG org.infinispan.XSITE
You can then check the Data Grid log files for cross-site messages in the ${rhdg.server.root}/log directory.
- Use the site command to view status for backup locations and perform troubleshooting.
For example, check the status of the "customers" cache that uses "LON" as a backup location:
[//containers/default]> site status --cache=customers
{
"LON" : "online"
}
The Data Grid CLI is also useful when the network connection between backup locations breaks during a state transfer operation.
If this occurs, the Data Grid cluster that receives the state transfer waits indefinitely for the operation to complete. In this case, cancel the state transfer to the receiving site to return it to a normal operational state.
For example, cancel state transfer for "NYC" as follows:
[//containers/default]> site cancel-receive-state --cache=mycache --site=NYC
4.4.1. Cross-Site Log Messages
Use the following log messages to identify and resolve issues with cross-site replication.
| Log level | Identifier | Message | Description |
|---|---|---|---|
| DEBUG | ISPN000400 | Node null was suspected | Data Grid prints this message when it cannot reach backup locations. Ensure that sites are online and check network status. |
| INFO | ISPN000439 | Received new x-site view: ${site.name} | Data Grid prints this message when sites join and leave the global cluster. |
| INFO | ISPN100005 | Site ${site.name} is online. | Data Grid prints this message when a site comes online. |
| INFO | ISPN100006 | Site ${site.name} is offline. | Data Grid prints this message when a site goes offline. If you did not take the site offline manually, this message could indicate a failure has occurred. Check network status and try to bring the site back online. |
| WARN | ISPN000202 | Problems backing up data for cache ${cache.name} to site ${site.name}: | Data Grid prints this message when issues occur with state transfer operations along with the exception. If necessary adjust Data Grid logging to get more fine-grained logging messages. |
| WARN | ISPN000289 | Unable to send X-Site state chunk to ${site.name}. | Indicates that Data Grid cannot transfer a batch of cache entries during a state transfer operation. Ensure that sites are online and check network status. |
| WARN | ISPN000291 | Unable to apply X-Site state chunk. | Indicates that Data Grid cannot apply a batch of cache entries during a state transfer operation. Ensure that sites are online and check network status. |
| WARN | ISPN000322 | Unable to re-start x-site state transfer to site ${site.name} | Indicates that Data Grid cannot resume a state transfer operation to a backup location. Ensure that sites are online and check network status. |
| ERROR | ISPN000477 | Unable to perform operation ${operation.name} for site ${site.name} | Indicates that Data Grid cannot successfully complete an operation on a backup location. If necessary adjust Data Grid logging to get more fine-grained logging messages. |
| FATAL | ISPN000449 | XSite state transfer timeout must be higher or equals than 1 (one). | Results when the value of the state transfer timeout is less than 1. Configure a timeout value of at least 1. |
| FATAL | ISPN000450 | XSite state transfer waiting time between retries must be higher or equals than 1 (one). | Results when the value of the wait time between state transfer retries is less than 1. Configure a wait time of at least 1. |
| FATAL | ISPN000576 | Cross-site Replication not available for local cache. | Cross-site replication does not work with the local cache mode. Either remove the backup configuration from the local cache definition or use a distributed or replicated cache mode. |