Chapter 13. Setting up cross-site replication
Ensure availability with Data Grid Operator by configuring geographically distributed clusters as a unified service.
You can configure clusters to perform cross-site replication with:
- Connections that Data Grid Operator manages.
- Connections that you configure and manage.
You can use both managed and manual connections for Data Grid clusters in the same Infinispan CR. You must ensure that Data Grid clusters establish connections in the same way at each site.
13.1. Cross-site replication expose types
You can use a NodePort service, a LoadBalancer service, or an OpenShift Route to handle network traffic for backup operations between Data Grid clusters. Before you start setting up cross-site replication, determine which expose types are available for your Red Hat OpenShift cluster. In some cases, an administrator must provision services before you can configure an expose type.
NodePort
A NodePort is a service that accepts network traffic at a static port, in the 30000 to 32767 range, on an IP address that is available externally to the OpenShift cluster.
To use a NodePort as the expose type for cross-site replication, an administrator must provision external IP addresses for each OpenShift node. In most cases, an administrator must also configure DNS routing for those external IP addresses.
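As a rough illustration of the NodePort expose type, the following fragment sets a static port inside the 30000 to 32767 range. It is a sketch rather than a complete CR: the site name LON and the port 30123 are placeholder values, and the field paths are the ones described in the configuration procedures later in this chapter.

```yaml
# Sketch only: site name and port are placeholder values.
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: NodePort
          # Must fall within the NodePort range of 30000 to 32767.
          nodePort: 30123
```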
LoadBalancer
A LoadBalancer is a service that directs network traffic to the correct node in the OpenShift cluster.
Whether you can use a LoadBalancer as the expose type for cross-site replication depends on the host platform. AWS supports network load balancers (NLB) while some other cloud platforms do not. To use a LoadBalancer service, an administrator must first create an ingress controller backed by an NLB.
Route
An OpenShift Route allows Data Grid clusters to connect with each other through a public secure URL.
Data Grid uses TLS with the SNI header to send backup requests between clusters through an OpenShift Route. To do this, you must add a keystore with TLS certificates so that Data Grid can encrypt network traffic for cross-site replication.
When you specify Route as the expose type for cross-site replication, Data Grid Operator creates a route with TLS passthrough encryption for each Data Grid cluster that it manages. You can specify a hostname for the Route but you cannot specify a Route that you have already created.
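The Route requirements above could look roughly like the following fragment. This is a sketch under assumptions: the hostname lon-xsite.apps.example.com is hypothetical, and the secret names are borrowed from the example in the section on securing cross-site connections, so adjust both to your environment.

```yaml
# Sketch only: hostname is hypothetical, secret names follow the encryption example.
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: Route
          # Optional custom hostname for the Route that Data Grid Operator creates.
          routeHostName: lon-xsite.apps.example.com
        # A keystore is mandatory when the expose type is Route.
        encryption:
          transportKeyStore:
            secretName: transport-tls-secret
          routerKeyStore:
            secretName: router-tls-secret
```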
13.2. Managed cross-site replication
Data Grid Operator can discover Data Grid clusters running in different data centers to form global clusters.
When you configure managed cross-site connections, Data Grid Operator creates router pods in each Data Grid cluster. Data Grid pods use the <cluster_name>-site service to connect to these router pods and send backup requests.
Router pods maintain a record of all pod IP addresses and parse RELAY message headers to forward backup requests to the correct Data Grid cluster. If a router pod crashes then all Data Grid pods start using any other available router pod until OpenShift restores it.
To manage cross-site connections, Data Grid Operator uses the Kubernetes API. Each OpenShift cluster must have network access to the remote Kubernetes API and a service account token for each backup cluster.
Data Grid clusters do not start running until Data Grid Operator discovers all backup locations that you configure.
13.2.1. Creating service account tokens for managed cross-site connections
Generate service account tokens on OpenShift clusters that allow Data Grid Operator to automatically discover Data Grid clusters and manage cross-site connections.
Prerequisites
- Ensure all OpenShift clusters have access to the Kubernetes API.
  Data Grid Operator uses this API to manage cross-site connections.
  Note: Data Grid Operator does not modify remote Data Grid clusters. The service account tokens provide read-only access through the Kubernetes API.
Procedure
- Log in to an OpenShift cluster.
- Create a service account.
  For example, create a service account at LON:
  oc create sa -n <namespace> lon
- Add the view role to the service account with the following command:
  oc policy add-role-to-user view -n <namespace> -z lon
- If you use a NodePort service to expose Data Grid clusters on the network, you must also add the cluster-reader role to the service account:
  oc adm policy add-cluster-role-to-user cluster-reader -z lon -n <namespace>
- Repeat the preceding steps on your other OpenShift clusters.
- Exchange service account tokens on each OpenShift cluster.
13.2.2. Exchanging service account tokens
Generate service account tokens on your OpenShift clusters and add them into secrets at each backup location. The tokens that you generate in this procedure do not expire. For bound service account tokens, see Exchanging bound service account tokens.
Prerequisites
- You have created a service account.
Procedure
- Log in to your OpenShift cluster.
- Create a service account token secret file as follows:
  sa-token.yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: ispn-xsite-sa-token
    annotations:
      kubernetes.io/service-account.name: "<service-account>"
  type: kubernetes.io/service-account-token
- Create the secret in your OpenShift cluster:
  oc -n <namespace> create -f sa-token.yaml
- Retrieve the service account token:
  oc -n <namespace> get secrets ispn-xsite-sa-token -o jsonpath="{.data.token}" | base64 -d
  The command prints the token in the terminal.
- Copy the token for deployment in the backup OpenShift cluster.
- Log in to the backup OpenShift cluster.
- Add the service account token for a backup location:
  oc -n <namespace> create secret generic <token-secret> --from-literal=token=<token>
  The <token-secret> is the name of the secret configured in the Infinispan CR.
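To show how the secret name ties back to the Infinispan CR, here is a sketch of the relevant fragment at the LON site. It assumes that the secret holding the NYC token was created with the name nyc-token, which is the name used in the managed connection examples later in this chapter.

```yaml
# Sketch only: assumes the secret was created as nyc-token.
spec:
  service:
    type: DataGrid
    sites:
      locations:
        - name: NYC
          url: openshift://api.rhdg-nyc.openshift-aws.myhost.com:6443
          # Must match the <token-secret> name passed to "oc create secret generic".
          secretName: nyc-token
```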
Next steps
- Repeat the preceding steps on your other OpenShift clusters.
13.2.3. Exchanging bound service account tokens
Create service account tokens with a limited lifespan and add them into secrets at each backup location. You must refresh the token periodically to prevent Data Grid Operator from losing access to the remote OpenShift cluster. For non-expiring tokens, see Exchanging service account tokens.
Prerequisites
- You have created a service account.
Procedure
- Log in to your OpenShift cluster.
- Create a bound token for the service account:
  oc -n <namespace> create token <service-account>
  Note: By default, service account tokens are valid for one hour. Use the --duration command option to specify the lifespan in seconds.
  The command prints the token in the terminal.
- Copy the token for deployment in the backup OpenShift cluster(s).
- Log in to the backup OpenShift cluster.
- Add the service account token for a backup location:
  oc -n <namespace> create secret generic <token-secret> --from-literal=token=<token>
  The <token-secret> is the name of the secret configured in the Infinispan CR.
- Repeat the steps on other OpenShift clusters.
Deleting expired tokens
When a token expires, delete the expired token secret, and then repeat the procedure to generate and exchange a new one.
- Log in to the backup OpenShift cluster.
- Delete the expired secret <token-secret>:
  oc -n <namespace> delete secrets <token-secret>
- Repeat the procedure to create a new token and generate a new <token-secret>.
13.2.4. Configuring managed cross-site connections
Configure Data Grid Operator to establish cross-site views with Data Grid clusters.
Prerequisites
- Determine a suitable expose type for cross-site replication.
  If you use an OpenShift Route, you must add a keystore with TLS certificates and secure cross-site connections.
- Create and exchange Red Hat OpenShift service account tokens for each Data Grid cluster.
Procedure
- Create an Infinispan CR for each Data Grid cluster.
- Specify the name of the local site with spec.service.sites.local.name.
- Configure the expose type for cross-site replication.
  Set the value of the spec.service.sites.local.expose.type field to one of the following:
  - NodePort
  - LoadBalancer
  - Route
- Optionally specify a port or custom hostname with the following fields:
  - spec.service.sites.local.expose.nodePort if you use a NodePort service.
  - spec.service.sites.local.expose.port if you use a LoadBalancer service.
  - spec.service.sites.local.expose.routeHostName if you use an OpenShift Route.
- Specify the number of pods that can send RELAY messages with the service.sites.local.maxRelayNodes field.
  Tip: Configure all pods in your cluster to send RELAY messages for better performance. If all pods send backup requests directly, then no pods need to forward backup requests.
- Provide the name, URL, and secret for each Data Grid cluster that acts as a backup location with spec.service.sites.locations.
- If Data Grid cluster names or namespaces at the remote site do not match the local site, specify those values with the clusterName and namespace fields.
  The following are example Infinispan CR definitions for LON and NYC:
LON
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
spec:
  replicas: 3
  version: <Data Grid_version>
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
          port: 65535
        maxRelayNodes: 1
      locations:
        - name: NYC
          clusterName: <nyc_cluster_name>
          namespace: <nyc_cluster_namespace>
          url: openshift://api.rhdg-nyc.openshift-aws.myhost.com:6443
          secretName: nyc-token
  logging:
    categories:
      org.jgroups.protocols.TCP: error
      org.jgroups.protocols.relay.RELAY2: error
NYC
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: nyc-cluster
spec:
  replicas: 2
  version: <Data Grid_version>
  service:
    type: DataGrid
    sites:
      local:
        name: NYC
        expose:
          type: LoadBalancer
          port: 65535
        maxRelayNodes: 1
      locations:
        - name: LON
          clusterName: infinispan
          namespace: rhdg-namespace
          url: openshift://api.rhdg-lon.openshift-aws.myhost.com:6443
          secretName: lon-token
  logging:
    categories:
      org.jgroups.protocols.TCP: error
      org.jgroups.protocols.relay.RELAY2: error
Important: Be sure to adjust logging categories in your Infinispan CR to decrease log levels for JGroups TCP and RELAY2 protocols. This prevents a large number of log files from using container storage.
spec:
  logging:
    categories:
      org.jgroups.protocols.TCP: error
      org.jgroups.protocols.relay.RELAY2: error
- Configure your Infinispan CRs with any other Data Grid service resources and then apply the changes.
- Verify that Data Grid clusters form a cross-site view.
  - Retrieve the Infinispan CR.
    oc get infinispan -o yaml
  - Check for the type: CrossSiteViewFormed condition.
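The tip in the procedure above, about letting every pod send RELAY messages, could be applied along these lines. This is a minimal sketch that assumes a three-pod cluster at LON; the only point it illustrates is that maxRelayNodes is set to the same number as replicas.

```yaml
# Sketch only: maxRelayNodes equals replicas so every pod can send RELAY messages.
spec:
  replicas: 3
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
        maxRelayNodes: 3
```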
Next steps
If your clusters have formed a cross-site view, you can start adding backup locations to caches.
13.3. Manually configuring cross-site connections
You can specify static network connection details to perform cross-site replication with Data Grid clusters running outside OpenShift. Manual cross-site connections are necessary in any scenario where access to the Kubernetes API is not available outside the OpenShift cluster where Data Grid runs.
Prerequisites
- Determine a suitable expose type for cross-site replication.
  If you use an OpenShift Route, you must add a keystore with TLS certificates and secure cross-site connections.
- Ensure you have the correct host names and ports for each Data Grid cluster and each <cluster-name>-site service.
  Manually connecting Data Grid clusters to form cross-site views requires predictable network locations for Data Grid services, which means you need to know the network locations before they are created.
Procedure
- Create an Infinispan CR for each Data Grid cluster.
- Specify the name of the local site with spec.service.sites.local.name.
- Configure the expose type for cross-site replication.
  Set the value of the spec.service.sites.local.expose.type field to one of the following:
  - NodePort
  - LoadBalancer
  - Route
- Optionally specify a port or custom hostname with the following fields:
  - spec.service.sites.local.expose.nodePort if you use a NodePort service.
  - spec.service.sites.local.expose.port if you use a LoadBalancer service.
  - spec.service.sites.local.expose.routeHostName if you use an OpenShift Route.
- Provide the name and static URL for each Data Grid cluster that acts as a backup location with spec.service.sites.locations, for example:
LON
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
spec:
  replicas: 3
  version: <Data Grid_version>
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
          port: 65535
        maxRelayNodes: 1
      locations:
        - name: NYC
          url: infinispan+xsite://infinispan-nyc.myhost.com:7900
  logging:
    categories:
      org.jgroups.protocols.TCP: error
      org.jgroups.protocols.relay.RELAY2: error
NYC
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
spec:
  replicas: 2
  version: <Data Grid_version>
  service:
    type: DataGrid
    sites:
      local:
        name: NYC
        expose:
          type: LoadBalancer
          port: 65535
        maxRelayNodes: 1
      locations:
        - name: LON
          url: infinispan+xsite://infinispan-lon.myhost.com
  logging:
    categories:
      org.jgroups.protocols.TCP: error
      org.jgroups.protocols.relay.RELAY2: error
Important: Be sure to adjust logging categories in your Infinispan CR to decrease log levels for JGroups TCP and RELAY2 protocols. This prevents a large number of log files from using container storage.
spec:
  logging:
    categories:
      org.jgroups.protocols.TCP: error
      org.jgroups.protocols.relay.RELAY2: error
- Configure your Infinispan CRs with any other Data Grid service resources and then apply the changes.
- Verify that Data Grid clusters form a cross-site view.
  - Retrieve the Infinispan CR.
    oc get infinispan -o yaml
  - Check for the type: CrossSiteViewFormed condition.
Next steps
If your clusters have formed a cross-site view, you can start adding backup locations to caches.
13.4. Allocating CPU and memory for Gossip router pod
Allocate CPU and memory resources to the Data Grid Gossip router.
Prerequisite
- Have Gossip router enabled. The service.sites.local.discovery.launchGossipRouter property must be set to true, which is the default value.
Procedure
- Allocate the number of CPU units using the service.sites.local.discovery.cpu field.
- Allocate the amount of memory, in bytes, using the service.sites.local.discovery.memory field.
  The cpu and memory fields have values in the format of <limit>:<requests>. For example, cpu: "2000m:1000m" limits pods to a maximum of 2000m of CPU and requests 1000m of CPU for each pod at startup. Specifying a single value sets both the limit and request.
- Apply your Infinispan CR.
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        discovery:
          launchGossipRouter: true
          memory: "2Gi:1Gi"
          cpu: "2000m:1000m"
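For comparison, the single-value form mentioned in the procedure might look like the following. The values are illustrative only; each one sets both the limit and the request for the Gossip router pod.

```yaml
# Sketch only: a single value sets both the limit and the request.
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        discovery:
          launchGossipRouter: true
          memory: "1Gi"
          cpu: "1000m"
```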
13.5. Disabling local Gossip router and service
The Data Grid Operator starts a Gossip router on each site, but you only need a single Gossip router to manage traffic between the Data Grid cluster members. You can disable the additional Gossip routers to save resources.
For example, you have Data Grid clusters at the LON and NYC sites. The following procedure shows how to disable the Gossip router at the LON site and connect to NYC, which has the Gossip router enabled.
Procedure
- Create an Infinispan CR for each Data Grid cluster.
- Specify the name of the local site with the spec.service.sites.local.name field.
- For the LON cluster, set false as the value for the spec.service.sites.local.discovery.launchGossipRouter field.
- For the LON cluster, specify the url with the spec.service.sites.locations.url field to connect to NYC.
- In the NYC configuration, do not specify the spec.service.sites.locations.url field.
LON
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
spec:
  replicas: 3
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        discovery:
          launchGossipRouter: false
      locations:
        - name: NYC
          url: infinispan+xsite://infinispan-nyc.myhost.com:7900
NYC
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
spec:
  replicas: 3
  service:
    type: DataGrid
    sites:
      local:
        name: NYC
      locations:
        - name: LON
If you have three or more sites, it is recommended that you keep the Gossip router enabled on all the remote sites. When you have multiple Gossip routers and one of them becomes unavailable, the remaining routers continue exchanging messages. If only a single Gossip router is defined and it becomes unavailable, the connection between the remote sites breaks.
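A three-site variant of the LON configuration might look roughly like this. It is a sketch under assumptions: SFO and its URL are hypothetical and do not appear in the example above. LON keeps its local Gossip router disabled and lists both remote sites, each of which is assumed to run its own Gossip router.

```yaml
# Sketch only: SFO is a hypothetical third site.
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        discovery:
          launchGossipRouter: false
      locations:
        - name: NYC
          url: infinispan+xsite://infinispan-nyc.myhost.com:7900
        # Hypothetical site and hostname, added for illustration.
        - name: SFO
          url: infinispan+xsite://infinispan-sfo.myhost.com:7900
```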
Next steps
If your clusters have formed a cross-site view, you can start adding backup locations to caches.
13.6. Resources for configuring cross-site replication
The following tables provide fields and descriptions for cross-site resources.
Field | Description |
---|---|
spec.service.type: DataGrid | Data Grid supports cross-site replication with Data Grid service clusters only. |
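Field | Description |
---|---|
spec.service.sites.local.name | Names the local site where a Data Grid cluster runs. |
spec.service.sites.local.maxRelayNodes | Specifies the maximum number of pods that can send RELAY messages for cross-site replication. |
spec.service.sites.local.discovery.launchGossipRouter | If true, Data Grid Operator starts a Gossip router at the local site. The default value is true. |
spec.service.sites.local.discovery.memory | Allocates the amount of memory in bytes. It uses the following format: <limit>:<requests>. |
spec.service.sites.local.discovery.cpu | Allocates the number of CPU units. It uses the following format: <limit>:<requests>. |
spec.service.sites.local.expose.type | Specifies the network service for cross-site replication. Data Grid clusters use this service to communicate and perform backup operations. You can set the value to NodePort, LoadBalancer, or Route. |
spec.service.sites.local.expose.nodePort | Specifies a static port within the default range of 30000 to 32767 if you expose Data Grid through a NodePort service. |
spec.service.sites.local.expose.port | Specifies the network port for the service if you expose Data Grid through a LoadBalancer service. |
spec.service.sites.local.expose.routeHostName | Specifies a custom hostname if you expose Data Grid through an OpenShift Route. |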
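Field | Description |
---|---|
spec.service.sites.locations | Provides connection information for all backup locations. |
spec.service.sites.locations.name | Specifies a backup location that matches the spec.service.sites.local.name value at that site. |
spec.service.sites.locations.url | Specifies the URL of the Kubernetes API for managed connections or a static URL for manual connections. The examples in this chapter use the openshift:// scheme for managed connections and the infinispan+xsite:// scheme for manual connections. |
spec.service.sites.locations.secretName | Specifies the secret that contains the service account token for the backup site. |
spec.service.sites.locations.clusterName | Specifies the cluster name at the backup location if it is different to the cluster name at the local site. |
spec.service.sites.locations.namespace | Specifies the namespace of the Data Grid cluster at the backup location if it does not match the namespace at the local site. |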
Managed cross-site connections
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
        maxRelayNodes: 1
      locations:
        - name: NYC
          clusterName: <nyc_cluster_name>
          namespace: <nyc_cluster_namespace>
          url: openshift://api.site-b.devcluster.openshift.com:6443
          secretName: nyc-token
Manual cross-site connections
spec:
  service:
    type: DataGrid
    sites:
      local:
        name: LON
        expose:
          type: LoadBalancer
          port: 65535
        maxRelayNodes: 1
      locations:
        - name: NYC
          url: infinispan+xsite://infinispan-nyc.myhost.com:7900
13.7. Securing cross-site connections
Add keystores and trust stores so that Data Grid clusters can secure cross-site replication traffic.
You must add a keystore to use an OpenShift Route as the expose type for cross-site replication. Securing cross-site connections is optional if you use a NodePort or LoadBalancer as the expose type.
Cross-site replication does not support the OpenShift CA service. You must provide your own certificates.
Prerequisites
- Have a PKCS12 keystore that Data Grid can use to encrypt and decrypt RELAY messages.
  You must provide a keystore for relay pods and router pods to secure cross-site connections.
  The keystore can be the same for relay pods and router pods or you can provide separate keystores for each.
  You can also use the same keystore for each Data Grid cluster or a unique keystore for each cluster.
- Have a PKCS12 trust store that contains part of the certificate chain or root CA certificate that verifies public certificates for Data Grid relay pods and router pods.
Procedure
- Create cross-site encryption secrets.
  - Create keystore secrets.
  - Create trust store secrets.
- Modify the Infinispan CR for each Data Grid cluster to specify the secret name for the encryption.transportKeyStore.secretName and encryption.routerKeyStore.secretName fields.
- Configure any other fields to encrypt RELAY messages as required and then apply the changes.
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
spec:
  replicas: 2
  version: <Data Grid_version>
  expose:
    type: LoadBalancer
  service:
    type: DataGrid
    sites:
      local:
        name: SiteA
        # ...
        encryption:
          protocol: TLSv1.3
          transportKeyStore:
            secretName: transport-tls-secret
            alias: transport
            filename: keystore.p12
          routerKeyStore:
            secretName: router-tls-secret
            alias: router
            filename: keystore.p12
          trustStore:
            secretName: truststore-tls-secret
            filename: truststore.p12
      locations:
        # ...
13.7.1. Resources for configuring cross-site encryption
The following tables provide fields and descriptions for encrypting cross-site connections.
Field | Description |
---|---|
encryption.protocol | Specifies the TLS protocol to use for cross-site connections. |
encryption.transportKeyStore | Configures a keystore secret for relay pods. |
encryption.routerKeyStore | Configures a keystore secret for router pods. |
encryption.trustStore | Configures a trust store secret for relay pods and router pods. |
Field | Description |
---|---|
encryption.transportKeyStore.secretName | Specifies the secret that contains a keystore that relay pods can use to encrypt and decrypt RELAY messages. This field is required. |
encryption.transportKeyStore.alias | Optionally specifies the alias of the certificate in the keystore. |
encryption.transportKeyStore.filename | Optionally specifies the filename of the keystore. |
Field | Description |
---|---|
encryption.routerKeyStore.secretName | Specifies the secret that contains a keystore that router pods can use to encrypt and decrypt RELAY messages. This field is required. |
encryption.routerKeyStore.alias | Optionally specifies the alias of the certificate in the keystore. |
encryption.routerKeyStore.filename | Optionally specifies the filename of the keystore. |
Field | Description |
---|---|
encryption.trustStore.secretName | Specifies the secret that contains a trust store to verify public certificates for relay pods and router pods. This field is required. |
encryption.trustStore.filename | Optionally specifies the filename of the trust store. |
13.7.2. Cross-site encryption secrets
Cross-site replication encryption secrets add keystores and trust stores for securing cross-site connections.
Cross-site encryption secrets
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
type: Opaque
stringData:
  password: changeme
  type: pkcs12
data:
  <file-name>: "MIIKDgIBAzCCCdQGCSqGSIb3DQEHA..."
Field | Description |
---|---|
stringData.password | Specifies the password for the keystore or trust store. |
stringData.type | Optionally specifies the keystore or trust store type. |
data.<file-name> | Adds a base64-encoded keystore or trust store. |
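As a concrete instance of the template above, a keystore secret for relay pods might look like the following. The sketch assumes the transport-tls-secret name and keystore.p12 filename from the example Infinispan CR in the section on securing cross-site connections. The password and the truncated data value are the same placeholders as in the template, not real credentials.

```yaml
# Sketch only: name and file key follow the example Infinispan CR; values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: transport-tls-secret
type: Opaque
stringData:
  password: changeme
  type: pkcs12
data:
  # The key must match the filename configured for transportKeyStore.
  keystore.p12: "MIIKDgIBAzCCCdQGCSqGSIb3DQEHA..."
```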
13.8. Configuring sites in the same OpenShift cluster
For evaluation and demonstration purposes, you can configure Data Grid to back up between pods in the same OpenShift cluster.
Using ClusterIP as the expose type for cross-site replication is intended for demonstration purposes only. Use this expose type only for a temporary proof-of-concept deployment, for example on a laptop.
Procedure
- Create an Infinispan CR for each Data Grid cluster.
- Specify the name of the local site with spec.service.sites.local.name.
- Set ClusterIP as the value of the spec.service.sites.local.expose.type field.
- Provide the name of the Data Grid cluster that acts as a backup location with spec.service.sites.locations.clusterName.
- If both Data Grid clusters have the same name, specify the namespace of the backup location with spec.service.sites.locations.namespace.
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-clustera
spec:
  replicas: 1
  expose:
    type: LoadBalancer
  service:
    type: DataGrid
    sites:
      local:
        name: SiteA
        expose:
          type: ClusterIP
        maxRelayNodes: 1
      locations:
        - name: SiteB
          clusterName: example-clusterb
          namespace: cluster-namespace
- Configure your Infinispan CRs with any other Data Grid service resources and then apply the changes.
- Verify that Data Grid clusters form a cross-site view.
  - Retrieve the Infinispan CR.
    oc get infinispan -o yaml
  - Check for the type: CrossSiteViewFormed condition.
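The procedure shows only the SiteA side. A matching CR for SiteB could look like the following sketch, which simply mirrors the SiteA example. It assumes that example-clustera runs in the same cluster-namespace namespace, so adjust the namespace value if your clusters differ.

```yaml
# Sketch only: mirrors the SiteA example; the namespace is an assumption.
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-clusterb
spec:
  replicas: 1
  expose:
    type: LoadBalancer
  service:
    type: DataGrid
    sites:
      local:
        name: SiteB
        expose:
          type: ClusterIP
        maxRelayNodes: 1
      locations:
        - name: SiteA
          clusterName: example-clustera
          namespace: cluster-namespace
```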