Chapter 18. Backing up and restoring Data Grid clusters
Data Grid Operator lets you back up and restore Data Grid cluster state for disaster recovery and to migrate Data Grid resources between clusters.
18.1. Backup and Restore CRs
`Backup` and `Restore` CRs save in-memory data at runtime so you can easily recreate Data Grid clusters.
Applying a `Backup` or `Restore` CR creates a new pod that joins the Data Grid cluster as a zero-capacity member, which means it does not require cluster rebalancing or state transfer to join.
For backup operations, the pod iterates over cache entries and other resources and creates an archive, a `.zip` file, in the `/opt/infinispan/backups` directory on the persistent volume (PV).
Performing backups does not significantly impact performance because the other pods in the Data Grid cluster only need to respond to the backup pod as it iterates over cache entries.
For restore operations, the pod retrieves Data Grid resources from the archive on the PV and applies them to the Data Grid cluster.
When either the backup or restore operation completes, the pod leaves the cluster and is terminated.
Reconciliation
Data Grid Operator does not reconcile `Backup` and `Restore` CRs, which means that backup and restore operations are "one-time" events.
Modifying an existing `Backup` or `Restore` CR instance does not perform an operation or have any effect. If you want to update `.spec` fields, you must create a new instance of the `Backup` or `Restore` CR.
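As a minimal sketch of what "create a new instance" can look like in practice, the following manifest defines a second `Backup` CR instead of editing the first one. The name `my-backup-2` and the changed storage size are hypothetical values chosen for illustration; the field layout follows the `Backup` example used later in this chapter.

```yaml
# A new Backup CR instance with updated .spec fields.
# Editing the existing "my-backup" CR would have no effect,
# so the change is carried by a CR with a different name.
apiVersion: infinispan.org/v2alpha1
kind: Backup
metadata:
  name: my-backup-2        # hypothetical name, must differ from the existing CR
spec:
  cluster: source-cluster
  volume:
    storage: 2Gi           # hypothetical updated value
    storageClassName: my-storage-class
```

Applying this manifest should trigger a separate one-time backup operation, while the original `my-backup` CR is left unchanged.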
18.2. Backing up Data Grid clusters
Create a backup file that stores Data Grid cluster state to a persistent volume.
Prerequisites
- Create an `Infinispan` CR with `spec.service.type: DataGrid`.
- Ensure there are no active client connections to the Data Grid cluster.
  Data Grid backups do not provide snapshot isolation, and data modifications are not written to the archive after the cache is backed up.
  To archive the exact state of the cluster, you should always disconnect any clients before you back it up.
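For reference, an `Infinispan` CR that satisfies the first prerequisite might look like the following sketch. The cluster name `source-cluster` matches the name used in the procedure below, and the replica count is an assumed value for illustration only.

```yaml
# Minimal Infinispan CR for a Data Grid service cluster.
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: source-cluster     # referenced later by spec.cluster in the Backup CR
spec:
  replicas: 2              # assumed value for illustration
  service:
    type: DataGrid         # required for backup and restore
```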
Procedure
- Name the `Backup` CR with the `metadata.name` field.
- Specify the Data Grid cluster to back up with the `spec.cluster` field.
- Configure the persistent volume claim (PVC) that adds the backup archive to the persistent volume (PV) with the `spec.volume.storage` and `spec.volume.storageClassName` fields.

      apiVersion: infinispan.org/v2alpha1
      kind: Backup
      metadata:
        name: my-backup
      spec:
        cluster: source-cluster
        volume:
          storage: 1Gi
          storageClassName: my-storage-class

- Optionally include `spec.resources` fields to specify which Data Grid resources you want to back up.
  If you do not include any `spec.resources` fields, the `Backup` CR creates an archive that contains all Data Grid resources. If you do specify `spec.resources` fields, the `Backup` CR creates an archive that contains those resources only.

      spec:
        ...
        resources:
          templates:
            - distributed-sync-prod
            - distributed-sync-dev
          caches:
            - cache-one
            - cache-two
          counters:
            - counter-name
          protoSchemas:
            - authors.proto
            - books.proto
          tasks:
            - wordStream.js

  You can also use the `*` wildcard character as in the following example:

      spec:
        ...
        resources:
          caches:
            - "*"
          protoSchemas:
            - "*"

- Apply your `Backup` CR.

      oc apply -f my-backup.yaml

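Putting the steps above together, a complete `my-backup.yaml` file could look like the following sketch. It only combines the values already shown in this procedure; which resources to list, if any, depends on your cluster.

```yaml
# my-backup.yaml: backs up two caches and all Protobuf schemas
# from "source-cluster" to a 1Gi persistent volume.
apiVersion: infinispan.org/v2alpha1
kind: Backup
metadata:
  name: my-backup
spec:
  cluster: source-cluster
  volume:
    storage: 1Gi
    storageClassName: my-storage-class
  resources:               # omit this block to back up all resources
    caches:
      - cache-one
      - cache-two
    protoSchemas:
      - "*"
```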
Verification
Check that the `status.phase` field has a status of `Succeeded` in the `Backup` CR and that Data Grid logs have the following message:

    ISPN005044: Backup file created 'my-backup.zip'

Run the following command to check that the backup is successfully created:

    oc describe Backup my-backup

18.3. Restoring Data Grid clusters
Restore Data Grid cluster state from a backup archive.
Prerequisites
- Create a `Backup` CR on a source cluster.
- Create a target Data Grid cluster of Data Grid service pods.
  Note: If you restore an existing cache, the operation overwrites the data in the cache but not the cache configuration.
  For example, you back up a distributed cache named `mycache` on the source cluster. You then restore `mycache` on a target cluster where it already exists as a replicated cache. In this case, the data from the source cluster is restored and `mycache` continues to have a replicated configuration on the target cluster.
- Ensure there are no active client connections to the target Data Grid cluster you want to restore.
  Cache entries that you restore from a backup can overwrite more recent cache entries.
  For example, a client performs a `cache.put(k=2)` operation and you then restore a backup that contains `k=1`.
Procedure
- Name the `Restore` CR with the `metadata.name` field.
- Specify a `Backup` CR to use with the `spec.backup` field.
- Specify the Data Grid cluster to restore with the `spec.cluster` field.

      apiVersion: infinispan.org/v2alpha1
      kind: Restore
      metadata:
        name: my-restore
      spec:
        backup: my-backup
        cluster: target-cluster

- Optionally add the `spec.resources` field to restore specific resources only.

      spec:
        ...
        resources:
          templates:
            - distributed-sync-prod
            - distributed-sync-dev
          caches:
            - cache-one
            - cache-two
          counters:
            - counter-name
          protoSchemas:
            - authors.proto
            - books.proto
          tasks:
            - wordStream.js

- Apply your `Restore` CR.

      oc apply -f my-restore.yaml

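As with the backup procedure, the steps above can be combined into a single manifest. The following sketch of `my-restore.yaml` restores only one cache and one counter from the `my-backup` archive; the selection of resources is an illustrative assumption, while the names come from the examples in this chapter.

```yaml
# my-restore.yaml: restores a subset of the "my-backup" archive
# to "target-cluster".
apiVersion: infinispan.org/v2alpha1
kind: Restore
metadata:
  name: my-restore
spec:
  backup: my-backup        # name of an existing Backup CR
  cluster: target-cluster
  resources:               # omit this block to restore everything in the archive
    caches:
      - cache-one
    counters:
      - counter-name
```

Because an existing cache keeps its configuration on the target cluster, restoring `cache-one` this way replaces its data only.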
Verification
Check that the `status.phase` field has a status of `Succeeded` in the `Restore` CR and that Data Grid logs have the following message:

    ISPN005045: Restore 'my-backup' complete

You should then open the Data Grid Console or establish a CLI connection to verify data and Data Grid resources are restored as expected.
18.4. Backup and restore status
`Backup` and `Restore` CRs include a `status.phase` field that provides the status for each phase of the operation.
| Status | Description |
|---|---|
| `Initializing` | The system has accepted the request and the controller is preparing the underlying resources to create the pod. |
| `Initialized` | The controller has prepared all underlying resources successfully. |
| `Running` | The pod is created and the operation is in progress on the Data Grid cluster. |
| `Succeeded` | The operation has completed successfully on the Data Grid cluster and the pod is terminated. |
| `Failed` | The operation did not successfully complete and the pod is terminated. |
| `Unknown` | The controller cannot obtain the status of the pod or determine the state of the operation. This condition typically indicates a temporary communication error with the pod. |
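To show where the field appears, the following is a sketch of the relevant part of a `Backup` CR after a successful operation, as you might see it when you print the CR as YAML. Only `status.phase` is documented here; any other status fields the Operator adds are omitted, and the exact output on your cluster may differ.

```yaml
# Excerpt of a Backup CR after the operation completes.
apiVersion: infinispan.org/v2alpha1
kind: Backup
metadata:
  name: my-backup
spec:
  cluster: source-cluster
status:
  phase: Succeeded         # set by the Operator, not by the user
```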
18.4.1. Handling failed backup and restore operations
If the `status.phase` field of the `Backup` or `Restore` CR is `Failed`, you should examine pod logs to determine the root cause before you attempt the operation again.
Procedure
- Examine the logs for the pod that performed the failed operation.
  Pods are terminated but remain available until you delete the `Backup` or `Restore` CR.

      oc logs <backup|restore_pod_name>

- Resolve any error conditions or other causes of failure as indicated by the pod logs.
- Create a new instance of the `Backup` or `Restore` CR and attempt the operation again.