Chapter 14. Backing up and restoring Data Grid clusters


Data Grid Operator lets you back up and restore Data Grid cluster state for disaster recovery and to migrate Data Grid resources between clusters.

14.1. Backup and Restore CRs

Backup and Restore CRs save in-memory data at runtime so you can easily recreate Data Grid clusters.

Applying a Backup or Restore CR creates a new pod that joins the Data Grid cluster as a zero-capacity member, which means it does not require cluster rebalancing or state transfer to join.

For backup operations, the pod iterates over cache entries and other resources and creates an archive, a .zip file, in the /opt/infinispan/backups directory on the persistent volume (PV).

Note

Performing backups does not significantly impact performance because the other pods in the Data Grid cluster only need to respond to the backup pod as it iterates over cache entries.

For restore operations, the pod retrieves Data Grid resources from the archive on the PV and applies them to the Data Grid cluster.

When either the backup or restore operation completes, the pod leaves the cluster and is terminated.

Reconciliation

Data Grid Operator does not reconcile Backup and Restore CRs, which means that backup and restore operations are "one-time" events.

Modifying an existing Backup or Restore CR instance does not perform an operation or have any effect. If you want to update .spec fields, you must create a new instance of the Backup or Restore CR.
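
Because an existing CR is never run again, a repeatable backup needs a fresh metadata.name for every run. The following shell sketch shows one way to do that. Note that render_backup_cr is a hypothetical helper and not part of Data Grid Operator, and the timestamp suffix in the usage comment is only one possible naming scheme. The manifest fields mirror the Backup CR fields used in this chapter.

```shell
# Hypothetical helper: print a Backup CR manifest to stdout.
# Arguments: CR name, Data Grid cluster name, storage size, storage class.
render_backup_cr() {
  name="$1"; cluster="$2"; storage="$3"; storage_class="$4"
  cat <<EOF
apiVersion: infinispan.org/v2alpha1
kind: Backup
metadata:
  name: ${name}
spec:
  cluster: ${cluster}
  volume:
    storage: ${storage}
    storageClassName: ${storage_class}
EOF
}

# Usage (not run here): give each run a new name, then apply the manifest.
# render_backup_cr "my-backup-$(date +%Y%m%d%H%M%S)" source-cluster 1Gi my-storage-class | oc apply -f -
```

Generating the manifest instead of editing a saved file avoids accidentally re-applying a CR name that already exists, which would have no effect.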

14.2. Backing up Data Grid clusters

Create a backup file that stores Data Grid cluster state to a persistent volume.

Prerequisites

  • Create an Infinispan CR with spec.service.type: DataGrid.
  • Ensure there are no active client connections to the Data Grid cluster.

    Data Grid backups do not provide snapshot isolation, so modifications made to a cache after it has been backed up are not written to the archive.
    To archive the exact state of the cluster, you should always disconnect any clients before you back it up.

Procedure

  1. Name the Backup CR with the metadata.name field.
  2. Specify the Data Grid cluster to back up with the spec.cluster field.
  3. Configure the persistent volume claim (PVC) that adds the backup archive to the persistent volume (PV) with the spec.volume.storage and spec.volume.storageClassName fields.

    apiVersion: infinispan.org/v2alpha1
    kind: Backup
    metadata:
      name: my-backup
    spec:
      cluster: source-cluster
      volume:
        storage: 1Gi
        storageClassName: my-storage-class
  4. Optionally include spec.resources fields to specify which Data Grid resources you want to back up.

    If you do not include any spec.resources fields, the Backup CR creates an archive that contains all Data Grid resources. If you do specify spec.resources fields, the Backup CR creates an archive that contains those resources only.

    spec:
      ...
      resources:
        templates:
          - distributed-sync-prod
          - distributed-sync-dev
        caches:
          - cache-one
          - cache-two
        counters:
          - counter-name
        protoSchemas:
          - authors.proto
          - books.proto
        tasks:
          - wordStream.js

    You can also use the * wildcard character as in the following example:

    spec:
      ...
      resources:
        caches:
          - "*"
        protoSchemas:
          - "*"
  5. Apply your Backup CR.

    $ oc apply -f my-backup.yaml

Verification

  1. Check that the status.phase field of the Backup CR is Succeeded and that the Data Grid logs contain the following message:

    ISPN005044: Backup file created 'my-backup.zip'
  2. Run the following command to check that the backup is successfully created:

    $ oc describe Backup my-backup
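
If you script the verification, you can poll status.phase instead of reading the oc describe output by hand. The following is a minimal sketch, assuming that oc is logged in to the right project and that the Backup kind resolves unambiguously (if another operator defines a kind with the same name, you might need a fully qualified resource name). The wait_for_cr function, its default of 60 attempts, and its 5-second delay are illustrative choices and not part of Data Grid Operator.

```shell
# Hypothetical helper: poll a Backup or Restore CR until it reaches a
# terminal phase. Returns 0 on Succeeded, 1 on Failed, 2 on timeout.
# Arguments: kind, CR name, max attempts (default 60), delay in seconds (default 5).
wait_for_cr() {
  kind="$1"; name="$2"; attempts="${3:-60}"; delay="${4:-5}"
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    # Read only the status.phase field of the CR.
    phase="$(oc get "$kind" "$name" -o jsonpath='{.status.phase}' 2>/dev/null)"
    case "$phase" in
      Succeeded) echo "$kind/$name: Succeeded"; return 0 ;;
      Failed)    echo "$kind/$name: Failed" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$kind/$name: timed out (last phase: ${phase:-none})" >&2
  return 2
}

# Usage (not run here):
# wait_for_cr Backup my-backup
```

The function treats every phase other than Succeeded and Failed as "still in progress", so a transient Unknown phase does not end the wait early.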

14.3. Restoring Data Grid clusters

Restore Data Grid cluster state from a backup archive.

Prerequisites

  • Create a Backup CR on a source cluster.
  • Create a target Data Grid cluster of Data Grid service pods.

    Note

    If you restore an existing cache, the operation overwrites the data in the cache but not the cache configuration.

    For example, you back up a distributed cache named mycache on the source cluster. You then restore mycache on a target cluster where it already exists as a replicated cache. In this case, the data from the source cluster is restored and mycache continues to have a replicated configuration on the target cluster.

  • Ensure there are no active client connections to the target Data Grid cluster you want to restore.

    Cache entries that you restore from a backup can overwrite more recent cache entries.
    For example, a client performs a cache.put(k=2) operation and you then restore a backup that contains k=1, which replaces the newer value.

Procedure

  1. Name the Restore CR with the metadata.name field.
  2. Specify a Backup CR to use with the spec.backup field.
  3. Specify the Data Grid cluster to restore with the spec.cluster field.

    apiVersion: infinispan.org/v2alpha1
    kind: Restore
    metadata:
      name: my-restore
    spec:
      backup: my-backup
      cluster: target-cluster
  4. Optionally add the spec.resources field to restore specific resources only.

    spec:
      ...
      resources:
        templates:
          - distributed-sync-prod
          - distributed-sync-dev
        caches:
          - cache-one
          - cache-two
        counters:
          - counter-name
        protoSchemas:
          - authors.proto
          - books.proto
        tasks:
          - wordStream.js
  5. Apply your Restore CR.

    $ oc apply -f my-restore.yaml

Verification

  • Check that the status.phase field of the Restore CR is Succeeded and that the Data Grid logs contain the following message:

    ISPN005045: Restore 'my-backup' complete

You should then open the Data Grid Console or establish a CLI connection to verify data and Data Grid resources are restored as expected.
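
The log check can also be scripted. The following sketch searches log text for the restore message shown above. The restore_logged function is a hypothetical helper, and <pod_name> in the usage comment is a placeholder for whichever Data Grid pod you read logs from.

```shell
# Hypothetical helper: succeed if the log text on stdin contains the
# restore-complete message for the given backup name.
# The message format is copied from the verification step in this section.
restore_logged() {
  grep -F -q "ISPN005045: Restore '$1' complete"
}

# Usage (not run here; <pod_name> is a placeholder):
# oc logs <pod_name> | restore_logged my-backup && echo "restore message found"
```

The -F option makes grep match the message as a fixed string, so characters in the backup name are never interpreted as a pattern.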

14.4. Backup and restore status

Backup and Restore CRs include a status.phase field that provides the status for each phase of the operation.

Status        Description

Initializing  The system has accepted the request and the controller is preparing the underlying resources to create the pod.
Initialized   The controller has prepared all underlying resources successfully.
Running       The pod is created and the operation is in progress on the Data Grid cluster.
Succeeded     The operation has completed successfully on the Data Grid cluster and the pod is terminated.
Failed        The operation did not successfully complete and the pod is terminated.
Unknown       The controller cannot obtain the status of the pod or determine the state of the operation. This condition typically indicates a temporary communication error with the pod.

If the status.phase field of the Backup or Restore CR is Failed, you should examine pod logs to determine the root cause before you attempt the operation again.

Procedure

  1. Examine the logs for the pod that performed the failed operation.

    Pods are terminated but remain available until you delete the Backup or Restore CR.

    $ oc logs <backup|restore_pod_name>
  2. Resolve any error conditions or other causes of failure as indicated by the pod logs.
  3. Create a new instance of the Backup or Restore CR and attempt the operation again.
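
The first step of this procedure can be sketched as a small shell function that prints the pod logs only when the CR reports Failed. The triage_cr function is hypothetical, and the pod name is passed in explicitly because this chapter does not state how the backup or restore pod is named.

```shell
# Hypothetical helper: print the pod logs if the CR phase is Failed,
# otherwise report the current phase.
# Arguments: kind (Backup or Restore), CR name, name of the pod that ran the operation.
triage_cr() {
  kind="$1"; name="$2"; pod="$3"
  phase="$(oc get "$kind" "$name" -o jsonpath='{.status.phase}')"
  if [ "$phase" = "Failed" ]; then
    # The terminated pod stays available until the CR is deleted.
    oc logs "$pod"
  else
    echo "$kind/$name phase is ${phase:-unset}; inspect logs only after Failed"
  fi
}

# Usage (not run here; the pod name is a placeholder):
# triage_cr Restore my-restore <restore_pod_name>
```

Collect the logs before you delete the failed CR, because the pod is removed with it.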