Chapter 32. Cluster recovery from persistent volumes


You can recover a Kafka cluster from persistent volumes (PVs) if they are still present.

32.1. Cluster recovery scenarios

Recovering from PVs is possible in the following scenarios:

  • Unintentional deletion of a namespace
  • Loss of an entire OpenShift cluster while PVs remain in the infrastructure

The recovery procedure for both scenarios is to recreate the original PersistentVolumeClaim (PVC) resources.

32.1.1. Recovering from namespace deletion

When you delete a namespace, all resources within that namespace, including PVCs, pods, and services, are deleted. If the persistentVolumeReclaimPolicy property in the PV resource specification is set to Retain, the PV retains its data and is not deleted. This configuration allows you to recover from namespace deletion.

PV configuration to retain data

apiVersion: v1
kind: PersistentVolume
# ...
spec:
  # ...
  persistentVolumeReclaimPolicy: Retain

Alternatively, PVs can inherit the reclaim policy from an associated storage class. Storage classes are used for dynamic volume allocation.

When you configure the reclaimPolicy property for the storage class, PVs created with this class use the specified reclaim policy. The storage class is assigned to the PV using the storageClassName property.

Storage class configuration to retain data

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-retain
parameters:
  # ...
# ...
reclaimPolicy: Retain

Storage class specified for PV

apiVersion: v1
kind: PersistentVolume
# ...
spec:
  # ...
  storageClassName: gp2-retain

Note

When using Retain as the reclaim policy, you must manually delete PVs if you intend to delete the entire cluster.
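
If a PV already exists with a different reclaim policy, the policy can be changed in place before anything is deleted. The following is a minimal sketch, assuming the oc client is logged in with permission to patch PVs. The function name retain_pv is hypothetical and used only for illustration.

```shell
# Sketch: switch an existing PV to the Retain reclaim policy.
# retain_pv is a hypothetical helper name, not part of any product tooling.
retain_pv() {
  local pv_name="$1"
  oc patch pv "$pv_name" \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Usage:
#   retain_pv <pv_name>
```

Patching the PV directly affects only that volume. PVs that are provisioned later still take their policy from the storage class.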

32.1.2. Recovering from cluster loss

If you lose the entire OpenShift cluster, all resources, including PVs, PVCs, and namespaces, are lost. However, recovery is possible if the physical storage backing the PVs remains intact.

To recover, you need to set up a new OpenShift cluster and manually reconfigure the PVs to use the existing storage.
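
As an illustration of reconfiguring a PV to use existing storage, the sketch below prints a PV manifest that points at a surviving volume. It reuses the field layout of the AWS EBS example PV shown in this chapter. The function name render_pv and its arguments are hypothetical, and the sketch assumes that the new cluster accepts the awsElasticBlockStore volume type. Other storage backends, including CSI drivers, need a different volume source block, and zone-bound volumes usually also need a nodeAffinity section, which is omitted here for brevity.

```shell
# Sketch: print a PV manifest that reuses an existing storage volume.
# render_pv and its arguments are hypothetical names for illustration.
# Assumes an AWS EBS volume, as in the example PV in this chapter.
render_pv() {
  local pv_name="$1"
  local volume_id="$2"
  local size="$3"
  local storage_class="$4"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${pv_name}
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: xfs
    volumeID: ${volume_id}
  capacity:
    storage: ${size}
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ${storage_class}
  volumeMode: Filesystem
EOF
}

# Usage:
#   render_pv <pv_name> <volume_id> 100Gi gp2-retain | oc apply -f -
```

Keeping the original PV names in the new cluster means that the volumeName values in the recreated PVCs do not have to change.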

32.2. Recovering a deleted Kafka cluster

This procedure describes how to recover a deleted cluster from persistent volumes (PVs) by recreating the original PersistentVolumeClaim (PVC) resources.

If the Topic Operator and User Operator are deployed, you can recover KafkaTopic and KafkaUser resources by recreating them. Recreate the KafkaTopic resources with the same configurations as the originals; otherwise, the Topic Operator tries to update the topics in Kafka. This procedure shows how to recreate both resources.

Warning

If the User Operator is enabled and Kafka users are not recreated, users are deleted from the Kafka cluster immediately after recovery.

Before you begin

In this procedure, it is essential that each PV is bound to the correct PVC to avoid data corruption. Each PVC specifies a volumeName, which must match the name of the PV that holds its data.

For more information, see Persistent storage.
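
One way to avoid a mismatch is to look up each volumeName from the claim recorded on the PV rather than copying it by hand. The sketch below assumes that the PVs still carry their original claimRef and that the PV listing is produced with the custom-columns output shown in the comment. The function name pv_for_claim is hypothetical.

```shell
# Sketch: look up the PV that was bound to a given PVC.
# Expects input with three columns (PV name, namespace, claim), such as:
#   oc get pv --no-headers \
#     -o custom-columns=PV:.metadata.name,NS:.spec.claimRef.namespace,CLAIM:.spec.claimRef.name
# pv_for_claim is a hypothetical helper name.
pv_for_claim() {
  local namespace="$1"
  local claim="$2"
  awk -v ns="$namespace" -v claim="$claim" \
    '$2 == ns && $3 == claim { print $1 }'
}

# Usage:
#   oc get pv --no-headers -o custom-columns=... \
#     | pv_for_claim <namespace> data-0-my-cluster-kafka-0
```

If the function prints nothing, no PV records that claim, and the PVC should not be recreated until the correct PV is identified.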

Procedure

  1. Check information on the PVs in the cluster:

    oc get pv

    Information is presented for PVs with data.

    Example PV output

    NAME                                          RECLAIMPOLICY  CLAIM
    pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ...  Retain ...     myproject/data-my-cluster-zookeeper-1
    pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-my-cluster-zookeeper-0
    pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-my-cluster-zookeeper-2
    pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ...  Retain ...     myproject/data-0-my-cluster-kafka-0
    pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ...  Retain ...     myproject/data-0-my-cluster-kafka-1
    pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-0-my-cluster-kafka-2

    • NAME is the name of each PV.
    • RECLAIMPOLICY shows that PVs are retained, meaning that the PV is not automatically deleted when the PVC is deleted.
    • CLAIM shows the link to the original PVCs.
  2. Recreate the original namespace:

    oc create namespace my-project

    Here, we recreate the my-project namespace. Substitute the name of the namespace in which the cluster was originally deployed.

  3. Recreate the original PVC resource specifications, linking the PVCs to the appropriate PV:

    Example PVC resource specification

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-0-my-cluster-kafka-0
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp2-retain
      volumeMode: Filesystem
      volumeName: pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c

  4. Edit the PV specifications to delete the claimRef properties that bound the original PVC.

    Example PV specification

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      annotations:
        kubernetes.io/createdby: aws-ebs-dynamic-provisioner
        pv.kubernetes.io/bound-by-controller: "yes"
        pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
      creationTimestamp: "<date>"
      finalizers:
      - kubernetes.io/pv-protection
      labels:
        failure-domain.beta.kubernetes.io/region: eu-west-1
        failure-domain.beta.kubernetes.io/zone: eu-west-1c
      name: pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
      resourceVersion: "39431"
      selfLink: /api/v1/persistentvolumes/pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
      uid: 7efe6b0d-3317-11ea-a650-06e1eadd9a4c
    spec:
      accessModes:
      - ReadWriteOnce
      awsElasticBlockStore:
        fsType: xfs
        volumeID: aws://eu-west-1c/vol-09db3141656d1c258
      capacity:
        storage: 100Gi
      claimRef:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: data-0-my-cluster-kafka-2
        namespace: myproject
        resourceVersion: "39113"
        uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
              - eu-west-1c
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
              - eu-west-1
      persistentVolumeReclaimPolicy: Retain
      storageClassName: gp2-retain
      volumeMode: Filesystem

    In the example, the following properties are deleted:

    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: data-0-my-cluster-kafka-2
      namespace: myproject
      resourceVersion: "39113"
      uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
  5. Deploy the Cluster Operator:

    oc create -f install/cluster-operator -n my-project
  6. Recreate all KafkaTopic resources by applying the KafkaTopic resource configuration:

    oc apply -f <topic_configuration_file>
  7. Recreate all KafkaUser resources:

    1. If user passwords and certificates need to be retained, recreate the user secrets before recreating the KafkaUser resources.

      If the secrets are not recreated, the User Operator will generate new credentials automatically. Ensure that the recreated secrets have exactly the same name, labels, and fields as the original secrets.

    2. Apply the KafkaUser resource configuration:

      oc apply -f <user_configuration_file>
  8. Deploy the Kafka cluster using the original configuration for the Kafka resource:

    oc apply -f <kafka_resource_configuration>.yaml -n my-project
  9. Verify the recovery of the KafkaTopic resources:

    oc get kafkatopics -o wide -w -n my-project

    Kafka topic status

    NAME         CLUSTER     PARTITIONS  REPLICATION FACTOR READY
    my-topic-1   my-cluster  10          3                  True
    my-topic-2   my-cluster  10          3                  True
    my-topic-3   my-cluster  10          3                  True

    KafkaTopic custom resource creation is successful when the READY output shows True.

  10. Verify the recovery of the KafkaUser resources:

    oc get kafkausers -o wide -w -n my-project

    Kafka user status

    NAME       CLUSTER     AUTHENTICATION  AUTHORIZATION READY
    my-user-1  my-cluster  tls             simple        True
    my-user-2  my-cluster  tls             simple        True
    my-user-3  my-cluster  tls             simple        True

    KafkaUser custom resource creation is successful when the READY output shows True.
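
Recreating the PVCs and removing the claimRef properties are repeated for every volume, so these steps lend themselves to scripting. The following is a minimal sketch, not a supported tool. The helper names render_pvc, release_pv, and wait_ready are hypothetical. The storage size and storage class are passed as arguments and should match the original PVCs. The JSON patch assumes that the PV still has a spec.claimRef to remove, and wait_ready assumes that the resources report a Ready condition, as the READY column in the example output suggests.

```shell
# render_pvc prints a PVC manifest pinned to a specific PV through volumeName.
render_pvc() {
  local claim="$1"
  local pv_name="$2"
  local size="$3"
  local storage_class="$4"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${claim}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
  storageClassName: ${storage_class}
  volumeMode: Filesystem
  volumeName: ${pv_name}
EOF
}

# release_pv removes spec.claimRef so that the PV can bind to the recreated PVC.
release_pv() {
  oc patch pv "$1" --type json \
    -p '[{"op":"remove","path":"/spec/claimRef"}]'
}

# wait_ready blocks until all resources of a kind report Ready, or times out.
wait_ready() {
  oc wait "$1" --all --for=condition=Ready --timeout=300s -n "$2"
}

# Usage:
#   render_pvc data-0-my-cluster-kafka-0 <pv_name> 100Gi gp2-retain \
#     | oc apply -n <namespace> -f -
#   release_pv <pv_name>
#   wait_ready kafkatopic <namespace>
```

Unlike the watch commands in the verification steps, wait_ready returns a nonzero exit status on timeout, which makes it easier to use in an automated recovery script.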

© 2025 Red Hat