Chapter 34. Cluster recovery from persistent volumes


You can recover a Kafka cluster from persistent volumes (PVs) if they are still present.

34.1. Cluster recovery scenarios

Recovering from PVs is possible in the following scenarios:

  • Unintentional deletion of a namespace
  • Loss of an entire OpenShift cluster while PVs remain in the infrastructure

The recovery procedure for both scenarios is to recreate the original PersistentVolumeClaim (PVC) resources.

34.1.1. Recovering from namespace deletion

When you delete a namespace, all resources within that namespace—including PVCs, pods, and services—are deleted. If the reclaimPolicy for the PV resource specification is set to Retain, the PV retains its data and is not deleted. This configuration allows you to recover from namespace deletion.

PV configuration to retain data

apiVersion: v1
kind: PersistentVolume
# ...
spec:
  # ...
  persistentVolumeReclaimPolicy: Retain

Alternatively, PVs can inherit the reclaim policy from an associated storage class. Storage classes are used for dynamic volume allocation.

By configuring the reclaimPolicy property for the storage class, PVs created with this class use the specified reclaim policy. The storage class is assigned to the PV using the storageClassName property.

Storage class configuration to retain data

apiVersion: v1
kind: StorageClass
metadata:
  name: gp2-retain
parameters:
  # ...
# ...
reclaimPolicy: Retain

Storage class specified for PV

apiVersion: v1
kind: PersistentVolume
# ...
spec:
  # ...
  storageClassName: gp2-retain

Note

When using Retain as the reclaim policy, you must manually delete PVs if you intend to delete the entire cluster.

34.1.2. Recovering from cluster loss

If you lose the entire OpenShift cluster, all resources—including PVs, PVCs, and namespaces—are lost. However, recovery is possible if the physical storage backing the PVs remains intact.

To recover, you need to set up a new OpenShift cluster and manually reconfigure the PVs to use the existing storage.

34.2. Recovering a deleted KRaft-based Kafka cluster

This procedure describes how to recover a deleted Kafka cluster operating in KRaft mode from persistent volumes (PVs) by recreating the original PersistentVolumeClaim (PVC) resources.

If the Topic Operator and User Operator are deployed, you can recover KafkaTopic and KafkaUser resources by recreating them. It is important that you recreate the KafkaTopic resources with the same configurations, or the Topic Operator will try to update them in Kafka. This procedure shows how to recreate both resources.

Warning

If the User Operator is enabled and Kafka users are not recreated, users are deleted from the Kafka cluster immediately after recovery.

Before you begin

In this procedure, it is essential that PVs are mounted into the correct PVC to avoid data corruption. The volumeName specified for each PVC must match the name of the PV it binds to.

For more information, see Section 10.5, “Configuring Kafka storage”.
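Before applying the recreated PVCs, you can cross-check each manifest against the surviving PV names. This is an optional shell sketch, not part of the product tooling; `check_volume_names` is a hypothetical helper, and it assumes you saved the PV names to a file (for example with `oc get pv -o custom-columns=:metadata.name --no-headers`) and that each PVC manifest contains a single volumeName line.

```shell
# check_volume_names: hypothetical helper that flags PVC manifests whose
# volumeName does not match any surviving PV.
#   $1   = file listing PV names, one per line
#   $2.. = PVC manifest files to check
check_volume_names() {
  pv_list="$1"; shift
  for f in "$@"; do
    # Pull the volumeName value out of the PVC manifest
    vol=$(awk '/^ *volumeName:/ { print $2 }' "$f")
    if grep -qx "$vol" "$pv_list"; then
      echo "OK   $f -> $vol"
    else
      echo "MISS $f -> $vol (no such PV)"
    fi
  done
}
```

Resolve any MISS lines before applying the PVCs; binding a PVC to the wrong PV is what causes the data corruption warned about above.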

Procedure

  1. Check information on the PVs in the cluster:

    oc get pv

    The output shows the PVs that still contain data.

    Example PV output

    NAME                                          RECLAIMPOLICY  CLAIM
    pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ...  Retain ...     myproject/data-0-my-cluster-broker-0
    pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-0-my-cluster-broker-1
    pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-0-my-cluster-broker-2
    pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ...  Retain ...     myproject/data-0-my-cluster-controller-3
    pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ...  Retain ...     myproject/data-0-my-cluster-controller-4
    pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-0-my-cluster-controller-5

    • NAME is the name of each PV.
    • RECLAIMPOLICY shows that PVs are retained, meaning that the PV is not automatically deleted when the PVC is deleted.
    • CLAIM shows the link to the original PVCs.
  2. Recreate the original namespace:

    oc create namespace myproject

    Here, we recreate the myproject namespace.

  3. Recreate the original PVC resource specifications, linking the PVCs to the appropriate PV:

    Example PVC resource specification

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-0-my-cluster-broker-0
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp2-retain
      volumeMode: Filesystem
      volumeName: pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c
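With several nodes to restore, writing each PVC manifest by hand is error-prone. The following optional shell sketch prints one manifest per claim-name/PV pair; `generate_pvc` is a hypothetical helper, and the storage size and storage class are the example values from this chapter (adjust them to your original cluster).

```shell
# generate_pvc: print a PVC manifest binding a claim name to a specific PV.
#   $1 = PVC name (must match the original claim, e.g. data-0-my-cluster-broker-0)
#   $2 = PV name (from `oc get pv`)
generate_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2-retain
  volumeMode: Filesystem
  volumeName: $2
EOF
}

# Example: manifest for broker 0, using the PV name from step 1.
generate_pvc data-0-my-cluster-broker-0 pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c
```

Pipe the output to `oc apply -f - -n myproject`, or redirect it to a file for review first.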

  4. Edit the PV specifications to delete the claimRef properties that bound the original PVC.

    Example PV specification

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      annotations:
        kubernetes.io/createdby: aws-ebs-dynamic-provisioner
        pv.kubernetes.io/bound-by-controller: "yes"
        pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
      creationTimestamp: "<date>"
      finalizers:
      - kubernetes.io/pv-protection
      labels:
        failure-domain.beta.kubernetes.io/region: eu-west-1
        failure-domain.beta.kubernetes.io/zone: eu-west-1c
      name: pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea
      resourceVersion: "39431"
      selfLink: /api/v1/persistentvolumes/pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea
      uid: 7efe6b0d-3317-11ea-a650-06e1eadd9a4c
    spec:
      accessModes:
      - ReadWriteOnce
      awsElasticBlockStore:
        fsType: xfs
        volumeID: aws://eu-west-1c/vol-09db3141656d1c258
      capacity:
        storage: 100Gi
      claimRef:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: data-0-my-cluster-broker-2
        namespace: myproject
        resourceVersion: "39113"
        uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
              - eu-west-1c
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
              - eu-west-1
      persistentVolumeReclaimPolicy: Retain
      storageClassName: gp2-retain
      volumeMode: Filesystem

    In the example, the following properties are deleted:

    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: data-0-my-cluster-broker-2
      namespace: myproject
      resourceVersion: "39113"
      uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
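Instead of editing each PV in place, you can export the manifest (`oc get pv <pv_name> -o yaml > pv.yaml`), strip the claimRef block offline, and replace the PV. The following is a rough sketch with a hypothetical `strip_claimref` helper; it assumes the two-space indentation shown in the example above.

```shell
# Remove the claimRef block from an exported PV manifest.
# Assumes claimRef sits at two-space indentation under spec, with its
# children indented deeper, as in the example manifest above.
strip_claimref() {
  awk '
    /^  claimRef:/ { skip = 1; next }  # start of the claimRef block
    skip && /^    / { next }           # still inside: deeper indentation
    { skip = 0; print }                # block ended; resume printing
  ' "$1"
}
```

After reviewing the output, apply it with `oc replace -f`. Removing claimRef releases the PV so that the recreated PVC can bind to it.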
  5. Deploy the Cluster Operator:

    oc create -f install/cluster-operator -n myproject
  6. Recreate all KafkaTopic resources by applying the KafkaTopic resource configuration:

    oc apply -f <topic_configuration_file> -n myproject
  7. Recreate all KafkaUser resources:

    1. If user passwords and certificates need to be retained, recreate the user secrets before recreating the KafkaUser resources.

      If the secrets are not recreated, the User Operator will generate new credentials automatically. Ensure that the recreated secrets have exactly the same name, labels, and fields as the original secrets.

    2. Apply the KafkaUser resource configuration:

      oc apply -f <user_configuration_file> -n myproject
  8. Add the annotation strimzi.io/pause-reconciliation="true" to the original configuration for the Kafka resource, and then deploy the Kafka cluster using the updated configuration:

    oc apply -f <kafka_resource_configuration>.yaml -n myproject
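For example, the annotation goes under metadata in the Kafka resource. This is a minimal fragment: spec is your original configuration, and the apiVersion shown assumes the v1beta2 Kafka API.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/pause-reconciliation: "true"
spec:
  # ... original configuration ...
```

Pausing reconciliation keeps the Cluster Operator from acting on the resource until the cluster ID is restored in the following steps.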
  9. Recover the original clusterId from logs or from copies of the Kafka custom resource. If neither is available, you can retrieve it from one of the volumes by spinning up a temporary pod:

    PVC_NAME="data-0-my-cluster-broker-0"
    COMMAND="grep cluster.id /disk/kafka-log*/meta.properties | awk -F'=' '{print \$2}'"
    oc run tmp -itq --rm --restart "Never" --image "foo" --overrides "{\"spec\":
      {\"containers\":[{\"name\":\"busybox\",\"image\":\"busybox\",\"command\":[\"/bin/sh\",
      \"-c\",\"$COMMAND\"],\"volumeMounts\":[{\"name\":\"disk\",\"mountPath\":\"/disk\"}]}],
      \"volumes\":[{\"name\":\"disk\",\"persistentVolumeClaim\":{\"claimName\":
      \"$PVC_NAME\"}}]}}" -n myproject
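If you want to see what the extraction command in the override does, you can run the same grep/awk pipeline locally against a sample meta.properties file. The path and cluster ID below are made-up example values.

```shell
# Sample meta.properties, as found in a Kafka log directory on the volume;
# the cluster ID is a placeholder example value.
mkdir -p /tmp/disk/kafka-log0
cat > /tmp/disk/kafka-log0/meta.properties <<'EOF'
version=1
broker.id=0
cluster.id=4L7oaGu0QXSYgAAZ2lQYbQ
EOF

# The same pipeline used in the pod override:
grep cluster.id /tmp/disk/kafka-log*/meta.properties | awk -F'=' '{print $2}'
# prints: 4L7oaGu0QXSYgAAZ2lQYbQ
```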
  10. Edit the Kafka resource to set the .status.clusterId with the recovered value:

    oc edit kafka <cluster-name> --subresource status -n myproject
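A minimal example of the status edit, using a placeholder ID in place of the value you recovered in the previous step:

```yaml
# Kafka resource status subresource, opened by `oc edit ... --subresource status`
status:
  clusterId: 4L7oaGu0QXSYgAAZ2lQYbQ  # replace with your recovered cluster ID
```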
  11. Unpause the Kafka resource reconciliation:

    oc annotate kafka my-cluster strimzi.io/pause-reconciliation=false \
      --overwrite -n myproject
  12. Verify the recovery of the KafkaTopic resources:

    oc get kafkatopics -o wide -w -n myproject

    Kafka topic status

    NAME         CLUSTER     PARTITIONS  REPLICATION FACTOR READY
    my-topic-1   my-cluster  10          3                  True
    my-topic-2   my-cluster  10          3                  True
    my-topic-3   my-cluster  10          3                  True

    KafkaTopic custom resource creation is successful when the READY output shows True.

  13. Verify the recovery of the KafkaUser resources:

    oc get kafkausers -o wide -w -n myproject

    Kafka user status

    NAME       CLUSTER     AUTHENTICATION  AUTHORIZATION READY
    my-user-1  my-cluster  tls             simple        True
    my-user-2  my-cluster  tls             simple        True
    my-user-3  my-cluster  tls             simple        True

    KafkaUser custom resource creation is successful when the READY output shows True.

34.3. Recovering a deleted ZooKeeper-based Kafka cluster

This procedure describes how to recover a deleted Kafka cluster operating in a ZooKeeper-based environment from persistent volumes (PVs) by recreating the original PersistentVolumeClaim (PVC) resources.

If the Topic Operator and User Operator are deployed, you can recover KafkaTopic and KafkaUser resources by recreating them. It is important that you recreate the KafkaTopic resources with the same configurations, or the Topic Operator will try to update them in Kafka. This procedure shows how to recreate both resources.

Warning

If the User Operator is enabled and Kafka users are not recreated, users are deleted from the Kafka cluster immediately after recovery.

Before you begin

In this procedure, it is essential that PVs are mounted into the correct PVC to avoid data corruption. The volumeName specified for each PVC must match the name of the PV it binds to.

For more information, see Section 10.5, “Configuring Kafka storage”.

Procedure

  1. Check information on the PVs in the cluster:

    oc get pv

    The output shows the PVs that still contain data.

    Example PV output

    NAME                                          RECLAIMPOLICY  CLAIM
    pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ...  Retain ...     myproject/data-my-cluster-zookeeper-1
    pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-my-cluster-zookeeper-0
    pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-my-cluster-zookeeper-2
    pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ...  Retain ...     myproject/data-0-my-cluster-kafka-0
    pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ...  Retain ...     myproject/data-0-my-cluster-kafka-1
    pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ...  Retain ...     myproject/data-0-my-cluster-kafka-2

    • NAME is the name of each PV.
    • RECLAIMPOLICY shows that PVs are retained, meaning that the PV is not automatically deleted when the PVC is deleted.
    • CLAIM shows the link to the original PVCs.
  2. Recreate the original namespace:

    oc create namespace myproject

    Here, we recreate the myproject namespace.

  3. Recreate the original PVC resource specifications, linking the PVCs to the appropriate PV:

    Example PVC resource specification

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-0-my-cluster-kafka-0
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp2-retain
      volumeMode: Filesystem
      volumeName: pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c

  4. Edit the PV specifications to delete the claimRef properties that bound the original PVC.

    Example PV specification

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      annotations:
        kubernetes.io/createdby: aws-ebs-dynamic-provisioner
        pv.kubernetes.io/bound-by-controller: "yes"
        pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
      creationTimestamp: "<date>"
      finalizers:
      - kubernetes.io/pv-protection
      labels:
        failure-domain.beta.kubernetes.io/region: eu-west-1
        failure-domain.beta.kubernetes.io/zone: eu-west-1c
      name: pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
      resourceVersion: "39431"
      selfLink: /api/v1/persistentvolumes/pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
      uid: 7efe6b0d-3317-11ea-a650-06e1eadd9a4c
    spec:
      accessModes:
      - ReadWriteOnce
      awsElasticBlockStore:
        fsType: xfs
        volumeID: aws://eu-west-1c/vol-09db3141656d1c258
      capacity:
        storage: 100Gi
      claimRef:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: data-0-my-cluster-kafka-2
        namespace: myproject
        resourceVersion: "39113"
        uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
              - eu-west-1c
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
              - eu-west-1
      persistentVolumeReclaimPolicy: Retain
      storageClassName: gp2-retain
      volumeMode: Filesystem

    In the example, the following properties are deleted:

    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: data-0-my-cluster-kafka-2
      namespace: myproject
      resourceVersion: "39113"
      uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
  5. Deploy the Cluster Operator:

    oc create -f install/cluster-operator -n myproject
  6. Recreate all KafkaTopic resources by applying the KafkaTopic resource configuration:

    oc apply -f <topic_configuration_file> -n myproject
  7. Recreate all KafkaUser resources:

    1. If user passwords and certificates need to be retained, recreate the user secrets before recreating the KafkaUser resources.

      If the secrets are not recreated, the User Operator will generate new credentials automatically. Ensure that the recreated secrets have exactly the same name, labels, and fields as the original secrets.

    2. Apply the KafkaUser resource configuration:

      oc apply -f <user_configuration_file> -n myproject
  8. Deploy the Kafka cluster using the original configuration for the Kafka resource.

    oc apply -f <kafka_resource_configuration>.yaml -n myproject
  9. Verify the recovery of the KafkaTopic resources:

    oc get kafkatopics -o wide -w -n myproject

    Kafka topic status

    NAME         CLUSTER     PARTITIONS  REPLICATION FACTOR READY
    my-topic-1   my-cluster  10          3                  True
    my-topic-2   my-cluster  10          3                  True
    my-topic-3   my-cluster  10          3                  True

    KafkaTopic custom resource creation is successful when the READY output shows True.

  10. Verify the recovery of the KafkaUser resources:

    oc get kafkausers -o wide -w -n myproject

    Kafka user status

    NAME       CLUSTER     AUTHENTICATION  AUTHORIZATION READY
    my-user-1  my-cluster  tls             simple        True
    my-user-2  my-cluster  tls             simple        True
    my-user-3  my-cluster  tls             simple        True

    KafkaUser custom resource creation is successful when the READY output shows True.
