2.3. Storage class with single replica


You can create a storage class with a single replica to be used by your applications. This avoids redundant data copies and allows resiliency management on the application level.

Warning

Enabling this feature creates a single replica pool without data replication. If your application does not have its own replication, this increases the risk of data loss, data corruption, and potential system instability. If any OSD is lost, recovery requires very disruptive steps. All applications can lose their data and must be recreated if an OSD fails.

Prerequisites

  • Make sure to use one new storage device or OSD per failure domain.

Procedure

  1. Enable the single replica feature using the following command:

    $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephNonResilientPools/enable", "value": true }]'
  2. Verify storagecluster is in Ready state:

    $ oc get storagecluster

    Example output:

    NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
    ocs-storagecluster   10m   Ready              2024-02-05T13:56:15Z   4.15.0
  3. New cephblockpools are created for each failure domain. Verify cephblockpools are in Ready state:

    $ oc get cephblockpools

    Example output:

    NAME                                          PHASE
    ocs-storagecluster-cephblockpool              Ready
    ocs-storagecluster-cephblockpool-us-east-1a   Ready
    ocs-storagecluster-cephblockpool-us-east-1b   Ready
    ocs-storagecluster-cephblockpool-us-east-1c   Ready
  4. Verify new storage classes have been created:

    $ oc get storageclass

    Example output:

    NAME                                        PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    gp2 (default)                               kubernetes.io/aws-ebs                   Delete          WaitForFirstConsumer   true                   104m
    gp2-csi                                     ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   104m
    gp3-csi                                     ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   104m
    ocs-storagecluster-ceph-non-resilient-rbd   openshift-storage.rbd.csi.ceph.com      Delete          WaitForFirstConsumer   true                   46m
    ocs-storagecluster-ceph-rbd                 openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   52m
    ocs-storagecluster-cephfs                   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   52m
    openshift-storage.noobaa.io                 openshift-storage.noobaa.io/obc         Delete          Immediate              false                  50m
  5. After the storage class with single replica is enabled, three new OSDs are deployed on the new devices. This adds 3 osd-prepare pods and 3 additional OSD pods. Verify that the new OSD pods are in Running state:

    $ oc get pods | grep osd

    Example output:

    rook-ceph-osd-0-6dc76777bc-snhnm                                  2/2     Running     0               9m50s
    rook-ceph-osd-1-768bdfdc4-h5n7k                                   2/2     Running     0               9m48s
    rook-ceph-osd-2-69878645c4-bkdlq                                  2/2     Running     0               9m37s
    rook-ceph-osd-3-64c44d7d76-zfxq9                                  2/2     Running     0               5m23s
    rook-ceph-osd-4-654445b78f-nsgjb                                  2/2     Running     0               5m23s
    rook-ceph-osd-5-5775949f57-vz6jp                                  2/2     Running     0               5m22s
    rook-ceph-osd-prepare-ocs-deviceset-gp2-0-data-0x6t87-59swf       0/1     Completed   0               10m
    rook-ceph-osd-prepare-ocs-deviceset-gp2-1-data-0klwr7-bk45t       0/1     Completed   0               10m
    rook-ceph-osd-prepare-ocs-deviceset-gp2-2-data-0mk2cz-jx7zv       0/1     Completed   0               10m
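
The Ready check on the cephblockpools in step 3 can also be scripted. The following is a minimal sketch, not part of the product: the helper name non_ready_pools is hypothetical, and it assumes the column layout shown in the example output, with NAME first and PHASE second.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the product).
# Reads `oc get cephblockpools` style output on stdin (header row, then
# NAME PHASE rows) and prints the NAME of every pool whose PHASE is not Ready.
non_ready_pools() {
  awk 'NR > 1 && NF >= 2 && $2 != "Ready" { print $1 }'
}

# Usage against a live cluster (assumes oc is logged in; not run here):
#   oc get cephblockpools -n openshift-storage | non_ready_pools
```

An empty result means that every listed pool reports Ready.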

2.3.1. Recovering after OSD loss from single replica

When using replica 1, a storage class with a single replica, data loss is guaranteed when an OSD is lost. Lost OSDs go into a failing state. Use the following steps to recover after OSD loss.

Procedure

Follow these recovery steps to get your applications running again after data loss from replica 1. You first need to identify the domain where the failing OSD is.

  1. If you know which failure domain the failing OSD is in, run the following command to get the exact replica1-pool-name required for the next steps. If you do not know where the failing OSD is, skip to step 2.

    $ oc get cephblockpools

    Example output:

    NAME                                          PHASE
    ocs-storagecluster-cephblockpool              Ready
    ocs-storagecluster-cephblockpool-us-south-1   Ready
    ocs-storagecluster-cephblockpool-us-south-2   Ready
    ocs-storagecluster-cephblockpool-us-south-3   Ready

    Copy the corresponding failure domain name for use in next steps, then skip to step 4.

  2. To find the failing OSD, look for the OSD pod that is in Error or CrashLoopBackOff state:

    $ oc get pods -n openshift-storage -l app=rook-ceph-osd | grep 'CrashLoopBackOff\|Error'
  3. Identify the replica-1 pool that had the failed OSD.

    1. Record the ID of the failed OSD so that you can identify the node where it was running:

      failed_osd_id=0 #replace with the ID of the failed OSD
    2. Identify the failureDomainLabel for the node where the failed OSD was running:

      failure_domain_label=$(oc get storageclass ocs-storagecluster-ceph-non-resilient-rbd -o yaml | grep domainLabel | head -1 | awk -F':' '{print $2}')
      failure_domain_value=$(oc get pods $failed_osd_id -oyaml | grep topology-location-zone | awk '{print $2}')

      The output shows the replica-1 pool name whose OSD is failing, for example:

      replica1-pool-name="ocs-storagecluster-cephblockpool-$failure_domain_value"

      where $failure_domain_value is the failureDomainName.

  4. Delete the replica-1 pool.

    1. Connect to the toolbox pod:

      toolbox=$(oc get pod -l app=rook-ceph-tools -n openshift-storage -o jsonpath='{.items[*].metadata.name}')
      
      oc rsh $toolbox -n openshift-storage
    2. Delete the replica-1 pool. Note that you have to enter the replica-1 pool name twice in the command, for example:

      ceph osd pool rm <replica1-pool-name> <replica1-pool-name> --yes-i-really-really-mean-it

      Replace <replica1-pool-name> with the replica-1 pool name identified earlier.

  5. Purge the failing OSD by following the steps in section "Replacing operational or failed storage devices" based on your platform in the Replacing devices guide.
  6. Restart the rook-ceph operator:

    $ oc delete pod -l app=rook-ceph-operator -n openshift-storage
  7. Recreate any affected applications in that availability zone so that they start using the new pool, which has the same name.
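
Finding the failing OSD and deriving the replica-1 pool name (steps 2 and 3) can be partly scripted. The sketch below is an illustration under stated assumptions, not an official tool: the function names failing_osd_ids and replica1_pool_name are hypothetical, the parser assumes the default oc get pods column order (NAME, READY, STATUS), and the pool name follows the ocs-storagecluster-cephblockpool-<failure domain> pattern shown in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the product).

# Reads `oc get pods` style output on stdin and prints the numeric OSD ID of
# each rook-ceph-osd pod whose STATUS column is CrashLoopBackOff or Error.
failing_osd_ids() {
  awk '$3 == "CrashLoopBackOff" || $3 == "Error" { print $1 }' |
    sed -n 's/^rook-ceph-osd-\([0-9][0-9]*\)-.*/\1/p'
}

# Builds the replica-1 pool name from a failure domain value, assuming the
# naming pattern ocs-storagecluster-cephblockpool-<failure domain>.
replica1_pool_name() {
  printf 'ocs-storagecluster-cephblockpool-%s\n' "$1"
}

# Usage against a live cluster (assumes oc is logged in; not run here):
#   oc get pods -n openshift-storage -l app=rook-ceph-osd | failing_osd_ids
#   replica1_pool_name us-south-1
```

Confirm the generated pool name against the output of oc get cephblockpools before deleting anything, because the pool removal in step 4 cannot be undone.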