Chapter 13. Replacing storage devices
13.1. Replacing operational or failed storage devices on Red Hat OpenStack Platform installer-provisioned infrastructure
Use this procedure to replace a storage device in OpenShift Container Storage deployed on Red Hat OpenStack Platform. This procedure creates a new Persistent Volume Claim (PVC) on a new volume and removes the old object storage device (OSD).
Procedure
Identify the OSD that needs to be replaced and the OpenShift Container Platform node that has the OSD scheduled on it.
$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
Example output:
rook-ceph-osd-0-6d77d6c7c6-m8xj6   0/1   CrashLoopBackOff   0   24h   10.129.0.16   compute-2   <none>   <none>
rook-ceph-osd-1-85d99fb95f-2svc7   1/1   Running            0   24h   10.128.2.24   compute-0   <none>   <none>
rook-ceph-osd-2-6c66cdb977-jp542   1/1   Running            0   24h   10.130.0.18   compute-1   <none>   <none>
In this example, rook-ceph-osd-0-6d77d6c7c6-m8xj6 needs to be replaced and compute-2 is the OpenShift Container Platform node on which the OSD is scheduled.
Note: If the OSD to be replaced is healthy, the status of the pod will be Running.
Scale down the OSD deployment for the OSD to be replaced.
# osd_id_to_remove=0
# oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
where osd_id_to_remove is the integer in the pod name immediately after the rook-ceph-osd prefix. In this example, the deployment name is rook-ceph-osd-0.
Example output:
deployment.extensions/rook-ceph-osd-0 scaled
Verify that the rook-ceph-osd pod is terminated.
# oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
Example output:
No resources found.
Note: If the rook-ceph-osd pod is in terminating state, use the force option to delete the pod.
# oc delete pod rook-ceph-osd-0-6d77d6c7c6-m8xj6 --force --grace-period=0
Example output:
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "rook-ceph-osd-0-6d77d6c7c6-m8xj6" force deleted
Remove the old OSD from the cluster so that a new OSD can be added.
Delete any old ocs-osd-removal jobs.
$ oc delete -n openshift-storage job ocs-osd-removal-${osd_id_to_remove}
Example output:
job.batch "ocs-osd-removal-0" deleted
job.batch "ocs-osd-removal-0" deletedCopy to Clipboard Copied! Toggle word wrap Toggle overflow Change to the
$ oc project openshift-storage
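The output confirms the project switch and is similar to the following; the server URL shown here is a placeholder for your cluster's API endpoint:
Now using project "openshift-storage" on server "https://api.<cluster-domain>:6443".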
Remove the old OSD from the cluster.
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -n openshift-storage -f -
Warning: This step results in the OSD being completely removed from the cluster. Ensure that the correct value of osd_id_to_remove is provided.
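If the template is processed successfully, oc create reports the new removal job. The output is similar to the following; the job name shown assumes the ocs-osd-removal-0 naming used elsewhere in this procedure and can differ between versions:
job.batch/ocs-osd-removal-0 created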
Verify that the OSD is removed successfully by checking the status of the ocs-osd-removal pod. A status of Completed confirms that the OSD removal job succeeded.
# oc get pod -l job-name=ocs-osd-removal-${osd_id_to_remove} -n openshift-storage
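Example output (the pod name suffix and age are illustrative):
NAME                      READY   STATUS      RESTARTS   AGE
ocs-osd-removal-0-97p4d   0/1     Completed   0          3m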
Note: If ocs-osd-removal fails and the pod is not in the expected Completed state, check the pod logs for further debugging. For example:
# oc logs -l job-name=ocs-osd-removal-${osd_id_to_remove} -n openshift-storage --tail=-1
If encryption was enabled at the time of install, remove the dm-crypt managed device-mapper mapping from the OSD devices that are removed from the respective OpenShift Container Storage nodes.
Get the PVC name(s) of the replaced OSD(s) from the logs of the ocs-osd-removal-job pod:
$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'pvc|deviceset'
For example:
2021-05-12 14:31:34.666000 I | cephosd: removing the OSD PVC "ocs-deviceset-xxxx-xxx-xxx-xxx"
For each of the nodes identified in step #1, do the following:
Create a debug pod and chroot to the host on the storage node.
$ oc debug node/<node name>
$ chroot /host
Find the relevant device name based on the PVC names identified in the previous step.
sh-4.4# dmsetup ls | grep <pvc name>
ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt (253:0)
Remove the mapped device.
$ cryptsetup luksClose --debug --verbose ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt
Note: If the above command gets stuck due to insufficient privileges, run the following commands:
Press CTRL+Z to exit the above command.
Find the PID of the process which was stuck.
$ ps -ef | grep crypt
Terminate the process using the kill command.
$ kill -9 <PID>
Verify that the device name is removed.
$ dmsetup ls
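The removed ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt mapping should no longer be listed. If it was the only device-mapper device on the node, the command prints, for example:
No devices found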
Delete the ocs-osd-removal job.
# oc delete -n openshift-storage job ocs-osd-removal-${osd_id_to_remove}
Example output:
job.batch "ocs-osd-removal-0" deleted
job.batch "ocs-osd-removal-0" deletedCopy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification steps
Verify that there is a new OSD running.
# oc get -n openshift-storage pods -l app=rook-ceph-osd
Example output:
rook-ceph-osd-0-5f7f4747d4-snshw   1/1   Running   0   4m47s
rook-ceph-osd-1-85d99fb95f-2svc7   1/1   Running   0   1d20h
rook-ceph-osd-2-6c66cdb977-jp542   1/1   Running   0   1d20h
Verify that there is a new PVC created which is in Bound state.
# oc get -n openshift-storage pvc
Example output:
NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-0                 Bound    pvc-b44ebb5e-3c67-4000-998e-304752deb5a7   50Gi       RWO            ocs-storagecluster-ceph-rbd   6d
ocs-deviceset-0-data-0-gwb5l   Bound    pvc-bea680cd-7278-463d-a4f6-3eb5d3d0defe   512Gi      RWO            standard                      94s
ocs-deviceset-1-data-0-w9pjm   Bound    pvc-01aded83-6ef1-42d1-a32e-6ca0964b96d4   512Gi      RWO            standard                      6d
ocs-deviceset-2-data-0-7bxcq   Bound    pvc-5d07cd6c-23cb-468c-89c1-72d07040e308   512Gi      RWO            standard                      6d
(Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
Identify the node(s) where the new OSD pod(s) are running.
$ oc get -o=custom-columns=NODE:.spec.nodeName pod/<OSD pod name>
For example:
$ oc get -o=custom-columns=NODE:.spec.nodeName pod/rook-ceph-osd-0-544db49d7f-qrgqm
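The command prints only the node name; for example (the node name is illustrative):
NODE
compute-1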
For each of the nodes identified in the previous step, do the following:
Create a debug pod and open a chroot environment for the selected host(s).
$ oc debug node/<node name>
$ chroot /host
Run lsblk and check for the crypt keyword beside the ocs-deviceset name(s).
$ lsblk
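On an encrypted cluster, the device backing the new OSD shows a crypt entry under it. The following trimmed output is illustrative; device names, sizes, and the ocs-deviceset suffix will differ:
vdb                                            252:16   0   512G  0 disk
└─ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt  253:0    0   512G  0 crypt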
Log in to OpenShift Web Console and view the storage dashboard.
Figure 13.1. OSD status in OpenShift Container Platform storage dashboard after device replacement