Red Hat OpenShift Data Foundation 4.9: Deploying and managing OpenShift Data Foundation using Red Hat OpenStack Platform

Chapter 15. Replacing storage devices

15.1. Replacing operational or failed storage devices on Red Hat OpenStack Platform installer-provisioned infrastructure

Use this procedure to replace a storage device in OpenShift Data Foundation deployed on Red Hat OpenStack Platform. The procedure creates a new Persistent Volume Claim (PVC) on a new volume and removes the old object storage device (OSD).

Procedure

Identify the OSD that needs to be replaced and the OpenShift Container Platform node that has the OSD scheduled on it.

$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide

Example output:

rook-ceph-osd-0-6d77d6c7c6-m8xj6    0/1    CrashLoopBackOff    0    24h   10.129.0.16   compute-2   <none>           <none>
rook-ceph-osd-1-85d99fb95f-2svc7    1/1    Running             0    24h   10.128.2.24   compute-0   <none>           <none>
rook-ceph-osd-2-6c66cdb977-jp542    1/1    Running             0    24h   10.130.0.18   compute-1   <none>           <none>

In this example, rook-ceph-osd-0-6d77d6c7c6-m8xj6 needs to be replaced, and compute-2 is the OpenShift Container Platform node on which the OSD is scheduled.

Note

If the OSD to be replaced is healthy, the status of the pod will be Running.
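
The pod listing above can be filtered automatically. The following is a minimal sketch, not part of the product: `unhealthy_osds` is a hypothetical helper name, and it assumes the column layout shown in the example output, with RESTARTS printed as a single field so that the node name is the seventh column.

```shell
# Hypothetical helper, not part of OpenShift Data Foundation.
# Reads rows in the `oc get pods -o wide` layout on stdin
# (NAME READY STATUS RESTARTS AGE IP NODE ...) and prints
# "<pod> <node>" for every rook-ceph-osd pod whose STATUS is not Running.
# Assumption: RESTARTS is one whitespace-free field, as in the example output.
unhealthy_osds() {
  awk '$1 ~ /^rook-ceph-osd-/ && $3 != "Running" { print $1, $7 }'
}
```

It could be used as `oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide | unhealthy_osds`. A healthy OSD that you want to replace anyway will not appear in this output, so it only helps with failed OSDs.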

Scale down the OSD deployment for the OSD to be replaced.
```
$ osd_id_to_remove=0
$ oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
```
where osd_id_to_remove is the integer in the pod name immediately after the rook-ceph-osd- prefix. In this example, the deployment name is rook-ceph-osd-0.
Example output:
```
deployment.extensions/rook-ceph-osd-0 scaled
```
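
Because osd_id_to_remove is derived from the pod name, it can be extracted rather than typed by hand. The sketch below is illustrative only: `osd_id_from_pod` is a hypothetical helper, and it assumes pod names of the form rook-ceph-osd-<id>-<hash>-<suffix>, as in the example output.

```shell
# Hypothetical helper: derive the OSD ID from a rook-ceph-osd pod name.
# Assumes names of the form rook-ceph-osd-<id>-<hash>-<suffix>.
osd_id_from_pod() {
  local pod_name="$1"
  local rest="${pod_name#rook-ceph-osd-}"   # strip the fixed prefix
  if [ "$rest" = "$pod_name" ]; then
    echo "not a rook-ceph-osd pod name: $pod_name" >&2
    return 1
  fi
  local id="${rest%%-*}"                    # keep the text before the next dash
  case "$id" in
    ''|*[!0-9]*)
      echo "no numeric OSD ID in: $pod_name" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$id"
}
```

For example, `osd_id_to_remove="$(osd_id_from_pod rook-ceph-osd-0-6d77d6c7c6-m8xj6)"` sets the variable to 0. Rejecting non-numeric results guards against passing a wrong value to the later removal step.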

Verify that the rook-ceph-osd pod is terminated.

$ oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}

Example output:

No resources found.

Note

If the rook-ceph-osd pod remains in the Terminating state, use the force option to delete the pod.

$ oc delete -n openshift-storage pod rook-ceph-osd-0-6d77d6c7c6-m8xj6 --force --grace-period=0

Example output:

warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
  pod "rook-ceph-osd-0-6d77d6c7c6-m8xj6" force deleted
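
The termination check in this step can be polled instead of repeated by hand. This is a rough sketch under stated assumptions: `wait_for_osd_pod_gone` is a hypothetical helper, the retry count and delay defaults are arbitrary, and it relies on `oc get ... -o name` printing nothing on standard output when no pod matches the ceph-osd-id label.

```shell
# Hypothetical helper: wait until no pod carries the ceph-osd-id label.
# Arguments: <osd id> [attempts, default 30] [delay in seconds, default 10]
# Returns 0 when the pod is gone, 1 on timeout, 2 if `oc get` itself fails.
wait_for_osd_pod_gone() {
  local osd_id="$1" attempts="${2:-30}" delay="${3:-10}"
  local i=0 pods
  while [ "$i" -lt "$attempts" ]; do
    # `-o name` prints one line per matching pod and nothing when none match.
    if ! pods="$(oc get -n openshift-storage pods -l "ceph-osd-id=${osd_id}" -o name)"; then
      echo "oc get failed; cannot confirm pod state" >&2
      return 2
    fi
    [ -z "$pods" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_for_osd_pod_gone "${osd_id_to_remove}"` returns 1 if the pod is still present after the last attempt, which is the point at which the forced deletion described in the note may be needed.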

If the persistent volume associated with the failed OSD has also failed, get the details of the failed persistent volumes and delete them using the following commands:
```
$ oc get pv
$ oc delete pv <failed-pv-name>
```
Remove the old OSD from the cluster so that a new OSD can be added.
1. Delete any old ocs-osd-removal jobs.
  $ oc delete -n openshift-storage job ocs-osd-removal-${osd_id_to_remove}
  Example output:
  job.batch "ocs-osd-removal-0" deleted
2. Change to the openshift-storage project.
  $ oc project openshift-storage
3. Remove the old OSD from the cluster.
  $ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -n openshift-storage -f -
  You can remove more than one OSD by passing comma-separated OSD IDs in the command (for example, FAILED_OSD_IDS=0,1,2).
  Warning
  This step removes the OSD from the cluster completely. Ensure that the correct value of osd_id_to_remove is provided.
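
Because a wrong ID removes the wrong OSD, it can help to build the comma-separated FAILED_OSD_IDS value programmatically. The following sketch is an assumption-laden illustration: `join_osd_ids` is a hypothetical helper that only checks that each ID is a non-negative integer; it does not check that the OSD exists.

```shell
# Hypothetical helper: join OSD IDs into the comma-separated form
# expected by FAILED_OSD_IDS, rejecting anything that is not an integer.
join_osd_ids() {
  local joined="" id
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*)
        echo "invalid OSD ID: $id" >&2
        return 1
        ;;
    esac
    joined="${joined:+${joined},}${id}"   # add a comma only between items
  done
  if [ -z "$joined" ]; then
    echo "no OSD IDs given" >&2
    return 1
  fi
  printf '%s\n' "$joined"
}
```

For instance, `join_osd_ids 0 1 2` prints 0,1,2, which can then be passed as `-p FAILED_OSD_IDS=...` to the `oc process` command in step 3.
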
Verify that the OSD was removed successfully by checking the status of the ocs-osd-removal-job pod.
A status of Completed confirms that the OSD removal job succeeded.
```
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
```

Ensure that the OSD removal is completed.

$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'

Example output:

2022-05-10 06:50:04.501511 I | cephosd: completed removal of OSD 0

Important

If the ocs-osd-removal-job fails and the pod is not in the expected Completed state, check the pod logs for further debugging.

For example:

$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1
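
The log check in this step can be turned into a pass/fail test for a specific OSD. This is a minimal sketch: `osd_removal_completed` is a hypothetical helper, and it assumes the log line format shown in the example output ("completed removal of OSD <id>").

```shell
# Hypothetical helper: succeed only if the log text on stdin reports
# completed removal of exactly the given OSD ID.
# Assumes the "completed removal of OSD <id>" wording from the example output.
osd_removal_completed() {
  local osd_id="$1"
  # The trailing group stops ID 1 from matching a line about OSD 10.
  grep -Eiq "completed removal of OSD ${osd_id}([^0-9]|\$)"
}
```

It could be fed from the command above, for example `oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | osd_removal_completed "${osd_id_to_remove}"`. A non-zero exit status means the expected line was not found and the full logs should be inspected.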

If encryption was enabled at installation, remove the dm-crypt managed device-mapper mappings of the removed OSD devices from the respective OpenShift Data Foundation nodes.
1. Get the PVC names of the replaced OSDs from the logs of the ocs-osd-removal-job pod:
  $ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'pvc|deviceset'
  For example:
  2021-05-12 14:31:34.666000 I | cephosd: removing the OSD PVC "ocs-deviceset-xxxx-xxx-xxx-xxx"
2. For each of the nodes identified in step #1, do the following:
  1. Create a debug pod and chroot to the host on the storage node.

    $ oc debug node/<node name>
    $ chroot /host

  2. Find the relevant device name based on the PVC names identified in the previous step.

    sh-4.4# dmsetup ls | grep <pvc name>
    ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt (253:0)

  3. Remove the mapped device.

    $ cryptsetup luksClose --debug --verbose ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt

    Note
    If the above command gets stuck due to insufficient privileges, run the following commands:
    Press CTRL+Z to exit the above command.
    Find the PID of the process that was stuck.

    $ ps -ef | grep crypt

    Terminate the process using the kill command.

    $ kill -9 <PID>

    Verify that the device name is removed.

    $ dmsetup ls

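
The device lookup in the steps above can be made stricter than a plain grep. The sketch below is illustrative: `dmcrypt_name_for_pvc` is a hypothetical helper, and it assumes that `dmsetup ls` prints the mapping name as the first field and that the mapping is named <pvc name>-block-dmcrypt, as in the example.

```shell
# Hypothetical helper: read `dmsetup ls` output on stdin and print the
# dm-crypt mapping whose name starts with the given PVC name.
# Assumes mapping names of the form <pvc name>-block-dmcrypt.
dmcrypt_name_for_pvc() {
  local pvc_name="$1"
  awk -v pvc="$pvc_name" \
    'index($1, pvc) == 1 && $1 ~ /-dmcrypt$/ { print $1 }'
}
```

On the node, `dmsetup ls | dmcrypt_name_for_pvc <pvc name>` would print the name to pass to `cryptsetup luksClose`. Matching on the prefix and the -dmcrypt suffix avoids selecting an unrelated mapping that merely contains the PVC name.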

Delete the ocs-osd-removal job.

$ oc delete -n openshift-storage job ocs-osd-removal-${osd_id_to_remove}

Example output:

job.batch "ocs-osd-removal-0" deleted

Verification steps

Verify that there is a new OSD running.

$ oc get -n openshift-storage pods -l app=rook-ceph-osd

Example output:

rook-ceph-osd-0-5f7f4747d4-snshw                                  1/1     Running     0          4m47s
rook-ceph-osd-1-85d99fb95f-2svc7                                  1/1     Running     0          1d20h
rook-ceph-osd-2-6c66cdb977-jp542                                  1/1     Running     0          1d20h

Verify that a new PVC has been created and is in the Bound state.

$ oc get -n openshift-storage pvc

Example output:

NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-0                 Bound    pvc-b44ebb5e-3c67-4000-998e-304752deb5a7   50Gi       RWO            ocs-storagecluster-ceph-rbd   6d
ocs-deviceset-0-data-0-gwb5l   Bound    pvc-bea680cd-7278-463d-a4f6-3eb5d3d0defe   512Gi      RWO            standard                      94s
ocs-deviceset-1-data-0-w9pjm   Bound    pvc-01aded83-6ef1-42d1-a32e-6ca0964b96d4   512Gi      RWO            standard                      6d
ocs-deviceset-2-data-0-7bxcq   Bound    pvc-5d07cd6c-23cb-468c-89c1-72d07040e308   512Gi      RWO            standard                      6d
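
The PVC listing can also be checked mechanically. As a small sketch, the hypothetical helper `deviceset_pvcs_not_bound` below assumes the column layout shown in the example output (NAME first, STATUS second) and reports any ocs-deviceset PVC that is not Bound.

```shell
# Hypothetical helper: read `oc get pvc` rows on stdin and print
# "<pvc> <status>" for every ocs-deviceset PVC that is not Bound.
# Assumes NAME is the first column and STATUS the second.
deviceset_pvcs_not_bound() {
  awk '$1 ~ /^ocs-deviceset-/ && $2 != "Bound" { print $1, $2 }'
}
```

Running `oc get -n openshift-storage pvc | deviceset_pvcs_not_bound` prints nothing when every device set PVC is Bound.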

Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
1. Identify the nodes where the new OSD pods are running.
  $ oc get -o=custom-columns=NODE:.spec.nodeName pod/<OSD pod name>
  For example:
  $ oc get -o=custom-columns=NODE:.spec.nodeName pod/rook-ceph-osd-0-544db49d7f-qrgqm
2. For each of the nodes identified in the previous step, do the following:
  1. Create a debug pod and open a chroot environment for the selected hosts.

    $ oc debug node/<node name>
    $ chroot /host

  2. Run lsblk and check for the crypt keyword beside the ocs-deviceset names.

    $ lsblk

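
The lsblk check above can be narrowed to the relevant rows. This is only a sketch: `encrypted_devicesets` is a hypothetical helper, and it assumes the default lsblk column order (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, ...), so that TYPE is the sixth field.

```shell
# Hypothetical helper: read default `lsblk` output on stdin and print the
# names of ocs-deviceset devices whose TYPE column is "crypt".
# Assumes the default columns: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT.
encrypted_devicesets() {
  awk '$1 ~ /ocs-deviceset/ && $6 == "crypt" {
         sub(/^[^A-Za-z0-9]*/, "", $1)   # drop the tree-drawing prefix
         print $1
       }'
}
```

Inside the chroot environment, `lsblk | encrypted_devicesets` should list one entry for each new encrypted OSD device; an empty result suggests the device is not encrypted or the column layout differs.
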
Log in to the OpenShift Web Console and view the storage dashboard.
Figure 15.1. OSD status in OpenShift Container Platform storage dashboard after device replacement
