Chapter 2. Dynamically provisioned OpenShift Data Foundation deployed on VMware

2.1. Replacing operational or failed storage devices on VMware infrastructure

When one or more virtual machine disks (VMDKs) need to be replaced in an OpenShift Data Foundation cluster that is deployed dynamically on VMware infrastructure, create a new Persistent Volume Claim (PVC) on a new volume and remove the old object storage device (OSD).

Prerequisites

Ensure that the data is resilient.
- In the OpenShift Web Console, click Storage → Data Foundation.
- Click the Storage Systems tab, and then click ocs-storagecluster-storagesystem.
- In the Status card of the Block and File dashboard, under the Overview tab, verify that Data Resiliency has a green tick mark.

Procedure

Identify the OSD that needs to be replaced and the OpenShift Container Platform node that has the OSD scheduled on it.

$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide

Example output:

rook-ceph-osd-0-6d77d6c7c6-m8xj6    0/1    CrashLoopBackOff    0    24h   10.129.0.16   compute-2   <none>           <none>
rook-ceph-osd-1-85d99fb95f-2svc7    1/1    Running             0    24h   10.128.2.24   compute-0   <none>           <none>
rook-ceph-osd-2-6c66cdb977-jp542    1/1    Running             0    24h   10.130.0.18   compute-1   <none>           <none>

In this example, rook-ceph-osd-0-6d77d6c7c6-m8xj6 needs to be replaced, and compute-2 is the OpenShift Container Platform node on which the OSD is scheduled.

Note

If the OSD that you want to replace is healthy, the status of the pod is Running.
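
On a cluster with many OSDs, the listing above can be filtered for pods that are not Running. The following is a minimal sketch rather than part of the documented procedure: find_unhealthy_osds is a hypothetical helper name, and it assumes the default -o wide column order (NAME, READY, STATUS, RESTARTS, AGE, IP, NODE) with RESTARTS printed as a single field.

```shell
# Hypothetical helper (not an oc subcommand): reads the output of
# "oc get pods -o wide" on stdin and prints "<pod> <node> <status>"
# for every OSD pod whose STATUS column is not Running.
# Assumption: NODE is the 7th whitespace-separated column.
find_unhealthy_osds() {
  awk '$1 ~ /^rook-ceph-osd-[0-9]+-/ && $3 != "Running" { print $1, $7, $3 }'
}

# Usage (requires a logged-in oc session):
#   oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide | find_unhealthy_osds
```

An empty result means every listed OSD pod reports Running, in which case the OSD to replace has to be chosen by other means.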

Scale down the OSD deployment for the OSD to be replaced.
Each time you replace an OSD, set the osd_id_to_remove variable to the ID of that OSD and repeat this step.
```
$ osd_id_to_remove=0
```
```
$ oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
```
Here, osd_id_to_remove is the integer in the pod name immediately after the rook-ceph-osd- prefix. In this example, the deployment name is rook-ceph-osd-0.
Example output:
```
deployment.extensions/rook-ceph-osd-0 scaled
```
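
The rule for deriving osd_id_to_remove from a pod name can be sketched with plain shell parameter expansion. This is an illustrative helper only: the function name osd_id_from_pod is an assumption and does not come from the product.

```shell
# Hypothetical helper: prints the OSD ID embedded in a rook-ceph-osd pod name.
# Returns non-zero if the text after the prefix is not a plain integer.
osd_id_from_pod() {
  rest=${1#rook-ceph-osd-}   # strip the fixed prefix, if present
  id=${rest%%-*}             # keep everything before the next hyphen
  case "$id" in
    ''|*[!0-9]*)
      echo "cannot derive an OSD ID from: $1" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$id"
}

# Pod name taken from the example output earlier in this procedure.
osd_id_to_remove=$(osd_id_from_pod rook-ceph-osd-0-6d77d6c7c6-m8xj6)
echo "$osd_id_to_remove"   # prints 0
```

Rejecting anything that is not a plain integer helps avoid scaling down the wrong deployment because of a mistyped pod name.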

Verify that the rook-ceph-osd pod is terminated.

$ oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}

Example output:

No resources found.

Important

If the rook-ceph-osd pod is stuck in the Terminating state, use the force option to delete the pod.

$ oc delete -n openshift-storage pod rook-ceph-osd-0-6d77d6c7c6-m8xj6 --force --grace-period=0

Example output:

warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
  pod "rook-ceph-osd-0-6d77d6c7c6-m8xj6" force deleted

Remove the old OSD from the cluster so that you can add a new OSD.
1. Delete any old ocs-osd-removal jobs.
  $ oc delete -n openshift-storage job ocs-osd-removal-job
  Example output:
  job.batch "ocs-osd-removal-job" deleted
  Note
  If the removal job does not reach the Completed state within 10 minutes, delete the job and rerun it with FORCE_OSD_REMOVAL=true.
2. Navigate to the openshift-storage project.
  $ oc project openshift-storage
3. Remove the old OSD from the cluster.
  $ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
  Set FORCE_OSD_REMOVAL to true in clusters that have only three OSDs, or in clusters with insufficient space to restore all three replicas of the data after the OSD is removed.
  Warning
  This step removes the OSD from the cluster completely. Ensure that the value of osd_id_to_remove is correct before you run it.
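
Because the removal cannot be undone, it can help to validate the inputs and review the full command before running it. The sketch below only prints the command. The function name osd_removal_cmd is hypothetical, and the guard assumes a single integer OSD ID as used in this procedure.

```shell
# Hypothetical helper: validates the OSD ID and the force flag, then prints
# the removal pipeline from this procedure without executing it.
osd_removal_cmd() {
  id=$1
  force=${2:-false}
  case "$id" in
    ''|*[!0-9]*)
      echo "OSD ID must be a non-negative integer: '$id'" >&2
      return 1
      ;;
  esac
  case "$force" in
    true|false) ;;
    *)
      echo "FORCE_OSD_REMOVAL must be true or false: '$force'" >&2
      return 1
      ;;
  esac
  printf 'oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=%s FORCE_OSD_REMOVAL=%s | oc create -n openshift-storage -f -\n' "$id" "$force"
}

# Print the command for OSD 0 so that it can be reviewed first.
osd_removal_cmd 0 false
```
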
Verify that the OSD was removed successfully by checking the status of the ocs-osd-removal-job pod.
A status of Completed confirms that the OSD removal job succeeded.
```
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
```

Ensure that the OSD removal is completed.

$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'

Example output:

2022-05-10 06:50:04.501511 I | cephosd: completed removal of OSD 0
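
When scripting this check, it can be worth confirming that the log reports the specific OSD ID that was requested, not just any completed removal. A minimal sketch, assuming the log line format shown in the example output; removal_completed_for is a hypothetical helper name.

```shell
# Hypothetical helper: reads the removal job log on stdin and succeeds only
# if it reports completed removal of exactly the given OSD ID.
# The trailing group stops ID 1 from matching a line about OSD 10.
removal_completed_for() {
  grep -Eiq "completed removal of OSD $1([^0-9]|\$)"
}

# Usage (requires a logged-in oc session):
#   oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 \
#     | removal_completed_for "${osd_id_to_remove}" && echo "OSD removed"
```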

Important

If the ocs-osd-removal-job pod fails and the pod is not in the expected Completed state, check the pod logs for further debugging.

For example:

$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1

If encryption was enabled at the time of installation, remove the dm-crypt managed device-mapper mappings for the removed OSD devices from the respective OpenShift Data Foundation nodes.
1. Get the PVC name(s) of the replaced OSD(s) from the logs of the ocs-osd-removal-job pod.
  $ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'pvc|deviceset'
  Example output:
  2021-05-12 14:31:34.666000 I | cephosd: removing the OSD PVC "ocs-deviceset-xxxx-xxx-xxx-xxx"
2. For each of the previously identified nodes, do the following:
  1. Create a debug pod and chroot to the host on the storage node.
    
    $ oc debug node/<node name>
    
    <node name> is the name of the node.
    
    $ chroot /host
    
  2. Find the relevant device name based on the PVC names identified in the previous step.
    
    $ dmsetup ls | grep <pvc name>
    
    <pvc name> is the name of the PVC.
    Example output:
    
    ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt (253:0)
    
  3. Remove the mapped device.
    
    $ cryptsetup luksClose --debug --verbose ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt
    
    Important
    If the above command gets stuck due to insufficient privileges, do the following:
    1. Press CTRL+Z to suspend the above command and return to the shell.
    2. Find the PID of the process that is stuck.
    
       $ ps -ef | grep crypt
    
    3. Terminate the process using the kill command.
    
       $ kill -9 <PID>
    
       <PID> is the process ID.
    4. Verify that the device name is removed.
    
       $ dmsetup ls
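
The lookup from removal log to PVC name to dm-crypt mapping described in this step can be sketched as two small filters. Both function names are hypothetical. The sketch assumes the log message format shown in the example output and that each mapping name starts with the PVC name.

```shell
# Hypothetical helper: reads the removal job log on stdin and prints each PVC
# name that appears in a 'removing the OSD PVC "<name>"' message.
removed_pvc_names() {
  sed -n 's/.*removing the OSD PVC "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: reads "dmsetup ls" output on stdin and prints the
# mapping names that start with the given PVC name.
dmcrypt_mappings_for() {
  awk -v pvc="$1" 'index($1, pvc) == 1 { print $1 }'
}

# Usage:
#   oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | removed_pvc_names
#   dmsetup ls | dmcrypt_mappings_for <pvc name>      # inside the chroot on the node
```

The names printed by the second filter are the ones to pass to cryptsetup luksClose.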

Delete the ocs-osd-removal job.

$ oc delete -n openshift-storage job ocs-osd-removal-job

Example output:

job.batch "ocs-osd-removal-job" deleted

Note

When using an external key management system (KMS) with data encryption, the old OSD encryption key can be removed from the Vault server as it is now an orphan key.

Verification steps

Verify that there is a new OSD running.

$ oc get -n openshift-storage pods -l app=rook-ceph-osd

Example output:

rook-ceph-osd-0-5f7f4747d4-snshw                                  1/1     Running     0          4m47s
rook-ceph-osd-1-85d99fb95f-2svc7                                  1/1     Running     0          1d20h
rook-ceph-osd-2-6c66cdb977-jp542                                  1/1     Running     0          1d20h

Verify that a new PVC has been created and is in the Bound state.

$ oc get -n openshift-storage pvc

Example output:

NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
ocs-deviceset-0-0-2s6w4   Bound    pvc-7c9bcaf7-de68-40e1-95f9-0b0d7c0ae2fc   512Gi      RWO            thin            5m
ocs-deviceset-1-0-q8fwh   Bound    pvc-9e7e00cb-6b33-402e-9dc5-b8df4fd9010f   512Gi      RWO            thin            1d20h
ocs-deviceset-2-0-9v8lq   Bound    pvc-38cdfcee-ea7e-42a5-a6e1-aaa6d4924291   512Gi      RWO            thin            1d20h
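
For a scripted version of this check, the PVC listing can be filtered for device set claims that are not Bound. This is a sketch under the assumption that STATUS is the second column, as in the example output; unbound_deviceset_pvcs is a hypothetical helper name.

```shell
# Hypothetical helper: reads "oc get pvc" output on stdin and prints the name
# and status of every ocs-deviceset PVC whose STATUS is not Bound.
unbound_deviceset_pvcs() {
  awk '$1 ~ /^ocs-deviceset-/ && $2 != "Bound" { print $1, $2 }'
}

# Usage (requires a logged-in oc session); empty output means all are Bound:
#   oc get -n openshift-storage pvc | unbound_deviceset_pvcs
```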

Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
1. Identify the nodes where the new OSD pods are running.
  $ oc get -n openshift-storage -o=custom-columns=NODE:.spec.nodeName pod/<OSD-pod-name>
  <OSD-pod-name> is the name of the OSD pod.
  For example:
  
  $ oc get -n openshift-storage -o=custom-columns=NODE:.spec.nodeName pod/rook-ceph-osd-0-544db49d7f-qrgqm
  
  Example output:
  
  NODE
  compute-1
2. For each of the nodes identified in the previous step, do the following:
  1. Create a debug pod and open a chroot environment for the selected host(s).
    
    $ oc debug node/<node name>
    
    <node name> is the name of the node.
    
    $ chroot /host
    
  2. Check for the crypt keyword beside the ocs-deviceset name(s).
    
    $ lsblk
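
Instead of scanning the lsblk tree by eye, the check for the crypt keyword can be sketched as a filter over lsblk's raw two-column output. The helper name encrypted_devicesets is hypothetical, and the sketch assumes that encrypted OSD devices appear as device-mapper entries whose names contain ocs-deviceset.

```shell
# Hypothetical helper: reads "lsblk -rno NAME,TYPE" output on stdin, prints
# every ocs-deviceset device whose TYPE is crypt, and returns non-zero if
# none is found.
encrypted_devicesets() {
  awk '$1 ~ /ocs-deviceset/ && $2 == "crypt" { print $1; found = 1 }
       END { exit !found }'
}

# Usage inside the chroot on the node:
#   lsblk -rno NAME,TYPE | encrypted_devicesets || echo "no encrypted ocs-deviceset device found"
```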
    
Log in to the OpenShift Web Console and view the storage dashboard.
