Chapter 8. Removing failed or unwanted Ceph Object Storage devices

Failed or unwanted Ceph OSDs (Object Storage Devices) affect the performance of the storage infrastructure. To improve the reliability and resilience of the storage cluster, you must remove them.

If you have any failed or unwanted Ceph OSDs to remove:

1. Verify the Ceph health status. For more information, see Verifying Ceph cluster is healthy.
2. Based on how the OSDs were provisioned, remove the failed or unwanted Ceph OSDs. See:
  - Removing failed or unwanted Ceph OSDs in dynamically provisioned Red Hat OpenShift Data Foundation.
  - Removing failed or unwanted Ceph OSDs provisioned using local storage devices.

If you are using local disks, you can reuse these disks after removing the old OSDs.

8.1. Verifying Ceph cluster is healthy

Storage health is visible on the Block and File and Object dashboards.

Procedure

In the OpenShift Web Console, click Storage → Data Foundation.
In the Status card of the Overview tab, click Storage System and then click the storage system link from the pop-up that appears.
In the Status card of the Block and File tab, verify that Storage Cluster has a green tick.
In the Details card, verify that the cluster information is displayed.
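The console check above can also be approximated from the command line. The following is a minimal sketch, not part of the documented procedure. It assumes that the Rook `CephCluster` resource in the `openshift-storage` namespace reports health under `.status.ceph.health`; that field path and the helper names `ceph_health` and `is_health_ok` are assumptions made for illustration.

```shell
# Hypothetical helper: print the health string reported by the CephCluster CR.
# Assumption: the health is exposed at .status.ceph.health.
ceph_health() {
  oc get cephcluster -n openshift-storage \
    -o jsonpath='{.items[0].status.ceph.health}'
}

# Hypothetical helper: succeed only when the given string is exactly HEALTH_OK.
is_health_ok() {
  [ "$1" = "HEALTH_OK" ]
}

# Usage (not run automatically):
#   is_health_ok "$(ceph_health)" && echo "Ceph is healthy"
```

Keeping the comparison in its own function means the decision logic can be checked without a cluster; only `ceph_health` talks to the API server.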

8.2. Removing failed or unwanted Ceph OSDs in dynamically provisioned Red Hat OpenShift Data Foundation

Follow the steps in the procedure to remove the failed or unwanted Ceph OSDs in dynamically provisioned Red Hat OpenShift Data Foundation.

Important

Scaling down a cluster is supported only with the help of the Red Hat support team.

Warning

Removing an OSD when the Ceph component is not in a healthy state can result in data loss.
Removing two or more OSDs at the same time results in data loss.

Prerequisites

Check that Ceph is healthy. For more information, see Verifying Ceph cluster is healthy.
Ensure that no alerts are firing and that no rebuilding process is in progress.

Procedure

Scale down the OSD deployment.

# oc scale deployment -n openshift-storage rook-ceph-osd-<osd-id> --replicas=0

Get the PVC name for the Ceph OSD to be removed. The name of the osd-prepare pod is derived from it.

# oc get deployment -n openshift-storage rook-ceph-osd-<osd-id> -o yaml | grep ceph.rook.io/pvc

Delete the osd-prepare pod.

# oc delete -n openshift-storage pod rook-ceph-osd-prepare-<pvc-from-above-command>-<pod-suffix>

Remove the failed OSD from the cluster.
```
# failed_osd_id=<osd-id>

# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -
```
where failed_osd_id, passed as the FAILED_OSD_IDS parameter, is the integer in the pod name immediately after the rook-ceph-osd prefix.
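As an illustration of how that ID is read from a pod name, here is a minimal sketch. The helper name `osd_id_from_pod` and the sample pod name in the usage note are hypothetical; only the rule itself (the integer immediately after the rook-ceph-osd prefix) comes from this procedure.

```shell
# Hypothetical helper: print the OSD ID embedded in a rook-ceph-osd pod name.
# Expected name shape: rook-ceph-osd-<osd-id>-<remaining-suffix>
osd_id_from_pod() {
  pod_name=$1
  rest=${pod_name#rook-ceph-osd-}   # strip the fixed prefix
  # If nothing was stripped, the name does not carry the expected prefix.
  [ "$rest" != "$pod_name" ] || return 1
  osd_id=${rest%%-*}                # keep the text before the next hyphen
  # Accept only a non-empty, all-digit ID (rejects rook-ceph-osd-prepare-...).
  case $osd_id in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$osd_id"
}
```

For example, `failed_osd_id=$(osd_id_from_pod rook-ceph-osd-0-6d77d6c7c6-m8xj6)` sets the variable to `0`, while an osd-prepare pod name is rejected because the text after the prefix is not an integer.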

Verify that the OSD is removed successfully by checking the logs.

# oc logs -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>

Optional: If the ocs-osd-removal-job pod in OpenShift Container Platform reports the error cephosd:osd.0 is NOT ok to destroy, see Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs.

Delete the OSD deployment.

# oc delete deployment -n openshift-storage rook-ceph-osd-<osd-id>

Verification step

To check if the OSD is deleted successfully, run:
```
# oc get pod -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>
```
This command must return the status as Completed.
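The status check can be scripted rather than read by eye. The following is a minimal sketch under the assumption that the command prints the default table with STATUS as the third column; the helper name `removal_job_completed` is hypothetical.

```shell
# Hypothetical helper: read `oc get pod` table output on stdin and succeed
# only if there is at least one pod row and every row has STATUS Completed.
removal_job_completed() {
  awk '
    NR == 1 { next }                          # skip the column header row
    { rows++; if ($3 != "Completed") bad = 1 }
    END { exit (rows > 0 && !bad) ? 0 : 1 }
  '
}

# Usage (not run automatically):
#   oc get pod -n openshift-storage <removal-pod-name> | removal_job_completed \
#     && echo "OSD removal job completed"
```

The helper treats empty output as a failure, so a missing pod is not mistaken for a completed job.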

8.3. Removing failed or unwanted Ceph OSDs provisioned using local storage devices

Follow the steps in the procedure to remove failed or unwanted Ceph OSDs provisioned using local storage devices.

Important

Scaling down a cluster is supported only with the help of the Red Hat support team.

Warning

Removing an OSD when the Ceph component is not in a healthy state can result in data loss.
Removing two or more OSDs at the same time results in data loss.

Prerequisites

Check that Ceph is healthy. For more information, see Verifying Ceph cluster is healthy.
Ensure that no alerts are firing and that no rebuilding process is in progress.

Procedure

Forcibly mark the OSD down by scaling the replicas on the OSD deployment to 0. You can skip this step if the OSD is already down due to failure.
```
# oc scale deployment -n openshift-storage rook-ceph-osd-<osd-id> --replicas=0
```
Remove the failed OSD from the cluster.
```
# failed_osd_id=<osd-id>

# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -
```
where failed_osd_id, passed as the FAILED_OSD_IDS parameter, is the integer in the pod name immediately after the rook-ceph-osd prefix.

Verify that the OSD is removed successfully by checking the logs.

# oc logs -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>

Optional: If the ocs-osd-removal-job pod in OpenShift Container Platform reports the error cephosd:osd.0 is NOT ok to destroy, see Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs.

Delete persistent volume claim (PVC) resources associated with the failed OSD.

Get the PVC associated with the failed OSD.

# oc get -n openshift-storage -o yaml deployment rook-ceph-osd-<osd-id> | grep ceph.rook.io/pvc

Get the persistent volume (PV) associated with the PVC.
```
# oc get -n openshift-storage pvc <pvc-name>
```

Get the failed device name.

# oc get pv <pv-name-from-above-command> -oyaml | grep path

Get the osd-prepare pod associated with the failed OSD.

# oc describe -n openshift-storage pvc <pvc-name> | grep Mounted

Delete the osd-prepare pod before removing the associated PVC.

# oc delete -n openshift-storage pod <osd-prepare-pod-from-above-command>

Delete the PVC associated with the failed OSD.

# oc delete -n openshift-storage pvc <pvc-name>
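The PVC lookup in the substeps above prints the matching label lines from the deployment YAML, and the same label can appear more than once. The following is a minimal sketch that reduces that output to the bare PVC name. The helper name `pvc_from_label_lines` is hypothetical, and the sketch assumes each matching line has the form `ceph.rook.io/pvc: <pvc-name>` with optional leading indentation.

```shell
# Hypothetical helper: read grep output such as
#   ceph.rook.io/pvc: <pvc-name>
# on stdin and print each distinct PVC name once.
pvc_from_label_lines() {
  # Drop the indentation and the label key, keep the value, de-duplicate.
  sed -n 's/^[[:space:]]*ceph\.rook\.io\/pvc:[[:space:]]*//p' | sort -u
}

# Usage (not run automatically):
#   pvc_name=$(oc get -n openshift-storage -o yaml deployment rook-ceph-osd-<osd-id> \
#     | grep ceph.rook.io/pvc | pvc_from_label_lines)
```

Capturing the name in a variable once avoids retyping it in the later `oc describe pvc` and `oc delete pvc` commands.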

Remove the failed device entry from the LocalVolume custom resource (CR).
1. Log in to the node with the failed device.
  # oc debug node/<node_with_failed_osd>
2. Record the /dev/disk/by-id/<id> for the failed device name.
  # ls -alh /mnt/local-storage/localblock/

Optional: If the Local Storage Operator was used to provision the OSD, log in to the node that hosts the failed OSD and remove the device symlink.
```
# oc debug node/<node_with_failed_osd>
```
1. Get the OSD symlink for the failed device name.
  # ls -alh /mnt/local-storage/localblock
2. Remove the symlink.
  # rm /mnt/local-storage/localblock/<failed-device-name>

Delete the PV associated with the OSD.

# oc delete pv <pv-name>

Verification step

To check if the OSD is deleted successfully, run:
```
# oc get pod -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>
```
This command must return the status as Completed.

8.4. Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs

If the ocs-osd-removal-job pod in OpenShift Container Platform reports the error cephosd:osd.0 is NOT ok to destroy, run the OSD removal job with the FORCE_OSD_REMOVAL option to move the OSD to a destroyed state.

# oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -

Note

Use the FORCE_OSD_REMOVAL option only if all the placement groups (PGs) are in the active state. If they are not, the PGs must either complete backfilling or be investigated further to ensure that they are active.
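The condition in the note can be expressed as a small guard. The following is a minimal sketch that assumes you already have one PG state per line (for example active+clean); how those states are collected from the cluster is left open here, and the helper name `all_pgs_active` is hypothetical.

```shell
# Hypothetical helper: read one PG state per line on stdin and succeed only
# if there is at least one line and every state carries the "active" flag
# as one of its "+"-separated tokens.
all_pgs_active() {
  awk -F'+' '
    {
      n++; found = 0
      for (i = 1; i <= NF; i++) if ($i == "active") found = 1
      if (!found) bad = 1
    }
    END { exit (n > 0 && !bad) ? 0 : 1 }
  '
}

# Usage (not run automatically), with states supplied by your own tooling:
#   <command-that-prints-pg-states> | all_pgs_active \
#     && echo "All PGs are active; FORCE_OSD_REMOVAL may be considered"
```

Matching whole tokens rather than substrings keeps a state such as activating from being counted as active.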
