Chapter 7. Changing an OSD Drive
Ceph is designed for fault tolerance, which means Ceph can operate in a degraded state without losing data. For example, Ceph can operate even if a data storage drive fails. In the context of a failed drive, the degraded state means that the extra copies of the data stored on other OSDs will backfill automatically to other OSDs in the cluster. However, if an OSD drive fails, you will have to replace the failed OSD drive and recreate the OSD manually.
When a drive fails, the OSD status will initially be down and in the cluster. Ceph health warnings will indicate that an OSD is down. Just because an OSD gets marked down does not mean the drive has failed. For example, heartbeat and other networking issues could get an OSD marked down even if it is up.
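To see which OSDs a health warning refers to, you can ask for the detailed health output, which lists the affected OSDs by ID. For example:

# ceph health detail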
Modern servers typically deploy with hot-swappable drives, so you can pull a failed drive and replace it with a new one without bringing down the node. However, with Ceph Storage you will also have to address the software-defined part of the OSD. The general procedure for replacing an OSD involves removing the OSD from your Ceph cluster, replacing the drive, and then re-creating the OSD.
Check cluster health.
# ceph health
If an OSD is down, identify its location in the CRUSH hierarchy.
# ceph osd tree | grep -i down
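If the CRUSH tree alone does not make it obvious which node hosts the down OSD, you can also query its location directly. A minimal sketch, assuming the down OSD has ID 2 (hypothetical); the command reports the host and address the OSD last registered:

# ceph osd find 2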
If an OSD is down and in, log in to the OSD node and try to restart it.

# ssh {osd-node}
# systemctl start ceph-osd@{osd-id}

If the command indicates that the OSD is already running, it may be a heartbeat or networking issue. If you cannot restart the OSD, the drive may have failed.
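If the restart does not succeed, the daemon's systemd status and journal can help distinguish a networking or configuration problem from a failing drive, for example I/O errors in the log. A minimal sketch, assuming the OSD ID is 2 (hypothetical):

# systemctl status ceph-osd@2
# journalctl -u ceph-osd@2 -n 50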
Note: If the OSD is down, it will eventually get marked out. This is normal behavior for Ceph Storage. When the OSD gets marked out, other OSDs with copies of the failed OSD's data will begin backfilling to ensure that the required number of copies exist within the cluster. While the cluster is backfilling, the cluster will be in a degraded state.
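How long a down OSD stays in before the cluster marks it out is controlled by the monitor option mon_osd_down_out_interval, 600 seconds by default. To confirm the value in effect, one way is to query a monitor's admin socket on a monitor node; a sketch, assuming the monitor ID is node1 (hypothetical):

# ceph daemon mon.node1 config get mon_osd_down_out_interval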
Check the failed OSD's mount point.

If you cannot restart the OSD, you should check the mount point. If the mount point no longer appears, you can try to re-mount the OSD drive and restart the OSD. For example, if the server restarted but lost the mount point in fstab, remount the drive.

# df -h
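If the underlying device is still healthy and only the mount was lost, re-mounting it and restarting the OSD may be all that is needed. A minimal sketch, assuming the OSD data partition is /dev/sdd1, the OSD ID is 2, and the cluster uses the default name ceph (all hypothetical):

# mount /dev/sdd1 /var/lib/ceph/osd/ceph-2
# systemctl start ceph-osd@2

If the entry is still present in /etc/fstab, running mount -a achieves the same result.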
If you cannot restore the mount point, you may have a failed OSD drive. Use your drive utilities to determine if the drive is healthy. For example:
# yum install smartmontools
# smartctl -H /dev/{drive}
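The -H flag reports only the drive's overall health assessment. If you want more detail before deciding, smartctl can also print the full SMART attribute table; a sketch, assuming the drive is /dev/sdd (hypothetical):

# smartctl -a /dev/sdd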
If the drive has failed, you will need to replace it.

Ensure the OSD is out of the cluster.
# ceph osd out osd.<num>
Ensure the OSD process is stopped.
# systemctl stop ceph-osd@<osd-id>
Ensure the failed OSD is backfilling.
# ceph -w
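ceph -w streams cluster events continuously; press Ctrl+C to stop watching once the backfill and recovery messages appear. For a one-shot view of progress instead of a running stream, the status and placement-group summaries show how many PGs are still backfilling:

# ceph -s
# ceph pg stat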
Remove the OSD from the CRUSH Map.
# ceph osd crush remove osd.<num>
Remove the OSD's authentication keys.
# ceph auth del osd.<num>
Remove the OSD from the Ceph Cluster.
# ceph osd rm osd.<num>
Unmount the failed drive path.
# umount /var/lib/ceph/{daemon}/{cluster}-{daemon-id}
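As a concrete illustration of the path template, an OSD with ID 0 on a cluster using the default name ceph would normally be mounted at /var/lib/ceph/osd/ceph-0, so the unmount would look like this (hypothetical ID):

# umount /var/lib/ceph/osd/ceph-0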
Replace the physical drive. Refer to the documentation for your hardware node. If the drive is hot-swappable, simply replace the failed drive with a new drive. If the drive is not hot-swappable and the node contains multiple OSDs, you may need to bring the node down to replace the physical drive. If you need to bring the node down temporarily, you may set the cluster to noout to prevent backfilling.

# ceph osd set noout
Once you have replaced the drive and brought the node and its OSDs back online, remove the noout setting.
# ceph osd unset noout
Allow the new drive to appear under /dev and make a note of the drive path before proceeding further.

Find the OSD drive and format the disk.
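The exact formatting step depends on how your OSDs are deployed. A minimal sketch, assuming the new drive appears as /dev/sdd (hypothetical): list the block devices to confirm the path, then zap the drive with the tool that matches your deployment, ceph-disk on older ceph-disk based installations or ceph-volume on ceph-volume based ones:

# lsblk
# ceph-disk zap /dev/sdd

or

# ceph-volume lvm zap /dev/sdd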
Recreate the OSD. See Adding an OSD for details.
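The Adding an OSD procedure is authoritative here; as one illustration only, on a ceph-volume based deployment a new OSD can be created directly from the replacement drive, again assuming /dev/sdd (hypothetical):

# ceph-volume lvm create --data /dev/sdd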
Check your CRUSH hierarchy to ensure it is accurate.
# ceph osd tree
If you are not satisfied with the location of the OSD in your CRUSH hierarchy, you may move it with the move command.
# ceph osd crush move <bucket-to-move> <bucket-type>=<parent-bucket>
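For example, to move a host bucket named node2 (hypothetical) under a rack bucket named rack1 (hypothetical):

# ceph osd crush move node2 rack=rack1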
Ensure the OSD is online.
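A quick way to confirm is the OSD summary, which reports how many OSDs are up and in; the replaced OSD should also show as up in the ceph osd tree output above:

# ceph osd stat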