Chapter 2. Handling a disk failure
As a storage administrator, you will have to deal with a disk failure at some point over the lifetime of the storage cluster. Testing and simulating a disk failure before a real failure happens ensures that you are ready when the real thing does happen.
Here is the high-level workflow for replacing a failed disk:
- Find the failed OSD.
- Take OSD out.
- Stop the OSD daemon on the node.
- Check Ceph’s status.
- Remove the OSD from the CRUSH map.
- Delete the OSD authorization.
- Remove the OSD from the storage cluster.
- Unmount the filesystem on node.
- Replace the failed drive.
- Add the OSD back to the storage cluster.
- Check Ceph’s status.
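For orientation, and assuming a non-containerized deployment with a hypothetical failed OSD ID of 0, the workflow above maps roughly onto the following command sequence. This is a sketch, not a script to run unattended; run each step manually and check cluster health between steps.

```shell
# Sketch of the replacement workflow for a hypothetical osd.0.
ceph osd tree | grep -i down        # find the failed OSD
ceph osd out 0                      # take the OSD out
systemctl stop ceph-osd@0           # stop the OSD daemon on the node
ceph -s                             # check Ceph's status
ceph osd crush remove osd.0         # remove the OSD from the CRUSH map
ceph auth del osd.0                 # delete the OSD authorization
ceph osd rm osd.0                   # remove the OSD from the storage cluster
umount /var/lib/ceph/osd/ceph-0     # unmount the filesystem on the node
# ...physically replace the drive, then add the OSD back...
ceph -s                             # check Ceph's status again
```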
2.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
2.2. Disk failures
Ceph is designed for fault tolerance, which means Ceph can operate in a degraded state without losing data. Ceph can still operate even if a data storage drive fails. In the degraded state, the extra copies of the data stored on other OSDs backfill automatically to other OSDs in the storage cluster. When an OSD gets marked down, this can mean the drive has failed.
When a drive fails, the OSD status is initially down, but the OSD is still in the storage cluster. Networking issues can also mark an OSD as down even if it is really up. First check for any network issues in the environment. If the networking checks out okay, then it is likely the OSD drive has failed.
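Before concluding that the drive has failed, it can help to rule out the network first. A minimal sketch, assuming a hypothetical OSD host named osd-node1:

```shell
# Rule out networking before blaming the drive.
# "osd-node1" is a hypothetical host name for the OSD node.
ping -c 3 osd-node1             # basic reachability of the OSD node
ceph health detail              # look for OSD_DOWN or heartbeat warnings
ceph osd tree | grep -i down    # which OSDs does the cluster consider down?
```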
Modern servers typically deploy with hot-swappable drives, allowing you to pull a failed drive and replace it with a new one without bringing down the node. However, with Ceph you also have to remove the software-defined part of the OSD.
2.3. Simulating a disk failure
There are two disk failure scenarios: hard and soft. A hard failure means replacing the disk. A soft failure might be an issue with the device driver or some other software component.
In the case of a soft failure, replacing the disk might not be needed. If you are replacing a disk, then follow the steps to remove the failed disk and add the replacement disk to Ceph. The best way to simulate a soft disk failure is to delete the device: choose a device and delete it from the system.
Prerequisites
- A healthy, running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph OSD node.
Procedure
Remove the block device from sysfs:

Syntax

echo 1 > /sys/block/BLOCK_DEVICE/device/delete

Example

[root@osd ~]# echo 1 > /sys/block/sdb/device/delete

In the Ceph OSD log on the OSD node, Ceph detects the failure and starts the recovery process automatically. Looking at the Ceph OSD disk tree, you can also see that the disk is offline.
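One way to confirm that the simulated failure was registered is to filter the down OSDs out of the ceph osd tree output. The awk filter below is a sketch run against an illustrative sample of that output; on a live cluster you would pipe ceph osd tree into the same filter.

```shell
# Print the names of OSDs whose STATUS column reads "down".
# sample_tree is an illustrative stand-in for `ceph osd tree` output.
sample_tree='ID CLASS WEIGHT  TYPE NAME      STATUS
-1       0.09760 root default
 0   hdd 0.04880     osd.0    up
 1   hdd 0.04880     osd.1    down'
echo "$sample_tree" | awk '$NF == "down" {print $(NF-1)}'
# → osd.1
```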
2.4. Replacing a failed OSD disk
The general procedure for replacing an OSD involves removing the OSD from the storage cluster, replacing the drive and then recreating the OSD.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
Procedure
Check storage cluster health:

Example

[root@mon ~]# ceph health

Identify the OSD location in the CRUSH hierarchy:

Example

[root@mon ~]# ceph osd tree | grep -i down

On the OSD node, try to start the OSD:
Syntax

systemctl start ceph-osd@OSD_ID

If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note: If the OSD is down, then the OSD will eventually get marked out. This is normal behavior for Ceph Storage. When the OSD gets marked out, other OSDs with copies of the failed OSD's data will begin backfilling to ensure that the required number of copies exist within the storage cluster. While the storage cluster is backfilling, the cluster will be in a degraded state.

For containerized deployments of Ceph, try to start the OSD container with the OSD_ID:

Syntax

systemctl start ceph-osd@OSD_ID

If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note: The drive associated with the OSD can be determined by Mapping a container OSD ID to a drive.
Check the failed OSD's mount point:

Note: For containerized deployments of Ceph, if the OSD is down, the container will be down and the OSD drive will be unmounted, so you cannot run df to check its mount point. Use another method to determine if the OSD drive has failed. For example, run smartctl on the drive from the container node.

Example

[root@osd ~]# df -h

If you cannot restart the OSD, you can check the mount point. If the mount point no longer appears, then you can try remounting the OSD drive and restarting the OSD. If you cannot restore the mount point, then you might have a failed OSD drive.
Using the smartctl utility can help determine if the drive is healthy:

Syntax

yum install smartmontools
smartctl -H /dev/BLOCK_DEVICE

Example

[root@osd ~]# smartctl -H /dev/sda

If the drive has failed, you need to replace it.
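The overall-health verdict that smartctl prints can also be checked from a script. The sketch below runs the check against a sample line rather than a live drive; the sample text mirrors smartctl's usual PASSED/FAILED result line.

```shell
# Check the SMART overall-health verdict. "sample" stands in for one line
# of `smartctl -H` output from a live drive.
sample='SMART overall-health self-assessment test result: PASSED'
case "$sample" in
  *PASSED*) echo "drive reports healthy" ;;
  *)        echo "drive may be failing, plan a replacement" ;;
esac
# → drive reports healthy
```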
Stop the OSD process:

Syntax

systemctl stop ceph-osd@OSD_ID

For containerized deployments of Ceph, stop the OSD container:

Syntax

systemctl stop ceph-osd@OSD_ID

Mark the OSD out of the storage cluster:
Syntax

ceph osd out OSD_ID

Ensure the failed OSD is backfilling:

Example

[root@osd ~]# ceph -w

Remove the OSD from the CRUSH map:
Syntax

ceph osd crush remove osd.OSD_ID

Note: This step is only needed if you are permanently removing the OSD and not redeploying it.
Remove the OSD's authentication keys:

Syntax

ceph auth del osd.OSD_ID

Verify that the keys for the OSD are not listed:

Example

[root@osd ~]# ceph auth list

Remove the OSD from the storage cluster:
Syntax

ceph osd rm osd.OSD_ID

Unmount the failed drive path:
Syntax

umount /var/lib/ceph/osd/CLUSTER_NAME-OSD_ID

Example

[root@osd ~]# umount /var/lib/ceph/osd/ceph-0

Note: For containerized deployments of Ceph, if the OSD is down, the container will be down and the OSD drive will be unmounted. In this case there is nothing to unmount, and this step can be skipped.
Replace the physical drive. Refer to the hardware vendor's documentation for the node. If the drive is hot-swappable, simply replace the failed drive with a new drive. If the drive is not hot-swappable and the node contains multiple OSDs, you might need to bring the node down to replace the physical drive. If you need to bring the node down temporarily, you can set the cluster to noout to prevent backfilling:

Example

[root@osd ~]# ceph osd set noout

Once you replace the drive and bring the node and its OSDs back online, remove the noout setting:

Example

[root@osd ~]# ceph osd unset noout

Allow the new drive to appear under the /dev/ directory and make a note of the drive path before proceeding further.

- Find the OSD drive and format the disk.
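To spot the path of the newly inserted drive, listing the block devices can help; the commands below are a sketch, and any device names they print will depend on the node.

```shell
# List block devices to spot the replacement drive.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
# The kernel log usually records the name assigned to the new device:
dmesg | tail
```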
Recreate the OSD:
- Using Ceph Ansible.
- Using the command-line interface.
Check the CRUSH hierarchy to ensure it is accurate:

Example

[root@osd ~]# ceph osd tree

If you are not satisfied with the location of the OSD in the CRUSH hierarchy, you can move it with the move command:

Syntax

ceph osd crush move BUCKET_TO_MOVE BUCKET_TYPE=PARENT_BUCKET
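For example, to move a hypothetical host bucket named node1 under a rack bucket named rack1 (both bucket names are illustrative):

```shell
# Hypothetical bucket names: move the host bucket "node1" under "rack1".
ceph osd crush move node1 rack=rack1
```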
- Verify the OSD is online.
2.5. Replacing an OSD drive while retaining the OSD ID
When replacing a failed OSD drive, you can keep the original OSD ID and CRUSH map entry.
The ceph-volume lvm commands default to BlueStore for OSDs.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
Procedure
Destroy the OSD:

Syntax

ceph osd destroy OSD_ID --yes-i-really-mean-it

Example

[root@osd ~]# ceph osd destroy 1 --yes-i-really-mean-it
Optionally, if the replacement disk was used previously, then you need to zap the disk:

Syntax

ceph-volume lvm zap DEVICE

Example

[root@osd ~]# ceph-volume lvm zap /dev/sdb

Note: You can find the DEVICE by comparing output from various commands, such as ceph osd tree, ceph osd metadata, and df.
Create the new OSD with the existing OSD ID:

Syntax

ceph-volume lvm create --osd-id OSD_ID --data DEVICE

Example

[root@mon ~]# ceph-volume lvm create --osd-id 1 --data /dev/sdb
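Putting the steps above together for a hypothetical OSD ID of 1 whose replacement disk is /dev/sdb:

```shell
# Hypothetical values: OSD ID 1, replacement disk /dev/sdb.
ceph osd destroy 1 --yes-i-really-mean-it   # keeps the OSD ID and CRUSH entry
ceph-volume lvm zap /dev/sdb                # only if the disk was used before
ceph-volume lvm create --osd-id 1 --data /dev/sdb
```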
Additional Resources
- See the Adding a Ceph OSD using Ansible with the same disk topologies section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Adding a Ceph OSD using Ansible with different disk topologies section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Preparing Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Activating Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Adding a Ceph OSD using the command-line interface section in the Red Hat Ceph Storage Operations Guide for more details.