Chapter 2. Handling a disk failure

As a storage administrator, you will have to deal with a disk failure at some point over the lifetime of the storage cluster. Testing and simulating a disk failure before a real failure happens ensures that you are ready when one does occur.

Here is the high-level workflow for replacing a failed disk:

Find the failed OSD.
Take OSD out.
Stop the OSD daemon on the node.
Check Ceph’s status.
Remove the OSD from the CRUSH map.
Delete the OSD authorization.
Remove the OSD from the storage cluster.
Unmount the filesystem on node.
Replace the failed drive.
Add the OSD back to the storage cluster.
Check Ceph’s status.

2.1. Prerequisites

A running Red Hat Ceph Storage cluster.
A failed disk.

2.2. Disk failures

Ceph is designed for fault tolerance, which means Ceph can operate in a degraded state without losing data. Ceph can still operate even if a data storage drive fails. In the degraded state, the extra copies of the data stored on other OSDs backfill automatically to other OSDs in the storage cluster. When an OSD gets marked down, this can mean the drive has failed.

When a drive fails, initially the OSD status will be down, but still in the storage cluster. Networking issues can also mark an OSD as down even if it is really up. First check for any network issues in the environment. If the networking checks out okay, then it is likely the OSD drive has failed.

Modern servers typically deploy with hot-swappable drives allowing you to pull a failed drive and replace it with a new one without bringing down the node. However, with Ceph you will also have to remove the software-defined part of the OSD.

2.3. Simulating a disk failure

There are two disk failure scenarios: hard and soft. A hard failure means replacing the disk. A soft failure might be an issue with the device driver or some other software component.

In the case of a soft failure, replacing the disk might not be needed. If you do replace a disk, follow the steps to remove the failed disk and add the replacement disk to Ceph. To simulate a soft disk failure, delete the device: choose a device and delete it from the system.

Prerequisites

A healthy and running Red Hat Ceph Storage cluster.
Root-level access to the Ceph OSD node.

Procedure

Remove the block device from sysfs:

Syntax

echo 1 > /sys/block/BLOCK_DEVICE/device/delete

Example

[root@osd ~]# echo 1 > /sys/block/sdb/device/delete

The Ceph OSD log on the OSD node shows that Ceph detected the failure and started the recovery process automatically.

Example

[root@osd ~]# tail -50 /var/log/ceph/ceph-osd.1.log
2020-09-02 15:50:50.187067 7ff1ce9a8d80  1 bdev(0x563d263d4600 /var/lib/ceph/osd/ceph-2/block) close
2020-09-02 15:50:50.440398 7ff1ce9a8d80 -1 osd.2 0 OSD:init: unable to mount object store
2020-09-02 15:50:50.440416 7ff1ce9a8d80 -1 ^[[0;31m ** ERROR: osd init failed: (5) Input/output error^[[0m
2020-09-02 15:51:10.633738 7f495c44bd80  0 set uid:gid to 167:167 (ceph:ceph)
2020-09-02 15:51:10.633752 7f495c44bd80  0 ceph version 12.2.12-124.el7cp (e8948288b90d312c206301a9fcf80788fbc3b1f8) luminous (stable), process ceph-osd, pid 36209
2020-09-02 15:51:10.634703 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
2020-09-02 15:51:10.635749 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
2020-09-02 15:51:10.636642 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
2020-09-02 15:51:10.637535 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
2020-09-02 15:51:10.641256 7f495c44bd80  0 pidfile_write: ignore empty --pid-file
2020-09-02 15:51:10.669317 7f495c44bd80  0 load: jerasure load: lrc load: isa
2020-09-02 15:51:10.669387 7f495c44bd80  1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2020-09-02 15:51:10.669395 7f495c44bd80  1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2020-09-02 15:51:10.669611 7f495c44bd80  1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466GiB) block_size 4096 (4KiB) rotational
2020-09-02 15:51:10.670320 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
2020-09-02 15:51:10.670328 7f495c44bd80  1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) close
2020-09-02 15:51:10.924727 7f495c44bd80  1 bluestore(/var/lib/ceph/osd/ceph-2) _mount path /var/lib/ceph/osd/ceph-2
2020-09-02 15:51:10.925582 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
2020-09-02 15:51:10.925628 7f495c44bd80  1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2020-09-02 15:51:10.925630 7f495c44bd80  1 bdev(0x55a423da8600 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2020-09-02 15:51:10.925784 7f495c44bd80  1 bdev(0x55a423da8600 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466GiB) block_size 4096 (4KiB) rotational
2020-09-02 15:51:10.926549 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error

The Ceph OSD tree also shows that the OSD is down.

Example

[root@osd ~]# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.28976 root default
-2 0.09659     host ceph3
 1 0.09659         osd.1       down 1.00000          1.00000
-3 0.09659     host ceph1
 2 0.09659         osd.2       up  1.00000          1.00000
-4 0.09659     host ceph2
 0 0.09659         osd.0       up  1.00000          1.00000
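
When a cluster has many OSDs, scanning the tree by eye gets tedious. The following is a minimal sketch, not part of the original procedure, that filters `ceph osd tree` output down to the OSDs reported as down. The function name `list_down_osds` and the variable `sample_tree` are hypothetical, the sample input is the example output shown above, and the sketch assumes the column layout of that output.

```shell
#!/usr/bin/env bash
# List OSDs that `ceph osd tree` reports as down.
# Reads the tree output on stdin, so it can be tried without a cluster.
list_down_osds() {
    # Print the name of each row that has an osd.N column and a "down" column.
    awk '{
        name = ""
        down = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^osd\.[0-9]+$/) name = $i
            if ($i == "down") down = 1
        }
        if (name != "" && down) print name
    }'
}

# Sample input copied from the example output above (an illustration only).
sample_tree='ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.28976 root default
-2 0.09659     host ceph3
 1 0.09659         osd.1       down 1.00000          1.00000
-3 0.09659     host ceph1
 2 0.09659         osd.2       up  1.00000          1.00000
-4 0.09659     host ceph2
 0 0.09659         osd.0       up  1.00000          1.00000'

printf '%s\n' "$sample_tree" | list_down_osds
```

With the sample input, this prints osd.1. On a live cluster, pipe the real output instead: `ceph osd tree | list_down_osds`.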

2.4. Replacing a failed OSD disk

The general procedure for replacing an OSD involves removing the OSD from the storage cluster, replacing the drive and then recreating the OSD.

Prerequisites

A running Red Hat Ceph Storage cluster.
A failed disk.

Procedure

Check storage cluster health:
```
[root@mon ~]# ceph health
```
Identify the OSD location in the CRUSH hierarchy:
```
[root@mon ~]# ceph osd tree | grep -i down
```
On the OSD node, try to start the OSD:
Syntax
```
systemctl start ceph-osd@OSD_ID
```
If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note
If the OSD is down, then the OSD will eventually get marked out. This is normal behavior for Ceph Storage. When the OSD gets marked out, other OSDs with copies of the failed OSD’s data will begin backfilling to ensure that the required number of copies exist within the storage cluster. While the storage cluster is backfilling, the cluster will be in a degraded state.
For containerized deployments of Ceph, try to start the OSD container with the OSD_ID:
Syntax
```
systemctl start ceph-osd@OSD_ID
```
If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note
To determine the drive associated with the OSD, see Mapping a container OSD ID to a drive.
Check the failed OSD’s mount point:
Note
For containerized deployments of Ceph, if the OSD is down the container will be down and the OSD drive will be unmounted, so you cannot run df to check its mount point. Use another method to determine if the OSD drive has failed. For example, run smartctl on the drive from the container node.
```
[root@osd ~]# df -h
```
If you cannot restart the OSD, you can check the mount point. If the mount point no longer appears, then you can try remounting the OSD drive and restarting the OSD. If you cannot restore the mount point, then you might have a failed OSD drive.
Using the smartctl utility can help determine if the drive is healthy:
Syntax
```
yum install smartmontools
smartctl -H /dev/BLOCK_DEVICE
```
Example
```
[root@osd ~]# smartctl -H /dev/sda
```
If the drive has failed, you need to replace it.
Stop the OSD process:
Syntax
```
systemctl stop ceph-osd@OSD_ID
```
For containerized deployments of Ceph, stop the OSD container:
Syntax
```
systemctl stop ceph-osd@OSD_ID
```
Mark the OSD out of the storage cluster:
Syntax
```
ceph osd out OSD_ID
```
Ensure that data from the failed OSD is backfilling:
```
[root@osd ~]# ceph -w
```
Remove the OSD from the CRUSH map:
Syntax
```
ceph osd crush remove osd.OSD_ID
```
Note
This step is only needed if you are permanently removing the OSD and not redeploying it.
Remove the OSD’s authentication keys:
Syntax
```
ceph auth del osd.OSD_ID
```
Verify that the keys for the OSD are not listed:
Example
```
[root@osd ~]# ceph auth list
```
Remove the OSD from the storage cluster:
Syntax
```
ceph osd rm osd.OSD_ID
```
Unmount the failed drive path:
Syntax
```
umount /var/lib/ceph/osd/CLUSTER_NAME-OSD_ID
```
Example
```
[root@osd ~]# umount /var/lib/ceph/osd/ceph-0
```
Note
For containerized deployments of Ceph, if the OSD is down the container will be down and the OSD drive will be unmounted. In this case there is nothing to unmount and this step can be skipped.
Replace the physical drive. Refer to the hardware vendor’s documentation for the node. If the drive is hot swappable, simply replace the failed drive with a new drive. If the drive is not hot swappable and the node contains multiple OSDs, you might need to bring the node down to replace the physical drive. If you need to bring the node down temporarily, you can set the cluster to noout to prevent backfilling:
Example
```
[root@osd ~]# ceph osd set noout
```
Once you replace the drive and you bring the node and its OSDs back online, remove the noout setting:
Example
```
[root@osd ~]# ceph osd unset noout
```
Allow the new drive to appear under the /dev/ directory and make a note of the drive path before proceeding further.
Find the OSD drive and format the disk.
Recreate the OSD:
1. Using Ceph Ansible.
2. Using the command-line interface.
Check the CRUSH hierarchy to ensure it is accurate:
Example
```
[root@osd ~]# ceph osd tree
```
If you are not satisfied with the location of the OSD in the CRUSH hierarchy, you can move it with the move command:
Syntax
```
ceph osd crush move BUCKET_TO_MOVE BUCKET_TYPE=PARENT_BUCKET
```
Verify the OSD is online.
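
Before running the removal steps against a production cluster, it can help to print the exact commands for a given OSD and review them. The following is a rough dry-run sketch under stated assumptions: the function name `print_osd_removal_plan` is hypothetical, the cluster name defaults to `ceph` when the second argument is omitted, and the command order follows the procedure above. It only prints text and does not contact the cluster.

```shell
#!/usr/bin/env bash
# Print, without running them, the removal commands from the procedure above.
# Arguments: OSD_ID [CLUSTER_NAME]; CLUSTER_NAME defaults to "ceph" (assumption).
print_osd_removal_plan() {
    local osd_id="$1"
    local cluster="${2:-ceph}"

    # Refuse anything that is not a plain non-negative integer.
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "error: OSD ID must be a non-negative integer" >&2
            return 1
            ;;
    esac

    cat <<EOF
systemctl stop ceph-osd@${osd_id}
ceph osd out ${osd_id}
ceph osd crush remove osd.${osd_id}
ceph auth del osd.${osd_id}
ceph osd rm osd.${osd_id}
umount /var/lib/ceph/osd/${cluster}-${osd_id}
EOF
}

print_osd_removal_plan 0
```

Review the printed lines and run them one at a time, checking `ceph -w` after marking the OSD out, rather than piping the output straight into a shell.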

2.5. Replacing an OSD drive while retaining the OSD ID

When replacing a failed OSD drive, you can keep the original OSD ID and CRUSH map entry.

Note

The ceph-volume lvm commands default to BlueStore for OSDs.

Prerequisites

A running Red Hat Ceph Storage cluster.
A failed disk.

Procedure

Destroy the OSD:

Syntax

ceph osd destroy OSD_ID --yes-i-really-mean-it

Example

[root@osd ~]# ceph osd destroy 1 --yes-i-really-mean-it

Optionally, if the replacement disk was used previously, then you need to zap the disk:
Syntax
```
ceph-volume lvm zap DEVICE
```
Example
```
[root@osd ~]# ceph-volume lvm zap /dev/sdb
```
Note
You can find the DEVICE by comparing output from various commands, such as ceph osd tree, ceph osd metadata, and df.

Create the new OSD with the existing OSD ID:

Syntax

ceph-volume lvm create --osd-id OSD_ID --data DEVICE

Example

[root@mon ~]# ceph-volume lvm create --osd-id 1 --data /dev/sdb
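
The three steps in this procedure can be sketched as a dry run that prints the commands for a given OSD ID and device so they can be reviewed first. This is an illustration only: the function name `print_osd_replace_plan` and its optional third argument `zap` are hypothetical, and the printed commands are the ones shown in the steps above. Nothing is executed against the cluster.

```shell
#!/usr/bin/env bash
# Print, without running them, the commands for replacing a drive while
# keeping the OSD ID. Arguments: OSD_ID DEVICE [zap]. Pass "zap" as the
# third argument when the replacement disk was used previously.
print_osd_replace_plan() {
    local osd_id="$1"
    local device="$2"
    local zap="${3:-}"

    # Refuse anything that is not a plain non-negative integer.
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "error: OSD ID must be a non-negative integer" >&2
            return 1
            ;;
    esac

    # Require a device path under /dev/.
    case "$device" in
        /dev/?*) ;;
        *)
            echo "error: DEVICE must be a path under /dev/" >&2
            return 1
            ;;
    esac

    echo "ceph osd destroy ${osd_id} --yes-i-really-mean-it"
    if [ "$zap" = "zap" ]; then
        echo "ceph-volume lvm zap ${device}"
    fi
    echo "ceph-volume lvm create --osd-id ${osd_id} --data ${device}"
}

print_osd_replace_plan 1 /dev/sdb zap
```

Omitting the third argument leaves the zap step out, which matches the optional nature of that step for a disk that was never used before.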

Additional Resources

See the Adding a Ceph OSD using Ansible with the same disk topologies section in the Red Hat Ceph Storage Operations Guide for more details.
See the Adding a Ceph OSD using Ansible with different disk topologies section in the Red Hat Ceph Storage Operations Guide for more details.
See the Preparing Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Operations Guide for more details.
See the Activating Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Operations Guide for more details.
See the Adding a Ceph OSD using the command-line interface section in the Red Hat Ceph Storage Operations Guide for more details.
