Chapter 12. Replacing a failed disk
If one of the disks fails in your Ceph cluster, complete the following procedures to replace it:
- Determining if there is a device name change, see Section 12.1, “Determining if there is a device name change”.
- Ensuring that the OSD is down and destroyed, see Section 12.2, “Ensuring that the OSD is down and destroyed”.
- Removing the old disk from the system and installing the replacement disk, see Section 12.3, “Removing the old disk from the system and installing the replacement disk”.
- Verifying that the disk replacement is successful, see Section 12.4, “Verifying that the disk replacement is successful”.
12.1. Determining if there is a device name change
Before you replace the disk, determine if the replacement disk for the replacement OSD has a different name in the operating system than the device that you want to replace. If the replacement disk has a different name, you must update the Ansible parameters for the devices list so that subsequent runs of ceph-ansible, including when director runs ceph-ansible, do not fail as a result of the change. For an example of the devices list that you must change when you use director, see Section 5.3, “Mapping the Ceph Storage node disk layout”.
If the device name changes and you use the following procedures to update your system outside of ceph-ansible or director, there is a risk that the configuration management tools are out of sync with the system that they manage until you update the system definition files and the configuration is reasserted without error.
Persistent naming of storage devices
Storage devices that the sd driver manages might not always have the same name across reboots. For example, a disk that is normally identified by /dev/sdc might be named /dev/sdb. It is also possible for the replacement disk, /dev/sdc, to appear in the operating system as /dev/sdd even if you want to use it as a replacement for /dev/sdc. To address this issue, use names that are persistent and match the following pattern: /dev/disk/by-*. For more information, see Persistent Naming in the Red Hat Enterprise Linux (RHEL) 7 Storage Administration Guide.
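To see which persistent names resolve to a given device before and after the replacement, you can list the /dev/disk/by-* symlinks or query udev. The following commands are a minimal sketch; /dev/sdd is an illustrative device name that you must replace with the device you are checking:

# List every persistent symlink that points to the device
ls -l /dev/disk/by-* | grep sdd
# Alternatively, query udev for all symlinks of the device
udevadm info --query=symlink --name=/dev/sdd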
Depending on the naming method that you use to deploy Ceph, you might need to update the devices list after you replace the OSD. Use the following list of naming methods to determine if you must change the devices list:
- The major and minor number range method: If you used sd naming and want to continue to use it, after you install the new disk, check if the name has changed. If the name did not change, for example, if the same name appears correctly as /dev/sdd, it is not necessary to change the name after you complete the disk replacement procedures.
  Important: This naming method is not recommended because there is still a risk that the name becomes inconsistent over time. For more information, see Persistent Naming in the RHEL 7 Storage Administration Guide.
- The by-path method: If you use this method and you add a replacement disk in the same slot, then the path is consistent and no change is necessary.
  Important: Although this naming method is preferable to the major and minor number range method, use caution to ensure that the target numbers do not change. For example, use persistent binding and update the names if a host adapter is moved to a different PCI slot. In addition, there is the possibility that the SCSI host numbers can change if an HBA fails to probe, if drivers are loaded in a different order, or if a new HBA is installed on the system. The by-path naming method also differs between RHEL 7 and RHEL 8. For more information, see:
  - The article "What is the difference between "by-path" links created in RHEL8 and RHEL7?" (https://access.redhat.com/solutions/5171991)
  - Overview of persistent naming attributes in the RHEL 8 Managing file systems guide.
- The by-uuid method: If you use this method, you can use the blkid utility to set the new disk to have the same UUID as the old disk. For more information, see Persistent Naming in the RHEL 7 Storage Administration Guide.
- The by-id method: If you use this method, you must change the devices list because this identifier is a property of the device and the device has been replaced.
When you add the new disk to the system, if you can modify the persistent naming attributes so that the device name is unchanged (see Persistent Naming in the RHEL 7 Storage Administration Guide), then it is not necessary to update the devices list and re-run ceph-ansible, or to trigger director to re-run ceph-ansible, and you can proceed with the disk replacement procedures. However, you can still re-run ceph-ansible to ensure that the change did not result in any inconsistencies.
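If you do have to update the devices list, using persistent names avoids repeating this step after future replacements. The following heat environment snippet is a minimal sketch that mirrors the devices list format shown in Section 12.4, “Verifying that the disk replacement is successful”; the by-path values are illustrative placeholders that you must replace with the paths for your own hardware:

parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0
    osd_scenario: lvm
    osd_objectstore: bluestore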
12.2. Ensuring that the OSD is down and destroyed
On the server that hosts the Ceph Monitor, use the ceph command in the running monitor container to ensure that the OSD that you want to replace is down, and then destroy it.
Procedure
Identify the running Ceph Monitor container and store its ID in an environment variable called MON:

MON=$(podman ps | grep ceph-mon | awk {'print $1'})
Alias the ceph command so that it executes within the running Ceph Monitor container:

alias ceph="podman exec $MON ceph"
Use the new alias to verify that the OSD that you want to replace is down:

[root@overcloud-controller-0 ~]# ceph osd tree | grep 27
27   hdd 0.04790        osd.27                        down  1.00000 1.00000
Destroy the OSD. The following example command destroys OSD 27:

[root@overcloud-controller-0 ~]# ceph osd destroy 27 --yes-i-really-mean-it
destroyed osd.27
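Optionally, confirm that the OSD is now flagged as destroyed before you continue. The following commands are a minimal sketch that reuses the ceph alias; the exact output format depends on your Ceph release:

# The entry for the OSD should now report a destroyed state
ceph osd tree | grep 27
# Alternatively, inspect the OSD flags directly
ceph osd dump | grep osd.27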
12.3. Removing the old disk from the system and installing the replacement disk
On the container host with the OSD that you want to replace, remove the old disk from the system and install the replacement disk.
Prerequisites:
- Determine whether the device name has changed. For more information, see Section 12.1, “Determining if there is a device name change”.
The ceph-volume command is present in the Ceph container but is not installed on the overcloud node. Create an alias so that the ceph-volume command runs the ceph-volume binary inside the Ceph container. Then use the ceph-volume command to clean the new disk and add it as an OSD.
Procedure
Ensure that the failed OSD is not running:
systemctl stop ceph-osd@27
Identify the image ID of the Ceph container image and store it in an environment variable called IMG:

IMG=$(podman images | grep ceph | awk {'print $3'})
Alias the ceph-volume command so that it runs inside the $IMG Ceph container, with the ceph-volume entry point and the relevant directories:

alias ceph-volume="podman run --rm --privileged --net=host --ipc=host -v /run/lock/lvm:/run/lock/lvm:z -v /var/run/udev/:/var/run/udev/:z -v /dev:/dev -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph-volume $IMG --cluster ceph"
Verify that the aliased command runs successfully:
ceph-volume lvm list
Check that your new OSD device is not already part of LVM. Use the pvdisplay command to inspect the device, and ensure that the VG Name field is empty. Replace <NEW_DEVICE> with the /dev/* path of your new OSD device:

[root@overcloud-computehci-2 ~]# pvdisplay <NEW_DEVICE>
  --- Physical volume ---
  PV Name               /dev/sdj
  VG Name               ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2
  PV Size               50.00 GiB / not usable 1.00 GiB
  Allocatable           yes (but full)
  PE Size               1.00 GiB
  Total PE              49
  Free PE               0
  Allocated PE          49
  PV UUID               kOO0If-ge2F-UH44-6S1z-9tAv-7ypT-7by4cp
[root@overcloud-computehci-2 ~]#
If the VG Name field is not empty, then the device belongs to a volume group that you must remove.

If the device belongs to a volume group, use the lvdisplay command to check if there is a logical volume in the volume group. Replace <VOLUME_GROUP> with the value of the VG Name field that you retrieved from the pvdisplay command:

[root@overcloud-computehci-2 ~]# lvdisplay | grep <VOLUME_GROUP>
  LV Path                /dev/ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2/osd-data-a0810722-7673-43c7-8511-2fd9db1dbbc6
  VG Name                ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2
[root@overcloud-computehci-2 ~]#
If the LV Path field is not empty, then the device contains a logical volume that you must remove.

If the new device is part of a logical volume or volume group, remove the logical volume, the volume group, and the device association as a physical volume within the LVM system.

- Replace <LV_PATH> with the value of the LV Path field.
- Replace <VOLUME_GROUP> with the value of the VG Name field.
- Replace <NEW_DEVICE> with the /dev/* path of your new OSD device.

[root@overcloud-computehci-2 ~]# lvremove --force <LV_PATH>
  Logical volume "osd-data-a0810722-7673-43c7-8511-2fd9db1dbbc6" successfully removed

[root@overcloud-computehci-2 ~]# vgremove --force <VOLUME_GROUP>
  Volume group "ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2" successfully removed

[root@overcloud-computehci-2 ~]# pvremove <NEW_DEVICE>
  Labels on physical volume "/dev/sdj" successfully wiped.
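As an optional check, verify that the device no longer carries any LVM metadata before you proceed. This is a minimal sketch; replace <NEW_DEVICE> with the /dev/* path of your new OSD device as before:

# After pvremove, the device should no longer appear in the list of physical volumes
pvs | grep <NEW_DEVICE>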
Ensure that the new OSD device is clean. In the following example, the device is /dev/sdj:

[root@overcloud-computehci-2 ~]# ceph-volume lvm zap /dev/sdj
--> Zapping: /dev/sdj
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/sbin/wipefs --all /dev/sdj
Running command: /bin/dd if=/dev/zero of=/dev/sdj bs=1M count=10
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.010618 s, 988 MB/s
--> Zapping successful for: <Raw Device: /dev/sdj>
[root@overcloud-computehci-2 ~]#
Create the new OSD with the existing OSD ID by using the new device, but pass --no-systemd so that ceph-volume does not attempt to start the OSD, because starting the OSD is not possible from within the container:

ceph-volume lvm create --osd-id 27 --data /dev/sdj --no-systemd
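Optionally, confirm that the OSD ID is now associated with the new device before you start it. This is a minimal sketch that reuses the ceph-volume alias defined earlier and the example device /dev/sdj:

# The output should list osd.27 backed by the new device
ceph-volume lvm list /dev/sdj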
Start the OSD outside of the container:
systemctl start ceph-osd@27
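After the service starts, you can check from the Ceph Monitor node that the OSD rejoins the cluster. The following commands are a minimal sketch that reuses the ceph alias from Section 12.2; allow some time for the OSD to be reported as up and for recovery to complete:

# The OSD should now report as up instead of down
ceph osd tree | grep 27
# Overall cluster health and recovery progress
ceph -s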
12.4. Verifying that the disk replacement is successful
To check that your disk replacement is successful, complete the following steps on the undercloud.
Procedure
- Check whether the device name changed. If it did, update the devices list according to the naming method that you used to deploy Ceph. For more information, see Section 12.1, “Determining if there is a device name change”.
- To ensure that the change did not introduce any inconsistencies, re-run the overcloud deploy command to perform a stack update.
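The exact deploy command depends on your environment; reuse the same command and environment files that you used for the initial deployment. The following is a minimal sketch in which the environment files shown are illustrative placeholders for the files in your own deployment:

# Re-running the original deploy command performs a stack update
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e ~/templates/ceph-config.yaml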
If you have hosts with different device lists, you might have to define an exception. For example, you might use the following example heat environment file to deploy a node with three OSD devices.
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    osd_scenario: lvm
    osd_objectstore: bluestore
The CephAnsibleDisksConfig parameter applies to all nodes that host OSDs, so you cannot update the devices parameter with the new device list. Instead, you must define an exception for the new host that has a different device list. For more information about defining an exception, see Section 5.5, “Overriding parameters for dissimilar Ceph Storage nodes” and Section 5.5.1.2, “Altering the disk layout in Ceph Storage nodes”.
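As an illustration of what such an exception can look like, the following snippet is a minimal sketch based on the node-specific override mechanism described in Section 5.5, assumed here to be NodeDataLookup; the system UUID and device paths are placeholders that you must replace with the values for your own node:

parameter_defaults:
  # Override the devices list only for the node with this system UUID (placeholder)
  NodeDataLookup:
    32E87B4C-C4A7-418E-865B-191684A6883B:
      devices:
        - /dev/sdb
        - /dev/sdc
        - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0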