16.20. Replacing a failed disk in RAID
You can reconstruct the data from a failed disk by using the remaining disks. The RAID level and the total number of disks determine the minimum number of remaining disks needed for a successful data reconstruction.
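As a rough illustration of how the RAID level limits the number of failures an array can absorb, the following sketch maps common levels to the number of member disks that can fail without data loss. The function name max_failed_disks is hypothetical, and the RAID 10 value assumes a two-copy layout, where only a single failure is guaranteed to be survivable.

```shell
# Print how many member disks can fail without losing data.
# Usage: max_failed_disks LEVEL TOTAL_DISKS
max_failed_disks() {
    level=$1
    total=$2
    case "$level" in
        0)   echo 0 ;;                  # striping only, no redundancy
        1)   echo $((total - 1)) ;;     # every disk holds a full copy
        4|5) echo 1 ;;                  # single parity
        6)   echo 2 ;;                  # double parity
        10)  echo 1 ;;                  # guaranteed minimum, assuming two copies
        *)   echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}
```

For a four-disk RAID 5 array such as the one in this procedure, max_failed_disks 5 4 prints 1: the array keeps running on three disks but cannot survive a second failure until the rebuild completes.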
In this procedure, the /dev/md0 RAID contains four disks. The /dev/sdd disk has failed and you need to replace it with the /dev/sdf disk.
Prerequisites
- A spare disk for replacement.
- The mdadm package is installed.
Procedure
Check the failed disk:
View the kernel logs:
# journalctl -k -f
Search for a message similar to the following:
md/raid:md0: Disk failure on sdd, disabling device.
md/raid:md0: Operation continuing on 3 devices.
Press Ctrl+C on your keyboard to exit the journalctl program.
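If you prefer a one-shot search over following the log interactively, you can filter the kernel log for the failure message. The helper below is only a sketch: the function name md_failed_disks is hypothetical, and the pattern is derived from the message format shown in this procedure, which might differ between kernel versions.

```shell
# Read kernel log lines on stdin and print "ARRAY DISK" for each
# "Disk failure" message reported by the md driver.
md_failed_disks() {
    sed -n 's/.*md\/raid[0-9]*:\([^:]*\): Disk failure on \([^,]*\),.*/\1 \2/p'
}

# Example (run as root):
#   journalctl -k --no-pager | md_failed_disks
```

Each output line names the array and the failed member, which are the two values you need for the mdadm --manage commands in the next steps.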
Mark the failed disk as faulty:
# mdadm --manage /dev/md0 --fail /dev/sdd
Optional: Check if the failed disk was marked correctly:
# mdadm --detail /dev/md0
At the end of the output is a list of disks in the /dev/md0 RAID where the disk /dev/sdd has the faulty status:
Number   Major   Minor   RaidDevice   State
   0       8      16         0        active sync   /dev/sdb
   1       8      32         1        active sync   /dev/sdc
   -       0       0         2        removed
   3       8      64         3        active sync   /dev/sde
   2       8      48         -        faulty        /dev/sdd
Remove the failed disk from the RAID:
# mdadm --manage /dev/md0 --remove /dev/sdd
Warning: If your RAID cannot withstand another disk failure, do not remove any disk until the new disk has the active sync status. You can monitor the progress using the watch cat /proc/mdstat command.
Add the new disk to the RAID:
# mdadm --manage /dev/md0 --add /dev/sdf
The /dev/md0 RAID now includes the new disk /dev/sdf, and the mdadm service automatically starts copying data to it from the other disks.
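The fail, remove, and add steps of this procedure can be collected into one small script. The following is a minimal sketch under stated assumptions, not a supported tool: the names run, replace_failed_disk, and the APPLY variable are hypothetical, and the script only prints the commands unless APPLY=1 is set, so you can review them before touching the array.

```shell
# Print a command, or execute it when APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Usage: replace_failed_disk ARRAY FAILED_DISK NEW_DISK
# Stops at the first command that fails.
replace_failed_disk() {
    array=$1
    failed=$2
    new=$3
    run mdadm --manage "$array" --fail "$failed" &&
    run mdadm --manage "$array" --remove "$failed" &&
    run mdadm --manage "$array" --add "$new"
}

# Example (dry run):
#   replace_failed_disk /dev/md0 /dev/sdd /dev/sdf
```

Defaulting to a dry run is a deliberate choice here, because removing the wrong member from a degraded array can make the data unrecoverable.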
Verification
Check the details of the array:
# mdadm --detail /dev/md0
If this command shows a list of disks in the /dev/md0 RAID where the new disk has spare rebuilding status at the end of the output, data is still being copied to it from other disks:
Number   Major   Minor   RaidDevice   State
   0       8      16         0        active sync       /dev/sdb
   1       8      32         1        active sync       /dev/sdc
   4       8      80         2        spare rebuilding  /dev/sdf
   3       8      64         3        active sync       /dev/sde
After data copying is finished, the new disk has an active sync status.
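To check the member states without reading the table by eye, you can filter the mdadm --detail output for members that are not yet in the active sync state. This is a sketch that assumes the table layout shown above (a header row starting with Number and Major, and the device path in the last column); the function name md_unsynced_members is hypothetical.

```shell
# Read `mdadm --detail` output on stdin and print "DEVICE STATE" for every
# member in the device table whose state is not "active sync".
md_unsynced_members() {
    awk '
        $1 == "Number" && $2 == "Major" { in_table = 1; next }  # table header
        in_table && $NF ~ /^\/dev\// {
            # Columns 5 .. NF-1 hold the state, which can be several words.
            state = ""
            for (i = 5; i < NF; i++) state = state (state ? " " : "") $i
            if (state != "active sync") print $NF, state
        }
    '
}

# Example (run as root):
#   mdadm --detail /dev/md0 | md_unsynced_members
```

No output means that every listed member is in the active sync state. Rows without a device path, such as a removed slot, are skipped.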
Check the progress of synchronization:
# cat /proc/mdstat
Personalities : [raid4] [raid5] [raid6]
md0 : active raid5 sdf[5] sde[4] sdc[1] sdb[0]
      6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      [==============>......]  recovery = 72.0% (1509820/2094080) finish=0.0min speed=215688K/sec

unused devices: <none>
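For scripting, the recovery percentage can be extracted from /proc/mdstat. The sketch below assumes the layout shown above, where the lines that belong to an array are indented under its md0 : header; the function name md_recovery_percent is hypothetical.

```shell
# Read /proc/mdstat content on stdin and print the recovery percentage of
# the given array (for example "72.0"). Prints nothing if no recovery runs.
# Usage: md_recovery_percent md0 < /proc/mdstat
md_recovery_percent() {
    awk -v array="$1" '
        $1 == array && $2 == ":" { found = 1; next }  # header line of the array
        found && !/^[ \t]/ { found = 0 }              # entry ends at a non-indented line
        found && match($0, /recovery = *[0-9.]+%/) {
            pct = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9.]/, "", pct)                  # keep only the number
            print pct
            exit
        }
    '
}
```

If you only need to block until the rebuild is done, the mdadm --wait /dev/md0 command is likely a simpler option than polling this value.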