Red Hat Enterprise Linux 10: Managing storage devices

16.20. Replacing a failed disk in RAID

If a disk in a RAID array fails, you can reconstruct its data from the remaining disks. The RAID level and the total number of disks determine the minimum number of remaining disks needed for a successful data reconstruction.
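
As a rough illustration of that rule, the following sketch prints how many member disks must stay healthy for the standard RAID levels. The function name `min_remaining_disks` is hypothetical, and nested levels such as RAID 10 are left out because their tolerance depends on the layout.

```shell
# min_remaining_disks LEVEL TOTAL
# Print the minimum number of healthy member disks from which the data
# of an array with TOTAL disks can still be reconstructed.
min_remaining_disks() {
    case "$1" in
        0)   echo "$2" ;;            # striping only: no redundancy
        1)   echo 1 ;;               # mirroring: one intact copy is enough
        4|5) echo $(($2 - 1)) ;;     # single parity: tolerates one failure
        6)   echo $(($2 - 2)) ;;     # double parity: tolerates two failures
        *)   echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

min_remaining_disks 5 4   # four-disk RAID 5, prints 3
```

For a four-disk RAID 5 array the result is 3, so such an array keeps working after a single disk failure but not after a second one.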

In this procedure, the /dev/md0 RAID contains four disks. The /dev/sdd disk has failed and you need to replace it with the /dev/sdf disk.

Prerequisites

A spare disk is available for replacement.
The mdadm package is installed.

Procedure

Check the failed disk:
1. View the kernel logs:
  # journalctl -k -f
2. Search for a message similar to the following:
  md/raid:md0: Disk failure on sdd, disabling device.
  md/raid:md0: Operation continuing on 3 devices.
3. Press Ctrl+C on your keyboard to exit the journalctl program.
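
If you prefer a one-off search to following the log interactively, the kernel messages can also be filtered. The sketch below assumes the message format shown in this procedure. `failed_disks_from_log` is a hypothetical helper name, not part of mdadm or systemd.

```shell
# Read kernel log lines on stdin and print "<array> <disk>" for every
# "Disk failure" message logged by the md driver.
# Assumption: the messages follow the format shown in this procedure.
failed_disks_from_log() {
    sed -n 's|.*md/raid[0-9]*:\([^:]*\): Disk failure on \([^,]*\),.*|\1 \2|p'
}

# Demonstration with the sample messages from this procedure:
printf '%s\n' \
    'md/raid:md0: Disk failure on sdd, disabling device.' \
    'md/raid:md0: Operation continuing on 3 devices.' |
    failed_disks_from_log   # prints: md0 sdd
```

On a live system, the input would come from `journalctl -k --no-pager` piped into the function, run as root.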

Mark the failed disk as faulty:

# mdadm --manage /dev/md0 --fail /dev/sdd

Optional: Check if the failed disk was marked correctly:

# mdadm --detail /dev/md0

At the end of the output is a list of disks in the /dev/md0 RAID where the disk /dev/sdd has the faulty status:

Number   Major   Minor   RaidDevice State
   0       8       16        0      active sync   /dev/sdb
   1       8       32        1      active sync   /dev/sdc
   -       0        0        2      removed
   3       8       64        3      active sync   /dev/sde

   2       8       48        -      faulty   /dev/sdd
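
For scripted checks, the same information can be extracted from the output instead of read by eye. This is a minimal sketch that assumes the column layout shown above. `faulty_members` is a hypothetical helper name.

```shell
# Read `mdadm --detail` output on stdin and print every member device
# whose state column starts with "faulty".
faulty_members() {
    awk '$5 == "faulty" { print $NF }'
}

# Demonstration with the sample output from this procedure:
faulty_members <<'EOF'
Number   Major   Minor   RaidDevice State
   0       8       16        0      active sync   /dev/sdb
   1       8       32        1      active sync   /dev/sdc
   -       0        0        2      removed
   3       8       64        3      active sync   /dev/sde

   2       8       48        -      faulty   /dev/sdd
EOF
# prints: /dev/sdd
```

On a live system, you would pipe the output of `mdadm --detail /dev/md0` into the function.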

Remove the failed disk from the RAID:
```
# mdadm --manage /dev/md0 --remove /dev/sdd
```
Warning
If your RAID cannot withstand another disk failure, do not remove any disk until the new disk has the active sync status. You can monitor the progress using the watch cat /proc/mdstat command.
Add the new disk to the RAID:
```
# mdadm --manage /dev/md0 --add /dev/sdf
```
The /dev/md0 RAID now includes the new disk /dev/sdf, and the mdadm service automatically starts copying data to it from the other disks.

Verification

Check the details of the array:

# mdadm --detail /dev/md0

If the list of disks at the end of the output shows the new disk with the spare rebuilding status, data is still being copied to it from the other disks:

Number   Major   Minor   RaidDevice State
   0       8       16        0      active sync   /dev/sdb
   1       8       32        1      active sync   /dev/sdc
   4       8       80        2      spare rebuilding   /dev/sdf
   3       8       64        3      active sync   /dev/sde

After data copying is finished, the new disk has an active sync status.

Check the progress of synchronization:

# cat /proc/mdstat
Personalities : [raid4] [raid5] [raid6]
md0 : active raid5 sdf[5] sde[4] sdc[1] sdb[0]
      6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      [==============>......]  recovery = 72.0% (1509820/2094080) finish=0.0min speed=215688K/sec

unused devices: <none>
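
To use the progress value in a script, you can extract it from the file. The following sketch assumes the /proc/mdstat layout shown above. `recovery_percent` is a hypothetical helper name, and its optional second argument exists only so that the function can be tried against a saved copy of the file.

```shell
# recovery_percent ARRAY [FILE]
# Print the recovery percentage of ARRAY (for example md0) as reported in
# FILE, which defaults to /proc/mdstat. Prints nothing when no recovery
# is in progress for that array.
recovery_percent() {
    awk -v array="$1" '
        $1 == array          { in_array = 1; next }   # start of the array block
        in_array && NF == 0  { in_array = 0 }         # an empty line ends the block
        in_array && /recovery =/ {
            # The percentage is the field that follows the "=" sign.
            for (i = 1; i <= NF; i++)
                if ($i == "=") { print $(i + 1); exit }
        }
    ' "${2:-/proc/mdstat}"
}
```

With the sample output above, `recovery_percent md0` prints `72.0%`. If you only need to block until the rebuild is finished, `mdadm --wait /dev/md0` is likely the simpler choice.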
