5.4.16.8. Setting a RAID fault policy
LVM RAID handles device failures in an automatic fashion based on the preferences defined by the
raid_fault_policy
field in the lvm.conf
file.
- If the
raid_fault_policy
field is set toallocate
, the system will attempt to replace the failed device with a spare device from the volume group. If there is no available spare device, this will be reported to the system log. - If the
raid_fault_policy
field is set towarn
, the system will produce a warning and the log will indicate that a device has failed. This allows the user to determine the course of action to take.
As long as there are enough devices remaining to support usability, the RAID logical volume will continue to operate.
5.4.16.8.1. The allocate RAID Fault Policy
In the following example, the
raid_fault_policy
field has been set to allocate
in the lvm.conf
file. The RAID logical volume is laid out as follows.
# lvs -a -o name,copy_percent,devices my_vg
LV Copy% Devices
my_lv 100.00 my_lv_rimage_0(0),my_lv_rimage_1(0),my_lv_rimage_2(0)
[my_lv_rimage_0] /dev/sde1(1)
[my_lv_rimage_1] /dev/sdf1(1)
[my_lv_rimage_2] /dev/sdg1(1)
[my_lv_rmeta_0] /dev/sde1(0)
[my_lv_rmeta_1] /dev/sdf1(0)
[my_lv_rmeta_2] /dev/sdg1(0)
If the
/dev/sde
device fails, the system log will display error messages.
# grep lvm /var/log/messages
Jan 17 15:57:18 bp-01 lvm[8599]: Device #0 of raid1 array, my_vg-my_lv, has failed.
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at
250994294784: Input/output error
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at
250994376704: Input/output error
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at 0:
Input/output error
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at
4096: Input/output error
Jan 17 15:57:19 bp-01 lvm[8599]: Couldn't find device with uuid
3lugiV-3eSP-AFAR-sdrP-H20O-wM2M-qdMANy.
Jan 17 15:57:27 bp-01 lvm[8599]: raid1 array, my_vg-my_lv, is not in-sync.
Jan 17 15:57:36 bp-01 lvm[8599]: raid1 array, my_vg-my_lv, is now in-sync.
Since the
raid_fault_policy
field has been set to allocate
, the failed device is replaced with a new device from the volume group.
# lvs -a -o name,copy_percent,devices vg
Couldn't find device with uuid 3lugiV-3eSP-AFAR-sdrP-H20O-wM2M-qdMANy.
LV Copy% Devices
lv 100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[lv_rimage_0] /dev/sdh1(1)
[lv_rimage_1] /dev/sdf1(1)
[lv_rimage_2] /dev/sdg1(1)
[lv_rmeta_0] /dev/sdh1(0)
[lv_rmeta_1] /dev/sdf1(0)
[lv_rmeta_2] /dev/sdg1(0)
Note that even though the failed device has been replaced, the display still indicates that LVM could not find the failed device. This is because, although the failed device has been removed from the RAID logical volume, the failed device has not yet been removed from the volume group. To remove the failed device from the volume group, you can execute
vgreduce --removemissing VG
.
If the
raid_fault_policy
has been set to allocate
but there are no spare devices, the allocation will fail, leaving the logical volume as it is. If the allocation fails, you have the option of fixing the drive, then deactivating and activating the logical volume, as described in Section 5.4.16.8.2, “The warn RAID Fault Policy”. Alternately, you can replace the failed device, as described in Section 5.4.16.9, “Replacing a RAID device”.