12.8. Troubleshooting LVM RAID

You can troubleshoot various issues in LVM RAID devices to correct data errors, recover devices, or replace failed devices.

12.8.1. Checking data coherency in a RAID logical volume (RAID scrubbing)

LVM provides scrubbing support for RAID logical volumes. RAID scrubbing is the process of reading all the data and parity blocks in an array and checking to see whether they are coherent.

Procedure

  1. Optional: Limit the I/O bandwidth that the scrubbing process uses.

    When you perform a RAID scrubbing operation, the background I/O required by the sync operations can crowd out other I/O to LVM devices, such as updates to volume group metadata. This might cause the other LVM operations to slow down. You can control the rate of the scrubbing operation by implementing recovery throttling.

    Add the following options to the lvchange --syncaction commands in the next steps:

    --maxrecoveryrate Rate[bBsSkKmMgG]
    Sets the maximum recovery rate so that the operation does not crowd out nominal I/O operations. Setting the recovery rate to 0 means that the operation is unbounded.
    --minrecoveryrate Rate[bBsSkKmMgG]
    Sets the minimum recovery rate to ensure that I/O for sync operations achieves a minimum throughput, even when heavy nominal I/O is present.

    Specify the Rate value as an amount per second for each device in the array. If you provide no suffix, the options assume kiB per second per device.

  2. Display the number of discrepancies in the array, without repairing them:

    # lvchange --syncaction check vg/raid_lv
  3. Correct the discrepancies in the array:

    # lvchange --syncaction repair vg/raid_lv
    Note

    The lvchange --syncaction repair operation does not perform the same function as the lvconvert --repair operation:

    • The lvchange --syncaction repair operation initiates a background synchronization operation on the array.
    • The lvconvert --repair operation repairs or replaces failed devices in a mirror or RAID logical volume.
  4. Optional: Display information about the scrubbing operation:

    # lvs -o +raid_sync_action,raid_mismatch_count vg/raid_lv
    • The raid_sync_action field displays the current synchronization operation that the RAID volume is performing. It can be one of the following values:

      idle
      All sync operations complete (doing nothing)
      resync
      Initializing an array or recovering after a machine failure
      recover
      Replacing a device in the array
      check
      Looking for array inconsistencies
      repair
      Looking for and repairing inconsistencies
    • The raid_mismatch_count field displays the number of discrepancies found during a check operation.
    • The Cpy%Sync field displays the progress of the sync operations.
    • The lv_attr field provides additional indicators. The ninth character of this field displays the health of the logical volume, and it supports the following indicators:

      • m (mismatches) indicates that there are discrepancies in a RAID logical volume. This character is shown after a scrubbing operation has detected that portions of the RAID are not coherent.
      • r (refresh) indicates that a device in a RAID array has suffered a failure and the kernel regards it as failed, even though LVM can read the device label and considers the device to be operational. Refresh the logical volume to notify the kernel that the device is now available, or replace the device if you suspect that it failed.
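Taken together, the steps above can be sketched as one short script. The volume name my_vg/my_lv, the 4M rate cap, and the sample mismatch count are illustrative, and the DRY_RUN guard is an addition for safe experimentation, not part of the procedure: with DRY_RUN=1 (the default here) each LVM command is printed instead of run, so the sketch can be exercised without an LVM setup.

```shell
#!/bin/sh
# Sketch of the scrubbing workflow (illustrative names and rates).
# DRY_RUN=1 prints LVM commands instead of executing them.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Step 2 with throttling: look for discrepancies without repairing them,
# capping background sync I/O at 4 MiB per second per device.
run lvchange --syncaction check --maxrecoveryrate 4M my_vg/my_lv

# Step 4: read the mismatch count. A real run would use:
#   lvs --noheadings -o raid_mismatch_count my_vg/my_lv
# Here a sample value stands in for that output.
sample_count="  0"
mismatches=$(echo "$sample_count" | tr -d '[:space:]')
if [ "$mismatches" -gt 0 ]; then
  run lvchange --syncaction repair my_vg/my_lv   # step 3
else
  echo "array is coherent"
fi
```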

Additional resources

  • For more information, see the lvchange(8) and lvmraid(7) man pages.

12.8.2. Failed devices in LVM RAID

RAID is not like traditional LVM mirroring. LVM mirroring required failed devices to be removed, or the mirrored logical volume would hang. RAID arrays can keep running with failed devices. In fact, for RAID types other than RAID1, removing a device would mean converting to a lower-level RAID (for example, from RAID6 to RAID5, or from RAID4 or RAID5 to RAID0).

Therefore, rather than removing a failed device unconditionally and potentially allocating a replacement, LVM allows you to replace a failed device in a RAID volume in a one-step solution by using the --repair argument of the lvconvert command.
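A degraded-but-running RAID volume can be spotted from its health field. The following sketch assumes that lvs reports "partial" in lv_health_status when devices are missing; the here-line stands in for real `lvs --noheadings -o lv_name,lv_health_status my_vg` output, and the volume names are illustrative.

```shell
#!/bin/sh
# Sketch: detect a degraded (but still running) RAID logical volume.
# The sample line stands in for real lvs output.
sample="  my_lv   partial"
set -- $sample                 # word-split: $1 = LV name, $2 = health status
if [ "${2:-}" = "partial" ]; then
  echo "$1 is degraded: repair with 'lvconvert --repair my_vg/$1'"
fi
```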

12.8.3. Recovering a failed RAID device in a logical volume

If the LVM RAID device failure is a transient failure or you are able to repair the device that failed, you can initiate recovery of the failed device.

Prerequisites

  • The previously failed device is now working.

Procédure

  • Refresh the logical volume that contains the RAID device:

    # lvchange --refresh my_vg/my_lv

Verification steps

  • Examine the logical volume with the recovered device:

    # lvs --all --options name,devices,lv_attr,lv_health_status my_vg
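The verification step can be scripted as a simple health check. This sketch assumes that an empty lv_health_status field means the volume is healthy after the refresh; the sample line stands in for real `lvs --noheadings -o lv_name,lv_health_status` output.

```shell
#!/bin/sh
# Sketch: after `lvchange --refresh`, confirm the LV no longer reports
# a health problem. An empty health field means healthy.
sample="  my_lv"
name=$(echo "$sample" | awk '{ print $1 }')
status=$(echo "$sample" | awk '{ $1 = ""; sub(/^ +/, ""); print }')
if [ -z "$status" ]; then
  echo "$name: healthy"
else
  echo "$name: $status (still needs attention)"
fi
```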

12.8.4. Replacing a failed RAID device in a logical volume

This procedure replaces a failed device that serves as a physical volume in an LVM RAID logical volume.

Prerequisites

  • The volume group includes a physical volume that provides enough free capacity to replace the failed device.

    If no physical volume with sufficient free extents is available on the volume group, add a new, sufficiently large physical volume using the vgextend utility.

Procédure

  1. View the current layout of the RAID logical volume. In this example, the volume is laid out as follows:

    # lvs --all --options name,copy_percent,devices my_vg
    
      LV               Cpy%Sync Devices
      my_lv            100.00   my_lv_rimage_0(0),my_lv_rimage_1(0),my_lv_rimage_2(0)
      [my_lv_rimage_0]          /dev/sde1(1)
      [my_lv_rimage_1]          /dev/sdc1(1)
      [my_lv_rimage_2]          /dev/sdd1(1)
      [my_lv_rmeta_0]           /dev/sde1(0)
      [my_lv_rmeta_1]           /dev/sdc1(0)
      [my_lv_rmeta_2]           /dev/sdd1(0)
  2. If the /dev/sdc device fails, the output of the lvs command is as follows:

    # lvs --all --options name,copy_percent,devices my_vg
    
      /dev/sdc: open failed: No such device or address
      Couldn't find device with uuid A4kRl2-vIzA-uyCb-cci7-bOod-H5tX-IzH4Ee.
      WARNING: Couldn't find all devices for LV my_vg/my_lv_rimage_1 while checking used and assumed devices.
      WARNING: Couldn't find all devices for LV my_vg/my_lv_rmeta_1 while checking used and assumed devices.
      LV               Cpy%Sync Devices
      my_lv            100.00   my_lv_rimage_0(0),my_lv_rimage_1(0),my_lv_rimage_2(0)
      [my_lv_rimage_0]          /dev/sde1(1)
      [my_lv_rimage_1]          [unknown](1)
      [my_lv_rimage_2]          /dev/sdd1(1)
      [my_lv_rmeta_0]           /dev/sde1(0)
      [my_lv_rmeta_1]           [unknown](0)
      [my_lv_rmeta_2]           /dev/sdd1(0)
  3. Replace the failed device and display the logical volume:

    # lvconvert --repair my_vg/my_lv
    
      /dev/sdc: open failed: No such device or address
      Couldn't find device with uuid A4kRl2-vIzA-uyCb-cci7-bOod-H5tX-IzH4Ee.
      WARNING: Couldn't find all devices for LV my_vg/my_lv_rimage_1 while checking used and assumed devices.
      WARNING: Couldn't find all devices for LV my_vg/my_lv_rmeta_1 while checking used and assumed devices.
    Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
      Faulty devices in my_vg/my_lv successfully replaced.

    Optional: To manually specify the physical volume that replaces the failed device, add the physical volume at the end of the command:

    # lvconvert --repair my_vg/my_lv replacement_pv
  4. Examine the logical volume with the replacement:

    # lvs --all --options name,copy_percent,devices my_vg
    
      /dev/sdc: open failed: No such device or address
      /dev/sdc1: open failed: No such device or address
      Couldn't find device with uuid A4kRl2-vIzA-uyCb-cci7-bOod-H5tX-IzH4Ee.
      LV               Cpy%Sync Devices
      my_lv            43.79    my_lv_rimage_0(0),my_lv_rimage_1(0),my_lv_rimage_2(0)
      [my_lv_rimage_0]          /dev/sde1(1)
      [my_lv_rimage_1]          /dev/sdb1(1)
      [my_lv_rimage_2]          /dev/sdd1(1)
      [my_lv_rmeta_0]           /dev/sde1(0)
      [my_lv_rmeta_1]           /dev/sdb1(0)
      [my_lv_rmeta_2]           /dev/sdd1(0)

    Until you remove the failed device from the volume group, LVM utilities still indicate that LVM cannot find the failed device.

  5. Remove the failed device from the volume group:

    # vgreduce --removemissing my_vg
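The full replacement workflow above can be sketched as a single script. The spare device /dev/sdf1 is illustrative, and the DRY_RUN guard is an addition for safe experimentation, not part of the procedure: with DRY_RUN=1 (the default here) each LVM command is printed instead of run, so the sketch can be exercised without real devices.

```shell
#!/bin/sh
# Sketch of the replacement workflow (illustrative device and volume names).
# DRY_RUN=1 prints LVM commands instead of executing them.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run vgextend my_vg /dev/sdf1           # prerequisite: ensure spare capacity
run lvconvert --repair my_vg/my_lv     # step 3: replace the failed images
run vgreduce --removemissing my_vg     # step 5: drop the missing PV record
```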