Chapter 10. Troubleshooting scrub and deep-scrub issues
Learn to troubleshoot scrub and deep-scrub issues.
10.1. Addressing the scrub slowness issue while upgrading from 6 to 7
Learn to troubleshoot the scrub slowness issue that is seen after upgrading from Red Hat Ceph Storage 6 to 7.
Scrub slowness is caused by the automated OSD benchmark setting a very low value for osd_mclock_max_capacity_iops_hdd. Scrub operations are impacted because the IOPS capacity of an OSD plays a significant role in determining the bandwidth that the scrub operation receives. Compounding the problem, scrubs receive only a fraction of the total IOPS capacity, based on the QoS allocation defined by the mClock profile. As a result, the Ceph cluster reports expected scrub completion times in multiples of days or weeks.
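To gauge whether an OSD is affected, you can inspect the IOPS capacity currently recorded for it and the active mClock profile. The following is a minimal sketch using standard Ceph configuration commands, assuming osd.0 is the OSD under inspection; use the _ssd variant of the option for flash-backed OSDs:

Example:

$ ceph config show osd.0 osd_mclock_max_capacity_iops_hdd
$ ceph config get osd.0 osd_mclock_profile

A capacity value far below the realistic IOPS of the underlying drive points to the low benchmark measurement described above.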
Prerequisites
- A running Red Hat Ceph Storage cluster in a healthy state.
- Root-level access to the node.
Procedure
Detect low measured IOPS reported by the OSD bench during OSD boot-up and fall back to the default IOPS setting defined for osd_mclock_max_capacity_iops_[hdd|ssd]. The fallback is triggered if the reported IOPS falls below the threshold determined by osd_mclock_iops_capacity_low_threshold_[hdd|ssd]. A cluster warning is also logged.

Example:

$ ceph config rm osd.X osd_mclock_max_capacity_iops_[hdd|ssd]
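On releases that include the fix, you can query the threshold below which the fallback is triggered. This is a sketch using the standard config query, with osd.X as the placeholder used throughout this procedure; use the _ssd variant for flash-backed OSDs:

Example:

$ ceph config get osd.X osd_mclock_iops_capacity_low_threshold_hdd

If the OSD bench reports fewer IOPS than this threshold, the measured value is ignored and the default IOPS capacity is used instead.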
[Optional]: Perform the following steps if you have not yet upgraded from 6 to 7 (before the upgrade):
For clusters already affected by the issue, remove the IOPS capacity setting on the OSD(s) before upgrading to the release with the fix by running the following command:
Example:
$ ceph config rm osd.X osd_mclock_max_capacity_iops_[hdd|ssd]
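To identify which OSDs currently carry an IOPS capacity override, you can filter the central configuration database for the option. This is a sketch using standard commands:

Example:

$ ceph config dump | grep osd_mclock_max_capacity_iops

Each matching line names an OSD whose setting is a candidate for removal with the command above.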
Set the osd_mclock_force_run_benchmark_on_init option for the affected OSD to true before the upgrade:

Example:
$ ceph config set osd.X osd_mclock_force_run_benchmark_on_init true
After upgrading to the release with this fix, the IOPS capacity reflects either the default setting or the new value reported by the OSD bench.
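To confirm the result after the upgrade, you can query the monitor configuration database for the option; again a sketch with osd.X as the placeholder:

Example:

$ ceph config get osd.X osd_mclock_max_capacity_iops_hdd

The returned value should be the default or a freshly benchmarked figure, rather than the previous very low value.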
[Optional]: Perform the following steps if you have already upgraded from 6 to 7 (after the upgrade):
If you were unable to perform the above steps before the upgrade, you can re-run the OSD bench after upgrading by removing the osd_mclock_max_capacity_iops_[hdd|ssd] setting:

Example:

$ ceph config rm osd.X osd_mclock_max_capacity_iops_[hdd|ssd]
Set osd_mclock_force_run_benchmark_on_init to true:

Example:

$ ceph config set osd.X osd_mclock_force_run_benchmark_on_init true
Restart the OSD.
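The procedure does not prescribe a restart method, which depends on how the cluster is deployed. In a cephadm-managed cluster, a sketch of the restart is:

Example:

$ ceph orch daemon restart osd.X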
After the OSD restarts, the IOPS capacity reflects either the default setting or the new value reported by the OSD bench.
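To verify the value that the running daemon is actually using after the restart, you can query its active configuration; a sketch using the standard show command:

Example:

$ ceph config show osd.X osd_mclock_max_capacity_iops_hdd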