Chapter 13. Configuring maximum time for storage error recovery with eh_deadline
You can configure the maximum allowed time to recover failed SCSI devices. This configuration guarantees an I/O response time even when storage hardware becomes unresponsive due to a failure.
13.1. The eh_deadline parameter
The SCSI error handling (EH) mechanism attempts to perform error recovery on failed SCSI devices. The SCSI host object eh_deadline parameter enables you to configure the maximum amount of time for the recovery. After the configured time expires, SCSI EH stops and resets the entire host bus adapter (HBA).
Using eh_deadline can reduce the time:
- to shut off a failed path,
- to switch a path, or
- to disable a RAID slice.
When eh_deadline expires, SCSI EH resets the HBA, which affects all target paths on that HBA, not only the failing one. If some of the redundant paths are not available for other reasons, I/O errors might occur. Enable eh_deadline only if you have multipath configured on all targets. Also, if your multipath devices are not fully redundant, you should verify that no_path_retry is set large enough to allow paths to recover.
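As a rough way to check the no_path_retry setting mentioned above, the following sketch reads a keyword from the defaults section of a multipath.conf-style file. The function name get_default and the /etc/multipath.conf location are assumptions made for illustration. The sketch looks only at the defaults section, so it does not reflect overrides in other sections or built-in defaults; the merged configuration reported by multipathd show config is the more reliable source.

```shell
#!/usr/bin/env bash
# get_default KEY [CONF]: print the value of KEY from the defaults section.
# get_default is a hypothetical helper; CONF defaults to /etc/multipath.conf (assumed path).
get_default() {
    local key="$1" conf="${2:-/etc/multipath.conf}"
    awk -v key="$key" '
        /^[ \t]*[#!]/            { next }                 # skip comment lines
        /^[ \t]*defaults[ \t]*[{]/ { in_defaults = 1; next } # entering the defaults section
        in_defaults && /^[ \t]*[}]/ { in_defaults = 0 }      # leaving the defaults section
        in_defaults && $1 == key {
            gsub(/"/, "", $2)                             # drop optional quotes around the value
            print $2
            exit
        }
    ' "$conf"
}

# Only query the real file when it exists and is readable.
if [ -r /etc/multipath.conf ]; then
    get_default no_path_retry
fi
```

An empty result means that the keyword is not set explicitly in the defaults section of the file.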
The value of the eh_deadline parameter is specified in seconds. The default setting is off, which disables the time limit and allows all of the error recovery to take place.
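To see which value is currently in effect, you can read the eh_deadline file of each SCSI host in sysfs. The following is a minimal sketch; the function name show_eh_deadline and the SCSI_HOST_ROOT override variable are hypothetical and exist only to make the example easy to try against a test directory.

```shell
#!/usr/bin/env bash
# show_eh_deadline [ROOT]: print "<host>: <value>" for every SCSI host under ROOT.
# show_eh_deadline and SCSI_HOST_ROOT are hypothetical names used for illustration.
show_eh_deadline() {
    local root="${1:-/sys/class/scsi_host}"
    local f
    for f in "$root"/host*/eh_deadline; do
        # Skip the unexpanded glob when no SCSI hosts are present.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

show_eh_deadline "${SCSI_HOST_ROOT:-/sys/class/scsi_host}"
```

Each output line shows either off or the configured number of seconds for that host.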
Scenarios when eh_deadline is useful
In most scenarios, you do not need to enable eh_deadline. Using eh_deadline can be useful in certain specific scenarios, for example, if a link loss occurs between a Fibre Channel (FC) switch and a target port, and the HBA does not receive Registered State Change Notifications (RSCNs). In such a case, I/O requests and error recovery commands all time out rather than encounter an error. Setting eh_deadline in this environment puts an upper limit on the recovery time, which enables DM Multipath to retry the failed I/O on another available path.
Under the following conditions, the eh_deadline parameter provides no additional benefit, because the I/O and error recovery commands fail immediately, which enables DM Multipath to retry:
- If RSCNs are enabled
- If the HBA does not register the link becoming unavailable
13.2. Setting the eh_deadline parameter
This procedure configures the value of the eh_deadline parameter to limit the maximum SCSI recovery time.
Procedure
You can configure eh_deadline using any of the following methods:
- The defaults section of the multipath.conf file
  In the defaults section of the multipath.conf file, set the eh_deadline parameter to the required number of seconds:
  eh_deadline 300
  Note: From RHEL 8.4, setting the eh_deadline parameter using the defaults section of the multipath.conf file is the preferred method.
  To turn off the eh_deadline parameter with this method, set eh_deadline to off.
- sysfs
  Write the number of seconds into the /sys/class/scsi_host/host<host-number>/eh_deadline files. For example, to set the eh_deadline parameter through sysfs on SCSI host 6:
  # echo 300 > /sys/class/scsi_host/host6/eh_deadline
  To turn off the eh_deadline parameter with this method, use echo off.
- Kernel parameter
  Set a default value for all SCSI HBAs using the scsi_mod.eh_deadline kernel parameter:
  # echo 300 > /sys/module/scsi_mod/parameters/eh_deadline
  To turn off the eh_deadline parameter with this method, use echo -1.
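The sysfs method sets one host at a time. If you want the same limit on every SCSI host, a small loop can write the value to each eh_deadline file. The following is a sketch only: the function name set_eh_deadline is hypothetical, the input check (off or a whole number of seconds) is an assumption about which values you want to allow, and writing to the real sysfs files requires root privileges.

```shell
#!/usr/bin/env bash
# set_eh_deadline VALUE [ROOT]: write VALUE to the eh_deadline file of every SCSI host.
# set_eh_deadline is a hypothetical helper; ROOT defaults to the sysfs SCSI host directory.
set_eh_deadline() {
    local value="$1" root="${2:-/sys/class/scsi_host}"
    local f
    case "$value" in
        off) ;;                                   # disable the time limit
        ''|*[!0-9]*)                              # reject anything that is not a whole number
            echo "eh_deadline must be 'off' or a number of seconds" >&2
            return 1
            ;;
    esac
    for f in "$root"/host*/eh_deadline; do
        # Skip the unexpanded glob and any file that cannot be written.
        [ -w "$f" ] || continue
        echo "$value" > "$f"
    done
}

# Example invocation (run as root on a real system):
#   set_eh_deadline 300
```

Because the function is not called automatically, sourcing the file changes nothing until you invoke set_eh_deadline yourself.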
Additional resources