Chapter 14. Configuring XFS error behavior
You can configure how an XFS file system behaves when it encounters different I/O errors.
14.1. Configurable error handling in XFS
The XFS file system responds in one of the following ways when an error occurs during an I/O operation:
- XFS repeatedly retries the I/O operation until the operation succeeds or XFS reaches a set limit. The limit is based on either a maximum number of retries or a maximum time for retries.
- XFS considers the error permanent and stops the operation on the file system.
You can configure how XFS reacts to the following error conditions:
EIO
- Error when reading or writing
ENOSPC
- No space left on the device
ENODEV
- Device cannot be found
You can set the maximum number of retries and the maximum time in seconds until XFS considers an error permanent. XFS stops retrying the operation when it reaches either of the limits.
You can also configure XFS so that when unmounting a file system, XFS immediately cancels the retries regardless of any other configuration. This configuration enables the unmount operation to succeed despite persistent errors.
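As a rough illustration, the interaction of the two limits and the unmount setting can be modelled as a small shell function. This is only a sketch of the rule as described in this section, not the kernel's implementation; the function name and its arguments are hypothetical, and -1 stands for "no limit", matching the convention used by the configuration files.

```shell
# Simplified model of the retry rule described above (not kernel code).
# All names are hypothetical.
#   $1  retries already made
#   $2  seconds elapsed since the first failure
#   $3  maximum number of retries (-1 = no limit)
#   $4  maximum retry time in seconds (-1 = no limit)
#   $5  1 if the file system is unmounting and retries are cancelled on unmount
# Returns 0 to retry again, 1 to treat the error as permanent.
xfs_should_retry() {
    retries=$1 elapsed=$2 max_retries=$3 timeout=$4 unmounting=${5:-0}

    if [ "$unmounting" -eq 1 ]; then
        return 1
    fi
    if [ "$max_retries" -ne -1 ] && [ "$retries" -ge "$max_retries" ]; then
        return 1
    fi
    if [ "$timeout" -ne -1 ] && [ "$elapsed" -ge "$timeout" ]; then
        return 1
    fi
    return 0
}
```

For example, xfs_should_retry 3 12 5 30 returns 0 (retry again), whereas xfs_should_retry 5 12 5 30 returns 1 because the retry count has been reached.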
Default behavior
The default behavior for each XFS error condition depends on the error context. Some XFS errors, such as ENODEV, are considered fatal and unrecoverable regardless of the retry count. Their default retry limit is 0.
14.2. Configuration files for specific and undefined XFS error conditions
The following directories store configuration files that control XFS error behavior for different error conditions:
/sys/fs/xfs/device/error/metadata/EIO/
- For the EIO error condition
/sys/fs/xfs/device/error/metadata/ENODEV/
- For the ENODEV error condition
/sys/fs/xfs/device/error/metadata/ENOSPC/
- For the ENOSPC error condition
/sys/fs/xfs/device/error/metadata/default/
- Common configuration for all other, undefined error conditions
Each directory contains the following configuration files for configuring retry limits:
max_retries
- Controls the maximum number of times that XFS retries the operation.
retry_timeout_seconds
- Specifies the time limit in seconds after which XFS stops retrying the operation.
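To inspect the current settings before changing anything, the files can simply be read. The following is a minimal sketch, assuming the directory layout listed above; the function name and the XFS_SYSFS_ROOT variable are hypothetical helpers introduced only for this example.

```shell
# Print each retry limit that is currently set for one XFS device.
# XFS_SYSFS_ROOT is a hypothetical override for dry runs against a
# scratch directory; leave it unset to read the real /sys/fs/xfs tree.
show_xfs_error_config() {
    dev=$1
    root=${XFS_SYSFS_ROOT:-/sys/fs/xfs}

    # Search the whole error/ subtree so every condition directory is covered.
    find "$root/$dev/error" -type f \
        \( -name max_retries -o -name retry_timeout_seconds \) |
    sort |
    while IFS= read -r file; do
        printf '%s = %s\n' "$file" "$(cat "$file")"
    done
}
```

Running show_xfs_error_config sda prints one "path = value" line per configuration file found for the sda device.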
14.3. Setting XFS behavior for specific conditions
This procedure configures how XFS reacts to specific error conditions.
Procedure
Set the maximum number of retries, the retry time limit, or both:
To set the maximum number of retries, write the desired number to the max_retries file:
# echo value > /sys/fs/xfs/device/error/metadata/condition/max_retries
To set the time limit, write the desired number of seconds to the retry_timeout_seconds file:
# echo value > /sys/fs/xfs/device/error/metadata/condition/retry_timeout_seconds
value is a number between -1 and the maximum possible value of the C signed integer type. This is 2147483647 on 64-bit Linux.
In both limits, the value -1 is used for continuous retries and 0 to stop immediately.
condition is the error condition to configure: EIO, ENODEV, or ENOSPC.
device is the name of the device, as found in the /dev/ directory; for example, sda.
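The procedure above can be wrapped in a small helper that checks the value against the documented range before writing it. This is a sketch under stated assumptions: set_xfs_retry_limit and XFS_SYSFS_ROOT are hypothetical names, and the kernel still decides whether a write is accepted.

```shell
# Write one retry limit after checking it against the documented range.
# XFS_SYSFS_ROOT is a hypothetical override for dry runs; leave it unset
# to write to the real /sys/fs/xfs tree (requires root).
set_xfs_retry_limit() {
    dev=$1 condition=$2 limit=$3 value=$4
    root=${XFS_SYSFS_ROOT:-/sys/fs/xfs}

    case $limit in
        max_retries|retry_timeout_seconds) ;;
        *) echo "unknown limit file: $limit" >&2; return 1 ;;
    esac

    # Accept -1 or a non-negative integer no larger than 2147483647.
    case $value in
        -1) ;;
        ''|*[!0-9]*) echo "not an integer >= -1: $value" >&2; return 1 ;;
        *)
            if [ "${#value}" -gt 10 ] || [ "$value" -gt 2147483647 ]; then
                echo "value too large: $value" >&2
                return 1
            fi
            ;;
    esac

    # The kernel has the final say and may still reject the write.
    printf '%s\n' "$value" > "$root/$dev/error/metadata/$condition/$limit"
}
```

For example, set_xfs_retry_limit sda EIO max_retries 10 limits EIO errors on sda to ten retries, and set_xfs_retry_limit sda EIO retry_timeout_seconds -1 removes the time limit for the same condition.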
14.4. Setting XFS behavior for undefined conditions
This procedure configures how XFS reacts to all undefined error conditions, which share a common configuration.
Procedure
Set the maximum number of retries, the retry time limit, or both:
To set the maximum number of retries, write the desired number to the max_retries file:
# echo value > /sys/fs/xfs/device/error/metadata/default/max_retries
To set the time limit, write the desired number of seconds to the retry_timeout_seconds file:
# echo value > /sys/fs/xfs/device/error/metadata/default/retry_timeout_seconds
value is a number between -1 and the maximum possible value of the C signed integer type. This is 2147483647 on 64-bit Linux.
In both limits, the value -1 is used for continuous retries and 0 to stop immediately.
device is the name of the device, as found in the /dev/ directory; for example, sda.
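Because both limits for undefined conditions live in a single directory, they can be set together and read back to confirm the result. The following is a minimal sketch; set_xfs_default_limits and XFS_SYSFS_ROOT are hypothetical names used only for illustration.

```shell
# Apply both limits to the shared default configuration, then read them back.
# XFS_SYSFS_ROOT is a hypothetical override for dry runs; leave it unset
# to write to the real /sys/fs/xfs tree (requires root).
set_xfs_default_limits() {
    dev=$1 retries=$2 seconds=$3
    dir=${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$dev/error/metadata/default

    # Stop at the first failed write so a partial change is reported.
    printf '%s\n' "$retries" > "$dir/max_retries" &&
    printf '%s\n' "$seconds" > "$dir/retry_timeout_seconds" &&
    printf 'max_retries=%s retry_timeout_seconds=%s\n' \
        "$(cat "$dir/max_retries")" "$(cat "$dir/retry_timeout_seconds")"
}
```

For example, set_xfs_default_limits sda 0 0 makes XFS stop immediately on any undefined error condition on sda, because 0 disables retries in both limits.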
14.5. Setting the XFS unmount behavior
This procedure configures how XFS reacts to error conditions when unmounting the file system.
If you set the fail_at_unmount option in the file system, it overrides all other error configurations during unmount, and XFS immediately unmounts the file system without retrying the I/O operation. This allows the unmount operation to succeed even in case of persistent errors.
You cannot change the fail_at_unmount value after the unmount process starts, because the unmount process removes the configuration files from the sysfs interface for the respective file system. You must configure the unmount behavior before the file system starts unmounting.
Procedure
Enable or disable the fail_at_unmount option:
To cancel retrying all operations when the file system unmounts, enable the option:
# echo 1 > /sys/fs/xfs/device/error/fail_at_unmount
To respect the max_retries and retry_timeout_seconds retry limits when the file system unmounts, disable the option:
# echo 0 > /sys/fs/xfs/device/error/fail_at_unmount
device is the name of the device, as found in the /dev/ directory; for example, sda.
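Since the configuration file is removed once unmounting begins, a script that toggles the option might first confirm that the file is still present. The following is a sketch under that assumption; set_xfs_fail_at_unmount and XFS_SYSFS_ROOT are hypothetical names introduced for this example.

```shell
# Turn the fail_at_unmount option on or off for one XFS device.
# XFS_SYSFS_ROOT is a hypothetical override for dry runs; leave it unset
# to write to the real /sys/fs/xfs tree (requires root).
set_xfs_fail_at_unmount() {
    dev=$1 state=$2
    file=${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$dev/error/fail_at_unmount

    case $state in
        on)  value=1 ;;
        off) value=0 ;;
        *)   echo "state must be on or off" >&2; return 1 ;;
    esac

    # The file disappears once unmounting has begun, so check for it first.
    if [ ! -w "$file" ]; then
        echo "cannot configure $dev: $file is missing or not writable" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$file"
}
```

For example, run set_xfs_fail_at_unmount sda on before unmounting the file system so that pending retries are cancelled during unmount.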