Chapter 7. File Systems
7.1. Tuning Considerations for File Systems
7.1.1. Formatting Options
Block size can be selected at mkfs time. The range of valid sizes depends on the system: the upper limit is the page size of the host system, while the lower limit depends on the file system used. The default block size is appropriate for most use cases.
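As a rough illustration of the limits just described, the following POSIX shell sketch checks a candidate block size before it is passed to mkfs. The helper name valid_block_size is hypothetical. The caller supplies the minimum because it differs between file systems; for example, mke2fs does not accept blocks smaller than 1024 bytes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if block size $1 (bytes) is a power of two,
# is at least the file-system-specific minimum $2 (bytes), and does not
# exceed the host page size reported by getconf.
valid_block_size() {
    size=$1
    fs_min=$2
    page=$(getconf PAGESIZE)          # upper limit: host page size
    [ "$size" -gt 0 ] || return 1
    # A power of two has exactly one bit set, so size & (size - 1) is 0.
    [ $((size & (size - 1))) -eq 0 ] || return 1
    [ "$size" -ge "$fs_min" ] && [ "$size" -le "$page" ]
}

# Example (hypothetical device):
#   valid_block_size 4096 1024 && mkfs.ext4 -b 4096 /dev/sdX1
```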
If your system uses striped storage such as RAID5, you can improve performance by aligning data and metadata with the underlying storage geometry at mkfs time. For software RAID (LVM or MD) and some enterprise hardware storage, this information is queried and set automatically, but in many cases the administrator must specify this geometry manually with mkfs at the command line.
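For a RAID5 set, the geometry is usually derived from the chunk size and the number of data disks, since one disk's worth of capacity in each stripe holds parity. The sketch below, built around a hypothetical raid5_geometry function, prints the matching mke2fs extended options (stride and stripe_width, both counted in file system blocks) and mkfs.xfs data-section options (su, the stripe unit, and sw, the number of data disks). It assumes a 4 KiB file system block unless told otherwise; confirm the real chunk size with your RAID tooling before using its output.

```shell
#!/bin/sh
# Hypothetical helper: print mkfs alignment options for a RAID5 array.
#   $1  RAID chunk size in KiB
#   $2  total number of member disks (RAID5 needs at least 3)
#   $3  file system block size in KiB (assumed 4 if omitted)
raid5_geometry() {
    chunk_kib=$1
    disks=$2
    block_kib=${3:-4}
    [ "$disks" -ge 3 ] || return 1
    data_disks=$((disks - 1))                 # one disk's worth holds parity
    stride=$((chunk_kib / block_kib))         # chunk size in fs blocks
    stripe_width=$((stride * data_disks))     # full data stripe in fs blocks
    echo "mkfs.ext4 -E stride=${stride},stripe_width=${stripe_width}"
    echo "mkfs.xfs -d su=${chunk_kib}k,sw=${data_disks}"
}

# Example: 4-disk RAID5 with 64 KiB chunks; append the target device
# to whichever printed command you use.
#   raid5_geometry 64 4
```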
Metadata-intensive workloads cause the log section of a journaling file system (such as ext4 and XFS) to be updated extremely frequently. To minimize seek time from file system to journal, you can place the journal on dedicated storage. Note, however, that placing the journal on external storage that is slower than the primary file system can nullify any potential advantage of using external storage.
Warning
External journals must be created at mkfs time, with journal devices being specified at mount time. Refer to the mke2fs(8), mkfs.xfs(8), and mount(8) man pages for further information.
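Because mkfs destroys existing data, the following sketch only prints the commands that set up an external journal, so they can be reviewed before being run (for example, by piping the output to sh). The function names and device paths are hypothetical. For ext4, the journal device must be created with the same block size as the file system that uses it. For XFS, the external log device must be named again with the logdev mount option every time the file system is mounted.

```shell
#!/bin/sh
# Hypothetical helper: print commands creating an ext4 file system on $1
# whose journal lives on dedicated device $2. $3 is the block size in
# bytes (assumed 4096); journal and file system must use the same value.
ext4_external_journal_cmds() {
    data_dev=$1
    journal_dev=$2
    bs=${3:-4096}
    echo "mke2fs -O journal_dev -b $bs $journal_dev"
    echo "mkfs.ext4 -b $bs -J device=$journal_dev $data_dev"
}

# Hypothetical helper: print commands creating an XFS file system on $1
# with its log on $2, then mounting it at $3 with the log device named.
xfs_external_log_cmds() {
    data_dev=$1
    log_dev=$2
    mnt=$3
    echo "mkfs.xfs -l logdev=$log_dev $data_dev"
    echo "mount -o logdev=$log_dev $data_dev $mnt"
}

# Example (hypothetical devices): review first, then run.
#   ext4_external_journal_cmds /dev/sdb1 /dev/sdc1
#   ext4_external_journal_cmds /dev/sdb1 /dev/sdc1 | sh
```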
7.1.2. Mount Options
A write barrier is a kernel mechanism used to ensure that file system metadata is correctly written and ordered on persistent storage, even when storage devices with volatile write caches lose power. File systems with write barriers enabled also ensure that any data transmitted via fsync() persists across a power outage. Red Hat Enterprise Linux enables barriers by default on all hardware that supports them.
Enabling write barriers noticeably slows down some applications, particularly those that use fsync() heavily or that create and delete many small files. For storage with no volatile write cache, or in the rare case where file system inconsistencies and data loss after a power loss are acceptable, barriers can be disabled by using the nobarrier mount option. For further information, refer to the Storage Administration Guide.
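After changing barrier settings, it can be worth confirming what the kernel actually applied. The sketch below uses a hypothetical mount_has_option helper to look for an exact option string among the options listed for a mount point in /proc/mounts. The spelling the kernel reports may differ from what was passed to mount (some ext4 versions may report barrier=0 rather than nobarrier), so check the exact string your kernel shows before relying on it.

```shell
#!/bin/sh
# Example /etc/fstab entry disabling barriers (hypothetical device and path):
#   /dev/sdb1  /data  xfs  defaults,nobarrier  0 0

# Hypothetical helper: succeed if mount point $1 lists option $2 exactly.
# $3 overrides the mount table (default /proc/mounts), which eases testing.
mount_has_option() {
    mnt=$1
    opt=$2
    table=${3:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated options.
    awk -v m="$mnt" -v o="$opt" '
        $2 == m {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == o) found = 1
        }
        END { exit !found }
    ' "$table"
}

# Example:
#   mount_has_option /data nobarrier && echo "barriers disabled on /data"
```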
Historically, when a file is read, the access time (atime) for that file must be updated in the inode metadata, which involves additional write I/O. If accurate atime metadata is not required, mount the file system with the noatime option to eliminate these metadata updates. In most cases, however, atime is not a large overhead due to the default relative atime (relatime) behavior in the Red Hat Enterprise Linux 6 kernel. The relatime behavior only updates atime if the previous atime is older than the modification time (mtime) or status change time (ctime), or if it is more than 24 hours old.
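To make the relatime rule concrete, the following sketch models the decision for a single file, including the kernel's refresh of atimes older than 24 hours. The function relatime_would_update is hypothetical and only imitates the kernel's logic; it does not query the kernel. Its inputs are epoch timestamps, which GNU stat can supply.

```shell
#!/bin/sh
# Hypothetical helper modelling relatime. Arguments are epoch seconds:
#   $1 atime   $2 mtime   $3 ctime   $4 current time
# Returns 0 (true) when a read would refresh atime under relatime.
relatime_would_update() {
    atime=$1
    mtime=$2
    ctime=$3
    now=$4
    # File modified or status changed since it was last read. The kernel
    # comparison also counts equal timestamps, hence -le rather than -lt.
    if [ "$atime" -le "$mtime" ] || [ "$atime" -le "$ctime" ]; then
        return 0
    fi
    # Otherwise refresh only if the stored atime is at least 24 hours old.
    [ $((now - atime)) -ge 86400 ]
}

# Example with GNU stat (%X atime, %Y mtime, %Z ctime; hypothetical file):
#   relatime_would_update $(stat -c '%X %Y %Z' somefile) "$(date +%s)"
```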
Note
Enabling the noatime option also enables nodiratime behavior; there is no need to set both noatime and nodiratime.
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache, so that the data is already in memory when it is requested instead of having to be read from disk. Some workloads, such as those involving heavy streaming of sequential I/O, benefit from high read-ahead values.
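Use the blockdev command to view and edit the read-ahead value. To view the current read-ahead value for a particular block device, run:
# blockdev --getra device
To modify the read-ahead value for that block device, run the following command, where N is the number of 512-byte sectors:
# blockdev --setra N device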
Note that the value set with the blockdev command does not persist across reboots. We recommend creating a run-level init.d script to set this value during boot.
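Because blockdev counts read-ahead in 512-byte sectors, a boot script is easier to maintain if it converts from a size in KiB. The sketch below shows helper functions such an init.d script might call; the names kib_to_sectors and apply_readahead are hypothetical, and setting DRY_RUN to any non-empty value prints the commands instead of running them (applying them requires root).

```shell
#!/bin/sh
# Hypothetical helper: convert a read-ahead size in KiB to 512-byte sectors,
# the unit used by blockdev --setra and --getra.
kib_to_sectors() {
    echo $(($1 * 1024 / 512))
}

# Hypothetical helper: apply a read-ahead of $1 KiB to each remaining
# argument (block device paths). Intended to be called at boot.
apply_readahead() {
    sectors=$(kib_to_sectors "$1")
    shift
    for dev in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "blockdev --setra $sectors $dev"
        else
            blockdev --setra "$sectors" "$dev"
        fi
    done
}

# Example from a boot script (hypothetical devices): 4 MiB read-ahead.
#   apply_readahead 4096 /dev/sdb /dev/sdc
```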
7.1.3. File System Maintenance
Batch discard and online discard operations are features of mounted file systems that discard blocks which are not in use by the file system. These operations are useful for both solid-state drives and thinly-provisioned storage.
Batch discard operations are run explicitly by the user with the fstrim command. This command discards all unused blocks in a file system that match the user's criteria. Batch discard is supported for use with the XFS and ext4 file systems in Red Hat Enterprise Linux 6.2 and later, as long as the block device underlying the file system supports physical discard operations. Physical discard operations are supported if the value of /sys/block/device/queue/discard_max_bytes is not zero.
Online discard operations are specified at mount time with the -o discard option (either in /etc/fstab or as part of the mount command), and run in real time without user intervention. Online discard operations only discard blocks that are transitioning from used to free. Online discard operations are supported on ext4 file systems in Red Hat Enterprise Linux 6.2 and later, and on XFS file systems in Red Hat Enterprise Linux 6.4 and later.
7.1.4. Application Considerations
The ext4, XFS, and GFS2 file systems support efficient space pre-allocation via the fallocate(2) system call. In cases where write patterns would otherwise leave files badly fragmented, leading to poor read performance, space pre-allocation can be a useful technique. Pre-allocation marks disk space as if it has been allocated to a file, without writing any data into that space. Until real data is written to a pre-allocated block, read operations will return zeroes.
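From the shell, the util-linux fallocate command issues the fallocate(2) call, which makes it a quick way to observe the behavior described above. The sketch wraps it in two hypothetical helpers: one pre-allocates a file, the other checks that every byte reads back as zero. It assumes GNU stat and cmp; by default fallocate reports an error, rather than writing zeroes itself, if the file system does not support pre-allocation.

```shell
#!/bin/sh
# Hypothetical helper: pre-allocate $2 bytes (suffixes such as 1M or 1G are
# accepted) for file $1 using util-linux fallocate, i.e. fallocate(2).
preallocate() {
    fallocate -l "$2" "$1"
}

# Hypothetical helper: succeed if every byte of file $1 reads as zero,
# by comparing the file's full length against /dev/zero.
reads_as_zero() {
    size=$(stat -c %s "$1")
    cmp -s -n "$size" "$1" /dev/zero
}

# Example (hypothetical path):
#   preallocate /data/db/table.dat 1G && reads_as_zero /data/db/table.dat
```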