Chapter 26. Factors affecting I/O and file system performance

The appropriate settings for storage and file system performance are highly dependent on the storage purpose. I/O and file system performance can be affected by various factors.

Below is a list of factors that can affect I/O and file system performance:

Data write or read patterns
Sequential or random
Buffered or direct I/O
Data alignment with underlying geometry
Block size
File system size
Journal size and location
Recording access times
Ensuring data reliability
Pre-fetching data
Pre-allocating disk space
File fragmentation
Resource contention

26.1. Tools for monitoring and diagnosing I/O and file system issues

You can monitor and diagnose I/O and file system issues by using tools that track performance metrics, analyze device load and latency, and trace I/O operations. These tools help you pinpoint bottlenecks and optimize system performance in Red Hat Enterprise Linux 10 environments.

The following tools are available in Red Hat Enterprise Linux 10 for monitoring system performance and diagnosing performance problems related to I/O and file systems:

vmstat tool reports on processes, memory, paging, block I/O, interrupts, and CPU activity across the entire system. It can help administrators determine whether the I/O subsystem is responsible for any performance issues. If vmstat analysis shows that the I/O subsystem causes reduced performance, administrators can use iostat to identify the responsible I/O device.
iostat reports on I/O device load in your system. It is provided by the sysstat package.
blktrace provides detailed information about how time is spent in the I/O subsystem. The companion utility blkparse reads the raw output from blktrace and produces a human readable summary of recorded input and output operations.
btt analyzes blktrace output and displays the amount of time that data spends in each area of the I/O stack. This makes it easier to spot bottlenecks in the I/O subsystem. This utility is provided as part of the blktrace package. Some of the important events tracked by the blktrace mechanism and analyzed by btt are:
- Queuing of the I/O event (Q)
- Dispatch of the I/O to the driver event (D)
- Completion of I/O event (C)
iowatcher can use the blktrace output to graph I/O over time. It focuses on the Logical Block Address (LBA) of disk I/O, throughput, seeks per second, and I/O operations per second. This can help to identify when you are hitting the operations-per-second limit of a device.
BPF Compiler Collection (BCC) is a library, which facilitates the creation of the extended Berkeley Packet Filter (eBPF) programs. The eBPF programs are triggered on events, such as disk I/O, TCP connections, and process creations.
The BCC tools are installed in the /usr/share/bcc/tools/ directory. The following bcc-tools help to analyze performance:
- biolatency summarizes the latency of block device I/O (disk I/O) in a histogram. This allows the distribution to be studied, including two modes for device cache hits and for cache misses, and latency outliers.
- biosnoop is a basic block I/O tracing tool for displaying each I/O event along with the issuing process ID and I/O latency. Using this tool, you can investigate disk I/O performance issues.
- biotop summarizes block I/O operations in the kernel by process, in a top-like display.
- filelife traces the lifespan of short-lived files, from creation to deletion.
- fileslower traces slow synchronous file reads and writes.
- filetop displays file reads and writes by process.
- ext4slower, nfsslower, and xfsslower are tools that show file system operations slower than a certain threshold, which defaults to 10ms.
bpftrace is a tracing language for eBPF used for analyzing performance issues. It also provides trace utilities like BCC for system observation, which is useful for investigating I/O performance issues.
The following SystemTap scripts may be useful in diagnosing storage or file system performance problems:
- disktop.stp: Checks the status of disk reads and writes every 5 seconds and outputs the top ten entries during that period.
- iotime.stp: Prints the amount of time spent on read and write operations, and the number of bytes read and written.
- traceio.stp: Prints the top ten executables based on cumulative I/O traffic observed, every second.
- traceio2.stp: Prints the executable name and process identifier as reads and writes to the specified device occur.
- inodewatch.stp: Prints the executable name and process identifier each time a read or write occurs to the specified inode on the specified device.
- inodewatch2.stp: Prints the executable name, process identifier, and attributes each time the attributes are changed on the specified inode on the specified device.
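
The Q, D, and C events that btt analyzes, described earlier in this section, divide the lifetime of a request into a queuing phase and a device phase. The following Python sketch illustrates that arithmetic only. It is not how btt is implemented, and the IoTimestamps record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IoTimestamps:
    """Timestamps in seconds for one request (hypothetical record layout)."""
    queued: float      # Q: the I/O was queued
    dispatched: float  # D: the I/O was dispatched to the driver
    completed: float   # C: the I/O completed


def phase_times(io):
    """Split the lifetime of one request into its phases."""
    return {
        "queue_to_dispatch": io.dispatched - io.queued,
        "dispatch_to_complete": io.completed - io.dispatched,
        "queue_to_complete": io.completed - io.queued,
    }


def mean_phase_times(ios):
    """Average each phase over a non-empty list of requests."""
    if not ios:
        raise ValueError("at least one request is required")
    totals = {}
    for io in ios:
        for name, value in phase_times(io).items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / len(ios) for name, total in totals.items()}
```

A large queue-to-dispatch share suggests that requests wait in the scheduler or queue, while a large dispatch-to-complete share points at the device itself.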

For more information, see:

vmstat(8), iostat(1), blktrace(8), blkparse(1), btt(1), bpftrace, and iowatcher(1) man pages on your system.

26.2. Available tuning options for formatting a file system

Some file system configuration decisions cannot be changed after the device is formatted. These include the size, block size, geometry, and external journals.

The following are the details of the options that are available before formatting a storage device:

Size

Create an appropriately-sized file system for your workload. Smaller file systems require less time and memory for file system checks. However, if a file system is too small, its performance suffers from high fragmentation.

Block size

The block is the unit of work for the file system. The block size determines how much data can be stored in a single block. It therefore sets the smallest data amount written or read at one time.

The default block size is appropriate for most use cases. However, your file system performs better if the block size matches the typical read or write amount. Optimal performance occurs when the block size equals or slightly exceeds the data typically accessed at once.

A small file still uses an entire block. Files can be spread across multiple blocks, but this can create additional runtime overhead.
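
As a rough illustration of the space cost described above, the following Python sketch computes how many blocks a file occupies and how much of the last block is left unused. It assumes a simple model in which file data always occupies whole blocks; the function names are illustrative, and features such as inline data or metadata overhead are ignored.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks required to hold file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Ceiling division using integers only.
    return -(-file_size // block_size)


def slack_bytes(file_size, block_size):
    """Bytes allocated to the file but not used by its data."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

For example, with a 4096-byte block size, a 100-byte file still occupies one full block, and a 4097-byte file occupies two.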

Additionally, some file systems are limited to a certain number of blocks, which limits the maximum size of the file system. Block size is specified as part of the file system options when formatting a device with the mkfs command. The parameter that specifies the block size varies with the file system.

Geometry

File system geometry is concerned with the distribution of data across a file system. If your system uses striped storage like RAID, align data and metadata with the underlying storage geometry when formatting. This improves performance.

Many devices export recommended geometry, which is then set automatically when the devices are formatted with a particular file system. If your device does not export these recommendations, or you want to change them, specify geometry manually when formatting with mkfs.

The parameters that specify file system geometry vary with the file system.
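
The following Python sketch shows the arithmetic behind stripe geometry, assuming that you already know the stripe unit (chunk size) and the number of data-bearing disks in the array. The function names are illustrative. Some file systems take geometry in bytes and others in file system blocks, so the sketch provides both forms.

```python
def stripe_width_bytes(stripe_unit_bytes, data_disks):
    """Size of one full stripe: one stripe unit on every data-bearing disk."""
    if stripe_unit_bytes <= 0 or data_disks <= 0:
        raise ValueError("stripe unit and data disk count must be positive")
    return stripe_unit_bytes * data_disks


def geometry_in_blocks(stripe_unit_bytes, data_disks, block_size):
    """Return (blocks per stripe unit, blocks per full stripe)."""
    if block_size <= 0 or stripe_unit_bytes % block_size != 0:
        raise ValueError("stripe unit must be a multiple of the block size")
    stride = stripe_unit_bytes // block_size
    return stride, stride * data_disks


def is_full_stripe_aligned(offset_bytes, stripe_unit_bytes, data_disks):
    """True if offset_bytes falls on a full-stripe boundary."""
    return offset_bytes % stripe_width_bytes(stripe_unit_bytes, data_disks) == 0
```

For example, a RAID 5 array of four disks has three data-bearing disks per stripe. With a 64 KiB stripe unit, one full stripe holds 192 KiB of data.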

External journals

Journaling file systems document changes in a journal file before running write operations. This reduces the likelihood of device corruption during system crashes or power failures. It also speeds up recovery.

Note

It is preferable to not use the external journals option.

Metadata-intensive workloads involve very frequent updates to the journal. A larger journal uses more memory, but reduces the frequency of write operations. Additionally, you can improve the seek time of a device with a metadata-intensive workload by placing its journal on dedicated storage. Use storage that is as fast as, or faster than, the primary storage.

Warning

Ensure that external journals are reliable. Losing an external journal device causes file system corruption. External journals must be created at format time, with journal devices being specified at mount time.

26.3. Available tuning options for mounting a file system

You can explore key tuning options for mounting file systems, including atime, noatime, and read-ahead settings, to select mount options that balance performance and functionality for different workloads.

The following are the options available to most file systems and can be specified as the device is mounted:

Access Time

Every time a file is read, its metadata is updated with the time at which access occurred (atime). This involves additional write I/O. For most file systems, relatime is the default atime setting.

However, if updating this metadata is time consuming, and if accurate access time data is not required, you can mount the file system with the noatime mount option. This disables updates to metadata when a file is read. It also enables nodiratime behavior, which disables updates to metadata when a directory is read.

Note

Disabling atime updates by using the noatime mount option can break applications that rely on them, for example, backup programs.
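
To see which access time options are in effect, you can inspect the mount table. The following Python sketch parses text in the /proc/mounts format (device, mount point, file system type, options) and reports the atime-related options for each mount point. The function name and the option list are choices made for this illustration.

```python
# Mount options that control access time updates.
ATIME_OPTIONS = ("noatime", "relatime", "strictatime", "nodiratime")


def atime_options(mounts_text):
    """Map each mount point to the atime-related options it is mounted with."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        result[mount_point] = [opt for opt in options if opt in ATIME_OPTIONS]
    return result
```

On a running system, you can pass the content of /proc/mounts to the function, for example atime_options(open("/proc/mounts").read()).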

Read-ahead

Read-ahead behavior speeds up file access by pre-fetching data that is likely to be needed soon and loading it into the page cache, where it can be retrieved more quickly than if it were on disk. The higher the read-ahead value, the further ahead the system pre-fetches data.

Red Hat Enterprise Linux attempts to set an appropriate read-ahead value based on what it detects about your file system. However, accurate detection is not always possible. For example, if a storage array presents itself to the system as a single LUN, the system detects the single LUN, and does not set the appropriate read-ahead value for an array.

Workloads that involve heavy streaming of sequential I/O often benefit from high read-ahead values. The storage-related tuned profiles provided with Red Hat Enterprise Linux raise the read-ahead value, as does using LVM striping, but these adjustments are not always sufficient for all workloads.

26.4. Discarding blocks that are unused

Regularly discarding blocks that are not in use by the file system is a good practice for both solid-state disks and thinly-provisioned storage.

26.5. Solid-state disks tuning considerations

Solid-state disks (SSDs) use NAND flash chips rather than rotating magnetic platters to store persistent data. SSDs provide a constant access time for data across their full Logical Block Address range, and do not incur measurable seek costs like their rotating counterparts.

They are more expensive per gigabyte of storage space and have a lower storage density. However, they also have lower latency and greater throughput than hard disk drives (HDDs).

Performance generally degrades as the used blocks on an SSD approach the capacity of the disk. The degree of degradation varies by vendor, but all devices experience degradation in this circumstance. Enabling discard behavior can help to alleviate this degradation.

The default I/O scheduler and virtual memory options are suitable for use with SSDs. Consider the following factors when configuring settings that can affect SSD performance:

I/O Scheduler

Any I/O scheduler is expected to perform well with most SSDs. However, as with any other storage type, benchmark to determine the optimal configuration for a given workload. When using SSDs, change the I/O scheduler only to benchmark particular workloads.

For instructions on how to switch between I/O schedulers, see the /usr/share/doc/kernel-version/Documentation/block/switching-sched.txt file.

For a single queue Host Bus Adapter (HBA), the default I/O scheduler is mq-deadline. For a multiple queue HBA, the default I/O scheduler is none.

Virtual Memory

Like the I/O scheduler, the virtual memory (VM) subsystem requires no special tuning. Given the fast nature of I/O on SSDs, you can try turning down the vm.dirty_background_ratio and vm.dirty_ratio settings. Increased write-out activity does not usually have a negative impact on the latency of other operations on the disk. However, this tuning can generate more overall I/O, and is therefore not generally recommended without workload-specific testing.
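
Before changing the two dirty-page settings, it helps to record their current values. The kernel exposes them as the dirty_background_ratio and dirty_ratio files under /proc/sys/vm. The following Python sketch reads both; the function name and the vm_dir parameter, which exists only so that the function can be pointed at a different directory, are illustrative.

```python
from pathlib import Path

DIRTY_SETTINGS = ("dirty_background_ratio", "dirty_ratio")


def read_dirty_settings(vm_dir="/proc/sys/vm"):
    """Return the current dirty-page ratios as integers (percentages)."""
    base = Path(vm_dir)
    return {name: int((base / name).read_text().strip()) for name in DIRTY_SETTINGS}
```

Comparing these values before and after a benchmark run makes it easier to attribute a change in I/O behavior to the tuning rather than to the workload.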

Swap

An SSD can also be used as a swap device, and is likely to produce good page-out and page-in performance.

26.6. Generic block device tuning parameters

The generic tuning parameters listed here are available in the /sys/block/sdX/queue/ directory.
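
Because each parameter is a plain file in that directory, you can capture the current settings of a device before you change anything. The following Python sketch collects them into a dictionary. The function name and the sys_block parameter are illustrative, and files that cannot be read are skipped.

```python
from pathlib import Path


def read_queue_parameters(device, sys_block="/sys/block"):
    """Return {parameter name: value} for the queue directory of a device."""
    queue_dir = Path(sys_block) / device / "queue"
    params = {}
    for entry in sorted(queue_dir.iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip entries that are not readable
    return params
```

For example, read_queue_parameters("sda") returns the values for /sys/block/sda/queue/, assuming that a device named sda exists on the system.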

The following listed tuning parameters are separate from I/O scheduler tuning, and are applicable to all I/O schedulers:

add_random

Some I/O events contribute to the entropy pool for /dev/random. You can set this parameter to 0 if the overhead of these contributions becomes measurable.

iostats

By default, iostats is enabled and the default value is 1. Setting iostats to 0 disables gathering of I/O statistics for the device. This removes a small amount of overhead with the I/O path.

Setting iostats to 0 might improve performance for high performance devices, such as certain Non-volatile Memory Express (NVMe) storage devices. It is preferable to leave iostats enabled unless otherwise specified for the given storage model by the vendor.

If you disable iostats, the I/O statistics for the device are no longer present in the /proc/diskstats file. The content of the /proc/diskstats file is the source of I/O information for I/O monitoring tools, such as sar or iostat. Therefore, if you disable the iostats parameter for a device, the device is no longer present in the output of I/O monitoring tools.
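
To illustrate what monitoring tools read from /proc/diskstats, the following Python sketch parses the first eleven counters that follow the device name on each line. The field names are descriptive labels chosen for this example. Newer kernels append further fields, which the sketch ignores.

```python
# Labels for the first eleven counters after the device name.
FIELD_NAMES = (
    "reads_completed", "reads_merged", "sectors_read", "ms_reading",
    "writes_completed", "writes_merged", "sectors_written", "ms_writing",
    "ios_in_progress", "ms_doing_io", "weighted_ms_doing_io",
)


def parse_diskstats(text):
    """Return {device name: {field name: integer value}}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # major, minor, name, and eleven counters are required
        name = fields[2]
        values = [int(value) for value in fields[3:14]]
        stats[name] = dict(zip(FIELD_NAMES, values))
    return stats
```

On a running system, you can pass the file content to the function, for example parse_diskstats(open("/proc/diskstats").read()).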

max_sectors_kb

Specifies the maximum size of an I/O request in kilobytes. The default value is 512 KB. The minimum value for this parameter is determined by the logical block size of the storage device. The maximum value for this parameter is determined by the value of the max_hw_sectors_kb.

max_sectors_kb must always be a multiple of the optimal I/O size and the internal erase block size. Use a value of logical_block_size for either parameter if they are zero or not specified by the storage device.
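
The constraints on max_sectors_kb can be checked before a value is written. The following Python sketch applies the rules described above. It assumes that max_sectors_kb and max_hw_sectors_kb are given in kilobytes and the other sizes in bytes, and that the erase block size is supplied by the caller, for example from vendor documentation. The function name is illustrative.

```python
def check_max_sectors_kb(max_sectors_kb, logical_block_size, max_hw_sectors_kb,
                         optimal_io_size=0, erase_block_size=0):
    """Return a list of problems with a candidate value; empty if none."""
    if logical_block_size <= 0:
        raise ValueError("logical_block_size must be positive")
    problems = []
    request_bytes = max_sectors_kb * 1024
    if request_bytes < logical_block_size:
        problems.append("smaller than the logical block size")
    if max_sectors_kb > max_hw_sectors_kb:
        problems.append("larger than max_hw_sectors_kb")
    for label, size in (("optimal I/O size", optimal_io_size),
                        ("erase block size", erase_block_size)):
        # Fall back to the logical block size when the device reports zero.
        unit = size or logical_block_size
        if request_bytes % unit != 0:
            problems.append("not a multiple of the %s (%d bytes)" % (label, unit))
    return problems
```

An empty list means that the candidate value satisfies all of the checks in this sketch.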

nomerges

Most workloads benefit from request merging. However, disabling merges can be useful for debugging purposes. By default, the nomerges parameter is set to 0, which enables merging. To disable the more complex merge lookups and try only simple one-hit merges, set nomerges to 1. To disable all types of merging, set nomerges to 2.

nr_requests

Specifies the maximum number of queued I/O requests. If the current I/O scheduler is none, this number can only be reduced; otherwise, it can be increased or reduced.

optimal_io_size

Some storage devices report an optimal I/O size through this parameter. If this value is reported, applications should issue I/O aligned to and in multiples of the optimal I/O size wherever possible.
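
One way for an application to follow this guidance is to widen each request to the enclosing optimal-size boundaries. The following Python sketch shows the arithmetic, assuming that offsets and lengths are in bytes and that a reported value of 0 means that the device gives no hint. The function name is illustrative.

```python
def align_request(offset, length, optimal_io_size):
    """Expand (offset, length) to start and end on optimal I/O size boundaries."""
    if optimal_io_size <= 0:
        return offset, length  # no hint reported; leave the request unchanged
    start = (offset // optimal_io_size) * optimal_io_size
    # Round the end of the request up to the next boundary.
    end = -(-(offset + length) // optimal_io_size) * optimal_io_size
    return start, end - start
```

Widening a request reads or writes more data than was asked for, so this trade-off is usually only worthwhile for large, mostly sequential transfers.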

read_ahead_kb

Defines the maximum number of kilobytes that the operating system may read ahead during a sequential read operation. As a result, the necessary information is already present within the kernel page cache for the next sequential read. This improves read I/O performance.

Device mappers often benefit from a high read_ahead_kb value. 128 KB for each device to be mapped is a good starting point. Increasing the read_ahead_kb value up to the max_sectors_kb value of the disk's request queue might improve performance where sequential reading of large files occurs.

rotational

Some solid-state disks do not correctly advertise their solid-state status, and are treated as traditional rotational disks. In that case, manually set the rotational value to 0 to disable unnecessary seek-reducing logic in the scheduler.

rq_affinity

The default value of rq_affinity is 1, which completes I/O operations on a CPU core in the same CPU group as the core that issued the request. To perform completions only on the processor that issued the I/O request, set rq_affinity to 2. To disable both behaviors, set it to 0.

scheduler

To set the scheduler or scheduler preference order for a storage device, edit the /sys/block/devname/queue/scheduler file. Replace devname with the name of the device that you want to configure.
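
When read, the scheduler file lists the available schedulers on one line and marks the active one with square brackets, for example "[mq-deadline] kyber bfq none". The following Python sketch parses that format. The function name is illustrative.

```python
def parse_scheduler(text):
    """Return (active scheduler or None, list of available schedulers)."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets that mark the active entry
            active = token
        available.append(token)
    return active, available
```

Checking that a scheduler name appears in the available list before writing it to the file avoids a failed write for schedulers that the running kernel does not provide.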
