7.3.2.2.5. Optimising for sustained metadata modifications
The size of the log is the main factor in determining the achievable level of sustained metadata modification. The log device is circular, so before the tail can be overwritten all the modifications in the log must be written to the real locations on disk. This can involve a significant amount of seeking to write back all dirty metadata. The default configuration scales the log size in relation to the overall file system size, so in most cases log size will not require tuning.
A small log device will result in very frequent metadata writeback - the log will constantly be pushing on its tail to free up space and so frequently modified metadata will be frequently written to disk, causing operations to be slow.
Increasing the log size increases the time period between tail pushing events. This allows better aggregation of dirty metadata, resulting in better metadata writeback patterns, and less writeback of frequently modified metadata. The trade-off is that larger logs require more memory to track all outstanding changes in memory.
If you have a machine with limited memory, then large logs are not beneficial because memory constraints will cause metadata writeback long before the benefits of a large log can be realised. In these cases, smaller rather than larger logs will often provide better performance because metadata writeback from the log running out of space is more efficient than writeback driven by memory reclamation.
You should always try to align the log to the underlying stripe unit of the device that contains the file system.
mkfs
does this by default for MD and DM devices, but for hardware RAID it may need to be specified. Setting this correctly avoids all possibility of log I/O causing unaligned I/O and subsequent read-modify-write operations when writing modifications to disk.
Log operation can be further improved by editing mount options. Increasing the size of the in-memory log buffers (
logbsize
) increases the speed at which changes can be written to the log. The default log buffer size is MAX
(32 KB, log stripe unit), and the maximum size is 256 KB. In general, a larger value results in faster performance. However, under fsync-heavy workloads, small log buffers can be noticeably faster than large buffers with a large stripe unit alignment.
The
delaylog
mount option also improves sustained metadata modification performance by reducing the number of changes to the log. It achieves this by aggregating individual changes in memory before writing them to the log: frequently modified metadata is written to the log periodically instead of on every modification. This option increases the memory usage of tracking dirty metadata and increases the potential lost operations when a crash occurs, but can improve metadata modification speed and scalability by an order of magnitude or more. Use of this option does not reduce data or metadata integrity when fsync
, fdatasync
or sync
are used to ensure data and metadata is written to disk.