第 22 章 Write Barriers
A write barrier is a kernel mechanism used to ensure that file system metadata is correctly written and ordered on persistent storage, even when storage devices with volatile write caches lose power. File systems with write barriers enabled also ensure that data transmitted via
fsync()
is persistent throughout a power loss.
Enabling write barriers incurs a substantial performance penalty for some applications. Specifically, applications that use
fsync()
heavily or create and delete many small files will likely run much slower.
22.1. Importance of Write Barriers
File systems take great care to safely update metadata, ensuring consistency. Journalled file systems bundle metadata updates into transactions and send them to persistent storage in the following manner:
- First, the file system sends the body of the transaction to the storage device.
- Then, the file system sends a commit block.
- If the transaction and its corresponding commit block are written to disk, the file system assumes that the transaction will survive any power failure.
However, file system integrity during power failure becomes more complex for storage devices with extra caches. Storage target devices like local S-ATA or SAS drives may have write caches ranging from 32MB to 64MB in size (with modern drives). Hardware RAID controllers often contain internal write caches. Further, high end arrays, like those from NetApp, IBM, Hitachi and EMC (among others), also have large caches.
Storage devices with write caches report I/O as "complete" when the data is in cache; if the cache loses power, it loses its data as well. Worse, as the cache de-stages to persistent storage, it may change the original metadata ordering. When this occurs, the commit block may be present on disk without having the complete, associated transaction in place. As a result, the journal may replay these uninitialized transaction blocks into the file system during post-power-loss recovery; this will cause data inconsistency and corruption.
How Write Barriers Work
Write barriers are implemented in the Linux kernel via storage write cache flushes before and after the I/O, which is order-critical. After the transaction is written, the storage cache is flushed, the commit block is written, and the cache is flushed again. This ensures that:
- The disk contains all the data.
- No re-ordering has occurred.
With barriers enabled, an
fsync()
call will also issue a storage cache flush. This guarantees that file data is persistent on disk even if power loss occurs shortly after fsync()
returns.