19.8. LVM Cache for Red Hat Gluster Storage
Important
LVM Cache must be used with Red Hat Gluster Storage only on Red Hat Enterprise Linux 7.4 or later. This release includes a number of fixes and enhancements that are critical to a positive experience with caching.
19.8.1. About LVM Cache
An LVM Cache logical volume (LV) can be used to improve the performance of a block device by attaching a smaller, much faster device to it to act as a data acceleration layer. When a cache is attached to an LV, the Linux kernel subsystems attempt to keep 'hot' data copies in the fast cache layer at the block level. Additionally, as space in the cache allows, writes are made initially to the cache layer. The result can be significantly improved Input/Output (I/O) performance for many workloads.
19.8.1.1. LVM Cache vs. DM-Cache
dm-cache refers to the Linux kernel-level device-mapper subsystem that is responsible for all I/O transactions. For most usual operations, the administrator interfaces with the logical volume manager (LVM) as a much simpler abstraction layer above device-mapper. As such, lvmcache is simply part of the LVM system acting as an abstraction layer for the dm-cache subsystem.
19.8.1.2. LVM Cache vs. Gluster Tiered Volumes
Red Hat Gluster Storage supports tiered volumes, which are often configured with the same type of fast devices backing the fast tier bricks. The operation of tiering is at the file level and is distributed across the trusted storage pool (TSP). These tiers operate by moving files between the tiers based on tunable algorithms, such that files are migrated between tiers rather than copied.
In contrast, LVM Cache operates locally at each block device backing the bricks and does so at the block level. LVM Cache stores copies of the hot data in the fast layer using a non-tunable algorithm (though chunk sizes may be tuned for optimal performance).
For most workloads, LVM Cache tends to offer greater performance compared to tiering. However, for certain types of workloads where a large number of clients are consistently accessing the same hot file data set, or where writes can consistently go to the hot tier, tiering may prove more beneficial than LVM Cache.
19.8.1.3. Arbiter Bricks
Arbiter bricks operate by storing all file metadata transactions but not data transactions in order to prevent split-brain problems without the overhead of a third data copy. It is important to understand that file metadata is stored with the file, and so arbiter bricks effectively store empty copies of all files.
In a distributed system such as Red Hat Gluster Storage, latency can greatly affect the performance of file operations, especially when files are very small and file-based transactions are very high. With such small files, the overhead of the metadata latency can have a greater impact on performance than the throughput of the I/O subsystems. It is therefore important when creating arbiter bricks that the backing storage devices be as fast as the fastest data storage devices. When using LVM Cache to accelerate your data volumes with fast devices, you must allocate the same class of fast devices to serve as your arbiter brick backing devices; otherwise, your slow arbiter bricks could negate the performance benefits of your cache-accelerated data bricks.
19.8.1.4. Writethrough vs. Writeback
LVM Cache can operate in either writethrough or writeback mode, with writethrough being the default. In writethrough mode, any data written is stored both in the cache layer and in the main data layer. The loss of a device associated with the cache layer in this case would not mean the loss of any data.
Writeback mode delays the writing of data blocks from the cache layer to the main data layer. This mode can increase write performance, but the loss of a device associated with the cache layer can result in lost data locally.
Note
Data resiliency protects from global data loss in the case of a writeback cache device failure under most circumstances, but edge cases could lead to inconsistent data that cannot be automatically healed.
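If writeback mode suits your workload, the mode can also be selected when the cache is first attached rather than changed afterwards. The following is a minimal sketch using the example volume group (GVG), cache pool (cpool), and thin pool (GTP) names introduced in the configuration example later in this chapter; adjust the names to match your environment:
# lvconvert --type cache --cachepool GVG/cpool --cachemode writeback GVG/GTP
Changing the mode of a cache that is already attached is covered in Section 19.8.4.1.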
19.8.1.5. Cache-Friendly Workloads
While LVM Cache has been demonstrated to improve performance for Red Hat Gluster Storage under many use cases, the relative effects vary based on the workload. Because the caching operates at the block level, LVM Cache can be efficient even for larger file workloads. However, some workloads may see little to no benefit from LVM Cache, and highly random workloads or those with very large working sets may even experience a performance degradation. It is highly recommended that you understand your workload and test accordingly before making a significant investment in hardware to accelerate your storage workload.
19.8.2. Choosing the Size and Speed of Your Cache Devices
Sizing a cache appropriately for a workload can be a complicated study, particularly in Red Hat Gluster Storage, where the cache is local to the bricks rather than global to the volume. In general, you want to understand the size of your working set as a percentage of your total data set and then size your cache layer with some headroom (10-20%) beyond that working set size to allow for efficient flushes and room to cache new writes. Optimally, the entire working set is kept in the cache, and the overall performance you experience is near that of storing your data directly on the fast devices.
When heavily stressed by a working set that is not well-suited for the cache size, you will begin to see a higher percentage of cache misses and your performance will be inconsistent. You may find that as this cache-to-data imbalance increases, a higher percentage of data operations will drop to the speed of the slower data device. From the perspective of a user, this can sometimes be more frustrating than a device that is consistently slow. Understanding and testing your own workload is essential to making an appropriate cache sizing decision.
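As an illustration only (the figures are hypothetical and not a sizing recommendation): if a brick holds 10 TiB of data and you estimate that roughly 10% of it (1 TiB) is hot at any given time, then adding 15% headroom suggests a cache of approximately 1.15 TiB for that brick. Because the cache is local to each brick, every brick you want to accelerate needs its own similarly sized fast device.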
When choosing your cache devices, always consider high-endurance enterprise-class drives. These are typically tuned to either read or write intensive workloads, so be sure to inspect the hardware performance details when making your selection. Pay close attention to latency alongside IOPS or throughput, as the high transaction activity of a cache will benefit significantly from lower-latency hardware. When possible, select NVMe devices that use the PCI bus directly rather than SATA/SAS devices, as this will additionally benefit latency.
19.8.3. Configuring LVM Cache
A cache pool is created using logical volume manager (LVM) with fast devices as the physical volumes (PVs). The cache pool is then attached to an existing thin pool (TP) or thick logical volume (LV). Once this is done, block-level caching is immediately enabled for the configured LV, and the dm-cache algorithms will work to keep hot copies of data on the cache pool sub-volume.
Warning
Adding or removing cache pools can be done on active volumes, even with mounted filesystems in use. However, there is overhead to the operation and performance impacts will be seen, especially when removing a cache volume in writeback mode, as a full data sync will need to occur. As with any changes to the I/O stack, there is risk of data loss. All changes must be made with the requisite caution.
In the following example commands, we assume the use of a high-performance NVMe PCI device for caching. These devices typically present with device file paths such as /dev/nvme0n1. A SATA/SAS device will likely present with a device path such as /dev/sdb. The following example naming has been used:
- Physical Volume (PV) Name: /dev/nvme0n1
- Volume Group (VG) Name: GVG
- Thin pool name: GTP
- Logical Volume (LV) name: GLV
Note
There are several different ways to configure LVM Cache. The following is the simplest approach, applicable to most use cases. For details and further command examples, see lvmcache(7).
- Create a PV for your fast data device.
# pvcreate /dev/nvme0n1
- Add the fast data PV to the VG that hosts the LV you intend to cache.
# vgextend GVG /dev/nvme0n1
- Create the cache pool from your fast data device, reserving space required for metadata during the cache conversion process of your LV.
# lvcreate --type cache-pool -l 100%FREE -n cpool GVG /dev/nvme0n1
- Convert your existing data thin pool LV into a cache LV.
# lvconvert --type cache --cachepool GVG/cpool GVG/GTP
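At this point, block-level caching is active for the thin pool. As a quick confirmation that the conversion succeeded (the detailed verification commands are shown in Section 19.8.4.2), an lvs query such as the following, using the example names above, should show the cpool cache pool attached to the thin pool's data sub-volume:
# lvs -a -o name,pool_lv,cachemode GVG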
19.8.4. Managing LVM Cache
19.8.4.1. Changing the Mode of an Existing Cache Pool
An existing cache LV can be converted between writethrough and writeback modes with the lvchange command. For thin LVs, the command must be run against the tdata subvolume.
# lvchange --cachemode writeback GVG/GTP_tdata
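You can verify that the new mode has taken effect with a simple lvs query, for example (using the example names from this chapter):
# lvs -a -o name,cachemode GVG
Switching back is done with the same lvchange command and the writethrough argument; because any dirty blocks must first be written back to the origin device, the change can take some time on a busy writeback cache.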
19.8.4.2. Checking Your Configuration
Use the lsblk command to view the new virtual block device layout.
# lsblk /dev/{sdb,nvme0n1}
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdb                         8:16   0   9.1T  0 disk
└─GVG-GTP_tdata_corig     253:9    0   9.1T  0 lvm
  └─GVG-GTP_tdata         253:3    0   9.1T  0 lvm
    └─GVG-GTP-tpool       253:4    0   9.1T  0 lvm
      ├─GVG-GTP           253:5    0   9.1T  0 lvm
      └─GVG-GLV           253:6    0   9.1T  0 lvm  /mnt
nvme0n1                   259:0    0 745.2G  0 disk
├─GVG-GTP_tmeta           253:2    0    76M  0 lvm
│ └─GVG-GTP-tpool         253:4    0   9.1T  0 lvm
│   ├─GVG-GTP             253:5    0   9.1T  0 lvm
│   └─GVG-GLV             253:6    0   9.1T  0 lvm  /mnt
├─GVG-cpool_cdata         253:7    0 701.1G  0 lvm
│ └─GVG-GTP_tdata         253:3    0   9.1T  0 lvm
│   └─GVG-GTP-tpool       253:4    0   9.1T  0 lvm
│     ├─GVG-GTP           253:5    0   9.1T  0 lvm
│     └─GVG-GLV           253:6    0   9.1T  0 lvm  /mnt
├─GVG-cpool_cmeta         253:8    0    48M  0 lvm
│ └─GVG-GTP_tdata         253:3    0   9.1T  0 lvm
│   └─GVG-GTP-tpool       253:4    0   9.1T  0 lvm
│     ├─GVG-GTP           253:5    0   9.1T  0 lvm
│     └─GVG-GLV           253:6    0   9.1T  0 lvm  /mnt
└─GVG-GTP_tdata_corig     253:9    0   9.1T  0 lvm
  └─GVG-GTP_tdata         253:3    0   9.1T  0 lvm
    └─GVG-GTP-tpool       253:4    0   9.1T  0 lvm
      ├─GVG-GTP           253:5    0   9.1T  0 lvm
      └─GVG-GLV           253:6    0   9.1T  0 lvm  /mnt
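Reading this output: the original slow device (sdb) now backs GVG-GTP_tdata_corig, the uncached cache origin, while the NVMe device carries the cache data (cpool_cdata) and cache metadata (cpool_cmeta) sub-volumes as well as the thin pool metadata (GTP_tmeta). All of these are stacked beneath the same GVG-GTP thin pool and GVG-GLV logical volume as before.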
The lvs command displays a number of valuable columns to show the status of your cache pool and volume. For more details, see lvs(8).
# lvs -a -o name,vg_name,size,pool_lv,devices,cachemode,chunksize
  LV                VG      LSize    Pool    Devices                CacheMode    Chunk
  GLV               GVG       9.10t  GTP                                             0
  GTP               GVG      <9.12t          GTP_tdata(0)                         8.00m
  [GTP_tdata]       GVG      <9.12t  [cpool] GTP_tdata_corig(0)     writethrough 736.00k
  [GTP_tdata_corig] GVG      <9.12t          /dev/sdb(0)                              0
  [GTP_tdata_corig] GVG      <9.12t          /dev/nvme0n1(185076)                     0
  [GTP_tmeta]       GVG      76.00m          /dev/nvme0n1(185057)                     0
  [cpool]           GVG    <701.10g          cpool_cdata(0)         writethrough 736.00k
  [cpool_cdata]     GVG    <701.10g          /dev/nvme0n1(24)                         0
  [cpool_cmeta]     GVG      48.00m          /dev/nvme0n1(12)                         0
  [lvol0_pmspare]   GVG      76.00m          /dev/nvme0n1(0)                          0
  [lvol0_pmspare]   GVG      76.00m          /dev/nvme0n1(185050)                     0
  root              vg_root  50.00g          /dev/sda3(4095)                          0
  swap              vg_root <16.00g          /dev/sda3(0)                             0
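In this sample output, the CacheMode column shows that both the cache pool (cpool) and the cached data sub-volume (GTP_tdata) are operating in writethrough mode, and the Chunk column shows the 736.00k cache chunk size alongside the thin pool's separate 8.00m chunk size.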
Some of the useful columns from the lvs command that can be used to monitor the effectiveness of the cache and to aid in sizing decisions are:
- CacheTotalBlocks
- CacheUsedBlocks
- CacheDirtyBlocks
- CacheReadHits
- CacheReadMisses
- CacheWriteHits
- CacheWriteMisses
You will see a high ratio of Misses to Hits when the cache is cold (freshly attached to the LV). However, with a warm cache (volume online and transacting data for a sufficiently long period of time), a persistently high ratio of misses to hits indicates an undersized cache device.
# lvs -a -o devices,cachetotalblocks,cacheusedblocks,\
cachereadhits,cachereadmisses | egrep 'Devices|cdata'

  Devices         CacheTotalBlocks  CacheUsedBlocks  CacheReadHits  CacheReadMisses
  cpool_cdata(0)            998850             2581              1              192
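In this sample output the cache is clearly still cold: only 2581 of the 998850 available cache blocks (roughly 0.26%) are in use, and 192 of the 193 reads recorded so far were misses.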
19.8.4.3. Detaching a Cache Pool
You can split a cache pool from an LV in one command, leaving the data LV in an un-cached state with all data intact and the cache pool still existing but unattached. In writeback mode this can take a long time to complete while all data is synced. This may also negatively impact performance while it is running.
# lvconvert --splitcache GVG/cpool
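If you want to remove the cache pool entirely rather than leave it unattached, lvconvert also provides an --uncache option, which detaches the cache and then deletes the cache pool. The following is a sketch using the example names from this chapter; depending on your LVM version it may need to target the thin pool's tdata sub-volume instead, so consult lvmcache(7) before running it:
# lvconvert --uncache GVG/GTP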