Chapter 30. VDO Integration
30.1. Theoretical Overview of VDO
Virtual Data Optimizer (VDO) is a block virtualization technology that allows you to easily create compressed and deduplicated pools of block storage.
- Deduplication is a technique for reducing the consumption of storage resources by eliminating multiple copies of duplicate blocks.Instead of writing the same data more than once, VDO detects each duplicate block and records it as a reference to the original block. VDO maintains a mapping from logical block addresses, which are used by the storage layer above VDO, to physical block addresses, which are used by the storage layer under VDO.After deduplication, multiple logical block addresses may be mapped to the same physical block address; these are called shared blocks. Block sharing is invisible to users of the storage, who read and write blocks as they would if VDO were not present. When a shared block is overwritten, a new physical block is allocated for storing the new block data to ensure that other logical block addresses that are mapped to the shared physical block are not modified.
- Compression is a data-reduction technique that works well with file formats that do not necessarily exhibit block-level redundancy, such as log files and databases. See Section 30.4.8, “Using Compression” for more detail.
The VDO solution consists of the following components:
kvdo
- A kernel module that loads into the Linux Device Mapper layer to provide a deduplicated, compressed, and thinly provisioned block storage volume
uds
- A kernel module that communicates with the Universal Deduplication Service (UDS) index on the volume and analyzes data for duplicates.
- Command line tools
- For configuring and managing optimized storage.
30.1.1. The UDS Kernel Module (uds
)
The UDS index provides the foundation of the VDO product. For each new piece of data, it quickly determines if that piece is identical to any previously stored piece of data. If the index finds match, the storage system can then internally reference the existing item to avoid storing the same information more than once.
The UDS index runs inside the kernel as the
uds
kernel module.
30.1.2. The VDO Kernel Module (kvdo
)
The
kvdo
Linux kernel module provides block-layer deduplication services within the Linux Device Mapper layer. In the Linux kernel, Device Mapper serves as a generic framework for managing pools of block storage, allowing the insertion of block-processing modules into the storage stack between the kernel's block interface and the actual storage device drivers.
The
kvdo
module is exposed as a block device that can be accessed directly for block storage or presented through one of the many available Linux file systems, such as XFS or ext4. When kvdo
receives a request to read a (logical) block of data from a VDO volume, it maps the requested logical block to the underlying physical block and then reads and returns the requested data.
When
kvdo
receives a request to write a block of data to a VDO volume, it first checks whether it is a DISCARD
or TRIM
request or whether the data is uniformly zero. If either of these conditions holds, kvdo
updates its block map and acknowledges the request. Otherwise, a physical block is allocated for use by the request.
Overview of VDO Write Policies
If the
kvdo
module is operating in synchronous mode:
- It temporarily writes the data in the request to the allocated block and then acknowledges the request.
- Once the acknowledgment is complete, an attempt is made to deduplicate the block by computing a MurmurHash-3 signature of the block data, which is sent to the VDO index.
- If the VDO index contains an entry for a block with the same signature,
kvdo
reads the indicated block and does a byte-by-byte comparison of the two blocks to verify that they are identical. - If they are indeed identical, then
kvdo
updates its block map so that the logical block points to the corresponding physical block and releases the allocated physical block. - If the VDO index did not contain an entry for the signature of the block being written, or the indicated block does not actually contain the same data,
kvdo
updates its block map to make the temporary physical block permanent.
If
kvdo
is operating in asynchronous mode:
- Instead of writing the data, it will immediately acknowledge the request.
- It will then attempt to deduplicate the block in same manner as described above.
- If the block turns out to be a duplicate,
kvdo
will update its block map and release the allocated block. Otherwise, it will write the data in the request to the allocated block and update the block map to make the physical block permanent.
30.1.3. VDO Volume
VDO uses a block device as a backing store, which can include an aggregation of physical storage consisting of one or more disks, partitions, or even flat files. When a VDO volume is created by a storage management tool, VDO reserves space from the volume for both a UDS index and the VDO volume, which interact together to provide deduplicated block storage to users and applications. Figure 30.1, “VDO Disk Organization” illustrates how these pieces fit together.
Figure 30.1. VDO Disk Organization
Slabs
The physical storage of the VDO volume is divided into a number of slabs, each of which is a contiguous region of the physical space. All of the slabs for a given volume will be of the same size, which may be any power of 2 multiple of 128 MB up to 32 GB.
The default slab size is 2 GB in order to facilitate evaluating VDO on smaller test systems. A single VDO volume may have up to 8192 slabs. Therefore, in the default configuration with 2 GB slabs, the maximum allowed physical storage is 16 TB. When using 32 GB slabs, the maximum allowed physical storage is 256 TB. At least one entire slab is reserved by VDO for metadata, and therefore cannot be used for storing user data.
Slab size has no effect on the performance of the VDO volume.
Physical Volume Size | Recommended Slab Size |
---|---|
10–99 GB | 1 GB |
100 GB – 1 TB | 2 GB |
2–256 TB | 32 GB |
The size of a slab can be controlled by providing the
--vdoSlabSize=megabytes
option to the vdo create
command.
Physical Size and Available Physical Size
Both physical size and available physical size describe the amount of disk space on the block device that VDO can utilize:
- Physical size is the same size as the underlying block device. VDO uses this storage for:
- User data, which might be deduplicated and compressed
- VDO metadata, such as the UDS index
- Available physical size is the portion of the physical size that VDO is able to use for user data.It is equivalent to the physical size minus the size of the metadata, minus the remainder after dividing the volume into slabs by the given slab size.
For examples of how much storage VDO metadata require on block devices of different sizes, see Section 30.2.3, “Examples of VDO System Requirements by Physical Volume Size”.
Logical Size
If the
--vdoLogicalSize
option is not specified, the logical volume size defaults to the available physical volume size. Note that, in Figure 30.1, “VDO Disk Organization”, the VDO deduplicated storage target sits completely on top of the block device, meaning the physical size of the VDO volume is the same size as the underlying block device.
VDO currently supports any logical size up to 254 times the size of the physical volume with an absolute maximum logical size of 4PB.