Search

Chapter 1. Deploying VDO

download PDF

As a system administrator, you can use VDO to create deduplicated and compressed storage pools.

1.1. Introduction to VDO

Virtual Data Optimizer (VDO) provides inline data reduction for Linux in the form of deduplication, compression, and thin provisioning. When you set up a VDO volume, you specify a block device on which to construct your VDO volume and the amount of logical storage you plan to present.

  • When hosting active VMs or containers, Red Hat recommends provisioning storage at a 10:1 logical to physical ratio: that is, if you are utilizing 1 TB of physical storage, you would present it as 10 TB of logical storage.
  • For object storage, such as the type provided by Ceph, Red Hat recommends using a 3:1 logical to physical ratio: that is, 1 TB of physical storage would present as 3 TB logical storage.

In either case, you can simply put a file system on top of the logical device presented by VDO and then use it directly or as part of a distributed cloud storage architecture.

Because VDO is thinly provisioned, the file system and applications only see the logical space in use and are not aware of the actual physical space available. Use scripting to monitor the actual available space and generate an alert if use exceeds a threshold: for example, when the VDO volume is 80% full.

Additional resources

1.2. VDO deployment scenarios

You can deploy VDO in a variety of ways to provide deduplicated storage for:

  • both block and file access
  • both local and remote storage

Because VDO exposes its deduplicated storage as a standard Linux block device, you can use it with standard file systems, iSCSI and FC target drivers, or as unified storage.

Note

Deployment of VDO volumes on top of Ceph RADOS Block Device (RBD) is currently supported. However, the deployment of Red Hat Ceph Storage cluster components on top of VDO volumes is currently not supported.

KVM

You can deploy VDO on a KVM server configured with Direct Attached Storage.

VDO Deployment with KVM

File systems

You can create file systems on top of VDO and expose them to NFS or CIFS users with the NFS server or Samba.

Deduplicated NAS

Placement of VDO on iSCSI

You can export the entirety of the VDO storage target as an iSCSI target to remote iSCSI initiators.

Deduplicated block storage target

When creating a VDO volume on iSCSI, you can place the VDO volume above or below the iSCSI layer. Although there are many considerations to be made, some guidelines are provided here to help you select the method that best suits your environment.

When placing the VDO volume on the iSCSI server (target) below the iSCSI layer:

  • The VDO volume is transparent to the initiator, similar to other iSCSI LUNs. Hiding the thin provisioning and space savings from the client makes the appearance of the LUN easier to monitor and maintain.
  • There is decreased network traffic because there are no VDO metadata reads or writes, and read verification for the dedupe advice does not occur across the network.
  • The memory and CPU resources being used on the iSCSI target can result in better performance. For example, the ability to host an increased number of hypervisors because the volume reduction is happening on the iSCSI target.
  • If the client implements encryption on the initiator and there is a VDO volume below the target, you will not realize any space savings.

When placing the VDO volume on the iSCSI client (initiator) above the iSCSI layer:

  • There is a potential for lower network traffic across the network in ASYNC mode if achieving high rates of space savings.
  • You can directly view and control the space savings and monitor usage.
  • If you want to encrypt the data, for example, using dm-crypt, you can implement VDO on top of the crypt and take advantage of space efficiency.

LVM

On more feature-rich systems, you can use LVM to provide multiple logical unit numbers (LUNs) that are all backed by the same deduplicated storage pool.

In the following diagram, the VDO target is registered as a physical volume so that it can be managed by LVM. Multiple logical volumes (LV1 to LV4) are created out of the deduplicated storage pool. In this way, VDO can support multiprotocol unified block or file access to the underlying deduplicated storage pool.

Deduplicated unified storage

Deduplicated unified storage design enables for multiple file systems to collectively use the same deduplication domain through the LVM tools. Also, file systems can take advantage of LVM snapshot, copy-on-write, and shrink or grow features, all on top of VDO.

Encryption

Device Mapper (DM) mechanisms such as DM Crypt are compatible with VDO. Encrypting VDO volumes helps ensure data security, and any file systems above VDO are still deduplicated.

Using VDO with encryption
Important

Applying the encryption layer above VDO results in little if any data deduplication. Encryption makes duplicate blocks different before VDO can deduplicate them.

Always place the encryption layer below VDO.

1.3. Components of a VDO volume

VDO uses a block device as a backing store, which can include an aggregation of physical storage consisting of one or more disks, partitions, or even flat files. When a storage management tool creates a VDO volume, VDO reserves volume space for the UDS index and VDO volume. The UDS index and the VDO volume interact together to provide deduplicated block storage.

Figure 1.1. VDO disk organization

VDO disk organization

The VDO solution consists of the following components:

kvdo

A kernel module that loads into the Linux Device Mapper layer provides a deduplicated, compressed, and thinly provisioned block storage volume.

The kvdo module exposes a block device. You can access this block device directly for block storage or present it through a Linux file system, such as XFS or ext4.

When kvdo receives a request to read a logical block of data from a VDO volume, it maps the requested logical block to the underlying physical block and then reads and returns the requested data.

When kvdo receives a request to write a block of data to a VDO volume, it first checks whether the request is a DISCARD or TRIM request or whether the data is uniformly zero. If either of these conditions is true, kvdo updates its block map and acknowledges the request. Otherwise, VDO processes and optimizes the data.

uds

A kernel module that communicates with the Universal Deduplication Service (UDS) index on the volume and analyzes data for duplicates. For each new piece of data, UDS quickly determines if that piece is identical to any previously stored piece of data. If the index finds a match, the storage system can then internally reference the existing item to avoid storing the same information more than once.

The UDS index runs inside the kernel as the uds kernel module.

Command line tools
For configuring and managing optimized storage.

1.4. The physical and logical size of a VDO volume

VDO utilizes physical, available physical, and logical size in the following ways:

Physical size

This is the same size as the underlying block device. VDO uses this storage for:

  • User data, which might be deduplicated and compressed
  • VDO metadata, such as the UDS index
Available physical size

This is the portion of the physical size that VDO is able to use for user data

It is equivalent to the physical size minus the size of the metadata, minus the remainder after dividing the volume into slabs by the given slab size.

Logical Size

This is the provisioned size that the VDO volume presents to applications. It is usually larger than the available physical size. If the --vdoLogicalSize option is not specified, then the provisioning of the logical volume is now provisioned to a 1:1 ratio. For example, if a VDO volume is put on top of a 20 GB block device, then 2.5 GB is reserved for the UDS index (if the default index size is used). The remaining 17.5 GB is provided for the VDO metadata and user data. As a result, the available storage to consume is not more than 17.5 GB, and can be less due to metadata that makes up the actual VDO volume.

VDO currently supports any logical size up to 254 times the size of the physical volume with an absolute maximum logical size of 4PB.

Figure 1.2. VDO disk organization

VDO disk organization

In this figure, the VDO deduplicated storage target sits completely on top of the block device, meaning the physical size of the VDO volume is the same size as the underlying block device.

Additional resources

1.5. Slab size in VDO

The physical storage of the VDO volume is divided into a number of slabs. Each slab is a contiguous region of the physical space. All of the slabs for a given volume have the same size, which can be any power of 2 multiple of 128 MB up to 32 GB.

The default slab size is 2 GB to facilitate evaluating VDO on smaller test systems. A single VDO volume can have up to 8192 slabs. Therefore, in the default configuration with 2 GB slabs, the maximum allowed physical storage is 16 TB. When using 32 GB slabs, the maximum allowed physical storage is 256 TB. VDO always reserves at least one entire slab for metadata, and therefore, the reserved slab cannot be used for storing user data.

Slab size has no effect on the performance of the VDO volume.

Table 1.1. Recommended VDO slab sizes by physical volume size
Physical volume sizeRecommended slab size

10–99 GB

1 GB

100 GB – 1 TB

2 GB

2–256 TB

32 GB

Note

The minimal disk usage for a VDO volume using default settings of 2 GB slab size and 0.25 dense index, requires approx 4.7 GB. This provides slightly less than 2 GB of physical data to write at 0% deduplication or compression.

Here, the minimal disk usage is the sum of the default slab size and dense index.

You can control the slab size by providing the --config 'allocation/vdo_slab_size_mb=size-in-megabytes' option to the lvcreate command.

1.6. VDO requirements

VDO has certain requirements on its placement and your system resources.

1.6.1. VDO memory requirements

Each VDO volume has two distinct memory requirements:

The VDO module

VDO requires a fixed 38 MB of RAM and several variable amounts:

  • 1.15 MB of RAM for each 1 MB of configured block map cache size. The block map cache requires a minimum of 150MB RAM.
  • 1.6 MB of RAM for each 1 TB of logical space.
  • 268 MB of RAM for each 1 TB of physical storage managed by the volume.
The UDS index

The Universal Deduplication Service (UDS) requires a minimum of 250 MB of RAM, which is also the default amount that deduplication uses. You can configure the value when formatting a VDO volume, because the value also affects the amount of storage that the index needs.

The memory required for the UDS index is determined by the index type and the required size of the deduplication window:

Index typeDeduplication windowNote

Dense

1 TB per 1 GB of RAM

A 1 GB dense index is generally sufficient for up to 4 TB of physical storage.

Sparse

10 TB per 1 GB of RAM

A 1 GB sparse index is generally sufficient for up to 40 TB of physical storage.

Note

The minimal disk usage for a VDO volume using default settings of 2 GB slab size and 0.25 dense index, requires approx 4.7 GB. This provides slightly less than 2 GB of physical data to write at 0% deduplication or compression.

Here, the minimal disk usage is the sum of the default slab size and dense index.

The UDS Sparse Indexing feature is the recommended mode for VDO. It relies on the temporal locality of data and attempts to retain only the most relevant index entries in memory. With the sparse index, UDS can maintain a deduplication window that is ten times larger than with dense, while using the same amount of memory.

Although the sparse index provides the greatest coverage, the dense index provides more deduplication advice. For most workloads, given the same amount of memory, the difference in deduplication rates between dense and sparse indexes is negligible.

1.6.2. VDO storage space requirements

You can configure a VDO volume to use up to 256 TB of physical storage. Only a certain part of the physical storage is usable to store data. This section provides the calculations to determine the usable size of a VDO-managed volume.

VDO requires storage for two types of VDO metadata and for the UDS index:

  • The first type of VDO metadata uses approximately 1 MB for each 4 GB of physical storage plus an additional 1 MB per slab.
  • The second type of VDO metadata consumes approximately 1.25 MB for each 1 GB of logical storage, rounded up to the nearest slab.
  • The amount of storage required for the UDS index depends on the type of index and the amount of RAM allocated to the index. For each 1 GB of RAM, a dense UDS index uses 17 GB of storage, and a sparse UDS index will use 170 GB of storage.

1.6.3. Placement of VDO in the storage stack

Place storage layers either above, or under the Virtual Data Optimizer (VDO), to fit the placement requirements.

A VDO volume is a thin-provisioned block device. You can prevent running out of physical space by placing the volume above a storage layer that you can expand at a later time. Examples of such expandable storage are Logical Volume Manager (LVM) volumes, or Multiple Device Redundant Array Inexpensive or Independent Disks (MD RAID) arrays.

You can place thick provisioned layers above VDO. There are two aspects of thick provisioned layers that you must consider:

  • Writing new data to unused logical space on a thick device. When using VDO, or other thin-provisioned storage, the device can report that it is out of space during this kind of write.
  • Overwriting used logical space on a thick device with new data. When using VDO, overwriting data can also result in a report of the device being out of space.

These limitations affect all layers above the VDO layer. If you do not monitor the VDO device, you can unexpectedly run out of physical space on the thick-provisioned volumes above VDO.

See the following examples of supported and unsupported VDO volume configurations.

Figure 1.3. Supported VDO volume configurations

vdo placement supported

Figure 1.4. Unsupported VDO volume configurations

vdo placement unsupported

Additional resources

1.6.4. Examples of VDO requirements by physical size

The following tables provide approximate system requirements of VDO based on the physical size of the underlying volume. Each table lists requirements appropriate to the intended deployment, such as primary storage or backup storage.

The exact numbers depend on your configuration of the VDO volume.

Primary storage deployment

In the primary storage case, the UDS index is between 0.01% to 25% the size of the physical size.

Table 1.2. Storage and memory requirements for primary storage
Physical sizeRAM usage: UDSRAM usage: VDODisk usageIndex type

10GB–1TB

250MB

472MB

2.5GB

Dense

2–10TB

1GB

3GB

10GB

Dense

250MB

22GB

Sparse

11–50TB

2GB

14GB

170GB

Sparse

51–100TB

3GB

27GB

255GB

Sparse

101–256TB

12GB

69GB

1020GB

Sparse

Backup storage deployment

In the backup storage case, the UDS index covers the size of the backup set but is not bigger than the physical size. If you expect the backup set or the physical size to grow in the future, factor this into the index size.

Table 1.3. Storage and memory requirements for backup storage
Physical sizeRAM usage: UDSRAM usage: VDODisk usageIndex type

10GB–1TB

250MB

472MB

2.5 GB

Dense

2–10TB

2GB

3GB

170GB

Sparse

11–50TB

10GB

14GB

850GB

Sparse

51–100TB

20GB

27GB

1700GB

Sparse

101–256TB

26GB

69GB

3400GB

Sparse

1.7. Installing VDO

This procedure installs software necessary to create, mount, and manage VDO volumes.

Procedure

  • Install the VDO software:

    # yum install lvm2 kmod-kvdo vdo

1.8. Creating a VDO volume

This procedure creates a VDO volume on a block device.

Prerequisites

Procedure

In all the following steps, replace vdo-name with the identifier you want to use for your VDO volume; for example, vdo1. You must use a different name and device for each instance of VDO on the system.

  1. Find a persistent name for the block device where you want to create the VDO volume. For more information about persistent names, see Chapter 6, Overview of persistent naming attributes.

    If you use a non-persistent device name, then VDO might fail to start properly in the future if the device name changes.

  2. Create the VDO volume:

    # vdo create \
          --name=vdo-name \
          --device=block-device \
          --vdoLogicalSize=logical-size
    • Replace block-device with the persistent name of the block device where you want to create the VDO volume. For example, /dev/disk/by-id/scsi-3600508b1001c264ad2af21e903ad031f.
    • Replace logical-size with the amount of logical storage that the VDO volume should present:

      • For active VMs or container storage, use logical size that is ten times the physical size of your block device. For example, if your block device is 1TB in size, use 10T here.
      • For object storage, use logical size that is three times the physical size of your block device. For example, if your block device is 1TB in size, use 3T here.
    • If the physical block device is larger than 16TiB, add the --vdoSlabSize=32G option to increase the slab size on the volume to 32GiB.

      Using the default slab size of 2GiB on block devices larger than 16TiB results in the vdo create command failing with the following error:

      vdo: ERROR - vdoformat: formatVDO failed on '/dev/device': VDO Status: Exceeds maximum number of slabs supported

    Example 1.1. Creating VDO for container storage

    For example, to create a VDO volume for container storage on a 1TB block device, you might use:

    # vdo create \
          --name=vdo1 \
          --device=/dev/disk/by-id/scsi-3600508b1001c264ad2af21e903ad031f \
          --vdoLogicalSize=10T
    Important

    If a failure occurs when creating the VDO volume, remove the volume to clean up. See Section 2.10.2, “Removing an unsuccessfully created VDO volume” for details.

  3. Create a file system on top of the VDO volume:

    • For the XFS file system:

      # mkfs.xfs -K /dev/mapper/vdo-name
    • For the ext4 file system:

      # mkfs.ext4 -E nodiscard /dev/mapper/vdo-name
      Note

      The purpose of the -K and -E nodiscard options on a freshly created VDO volume is to not spend time sending requests, as it has no effect on an un-allocated block. A fresh VDO volume starts out 100% un-allocated.

  4. Use the following command to wait for the system to register the new device node:

    # udevadm settle

Next steps

  1. Mount the file system. See Section 1.9, “Mounting a VDO volume” for details.
  2. Enable the discard feature for the file system on your VDO device. See Section 1.10, “Enabling periodic block discard” for details.

Additional resources

  • The vdo(8) man page

1.9. Mounting a VDO volume

This procedure mounts a file system on a VDO volume, either manually or persistently.

Prerequisites

Procedure

  • To mount the file system on the VDO volume manually, use:

    # mount /dev/mapper/vdo-name mount-point
  • To configure the file system to mount automatically at boot, add a line to the /etc/fstab file:

    • For the XFS file system:

      /dev/mapper/vdo-name mount-point xfs defaults 0 0
    • For the ext4 file system:

      /dev/mapper/vdo-name mount-point ext4 defaults 0 0

    If the VDO volume is located on a block device that requires network, such as iSCSI, add the _netdev mount option.

Additional resources

  • The vdo(8) man page.
  • For iSCSI and other block devices requiring network, see the systemd.mount(5) man page for information about the _netdev mount option.

1.10. Enabling periodic block discard

You can enable a systemd timer to regularly discard unused blocks on all supported file systems.

Procedure

  • Enable and start the systemd timer:

    # systemctl enable --now fstrim.timer
    Created symlink /etc/systemd/system/timers.target.wants/fstrim.timer  /usr/lib/systemd/system/fstrim.timer.

Verification

  • Verify the status of the timer:

    # systemctl status fstrim.timer
    fstrim.timer - Discard unused blocks once a week
       Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; enabled; vendor preset: disabled)
       Active: active (waiting) since Wed 2023-05-17 13:24:41 CEST; 3min 15s ago
      Trigger: Mon 2023-05-22 01:20:46 CEST; 4 days left
         Docs: man:fstrim
    
    May 17 13:24:41 localhost.localdomain systemd[1]: Started Discard unused blocks once a week.

1.11. Monitoring VDO

This procedure describes how to obtain usage and efficiency information from a VDO volume.

Prerequisites

Procedure

  • Use the vdostats utility to get information about a VDO volume:

    # vdostats --human-readable
    
    Device                   1K-blocks    Used     Available    Use%    Space saving%
    /dev/mapper/node1osd1    926.5G       21.0G    905.5G       2%      73%
    /dev/mapper/node1osd2    926.5G       28.2G    898.3G       3%      64%

Additional resources

  • The vdostats(8) man page.
Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.