Chapter 4. Testing VDO performance
You can perform a series of tests to measure VDO performance, obtain a performance profile of your system with VDO, and determine which applications perform well with VDO.
Prerequisites
- One or more Linux physical block devices are available.
- The target block device (for example, /dev/sdb) is larger than 512 GiB.
- Flexible I/O Tester (fio) is installed.
- VDO is installed.
4.1. Preparing an environment for VDO performance testing
Before testing VDO performance, you must consider the host system configuration, VDO configuration, and the workloads that will be used during testing. These choices affect the benchmarking of space efficiency, bandwidth, and latency.
To prevent one test from affecting the results of another, you must create a new VDO volume for each iteration of each test.
4.1.1. Considerations before testing VDO performance
The following conditions and configurations affect the VDO test results:
System configuration
- Number and type of CPU cores available. You can list this information using the taskset utility; see also the example commands after this list.
- Available memory and total installed memory
- Configuration of storage devices
- Active disk scheduler
- Linux kernel version
- Packages installed
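The following commands are one hedged way to record most of this system configuration before a test run. They are illustrative rather than part of the official procedure, and /dev/sdb is an assumed device name:
# lscpu                                    # number and type of CPU cores
# free -h                                  # available and total installed memory
# cat /sys/block/sdb/queue/scheduler       # active disk scheduler for /dev/sdb
# uname -r                                 # Linux kernel version
# rpm -qa | sort > installed-packages.txt  # packages installed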
VDO configuration
- Partitioning scheme
- File systems used on VDO volumes
- Size of the physical storage assigned to a VDO volume
- Size of the logical VDO volume created
- Sparse or dense UDS indexing
- Memory size of the UDS index
- VDO thread configuration
Workloads
- Types of tools used to generate test data
- Number of concurrent clients
- The quantity of duplicate 4 KiB blocks in the written data
- Read and write patterns
- The working set size
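Because the quantity of duplicate blocks in the written data strongly affects VDO results, it can help to control it explicitly when generating test data. The following fio job is a hedged sketch: the dedupe_percentage and buffer_compress_percentage values are illustrative, and /dev/mapper/vdo-test assumes the test volume created in Section 4.2:
# fio --rw=write \
      --bs=4k \
      --size=10g \
      --name=dedupe-data \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --dedupe_percentage=50 \
      --buffer_compress_percentage=40
Running the same job twice and comparing vdostats output before and after is one way to confirm that deduplication behaves as you expect.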
4.1.2. Special considerations for testing VDO read performance
You must consider these additional factors before testing VDO read performance:
- If a 4 KiB block has never been written, VDO does not read from the storage and immediately responds with a zero block.
- If a 4 KiB block has been written but contains all zeros, VDO does not read from the storage and immediately responds with a zero block.
This behavior results in very fast read performance when there is no data to read. This is why read tests must prefill the volume with actual data.
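As a quick, hedged illustration of this behavior, reading an untouched region of a freshly created test volume with dd typically returns zero blocks much faster than the backing storage could deliver real data. The device path assumes the test volume from Section 4.2:
# dd if=/dev/mapper/vdo-test of=/dev/null bs=4K count=262144 iflag=direct
Compare the reported rate with the same command run after the prefill step used in the tests below.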
4.1.3. Preparing the system for testing VDO performance
This procedure configures system settings to achieve optimal VDO performance during testing.
Testing beyond the bounds listed in any particular test might result in the loss of testing time due to abnormal results.
For example, the VDO tests describe a test that conducts random reads over a 100 GiB address range. To test a working set of 500 GiB, you must increase the amount of RAM allocated for the VDO block map cache accordingly.
Procedure
- Ensure that your CPU is running at its highest performance setting.
- If possible, disable CPU frequency scaling using the BIOS configuration or the Linux cpupower utility; see the example after this list.
- If possible, enable dynamic processor frequency adjustment (Turbo Boost or Turbo Core) for the CPU. This feature introduces some variability in the test results, but improves overall performance.
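For example, on systems that provide the cpupower utility, the following commands show the current frequency policy and pin the governor to performance. The available governors depend on your CPU frequency driver, so treat this as a sketch rather than a required step:
# cpupower frequency-info --policy    # show the current governor and limits
# cpupower frequency-set --governor performance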
File systems might have unique impacts on performance. They often skew performance measurements, making it harder to isolate the impact of VDO on the results.
If reasonable, measure performance on the raw block device. If this is not possible, format the device using the file system that VDO will use in the target implementation.
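If you have to test through a file system, one hedged example is formatting the VDO volume with XFS while skipping the initial block discard, which can otherwise take a long time on a large thin volume. The device and mount point assume the test volume from Section 4.2; adjust the file system and options to match your target implementation:
# mkfs.xfs -K /dev/mapper/vdo-test    # -K skips discarding blocks at mkfs time
# mkdir -p /mnt/vdo-test
# mount /dev/mapper/vdo-test /mnt/vdo-test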
4.2. Creating a VDO volume for performance testing
This procedure creates a VDO volume with a logical size of 1 TiB on a 512 GiB physical volume for testing VDO performance.
Procedure
Create a VDO volume:
# vdo create --name=vdo-test \
             --device=/dev/sdb \
             --vdoLogicalSize=1T \
             --writePolicy=policy \
             --verbose
- Replace /dev/sdb with the path to a block device.
- To test the VDO async mode on top of asynchronous storage, create an asynchronous volume using the --writePolicy=async option.
- To test the VDO sync mode on top of synchronous storage, create a synchronous volume using the --writePolicy=sync option.
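To confirm that the new volume is available before starting a test, you can query it with the vdo and vdostats utilities. This verification is a suggestion rather than part of the documented procedure:
# vdo status --name=vdo-test
# vdostats --human-readable /dev/mapper/vdo-test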
4.3. Cleaning up the VDO performance testing volume
This procedure removes the VDO volume used for testing VDO performance from the system.
Prerequisites
- A VDO test volume exists on the system.
Procedure
Remove the VDO test volume from the system:
# vdo remove --name=vdo-test
Verification steps
Verify that the volume has been removed:
# vdo list --all | grep vdo-test
The command should not list the VDO test volume.
4.4. Testing the effects of I/O depth on VDO performance
These tests determine the I/O depth that produces the optimal throughput and the lowest latency for your VDO configuration. I/O depth represents the number of I/O requests that the fio tool submits at a time.
Because VDO uses a 4 KiB sector size, the tests perform four-corner testing with 4 KiB I/O operations at I/O depths of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and 2048.
4.4.1. Testing the effect of I/O depth on sequential 100% reads in VDO
This test determines how sequential 100% read operations perform on a VDO volume at different I/O depth values.
Procedure
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for sequential 100% reads:
# for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    fio --rw=read \
        --bs=4096 \
        --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio \
        --numjobs=1 \
        --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=$depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
  done
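Copying numbers out of fio's human-readable summary for twelve depths is tedious and error-prone. As an optional, hedged variation that applies equally to the other tests in this chapter, you can make each run write machine-readable output; the file naming scheme below is hypothetical:
# for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    fio --rw=read --bs=4096 --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio --numjobs=1 --thread --norandommap \
        --runtime=300 --direct=1 --iodepth=$depth \
        --scramble_buffers=1 --offset=0 --size=100g \
        --output-format=json --output=seq-read-depth-${depth}.json
  done
The same two output options can be appended to the write, random read, and random write loops in the following sections.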
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.4.2. Testing the effect of I/O depth on sequential 100% writes in VDO
This test determines how sequential 100% write operations perform on a VDO volume at different I/O depth values.
Procedure
Create a new VDO test volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for sequential 100% writes:
# for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    fio --rw=write \
        --bs=4096 \
        --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio \
        --numjobs=1 \
        --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=$depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.4.3. Testing the effect of I/O depth on random 100% reads in VDO
This test determines how random 100% read operations perform on a VDO volume at different I/O depth values.
Procedure
Create a new VDO test volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for random 100% reads:
# for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    fio --rw=randread \
        --bs=4096 \
        --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio \
        --numjobs=1 \
        --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=$depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.4.4. Testing the effect of I/O depth on random 100% writes in VDO
This test determines how random 100% write operations perform on a VDO volume at different I/O depth values.
You must recreate the VDO volume between each I/O depth test run.
Procedure
Perform the following series of steps separately for the I/O depth values of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and 2048:
Create a new VDO test volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for random 100% writes:
# fio --rw=randwrite \
      --bs=4096 \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --numjobs=1 \
      --thread \
      --norandommap \
      --runtime=300 \
      --direct=1 \
      --iodepth=depth-value \
      --scramble_buffers=1 \
      --offset=0 \
      --size=100g
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.4.5. Analysis of VDO performance at different I/O depths
The following example analyzes VDO throughput and latency recorded at different I/O depth values.
Watch the behavior across the range and the points of inflection where increased I/O depth provides diminishing throughput gains. Sequential access and random access typically peak at different values, and the peak points can differ across storage configurations.
Example 4.1. I/O depth analysis
Figure 4.1. VDO throughput analysis
Notice the "knee" in each performance curve:
- Marker 1 identifies the peak sequential throughput at point X. This particular configuration does not benefit from sequential 4 KiB I/O depth larger than X.
- Marker 2 identifies peak random 4 KiB throughput at point Z. This particular configuration does not benefit from random 4 KiB I/O depth larger than Z.
Beyond the I/O depth at points X and Z, there are diminishing bandwidth gains, and average request latency increases 1:1 for each additional I/O request.
The following image shows an example of the random write latency after the "knee" of the curve in the previous graph. You should test at these points for maximum throughput that incurs the least response time penalty.
Figure 4.2. VDO latency analysis
Optimal I/O depth
Point Z marks the optimal I/O depth. The test plan collects additional data with I/O depth equal to Z.
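If you saved the per-depth results as JSON as suggested in Section 4.4.1, a short shell loop can tabulate throughput and mean completion latency against depth, which makes the knee easier to spot. This is a sketch that assumes the hypothetical seq-read-depth-*.json file names and fio 3.x field names (bw in KiB/s, clat_ns in nanoseconds), which vary between fio versions:
# for f in seq-read-depth-*.json; do
    depth=${f#seq-read-depth-}; depth=${depth%.json}
    bw=$(jq '.jobs[0].read.bw' "$f")             # bandwidth in KiB/s
    lat=$(jq '.jobs[0].read.clat_ns.mean' "$f")  # mean completion latency in ns
    printf '%s\t%s\t%s\n' "$depth" "$bw" "$lat"
  done | sort -n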
4.5. Testing the effects of I/O request size on VDO performance
Using these tests, you can identify the block size that produces the best performance of VDO at the optimal I/O depth.
The tests perform four-corner testing at a fixed I/O depth, with varied block sizes over the range of 4 KiB to 1 MiB.
Prerequisites
You have determined the optimal I/O depth value. For details, see Section 4.4, “Testing the effects of I/O depth on VDO performance”.
In the following tests, replace optimal-depth with the optimal I/O depth value.
4.5.1. Testing the effect of I/O request size on sequential writes in VDO
This test determines how sequential write operations perform on a VDO volume at different I/O request sizes.
Procedure
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for the sequential write test:
# for iosize in 4 8 16 32 64 128 256 512 1024; do
    fio --rw=write \
        --bs=${iosize}k \
        --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio \
        --numjobs=1 \
        --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=optimal-depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.5.2. Testing the effect of I/O request size on random writes in VDO
This test determines how random write operations perform on a VDO volume at different I/O request sizes.
You must recreate the VDO volume between each I/O request size test run.
Procedure
Perform the following series of steps separately for the I/O request sizes of 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1024k:
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for the random write test:
# fio --rw=randwrite \
      --bs=request-size \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --numjobs=1 \
      --thread \
      --norandommap \
      --runtime=300 \
      --direct=1 \
      --iodepth=optimal-depth \
      --scramble_buffers=1 \
      --offset=0 \
      --size=100g
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.5.3. Testing the effect of I/O request size on sequential reads in VDO
This test determines how sequential read operations perform on a VDO volume at different I/O request sizes.
Procedure
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for the sequential read test:
# for iosize in 4 8 16 32 64 128 256 512 1024; do
    fio --rw=read \
        --bs=${iosize}k \
        --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio \
        --numjobs=1 \
        --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=optimal-depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.5.4. Testing the effect of I/O request size on random reads in VDO
This test determines how random read operations perform on a VDO volume at different I/O request sizes.
Procedure
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for the random read test:
# for iosize in 4 8 16 32 64 128 256 512 1024; do
    fio --rw=randread \
        --bs=${iosize}k \
        --name=vdo \
        --filename=/dev/mapper/vdo-test \
        --ioengine=libaio \
        --numjobs=1 \
        --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=optimal-depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
4.5.5. Analysis of VDO performance at different I/O request sizes
The following example analyzes VDO throughput and latency recorded at different I/O request sizes.
Example 4.2. I/O request size analysis
Figure 4.3. Request size versus throughput analysis and key inflection points
Analyzing the example results:
Sequential writes reach a peak throughput at request size Y.
This curve demonstrates how applications that are configurable or naturally dominated by certain request sizes might perceive performance. Larger request sizes often provide more throughput because 4 KiB I/O operations might benefit from merging.
Sequential reads reach a similar peak throughput at point Z.
After these peaks, the overall latency before the I/O operation completes increases with no additional throughput. You should tune the device to not accept I/O operations larger than this size.
Random reads achieve peak throughput at point X.
Certain devices might achieve near-sequential throughput rates for random access at large request sizes, while others suffer a greater penalty when the access pattern deviates from purely sequential access.
Random writes achieve peak throughput at point Y.
Random writes involve the most interaction with a deduplication device, and VDO achieves high performance especially when request sizes or I/O depths are large.
4.6. Testing the effects of mixed I/O loads on VDO performance
This test determines how your VDO configuration behaves with mixed read and write I/O loads, and analyzes the effects of mixed reads and writes at the optimal random queue depth and request sizes from 4 KiB to 1 MiB.
This procedure performs four-corner testing at a fixed I/O depth, with block sizes varied over the 4 KiB to 1 MiB range, and with the read percentage set in 10% increments, beginning with 0%.
Prerequisites
You have determined the optimal I/O depth value. For details, see Section 4.4, “Testing the effects of I/O depth on VDO performance”.
In the following procedure, replace optimal-depth with the optimal I/O depth value.
Procedure
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for the read and write input stimulus:
# for readmix in 0 10 20 30 40 50 60 70 80 90 100; do
    for iosize in 4 8 16 32 64 128 256 512 1024; do
      fio --rw=rw \
          --rwmixread=$readmix \
          --bs=${iosize}k \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=0 \
          --iodepth=optimal-depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
    done
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
Graph the test results.
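One hedged way to graph the mixed-load results is with gnuplot, assuming you have first reduced the recorded numbers to a whitespace-separated file (mixed-io.dat here, a hypothetical name) with the read percentage in column 1 and the aggregate bandwidth in column 2:
# gnuplot <<'EOF'
# data columns: read percentage, aggregate bandwidth (KiB/s)
set terminal png size 900,600
set output 'mixed-io.png'
set xlabel 'Read percentage of the I/O mix'
set ylabel 'Aggregate bandwidth (KiB/s)'
plot 'mixed-io.dat' using 1:2 with linespoints title 'aggregate bandwidth'
EOF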
Example 4.3. Mixed I/O loads analysis
The following image shows an example of how VDO might respond to mixed I/O loads:
Figure 4.4. Performance is consistent across varying read and write mixes
Aggregate performance and aggregate latency are relatively consistent across the range of mixing reads and writes, trending from the lower maximum write throughput to the higher maximum read throughput.
This behavior might vary with different storage, but the important observation is either that performance is consistent under varying loads, or that you can understand the performance expectations of applications that demonstrate specific read and write mixes.
Note: If your system does not show similar response consistency, it might be a sign of a sub-optimal configuration. Contact your Red Hat Sales Engineer if this occurs.
4.7. Testing the effects of application environments on VDO performance
These tests determine how your VDO configuration behaves when deployed in a mixed, real application environment. If you know more details about the expected environment, test them as well.
Prerequisites
- Consider limiting the permissible queue depth on your configuration.
- If possible, tune the application to issue requests with the block sizes that are the most beneficial to VDO performance.
Procedure
Create a new VDO volume.
For details, see Section 4.2, “Creating a VDO volume for performance testing”.
Prefill any areas that the test might access by performing a write fio job on the test volume:
# fio --rw=write \
      --bs=8M \
      --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --ioengine=libaio \
      --thread \
      --direct=1 \
      --scramble_buffers=1
Record the reported throughput and latency for the read and write input stimulus:
# for readmix in 20 50 80; do
    for iosize in 4 8 16 32 64 128 256 512 1024; do
      fio --rw=rw \
          --rwmixread=$readmix \
          --bsrange=4k-256k \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=0 \
          --iodepth=$iosize \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
    done
  done
Remove the VDO test volume.
For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
Graph the test results.
Example 4.4. Application environment analysis
The following image shows an example of how VDO might respond in a mixed application environment:
Figure 4.5. Mixed environment performance
4.8. Options used for testing VDO performance with fio
The VDO tests use the fio utility to synthetically generate data with repeatable characteristics. The following fio options are necessary to simulate real world workloads in the tests:
Argument | Description | Value used in the tests
---|---|---
--size | The quantity of data that fio sends to the target per job. See also the --numjobs option. | 100 GiB
--bs | The block size of each read-and-write request produced by fio. Red Hat recommends a 4 KiB block size to match the 4 KiB default of VDO. | 4k
--numjobs | The number of jobs that fio creates to run the benchmark. Each job sends the amount of data specified by the --size option. To achieve peak performance on flash disks (SSD), Red Hat recommends at least two jobs. One job is typically enough to saturate rotational disk (HDD) throughput. | 1 for HDD, 2 for SSD
--thread | Instructs fio to run jobs in threads rather than forked processes, which can provide better performance by limiting context switching. | none
--ioengine | The I/O engine that fio uses to submit I/O requests. Red Hat testing uses the asynchronous unbuffered engine called libaio. | libaio
--direct | Enables requests submitted to the device to bypass the kernel page cache. You must use the --direct option with the libaio engine. | 1 (libaio)
--iodepth | The number of I/O buffers in flight at any time. A high value usually increases performance, particularly for random reads or writes, because it ensures that the controller always has requests to batch. However, setting the value too high (typically greater than 1K) might cause undesirable latency. Red Hat recommends a value between 128 and 512. The final value is a trade-off and depends on how your application tolerates latency. | 128 at minimum
--iodepth_batch_submit | The number of I/O requests to create when the I/O depth buffer pool begins to empty. This option limits task switching from I/O operations to buffer creation during the test. | 16
--iodepth_batch_complete | The number of I/O operations to complete before submitting a batch. This option limits task switching from I/O operations to buffer creation during the test. | 16
--gtod_reduce | Disables the time-of-day calls that are used to calculate latency. Latency measurement lowers throughput, so enable this option unless you require latency measurements. | 1
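Putting the table together, the following single invocation is a hedged example of a random-write profile run using the recommended values. Drop --gtod_reduce=1 when you need latency numbers, and note that the job name and target device are assumptions carried over from the earlier tests:
# fio --name=vdo-profile \
      --filename=/dev/mapper/vdo-test \
      --rw=randwrite \
      --bs=4k \
      --size=100g \
      --numjobs=2 \
      --thread \
      --ioengine=libaio \
      --direct=1 \
      --iodepth=128 \
      --iodepth_batch_submit=16 \
      --iodepth_batch_complete=16 \
      --gtod_reduce=1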