Chapter 21. Performing latency tests for platform verification

21.1. Prerequisites for running latency tests
Copier lien

Your cluster must meet the following requirements before you can run the latency tests:

You have applied all the required CNF configurations. This includes the PerformanceProfile cluster and other configuration according to the reference design specifications (RDS) or your specific requirements.
You have logged in to registry.redhat.io with your Customer Portal credentials by using the podman login command.

21.2. Measuring latency
Copier lien

To accurately measure system latency, use the hwlatdetect, cyclictest, and oslat tools provided in the cnf-tests image. Evaluating these metrics helps you identify and resolve performance delays in your environment.

Each tool has a specific use. Use the tools in sequence to achieve reliable test results.

hwlatdetect: Measures the baseline that the bare-metal hardware can achieve. Before proceeding with the next latency test, ensure that the latency reported by hwlatdetect meets the required threshold because you cannot fix hardware latency spikes by operating system tuning.
cyclictest: Verifies the real-time kernel scheduler latency after hwlatdetect passes validation. The cyclictest tool schedules a repeated timer and measures the difference between the desired and the actual trigger times. The difference can uncover basic issues with the tuning caused by interrupts or process priorities. The tool must run on a real-time kernel.
oslat: Behaves similarly to a CPU-intensive DPDK application and measures all the interruptions and disruptions to the busy loop that simulates CPU heavy data processing.

The tests introduce the following environment variables:

Expand

Table 21.1. Latency test environment variables
Environment variables	Description
`LATENCY_TEST_DELAY`	Specifies the amount of time in seconds after which the test starts running. You can use the variable to allow the CPU manager reconcile loop to update the default CPU pool. The default value is 0.
`LATENCY_TEST_CPUS`	Specifies the number of CPUs that the pod running the latency tests uses. If you do not set the variable, the default configuration includes all isolated CPUs.
`LATENCY_TEST_RUNTIME`	Specifies the amount of time in seconds that the latency test must run. The default value is 300 seconds. Note To prevent the Ginkgo 2.0 test suite from timing out before the latency tests complete, set the `-ginkgo.timeout` flag to a value greater than `LATENCY_TEST_RUNTIME` + 2 minutes. If you also set a `LATENCY_TEST_DELAY` value then you must set `-ginkgo.timeout` to a value greater than `LATENCY_TEST_RUNTIME` + `LATENCY_TEST_DELAY` + 2 minutes. The default timeout value for the Ginkgo 2.0 test suite is 1 hour.
`HWLATDETECT_MAXIMUM_LATENCY`	Specifies the maximum acceptable hardware latency in microseconds for the workload and operating system. If you do not set the value of `HWLATDETECT_MAXIMUM_LATENCY` or `MAXIMUM_LATENCY`, the tool compares the default expected threshold (20μs) and the actual maximum latency in the tool itself. Then, the test fails or succeeds accordingly.
`CYCLICTEST_MAXIMUM_LATENCY`	Specifies the maximum latency in microseconds that all threads expect before waking up during the `cyclictest` run. If you do not set the value of `CYCLICTEST_MAXIMUM_LATENCY` or `MAXIMUM_LATENCY`, the tool skips the comparison of the expected and the actual maximum latency.
`OSLAT_MAXIMUM_LATENCY`	Specifies the maximum acceptable latency in microseconds for the `oslat` test results. If you do not set the value of `OSLAT_MAXIMUM_LATENCY` or `MAXIMUM_LATENCY`, the tool skips the comparison of the expected and the actual maximum latency.
`MAXIMUM_LATENCY`	Unified variable that specifies the maximum acceptable latency in microseconds. Applicable for all available latency tools.

Note

Variables that are specific to a latency tool take precedence over unified variables. For example, if OSLAT_MAXIMUM_LATENCY is set to 30 microseconds and MAXIMUM_LATENCY is set to 10 microseconds, the oslat test will run with maximum acceptable latency of 30 microseconds.

21.3. Running the latency tests
Copier lien

Run the cluster latency tests to validate node tuning for your Cloud-native Network Functions (CNF) workload.

Note

When executing podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run the commands from a folder that is not your home/<username> directory, and append :Z to the volumes creation. For example, -v $(pwd)/:/kubeconfig:Z. This allows podman to do the proper SELinux relabeling.

The procedure runs the three individual tests hwlatdetect, cyclictest, and oslat. For details on these individual tests, see their individual sections.

Procedure

Open a shell prompt in the directory containing the kubeconfig file.
You provide the test image with a kubeconfig file in current directory and its related $KUBECONFIG environment variable, mounted through a volume. This allows the running container to use the kubeconfig file from inside the container.
Note
In the following command, your local kubeconfig is mounted to kubeconfig/kubeconfig in the cnf-tests container, which allows access to the cluster.
To run the latency tests, run the following command, substituting variable values as appropriate:
```
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e LATENCY_TEST_RUNTIME=600\
-e MAXIMUM_LATENCY=20 \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 /usr/bin/test-run.sh \
--ginkgo.v --ginkgo.timeout="24h"
```
The LATENCY_TEST_RUNTIME is shown in seconds, in this case 600 seconds (10 minutes). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (20 μs).
If the results exceed the latency threshold, the test fails.
Optional: Append --ginkgo.dry-run flag to run the latency tests in dry-run mode. This is useful for checking what commands the tests run.
Optional: Append --ginkgo.v flag to run the tests with increased verbosity.
Optional: Append --ginkgo.timeout="24h" flag to ensure the Ginkgo 2.0 test suite does not timeout before the latency tests complete.
Important
During testing shorter time periods, as shown, can be used to run the tests. However, for final verification and valid results, the test should run for at least 12 hours (43200 seconds).

21.3.1. Running hwlatdetect
Copier lien

To measure hardware latency, run the hwlatdetect tool. This diagnostic utility is available in the rt-kernel package through your Red Hat Enterprise Linux (RHEL) 9.x subscription.

Note

When executing podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run the commands from a folder that is not your home/<username> directory, and append :Z to the volumes creation. For example, -v $(pwd)/:/kubeconfig:Z. This allows podman to do the proper SELinux relabeling.

Prerequisites

You have reviewed the prerequisites for running latency tests.

Procedure

To run the hwlatdetect tests, run the following command, substituting variable values as appropriate:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/test-run.sh --ginkgo.focus="hwlatdetect" --ginkgo.v --ginkgo.timeout="24h"

The hwlatdetect test runs for 10 minutes (600 seconds). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (20 μs).

If the results exceed the latency threshold, the test fails.

Important

During testing shorter time periods, as shown, can be used to run the tests. However, for final verification and valid results, the test should run for at least 12 hours (43200 seconds).

Example failure output

running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=hwlatdetect
I0908 15:25:20.023712      27 request.go:601] Waited for 1.046586367s due to client-side throttling, not priority and fairness, request: GET:https://api.hlxcl6.lab.eng.tlv2.redhat.com:6443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1662650718
Will run 1 of 3 specs

[...]

• Failure [283.574 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:62
  with the hwlatdetect image
  /remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:228
    should succeed [It]
    /remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:236

    Log file created at: 2022/09/08 15:25:27
    Running on machine: hwlatdetect-b6n4n
    Binary: Built with gc go1.17.12 for linux/amd64
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg    I0908 15:25:27.160620       1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd1,gpt3)/ostree/rhcos-c6491e1eedf6c1f12ef7b95e14ee720bf48359750ac900b7863c625769ef5fb9/vmlinuz-4.18.0-372.19.1.el8_6.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/c6491e1eedf6c1f12ef7b95e14ee720bf48359750ac900b7863c625769ef5fb9/0 ip=dhcp root=UUID=5f80c283-f6e6-4a27-9b47-a287157483b2 rw rootflags=prjquota boot=UUID=773bf59a-bafd-48fc-9a87-f62252d739d3 skew_tick=1 nohz=on rcu_nocbs=0-3 tuned.non_isolcpus=0000ffff,ffffffff,fffffff0 systemd.cpu_affinity=4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79 intel_iommu=on iommu=pt isolcpus=managed_irq,0-3 nohz_full=0-3 tsc=nowatchdog nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 + +
    I0908 15:25:27.160830       1 node.go:46] Environment information: kernel version 4.18.0-372.19.1.el8_6.x86_64
    I0908 15:25:27.160857       1 main.go:50] running the hwlatdetect command with arguments [/usr/bin/hwlatdetect --threshold 1 --hardlimit 1 --duration 100 --window 10000000us --width 950000us]
    F0908 15:27:10.603523       1 main.go:53] failed to run hwlatdetect command; out: hwlatdetect:  test duration 100 seconds
       detector: tracer
       parameters:
            Latency threshold: 1us
            Sample window:     10000000us
            Sample width:      950000us
         Non-sampling period:  9050000us
            Output File:       None

    Starting test
    test finished
    Max Latency: 326us
    Samples recorded: 5
    Samples exceeding threshold: 5
    ts: 1662650739.017274507, inner:6, outer:6
    ts: 1662650749.257272414, inner:14, outer:326
    ts: 1662650779.977272835, inner:314, outer:12
    ts: 1662650800.457272384, inner:3, outer:9
    ts: 1662650810.697273520, inner:3, outer:2

[...]

JUnit report was created: /junit.xml/cnftests-junit.xml


Summarizing 1 Failure:

[Fail] [performance] Latency Test with the hwlatdetect image [It] should succeed
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:476

Ran 1 of 194 Specs in 365.797 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestTest (366.08s)
FAIL

Latency threshold: You can configure the latency threshold by using the MAXIMUM_LATENCY or the HWLATDETECT_MAXIMUM_LATENCY environment variables.
Max Latency: The maximum latency value measured during the test.

21.3.2. Example hwlatdetect test results
Copier lien

To track the impact of changes made during testing, capture the raw data from each run along with a combined set of your optimal configuration settings. Retaining these metrics provides a comprehensive history of your test results.

You can capture the following types of results:

Rough results that are gathered after each run to create a history of impact on any changes made throughout the test.
The combined set of the rough tests with the best results and configuration settings.

Example of good results

hwlatdetect: test duration 3600 seconds
detector: tracer
parameters:
Latency threshold: 10us
Sample window: 1000000us
Sample width: 950000us
Non-sampling period: 50000us
Output File: None

Starting test
test finished
Max Latency: Below threshold
Samples recorded: 0

The hwlatdetect tool only provides output if the sample exceeds the specified threshold.

Example of bad results

hwlatdetect: test duration 3600 seconds
detector: tracer
parameters:Latency threshold: 10usSample window: 1000000us
Sample width: 950000usNon-sampling period: 50000usOutput File: None

Starting tests:1610542421.275784439, inner:78, outer:81
ts: 1610542444.330561619, inner:27, outer:28
ts: 1610542445.332549975, inner:39, outer:38
ts: 1610542541.568546097, inner:47, outer:32
ts: 1610542590.681548531, inner:13, outer:17
ts: 1610543033.818801482, inner:29, outer:30
ts: 1610543080.938801990, inner:90, outer:76
ts: 1610543129.065549639, inner:28, outer:39
ts: 1610543474.859552115, inner:28, outer:35
ts: 1610543523.973856571, inner:52, outer:49
ts: 1610543572.089799738, inner:27, outer:30
ts: 1610543573.091550771, inner:34, outer:28
ts: 1610543574.093555202, inner:116, outer:63

The output of hwlatdetect shows that multiple samples exceed the threshold. However, the same output can indicate different results based on the following factors:

The duration of the test
The number of CPU cores
The host firmware settings

Warning

Before proceeding with the next latency test, ensure that the latency reported by hwlatdetect meets the required threshold. Fixing latencies introduced by hardware might require you to contact the system vendor support.

Not all latency spikes are hardware related. Ensure that you tune the host firmware to meet your workload requirements. For more information, see "Setting firmware parameters for system tuning".

21.3.3. Running cyclictest
Copier lien

To measure real-time kernel scheduler latency on specified CPUs, run the cyclictest tool. Evaluating these metrics helps you identify execution delays and optimize your system for high-performance operations.

Note

When executing podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run the commands from a folder that is not your home/<username> directory, and append :Z to the volumes creation. For example, -v $(pwd)/:/kubeconfig:Z. This allows podman to do the proper SELinux relabeling.

Prerequisites

You have reviewed the prerequisites for running latency tests.

Procedure

To perform the cyclictest, run the following command, substituting variable values as appropriate:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e LATENCY_TEST_CPUS=10 -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/test-run.sh --ginkgo.focus="cyclictest" --ginkgo.v --ginkgo.timeout="24h"

The command runs the cyclictest tool for 10 minutes (600 seconds). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (in this example, 20 μs). Latency spikes of 20 μs and above are generally not acceptable for telco RAN workloads.

If the results exceed the latency threshold, the test fails.

Important

During testing shorter time periods, as shown, can be used to run the tests. However, for final verification and valid results, the test should run for at least 12 hours (43200 seconds).

Example failure output

running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=cyclictest
I0908 13:01:59.193776      27 request.go:601] Waited for 1.046228824s due to client-side throttling, not priority and fairness, request: GET:https://api.compute-1.example.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1662642118
Will run 1 of 3 specs

[...]

Summarizing 1 Failure:

[Fail] [performance] Latency Test with the cyclictest image [It] should succeed
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:220

Ran 1 of 194 Specs in 161.151 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestTest (161.48s)
FAIL

21.3.4. Example cyclictest results
Copier lien

To accurately interpret latency test results, evaluate the metrics against your specific workload requirements. Acceptable performance thresholds differ significantly depending on whether you are running 4G DU or 5G DU workloads.

The following example shows a spike up to 18μs that is acceptable for 4G DU workloads, but not for 5G DU workloads:

Example of good results

running cmd: cyclictest -q -D 10m -p 1 -t 16 -a 2,4,6,8,10,12,14,16,54,56,58,60,62,64,66,68 -h 30 -i 1000 -m
# Histogram
000000 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000001 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000002 579506   535967  418614  573648  532870  529897  489306  558076  582350  585188  583793  223781  532480  569130  472250  576043
More histogram entries ...
# Total: 000600000 000600000 000600000 000599999 000599999 000599999 000599998 000599998 000599998 000599997 000599997 000599996 000599996 000599995 000599995 000599995
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00005 00005 00004 00005 00004 00004 00005 00005 00006 00005 00004 00005 00004 00004 00005 00004
# Histogram Overflows: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
# Thread 2:
# Thread 3:
# Thread 4:
# Thread 5:
# Thread 6:
# Thread 7:
# Thread 8:
# Thread 9:
# Thread 10:
# Thread 11:
# Thread 12:
# Thread 13:
# Thread 14:
# Thread 15:

Example of bad results

running cmd: cyclictest -q -D 10m -p 1 -t 16 -a 2,4,6,8,10,12,14,16,54,56,58,60,62,64,66,68 -h 30 -i 1000 -m
# Histogram
000000 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000001 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000002 564632   579686  354911  563036  492543  521983  515884  378266  592621  463547  482764  591976  590409  588145  589556  353518
More histogram entries ...
# Total: 000599999 000599999 000599999 000599997 000599997 000599998 000599998 000599997 000599997 000599996 000599995 000599996 000599995 000599995 000599995 000599993
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00493 00387 00271 00619 00541 00513 00009 00389 00252 00215 00539 00498 00363 00204 00068 00520
# Histogram Overflows: 00001 00001 00001 00002 00002 00001 00000 00001 00001 00001 00002 00001 00001 00001 00001 00002
# Histogram Overflow at cycle number:
# Thread 0: 155922
# Thread 1: 110064
# Thread 2: 110064
# Thread 3: 110063 155921
# Thread 4: 110063 155921
# Thread 5: 155920
# Thread 6:
# Thread 7: 110062
# Thread 8: 110062
# Thread 9: 155919
# Thread 10: 110061 155919
# Thread 11: 155918
# Thread 12: 155918
# Thread 13: 110060
# Thread 14: 110060
# Thread 15: 110059 155917

21.3.5. Running oslat
Copier lien

To evaluate how your cluster handles CPU-heavy data processing, run the oslat test. This diagnostic tool simulates a CPU-intensive DPDK application to measure system interruptions and performance disruptions.

Note

When executing podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run the commands from a folder that is not your home/<username> directory, and append :Z to the volumes creation. For example, -v $(pwd)/:/kubeconfig:Z. This allows podman to do the proper SELinux relabeling.

Prerequisites

You have reviewed the prerequisites for running latency tests.

Procedure

To perform the oslat test, run the following command, substituting variable values as appropriate:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e LATENCY_TEST_CPUS=10 -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/test-run.sh --ginkgo.focus="oslat" --ginkgo.v --ginkgo.timeout="24h"

LATENCY_TEST_CPUS specifies the number of CPUs to test with the oslat command.

The command runs the oslat tool for 10 minutes (600 seconds). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (20 μs).

If the results exceed the latency threshold, the test fails.

Important

During testing shorter time periods, as shown, can be used to run the tests. However, for final verification and valid results, the test should run for at least 12 hours (43200 seconds).

Example failure output

running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=oslat
I0908 12:51:55.999393      27 request.go:601] Waited for 1.044848101s due to client-side throttling, not priority and fairness, request: GET:https://compute-1.example.com:6443/apis/machineconfiguration.openshift.io/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1662641514
Will run 1 of 3 specs

[...]

• Failure [77.833 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:62
  with the oslat image
  /remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:128
    should succeed [It]
    /remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:153

    The current latency 304 is bigger than the expected one 1 :

1



[...]

Summarizing 1 Failure:

[Fail] [performance] Latency Test with the oslat image [It] should succeed
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:177

Ran 1 of 194 Specs in 161.091 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestTest (161.42s)
FAIL

1: In this example, the measured latency is outside the maximum allowed value.

21.4. Generating a latency test failure report
Copier lien

To analyze test failures and troubleshoot performance issues, generate a JUnit latency test output and test failure report. Reviewing this diagnostic data helps you pinpoint exactly where your system is experiencing delays.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in as a user with cluster-admin privileges.

Procedure

Create a test failure report with information about the cluster state and resources for troubleshooting by passing the --report parameter with the path to where the report is dumped:
```
$ podman run -v $(pwd)/:/kubeconfig:Z -v $(pwd)/reportdest:<report_folder_path> \
-e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/test-run.sh --report <report_folder_path> --ginkgo.v
```
- <report_folder_path>: Specifies the path to the folder where the report is generated.

21.5. Generating a JUnit latency test report
Copier lien

To analyze system performance and track execution delays, generate a JUnit latency test report. Reviewing this diagnostic output helps you identify configuration issues and performance bottlenecks within your cluster.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in as a user with cluster-admin privileges.

Procedure

Create a JUnit-compliant XML report by passing the --junit parameter together with the path to where the report is dumped:

Note

You must create the junit folder before running this command.

$ podman run -v $(pwd)/:/kubeconfig:Z -v $(pwd)/junit:/junit \
-e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/test-run.sh --ginkgo.junit-report junit/<file_name>.xml --ginkgo.v

where:

file_name: The name of the XML report file.

21.6. Running latency tests on a single-node OpenShift cluster
Copier lien

To validate node tuning and identify performance delays, run latency tests on your single-node OpenShift clusters. Evaluating these metrics ensures your environment is optimized for high-performance workloads.

Note

When executing podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. To make the podman command work, append :Z to the volumes creation; for example, -v $(pwd)/:/kubeconfig:Z. This allows podman to do the proper SELinux relabeling.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in as a user with cluster-admin privileges.
You have applied a cluster performance profile by using the Node Tuning Operator.

Procedure

To run the latency tests on a single-node OpenShift cluster, run the following command:
```
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e LATENCY_TEST_RUNTIME=<time_in_seconds> registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"
```
Note
The default runtime for each test is 300 seconds. For valid latency test results, run the tests for at least 12 hours by updating the LATENCY_TEST_RUNTIME variable.
To run the buckets latency validation step, you must specify a maximum latency. For details on maximum latency variables, see the table in the "Measuring latency" section.
After running the test suite, all the dangling resources are cleaned up.

21.7. Running latency tests in a disconnected cluster
Copier lien

The CNF tests image can run tests in a disconnected cluster that is not able to reach external registries. This requires two steps:

Mirroring the cnf-tests image to the custom disconnected registry.
Instructing the tests to consume the images from the custom disconnected registry.

21.7.1. Mirroring the images to a custom registry accessible from the cluster
Copier lien

To make required images accessible from your cluster, mirror them to a custom registry. Performing this synchronization ensures that your deployment has the necessary container files, which is particularly useful in restricted or disconnected network environments.

A mirror executable is shipped in the image to provide the input required by oc to mirror the test image to a local registry.

Procedure

Run the following command from an intermediate machine that has access to the cluster and registry.redhat.io:
```
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/mirror -registry <disconnected_registry> | oc image mirror -f -
```
where:
<disconnected_registry>
Specifies the disconnected mirror registry you have configured, such as my.local.registry:5000/.

When you have mirrored the cnf-tests image into the disconnected registry, you must override the original registry used to fetch the images when running the tests by a command similar to the following example:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e IMAGE_REGISTRY="<disconnected_registry>" \
-e CNF_TESTS_IMAGE="cnf-tests-rhel9:v4.20" \
-e LATENCY_TEST_RUNTIME=<time_in_seconds> \
<disconnected_registry>/cnf-tests-rhel9:v4.20 /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"

21.7.2. Configuring the tests to consume images from a custom registry
Copier lien

You can run the latency tests by using a custom test image and image registry using CNF_TESTS_IMAGE and IMAGE_REGISTRY variables.

Procedure

To configure the latency tests to use a custom test image and image registry, run a command similar to the following example:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e IMAGE_REGISTRY="<custom_image_registry>" \
-e CNF_TESTS_IMAGE="<custom_cnf-tests_image>" \
-e LATENCY_TEST_RUNTIME=<time_in_seconds> \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"

where:

<custom_image_registry>: Specifies the custom image registry, for example, custom.registry:5000/.
<custom_cnf-tests_image>: Specifies the custom cnf-tests image, for example, custom-cnf-tests-image:latest.

21.7.3. Mirroring images to the cluster OpenShift image registry
Copier lien

To make container images locally available for your deployment, mirror them to the built-in OpenShift image registry. This integrated component runs as a standard workload on your OpenShift Container Platform cluster to ensure continuous access to required files.

Procedure

Gain external access to the registry by exposing the registry with a route. You can do this task by running a command similar to the following example:
```
$ oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
```

Fetch the registry endpoint by running a command similar to the following example:

$ REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')

Create a namespace for exposing the images by running a command similar to the following example:
```
$ oc create ns cnftests
```

Make the image stream available to all the namespaces used for tests. This is required to allow the tests namespaces to fetch the images from the cnf-tests image stream. Run commands similar to the following examples:

$ oc policy add-role-to-user system:image-puller system:serviceaccount:cnf-features-testing:default --namespace=cnftests

$ oc policy add-role-to-user system:image-puller system:serviceaccount:performance-addon-operators-testing:default --namespace=cnftests

Retrieve the docker secret name by running a command similar to the following example:

$ SECRET=$(oc -n cnftests get secret | grep builder-docker | awk {'print $1'}

Retrieve the docker auth token by running a command similar to the following example:

$ TOKEN=$(oc -n cnftests get secret $SECRET -o jsonpath="{.data['\.dockercfg']}" | base64 --decode | jq '.["image-registry.openshift-image-registry.svc:5000"].auth')

Create a dockerauth.json file, for example:

$ echo "{\"auths\": { \"$REGISTRY\": { \"auth\": $TOKEN } }}" > dockerauth.json

Mirror the image by running a command similar to the following example:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
/usr/bin/mirror -registry $REGISTRY/cnftests |  oc image mirror --insecure=true \
-a=$(pwd)/dockerauth.json -f -

Run the tests by running a command similar to the following example:

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
-e LATENCY_TEST_RUNTIME=<time_in_seconds> \
-e IMAGE_REGISTRY=image-registry.openshift-image-registry.svc:5000/cnftests cnf-tests-local:latest /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"

21.7.4. Mirroring a different set of test images
Copier lien

You can optionally change the default upstream images that are mirrored for the latency tests.

Procedure

The mirror command tries to mirror the upstream images by default. This can be overridden by passing a file with the following format to the image:
```
[
    {
        "registry": "public.registry.io:5000",
        "image": "imageforcnftests:4.20"
    }
]
```

Pass the file to the mirror command, for example saving it locally as images.json. With the following command, the local path is mounted in /kubeconfig inside the container and that can be passed to the mirror command.

$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 /usr/bin/mirror \
--registry "my.local.registry:5000/" --images "/kubeconfig/images.json" \
|  oc image mirror -f -

21.8. Troubleshooting errors with the cnf-tests container
Copier lien

To troubleshoot errors when running latency tests, verify that your cluster is accessible from within the cnf-tests container. Ensuring this connectivity resolves common test execution failures.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in as a user with cluster-admin privileges.

Procedure

Verify that the cluster is accessible from inside the cnf-tests container by running the following command:
```
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
oc get nodes
```
If this command does not work, an error related to spanning across DNS, MTU size, or firewall access might be occurring.

Ce contenu n'est pas disponible dans la langue sélectionnée.

21.1. Prerequisites for running latency tests
Copier lien

21.2. Measuring latency
Copier lien

21.3. Running the latency tests
Copier lien

21.3.1. Running hwlatdetect
Copier lien

21.3.2. Example hwlatdetect test results
Copier lien

21.3.3. Running cyclictest
Copier lien

21.3.4. Example cyclictest results
Copier lien

21.3.5. Running oslat
Copier lien

21.4. Generating a latency test failure report
Copier lien

21.5. Generating a JUnit latency test report
Copier lien

21.6. Running latency tests on a single-node OpenShift cluster
Copier lien

21.7. Running latency tests in a disconnected cluster
Copier lien

21.7.1. Mirroring the images to a custom registry accessible from the cluster
Copier lien

21.7.2. Configuring the tests to consume images from a custom registry
Copier lien

21.7.3. Mirroring images to the cluster OpenShift image registry
Copier lien

21.7.4. Mirroring a different set of test images
Copier lien

21.8. Troubleshooting errors with the cnf-tests container
Copier lien

Apprendre

Essayez, achetez et vendez

Communautés

À propos de la documentation Red Hat

Rendre l’open source plus inclusif

À propos de Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Ce contenu n'est pas disponible dans la langue sélectionnée.

Chapter 21. Performing latency tests for platform verification

21.1. Prerequisites for running latency testsCopier lienLien copié sur presse-papiers!

21.2. Measuring latencyCopier lienLien copié sur presse-papiers!

21.3. Running the latency testsCopier lienLien copié sur presse-papiers!

21.3.1. Running hwlatdetectCopier lienLien copié sur presse-papiers!

21.3.2. Example hwlatdetect test resultsCopier lienLien copié sur presse-papiers!

21.3.3. Running cyclictestCopier lienLien copié sur presse-papiers!

21.3.4. Example cyclictest resultsCopier lienLien copié sur presse-papiers!

21.3.5. Running oslatCopier lienLien copié sur presse-papiers!

21.4. Generating a latency test failure reportCopier lienLien copié sur presse-papiers!

21.5. Generating a JUnit latency test reportCopier lienLien copié sur presse-papiers!

21.6. Running latency tests on a single-node OpenShift clusterCopier lienLien copié sur presse-papiers!

21.7. Running latency tests in a disconnected clusterCopier lienLien copié sur presse-papiers!

21.7.1. Mirroring the images to a custom registry accessible from the clusterCopier lienLien copié sur presse-papiers!

21.7.2. Configuring the tests to consume images from a custom registryCopier lienLien copié sur presse-papiers!

21.7.3. Mirroring images to the cluster OpenShift image registryCopier lienLien copié sur presse-papiers!

21.7.4. Mirroring a different set of test imagesCopier lienLien copié sur presse-papiers!

21.8. Troubleshooting errors with the cnf-tests containerCopier lienLien copié sur presse-papiers!

Apprendre

Essayez, achetez et vendez

Communautés

À propos de la documentation Red Hat

Rendre l’open source plus inclusif

À propos de Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

21.1. Prerequisites for running latency tests
Copier lien

21.2. Measuring latency
Copier lien

21.3. Running the latency tests
Copier lien

21.3.1. Running hwlatdetect
Copier lien

21.3.2. Example hwlatdetect test results
Copier lien

21.3.3. Running cyclictest
Copier lien

21.3.4. Example cyclictest results
Copier lien

21.3.5. Running oslat
Copier lien

21.4. Generating a latency test failure report
Copier lien

21.5. Generating a JUnit latency test report
Copier lien

21.6. Running latency tests on a single-node OpenShift cluster
Copier lien

21.7. Running latency tests in a disconnected cluster
Copier lien

21.7.1. Mirroring the images to a custom registry accessible from the cluster
Copier lien

21.7.2. Configuring the tests to consume images from a custom registry
Copier lien

21.7.3. Mirroring images to the cluster OpenShift image registry
Copier lien

21.7.4. Mirroring a different set of test images
Copier lien

21.8. Troubleshooting errors with the cnf-tests container
Copier lien