Chapter 21. Performing latency tests for platform verification
You can use the Cloud-native Network Functions (CNF) tests image to run latency tests on a CNF-enabled OpenShift Container Platform cluster, where all the components required for running CNF workloads are installed. Run the latency tests to validate node tuning for your workload.
The cnf-tests container image is available at registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20.
21.1. Prerequisites for running latency tests
Your cluster must meet the following requirements before you can run the latency tests:
- You have applied all the required CNF configurations. This includes the PerformanceProfile cluster and other configuration according to the reference design specifications (RDS) or your specific requirements.
- You have logged in to registry.redhat.io with your Customer Portal credentials by using the podman login command.
21.2. Measuring latency
To accurately measure system latency, use the hwlatdetect, cyclictest, and oslat tools provided in the cnf-tests image. Evaluating these metrics helps you identify and resolve performance delays in your environment.
Each tool has a specific use. Use the tools in sequence to achieve reliable test results.
- hwlatdetect: Measures the baseline that the bare-metal hardware can achieve. Before proceeding with the next latency test, ensure that the latency reported by hwlatdetect meets the required threshold because you cannot fix hardware latency spikes by operating system tuning.
- cyclictest: Verifies the real-time kernel scheduler latency after hwlatdetect passes validation. The cyclictest tool schedules a repeated timer and measures the difference between the desired and the actual trigger times. The difference can uncover basic issues with the tuning caused by interrupts or process priorities. The tool must run on a real-time kernel.
- oslat: Behaves similarly to a CPU-intensive DPDK application and measures all the interruptions and disruptions to the busy loop that simulates CPU-heavy data processing.
The tests introduce the following environment variables:
| Environment variable | Description |
|---|---|
| LATENCY_TEST_DELAY | Specifies the amount of time in seconds after which the test starts running. You can use the variable to allow the CPU manager reconcile loop to update the default CPU pool. The default value is 0. |
| LATENCY_TEST_CPUS | Specifies the number of CPUs that the pod running the latency tests uses. If you do not set the variable, the default configuration includes all isolated CPUs. |
| LATENCY_TEST_RUNTIME | Specifies the amount of time in seconds that the latency test must run. The default value is 300 seconds. Note: To prevent the Ginkgo 2.0 test suite from timing out before the latency tests complete, set the --ginkgo.timeout flag to a value greater than LATENCY_TEST_RUNTIME. |
| HWLATDETECT_MAXIMUM_LATENCY | Specifies the maximum acceptable hardware latency in microseconds for the workload and operating system. If you do not set the value of [...] |
| CYCLICTEST_MAXIMUM_LATENCY | Specifies the maximum latency in microseconds that all threads expect before waking up during the cyclictest run. |
| OSLAT_MAXIMUM_LATENCY | Specifies the maximum acceptable latency in microseconds for the oslat test results. |
| MAXIMUM_LATENCY | Unified variable that specifies the maximum acceptable latency in microseconds. Applicable for all available latency tools. |
Variables that are specific to a latency tool take precedence over unified variables. For example, if OSLAT_MAXIMUM_LATENCY is set to 30 microseconds and MAXIMUM_LATENCY is set to 10 microseconds, the oslat test runs with a maximum acceptable latency of 30 microseconds.
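The precedence rule can be sketched as a small shell function. This is an illustration only and is not part of the cnf-tests image: effective_max_latency is a hypothetical helper name, and CYCLICTEST_MAXIMUM_LATENCY is assumed to follow the same naming pattern as the other tool-specific variables.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with cnf-tests): prints the threshold, in
# microseconds, that applies to one tool. A tool-specific variable wins over
# the unified MAXIMUM_LATENCY variable; the output is empty if neither is set.
effective_max_latency() {
    case "$1" in
        hwlatdetect) specific=${HWLATDETECT_MAXIMUM_LATENCY:-} ;;
        cyclictest)  specific=${CYCLICTEST_MAXIMUM_LATENCY:-} ;;  # assumed name
        oslat)       specific=${OSLAT_MAXIMUM_LATENCY:-} ;;
        *) echo "unknown tool: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "${specific:-${MAXIMUM_LATENCY:-}}"
}

# Values from the example in the preceding paragraph.
unset HWLATDETECT_MAXIMUM_LATENCY CYCLICTEST_MAXIMUM_LATENCY
OSLAT_MAXIMUM_LATENCY=30
MAXIMUM_LATENCY=10
effective_max_latency oslat        # prints 30
effective_max_latency hwlatdetect  # prints 10
```

The case statement restricts the lookup to the three known tools, so an unexpected argument produces an error instead of silently falling back to the unified value.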
21.3. Running the latency tests
Run the cluster latency tests to validate node tuning for your Cloud-native Network Functions (CNF) workload.
When you run podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run them from a directory other than home/<username>, and append :Z to each volume definition, for example, -v $(pwd)/:/kubeconfig:Z. The :Z suffix allows podman to perform the proper SELinux relabeling.
The procedure runs the three individual tests hwlatdetect, cyclictest, and oslat. For details on these individual tests, see their individual sections.
Procedure
Open a shell prompt in the directory containing the kubeconfig file.
You provide the test image with a kubeconfig file in the current directory and its related $KUBECONFIG environment variable, mounted through a volume. This allows the running container to use the kubeconfig file from inside the container.
Note: In the following command, your local kubeconfig is mounted to kubeconfig/kubeconfig in the cnf-tests container, which allows access to the cluster.
To run the latency tests, run the following command, substituting variable values as appropriate:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUNTIME=600 \
    -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 /usr/bin/test-run.sh \
    --ginkgo.v --ginkgo.timeout="24h"
LATENCY_TEST_RUNTIME is specified in seconds, in this case 600 seconds (10 minutes). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (20 μs).
If the results exceed the latency threshold, the test fails.
- Optional: Append the --ginkgo.dry-run flag to run the latency tests in dry-run mode. This is useful for checking what commands the tests run.
- Optional: Append the --ginkgo.v flag to run the tests with increased verbosity.
- Optional: Append the --ginkgo.timeout="24h" flag to ensure that the Ginkgo 2.0 test suite does not time out before the latency tests complete.
Important: During testing, you can use shorter time periods, as shown, to run the tests. However, for final verification and valid results, run the tests for at least 12 hours (43200 seconds).
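Because LATENCY_TEST_RUNTIME takes seconds while the recommended verification period is stated in hours, a small conversion helper can reduce mistakes. The following is a minimal sketch; hours_to_seconds is a hypothetical helper name and is not part of the cnf-tests image.

```shell
#!/bin/sh
# Hypothetical helper: converts a duration in whole hours to the number of
# seconds that LATENCY_TEST_RUNTIME expects.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

runtime=$(hours_to_seconds 12)
echo "LATENCY_TEST_RUNTIME=${runtime}"   # prints LATENCY_TEST_RUNTIME=43200
```

You could then pass the value to podman with -e LATENCY_TEST_RUNTIME="$(hours_to_seconds 12)". When you lengthen the runtime, check that the value passed to --ginkgo.timeout still covers the whole run.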
21.3.1. Running hwlatdetect
To measure hardware latency, run the hwlatdetect tool. This diagnostic utility is available in the rt-kernel package through your Red Hat Enterprise Linux (RHEL) 9.x subscription.
When you run podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run them from a directory other than home/<username>, and append :Z to each volume definition, for example, -v $(pwd)/:/kubeconfig:Z. The :Z suffix allows podman to perform the proper SELinux relabeling.
Prerequisites
- You have reviewed the prerequisites for running latency tests.
Procedure
To run the hwlatdetect tests, run the following command, substituting variable values as appropriate:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/test-run.sh --ginkgo.focus="hwlatdetect" --ginkgo.v --ginkgo.timeout="24h"
The hwlatdetect test runs for 10 minutes (600 seconds). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (20 μs).
If the results exceed the latency threshold, the test fails.
Important: During testing, you can use shorter time periods, as shown, to run the tests. However, for final verification and valid results, run the tests for at least 12 hours (43200 seconds).
Example failure output
running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=hwlatdetect
I0908 15:25:20.023712 27 request.go:601] Waited for 1.046586367s due to client-side throttling, not priority and fairness, request: GET:https://api.hlxcl6.lab.eng.tlv2.redhat.com:6443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1662650718
Will run 1 of 3 specs
[...]
• Failure [283.574 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:62
with the hwlatdetect image
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:228
should succeed [It]
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:236
Log file created at: 2022/09/08 15:25:27
Running on machine: hwlatdetect-b6n4n
Binary: Built with gc go1.17.12 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0908 15:25:27.160620 1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd1,gpt3)/ostree/rhcos-c6491e1eedf6c1f12ef7b95e14ee720bf48359750ac900b7863c625769ef5fb9/vmlinuz-4.18.0-372.19.1.el8_6.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/c6491e1eedf6c1f12ef7b95e14ee720bf48359750ac900b7863c625769ef5fb9/0 ip=dhcp root=UUID=5f80c283-f6e6-4a27-9b47-a287157483b2 rw rootflags=prjquota boot=UUID=773bf59a-bafd-48fc-9a87-f62252d739d3 skew_tick=1 nohz=on rcu_nocbs=0-3 tuned.non_isolcpus=0000ffff,ffffffff,fffffff0 systemd.cpu_affinity=4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79 intel_iommu=on iommu=pt isolcpus=managed_irq,0-3 nohz_full=0-3 tsc=nowatchdog nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11
I0908 15:25:27.160830 1 node.go:46] Environment information: kernel version 4.18.0-372.19.1.el8_6.x86_64
I0908 15:25:27.160857 1 main.go:50] running the hwlatdetect command with arguments [/usr/bin/hwlatdetect --threshold 1 --hardlimit 1 --duration 100 --window 10000000us --width 950000us]
F0908 15:27:10.603523 1 main.go:53] failed to run hwlatdetect command; out: hwlatdetect: test duration 100 seconds
detector: tracer
parameters:
Latency threshold: 1us
Sample window: 10000000us
Sample width: 950000us
Non-sampling period: 9050000us
Output File: None
Starting test
test finished
Max Latency: 326us
Samples recorded: 5
Samples exceeding threshold: 5
ts: 1662650739.017274507, inner:6, outer:6
ts: 1662650749.257272414, inner:14, outer:326
ts: 1662650779.977272835, inner:314, outer:12
ts: 1662650800.457272384, inner:3, outer:9
ts: 1662650810.697273520, inner:3, outer:2
[...]
JUnit report was created: /junit.xml/cnftests-junit.xml
Summarizing 1 Failure:
[Fail] [performance] Latency Test with the hwlatdetect image [It] should succeed
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:476
Ran 1 of 194 Specs in 365.797 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestTest (366.08s)
FAIL
- Latency threshold: You can configure the latency threshold by using the MAXIMUM_LATENCY or the HWLATDETECT_MAXIMUM_LATENCY environment variables.
- Max Latency: The maximum latency value measured during the test.
21.3.2. Example hwlatdetect test results
To track the impact of changes made during testing, capture the raw data from each run along with a combined set of your optimal configuration settings. Retaining these metrics provides a comprehensive history of your test results.
You can capture the following types of results:
- Rough results that are gathered after each run to create a history of the impact of any changes made throughout the test.
- The combined set of the rough tests with the best results and configuration settings.
Example of good results
hwlatdetect: test duration 3600 seconds
detector: tracer
parameters:
Latency threshold: 10us
Sample window: 1000000us
Sample width: 950000us
Non-sampling period: 50000us
Output File: None
Starting test
test finished
Max Latency: Below threshold
Samples recorded: 0
The hwlatdetect tool only provides output if the sample exceeds the specified threshold.
Example of bad results
hwlatdetect: test duration 3600 seconds
detector: tracer
parameters:
Latency threshold: 10us
Sample window: 1000000us
Sample width: 950000us
Non-sampling period: 50000us
Output File: None
Starting test
ts: 1610542421.275784439, inner:78, outer:81
ts: 1610542444.330561619, inner:27, outer:28
ts: 1610542445.332549975, inner:39, outer:38
ts: 1610542541.568546097, inner:47, outer:32
ts: 1610542590.681548531, inner:13, outer:17
ts: 1610543033.818801482, inner:29, outer:30
ts: 1610543080.938801990, inner:90, outer:76
ts: 1610543129.065549639, inner:28, outer:39
ts: 1610543474.859552115, inner:28, outer:35
ts: 1610543523.973856571, inner:52, outer:49
ts: 1610543572.089799738, inner:27, outer:30
ts: 1610543573.091550771, inner:34, outer:28
ts: 1610543574.093555202, inner:116, outer:63
The output of hwlatdetect shows that multiple samples exceed the threshold. However, the same output can indicate different results based on the following factors:
- The duration of the test
- The number of CPU cores
- The host firmware settings
Before proceeding with the next latency test, ensure that the latency reported by hwlatdetect meets the required threshold. Fixing latencies introduced by hardware might require you to contact the system vendor support.
Not all latency spikes are hardware related. Ensure that you tune the host firmware to meet your workload requirements. For more information, see "Setting firmware parameters for system tuning".
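When a run records many samples, it can help to reduce the hwlatdetect output to a sample count and the single worst value. The following is a minimal sketch, assuming that each sample is on its own line that starts with ts: and carries inner and outer values in microseconds, as in the examples in this section. summarize_hwlat is a hypothetical helper name and is not part of the cnf-tests image.

```shell
#!/bin/sh
# Hypothetical helper: reads hwlatdetect output on stdin and reports how many
# "ts:" sample lines it contains and the largest inner or outer value (in us).
summarize_hwlat() {
    awk '
        /^ts:/ {
            # Keep only the number that follows "inner:" and "outer:".
            inner = $0; sub(/.*inner:/, "", inner); sub(/,.*/, "", inner)
            outer = $0; sub(/.*outer:/, "", outer)
            samples++
            if (inner + 0 > worst) worst = inner + 0
            if (outer + 0 > worst) worst = outer + 0
        }
        END { printf "samples=%d worst_us=%d\n", samples, worst }
    '
}

# Three sample lines taken from the bad results shown earlier in this section.
summarize_hwlat <<'EOF'
ts: 1610542444.330561619, inner:27, outer:28
ts: 1610543080.938801990, inner:90, outer:76
ts: 1610543574.093555202, inner:116, outer:63
EOF
# prints: samples=3 worst_us=116
```

Comparing worst_us with your configured threshold gives a quick pass or fail signal, but the full sample list is still worth keeping for the run history.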
21.3.3. Running cyclictest
To measure real-time kernel scheduler latency on specified CPUs, run the cyclictest tool. Evaluating these metrics helps you identify execution delays and optimize your system for high-performance operations.
When you run podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run them from a directory other than home/<username>, and append :Z to each volume definition, for example, -v $(pwd)/:/kubeconfig:Z. The :Z suffix allows podman to perform the proper SELinux relabeling.
Prerequisites
- You have reviewed the prerequisites for running latency tests.
Procedure
To perform the cyclictest, run the following command, substituting variable values as appropriate:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_CPUS=10 -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/test-run.sh --ginkgo.focus="cyclictest" --ginkgo.v --ginkgo.timeout="24h"
The command runs the cyclictest tool for 10 minutes (600 seconds). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (in this example, 20 μs). Latency spikes of 20 μs and above are generally not acceptable for telco RAN workloads.
If the results exceed the latency threshold, the test fails.
Important: During testing, you can use shorter time periods, as shown, to run the tests. However, for final verification and valid results, run the tests for at least 12 hours (43200 seconds).
Example failure output
running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=cyclictest
I0908 13:01:59.193776 27 request.go:601] Waited for 1.046228824s due to client-side throttling, not priority and fairness, request: GET:https://api.compute-1.example.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1662642118
Will run 1 of 3 specs
[...]
Summarizing 1 Failure:
[Fail] [performance] Latency Test with the cyclictest image [It] should succeed
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:220
Ran 1 of 194 Specs in 161.151 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestTest (161.48s)
FAIL
21.3.4. Example cyclictest results
To accurately interpret latency test results, evaluate the metrics against your specific workload requirements. Acceptable performance thresholds differ significantly depending on whether you are running 4G DU or 5G DU workloads.
For example, a spike of up to 18μs is acceptable for 4G DU workloads, but not for 5G DU workloads. The following examples show good and bad results:
Example of good results
running cmd: cyclictest -q -D 10m -p 1 -t 16 -a 2,4,6,8,10,12,14,16,54,56,58,60,62,64,66,68 -h 30 -i 1000 -m
# Histogram
000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000001 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000002 579506 535967 418614 573648 532870 529897 489306 558076 582350 585188 583793 223781 532480 569130 472250 576043
More histogram entries ...
# Total: 000600000 000600000 000600000 000599999 000599999 000599999 000599998 000599998 000599998 000599997 000599997 000599996 000599996 000599995 000599995 000599995
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00005 00005 00004 00005 00004 00004 00005 00005 00006 00005 00004 00005 00004 00004 00005 00004
# Histogram Overflows: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
# Thread 2:
# Thread 3:
# Thread 4:
# Thread 5:
# Thread 6:
# Thread 7:
# Thread 8:
# Thread 9:
# Thread 10:
# Thread 11:
# Thread 12:
# Thread 13:
# Thread 14:
# Thread 15:
Example of bad results
running cmd: cyclictest -q -D 10m -p 1 -t 16 -a 2,4,6,8,10,12,14,16,54,56,58,60,62,64,66,68 -h 30 -i 1000 -m
# Histogram
000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000001 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000002 564632 579686 354911 563036 492543 521983 515884 378266 592621 463547 482764 591976 590409 588145 589556 353518
More histogram entries ...
# Total: 000599999 000599999 000599999 000599997 000599997 000599998 000599998 000599997 000599997 000599996 000599995 000599996 000599995 000599995 000599995 000599993
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00493 00387 00271 00619 00541 00513 00009 00389 00252 00215 00539 00498 00363 00204 00068 00520
# Histogram Overflows: 00001 00001 00001 00002 00002 00001 00000 00001 00001 00001 00002 00001 00001 00001 00001 00002
# Histogram Overflow at cycle number:
# Thread 0: 155922
# Thread 1: 110064
# Thread 2: 110064
# Thread 3: 110063 155921
# Thread 4: 110063 155921
# Thread 5: 155920
# Thread 6:
# Thread 7: 110062
# Thread 8: 110062
# Thread 9: 155919
# Thread 10: 110061 155919
# Thread 11: 155918
# Thread 12: 155918
# Thread 13: 110060
# Thread 14: 110060
# Thread 15: 110059 155917
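To compare results like these against a limit without reading every column, you can extract the worst per-thread value from the Max Latencies line. The following is a minimal sketch, assuming the output format shown in this section. check_cyclictest_max is a hypothetical helper name, and the limit is whatever threshold you chose for your workload.

```shell
#!/bin/sh
# Hypothetical helper: reads cyclictest output on stdin, finds the
# "# Max Latencies:" line, and compares the worst per-thread value with a
# limit in microseconds. Returns 0 if the worst value is below the limit,
# 1 if it is equal to or above the limit, and 2 if the line is missing.
check_cyclictest_max() {
    awk -v limit="$1" '
        /^# Max Latencies:/ {
            found = 1
            # Fields 1-3 are "#", "Max", and "Latencies:"; values start at 4.
            for (i = 4; i <= NF; i++) {
                if ($i + 0 > worst) worst = $i + 0
            }
        }
        END {
            if (!found) { print "no Max Latencies line found"; exit 2 }
            printf "worst_us=%d limit_us=%d\n", worst, limit
            exit (worst >= limit + 0 ? 1 : 0)
        }
    '
}

# First values from the good results shown earlier in this section.
echo "# Max Latencies: 00005 00005 00004 00005" | check_cyclictest_max 20
# prints: worst_us=5 limit_us=20
```

The function returns a failure status when the worst value is equal to or above the limit, which mirrors the rule that a test succeeds only when the maximum observed latency is lower than the threshold.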
21.3.5. Running oslat
To evaluate how your cluster handles CPU-heavy data processing, run the oslat test. This diagnostic tool simulates a CPU-intensive DPDK application to measure system interruptions and performance disruptions.
When you run podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. Depending on your local operating system and SELinux configuration, you might also experience issues running these commands from your home directory. To make the podman commands work, run them from a directory other than home/<username>, and append :Z to each volume definition, for example, -v $(pwd)/:/kubeconfig:Z. The :Z suffix allows podman to perform the proper SELinux relabeling.
Prerequisites
- You have reviewed the prerequisites for running latency tests.
Procedure
To perform the oslat test, run the following command, substituting variable values as appropriate:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_CPUS=10 -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/test-run.sh --ginkgo.focus="oslat" --ginkgo.v --ginkgo.timeout="24h"
LATENCY_TEST_CPUS specifies the number of CPUs to test with the oslat command.
The command runs the oslat tool for 10 minutes (600 seconds). The test runs successfully when the maximum observed latency is lower than MAXIMUM_LATENCY (20 μs).
If the results exceed the latency threshold, the test fails.
Important: During testing, you can use shorter time periods, as shown, to run the tests. However, for final verification and valid results, run the tests for at least 12 hours (43200 seconds).
Example failure output
running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=oslat
I0908 12:51:55.999393 27 request.go:601] Waited for 1.044848101s due to client-side throttling, not priority and fairness, request: GET:https://compute-1.example.com:6443/apis/machineconfiguration.openshift.io/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1662641514
Will run 1 of 3 specs
[...]
• Failure [77.833 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:62
with the oslat image
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:128
should succeed [It]
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:153
The current latency 304 is bigger than the expected one 1 :
[...]
Summarizing 1 Failure:
[Fail] [performance] Latency Test with the oslat image [It] should succeed
/remote-source/app/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/4_latency/latency.go:177
Ran 1 of 194 Specs in 161.091 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestTest (161.42s)
FAIL
In this example, the line that reports the current latency shows that the measured latency is outside the maximum allowed value.
21.4. Generating a latency test failure report
To analyze test failures and troubleshoot performance issues, generate a JUnit latency test output and test failure report. Reviewing this diagnostic data helps you pinpoint exactly where your system is experiencing delays.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
Procedure
Create a test failure report with information about the cluster state and resources for troubleshooting by passing the --report parameter with the path to where the report is dumped:
$ podman run -v $(pwd)/:/kubeconfig:Z -v $(pwd)/reportdest:<report_folder_path> \
    -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/test-run.sh --report <report_folder_path> --ginkgo.v
where:
- <report_folder_path>: Specifies the path to the folder where the report is generated.
21.5. Generating a JUnit latency test report
To analyze system performance and track execution delays, generate a JUnit latency test report. Reviewing this diagnostic output helps you identify configuration issues and performance bottlenecks within your cluster.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
Procedure
Create a JUnit-compliant XML report by passing the --ginkgo.junit-report parameter together with the path to where the report is dumped:
Note: You must create the junit folder before running this command.
$ podman run -v $(pwd)/:/kubeconfig:Z -v $(pwd)/junit:/junit \
    -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/test-run.sh --ginkgo.junit-report junit/<file_name>.xml --ginkgo.v
where:
- file_name: The name of the XML report file.
21.6. Running latency tests on a single-node OpenShift cluster
To validate node tuning and identify performance delays, run latency tests on your single-node OpenShift clusters. Evaluating these metrics ensures your environment is optimized for high-performance workloads.
When you run podman commands as a non-root or non-privileged user, mounting paths can fail with permission denied errors. To make the podman command work, append :Z to each volume definition, for example, -v $(pwd)/:/kubeconfig:Z. The :Z suffix allows podman to perform the proper SELinux relabeling.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have applied a cluster performance profile by using the Node Tuning Operator.
Procedure
To run the latency tests on a single-node OpenShift cluster, run the following command:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUNTIME=<time_in_seconds> registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"
Note: The default runtime for each test is 300 seconds. For valid latency test results, run the tests for at least 12 hours by updating the LATENCY_TEST_RUNTIME variable.
To run the buckets latency validation step, you must specify a maximum latency. For details on maximum latency variables, see the table in the "Measuring latency" section.
After running the test suite, all the dangling resources are cleaned up.
21.7. Running latency tests in a disconnected cluster
The CNF tests image can run tests in a disconnected cluster that is not able to reach external registries. This requires two steps:
- Mirroring the cnf-tests image to the custom disconnected registry.
- Instructing the tests to consume the images from the custom disconnected registry.
21.7.1. Mirroring the images to a custom registry accessible from the cluster
To make required images accessible from your cluster, mirror them to a custom registry. Performing this synchronization ensures that your deployment has the necessary container files, which is particularly useful in restricted or disconnected network environments.
A mirror executable is shipped in the image to provide the input required by oc to mirror the test image to a local registry.
Procedure
Run the following command from an intermediate machine that has access to the cluster and registry.redhat.io:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/mirror -registry <disconnected_registry> | oc image mirror -f -
where:
- <disconnected_registry>: Specifies the disconnected mirror registry you have configured, such as my.local.registry:5000/.
After you mirror the cnf-tests image into the disconnected registry, override the original registry used to fetch the images when you run the tests, by using a command similar to the following example:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e IMAGE_REGISTRY="<disconnected_registry>" \
    -e CNF_TESTS_IMAGE="cnf-tests-rhel9:v4.20" \
    -e LATENCY_TEST_RUNTIME=<time_in_seconds> \
    <disconnected_registry>/cnf-tests-rhel9:v4.20 /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"
21.7.2. Configuring the tests to consume images from a custom registry
You can run the latency tests with a custom test image and image registry by setting the CNF_TESTS_IMAGE and IMAGE_REGISTRY variables.
Procedure
To configure the latency tests to use a custom test image and image registry, run a command similar to the following example:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e IMAGE_REGISTRY="<custom_image_registry>" \
    -e CNF_TESTS_IMAGE="<custom_cnf-tests_image>" \
    -e LATENCY_TEST_RUNTIME=<time_in_seconds> \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"
where:
- <custom_image_registry>: Specifies the custom image registry, for example, custom.registry:5000/.
- <custom_cnf-tests_image>: Specifies the custom cnf-tests image, for example, custom-cnf-tests-image:latest.
21.7.3. Mirroring images to the cluster OpenShift image registry
To make container images locally available for your deployment, mirror them to the built-in OpenShift image registry. This integrated component runs as a standard workload on your OpenShift Container Platform cluster to ensure continuous access to required files.
Procedure
Gain external access to the registry by exposing the registry with a route. You can do this task by running a command similar to the following example:
$ oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
Fetch the registry endpoint by running a command similar to the following example:
$ REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
Create a namespace for exposing the images by running a command similar to the following example:
$ oc create ns cnftests
Make the image stream available to all the namespaces used for tests. This is required to allow the tests namespaces to fetch the images from the cnf-tests image stream. Run commands similar to the following examples:
$ oc policy add-role-to-user system:image-puller system:serviceaccount:cnf-features-testing:default --namespace=cnftests
$ oc policy add-role-to-user system:image-puller system:serviceaccount:performance-addon-operators-testing:default --namespace=cnftests
Retrieve the docker secret name by running a command similar to the following example:
$ SECRET=$(oc -n cnftests get secret | grep builder-docker | awk '{print $1}')
Retrieve the docker auth token by running a command similar to the following example:
$ TOKEN=$(oc -n cnftests get secret $SECRET -o jsonpath="{.data['\.dockercfg']}" | base64 --decode | jq '.["image-registry.openshift-image-registry.svc:5000"].auth')
Create a dockerauth.json file, for example:
$ echo "{\"auths\": { \"$REGISTRY\": { \"auth\": $TOKEN } }}" > dockerauth.json
Mirror the image by running a command similar to the following example:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    /usr/bin/mirror -registry $REGISTRY/cnftests | oc image mirror --insecure=true \
    -a=$(pwd)/dockerauth.json -f -
Run the tests by running a command similar to the following example:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUNTIME=<time_in_seconds> \
    -e IMAGE_REGISTRY=image-registry.openshift-image-registry.svc:5000/cnftests cnf-tests-local:latest /usr/bin/test-run.sh --ginkgo.v --ginkgo.timeout="24h"
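The dockerauth.json step can also be written with printf, which avoids the nested escaped quotes. The following is a minimal sketch, assuming that the token was extracted as a raw string (for example, with jq -r), unlike the procedure above, where jq leaves the surrounding quotes in place. make_dockerauth is a hypothetical helper name, and the registry and token values are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints the body of an auth file for one registry.
# $1 is the registry host; $2 is the raw (unquoted) base64 auth string.
make_dockerauth() {
    printf '{"auths": {"%s": {"auth": "%s"}}}\n' "$1" "$2"
}

# Placeholder values for illustration only.
make_dockerauth "registry.example.com" "dXNlcjpwYXNzd29yZA=="
# prints: {"auths": {"registry.example.com": {"auth": "dXNlcjpwYXNzd29yZA=="}}}
```

To write the file, redirect the output, for example make_dockerauth "$REGISTRY" "$TOKEN" > dockerauth.json.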
21.7.4. Mirroring a different set of test images
You can optionally change the default upstream images that are mirrored for the latency tests.
Procedure
The mirror command tries to mirror the upstream images by default. This can be overridden by passing a file with the following format to the image:
[
    {
        "registry": "public.registry.io:5000",
        "image": "imageforcnftests:4.20"
    }
]
Save the file locally, for example as images.json, and pass it to the mirror command. The following command mounts the local path at /kubeconfig inside the container, so you can pass the file to the mirror command by that path:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 /usr/bin/mirror \
    --registry "my.local.registry:5000/" --images "/kubeconfig/images.json" \
    | oc image mirror -f -
21.8. Troubleshooting errors with the cnf-tests container
To troubleshoot errors when running latency tests, verify that your cluster is accessible from within the cnf-tests container. Ensuring this connectivity resolves common test execution failures.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
Procedure
Verify that the cluster is accessible from inside the cnf-tests container by running the following command:
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    registry.redhat.io/openshift4/cnf-tests-rhel9:v4.20 \
    oc get nodes
If this command does not work, the cause might be a problem with DNS, MTU size, or firewall access.