15.4. 运行延迟测试


运行集群延迟测试,以验证 Cloud-native Network Function (CNF) 工作负载的节点调整。

重要

始终使用 DISCOVERY_MODE=true 设置运行延迟测试。如果没有,测试套件将对正在运行的集群配置进行更改。

注意

当以非 root 用户或非特权用户执行 podman 命令时,挂载路径可能会失败,错误为 permission denied。要使 podman 命令正常工作,请将 :Z 附加到卷创建中,例如 -v $(pwd)/:/kubeconfig:Z。这允许 podman 进行正确的 SELinux 重新标记。

流程

  1. 在包含 kubeconfig 文件的目录中打开 shell 提示符。

    您可以在当前目录中为测试镜像提供 kubeconfig 文件,及其相关的 $KUBECONFIG 环境变量(通过卷挂载)。这允许运行的容器使用容器内的 kubeconfig 文件。

  2. 输入以下命令运行延迟测试:

    $ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUN=true -e DISCOVERY_MODE=true registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 \
    /usr/bin/test-run.sh -ginkgo.focus="\[performance\]\ Latency\ Test"
    Copy to Clipboard Toggle word wrap
  3. 可选:附加 -ginkgo.dryRun 以空运行模式运行延迟测试。这对于检查测试运行的内容非常有用。
  4. 可选: 附加 -ginkgo.v 用来运行测试,并增加详细程度。
  5. 可选: 要针对特定的性能配置集运行延迟测试,请运行以下命令,替换适当的值:

    $ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    -e PERF_TEST_PROFILE=<performance_profile> registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 \
    /usr/bin/test-run.sh -ginkgo.focus="[performance]\ Latency\ Test"
    Copy to Clipboard Toggle word wrap

    其中:

    <performance_profile>
    是您要对其运行延迟测试的性能配置集的名称。
    重要

    如需有效延迟测试结果,请至少运行测试 12 小时。

15.4.1. 运行 hwlatdetect

hwlatdetect 工具位于 rt-kernel 软件包中,带有常规订阅 Red Hat Enterprise Linux (RHEL) 8.x。

重要

始终使用 DISCOVERY_MODE=true 设置运行延迟测试。如果没有,测试套件将对正在运行的集群配置进行更改。

注意

当以非 root 用户或非特权用户执行 podman 命令时,挂载路径可能会失败,错误为 permission denied。要使 podman 命令正常工作,请将 :Z 附加到卷创建中,例如 -v $(pwd)/:/kubeconfig:Z。这允许 podman 进行正确的 SELinux 重新标记。

先决条件

  • 已在集群中安装了实时内核。
  • 您使用客户门户网站凭证登录到 registry.redhat.io

流程

  • 要运行 hwlatdetect 测试,请运行以下命令,并根据情况替换变量值:

    $ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUN=true -e DISCOVERY_MODE=true -e ROLE_WORKER_CNF=worker-cnf \
    -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 \
    /usr/bin/test-run.sh -ginkgo.v -ginkgo.focus="hwlatdetect"
    Copy to Clipboard Toggle word wrap

    hwlatdetect 测试运行了 10 分钟 (600 秒)。当最观察到的延迟低于 MAXIMUM_LATENCY (20 FORWARD) 时,测试会成功运行。

    如果结果超过延迟阈值,测试会失败。

    重要

    对于有效结果,测试应至少运行 12 小时。

    失败输出示例

    running /usr/bin/validationsuite -ginkgo.v -ginkgo.focus=hwlatdetect
    I0210 17:08:38.607699       7 request.go:668] Waited for 1.047200253s due to client-side throttling, not priority and fairness, request: GET:https://api.ocp.demo.lab:6443/apis/apps.openshift.io/v1?timeout=32s
    Running Suite: CNF Features e2e validation
    ==========================================
    Random Seed: 1644512917
    Will run 0 of 48 specs
    
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    Ran 0 of 48 Specs in 0.001 seconds
    SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 48 Skipped
    
    PASS
    Discovery mode enabled, skipping setup
    running /usr/bin/cnftests -ginkgo.v -ginkgo.focus=hwlatdetect
    I0210 17:08:41.179269      40 request.go:668] Waited for 1.046001096s due to client-side throttling, not priority and fairness, request: GET:https://api.ocp.demo.lab:6443/apis/storage.k8s.io/v1beta1?timeout=32s
    Running Suite: CNF Features e2e integration tests
    =================================================
    Random Seed: 1644512920
    Will run 1 of 151 specs
    
    SSSSSSS
    ------------------------------
    [performance] Latency Test with the hwlatdetect image
      should succeed
      /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:221
    STEP: Waiting two minutes to download the latencyTest image
    STEP: Waiting another two minutes to give enough time for the cluster to move the pod to Succeeded phase
    Feb 10 17:10:56.045: [INFO]: found mcd machine-config-daemon-dzpw7 for node ocp-worker-0.demo.lab
    Feb 10 17:10:56.259: [INFO]: found mcd machine-config-daemon-dzpw7 for node ocp-worker-0.demo.lab
    Feb 10 17:11:56.825: [ERROR]: timed out waiting for the condition
    
    • Failure [193.903 seconds]
    [performance] Latency Test
    /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:60
      with the hwlatdetect image
      /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:213
        should succeed [It]
        /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:221
    
        Log file created at: 2022/02/10 17:08:45
        Running on machine: hwlatdetect-cd8b6
        Binary: Built with gc go1.16.6 for linux/amd64
        Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
        I0210 17:08:45.716288       1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-56fabc639a679b757ebae30e5f01b2ebd38e9fde9ecae91c41be41d3e89b37f8/vmlinuz-4.18.0-305.34.2.rt7.107.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.0/rhcos/56fabc639a679b757ebae30e5f01b2ebd38e9fde9ecae91c41be41d3e89b37f8/0 root=UUID=56731f4f-f558-46a3-85d3-d1b579683385 rw rootflags=prjquota skew_tick=1 nohz=on rcu_nocbs=3-5 tuned.non_isolcpus=ffffffc7 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,3-5 systemd.cpu_affinity=0,1,2,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 + +
        I0210 17:08:45.716782       1 node.go:44] Environment information: kernel version 4.18.0-305.34.2.rt7.107.el8_4.x86_64
        I0210 17:08:45.716861       1 main.go:50] running the hwlatdetect command with arguments [/usr/bin/hwlatdetect --threshold 1 --hardlimit 1 --duration 10 --window 10000000us --width 950000us]
        F0210 17:08:56.815204       1 main.go:53] failed to run hwlatdetect command; out: hwlatdetect:  test duration 10 seconds
           detector: tracer
           parameters:
                Latency threshold: 1us 
    1
    
                Sample window:     10000000us
                Sample width:      950000us
             Non-sampling period:  9050000us
                Output File:       None
    
        Starting test
        test finished
        Max Latency: 24us 
    2
    
        Samples recorded: 1
        Samples exceeding threshold: 1
        ts: 1644512927.163556381, inner:20, outer:24
        ; err: exit status 1
        goroutine 1 [running]:
        k8s.io/klog.stacks(0xc000010001, 0xc00012e000, 0x25b, 0x2710)
            /remote-source/app/vendor/k8s.io/klog/klog.go:875 +0xb9
        k8s.io/klog.(*loggingT).output(0x5bed00, 0xc000000003, 0xc0000121c0, 0x53ea81, 0x7, 0x35, 0x0)
            /remote-source/app/vendor/k8s.io/klog/klog.go:829 +0x1b0
        k8s.io/klog.(*loggingT).printf(0x5bed00, 0x3, 0x5082da, 0x33, 0xc000113f58, 0x2, 0x2)
            /remote-source/app/vendor/k8s.io/klog/klog.go:707 +0x153
        k8s.io/klog.Fatalf(...)
            /remote-source/app/vendor/k8s.io/klog/klog.go:1276
        main.main()
            /remote-source/app/cnf-tests/pod-utils/hwlatdetect-runner/main.go:53 +0x897
    
        goroutine 6 [chan receive]:
        k8s.io/klog.(*loggingT).flushDaemon(0x5bed00)
            /remote-source/app/vendor/k8s.io/klog/klog.go:1010 +0x8b
        created by k8s.io/klog.init.0
            /remote-source/app/vendor/k8s.io/klog/klog.go:411 +0xd8
    
        goroutine 7 [chan receive]:
        k8s.io/klog/v2.(*loggingT).flushDaemon(0x5bede0)
            /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1169 +0x8b
        created by k8s.io/klog/v2.init.0
            /remote-source/app/vendor/k8s.io/klog/v2/klog.go:420 +0xdf
        Unexpected error:
            <*errors.errorString | 0xc000418ed0>: {
                s: "timed out waiting for the condition",
            }
            timed out waiting for the condition
        occurred
    
        /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:433
    ------------------------------
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    JUnit report was created: /junit.xml/cnftests-junit.xml
    
    
    Summarizing 1 Failure:
    
    [Fail] [performance] Latency Test with the hwlatdetect image [It] should succeed
    /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:433
    
    Ran 1 of 151 Specs in 222.254 seconds
    FAIL! -- 0 Passed | 1 Failed | 0 Pending | 150 Skipped
    
    --- FAIL: TestTest (222.45s)
    FAIL
    Copy to Clipboard Toggle word wrap

    1
    您可以使用 MAXIMUM_LATENCYHWLATDETECT_MAXIMUM_LATENCY 环境变量来配置延迟阈值。
    2
    测试期间测量的最大延迟值。
hwlatdetect 测试结果示例

您可以捕获以下类型的结果:

  • 在每次运行后收集的粗略结果,以便对整个测试过程中所做的任何更改产生影响的历史记录。
  • 基本测试和配置设置的组合。

良好结果的示例

hwlatdetect: test duration 3600 seconds
detector: tracer
parameters:
Latency threshold: 10us
Sample window: 1000000us
Sample width: 950000us
Non-sampling period: 50000us
Output File: None

Starting test
test finished
Max Latency: Below threshold
Samples recorded: 0
Copy to Clipboard Toggle word wrap

hwlatdetect 工具仅在示例超过指定阈值时提供输出。

错误结果的示例

hwlatdetect: test duration 3600 seconds
detector: tracer
parameters:Latency threshold: 10usSample window: 1000000us
Sample width: 950000usNon-sampling period: 50000usOutput File: None

Starting tests:1610542421.275784439, inner:78, outer:81
ts: 1610542444.330561619, inner:27, outer:28
ts: 1610542445.332549975, inner:39, outer:38
ts: 1610542541.568546097, inner:47, outer:32
ts: 1610542590.681548531, inner:13, outer:17
ts: 1610543033.818801482, inner:29, outer:30
ts: 1610543080.938801990, inner:90, outer:76
ts: 1610543129.065549639, inner:28, outer:39
ts: 1610543474.859552115, inner:28, outer:35
ts: 1610543523.973856571, inner:52, outer:49
ts: 1610543572.089799738, inner:27, outer:30
ts: 1610543573.091550771, inner:34, outer:28
ts: 1610543574.093555202, inner:116, outer:63
Copy to Clipboard Toggle word wrap

hwlatdetect 的输出显示多个样本超过阈值。但是,相同的输出可能会根据以下因素显示不同的结果:

  • 测试的持续时间
  • CPU 内核数
  • 主机固件设置
警告

在继续执行下一个延迟测试前,请确保 hwlatdetect 报告的延迟满足所需的阈值。修复硬件带来的延迟可能需要您联系系统厂商支持。

并非所有延迟高峰都与硬件相关。确保调整主机固件以满足您的工作负载要求。如需更多信息,请参阅为系统调整设置固件参数

15.4.2. 运行 cyclictest

cyclictest 工具测量指定 CPU 上的实时内核调度程序延迟。

重要

始终使用 DISCOVERY_MODE=true 设置运行延迟测试。如果没有,测试套件将对正在运行的集群配置进行更改。

注意

当以非 root 用户或非特权用户执行 podman 命令时,挂载路径可能会失败,错误为 permission denied。要使 podman 命令正常工作,请将 :Z 附加到卷创建中,例如 -v $(pwd)/:/kubeconfig:Z。这允许 podman 进行正确的 SELinux 重新标记。

先决条件

  • 您使用客户门户网站凭证登录到 registry.redhat.io
  • 已在集群中安装了实时内核。
  • 已使用 Performance addon operator 应用了集群性能配置集。

流程

  • 要执行 cyclictest,请运行以下命令,并根据情况替换变量值:

    $ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUN=true -e DISCOVERY_MODE=true -e ROLE_WORKER_CNF=worker-cnf \
    -e LATENCY_TEST_CPUS=10 -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 \
    /usr/bin/test-run.sh -ginkgo.v -ginkgo.focus="cyclictest"
    Copy to Clipboard Toggle word wrap

    该命令运行 cyclictest 工具 10 分钟(600 秒)。当观察到的延迟低于 MAXIMUM_LATENCY 时,测试会成功运行(在本例中,20 TOKENs)。对于电信 RAN 工作负载,对 20 个以上延迟的激增通常并不能接受。

    如果结果超过延迟阈值,测试会失败。

    重要

    对于有效结果,测试应至少运行 12 小时。

    失败输出示例

    Discovery mode enabled, skipping setup
    running /usr/bin//cnftests -ginkgo.v -ginkgo.focus=cyclictest
    I0811 15:02:36.350033      20 request.go:668] Waited for 1.049965918s due to client-side throttling, not priority and fairness, request: GET:https://api.cnfdc8.t5g.lab.eng.bos.redhat.com:6443/apis/machineconfiguration.openshift.io/v1?timeout=32s
    Running Suite: CNF Features e2e integration tests
    =================================================
    Random Seed: 1628694153
    Will run 1 of 138 specs
    
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    ------------------------------
    [performance] Latency Test with the cyclictest image
      should succeed
      /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:200
    STEP: Waiting two minutes to download the latencyTest image
    STEP: Waiting another two minutes to give enough time for the cluster to move the pod to Succeeded phase
    Aug 11 15:03:06.826: [INFO]: found mcd machine-config-daemon-wf4w8 for node cnfdc8.clus2.t5g.lab.eng.bos.redhat.com
    
    • Failure [22.527 seconds]
    [performance] Latency Test
    /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:84
      with the cyclictest image
      /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:188
        should succeed [It]
        /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:200
    
        The current latency 27 is bigger than the expected one 20
        Expected
            <bool>: false
        to be true
    
        /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:219
    
    Log file created at: 2021/08/11 15:02:51
    Running on machine: cyclictest-knk7d
    Binary: Built with gc go1.16.6 for linux/amd64
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    I0811 15:02:51.092254       1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-612d89f4519a53ad0b1a132f4add78372661bfb3994f5fe115654971aa58a543/vmlinuz-4.18.0-305.10.2.rt7.83.el8_4.x86_64 ip=dhcp random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/612d89f4519a53ad0b1a132f4add78372661bfb3994f5fe115654971aa58a543/0 ignition.platform.id=openstack root=UUID=5a4ddf16-9372-44d9-ac4e-3ee329e16ab3 rw rootflags=prjquota skew_tick=1 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=000000ff,ffffffff,ffffffff,fffffff1 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,1-3 systemd.cpu_affinity=0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103 default_hugepagesz=1G hugepagesz=2M hugepages=128 nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0
    I0811 15:02:51.092427       1 node.go:44] Environment information: kernel version 4.18.0-305.10.2.rt7.83.el8_4.x86_64
    I0811 15:02:51.092450       1 main.go:48] running the cyclictest command with arguments \
    [-D 600 -95 1 -t 10 -a 2,4,6,8,10,54,56,58,60,62 -h 30 -i 1000 --quiet]
    I0811 15:03:06.147253       1 main.go:54] succeeded to run the cyclictest command: # /dev/cpu_dma_latency set to 0us
    # Histogram
    000000 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000
    000001 000000   005561  027778  037704  011987  000000  120755  238981  081847  300186
    000002 587440   581106  564207  554323  577416  590635  474442  357940  513895  296033
    000003 011751   011441  006449  006761  008409  007904  002893  002066  003349  003089
    000004 000527   001079  000914  000712  001451  001120  000779  000283  000350  000251
    
    More histogram entries ...
    # Min Latencies: 00002 00001 00001 00001 00001 00002 00001 00001 00001 00001
    # Avg Latencies: 00002 00002 00002 00001 00002 00002 00001 00001 00001 00001
    # Max Latencies: 00018 00465 00361 00395 00208 00301 02052 00289 00327 00114
    # Histogram Overflows: 00000 00220 00159 00128 00202 00017 00069 00059 00045 00120
    # Histogram Overflow at cycle number:
    # Thread 0:
    # Thread 1: 01142 01439 05305 … # 00190 others
    # Thread 2: 20895 21351 30624# 00129 others
    # Thread 3: 01143 17921 18334# 00098 others
    # Thread 4: 30499 30622 31566 ... # 00172 others
    # Thread 5: 145221 170910 171888 ...
    # Thread 6: 01684 26291 30623 ...# 00039 others
    # Thread 7: 28983 92112 167011 … 00029 others
    # Thread 8: 45766 56169 56171 ...# 00015 others
    # Thread 9: 02974 08094 13214 ... # 00090 others
    Copy to Clipboard Toggle word wrap

cyclictest 结果示例

相同的输出可能会显示不同工作负载的结果。例如,spikes 最长为 18μs 对 4G DU 工作负载是可以接受的,但对于 5G DU 工作负载不能接受。

良好结果的示例

running cmd: cyclictest -q -D 10m -p 1 -t 16 -a 2,4,6,8,10,12,14,16,54,56,58,60,62,64,66,68 -h 30 -i 1000 -m
# Histogram
000000 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000001 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000002 579506   535967  418614  573648  532870  529897  489306  558076  582350  585188  583793  223781  532480  569130  472250  576043
More histogram entries ...
# Total: 000600000 000600000 000600000 000599999 000599999 000599999 000599998 000599998 000599998 000599997 000599997 000599996 000599996 000599995 000599995 000599995
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00005 00005 00004 00005 00004 00004 00005 00005 00006 00005 00004 00005 00004 00004 00005 00004
# Histogram Overflows: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
# Thread 2:
# Thread 3:
# Thread 4:
# Thread 5:
# Thread 6:
# Thread 7:
# Thread 8:
# Thread 9:
# Thread 10:
# Thread 11:
# Thread 12:
# Thread 13:
# Thread 14:
# Thread 15:
Copy to Clipboard Toggle word wrap

错误结果的示例

running cmd: cyclictest -q -D 10m -p 1 -t 16 -a 2,4,6,8,10,12,14,16,54,56,58,60,62,64,66,68 -h 30 -i 1000 -m
# Histogram
000000 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000001 000000   000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000  000000
000002 564632   579686  354911  563036  492543  521983  515884  378266  592621  463547  482764  591976  590409  588145  589556  353518
More histogram entries ...
# Total: 000599999 000599999 000599999 000599997 000599997 000599998 000599998 000599997 000599997 000599996 000599995 000599996 000599995 000599995 000599995 000599993
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00493 00387 00271 00619 00541 00513 00009 00389 00252 00215 00539 00498 00363 00204 00068 00520
# Histogram Overflows: 00001 00001 00001 00002 00002 00001 00000 00001 00001 00001 00002 00001 00001 00001 00001 00002
# Histogram Overflow at cycle number:
# Thread 0: 155922
# Thread 1: 110064
# Thread 2: 110064
# Thread 3: 110063 155921
# Thread 4: 110063 155921
# Thread 5: 155920
# Thread 6:
# Thread 7: 110062
# Thread 8: 110062
# Thread 9: 155919
# Thread 10: 110061 155919
# Thread 11: 155918
# Thread 12: 155918
# Thread 13: 110060
# Thread 14: 110060
# Thread 15: 110059 155917
Copy to Clipboard Toggle word wrap

15.4.3. 运行 oslat

oslat 测试模拟 CPU 密集型 DPDK 应用程序,并测量所有中断和中断来测试集群处理 CPU 大量数据处理的方式。

重要

始终使用 DISCOVERY_MODE=true 设置运行延迟测试。如果没有,测试套件将对正在运行的集群配置进行更改。

注意

当以非 root 用户或非特权用户执行 podman 命令时,挂载路径可能会失败,错误为 permission denied。要使 podman 命令正常工作,请将 :Z 附加到卷创建中,例如 -v $(pwd)/:/kubeconfig:Z。这允许 podman 进行正确的 SELinux 重新标记。

先决条件

  • 您使用客户门户网站凭证登录到 registry.redhat.io
  • 已使用 Performance addon operator 应用集群性能配置集。

流程

  • 要执行 oslat 测试,请运行以下命令,根据需要替换变量值:

    $ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
    -e LATENCY_TEST_RUN=true -e DISCOVERY_MODE=true -e ROLE_WORKER_CNF=worker-cnf \
    -e LATENCY_TEST_CPUS=7 -e LATENCY_TEST_RUNTIME=600 -e MAXIMUM_LATENCY=20 \
    registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 \
    /usr/bin/test-run.sh -ginkgo.v -ginkgo.focus="oslat"
    Copy to Clipboard Toggle word wrap

    LATENCY_TEST_CPUS 指定使用 oslat 命令测试的 CPU 列表。

    命令运行 oslat 工具 10 分钟(600 秒)。当最观察到的延迟低于 MAXIMUM_LATENCY (20 FORWARD) 时,测试会成功运行。

    如果结果超过延迟阈值,测试会失败。

    重要

    对于有效结果,测试应至少运行 12 小时。

    失败输出示例

    running /usr/bin//validationsuite -ginkgo.v -ginkgo.focus=oslat
    I0829 12:36:55.386776       8 request.go:668] Waited for 1.000303471s due to client-side throttling, not priority and fairness, request: GET:https://api.cnfdc8.t5g.lab.eng.bos.redhat.com:6443/apis/authentication.k8s.io/v1?timeout=32s
    Running Suite: CNF Features e2e validation
    ==========================================
    
    Discovery mode enabled, skipping setup
    running /usr/bin//cnftests -ginkgo.v -ginkgo.focus=oslat
    I0829 12:37:01.219077      20 request.go:668] Waited for 1.050010755s due to client-side throttling, not priority and fairness, request: GET:https://api.cnfdc8.t5g.lab.eng.bos.redhat.com:6443/apis/snapshot.storage.k8s.io/v1beta1?timeout=32s
    Running Suite: CNF Features e2e integration tests
    =================================================
    Random Seed: 1630240617
    Will run 1 of 142 specs
    
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    ------------------------------
    [performance] Latency Test with the oslat image
      should succeed
      /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:134
    STEP: Waiting two minutes to download the latencyTest image
    STEP: Waiting another two minutes to give enough time for the cluster to move the pod to Succeeded phase
    Aug 29 12:37:59.324: [INFO]: found mcd machine-config-daemon-wf4w8 for node cnfdc8.clus2.t5g.lab.eng.bos.redhat.com
    
    • Failure [49.246 seconds]
    [performance] Latency Test
    /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:59
      with the oslat image
      /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:112
        should succeed [It]
        /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:134
    
        The current latency 27 is bigger than the expected one 20 
    1
    
        Expected
            <bool>: false
        to be true
     /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:168
    
    Log file created at: 2021/08/29 13:25:21
    Running on machine: oslat-57c2g
    Binary: Built with gc go1.16.6 for linux/amd64
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    I0829 13:25:21.569182       1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-612d89f4519a53ad0b1a132f4add78372661bfb3994f5fe115654971aa58a543/vmlinuz-4.18.0-305.10.2.rt7.83.el8_4.x86_64 ip=dhcp random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.0/rhcos/612d89f4519a53ad0b1a132f4add78372661bfb3994f5fe115654971aa58a543/0 ignition.platform.id=openstack root=UUID=5a4ddf16-9372-44d9-ac4e-3ee329e16ab3 rw rootflags=prjquota skew_tick=1 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=000000ff,ffffffff,ffffffff,fffffff1 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,1-3 systemd.cpu_affinity=0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103 default_hugepagesz=1G hugepagesz=2M hugepages=128 nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0
    I0829 13:25:21.569345       1 node.go:44] Environment information: kernel version 4.18.0-305.10.2.rt7.83.el8_4.x86_64
    I0829 13:25:21.569367       1 main.go:53] Running the oslat command with arguments \
    [--duration 600 --rtprio 1 --cpu-list 4,6,52,54,56,58 --cpu-main-thread 2]
    I0829 13:35:22.632263       1 main.go:59] Succeeded to run the oslat command: oslat V 2.00
    Total runtime:    600 seconds
    Thread priority:  SCHED_FIFO:1
    CPU list:     4,6,52,54,56,58
    CPU for main thread:  2
    Workload:     no
    Workload mem:     0 (KiB)
    Preheat cores:    6
    
    Pre-heat for 1 seconds...
    Test starts...
    Test completed.
    
            Core:  4 6 52 54 56 58
        CPU Freq:  2096 2096 2096 2096 2096 2096 (Mhz)
        001 (us):  19390720316 19141129810 20265099129 20280959461 19391991159 19119877333
        002 (us):  5304 5249 5777 5947 6829 4971
        003 (us):  28 14 434 47 208 21
        004 (us):  1388 853 123568 152817 5576 0
        005 (us):  207850 223544 103827 91812 227236 231563
        006 (us):  60770 122038 277581 323120 122633 122357
        007 (us):  280023 223992 63016 25896 214194 218395
        008 (us):  40604 25152 24368 4264 24440 25115
        009 (us):  6858 3065 5815 810 3286 2116
        010 (us):  1947 936 1452 151 474 361
      ...
         Minimum:  1 1 1 1 1 1 (us)
         Average:  1.000 1.000 1.000 1.000 1.000 1.000 (us)
         Maximum:  37 38 49 28 28 19 (us)
         Max-Min:  36 37 48 27 27 18 (us)
        Duration:  599.667 599.667 599.667 599.667 599.667 599.667 (sec)
    Copy to Clipboard Toggle word wrap

    1
    在本例中,测量的延迟超出了最大允许的值。
返回顶部
Red Hat logoGithubredditYoutubeTwitter

学习

尝试、购买和销售

社区

关于红帽文档

通过我们的产品和服务,以及可以信赖的内容,帮助红帽用户创新并实现他们的目标。 了解我们当前的更新.

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

Theme

© 2025 Red Hat