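19.7. Validating single-node OpenShift cluster tuning for vDU application workloads

Before you can deploy virtual distributed unit (vDU) applications, you need to tune and configure the cluster host firmware and various other cluster configuration settings. Use the following information to validate the cluster configuration to support vDU workloads.

Additional resources

For more information about single-node OpenShift clusters tuned for vDU application deployments, see Reference configuration for deploying vDUs on single-node OpenShift.

19.7.1. Recommended firmware configuration for vDU cluster hosts

Use the following table as the basis to configure the cluster host firmware for vDU applications running on OpenShift Container Platform 4.10.

Note

The following table is a general recommendation for vDU cluster host firmware configuration. Exact firmware settings depend on your requirements and specific hardware platform. Automatic setting of firmware is not handled by the zero touch provisioning pipeline.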

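Table 19.7. Recommended cluster host firmware settings
Firmware setting	Configuration	Description
HyperTransport (HT)	Enabled	HyperTransport (HT) bus is a bus technology developed by AMD. HT provides a high-speed link between the components in the host memory and other system peripherals.
UEFI	Enabled	Enable booting from UEFI for the vDU host.
CPU Power and Performance Policy	Performance	Set the CPU power and performance policy so that the system is optimized for performance over energy efficiency.
Uncore Frequency Scaling	Disabled	Disable Uncore Frequency Scaling to prevent the voltage and frequency of non-core parts of the CPU from being set independently.
Uncore Frequency	Maximum	Sets the non-core parts of the CPU, such as the cache and memory controller, to operate at the maximum possible frequency.
Performance P-limit	Disabled	Disable Performance P-limit to prevent the Uncore frequency coordination of processors.
Enhanced Intel® SpeedStep Tech	Enabled	Enable Enhanced Intel SpeedStep to allow the system to dynamically adjust processor voltage and core frequency, which decreases power consumption and heat production in the host.
Intel® Turbo Boost Technology	Enabled	Enable Turbo Boost Technology for Intel-based CPUs to allow processor cores to run faster than the rated operating frequency if they are operating below power, current, and temperature specification limits.
Intel Configurable TDP	Enabled	Enables Thermal Design Power (TDP) for the CPU.
Configurable TDP Level	Level 2	The TDP level sets the CPU power consumption required for a particular performance rating. TDP level 2 sets the CPU to the most stable performance level at the cost of power consumption.
Energy Efficient Turbo	Disabled	Disable Energy Efficient Turbo to prevent the processor from using an energy-efficiency based policy.
Hardware P-States	Disabled	Disable `P-states` (performance states) to optimize the operating system and CPU for performance over power consumption.
Package C-State	C0/C1 state	Use C0 or C1 states to set the processor to a fully active state (C0) or to stop CPU internal clocks running in software (C1).
C1E	Disabled	CPU Enhanced Halt (C1E) is a power saving feature in Intel chips. Disabling C1E prevents the operating system from sending a halt command to the CPU when inactive.
Processor C6	Disabled	C6 power saving is a CPU feature that automatically disables idle CPU cores and cache. Disabling C6 improves system performance.
Sub-NUMA Clustering	Disabled	Sub-NUMA clustering divides the processor cores, cache, and memory into multiple NUMA domains. Disabling this option can increase performance for latency-sensitive workloads.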

Note

Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.

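19.7.2. Recommended cluster configurations to run vDU applications

Clusters running virtualized distributed unit (vDU) applications require a highly tuned and optimized configuration. The following information describes the various elements that you require to support vDU workloads in OpenShift Container Platform 4.10 clusters.

19.7.2.1. Recommended cluster MachineConfig CRs

The following MachineConfig CRs configure the cluster host:

Table 19.8. Recommended MachineConfig CRs
CR filename	Description
`02-workload-partitioning.yaml`	Configures workload partitioning for the cluster. Apply this `MachineConfig` CR when you install the cluster.
`MachineConfigSctp.yaml`	Loads the SCTP kernel module. This `MachineConfig` CR is optional and can be omitted if you do not require this kernel module.
`MachineConfigContainerMountNS.yaml`	Configures the container mount namespace and kubelet conf.
`MachineConfigAcceleratedStartup.yaml`	Configures accelerated startup for the cluster.
`06-kdump-master.yaml`, `06-kdump-worker.yaml`	Configures `kdump` for the cluster.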

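19.7.2.2. Recommended cluster Operators

Clusters running vDU applications require the following Operators, which are part of the baseline reference configuration:

Node Tuning Operator (NTO). NTO packages functionality that was previously delivered with the Performance Addon Operator, which is now a part of NTO.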
PTP Operator
Cluster Network Operator
Red Hat OpenShift Logging Operator
Local Storage Operator

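19.7.2.3. Recommended cluster kernel configuration

Always use the latest supported realtime kernel version in your cluster. Also ensure that the following configurations are applied in the cluster:

Ensure that the following additionalKernelArgs are set in the cluster performance profile: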

spec:
  additionalKernelArgs:
  - "idle=poll"
  - "rcupdate.rcu_normal_after_boot=0"
  - "efi=runtime"

Ensure that the performance-patch profile in the Tuned CR configures the correct CPU isolation set, which must match the isolated CPU set in the related PerformanceProfile CR, for example:

spec:
  profile:
    - name: performance-patch
      # The 'include' line must match the associated PerformanceProfile name
      # And the cmdline_crash CPU set must match the 'isolated' set in the associated PerformanceProfile
      data: |
        [main]
        summary=Configuration changes profile inherited from performance created tuned
        include=openshift-node-performance-openshift-node-performance-profile
        [bootloader]
        cmdline_crash=nohz_full=2-51,54-103 1
        [sysctl]
        kernel.timer_migration=1
        [scheduler]
        group.ice-ptp=0:f:10:*:ice-ptp.*
        [service]
        service.stalld=start,enable
        service.chronyd=stop,disable

1: The CPUs listed depend on the host hardware configuration, specifically the number of available CPUs in the system and the CPU topology.
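
The requirement that the nohz_full CPU list in the Tuned profile matches the isolated set in the PerformanceProfile can be checked with a short script. The following is a minimal sketch, not part of the reference configuration: the function names extract_nohz_full and cpu_sets_match are hypothetical, and the comparison is a plain string comparison that assumes both lists are written in the same form (for example 2-51,54-103).

```shell
# Hypothetical helper: read Tuned profile data on stdin and print the
# CPU list that follows "nohz_full=" (first occurrence only).
extract_nohz_full() {
  sed -n 's/.*nohz_full=\([0-9,-]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the two CPU list strings are identical.
# Assumption: both lists use the same notation; "2-3" and "2,3" would not match.
cpu_sets_match() {
  [ "$1" = "$2" ]
}

# Possible usage against a live cluster (resource names as used in this section):
# isolated=$(oc get performanceprofile openshift-node-performance-profile \
#   -o jsonpath='{.spec.cpu.isolated}')
# nohz=$(oc get tuneds.tuned.openshift.io performance-patch \
#   -n openshift-cluster-node-tuning-operator \
#   -o jsonpath='{.spec.profile[0].data}' | extract_nohz_full)
# cpu_sets_match "$isolated" "$nohz" && echo "match" || echo "MISMATCH"
```

Keeping the extraction separate from the comparison makes it possible to test the parsing on saved output without access to a cluster.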

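19.7.2.4. Checking the realtime kernel version

Always use the latest version of the realtime kernel in your OpenShift Container Platform clusters. If you are unsure about the kernel version that is in use in the cluster, you can compare the current realtime kernel version to the release version.

Prerequisites

You have installed the OpenShift CLI (oc).
You are logged in as a user with cluster-admin privileges.
You have installed podman.

Procedure

Run the following command to get the cluster version: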

$ OCP_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}')

Get the release image SHA number:

$ DTK_IMAGE=$(oc adm release info --image-for=driver-toolkit quay.io/openshift-release-dev/ocp-release:$OCP_VERSION-x86_64)

Run the release image container and extract the kernel version that is packaged with the cluster's current release:
```
$ podman run --rm $DTK_IMAGE rpm -qa | grep 'kernel-rt-core-' | sed 's#kernel-rt-core-##'
```
Example output
```
4.18.0-305.49.1.rt7.121.el8_4.x86_64
```
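This is the default realtime kernel version that ships with the release.
Note
The realtime kernel is denoted by the string .rt in the kernel version.

Verification

Check that the kernel version listed for the cluster's current release matches the realtime kernel that is actually running in the cluster. Run the following commands to check the running realtime kernel version:

Open a remote shell connection to the cluster node: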
```
$ oc debug node/<node_name>
```

Check the realtime kernel version:

sh-4.4# uname -r

Example output

4.18.0-305.49.1.rt7.121.el8_4.x86_64
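
The comparison between the release kernel version and the running kernel version can be scripted. The following is a minimal sketch; the function names is_rt_kernel and check_rt_kernel are hypothetical, and the check relies only on the .rt marker described in the note above.

```shell
# Hypothetical helper: succeed if the version string contains ".rt",
# which denotes a realtime kernel.
is_rt_kernel() {
  case "$1" in
    *.rt*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: compare the kernel version shipped with the release ($1)
# with the version reported by the node ($2) and print a short verdict.
check_rt_kernel() {
  expected=$1
  running=$2
  if ! is_rt_kernel "$running"; then
    echo "not-realtime"
  elif [ "$expected" = "$running" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# Possible usage, assuming the values were collected with the commands above:
# expected=$(podman run --rm $DTK_IMAGE rpm -qa | grep 'kernel-rt-core-' | sed 's#kernel-rt-core-##')
# running=$(oc debug node/<node_name> -- chroot /host uname -r)
# check_rt_kernel "$expected" "$running"
```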

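19.7.3. Checking that the recommended cluster configurations are applied

You can check that clusters are running the correct configuration. The following procedure describes how to check the various configurations that you require to deploy a DU application in OpenShift Container Platform 4.10 clusters.

Prerequisites

You have deployed a cluster and tuned it for vDU workloads.
You have installed the OpenShift CLI (oc).
You have logged in as a user with cluster-admin privileges.

Procedure

Check that the default OperatorHub sources are disabled. Run the following command: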
```
$ oc get operatorhub cluster -o yaml
```
Example output
```
spec:
    disableAllDefaultSources: true
```

Check that all required CatalogSource resources are annotated for workload partitioning (PreferredDuringScheduling) by running the following command:

$ oc get catalogsource -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.target\.workload\.openshift\.io/management}{"\n"}{end}'

Example output

certified-operators -- {"effect": "PreferredDuringScheduling"}
community-operators -- {"effect": "PreferredDuringScheduling"}
ran-operators 1
redhat-marketplace -- {"effect": "PreferredDuringScheduling"}
redhat-operators -- {"effect": "PreferredDuringScheduling"}

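1: CatalogSource resources that are not annotated are also returned. In this example, the ran-operators CatalogSource resource is not annotated and does not have the PreferredDuringScheduling annotation.

Note

In a properly configured vDU cluster, only a single annotated catalog source is listed.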
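
To spot catalog sources that lack the annotation without reading the output by eye, the command output can be filtered. The following is a minimal sketch; the function name unannotated_sources is hypothetical, and it assumes the "name -- value" line format produced by the jsonpath expression above, where an unannotated resource has nothing after the separator.

```shell
# Hypothetical helper: read "name -- annotation" lines on stdin and print the
# names whose annotation part is empty.
unannotated_sources() {
  awk -F ' -- ' '$1 != "" && $2 == "" { print $1 }'
}

# Possible usage (the jsonpath expression is the one shown above):
# oc get catalogsource -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.target\.workload\.openshift\.io/management}{"\n"}{end}' \
#   | unannotated_sources
```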

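Check that all applicable OpenShift Container Platform Operator namespaces are annotated for workload partitioning. This includes all Operators installed with core OpenShift Container Platform and the set of additional Operators included in the reference DU tuning configuration. Run the following command: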
```
$ oc get namespaces -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.workload\.openshift\.io/allowed}{"\n"}{end}'
```
Example output
```
default --
openshift-apiserver -- management
openshift-apiserver-operator -- management
openshift-authentication -- management
openshift-authentication-operator -- management
```
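Important
Additional Operators must not be annotated for workload partitioning. In the output from the previous command, additional Operators should be listed without any value on the right-hand side of the -- separator.

Check that the ClusterLogging configuration is correct. Run the following commands:

Validate that the appropriate input and output logs are configured: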

$ oc get -n openshift-logging ClusterLogForwarder instance -o yaml

Example output

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  creationTimestamp: "2022-07-19T21:51:41Z"
  generation: 1
  name: instance
  namespace: openshift-logging
  resourceVersion: "1030342"
  uid: 8c1a842d-80c5-447a-9150-40350bdf40f0
spec:
  inputs:
  - infrastructure: {}
    name: infra-logs
  outputs:
  - name: kafka-open
    type: kafka
    url: tcp://10.46.55.190:9092/test
  pipelines:
  - inputRefs:
    - audit
    name: audit-logs
    outputRefs:
    - kafka-open
  - inputRefs:
    - infrastructure
    name: infrastructure-logs
    outputRefs:
    - kafka-open
...

Check that the curation schedule is appropriate for your application:

$ oc get -n openshift-logging clusterloggings.logging.openshift.io instance -o yaml

Example output

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  creationTimestamp: "2022-07-07T18:22:56Z"
  generation: 1
  name: instance
  namespace: openshift-logging
  resourceVersion: "235796"
  uid: ef67b9b8-0e65-4a10-88ff-ec06922ea796
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  curation:
    curator:
      schedule: 30 3 * * *
    type: curator
  managementState: Managed
...

Check that the web console is disabled (managementState: Removed) by running the following command:
```
$ oc get consoles.operator.openshift.io cluster -o jsonpath="{ .spec.managementState }"
```
Example output
```
Removed
```

Check that chronyd is disabled on the cluster node by running the following commands:

$ oc debug node/<node_name>

Check the status of chronyd on the node:

sh-4.4# chroot /host

sh-4.4# systemctl status chronyd

Example output

● chronyd.service - NTP client/server
    Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: enabled)
    Active: inactive (dead)
      Docs: man:chronyd(8)
            man:chrony.conf(5)
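
The two facts to read from this output are that the unit is disabled and that it is inactive. The following is a minimal sketch that checks both on captured output; the function name chronyd_is_off is hypothetical, and the parsing assumes the systemctl status layout shown above.

```shell
# Hypothetical helper: read "systemctl status chronyd" output on stdin and
# succeed only if the unit is both disabled and inactive.
chronyd_is_off() {
  awk '
    $1 == "Loaded:" && /; disabled;/    { disabled = 1 }
    $1 == "Active:" && $2 == "inactive" { inactive = 1 }
    END { exit !(disabled && inactive) }'
}

# Possible usage inside the node debug shell, after "chroot /host":
# systemctl status chronyd | chronyd_is_off && echo "chronyd is off"
```

On the node itself, systemctl is-enabled chronyd and systemctl is-active chronyd report the same two facts directly, so the parser is mainly useful for output that was saved earlier.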

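Check that the PTP interface is successfully synchronized to the primary clock by using a remote shell connection to the linuxptp-daemon container and the PTP Management Client (pmc) tool:

Set the $PTP_POD_NAME variable with the name of the linuxptp-daemon pod by running the following command: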
```
$ PTP_POD_NAME=$(oc get pods -n openshift-ptp -l app=linuxptp-daemon -o name)
```

Run the following command to check the sync status of the PTP device:

$ oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'

Example output

sending: GET PORT_DATA_SET
  3cecef.fffe.7a7020-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET
    portIdentity            3cecef.fffe.7a7020-1
    portState               SLAVE
    logMinDelayReqInterval  -4
    peerMeanPathDelay       0
    logAnnounceInterval     1
    announceReceiptTimeout  3
    logSyncInterval         0
    delayMechanism          1
    logMinPdelayReqInterval 0
    versionNumber           2
  3cecef.fffe.7a7020-2 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET
    portIdentity            3cecef.fffe.7a7020-2
    portState               LISTENING
    logMinDelayReqInterval  0
    peerMeanPathDelay       0
    logAnnounceInterval     1
    announceReceiptTimeout  3
    logSyncInterval         0
    delayMechanism          1
    logMinPdelayReqInterval 0
    versionNumber           2

Run the following pmc command to check the PTP clock status:

$ oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET TIME_STATUS_NP'

Example output

sending: GET TIME_STATUS_NP
  3cecef.fffe.7a7020-0 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
    master_offset              10 1
    ingress_time               1657275432697400530
    cumulativeScaledRateOffset +0.000000000
    scaledLastGmPhaseChange    0
    gmTimeBaseIndicator        0
    lastGmPhaseChange          0x0000'0000000000000000.0000
    gmPresent                  true 2
    gmIdentity                 3c2c30.ffff.670e00

1: master_offset should be between -100 and 100 ns.
2: Indicates that the PTP clock is synchronized to a master, and the local clock is not the grandmaster clock.
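
Both conditions in the callouts can be evaluated mechanically from the TIME_STATUS_NP output. The following is a minimal sketch; the function name check_time_status and the verdict strings are hypothetical, and the ±100 ns bound is the one stated in callout 1.

```shell
# Hypothetical helper: read "GET TIME_STATUS_NP" output on stdin and print a verdict.
# Succeeds only if master_offset is within -100..100 ns and gmPresent is true.
check_time_status() {
  awk '
    $1 == "master_offset" { offset = $2; have_offset = 1 }
    $1 == "gmPresent"     { gm = $2 }
    END {
      if (!have_offset)                  { print "no-offset"; exit 1 }
      if (offset < -100 || offset > 100) { print "offset-out-of-range"; exit 1 }
      if (gm != "true")                  { print "no-grandmaster"; exit 1 }
      print "ok"
    }'
}

# Possible usage (the pmc command is the one shown above):
# oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} \
#   pmc -u -f /var/run/ptp4l.0.config -b 0 'GET TIME_STATUS_NP' | check_time_status
```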

Check that the expected master offset value corresponding to the value in /var/run/ptp4l.0.config is found in the linuxptp-daemon-container log:

$ oc logs $PTP_POD_NAME -n openshift-ptp -c linuxptp-daemon-container

Example output

phc2sys[56020.341]: [ptp4l.1.config] CLOCK_REALTIME phc offset  -1731092 s2 freq -1546242 delay    497
ptp4l[56020.390]: [ptp4l.1.config] master offset         -2 s2 freq   -5863 path delay       541
ptp4l[56020.390]: [ptp4l.0.config] master offset         -8 s2 freq  -10699 path delay       533

Check that the SR-IOV configuration is correct by running the following commands:

Check that the disableDrain value in the SriovOperatorConfig resource is set to true:

$ oc get sriovoperatorconfig -n openshift-sriov-network-operator default -o jsonpath="{.spec.disableDrain}{'\n'}"

Example output

true

Check that the SriovNetworkNodeState sync status is Succeeded by running the following command:

$ oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o jsonpath="{.items[*].status.syncStatus}{'\n'}"

Example output

Succeeded

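Verify that the expected number and configuration of virtual functions (Vfs) exist under each interface configured for SR-IOV and are correct in the .status.interfaces field. For example: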

$ oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o yaml

Example output

apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
...
  status:
    interfaces:
    ...
    - Vfs:
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.0
        vendor: "8086"
        vfID: 0
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.1
        vendor: "8086"
        vfID: 1
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.2
        vendor: "8086"
        vfID: 2
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.3
        vendor: "8086"
        vfID: 3
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.4
        vendor: "8086"
        vfID: 4
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.5
        vendor: "8086"
        vfID: 5
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.6
        vendor: "8086"
        vfID: 6
      - deviceID: 154c
        driver: vfio-pci
        pciAddress: 0000:3b:0a.7
        vendor: "8086"
        vfID: 7
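
Counting the virtual functions in a long YAML listing is easier with a small filter. The following is a minimal sketch; the function name count_vfs is hypothetical, and it counts vfID entries across the whole input, so it assumes the input covers only the interface or node that you want to check.

```shell
# Hypothetical helper: read SriovNetworkNodeState YAML on stdin and print the
# number of VF entries, identified by their vfID field.
count_vfs() {
  awk '
    $1 == "vfID:" || ($1 == "-" && $2 == "vfID:") { n++ }
    END { print n + 0 }'
}

# Possible usage (the oc command is the one shown above):
# oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o yaml | count_vfs
```

For the example output above, the result would be 8, one for each vfID from 0 to 7.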

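Check that the cluster performance profile is correct. The cpu and hugepages sections vary depending on your hardware configuration. Run the following command: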

$ oc get PerformanceProfile openshift-node-performance-profile -o yaml

Example output

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2022-07-19T21:51:31Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: openshift-node-performance-profile
  resourceVersion: "33558"
  uid: 217958c0-9122-4c62-9d4d-fdc27c31118c
spec:
  additionalKernelArgs:
  - idle=poll
  - rcupdate.rcu_normal_after_boot=0
  - efi=runtime
  cpu:
    isolated: 2-51,54-103
    reserved: 0-1,52-53
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
status:
  conditions:
  - lastHeartbeatTime: "2022-07-19T21:51:31Z"
    lastTransitionTime: "2022-07-19T21:51:31Z"
    status: "True"
    type: Available
  - lastHeartbeatTime: "2022-07-19T21:51:31Z"
    lastTransitionTime: "2022-07-19T21:51:31Z"
    status: "True"
    type: Upgradeable
  - lastHeartbeatTime: "2022-07-19T21:51:31Z"
    lastTransitionTime: "2022-07-19T21:51:31Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2022-07-19T21:51:31Z"
    lastTransitionTime: "2022-07-19T21:51:31Z"
    status: "False"
    type: Degraded
  runtimeClass: performance-openshift-node-performance-profile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile

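Note

CPU settings vary depending on the number of cores available on the server and should align with the workload partitioning settings. The hugepages configuration depends on the server and the application.

Check that the PerformanceProfile was successfully applied to the cluster by running the following command: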

$ oc get performanceprofile openshift-node-performance-profile -o jsonpath="{range .status.conditions[*]}{ @.type }{' -- '}{@.status}{'\n'}{end}"

Example output

Available -- True
Upgradeable -- True
Progressing -- False
Degraded -- False
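
The four condition values can be compared against the healthy state shown above in one step. The following is a minimal sketch; the function name check_profile_conditions is hypothetical, and the expected values are taken from the example output (Available and Upgradeable True, Progressing and Degraded False).

```shell
# Hypothetical helper: read "Type -- Status" lines on stdin and verify that
# each condition has the value shown in the example output above.
check_profile_conditions() {
  awk -F ' -- ' '
    BEGIN {
      want["Available"]   = "True"
      want["Upgradeable"] = "True"
      want["Progressing"] = "False"
      want["Degraded"]    = "False"
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 != want[$1]) bad = bad " " $1
    }
    END {
      # Report any expected condition that was missing from the input.
      for (t in want) if (!(t in seen)) bad = bad " " t
      if (bad != "") { print "unexpected:" bad; exit 1 }
      print "ok"
    }'
}

# Possible usage: pipe the output of the jsonpath command above into the function.
```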

Check the Tuned performance-patch settings by running the following command:

$ oc get tuneds.tuned.openshift.io -n openshift-cluster-node-tuning-operator performance-patch -o yaml

Example output

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  creationTimestamp: "2022-07-18T10:33:52Z"
  generation: 1
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
  resourceVersion: "34024"
  uid: f9799811-f744-4179-bf00-32d4436c08fd
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-openshift-node-performance-profile
      [bootloader]
      cmdline_crash=nohz_full=2-23,26-47 1
      [sysctl]
      kernel.timer_migration=1
      [scheduler]
      group.ice-ptp=0:f:10:*:ice-ptp.*
      [service]
      service.stalld=start,enable
      service.chronyd=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch

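1: The cpu list in cmdline=nohz_full= varies based on your hardware configuration.

Check that cluster networking diagnostics are disabled by running the following command: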

$ oc get networks.operator.openshift.io cluster -o jsonpath='{.spec.disableNetworkDiagnostics}'

Example output

true

Check that the Kubelet housekeeping interval is tuned to a slower rate. This is set in the containerMountNS machine config. Run the following command:

$ oc describe machineconfig container-mount-namespace-and-kubelet-conf-master | grep OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION

Example output

Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"

Check that Grafana and alertManagerMain are disabled and that the Prometheus retention period is set to 24h by running the following command:

$ oc get configmap cluster-monitoring-config -n openshift-monitoring -o jsonpath="{ .data.config\.yaml }"

Example output

grafana:
  enabled: false
alertmanagerMain:
  enabled: false
prometheusK8s:
   retention: 24h

Use the following commands to verify that the Grafana and alertManagerMain routes are not found in the cluster:
```
$ oc get route -n openshift-monitoring alertmanager-main
```
```
$ oc get route -n openshift-monitoring grafana
```
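Both queries should return Error from server (NotFound) messages.

Check that a minimum of 4 CPUs are allocated as reserved for each of the PerformanceProfile, Tuned performance-patch, workload partitioning, and kernel command-line arguments by running the following command: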
```
$ oc get performanceprofile -o jsonpath="{ .items[0].spec.cpu.reserved }"
```
Example output
```
0-1,52-53
```
Note
Depending on your workload requirements, you might require additional reserved CPUs to be allocated.
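
A CPU list such as 0-1,52-53 has to be expanded before it can be compared with the minimum of 4 reserved CPUs. The following is a minimal sketch of that arithmetic; the function names count_cpus and has_min_reserved are hypothetical, and the input is assumed to be a comma-separated list of single CPUs and inclusive ranges.

```shell
# Hypothetical helper: print the number of CPUs in a list such as "0-1,52-53".
count_cpus() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F '-' '
    NF == 1 && $1 != "" { total += 1 }            # single CPU, for example "4"
    NF == 2             { total += $2 - $1 + 1 }  # inclusive range, for example "0-1"
    END { print total + 0 }'
}

# Hypothetical helper: succeed if the list contains at least $2 CPUs (default 4).
has_min_reserved() {
  [ "$(count_cpus "$1")" -ge "${2:-4}" ]
}

# Possible usage (the oc command is the one shown above):
# reserved=$(oc get performanceprofile -o jsonpath="{ .items[0].spec.cpu.reserved }")
# has_min_reserved "$reserved" && echo "reserved CPUs: OK"
```

For the example output 0-1,52-53, the two ranges contribute 2 CPUs each, so the count is 4 and the minimum is met.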
