19.6. Recommended single-node OpenShift cluster configuration for vDU application workloads


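Use the following reference information to understand the single-node OpenShift configurations required to deploy virtual distributed unit (vDU) applications in the cluster. Configurations include cluster optimizations for high performance workloads, enabling workload partitioning, and minimizing the number of reboots required after installation.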


19.6.1. Running low latency applications on OpenShift Container Platform

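OpenShift Container Platform enables low latency processing for applications running on commercial off-the-shelf (COTS) hardware by using several technologies and specialized hardware devices: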

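Real-time kernel for RHCOS
Ensures workloads are handled with a high degree of process determinism.
CPU isolation
Avoids CPU scheduling delays and ensures that CPU capacity is consistently available.
NUMA-aware topology management
Aligns memory and huge pages with CPU and PCI devices to pin container memory and huge pages to the non-uniform memory access (NUMA) node. Pod resources for all quality of service (QoS) classes stay on the same NUMA node. This decreases latency and improves the performance of the node.
Huge pages memory management
Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.
Precision timing synchronization using PTP
Allows synchronization between nodes in the network with sub-microsecond accuracy.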

19.6.2. Recommended cluster host requirements for vDU application workloads

Running vDU application workloads requires a bare-metal host with sufficient resources to run OpenShift Container Platform services and production workloads.

Table 19.8. Minimum resource requirements

Profile | vCPU | Memory | Storage
Minimum | 4 to 8 vCPUs | 32GB of RAM | 120GB

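Note

One vCPU is equivalent to one physical core. However, if you enable simultaneous multithreading (SMT), or Hyper-Threading, use the following formula to calculate the number of vCPUs that represent one physical core:

  • (threads per core x cores) x sockets = vCPUs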
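As a worked example of this formula, a host with 2 threads per core, 26 cores per socket, and 2 sockets has (2 x 26) x 2 = 104 vCPUs, which is consistent with the 0-103 CPU range used by the sample PerformanceProfile later in this section. The topology figures are assumptions for illustration, and vcpu_count is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Compute the vCPU count from the host topology:
#   (threads per core x cores per socket) x sockets = vCPUs
# vcpu_count is a hypothetical helper used only for illustration.
vcpu_count() {
  local threads_per_core=$1 cores_per_socket=$2 sockets=$3
  echo $(( (threads_per_core * cores_per_socket) * sockets ))
}

# Assumed topology: 2 threads per core, 26 cores per socket, 2 sockets.
vcpu_count 2 26 2
```

On a Linux host, lscpu reports the matching "Thread(s) per core", "Core(s) per socket", and "Socket(s)" values.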
Important

The server must have a baseboard management controller (BMC) when booting with virtual media.

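19.6.3. Configuring host firmware for low latency and high performance

Bare-metal hosts require the firmware to be configured before the host can be provisioned. The firmware configuration depends on your specific hardware and the particular requirements of your installation.

Procedure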

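  1. Set the UEFI/BIOS Boot Mode to UEFI.
  2. In the host boot order sequence, set Hard drive first.
  3. Apply the specific firmware configuration for your hardware. The following table describes a representative firmware configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.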

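    Important

    The exact firmware configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

    Table 19.9. Sample firmware configuration for Intel Xeon Skylake or Cascade Lake servers

    Firmware setting | Configuration
    CPU Power and Performance Policy | Performance
    Uncore Frequency Scaling | Disabled
    Performance P-limit | Disabled
    Enhanced Intel SpeedStep ® Tech | Enabled
    Intel Configurable TDP | Enabled
    Configurable TDP Level | Level 2
    Intel® Turbo Boost Technology | Enabled
    Energy Efficient Turbo | Disabled
    Hardware P-States | Disabled
    Package C-State | C0/C1 state
    C1E | Disabled
    Processor C6 | Disabled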

Note

Enable the global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.

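19.6.4. Connectivity prerequisites for managed cluster networks

Before you can install and provision a managed cluster with the GitOps Zero Touch Provisioning (ZTP) pipeline, the managed cluster host must meet the following networking prerequisites:

  • There must be bi-directional connectivity between the GitOps ZTP container in the hub cluster and the Baseboard Management Controller (BMC) of the target bare-metal host.
  • The managed cluster must be able to resolve and reach the API hostname of the hub cluster and the *.apps hostname. The following are examples of the hub API hostname and *.apps hostname:

    • api.hub-cluster.internal.domain.com
    • console-openshift-console.apps.hub-cluster.internal.domain.com
  • The hub cluster must be able to resolve and reach the API and *.apps hostnames of the managed cluster. The following are examples of the managed cluster API hostname and *.apps hostname:

    • api.sno-managed-cluster-1.internal.domain.com
    • console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com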
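A quick way to confirm the name-resolution half of these prerequisites is to look up each hostname from the machine that must reach it. The following is a minimal bash sketch, assuming getent is available (it is on glibc-based systems); check_resolvable is a hypothetical helper name, and it checks resolution only, not whether the API or *.apps endpoints are reachable:

```shell
#!/usr/bin/env bash
# Report whether each hostname resolves on this machine. getent queries the
# system resolver (NSS), so it honors /etc/hosts as well as DNS.
# check_resolvable is a hypothetical helper; it does not test reachability.
check_resolvable() {
  local host failed=0
  for host in "$@"; do
    if getent hosts "$host" >/dev/null; then
      echo "OK:   $host"
    else
      echo "FAIL: $host does not resolve"
      failed=1
    fi
  done
  return "$failed"
}

# Example, run where the hub cluster's name resolution applies:
#   check_resolvable api.sno-managed-cluster-1.internal.domain.com \
#     console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
```

The function returns a non-zero status if any name fails, so it can gate a provisioning script.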

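19.6.5. Workload partitioning in single-node OpenShift with GitOps ZTP

Workload partitioning configures OpenShift Container Platform services, cluster management workloads, and infrastructure pods to run on a reserved number of host CPUs.

To configure workload partitioning with GitOps Zero Touch Provisioning (ZTP), you configure the cpuPartitioningMode field in the SiteConfig custom resource (CR) that you use to install the cluster, and you apply a PerformanceProfile CR that configures the isolated and reserved CPUs on the host.

Configuring the SiteConfig CR enables workload partitioning at cluster installation time, and applying the PerformanceProfile CR configures the specific allocation of CPUs to the reserved and isolated sets. These two steps happen at different points during cluster provisioning.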

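Note

Configuring workload partitioning by using the cpuPartitioningMode field in the SiteConfig CR is a Technology Preview feature in OpenShift Container Platform 4.13.

Alternatively, you can specify cluster management CPU resources with the cpuset field of the SiteConfig custom resource (CR) and the reserved field of the group PolicyGenTemplate CR. The GitOps ZTP pipeline uses these values to populate the required fields in the workload partitioning MachineConfig CR (cpuset) and the PerformanceProfile CR (reserved) that configure the single-node OpenShift cluster. This method is a General Availability (GA) feature in OpenShift Container Platform 4.14.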

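The workload partitioning configuration pins the OpenShift Container Platform infrastructure pods to the reserved CPU set. Platform services such as systemd, CRI-O, and kubelet run on the reserved CPU set. The isolated CPU sets are exclusively allocated to your container workloads. Isolating CPUs ensures that the workload has guaranteed access to the specified CPUs without contention from other applications running on the same node. All CPUs that are not isolated should be reserved.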

Important

Ensure that the reserved and isolated CPU sets do not overlap with each other.
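The two rules above (the sets must not overlap, and every CPU that is not isolated should be reserved) can be checked before you commit a PerformanceProfile. The following bash sketch assumes that CPU IDs are numbered contiguously from 0 and that you know the total CPU count; expand_cpus and check_cpu_sets are hypothetical helper names, not part of GitOps ZTP:

```shell
#!/usr/bin/env bash
# Expand a Linux CPU list such as "0-1,52-53" into one CPU ID per line.
expand_cpus() {
  local part
  local IFS=','
  for part in $1; do
    if [[ $part == *-* ]]; then
      seq "${part%-*}" "${part#*-}"
    else
      echo "$part"
    fi
  done
}

# Check that the reserved and isolated CPU lists do not overlap and that,
# together, they cover every CPU from 0 to total-1 (assumed contiguous).
check_cpu_sets() {
  local reserved=$1 isolated=$2 total=$3
  local overlap union
  # A CPU that appears in both deduplicated lists is an overlap.
  overlap=$( { expand_cpus "$reserved" | sort -un; expand_cpus "$isolated" | sort -un; } | sort -n | uniq -d )
  if [[ -n $overlap ]]; then
    echo "overlap on CPU(s): $(paste -sd, - <<< "$overlap")"
    return 1
  fi
  # The union must equal the full range 0..total-1, with no gaps.
  union=$( { expand_cpus "$reserved"; expand_cpus "$isolated"; } | sort -un )
  if [[ $union != "$(seq 0 $((total - 1)))" ]]; then
    echo "reserved and isolated CPUs do not cover CPUs 0-$((total - 1))"
    return 1
  fi
  echo "ok"
}

# Values from the sample PerformanceProfile in this section, assuming 104 CPUs.
check_cpu_sets "0-1,52-53" "2-51,54-103" 104
```

For the sample values this prints ok, while check_cpu_sets 0-3 2-7 8 prints overlap on CPU(s): 2,3 and returns a non-zero status.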


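19.6.6. Recommended cluster install manifests

The ZTP pipeline applies the following custom resources (CRs) during cluster installation. These configuration CRs ensure that the cluster meets the feature and performance requirements necessary for running a vDU application.

Note

When you use the GitOps ZTP plugin and SiteConfig CRs for cluster deployment, the following MachineConfig CRs are included by default.

Use the SiteConfig extraManifests filter to alter the CRs that are included by default. For more information, see Advanced managed cluster configuration with SiteConfig CRs.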

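19.6.6.1. Workload partitioning

Single-node OpenShift clusters that run DU workloads require workload partitioning. This limits the cores allowed to run platform services, maximizing the CPU cores available for application payloads.

Note

Workload partitioning can be enabled during cluster installation only. You cannot disable workload partitioning after installation. However, you can change the set of CPUs assigned to the isolated and reserved sets through the PerformanceProfile CR. Changes to CPU settings cause the node to reboot.

Upgrading from OpenShift Container Platform 4.12 to 4.13+

When you use cpuPartitioningMode to enable workload partitioning, remove the workload partitioning MachineConfig CRs from the /extra-manifest folder that you use to provision the cluster.

Recommended SiteConfig CR configuration for workload partitioning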

apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "<site_name>"
  namespace: "<site_name>"
spec:
  baseDomain: "example.com"
  cpuPartitioningMode: AllNodes

Set the cpuPartitioningMode field to AllNodes to configure workload partitioning for all nodes in the cluster.

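Verification

Check that the applications and cluster system CPU pinning is correct. Run the following commands:

  1. Open a remote shell prompt to the managed cluster:

    $ oc debug node/example-sno-1
  2. Check that the OpenShift infrastructure applications CPU pinning is correct: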

    sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done

    Example output

    pid 8481's current affinity list: 0-1,52-53
    pid 8726's current affinity list: 0-1,52-53
    pid 9088's current affinity list: 0-1,52-53
    pid 9945's current affinity list: 0-1,52-53
    pid 10387's current affinity list: 0-1,52-53
    pid 12123's current affinity list: 0-1,52-53
    pid 13313's current affinity list: 0-1,52-53

  3. Check that the system applications CPU pinning is correct:

    sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done

    Example output

    pid 1's current affinity list: 0-1,52-53
    pid 938's current affinity list: 0-1,52-53
    pid 962's current affinity list: 0-1,52-53
    pid 1197's current affinity list: 0-1,52-53
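On a node with many processes, comparing this output by eye is error prone. The following minimal bash sketch reads the output of the loops above from standard input and reports any process pinned outside the expected set. It assumes that the expected reserved list is written in the same ascending range form that taskset prints (for example 0-1,52-53), because it compares the lists as strings; check_affinity is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Read `taskset -cp` output lines ("pid N's current affinity list: LIST")
# from stdin and flag any process whose affinity differs from the expected
# reserved CPU list. check_affinity is a hypothetical helper; it compares the
# lists as plain strings, so write the expected list in taskset's form.
check_affinity() {
  local expected=$1 line list bad=0
  while IFS= read -r line; do
    [[ -z $line ]] && continue
    list=${line##*: }
    if [[ $list != "$expected" ]]; then
      echo "unexpected affinity: $line"
      bad=1
    fi
  done
  return "$bad"
}

# Usage inside the node debug shell, after defining the function:
#   pgrep ovn | while read i; do taskset -cp $i; done | check_affinity 0-1,52-53
```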

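19.6.6.2. Reduced platform management footprint

To reduce the overall management footprint of the platform, a MachineConfig custom resource (CR) is required that places all Kubernetes-specific mount points in a new namespace separate from the host operating system. The following base64-encoded example MachineConfig CR illustrates this configuration.

Recommended container mount namespace configuration (01-container-mount-ns-and-kubelet-conf-master.yaml)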

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: container-mount-namespace-and-kubelet-conf-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo=
        mode: 493
        path: /usr/local/bin/extractExecStart
      - contents:
          source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo=
        mode: 493
        path: /usr/local/bin/nsenterCmns
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          RuntimeDirectory=container-mount-namespace
          Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace
          Environment=BIND_POINT=%t/container-mount-namespace/mnt
          ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}"
          ExecStartPre=touch ${BIND_POINT}
          ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared /
          ExecStop=umount -R ${RUNTIME_DIRECTORY}
        name: container-mount-namespace.service
      - dropins:
        - contents: |
            [Unit]
            Wants=container-mount-namespace.service
            After=container-mount-namespace.service

            [Service]
            ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
            EnvironmentFile=-/%t/%N-execstart.env
            ExecStart=
            ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
                ${ORIG_EXECSTART}"
          name: 90-container-mount-namespace.conf
        name: crio.service
      - dropins:
        - contents: |
            [Unit]
            Wants=container-mount-namespace.service
            After=container-mount-namespace.service

            [Service]
            ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
            EnvironmentFile=-/%t/%N-execstart.env
            ExecStart=
            ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
                ${ORIG_EXECSTART} --housekeeping-interval=30s"
          name: 90-container-mount-namespace.conf
        - contents: |
            [Service]
            Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
            Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s"
          name: 30-kubelet-interval-tuning.conf
        name: kubelet.service
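The file contents in MachineConfig CRs such as this one are Ignition data URLs, so the embedded scripts are not human-readable as written. To inspect one before you apply it, decode the payload after the first comma. The following is a minimal bash sketch, assuming a base64 utility that accepts -d (as in GNU coreutils); decode_data_url is a hypothetical helper name, and it prints non-base64 payloads verbatim without percent-decoding them:

```shell
#!/usr/bin/env bash
# Print the decoded payload of an Ignition "data:" URL, as used in
# spec.config.storage.files[].contents.source.
# decode_data_url is a hypothetical helper: it base64-decodes payloads whose
# media type ends in ";base64" and prints other payloads unchanged.
decode_data_url() {
  local url=$1
  local meta=${url%%,*} payload=${url#*,}
  if [[ $meta == *";base64" ]]; then
    printf '%s' "$payload" | base64 -d
  else
    printf '%s' "$payload"
  fi
}

# Source value of /usr/local/bin/nsenterCmns from the CR above.
decode_data_url 'data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo='
```

For the nsenterCmns entry, this prints a two-line bash script that runs nsenter --mount=/run/container-mount-namespace/mnt "$@", which is how commands enter the shared container mount namespace.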

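19.6.6.3. SCTP

Stream Control Transmission Protocol (SCTP) is a key protocol used in RAN applications. This MachineConfig object adds the SCTP kernel module to the node to enable this protocol.

Recommended SCTP configuration (03-sctp-machine-config-master.yaml)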

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: load-sctp-module-master
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/modprobe.d/sctp-blacklist.conf
        - contents:
            source: data:text/plain;charset=utf-8,sctp
          filesystem: root
          mode: 420
          path: /etc/modules-load.d/sctp-load.conf
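After the node boots with this MachineConfig, you can confirm that the module is loaded, for example from an oc debug node shell. The following minimal bash sketch looks for sctp in /proc/modules. It assumes that sctp is built as a loadable module, because a module compiled into the kernel does not appear in /proc/modules; module_loaded is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Check whether a kernel module appears in a /proc/modules-style listing.
# Each line starts with the module name followed by a space.
# module_loaded is a hypothetical helper; the listing path defaults to
# /proc/modules and can be overridden for testing.
module_loaded() {
  local module=$1 listing=${2:-/proc/modules}
  [[ -r $listing ]] || return 2
  grep -q -- "^${module} " "$listing"
}

if module_loaded sctp; then
  echo "sctp module is loaded"
else
  echo "sctp module is not loaded"
fi
```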

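19.6.6.4. Accelerated container startup

The following MachineConfig CR configures core OpenShift processes and containers to use all available CPU cores during system startup and shutdown. This accelerates system recovery during the initial boot and during reboots.

Recommended accelerated container startup configuration (04-accelerated-container-startup-master.yaml). In this listing, the accelerated-container-startup.sh script is shown in decoded form after the base64 data URL prefix; in the manifest file itself, the script must be base64 encoded.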

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 04-accelerated-container-startup-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,#!/bin/bash
#
# Temporarily reset the core system processes' CPU affinity to be unrestricted to accelerate startup and shutdown
#
# The defaults below can be overridden via environment variables
#

# The default set of critical processes whose affinity should be temporarily unbound:
CRITICAL_PROCESSES=${CRITICAL_PROCESSES:-"crio kubelet NetworkManager conmon dbus"}

# Default wait time is 600s = 10m:
MAXIMUM_WAIT_TIME=${MAXIMUM_WAIT_TIME:-600}

# Default steady-state threshold = 2%
# Allowed values:
#  4  - absolute pod count (+/-)
#  4% - percent change (+/-)
#  -1 - disable the steady-state check
STEADY_STATE_THRESHOLD=${STEADY_STATE_THRESHOLD:-2%}

# Default steady-state window = 60s
# If the running pod count stays within the given threshold for this time
# period, return CPU utilization to normal before the maximum wait time
# expires
STEADY_STATE_WINDOW=${STEADY_STATE_WINDOW:-60}

# Default steady-state allows any pod count to be "steady state"
# Increasing this will skip any steady-state checks until the count rises above
# this number to avoid false positives if there are some periods where the
# count doesn't increase but we know we can't be at steady-state yet.
STEADY_STATE_MINIMUM=${STEADY_STATE_MINIMUM:-0}

#######################################################

KUBELET_CPU_STATE=/var/lib/kubelet/cpu_manager_state
FULL_CPU_STATE=/sys/fs/cgroup/cpuset/cpuset.cpus
KUBELET_CONF=/etc/kubernetes/kubelet.conf
unrestrictedCpuset() {
  local cpus
  if [[ -e $KUBELET_CPU_STATE ]]; then
    cpus=$(jq -r '.defaultCpuSet' <$KUBELET_CPU_STATE)
    if [[ -n "${cpus}" && -e ${KUBELET_CONF} ]]; then
      reserved_cpus=$(jq -r '.reservedSystemCPUs' </etc/kubernetes/kubelet.conf)
      if [[ -n "${reserved_cpus}" ]]; then
        # Use taskset to merge the two cpusets
        cpus=$(taskset -c "${reserved_cpus},${cpus}" grep -i Cpus_allowed_list /proc/self/status | awk '{print $2}')
      fi
    fi
  fi
  if [[ -z $cpus ]]; then
    # fall back to using all cpus if the kubelet state is not configured yet
    [[ -e $FULL_CPU_STATE ]] || return 1
    cpus=$(<$FULL_CPU_STATE)
  fi
  echo $cpus
}

restrictedCpuset() {
  for arg in $(</proc/cmdline); do
    if [[ $arg =~ ^systemd.cpu_affinity= ]]; then
      echo ${arg#*=}
      return 0
    fi
  done
  return 1
}

resetAffinity() {
  local cpuset="$1"
  local failcount=0
  local successcount=0
  logger "Recovery: Setting CPU affinity for critical processes \"$CRITICAL_PROCESSES\" to $cpuset"
  for proc in $CRITICAL_PROCESSES; do
    local pids="$(pgrep $proc)"
    for pid in $pids; do
      local tasksetOutput
      tasksetOutput="$(taskset -apc "$cpuset" $pid 2>&1)"
      if [[ $? -ne 0 ]]; then
        echo "ERROR: $tasksetOutput"
        ((failcount++))
      else
        ((successcount++))
      fi
    done
  done

  logger "Recovery: Re-affined $successcount pids successfully"
  if [[ $failcount -gt 0 ]]; then
    logger "Recovery: Failed to re-affine $failcount processes"
    return 1
  fi
}

setUnrestricted() {
  logger "Recovery: Setting critical system processes to have unrestricted CPU access"
  resetAffinity "$(unrestrictedCpuset)"
}

setRestricted() {
  logger "Recovery: Resetting critical system processes back to normally restricted access"
  resetAffinity "$(restrictedCpuset)"
}

currentAffinity() {
  local pid="$1"
  taskset -pc $pid | awk -F': ' '{print $2}'
}

within() {
  local last=$1 current=$2 threshold=$3
  local delta=0 pchange
  delta=$(( current - last ))
  if [[ $current -eq $last ]]; then
    pchange=0
  elif [[ $last -eq 0 ]]; then
    pchange=1000000
  else
    pchange=$(( ( $delta * 100) / last ))
  fi
  echo -n "last:$last current:$current delta:$delta pchange:${pchange}%: "
  local absolute limit
  case $threshold in
    *%)
      absolute=${pchange##-} # absolute value
      limit=${threshold%%%}
      ;;
    *)
      absolute=${delta##-} # absolute value
      limit=$threshold
      ;;
  esac
  if [[ $absolute -le $limit ]]; then
    echo "within (+/-)$threshold"
    return 0
  else
    echo "outside (+/-)$threshold"
    return 1
  fi
}

steadystate() {
  local last=$1 current=$2
  if [[ $last -lt $STEADY_STATE_MINIMUM ]]; then
    echo "last:$last current:$current Waiting to reach $STEADY_STATE_MINIMUM before checking for steady-state"
    return 1
  fi
  within $last $current $STEADY_STATE_THRESHOLD
}

waitForReady() {
  logger "Recovery: Waiting ${MAXIMUM_WAIT_TIME}s for the initialization to complete"
  local lastSystemdCpuset="$(currentAffinity 1)"
  local lastDesiredCpuset="$(unrestrictedCpuset)"
  local t=0 s=10
  local lastCcount=0 ccount=0 steadyStateTime=0
  while [[ $t -lt $MAXIMUM_WAIT_TIME ]]; do
    sleep $s
    ((t += s))
    # Re-check the current affinity of systemd, in case some other process has changed it
    local systemdCpuset="$(currentAffinity 1)"
    # Re-check the unrestricted Cpuset, as the allowed set of unreserved cores may change as pods are assigned to cores
    local desiredCpuset="$(unrestrictedCpuset)"
    if [[ $systemdCpuset != $lastSystemdCpuset || $lastDesiredCpuset != $desiredCpuset ]]; then
      resetAffinity "$desiredCpuset"
      lastSystemdCpuset="$(currentAffinity 1)"
      lastDesiredCpuset="$desiredCpuset"
    fi

    # Detect steady-state pod count
    ccount=$(crictl ps | wc -l)
    if steadystate $lastCcount $ccount; then
      ((steadyStateTime += s))
      echo "Steady-state for ${steadyStateTime}s/${STEADY_STATE_WINDOW}s"
      if [[ $steadyStateTime -ge $STEADY_STATE_WINDOW ]]; then
        logger "Recovery: Steady-state (+/- $STEADY_STATE_THRESHOLD) for ${STEADY_STATE_WINDOW}s: Done"
        return 0
      fi
    else
      if [[ $steadyStateTime -gt 0 ]]; then
        echo "Resetting steady-state timer"
        steadyStateTime=0
      fi
    fi
    lastCcount=$ccount
  done
  logger "Recovery: Recovery Complete Timeout"
}

main() {
  if ! unrestrictedCpuset >&/dev/null; then
    logger "Recovery: No unrestricted Cpuset could be detected"
    return 1
  fi

  if ! restrictedCpuset >&/dev/null; then
    logger "Recovery: No restricted Cpuset has been configured.  We are already running unrestricted."
    return 0
  fi

  # Ensure we reset the CPU affinity when we exit this script for any reason
  # This way either after the timer expires or after the process is interrupted
  # via ^C or SIGTERM, we return things back to the way they should be.
  trap setRestricted EXIT

  logger "Recovery: Recovery Mode Starting"
  setUnrestricted
  waitForReady
}

if [[ "${BASH_SOURCE[0]}" = "${0}" ]]; then
  main "${@}"
  exit $?
fi

        mode: 493
        path: /usr/local/bin/accelerated-container-startup.sh
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Unlocks more CPUs for critical system processes during container startup

          [Service]
          Type=simple
          ExecStart=/usr/local/bin/accelerated-container-startup.sh

          # Maximum wait time is 600s = 10m:
          Environment=MAXIMUM_WAIT_TIME=600

          # Steady-state threshold = 2%
          # Allowed values:
          #  4  - absolute pod count (+/-)
          #  4% - percent change (+/-)
          #  -1 - disable the steady-state check
          # Note: '%' must be escaped as '%%' in systemd unit files
          Environment=STEADY_STATE_THRESHOLD=2%%

          # Steady-state window = 120s
          # If the running pod count stays within the given threshold for this time
          # period, return CPU utilization to normal before the maximum wait time
          # expires
          Environment=STEADY_STATE_WINDOW=120

          # Steady-state minimum = 40
          # Increasing this will skip any steady-state checks until the count rises above
          # this number to avoid false positives if there are some periods where the
          # count doesn't increase but we know we can't be at steady-state yet.
          Environment=STEADY_STATE_MINIMUM=40

          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: accelerated-container-startup.service
      - contents: |
          [Unit]
          Description=Unlocks more CPUs for critical system processes during container shutdown
          DefaultDependencies=no

          [Service]
          Type=simple
          ExecStart=/usr/local/bin/accelerated-container-startup.sh

          # Maximum wait time is 600s = 10m:
          Environment=MAXIMUM_WAIT_TIME=600

          # Steady-state threshold
          # Allowed values:
          #  4  - absolute pod count (+/-)
          #  4% - percent change (+/-)
          #  -1 - disable the steady-state check
          # Note: '%' must be escaped as '%%' in systemd unit files
          Environment=STEADY_STATE_THRESHOLD=-1

          # Steady-state window = 60s
          # If the running pod count stays within the given threshold for this time
          # period, return CPU utilization to normal before the maximum wait time
          # expires
          Environment=STEADY_STATE_WINDOW=60

          [Install]
          WantedBy=shutdown.target reboot.target halt.target
        enabled: true
        name: accelerated-container-shutdown.service
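The percentage form of STEADY_STATE_THRESHOLD uses integer shell arithmetic, which affects where the cutoff falls. The following sketch restates the percentage branch of the script's within() function so that you can see how a given threshold behaves; pct_change is a hypothetical helper name and, like the script, it truncates toward zero:

```shell
#!/usr/bin/env bash
# Restates the percentage branch of within() from
# accelerated-container-startup.sh: the change between two samples of the
# crictl ps line count is (delta * 100) / last, in integer arithmetic that
# truncates toward zero. pct_change is a hypothetical helper name.
pct_change() {
  local last=$1 current=$2
  if (( current == last )); then
    echo 0
  elif (( last == 0 )); then
    echo 1000000  # the script uses this sentinel for growth from zero
  else
    echo $(( (current - last) * 100 / last ))
  fi
}

# With STEADY_STATE_THRESHOLD=2%, a sample is steady when |pct_change| <= 2.
for pair in "50 51" "40 41" "40 42"; do
  read -r last current <<< "$pair"
  change=$(pct_change "$last" "$current")
  if (( ${change#-} <= 2 )); then verdict="within 2%"; else verdict="outside 2%"; fi
  echo "$last -> $current: ${change}% ($verdict)"
done
```

A change from 40 to 41 is really 2.5%, but it truncates to 2 and therefore counts as steady. In the startup unit, STEADY_STATE_MINIMUM=40 additionally skips the check until the previous sample reaches 40.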

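19.6.6.5. Automatic kernel crash dumps with kdump

kdump is a Linux kernel feature that creates a kernel crash dump when the kernel crashes. kdump is enabled with the following MachineConfig CRs.

Recommended MachineConfig CR to remove the ice driver (05-kdump-config-master.yaml)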

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 05-kdump-config-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - enabled: true
        name: kdump-remove-ice-module.service
        contents: |
          [Unit]
          Description=Remove ice module when doing kdump
          Before=kdump.service
          [Service]
          Type=oneshot
          RemainAfterExit=true
          ExecStart=/usr/local/bin/kdump-remove-ice-module.sh
          [Install]
          WantedBy=multi-user.target
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo=
          mode: 448
          path: /usr/local/bin/kdump-remove-ice-module.sh

Recommended kdump configuration (06-kdump-master.yaml)

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 06-kdump-enable-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - enabled: true
        name: kdump.service
  kernelArguments:
    - crashkernel=512M
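The crashkernel=512M kernel argument reserves memory for the capture kernel that kdump loads. After the MachineConfig is applied and the node reboots, you can confirm that the argument is active. The following minimal bash sketch reads /proc/cmdline in the same way as restrictedCpuset() in the accelerated startup script; kernel_arg is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the value of a kernel argument such as crashkernel from a
# /proc/cmdline-style file. kernel_arg is a hypothetical helper; the file
# defaults to /proc/cmdline and can be overridden for testing.
kernel_arg() {
  local name=$1 cmdline_file=${2:-/proc/cmdline} arg
  [[ -r $cmdline_file ]] || return 2
  for arg in $(<"$cmdline_file"); do
    if [[ $arg == "$name="* ]]; then
      echo "${arg#*=}"
      return 0
    fi
  done
  return 1
}

kernel_arg crashkernel || echo "crashkernel is not set on this host"
```

You can also check that the service is running with systemctl is-active kdump.service.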

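19.6.6.6. Disabling automatic CRI-O cache wipe

After an uncontrolled host shutdown or cluster reboot, CRI-O automatically deletes the entire CRI-O cache, causing all images to be pulled from the registry when the node reboots. This can result in unacceptably slow recovery times or recovery failures. To prevent this from happening in single-node OpenShift clusters that you install with GitOps ZTP, disable the CRI-O delete cache feature during cluster installation.

Recommended MachineConfig CR to disable CRI-O cache wipe on control plane nodes (99-crio-disable-wipe-master.yaml)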

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-crio-disable-wipe-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml

Recommended MachineConfig CR to disable CRI-O cache wipe on worker nodes (99-crio-disable-wipe-worker.yaml)

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-crio-disable-wipe-worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml

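19.6.6.7. Configuring crun as the default container runtime

The following ContainerRuntimeConfig custom resources (CRs) configure crun as the default OCI container runtime for control plane and worker nodes. The crun container runtime is fast and lightweight and has a low memory footprint.

Important

For optimal performance, enable crun for control plane and worker nodes in single-node OpenShift, three-node OpenShift, and standard clusters. To avoid the cluster rebooting when the CR is applied, apply the change as a GitOps ZTP additional Day 0 install-time manifest.

Recommended ContainerRuntimeConfig CR for control plane nodes (enable-crun-master.yaml)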

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: enable-crun-master
spec:
 machineConfigPoolSelector:
   matchLabels:
     pools.operator.machineconfiguration.openshift.io/master: ""
 containerRuntimeConfig:
   defaultRuntime: crun

Recommended ContainerRuntimeConfig CR for worker nodes (enable-crun-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: enable-crun-worker
spec:
 machineConfigPoolSelector:
   matchLabels:
     pools.operator.machineconfiguration.openshift.io/worker: ""
 containerRuntimeConfig:
   defaultRuntime: crun

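19.6.7. Recommended post-installation cluster configurations

When the cluster installation is complete, the ZTP pipeline applies the following custom resources (CRs) that are required to run DU workloads.

Note

In GitOps ZTP v4.10 and earlier, you configure UEFI secure boot with a MachineConfig CR. This is no longer required in GitOps ZTP v4.11 and later. In v4.11, you configure UEFI secure boot for single-node OpenShift clusters by updating the spec.clusters.nodes.bootMode field in the SiteConfig CR that you use to install the cluster. For more information, see Deploying a managed cluster with SiteConfig and GitOps ZTP.

19.6.7.1. Operator namespaces and Operator groups

Single-node OpenShift clusters that run DU workloads require OperatorGroup and Namespace custom resources (CRs) for the following Operators:

  • Local Storage Operator
  • Logging Operator
  • PTP Operator
  • SR-IOV Network Operator

The following CRs are required:

Recommended Local Storage Operator namespace and OperatorGroup configuration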

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-local-storage
  namespace: openshift-local-storage
spec:
  targetNamespaces:
  - openshift-local-storage

Recommended Cluster Logging Operator namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging

Recommended PTP Operator namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ptp
  annotations:
    workload.openshift.io/allowed: management
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ptp-operators
  namespace: openshift-ptp
spec:
  targetNamespaces:
  - openshift-ptp

Recommended SR-IOV Operator namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator

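19.6.7.2. Operator subscriptions

Single-node OpenShift clusters that run DU workloads require the following Subscription CRs. The subscriptions provide the location to download the following Operators:

  • Local Storage Operator
  • Logging Operator
  • PTP Operator
  • SR-IOV Network Operator

For each Operator subscription, specify the channel to get the Operator from. The recommended channel is stable.

You can specify Manual or Automatic updates. In Automatic mode, the Operator automatically updates to the latest version in the channel as it becomes available in the registry. In Manual mode, new Operator versions are installed only when they are explicitly approved.

Note

Use Manual mode for subscriptions. This allows you to control the timing of Operator updates so that they fit within scheduled maintenance windows.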
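With installPlanApproval: Manual, OLM creates an InstallPlan for each new Operator version but does not install it until the plan is approved. The following minimal bash sketch shows that approval step, assuming you are logged in with oc and have found the InstallPlan name with oc get installplan; approve_installplan is a hypothetical helper name, and the namespace and plan name are placeholders you supply:

```shell
#!/usr/bin/env bash
# Approve a pending InstallPlan for a Subscription that uses
# installPlanApproval: Manual. approve_installplan is a hypothetical helper.
approve_installplan() {
  local namespace=$1 installplan=$2
  oc patch installplan "$installplan" -n "$namespace" \
    --type merge -p '{"spec":{"approved":true}}'
}

# Typical flow during a maintenance window (not executed here):
#   oc get installplan -n openshift-ptp      # find the unapproved plan
#   approve_installplan openshift-ptp <installplan_name>
```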

Recommended Local Storage Operator subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: "stable"
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended SR-IOV Operator subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "stable"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended PTP Operator subscription

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ptp-operator-subscription
  namespace: openshift-ptp
spec:
  channel: "stable"
  name: ptp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended Cluster Logging Operator subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

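19.6.7.3. Cluster logging and log forwarding

Single-node OpenShift clusters that run DU workloads require logging and log forwarding for debugging. The following ClusterLogging and ClusterLogForwarder custom resources (CRs) are required.

Recommended cluster logging and log forwarding configuration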

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: "Managed"
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}

Recommended log forwarding configuration

apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - type: "kafka"
      name: kafka-open
      url: tcp://10.46.55.190:9092/test
  inputs:
    - name: infra-logs
      infrastructure: {}
  pipelines:
    - name: audit-logs
      inputRefs:
        - audit
      outputRefs:
        - kafka-open
    - name: infrastructure-logs
      inputRefs:
        - infrastructure
      outputRefs:
        - kafka-open

Set the spec.outputs.url field to the URL of the Kafka server where the logs are forwarded to.
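In the sample, tcp://10.46.55.190:9092/test identifies a broker at 10.46.55.190 on port 9092. The Kafka output accepts an optional topic as the URL path, so the logs in this example are sent to a topic named test. The following small bash sketch splits such a URL into its parts, assuming a host:port authority and a single-segment topic; split_kafka_url is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Split a ClusterLogForwarder Kafka output URL of the form
# scheme://host:port[/topic] into its parts. split_kafka_url is a
# hypothetical helper; it assumes a host:port authority and a one-segment topic.
split_kafka_url() {
  local url=$1 scheme rest authority topic
  scheme=${url%%://*}
  rest=${url#*://}
  authority=${rest%%/*}
  if [[ $rest == */* ]]; then topic=${rest#*/}; else topic=""; fi
  echo "scheme=$scheme host=${authority%:*} port=${authority##*:} topic=$topic"
}

split_kafka_url tcp://10.46.55.190:9092/test
```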

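19.6.7.4. Performance profile

Single-node OpenShift clusters that run DU workloads require a Node Tuning Operator performance profile to use real-time host capabilities and services.

Note

In earlier versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.

The following example PerformanceProfile CR illustrates the required single-node OpenShift cluster configuration.

Recommended performance profile configuration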

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  additionalKernelArgs:
  - "rcupdate.rcu_normal_after_boot=0"
  - "efi=runtime"
  - "module_blacklist=irdma"
  cpu:
    isolated: 2-51,54-103
    reserved: 0-1,52-53
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 32
        size: 1G
        node: 0
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ''
  numa:
    topologyPolicy: "restricted"
  realTimeKernel:
    enabled: true
  workloadHints:
    realTime: true
    highPowerConsumption: false
    perPodPowerManagement: false

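Table 19.10. PerformanceProfile CR options for single-node OpenShift clusters

PerformanceProfile CR field | Description

metadata.name

Ensure that name matches the following fields set in related GitOps ZTP custom resources (CRs):

  • include=openshift-node-performance-${PerformanceProfile.metadata.name} in TunedPerformancePatch.yaml
  • name: 50-performance-${PerformanceProfile.metadata.name} in validatorCRs/informDuValidator.yaml

spec.additionalKernelArgs

"efi=runtime" configures UEFI secure boot for the cluster host.

spec.cpu.isolated

Set the isolated CPUs. Ensure that all of the Hyper-Threading pairs match.

Important

The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause undefined behavior in the system.

spec.cpu.reserved

Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved.

spec.hugepages.pages

  • Set the number of huge pages (count).
  • Set the huge page size (size).
  • Set node to the NUMA node where the huge pages are allocated (node).

spec.realTimeKernel

Set enabled to true to use the real-time kernel.

spec.workloadHints

Use workloadHints to define the set of top-level flags for different types of workloads. The example configuration configures the cluster for low latency and high performance.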
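Matching Hyper-Threading pairs means that a CPU and its sibling thread belong to the same set. The sample profile is consistent with a host of 104 CPUs on which the sibling of CPU n is n + 52 (so 0-1 pairs with 52-53), but sibling numbering varies by platform; on a real host, read /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The following bash sketch works under the n + offset assumption; expand_cpus and check_sibling_pairs are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Expand a Linux CPU list such as "0-1,52-53" into one CPU ID per line.
expand_cpus() {
  local part
  local IFS=','
  for part in $1; do
    if [[ $part == *-* ]]; then seq "${part%-*}" "${part#*-}"; else echo "$part"; fi
  done
}

# Check that every CPU in the list has its sibling thread in the same list,
# ASSUMING the sibling of CPU n is n+offset (for n < offset) or n-offset.
check_sibling_pairs() {
  local cpus=$1 offset=$2 list n sib missing=""
  list=$(expand_cpus "$cpus")
  for n in $list; do
    if (( n < offset )); then sib=$(( n + offset )); else sib=$(( n - offset )); fi
    grep -qx "$sib" <<< "$list" || missing+="${missing:+,}$n"
  done
  if [[ -n $missing ]]; then
    echo "CPUs without their sibling in the set: $missing"
    return 1
  fi
  echo "ok"
}

# Reserved and isolated lists from the sample profile, offset 52 assumed.
check_sibling_pairs "0-1,52-53" 52
check_sibling_pairs "2-51,54-103" 52
```

A reserved list of only 0-1 would fail this check, because threads 52 and 53 would then fall into the isolated set while their siblings remain reserved.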

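19.6.7.5. Configuring cluster time synchronization

Run a one-time system time synchronization job for control plane or worker nodes.

Recommended one-time time sync for control plane nodes (99-sync-time-once-master.yaml)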

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-sync-time-once-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Sync time once
            After=network.service
            [Service]
            Type=oneshot
            TimeoutStartSec=300
            ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0'
            ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: sync-time-once.service

Recommended one-time time synchronization for worker nodes (99-sync-time-once-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-sync-time-once-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Sync time once
            After=network.service
            [Service]
            Type=oneshot
            TimeoutStartSec=300
            ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0'
            ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: sync-time-once.service
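
The control plane and worker manifests above are identical except for the role label and the name. The ExecCondition line makes systemd skip the job when chronyd.service is enabled (the condition exits 1), so the one-shot chronyd -q run happens only on nodes where chronyd is disabled. The following Python sketch, which is an illustration rather than part of the GitOps ZTP tooling, builds both manifests from one template; the function name is hypothetical.

```python
import json

# Unit file text copied from the manifests above.
SYNC_TIME_ONCE_UNIT = """\
[Unit]
Description=Sync time once
After=network.service
[Service]
Type=oneshot
TimeoutStartSec=300
ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0'
ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
"""


def sync_time_once_machineconfig(role: str) -> dict:
    """Build the 99-sync-time-once-<role> MachineConfig as a plain dict."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "labels": {"machineconfiguration.openshift.io/role": role},
            "name": f"99-sync-time-once-{role}",
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},
                "systemd": {
                    "units": [
                        {
                            "contents": SYNC_TIME_ONCE_UNIT,
                            "enabled": True,
                            "name": "sync-time-once.service",
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    for role in ("master", "worker"):
        # oc apply -f accepts JSON manifests as well as YAML.
        print(json.dumps(sync_time_once_machineconfig(role), indent=2))
```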

19.6.7.6. PTP

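Single-node OpenShift clusters use Precision Time Protocol (PTP) for network time synchronization. The following example PtpConfig CR illustrates the required PTP configuration for an ordinary clock. The exact configuration you apply depends on the node hardware and your specific use case.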

Recommended PTP configuration

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: ordinary
  namespace: openshift-ptp
spec:
  profile:
  - name: "ordinary"
    # The interface name is hardware-specific
    interface: ens5f0
    ptp4lOpts: "-2 -s"
    phc2sysOpts: "-a -r -n 24"
    ptpSchedulingPolicy: SCHED_FIFO
    ptpSchedulingPriority: 10
    ptpSettings:
      logReduce: "true"
    ptp4lConf: |
      [global]
      #
      # Default Data Set
      #
      twoStepFlag 1
      slaveOnly 0
      priority1 128
      priority2 128
      domainNumber 24
      #utc_offset 37
      clockClass 255
      clockAccuracy 0xFE
      offsetScaledLogVariance 0xFFFF
      free_running 0
      freq_est_interval 1
      dscp_event 0
      dscp_general 0
      dataset_comparison G.8275.x
      G.8275.defaultDS.localPriority 128
      #
      # Port Data Set
      #
      logAnnounceInterval -3
      logSyncInterval -4
      logMinDelayReqInterval -4
      logMinPdelayReqInterval -4
      announceReceiptTimeout 3
      syncReceiptTimeout 0
      delayAsymmetry 0
      fault_reset_interval 4
      neighborPropDelayThresh 20000000
      masterOnly 0
      G.8275.portDS.localPriority 128
      #
      # Run time options
      #
      assume_two_step 0
      logging_level 6
      path_trace_enabled 0
      follow_up_info 0
      hybrid_e2e 0
      inhibit_multicast_service 0
      net_sync_monitor 0
      tc_spanning_tree 0
      tx_timestamp_timeout 50
      unicast_listen 0
      unicast_master_table 0
      unicast_req_duration 3600
      use_syslog 1
      verbose 0
      summary_interval 0
      kernel_leap 1
      check_fup_sync 0
      #
      # Servo Options
      #
      pi_proportional_const 0.0
      pi_integral_const 0.0
      pi_proportional_scale 0.0
      pi_proportional_exponent -0.3
      pi_proportional_norm_max 0.7
      pi_integral_scale 0.0
      pi_integral_exponent 0.4
      pi_integral_norm_max 0.3
      step_threshold 2.0
      first_step_threshold 0.00002
      max_frequency 900000000
      clock_servo pi
      sanity_freq_limit 200000000
      ntpshm_segment 0
      #
      # Transport options
      #
      transportSpecific 0x0
      ptp_dst_mac 01:1B:19:00:00:00
      p2p_dst_mac 01:80:C2:00:00:0E
      udp_ttl 1
      udp6_scope 0x0E
      uds_address /var/run/ptp4l
      #
      # Default interface options
      #
      clock_type OC
      network_transport L2
      delay_mechanism E2E
      time_stamping hardware
      tsproc_mode filter
      delay_filter moving_median
      delay_filter_length 10
      egressLatency 0
      ingressLatency 0
      boundary_clock_jbod 0
      #
      # Clock description
      #
      productDescription ;;
      revisionData ;;
      manufacturerIdentity 00:00:00
      userDescription ;
      timeSource 0xA0
  recommend:
  - profile: "ordinary"
    priority: 4
    match:
    - nodeLabel: "node-role.kubernetes.io/$mcp"
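
In ptp4l.conf, the log*Interval options are base-2 logarithms of the interval in seconds, so logSyncInterval -4 means one Sync message every 2^-4 s (16 per second), and with logAnnounceInterval -3 and announceReceiptTimeout 3 an Announce timeout is declared after 3 x 0.125 s = 0.375 s. phc2sys -n names the domain used by ptp4l, which is why phc2sysOpts carries -n 24 while ptp4lConf sets domainNumber 24. The following Python sketch checks these relationships; the helper names are hypothetical and it only reads the two strings, it does not talk to linuxptp.

```python
from __future__ import annotations

import re


def interval_seconds(log2_interval: int) -> float:
    """ptp4l log*Interval options are log2 of the interval in seconds."""
    return 2.0 ** log2_interval


def rate_per_second(log2_interval: int) -> float:
    """Messages per second for a given log2 interval."""
    return 1.0 / interval_seconds(log2_interval)


def ptp4l_domain(ptp4l_conf: str) -> int:
    """Read domainNumber from the ptp4lConf text."""
    match = re.search(r"^\s*domainNumber\s+(\d+)\s*$", ptp4l_conf, re.MULTILINE)
    if match is None:
        raise ValueError("domainNumber not found in ptp4lConf")
    return int(match.group(1))


def phc2sys_domain(phc2sys_opts: str) -> int | None:
    """Return the argument of phc2sys -n, or None if -n is absent (phc2sys then uses its default)."""
    tokens = phc2sys_opts.split()
    for i, token in enumerate(tokens):
        if token == "-n" and i + 1 < len(tokens):
            return int(tokens[i + 1])
    return None


# Values taken from the example PtpConfig above.
print(rate_per_second(-4))       # logSyncInterval -4: 16.0 Sync messages per second
print(3 * interval_seconds(-3))  # announce timeout: 0.375 s
conf = "[global]\ndomainNumber 24\nlogSyncInterval -4\n"
print(ptp4l_domain(conf) == phc2sys_domain("-a -r -n 24"))  # → True
```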

19.6.7.7. Extended Tuned profile

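Single-node OpenShift clusters that run DU workloads require additional performance tuning configurations for high-performance workloads. The following example Tuned CR extends the Tuned profile: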

Recommended extended Tuned profile configuration

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: performance-patch
      data: |
        [main]
        summary=Configuration changes profile inherited from performance created tuned
        include=openshift-node-performance-openshift-node-performance-profile
        [sysctl]
        kernel.timer_migration=1
        [scheduler]
        group.ice-ptp=0:f:10:*:ice-ptp.*
        group.ice-gnss=0:f:10:*:ice-gnss.*
        [service]
        service.stalld=start,enable
        service.chronyd=stop,disable
  recommend:
    - machineConfigLabels:
        machineconfiguration.openshift.io/role: "master"
      priority: 19
      profile: performance-patch

Table 19.11. Tuned CR options for single-node OpenShift clusters
Tuned CR field | Description

spec.profile.data

  • The include line that you set in spec.profile.data must match the associated PerformanceProfile CR name. For example, include=openshift-node-performance-${PerformanceProfile.metadata.name}.
  • When using the non-realtime kernel, remove the timer_migration override line from the [sysctl] section.
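
Because the include line and the validator name are both derived from PerformanceProfile.metadata.name (see Table 19.10), a rename in one place must be carried to the others. The following Python sketch checks the include line and also decodes the [scheduler] group entries. The decoding follows our reading of the TuneD scheduler plugin documentation, where a group value has the form rule_prio:sched:prio:affinity:regex; under that reading, group.ice-ptp=0:f:10:*:ice-ptp.* gives matching ice-ptp tasks the SCHED_FIFO policy at priority 10 without changing their affinity. All helper names are hypothetical.

```python
from __future__ import annotations

TUNED_PREFIX = "openshift-node-performance-"
VALIDATOR_PREFIX = "50-performance-"

# Scheduling policy letters used by the TuneD scheduler plugin.
SCHED_POLICIES = {"f": "SCHED_FIFO", "r": "SCHED_RR", "b": "SCHED_BATCH",
                  "o": "SCHED_OTHER", "*": "unchanged"}


def expected_names(profile_name: str) -> dict[str, str]:
    """Names that must track PerformanceProfile.metadata.name."""
    return {"include": TUNED_PREFIX + profile_name,
            "validator": VALIDATOR_PREFIX + profile_name}


def include_matches(tuned_data: str, profile_name: str) -> bool:
    """True if the include= line in spec.profile.data points at the profile's Tuned."""
    for line in tuned_data.splitlines():
        line = line.strip()
        if line.startswith("include="):
            return line.split("=", 1)[1] == expected_names(profile_name)["include"]
    return False


def parse_scheduler_group(value: str) -> dict[str, object]:
    """Split a [scheduler] group value of the form rule_prio:sched:prio:affinity:regex."""
    rule_prio, sched, prio, affinity, regex = value.split(":", 4)
    return {"rule_prio": int(rule_prio),
            "policy": SCHED_POLICIES.get(sched, f"unknown ({sched})"),
            "priority": None if prio == "*" else int(prio),
            "affinity": affinity,
            "regex": regex}


data = """[main]
include=openshift-node-performance-openshift-node-performance-profile
"""
print(include_matches(data, "openshift-node-performance-profile"))  # → True
print(parse_scheduler_group("0:f:10:*:ice-ptp.*")["policy"])        # → SCHED_FIFO
```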

19.6.7.8. SR-IOV

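Single root I/O virtualization (SR-IOV) is commonly used to enable the fronthaul and the midhaul networks. The following YAML example configures SR-IOV for a single-node OpenShift cluster.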

Note

The configuration of the SriovNetwork CR varies depending on your specific network and infrastructure requirements.

Recommended SriovOperatorConfig configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    "node-role.kubernetes.io/master": ""
  enableInjector: true
  enableOperatorWebhook: true

Table 19.12. SriovOperatorConfig CR options for single-node OpenShift clusters
SriovOperatorConfig CR field | Description

spec.enableInjector

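Disable the Injector pods to reduce the number of management pods. Start with the Injector pods enabled, and only disable them after verifying the user manifests. If the Injector is disabled, containers that use SR-IOV resources must explicitly request them in the requests and limits sections of the container spec.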

For example:

containers:
- name: my-sriov-workload-container
  resources:
    limits:
      openshift.io/<resource_name>:  "1"
    requests:
      openshift.io/<resource_name>:  "1"

spec.enableOperatorWebhook

Disable the OperatorWebhook pods to reduce the number of management pods. Start with the OperatorWebhook pods enabled, and only disable them after verifying the user manifests.
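
With the Injector disabled, every workload container has to carry the SR-IOV resource itself, as in the example above. The following Python sketch, with hypothetical helper names, merges that stanza into an existing container spec without mutating the original. It sets the request equal to the limit because SR-IOV VFs are Kubernetes extended resources, which cannot be overcommitted. "du_mh" is simply the resourceName used in the examples in this section.

```python
from __future__ import annotations

import copy


def sriov_resources(resource_name: str, count: int = 1) -> dict[str, dict[str, str]]:
    """requests/limits for an SR-IOV resource advertised as openshift.io/<resource_name>.

    Extended resources cannot be overcommitted, so request and limit are equal.
    """
    key = f"openshift.io/{resource_name}"
    quantity = str(count)
    return {"limits": {key: quantity}, "requests": {key: quantity}}


def with_sriov(container: dict, resource_name: str, count: int = 1) -> dict:
    """Return a copy of a container spec with the SR-IOV resource merged in."""
    updated = copy.deepcopy(container)
    resources = updated.setdefault("resources", {})
    for section, values in sriov_resources(resource_name, count).items():
        resources.setdefault(section, {}).update(values)
    return updated


print(with_sriov({"name": "my-sriov-workload-container"}, "du_mh"))
# → {'name': 'my-sriov-workload-container', 'resources': {'limits': {'openshift.io/du_mh': '1'}, 'requests': {'openshift.io/du_mh': '1'}}}
```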

Recommended SriovNetwork configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: ""
  namespace: openshift-sriov-network-operator
spec:
  resourceName: "du_mh"
  networkNamespace:  openshift-sriov-network-operator
  vlan: "150"
  spoofChk: ""
  ipam: ""
  linkState: ""
  maxTxRate: ""
  minTxRate: ""
  vlanQoS: ""
  trust: ""
  capabilities: ""

Table 19.13. SriovNetwork CR options for single-node OpenShift clusters
SriovNetwork CR field | Description

spec.vlan

Configure vlan with the VLAN for the midhaul network.

Recommended SriovNetworkNodePolicy configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: $name
  namespace: openshift-sriov-network-operator
spec:
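  # Attributes for Mellanox/Intel based NICs: Mellanox NICs typically use
  # deviceType: netdevice with isRdma: true; Intel NICs use
  # deviceType: vfio-pci with isRdma: false
  deviceType: vfio-pci
  isRdma: false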
  nicSelector:
    # The exact physical function name must match the hardware used
    pfNames: [ens7f0]
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numVfs: 8
  priority: 10
  resourceName: du_mh

Table 19.14. SriovNetworkNodePolicy CR options for single-node OpenShift clusters
SriovNetworkNodePolicy CR field | Description

spec.deviceType

Configure deviceType as vfio-pci or netdevice.

spec.nicSelector.pfNames

Specifies the interface connected to the fronthaul network.

spec.numVfs

Specifies the number of VFs for the fronthaul network.
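
A SriovNetwork refers to the VFs created by a SriovNetworkNodePolicy through resourceName, so the two CRs above must agree on it (du_mh in the examples). The following Python sketch, with a hypothetical function name, checks a policy/network pair against that link and the constraints in Table 19.14. It operates on plain dicts and does not query the cluster; it also rejects leftover placeholder values such as "netdevice/vfio-pci".

```python
from __future__ import annotations

# Values allowed for spec.deviceType according to Table 19.14.
VALID_DEVICE_TYPES = {"netdevice", "vfio-pci"}


def check_sriov_pair(policy_spec: dict, network_spec: dict) -> list[str]:
    """Return problems found in a SriovNetworkNodePolicy/SriovNetwork spec pair."""
    problems: list[str] = []
    if policy_spec.get("deviceType") not in VALID_DEVICE_TYPES:
        problems.append(f"deviceType must be one of {sorted(VALID_DEVICE_TYPES)}")
    if not isinstance(policy_spec.get("isRdma", False), bool):
        problems.append("isRdma must be a boolean")
    num_vfs = policy_spec.get("numVfs")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(num_vfs, bool) or not isinstance(num_vfs, int) or num_vfs < 1:
        problems.append("numVfs must be a positive integer")
    if policy_spec.get("resourceName") != network_spec.get("resourceName"):
        problems.append("SriovNetwork.spec.resourceName does not match the policy")
    return problems


policy = {"deviceType": "vfio-pci", "isRdma": False, "numVfs": 8, "resourceName": "du_mh"}
network = {"resourceName": "du_mh"}
print(check_sriov_pair(policy, network))  # → []
```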

19.6.7.9. Console Operator

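Use the cluster capabilities feature to prevent the Console Operator from being installed. When the node is centrally managed, the Console Operator is not needed. Removing the Operator provides additional space and capacity for application workloads.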

To disable the Console Operator during installation, set the following in the spec.clusters.0.installConfigOverrides field of the SiteConfig custom resource (CR):

installConfigOverrides:  "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"
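
The installConfigOverrides value is a JSON document stored as a string, which is why its inner quotes are backslash-escaped. The following Python sketch, with a hypothetical helper name, produces that string with the standard json module instead of escaping it by hand. Note that a baseline of None applies to all optional capabilities in the baseline set, not only the Console.

```python
import json


def install_config_overrides(baseline: str = "None") -> str:
    """Serialize the capabilities override as a JSON string for installConfigOverrides."""
    return json.dumps({"capabilities": {"baselineCapabilitySet": baseline}})


value = install_config_overrides()
print(value)              # → {"capabilities": {"baselineCapabilitySet": "None"}}
print(json.dumps(value))  # → "{\"capabilities\": {\"baselineCapabilitySet\": \"None\"}}"
```

The second line shows the same value quoted and escaped, which is the form that appears in the SiteConfig CR; whitespace differences do not change the parsed JSON.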

19.6.7.10. Alertmanager

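Single-node OpenShift clusters that run DU workloads require reduced CPU resources consumed by the OpenShift Container Platform monitoring components. The following ConfigMap custom resource (CR) disables Alertmanager and sets the Prometheus data retention period to 24 hours.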

Recommended cluster monitoring configuration (ReduceMonitoringFootprint.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  annotations:
    ran.openshift.io/ztp-deploy-wave: "1"
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
    prometheusK8s:
      retention: 24h

19.6.7.11. Operator Lifecycle Manager

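Single-node OpenShift clusters that run distributed unit workloads require consistent access to CPU resources. Operator Lifecycle Manager (OLM) collects performance data from Operators at regular intervals, which increases CPU utilization. The following ConfigMap custom resource (CR) disables the collection of Operator performance data by OLM.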

Recommended cluster OLM configuration (ReduceOLMFootprint.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: collect-profiles-config
  namespace: openshift-operator-lifecycle-manager
data:
  pprof-config.yaml: |
    disabled: True

19.6.7.12. LVM Storage

You can dynamically provision local storage on single-node OpenShift clusters with Logical Volume Manager (LVM) Storage.

Note

The recommended storage solution for single-node OpenShift is the Local Storage Operator. Alternatively, you can use LVM Storage, but it requires additional CPU resources.

The following YAML example configures the storage of the node to be available to OpenShift Container Platform applications.

Recommended LVMCluster configuration (StorageLVMCluster.yaml)

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: odf-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
    - name: vg1
      deviceSelector:
        paths:
        - /dev/disk/by-path/pci-0000:11:00.0-nvme-1
      thinPoolConfig:
        name: thin-pool-1
        overprovisionRatio: 10
        sizePercent: 90

Table 19.15. LVMCluster CR options for single-node OpenShift clusters
LVMCluster CR field | Description

deviceSelector.paths

Configure the disks used for LVM Storage. If no disks are specified, LVM Storage uses all unused disks on the node.
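
The thinPoolConfig values in the example are ratios rather than sizes. As we understand the LVM Storage API, sizePercent is the share of the volume group used for the thin pool, and overprovisionRatio is the factor by which provisioned volumes may exceed the pool's real capacity. The following Python sketch, with a hypothetical function name, shows the resulting arithmetic; the 1000 GiB volume group is a hypothetical size, while 90 and 10 come from the example.

```python
from __future__ import annotations


def thin_pool_capacity(vg_size_gib: float, size_percent: float,
                       overprovision_ratio: float) -> tuple[float, float]:
    """Return (thin pool size, maximum provisionable capacity) in GiB."""
    if not 0 < size_percent <= 100:
        raise ValueError("sizePercent must be in (0, 100]")
    pool_gib = vg_size_gib * size_percent / 100
    return pool_gib, pool_gib * overprovision_ratio


print(thin_pool_capacity(1000, 90, 10))  # → (900.0, 9000.0)
```

Overprovisioning lets PVC requests add up to more than the physical disk, so the actual data written must still fit in the pool.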

19.6.7.13. Network diagnostics

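Single-node OpenShift clusters that run DU workloads require fewer inter-pod network connectivity checks to reduce the additional load created by these pods. The following custom resource (CR) disables these checks.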

Recommended network diagnostics configuration (DisableSnoNetworkDiag.yaml)

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  disableNetworkDiagnostics: true


© 2024 Red Hat, Inc.