22.6. 推荐的 vDU 应用程序工作负载的单节点 OpenShift 集群配置
使用以下引用信息,了解在集群中部署虚拟分布式单元 (vDU) 应用程序所需的单节点 OpenShift 配置。配置包括用于高性能工作负载的集群优化、启用工作负载分区以及最大程度减少安装后所需的重启数量。
其他资源
- 要手动部署单个集群,请参阅使用 ZTP 手动安装单节点 OpenShift 集群。
- 要使用 GitOps 零接触置备 (ZTP) 部署集群集合,请参阅使用 ZTP 部署边缘站点。
22.6.1. 在 OpenShift Container Platform 上运行低延迟应用程序
OpenShift Container Platform 通过使用几个技术和专用硬件设备,为在商业现成 (COTS) 硬件上运行的应用程序启用低延迟处理:
- RHCOS 的实时内核
- 确保以高度的进程确定性处理工作负载。
- CPU 隔离
- 避免 CPU 调度延迟并确保 CPU 容量一致可用。
- NUMA 感知拓扑管理
- 将内存和巨页与 CPU 和 PCI 设备对齐,以将容器内存和巨页固定到非统一内存访问(NUMA)节点。所有服务质量 (QoS) 类的 Pod 资源保留在同一个 NUMA 节点上。这可降低延迟并提高节点的性能。
- 巨页内存管理
- 使用巨页大小可减少访问页表所需的系统资源量,从而提高系统性能。
- 使用 PTP 进行精确计时同步
- 允许以子微秒的准确性在网络中的节点之间进行同步。
22.6.2. vDU 应用程序工作负载的推荐集群主机要求
运行 vDU 应用程序工作负载需要一个具有足够资源的裸机主机来运行 OpenShift Container Platform 服务和生产工作负载。
profile | vCPU | memory | Storage |
---|---|---|---|
最小值 | 4 到 8 个 vCPU 内核 | 32GB RAM | 120GB |
当未启用并发多线程 (SMT) 或超线程时,一个 vCPU 相当于一个物理内核。启用后,使用以下公式来计算对应的比率:
- (每个内核的线程数 x 内核数)x 插槽数 = vCPU
使用虚拟介质引导时,服务器必须具有基板管理控制器(BMC)。
22.6.3. 为低延迟和高性能配置主机固件
裸机主机需要在置备主机前配置固件。固件配置取决于您的特定硬件和安装的具体要求。
流程
-
将 UEFI/BIOS Boot Mode 设置为
UEFI
。 - 在主机引导顺序中,设置 Hard drive first。
为您的硬件应用特定的固件配置。下表描述了 Intel Xeon Skylake 或 Intel Cascade Lake 服务器的代表固件配置,它基于 Intel FlexRAN 4G 和 5G 基带 PHY 参考设计。
重要确切的固件配置取决于您的特定硬件和网络要求。以下示例配置仅用于说明目的。
表 22.6. Intel Xeon Skylake 或 Cascade Lake 服务器的固件配置示例 固件设置 配置 CPU Power 和性能策略
性能
非核心频率扩展
Disabled
性能限制
Disabled
增强的 Intel SpeedStep ® Tech
Enabled
Intel 配置的 TDP
Enabled
可配置 TDP 级别
2 级
Intel® Turbo Boost Technology
Enabled
节能 Turbo
Disabled
硬件 P-State
Disabled
软件包 C-State
C0/C1 状态
C1E
Disabled
处理器 C6
Disabled
在主机的固件中启用全局 SR-IOV 和 VT-d 设置。这些设置与裸机环境相关。
22.6.4. 受管集群网络的连接先决条件
在安装并置备带有零接触置备 (ZTP) GitOps 管道的受管集群前,受管集群主机必须满足以下网络先决条件:
- hub 集群中的 ZTP GitOps 容器和目标裸机主机的 Baseboard Management Controller (BMC) 之间必须有双向连接。
受管集群必须能够解析和访问 hub 主机名和
*.apps
主机名的 API 主机名。以下是 hub 和*.apps
主机名的 API 主机名示例:-
api.hub-cluster.internal.domain.com
-
console-openshift-console.apps.hub-cluster.internal.domain.com
-
hub 集群必须能够解析并访问受管集群的 API 和
*.app
主机名。以下是受管集群的 API 主机名和*.apps
主机名示例:-
api.sno-managed-cluster-1.internal.domain.com
-
console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
-
22.6.5. 使用 GitOps ZTP 在单节点 OpenShift 中的工作负载分区
工作负载分区配置 OpenShift Container Platform 服务、集群管理工作负载和基础架构 pod,以便在保留数量的主机 CPU 上运行。
要使用 GitOps ZTP 配置工作负载分区,您可以使用 SiteConfig
自定义资源 (CR) 的 cpuset
字段指定集群管理 CPU 资源,以及组 PolicyGenTemplate
CR 的 reserved
字段。GitOps ZTP 管道使用这些值来填充工作负载分区 MachineConfig
CR (cpuset
) 和配置单节点 OpenShift 集群的 PerformanceProfile
CR (reserved
)中的所需字段。
为了获得最佳性能,请确保 reserved
和 isolated
CPU 集不在 NUMA 区域间共享 CPU 内核。
-
MachineConfig
CR 的工作负载分区将 OpenShift Container Platform 基础架构 pod 固定到定义的cpuset
配置。 -
PerformanceProfile
CR 将 systemd 服务固定到保留的 CPU 中。
PerformanceProfile
CR 中指定的 保留
字段的值必须与工作负载分区 MachineConfig
CR 中的 cpuset
字段匹配。
其他资源
- 有关推荐的单节点 OpenShift 工作负载分区配置,请参阅 Workload partitioning。
22.6.6. 推荐的安装时集群配置
ZTP 管道在集群安装过程中应用以下自定义资源 (CR)。这些配置 CR 确保集群满足运行 vDU 应用程序所需的功能和性能要求。
当将 ZTP GitOps 插件和 SiteConfig
CR 用于集群部署时,默认包含以下 MachineConfig
CR。
使用 SiteConfig
extraManifests
过滤器更改默认包括的 CR。如需更多信息,请参阅使用 SiteConfig CR 的高级受管集群配置。
22.6.6.1. 工作负载分区
运行 DU 工作负载的单节点 OpenShift 集群需要工作负载分区。这限制了运行平台服务的内核数,从而最大程度提高应用程序有效负载的 CPU 内核。
工作负载分区只能在集群安装过程中启用。您不能在安装后禁用工作负载分区。但是,您可以通过更新您在性能配置集中定义的 cpu
值以及相关的 MachineConfig
自定义资源 (CR) 来重新配置工作负载分区。
启用工作负载分区的 base64 编码的 CR,它包含管理工作负载受限制的 CPU 集。为 base64 中的
crio.conf
和kubelet.conf
对特定于主机的值进行编码。调整内容以匹配集群性能配置集中指定的 CPU 集。它必须与集群主机中的内核数匹配。推荐的工作负载分区配置
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 02-master-workload-partitioning spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQo= mode: 420 overwrite: true path: /etc/crio/crio.conf.d/01-workload-partitioning user: name: root - contents: source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9Cg== mode: 420 overwrite: true path: /etc/kubernetes/openshift-workload-pinning user: name: root
在集群主机上配置时,
/etc/crio/crio.conf.d/01-workload-partitioning
的内容应该类似如下:[crio.runtime.workloads.management] activation_annotation = "target.workload.openshift.io/management" annotation_prefix = "resources.workload.openshift.io" resources = { "cpushares" = 0, "cpuset" = "0-1,52-53" } 1
- 1
cpuset
值因安装而异。如果启用了超线程,请为每个内核指定两个线程。cpuset
值必须与您在性能配置集中的spec.cpu.reserved
字段中定义的保留 CPU 匹配。
在集群中配置时,
/etc/kubernetes/openshift-workload-pinning
的内容应如下所示:{ "management": { "cpuset": "0-1,52-53" 1 } }
- 1
cpuset
必须与/etc/crio/crio.conf.d/01-workload-partitioning
中的cpuset
值匹配。
验证
检查应用程序和集群系统 CPU 固定是否正确。运行以下命令:
打开到受管集群的远程 shell 连接:
$ oc debug node/example-sno-1
检查 OpenShift 基础架构应用程序 CPU 固定是否正确:
sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done
输出示例
pid 8481's current affinity list: 0-1,52-53 pid 8726's current affinity list: 0-1,52-53 pid 9088's current affinity list: 0-1,52-53 pid 9945's current affinity list: 0-1,52-53 pid 10387's current affinity list: 0-1,52-53 pid 12123's current affinity list: 0-1,52-53 pid 13313's current affinity list: 0-1,52-53
检查系统应用程序 CPU 固定是否正确:
sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done
输出示例
pid 1's current affinity list: 0-1,52-53 pid 938's current affinity list: 0-1,52-53 pid 962's current affinity list: 0-1,52-53 pid 1197's current affinity list: 0-1,52-53
22.6.6.2. 减少平台管理占用空间
要减少平台的整体管理空间,需要一个 MachineConfig
自定义资源 (CR),它将所有特定于 Kubernetes 的挂载点放在独立于主机操作系统的新命名空间中。以下 base64 编码的示例 MachineConfig
CR 演示了此配置。
推荐的容器挂载命名空间配置
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: container-mount-namespace-and-kubelet-conf-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo= mode: 493 path: /usr/local/bin/extractExecStart - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo= mode: 493 path: /usr/local/bin/nsenterCmns systemd: units: - contents: | [Unit] Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts [Service] Type=oneshot RemainAfterExit=yes RuntimeDirectory=container-mount-namespace Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace Environment=BIND_POINT=%t/container-mount-namespace/mnt ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}" ExecStartPre=touch ${BIND_POINT} ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared / ExecStop=umount -R ${RUNTIME_DIRECTORY} enabled: true name: container-mount-namespace.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART}" name: 90-container-mount-namespace.conf name: crio.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART} --housekeeping-interval=30s" name: 90-container-mount-namespace.conf - contents: | [Service] Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s" Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s" name: 30-kubelet-interval-tuning.conf name: kubelet.service
22.6.6.3. SCTP
流控制传输协议 (SCTP) 是在 RAN 应用程序中使用的密钥协议。此 MachineConfig
对象向节点添加 SCTP 内核模块以启用此协议。
推荐的 SCTP 配置
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: load-sctp-module spec: config: ignition: version: 2.2.0 storage: files: - contents: source: data:, verification: {} filesystem: root mode: 420 path: /etc/modprobe.d/sctp-blacklist.conf - contents: source: data:text/plain;charset=utf-8,sctp filesystem: root mode: 420 path: /etc/modules-load.d/sctp-load.conf
22.6.6.4. 加速容器启动
以下 MachineConfig
CR 配置 OpenShift 核心进程和容器,以便在系统启动和关闭过程中使用所有可用的 CPU 内核。这会加快初始引导过程和重启过程中的系统恢复。
推荐的容器启动配置
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 04-accelerated-container-startup-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIFRlbXBvcmFyaWx5IHJlc2V0IHRoZSBjb3JlIHN5c3RlbSBwcm9jZXNzZXMncyBDUFUgYWZmaW5pdHkgdG8gYmUgdW5yZXN0cmljdGVkIHRvIGFjY2VsZXJhdGUgc3RhcnR1cCBhbmQgc2h1dGRvd24KIwojIFRoZSBkZWZhdWx0cyBiZWxvdyBjYW4gYmUgb3ZlcnJpZGRlbiB2aWEgZW52aXJvbm1lbnQgdmFyaWFibGVzCiMKCiMgVGhlIGRlZmF1bHQgc2V0IG9mIGNyaXRpY2FsIHByb2Nlc3NlcyB3aG9zZSBhZmZpbml0eSBzaG91bGQgYmUgdGVtcG9yYXJpbHkgdW5ib3VuZDoKQ1JJVElDQUxfUFJPQ0VTU0VTPSR7Q1JJVElDQUxfUFJPQ0VTU0VTOi0ic3lzdGVtZCBvdnMgY3JpbyBrdWJlbGV0IE5ldHdvcmtNYW5hZ2VyIGNvbm1vbiBkYnVzIn0KCiMgRGVmYXVsdCB3YWl0IHRpbWUgaXMgNjAwcyA9IDEwbToKTUFYSU1VTV9XQUlUX1RJTUU9JHtNQVhJTVVNX1dBSVRfVElNRTotNjAwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSB0aHJlc2hvbGQgPSAyJQojIEFsbG93ZWQgdmFsdWVzOgojICA0ICAtIGFic29sdXRlIHBvZCBjb3VudCAoKy8tKQojICA0JSAtIHBlcmNlbnQgY2hhbmdlICgrLy0pCiMgIC0xIC0gZGlzYWJsZSB0aGUgc3RlYWR5LXN0YXRlIGNoZWNrClNURUFEWV9TVEFURV9USFJFU0hPTEQ9JHtTVEVBRFlfU1RBVEVfVEhSRVNIT0xEOi0yJX0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgd2luZG93ID0gNjBzCiMgSWYgdGhlIHJ1bm5pbmcgcG9kIGNvdW50IHN0YXlzIHdpdGhpbiB0aGUgZ2l2ZW4gdGhyZXNob2xkIGZvciB0aGlzIHRpbWUKIyBwZXJpb2QsIHJldHVybiBDUFUgdXRpbGl6YXRpb24gdG8gbm9ybWFsIGJlZm9yZSB0aGUgbWF4aW11bSB3YWl0IHRpbWUgaGFzCiMgZXhwaXJlcwpTVEVBRFlfU1RBVEVfV0lORE9XPSR7U1RFQURZX1NUQVRFX1dJTkRPVzotNjB9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIGFsbG93cyBhbnkgcG9kIGNvdW50IHRvIGJlICJzdGVhZHkgc3RhdGUiCiMgSW5jcmVhc2luZyB0aGlzIHdpbGwgc2tpcCBhbnkgc3RlYWR5LXN0YXRlIGNoZWNrcyB1bnRpbCB0aGUgY291bnQgcmlzZXMgYWJvdmUKIyB0aGlzIG51bWJlciB0byBhdm9pZCBmYWxzZSBwb3NpdGl2ZXMgaWYgdGhlcmUgYXJlIHNvbWUgcGVyaW9kcyB3aGVyZSB0aGUKIyBjb3VudCBkb2Vzbid0IGluY3JlYXNlIGJ1dCB3ZSBrbm93IHdlIGNhbid0IGJlIGF0IHN0ZWFkeS1zdGF0ZSB5ZXQuClNURUFEWV9TVEFURV9NSU5JTVVNPSR7U1RFQURZX1NUQVRFX01JTklNVU06LTB9CgojIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjCgpLVUJFTEVUX0NQVV9TVEFURT0vdmFyL2xpYi9rdWJlbGV0L2NwdV9tYW5hZ2VyX3N0YXRlCkZVTExfQ1BVX1NUQVRFPS9zeXMvZnMvY2dyb3VwL2NwdXNldC9jcHVzZXQuY3B1cwp1bnJlc3RyaWN0ZWRDcHVzZXQoKSB7CiAgbG9jYWwgY3B1cwogIGlmIFtbIC1lICRLVUJFTEVUX0NQVV9TVEFURSBdXTsgdGhlbgogICAgICBjcHVzPSQoanEgLXIgJy5kZWZhdWx0Q3B1U2V0JyA8JEtVQkVMRVRfQ1BVX1NUQVRFKQogIGZpCiAgaWYgW1sgLXogJGNwdXMgXV07IHRoZW4KICAgICMgZmFsbCBiYWNrIHRvIHVzaW5nIGFsbCBjcHVzIGlmIHRoZSBrdWJlbGV0IHN0YXRlIGlzIG5vdCBjb25maWd1cmVkIHlldAogICAgW1sgLWUgJEZVTExfQ1BVX1NUQVRFIF1dIHx8IHJldHVybiAxCiAgICBjcHVzPSQoPCRGVUxMX0NQVV9TVEFURSkKICBmaQogIGVjaG8gJGNwdXMKfQoKcmVzdHJpY3RlZENwdXNldCgpIHsKICBmb3IgYXJnIGluICQoPC9wcm9jL2NtZGxpbmUpOyBkbwogICAgaWYgW1sgJGFyZyA9fiBec3lzdGVtZC5jcHVfYWZmaW5pdHk9IF1dOyB0aGVuCiAgICAgIGVjaG8gJHthcmcjKj19CiAgICAgIHJldHVybiAwCiAgICBmaQogIGRvbmUKICByZXR1cm4gMQp9CgpnZXRDUFVDb3VudCAoKSB7CiAgbG9jYWwgY3B1c2V0PSIkMSIKICBsb2NhbCBjcHVsaXN0PSgpCiAgbG9jYWwgY3B1cz0wCiAgbG9jYWwgbWluY3B1cz0yCgogIGlmIFtbIC16ICRjcHVzZXQgfHwgJGNwdXNldCA9fiBbXjAtOSwtXSBdXTsgdGhlbgogICAgZWNobyAkbWluY3B1cwogICAgcmV0dXJuIDEKICBmaQoKICBJRlM9JywnIHJlYWQgLXJhIGNwdWxpc3QgPDw8ICRjcHVzZXQKCiAgZm9yIGVsbSBpbiAiJHtjcHVsaXN0W0BdfSI7IGRvCiAgICBpZiBbWyAkZWxtID1+IF5bMC05XSskIF1dOyB0aGVuCiAgICAgICgoIGNwdXMrKyApKQogICAgZWxpZiBbWyAkZWxtID1+IF5bMC05XSstWzAtOV0rJCBdXTsgdGhlbgogICAgICBsb2NhbCBsb3c9MCBoaWdoPTAKICAgICAgSUZTPSctJyByZWFkIGxvdyBoaWdoIDw8PCAkZWxtCiAgICAgICgoIGNwdXMgKz0gaGlnaCAtIGxvdyArIDEgKSkKICAgIGVsc2UKICAgICAgZWNobyAkbWluY3B1cwogICAgICByZXR1cm4gMQogICAgZmkKICBkb25lCgogICMgUmV0dXJuIGEgbWluaW11bSBvZiAyIGNwdXMKICBlY2hvICQoKCBjcHVzID4gJG1pbmNwdXMgPyBjcHVzIDogJG1pbmNwdXMgKSkKICByZXR1cm4gMAp9CgpyZXNldE9WU3RocmVhZHMgKCkgewogIGxvY2FsIGNwdWNvdW50PSIkMSIKICBsb2NhbCBjdXJSZXZhbGlkYXRvcnM9MAogIGxvY2FsIGN1ckhhbmRsZXJzPTAKICBsb2NhbCBkZXNpcmVkUmV2YWxpZGF0b3JzPTAKICBsb2NhbCBkZXNpcmVkSGFuZGxlcnM9MAogIGxvY2FsIHJjPTAKCiAgY3VyUmV2YWxpZGF0b3JzPSQocHMgLVRlbyBwaWQsdGlkLGNvbW0sY21kIHwgZ3JlcCAtZSByZXZhbGlkYXRvciB8IGdyZXAgLWMgb3ZzLXZzd2l0Y2hkKQogIGN1ckhhbmRsZXJzPSQocHMgLVRlbyBwaWQsdGlkLGNvbW0sY21kIHwgZ3JlcCAtZSBoYW5kbGVyIHwgZ3JlcCAtYyBvdnMtdnN3aXRjaGQpCgogICMgQ2FsY3VsYXRlIHRoZSBkZXNpcmVkIG51bWJlciBvZiB0aHJlYWRzIHRoZSBzYW1lIHdheSBPVlMgZG9lcy4KICAjIE9WUyB3aWxsIHNldCB0aGVzZSB0aHJlYWQgY291bnQgYXMgYSBvbmUgc2hvdCBwcm9jZXNzIG9uIHN0YXJ0dXAsIHNvIHdlCiAgIyBoYXZlIHRvIGFkanVzdCB1cCBvciBkb3duIGR1cmluZyB0aGUgYm9vdCB1cCBwcm9jZXNzLiBUaGUgZGVzaXJlZCBvdXRjb21lIGlzCiAgIyB0byBub3QgcmVzdHJpY3QgdGhlIG51bWJlciBvZiB0aHJlYWQgYXQgc3RhcnR1cCB1bnRpbCB3ZSByZWFjaCBhIHN0ZWFkeQogICMgc3RhdGUuICBBdCB3aGljaCBwb2ludCB3ZSBuZWVkIHRvIHJlc2V0IHRoZXNlIGJhc2VkIG9uIG91ciByZXN0cmljdGVkICBzZXQKICAjIG9mIGNvcmVzLgogICMgU2VlIE9WUyBmdW5jdGlvbiB0aGF0IGNhbGN1bGF0ZXMgdGhlc2UgdGhyZWFkIGNvdW50czoKICAjIGh0dHBzOi8vZ2l0aHViLmNvbS9vcGVudnN3aXRjaC9vdnMvYmxvYi9tYXN0ZXIvb2Zwcm90by9vZnByb3RvLWRwaWYtdXBjYWxsLmMjTDYzNQogICgoIGRlc2lyZWRSZXZhbGlkYXRvcnM9JGNwdWNvdW50IC8gNCArIDEgKSkKICAoKCBkZXNpcmVkSGFuZGxlcnM9JGNwdWNvdW50IC0gJGRlc2lyZWRSZXZhbGlkYXRvcnMgKSkKCgogIGlmIFtbICRjdXJSZXZhbGlkYXRvcnMgLW5lICRkZXNpcmVkUmV2YWxpZGF0b3JzIHx8ICRjdXJIYW5kbGVycyAtbmUgJGRlc2lyZWRIYW5kbGVycyBdXTsgdGhlbgoKICAgIGxvZ2dlciAiUmVjb3Zlcnk6IFJlLXNldHRpbmcgT1ZTIHJldmFsaWRhdG9yIHRocmVhZHM6ICR7Y3VyUmV2YWxpZGF0b3JzfSAtPiAke2Rlc2lyZWRSZXZhbGlkYXRvcnN9IgogICAgbG9nZ2VyICJSZWNvdmVyeTogUmUtc2V0dGluZyBPVlMgaGFuZGxlciB0aHJlYWRzOiAke2N1ckhhbmRsZXJzfSAtPiAke2Rlc2lyZWRIYW5kbGVyc30iCgogICAgb3ZzLXZzY3RsIHNldCBcCiAgICAgIE9wZW5fdlN3aXRjaCAuIFwKICAgICAgb3RoZXItY29uZmlnOm4taGFuZGxlci10aHJlYWRzPSR7ZGVzaXJlZEhhbmRsZXJzfSBcCiAgICAgIG90aGVyLWNvbmZpZzpuLXJldmFsaWRhdG9yLXRocmVhZHM9JHtkZXNpcmVkUmV2YWxpZGF0b3JzfQogICAgcmM9JD8KICBmaQoKICByZXR1cm4gJHJjCn0KCnJlc2V0QWZmaW5pdHkoKSB7CiAgbG9jYWwgY3B1c2V0PSIkMSIKICBsb2NhbCBmYWlsY291bnQ9MAogIGxvY2FsIHN1Y2Nlc3Njb3VudD0wCiAgbG9nZ2VyICJSZWNvdmVyeTogU2V0dGluZyBDUFUgYWZmaW5pdHkgZm9yIGNyaXRpY2FsIHByb2Nlc3NlcyBcIiRDUklUSUNBTF9QUk9DRVNTRVNcIiB0byAkY3B1c2V0IgogIGZvciBwcm9jIGluICRDUklUSUNBTF9QUk9DRVNTRVM7IGRvCiAgICBsb2NhbCBwaWRzPSIkKHBncmVwICRwcm9jKSIKICAgIGZvciBwaWQgaW4gJHBpZHM7IGRvCiAgICAgIGxvY2FsIHRhc2tzZXRPdXRwdXQKICAgICAgdGFza3NldE91dHB1dD0iJCh0YXNrc2V0IC1hcGMgIiRjcHVzZXQiICRwaWQgMj4mMSkiCiAgICAgIGlmIFtbICQ/IC1uZSAwIF1dOyB0aGVuCiAgICAgICAgZWNobyAiRVJST1I6ICR0YXNrc2V0T3V0cHV0IgogICAgICAgICgoZmFpbGNvdW50KyspKQogICAgICBlbHNlCiAgICAgICAgKChzdWNjZXNzY291bnQrKykpCiAgICAgIGZpCiAgICBkb25lCiAgZG9uZQoKICByZXNldE9WU3RocmVhZHMgIiQoZ2V0Q1BVQ291bnQgJHtjcHVzZXR9KSIKICBpZiBbWyAkPyAtbmUgMCBdXTsgdGhlbgogICAgKChmYWlsY291bnQrKykpCiAgZWxzZQogICAgKChzdWNjZXNzY291bnQrKykpCiAgZmkKCiAgbG9nZ2VyICJSZWNvdmVyeTogUmUtYWZmaW5lZCAkc3VjY2Vzc2NvdW50IHBpZHMgc3VjY2Vzc2Z1bGx5IgogIGlmIFtbICRmYWlsY291bnQgLWd0IDAgXV07IHRoZW4KICAgIGxvZ2dlciAiUmVjb3Zlcnk6IEZhaWxlZCB0byByZS1hZmZpbmUgJGZhaWxjb3VudCBwcm9jZXNzZXMiCiAgICByZXR1cm4gMQogIGZpCn0KCnNldFVucmVzdHJpY3RlZCgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBTZXR0aW5nIGNyaXRpY2FsIHN5c3RlbSBwcm9jZXNzZXMgdG8gaGF2ZSB1bnJlc3RyaWN0ZWQgQ1BVIGFjY2VzcyIKICByZXNldEFmZmluaXR5ICIkKHVucmVzdHJpY3RlZENwdXNldCkiCn0KCnNldFJlc3RyaWN0ZWQoKSB7CiAgbG9nZ2VyICJSZWNvdmVyeTogUmVzZXR0aW5nIGNyaXRpY2FsIHN5c3RlbSBwcm9jZXNzZXMgYmFjayB0byBub3JtYWxseSByZXN0cmljdGVkIGFjY2VzcyIKICByZXNldEFmZmluaXR5ICIkKHJlc3RyaWN0ZWRDcHVzZXQpIgp9CgpjdXJyZW50QWZmaW5pdHkoKSB7CiAgbG9jYWwgcGlkPSIkMSIKICB0YXNrc2V0IC1wYyAkcGlkIHwgYXdrIC1GJzogJyAne3ByaW50ICQyfScKfQoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggJGRlbHRhICogMTAwKSAvIGxhc3QgKSkKICBmaQogIGVjaG8gLW4gImxhc3Q6JGxhc3QgY3VycmVudDokY3VycmVudCBkZWx0YTokZGVsdGEgcGNoYW5nZToke3BjaGFuZ2V9JTogIgogIGxvY2FsIGFic29sdXRlIGxpbWl0CiAgY2FzZSAkdGhyZXNob2xkIGluCiAgICAqJSkKICAgICAgYWJzb2x1dGU9JHtwY2hhbmdlIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR7dGhyZXNob2xkJSUlfQogICAgICA7OwogICAgKikKICAgICAgYWJzb2x1dGU9JHtkZWx0YSMjLX0gIyBhYnNvbHV0ZSB2YWx1ZQogICAgICBsaW1pdD0kdGhyZXNob2xkCiAgICAgIDs7CiAgZXNhYwogIGlmIFtbICRhYnNvbHV0ZSAtbGUgJGxpbWl0IF1dOyB0aGVuCiAgICBlY2hvICJ3aXRoaW4gKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDAKICBlbHNlCiAgICBlY2hvICJvdXRzaWRlICgrLy0pJHRocmVzaG9sZCIKICAgIHJldHVybiAxCiAgZmkKfQoKc3RlYWR5c3RhdGUoKSB7CiAgbG9jYWwgbGFzdD0kMSBjdXJyZW50PSQyCiAgaWYgW1sgJGxhc3QgLWx0ICRTVEVBRFlfU1RBVEVfTUlOSU1VTSBdXTsgdGhlbgogICAgZWNobyAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IFdhaXRpbmcgdG8gcmVhY2ggJFNURUFEWV9TVEFURV9NSU5JTVVNIGJlZm9yZSBjaGVja2luZyBmb3Igc3RlYWR5LXN0YXRlIgogICAgcmV0dXJuIDEKICBmaQogIHdpdGhpbiAkbGFzdCAkY3VycmVudCAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRAp9Cgp3YWl0Rm9yUmVhZHkoKSB7CiAgbG9nZ2VyICJSZWNvdmVyeTogV2FpdGluZyAke01BWElNVU1fV0FJVF9USU1FfXMgZm9yIHRoZSBpbml0aWFsaXphdGlvbiB0byBjb21wbGV0ZSIKICBsb2NhbCBsYXN0U3lzdGVtZENwdXNldD0iJChjdXJyZW50QWZmaW5pdHkgMSkiCiAgbG9jYWwgbGFzdERlc2lyZWRDcHVzZXQ9IiQodW5yZXN0cmljdGVkQ3B1c2V0KSIKICBsb2NhbCB0PTAgcz0xMAogIGxvY2FsIGxhc3RDY291bnQ9MCBjY291bnQ9MCBzdGVhZHlTdGF0ZVRpbWU9MAogIHdoaWxlIFtbICR0IC1sdCAkTUFYSU1VTV9XQUlUX1RJTUUgXV07IGRvCiAgICBzbGVlcCAkcwogICAgKCh0ICs9IHMpKQogICAgIyBSZS1jaGVjayB0aGUgY3VycmVudCBhZmZpbml0eSBvZiBzeXN0ZW1kLCBpbiBjYXNlIHNvbWUgb3RoZXIgcHJvY2VzcyBoYXMgY2hhbmdlZCBpdAogICAgbG9jYWwgc3lzdGVtZENwdXNldD0iJChjdXJyZW50QWZmaW5pdHkgMSkiCiAgICAjIFJlLWNoZWNrIHRoZSB1bnJlc3RyaWN0ZWQgQ3B1c2V0LCBhcyB0aGUgYWxsb3dlZCBzZXQgb2YgdW5yZXNlcnZlZCBjb3JlcyBtYXkgY2hhbmdlIGFzIHBvZHMgYXJlIGFzc2lnbmVkIHRvIGNvcmVzCiAgICBsb2NhbCBkZXNpcmVkQ3B1c2V0PSIkKHVucmVzdHJpY3RlZENwdXNldCkiCiAgICBpZiBbWyAkc3lzdGVtZENwdXNldCAhPSAkbGFzdFN5c3RlbWRDcHVzZXQgfHwgJGxhc3REZXNpcmVkQ3B1c2V0ICE9ICRkZXNpcmVkQ3B1c2V0IF1dOyB0aGVuCiAgICAgIHJlc2V0QWZmaW5pdHkgIiRkZXNpcmVkQ3B1c2V0IgogICAgICBsYXN0U3lzdGVtZENwdXNldD0iJChjdXJyZW50QWZmaW5pdHkgMSkiCiAgICAgIGxhc3REZXNpcmVkQ3B1c2V0PSIkZGVzaXJlZENwdXNldCIKICAgIGZpCgogICAgIyBEZXRlY3Qgc3RlYWR5LXN0YXRlIHBvZCBjb3VudAogICAgY2NvdW50PSQoY3JpY3RsIHBzIHwgd2MgLWwpCiAgICBpZiBzdGVhZHlzdGF0ZSAkbGFzdENjb3VudCAkY2NvdW50OyB0aGVuCiAgICAgICgoc3RlYWR5U3RhdGVUaW1lICs9IHMpKQogICAgICBlY2hvICJTdGVhZHktc3RhdGUgZm9yICR7c3RlYWR5U3RhdGVUaW1lfXMvJHtTVEVBRFlfU1RBVEVfV0lORE9XfXMiCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWdlICRTVEVBRFlfU1RBVEVfV0lORE9XIF1dOyB0aGVuCiAgICAgICAgbG9nZ2VyICJSZWNvdmVyeTogU3RlYWR5LXN0YXRlICgrLy0gJFNURUFEWV9TVEFURV9USFJFU0hPTEQpIGZvciAke1NURUFEWV9TVEFURV9XSU5ET1d9czogRG9uZSIKICAgICAgICByZXR1cm4gMAogICAgICBmaQogICAgZWxzZQogICAgICBpZiBbWyAkc3RlYWR5U3RhdGVUaW1lIC1ndCAwIF1dOyB0aGVuCiAgICAgICAgZWNobyAiUmVzZXR0aW5nIHN0ZWFkeS1zdGF0ZSB0aW1lciIKICAgICAgICBzdGVhZHlTdGF0ZVRpbWU9MAogICAgICBmaQogICAgZmkKICAgIGxhc3RDY291bnQ9JGNjb3VudAogIGRvbmUKICBsb2dnZXIgIlJlY292ZXJ5OiBSZWNvdmVyeSBDb21wbGV0ZSBUaW1lb3V0Igp9CgptYWluKCkgewogIGlmICEgdW5yZXN0cmljdGVkQ3B1c2V0ID4mL2Rldi9udWxsOyB0aGVuCiAgICBsb2dnZXIgIlJlY292ZXJ5OiBObyB1bnJlc3RyaWN0ZWQgQ3B1c2V0IGNvdWxkIGJlIGRldGVjdGVkIgogICAgcmV0dXJuIDEKICBmaQoKICBpZiAhIHJlc3RyaWN0ZWRDcHVzZXQgPiYvZGV2L251bGw7IHRoZW4KICAgIGxvZ2dlciAiUmVjb3Zlcnk6IE5vIHJlc3RyaWN0ZWQgQ3B1c2V0IGhhcyBiZWVuIGNvbmZpZ3VyZWQuICBXZSBhcmUgYWxyZWFkeSBydW5uaW5nIHVucmVzdHJpY3RlZC4iCiAgICByZXR1cm4gMAogIGZpCgogICMgRW5zdXJlIHdlIHJlc2V0IHRoZSBDUFUgYWZmaW5pdHkgd2hlbiB3ZSBleGl0IHRoaXMgc2NyaXB0IGZvciBhbnkgcmVhc29uCiAgIyBUaGlzIHdheSBlaXRoZXIgYWZ0ZXIgdGhlIHRpbWVyIGV4cGlyZXMgb3IgYWZ0ZXIgdGhlIHByb2Nlc3MgaXMgaW50ZXJydXB0ZWQKICAjIHZpYSBeQyBvciBTSUdURVJNLCB3ZSByZXR1cm4gdGhpbmdzIGJhY2sgdG8gdGhlIHdheSB0aGV5IHNob3VsZCBiZS4KICB0cmFwIHNldFJlc3RyaWN0ZWQgRVhJVAoKICBsb2dnZXIgIlJlY292ZXJ5OiBSZWNvdmVyeSBNb2RlIFN0YXJ0aW5nIgogIHNldFVucmVzdHJpY3RlZAogIHdhaXRGb3JSZWFkeQp9CgppZiBbWyAiJHtCQVNIX1NPVVJDRVswXX0iID0gIiR7MH0iIF1dOyB0aGVuCiAgbWFpbiAiJHtAfSIKICBleGl0ICQ/CmZpCg== mode: 493 path: /usr/local/bin/accelerated-container-startup.sh systemd: units: - contents: | [Unit] Description=Unlocks more CPUs for critical system processes during container startup [Service] Type=simple ExecStart=/usr/local/bin/accelerated-container-startup.sh # Maximum wait time is 600s = 10m: Environment=MAXIMUM_WAIT_TIME=600 # Steady-state threshold = 2% # Allowed values: # 4 - absolute pod count (+/-) # 4% - percent change (+/-) # -1 - disable the steady-state check # Note: '%' must be escaped as '%%' in systemd unit files Environment=STEADY_STATE_THRESHOLD=2%% # Steady-state window = 120s # If the running pod count stays within the given threshold for this time # period, return CPU utilization to normal before the maximum wait time has # expires Environment=STEADY_STATE_WINDOW=120 # Steady-state minimum = 40 # Increasing this will skip any steady-state checks until the count rises above # this number to avoid false positives if there are some periods where the # count doesn't increase but we know we can't be at steady-state yet. Environment=STEADY_STATE_MINIMUM=40 [Install] WantedBy=multi-user.target enabled: true name: accelerated-container-startup.service - contents: | [Unit] Description=Unlocks more CPUs for critical system processes during container shutdown DefaultDependencies=no [Service] Type=simple ExecStart=/usr/local/bin/accelerated-container-startup.sh # Maximum wait time is 600s = 10m: Environment=MAXIMUM_WAIT_TIME=600 # Steady-state threshold # Allowed values: # 4 - absolute pod count (+/-) # 4% - percent change (+/-) # -1 - disable the steady-state check # Note: '%' must be escaped as '%%' in systemd unit files Environment=STEADY_STATE_THRESHOLD=-1 # Steady-state window = 60s # If the running pod count stays within the given threshold for this time # period, return CPU utilization to normal before the maximum wait time has # expires Environment=STEADY_STATE_WINDOW=60 [Install] WantedBy=shutdown.target reboot.target halt.target enabled: true name: accelerated-container-shutdown.service
22.6.6.5. 使用 kdump 自动内核崩溃转储
kdump
是一个 Linux 内核功能,可在内核崩溃时创建内核崩溃转储。kdump
使用以下 MachineConfig
CR 启用:
推荐的 kdump 配置
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 06-kdump-enable-master spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump.service kernelArguments: - crashkernel=512M
22.6.7. 推荐的安装后集群配置
当集群安装完成后,ZTP 管道会应用运行 DU 工作负载所需的以下自定义资源 (CR)。
在 {ztp} v4.10 及更早版本中,您可以使用 MachineConfig
CR 配置 UEFI 安全引导。{ztp} v4.11 及之后的版本不再需要。在 v4.11 中,您可以通过更新用于安装集群的 SiteConfig
CR 中的 spec.clusters.nodes.bootMode
字段来为单节点 OpenShift 集群配置 UEFI 安全引导。如需更多信息,请参阅使用 SiteConfig 和 {ztp} 部署受管集群。
22.6.7.1. Operator 命名空间和 Operator 组
运行 DU 工作负载的单节点 OpenShift 集群需要以下 OperatorGroup
和 Namespace
自定义资源 (CR):
- Local Storage Operator
- Logging Operator
- PTP Operator
- Cluster Network Operator
以下 YAML 总结了这些 CR:
推荐的 Operator 命名空间和 OperatorGroup 配置
apiVersion: v1 kind: Namespace metadata: annotations: workload.openshift.io/allowed: management name: openshift-local-storage --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: openshift-local-storage namespace: openshift-local-storage spec: targetNamespaces: - openshift-local-storage --- apiVersion: v1 kind: Namespace metadata: annotations: workload.openshift.io/allowed: management name: openshift-logging --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: cluster-logging namespace: openshift-logging spec: targetNamespaces: - openshift-logging --- apiVersion: v1 kind: Namespace metadata: annotations: workload.openshift.io/allowed: management labels: openshift.io/cluster-monitoring: "true" name: openshift-ptp --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ptp-operators namespace: openshift-ptp spec: targetNamespaces: - openshift-ptp --- apiVersion: v1 kind: Namespace metadata: annotations: workload.openshift.io/allowed: management name: openshift-sriov-network-operator --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator spec: targetNamespaces: - openshift-sriov-network-operator
22.6.7.2. Operator 订阅
运行 DU 工作负载的单节点 OpenShift 集群需要以下 Subscription
CR。订阅提供下载以下 Operator 的位置:
- Local Storage Operator
- Logging Operator
- PTP Operator
- Cluster Network Operator
推荐的 Operator 订阅
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cluster-logging namespace: openshift-logging spec: channel: "stable" 1 name: cluster-logging source: redhat-operators sourceNamespace: openshift-marketplace installPlanApproval: Manual 2 --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: local-storage-operator namespace: openshift-local-storage spec: channel: "stable" installPlanApproval: Automatic name: local-storage-operator source: redhat-operators sourceNamespace: openshift-marketplace installPlanApproval: Manual --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ptp-operator-subscription namespace: openshift-ptp spec: channel: "stable" name: ptp-operator source: redhat-operators sourceNamespace: openshift-marketplace installPlanApproval: Manual --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subscription namespace: openshift-sriov-network-operator spec: channel: "stable" name: sriov-network-operator source: redhat-operators sourceNamespace: openshift-marketplace installPlanApproval: Manual
22.6.7.3. 集群日志记录和日志转发
运行 DU 工作负载的单节点 OpenShift 集群需要日志记录和日志转发以进行调试。以下示例 YAML 演示了所需的 ClusterLogging
和 ClusterLogForwarder
CR。
推荐的集群日志记录和日志转发配置
apiVersion: logging.openshift.io/v1 kind: ClusterLogging 1 metadata: name: instance namespace: openshift-logging spec: collection: logs: fluentd: {} type: fluentd curation: type: "curator" curator: schedule: "30 3 * * *" managementState: Managed --- apiVersion: logging.openshift.io/v1 kind: ClusterLogForwarder 2 metadata: name: instance namespace: openshift-logging spec: inputs: - infrastructure: {} name: infra-logs outputs: - name: kafka-open type: kafka url: tcp://10.46.55.190:9092/test 3 pipelines: - inputRefs: - audit name: audit-logs outputRefs: - kafka-open - inputRefs: - infrastructure name: infrastructure-logs outputRefs: - kafka-open
22.6.7.4. 性能配置集
运行 DU 工作负载的单节点 OpenShift 集群需要 Node Tuning Operator 性能配置集才能使用实时主机功能和服务。
在早期版本的 OpenShift Container Platform 中,Performance Addon Operator 用来实现自动性能优化,以便为 OpenShift 应用程序实现低延迟性能。在 OpenShift Container Platform 4.11 及更新的版本中,这个功能是 Node Tuning Operator 的一部分。
以下示例 PerformanceProfile
CR 演示了所需的集群配置。
推荐的性能配置集配置
apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: openshift-node-performance-profile 1 spec: additionalKernelArgs: - "rcupdate.rcu_normal_after_boot=0" - "efi=runtime" 2 cpu: isolated: 2-51,54-103 3 reserved: 0-1,52-53 4 hugepages: defaultHugepagesSize: 1G pages: - count: 32 5 size: 1G 6 node: 0 7 machineConfigPoolSelector: pools.operator.machineconfiguration.openshift.io/master: "" nodeSelector: node-role.kubernetes.io/master: "" numa: topologyPolicy: "restricted" realTimeKernel: enabled: true 8
- 1
- 确保
name
的值与Tuned PerformancePatch.yaml
的spec.profile.data
字段中指定的值和validatorCR/informDuValidator.yaml
的status.configuration.source.name
字段匹配。 - 2
- 为集群主机配置 UEFI 安全引导。
- 3
- 设置隔离的 CPU。确保所有 Hyper-Threading 对都匹配。重要
保留和隔离的 CPU 池不得重叠,并且必须一起跨越所有可用的内核。未考虑导致系统中未定义的 CPU 内核。
- 4
- 设置保留的 CPU。启用工作负载分区时,系统进程、内核线程和系统容器线程仅限于这些 CPU。所有不是隔离的 CPU 都应保留。
- 5
- 设置巨页数量。
- 6
- 设置巨页大小。
- 7
- 将
node
设置为分配了hugepages
的 NUMA 节点。 - 8
- 将
enabled
设置为true
以安装实时 Linux 内核。
22.6.7.5. PTP
单节点 OpenShift 集群使用 Precision Time Protocol (PTP) 进行网络时间同步。以下示例 PtpConfig
CR 演示了所需的 PTP slave 配置。
推荐的 PTP 配置
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
name: du-ptp-slave
namespace: openshift-ptp
spec:
profile:
- interface: ens5f0 1
name: slave
phc2sysOpts: -a -r -n 24
ptp4lConf: |
[global]
#
# Default Data Set
#
twoStepFlag 1
slaveOnly 0
priority1 128
priority2 128
domainNumber 24
#utc_offset 37
clockClass 248
clockAccuracy 0xFE
offsetScaledLogVariance 0xFFFF
free_running 0
freq_est_interval 1
dscp_event 0
dscp_general 0
dataset_comparison ieee1588
G.8275.defaultDS.localPriority 128
#
# Port Data Set
#
logAnnounceInterval -3
logSyncInterval -4
logMinDelayReqInterval -4
logMinPdelayReqInterval -4
announceReceiptTimeout 3
syncReceiptTimeout 0
delayAsymmetry 0
fault_reset_interval 4
neighborPropDelayThresh 20000000
masterOnly 0
G.8275.portDS.localPriority 128
#
# Run time options
#
assume_two_step 0
logging_level 6
path_trace_enabled 0
follow_up_info 0
hybrid_e2e 0
inhibit_multicast_service 0
net_sync_monitor 0
tc_spanning_tree 0
tx_timestamp_timeout 1
unicast_listen 0
unicast_master_table 0
unicast_req_duration 3600
use_syslog 1
verbose 0
summary_interval 0
kernel_leap 1
check_fup_sync 0
#
# Servo Options
#
pi_proportional_const 0.0
pi_integral_const 0.0
pi_proportional_scale 0.0
pi_proportional_exponent -0.3
pi_proportional_norm_max 0.7
pi_integral_scale 0.0
pi_integral_exponent 0.4
pi_integral_norm_max 0.3
step_threshold 2.0
first_step_threshold 0.00002
max_frequency 900000000
clock_servo pi
sanity_freq_limit 200000000
ntpshm_segment 0
#
# Transport options
#
transportSpecific 0x0
ptp_dst_mac 01:1B:19:00:00:00
p2p_dst_mac 01:80:C2:00:00:0E
udp_ttl 1
udp6_scope 0x0E
uds_address /var/run/ptp4l
#
# Default interface options
#
clock_type OC
network_transport L2
delay_mechanism E2E
time_stamping hardware
tsproc_mode filter
delay_filter moving_median
delay_filter_length 10
egressLatency 0
ingressLatency 0
boundary_clock_jbod 0
#
# Clock description
#
productDescription ;;
revisionData ;;
manufacturerIdentity 00:00:00
userDescription ;
timeSource 0xA0
ptp4lOpts: -2 -s --summary_interval -4
recommend:
- match:
- nodeLabel: node-role.kubernetes.io/master
priority: 4
profile: slave
- 1
- 设置用于接收 PTP 时钟信号的接口。
22.6.7.6. 扩展的 Tuned 配置集
运行 DU 工作负载的单节点 OpenShift 集群需要额外的高性能工作负载所需的性能调优配置。以下 Tuned
CR 示例扩展了 Tuned
配置集:
推荐的扩展 Tuned 配置集配置
apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: name: performance-patch namespace: openshift-cluster-node-tuning-operator spec: profile: - data: | [main] summary=Configuration changes profile inherited from performance created tuned include=openshift-node-performance-openshift-node-performance-profile [bootloader] cmdline_crash=nohz_full=2-51,54-103 [sysctl] kernel.timer_migration=1 [scheduler] group.ice-ptp=0:f:10:*:ice-ptp.* [service] service.stalld=start,enable service.chronyd=stop,disable name: performance-patch recommend: - machineConfigLabels: machineconfiguration.openshift.io/role: master priority: 19 profile: performance-patch
22.6.7.7. SR-IOV
单根 I/O 虚拟化 (SR-IOV) 通常用于启用前端和中间网络。以下 YAML 示例为单节点 OpenShift 集群配置 SR-IOV。
推荐的 SR-IOV 配置
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: configDaemonNodeSelector: node-role.kubernetes.io/master: "" disableDrain: true enableInjector: true enableOperatorWebhook: true --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-nw-du-mh namespace: openshift-sriov-network-operator spec: networkNamespace: openshift-sriov-network-operator resourceName: du_mh vlan: 150 1 --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-nnp-du-mh namespace: openshift-sriov-network-operator spec: deviceType: vfio-pci 2 isRdma: false nicSelector: pfNames: - ens7f0 3 nodeSelector: node-role.kubernetes.io/master: "" numVfs: 8 4 priority: 10 resourceName: du_mh --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-nw-du-fh namespace: openshift-sriov-network-operator spec: networkNamespace: openshift-sriov-network-operator resourceName: du_fh vlan: 140 5 --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-nnp-du-fh namespace: openshift-sriov-network-operator spec: deviceType: netdevice 6 isRdma: true nicSelector: pfNames: - ens5f0 7 nodeSelector: node-role.kubernetes.io/master: "" numVfs: 8 8 priority: 10 resourceName: du_fh
22.6.7.8. Console Operator
console-operator 在集群中安装并维护 web 控制台。当节点被集中管理时,不需要 Operator,并为应用程序工作负载腾出空间。以下 Console
自定义资源 (CR) 示例禁用控制台。
推荐的控制台配置
apiVersion: operator.openshift.io/v1 kind: Console metadata: annotations: include.release.openshift.io/ibm-cloud-managed: "false" include.release.openshift.io/self-managed-high-availability: "false" include.release.openshift.io/single-node-developer: "false" release.openshift.io/create-only: "true" name: cluster spec: logLevel: Normal managementState: Removed operatorLogLevel: Normal
22.6.7.9. Alertmanager
运行 DU 工作负载的单节点 OpenShift 集群需要减少 OpenShift Container Platform 监控组件所消耗的 CPU 资源。以下 ConfigMap
自定义资源(CR)禁用 Alertmanager。
推荐的集群监控配置
apiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring data: config.yaml: | alertmanagerMain: enabled: false prometheusK8s: retention: 24h
22.6.7.10. Operator Lifecycle Manager
运行分布式单元工作负载的单节点 OpenShift 集群需要对 CPU 资源进行一致的访问。Operator Lifecycle Manager (OLM) 会定期从 Operator 收集性能数据,从而增加 CPU 利用率。以下 ConfigMap
自定义资源 (CR) 禁用 OLM 的 Operator 性能数据收集。
推荐的集群 OLM 配置 (ReduceOLMFootprint.yaml
)
apiVersion: v1 kind: ConfigMap metadata: name: collect-profiles-config namespace: openshift-operator-lifecycle-manager data: pprof-config.yaml: | disabled: True
22.6.7.11. 网络诊断
运行 DU 工作负载的单节点 OpenShift 集群需要较少的 pod 网络连接检查,以减少这些 pod 创建的额外负载。以下自定义资源 (CR) 禁用这些检查。
推荐的网络诊断配置
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: disableNetworkDiagnostics: true
其他资源