Chapter 15. Workload partitioning
Workload partitioning separates compute node CPU resources into distinct CPU sets. The primary objective is to keep platform pods on the specified cores to avoid interrupting the CPUs the customer workloads are running on.
Workload partitioning isolates OpenShift Container Platform services, cluster management workloads, and infrastructure pods to run on a reserved set of CPUs. This ensures that the remaining CPUs in the cluster deployment are untouched and available exclusively for non-platform workloads. Cluster management requires a minimum of four reserved CPU Hyper-Threads (HTs).
When workload partitioning is enabled, a node admission webhook prevents nodes that are not configured correctly from joining the cluster. The machine config pools for control plane and worker nodes are supplied with the configurations that nodes must use. Adding new nodes to these pools ensures that they are correctly configured before they join the cluster.
Currently, nodes must have uniform configurations per machine config pool to ensure that correct CPU affinity is set across all nodes within that pool. After admission, nodes within the cluster identify themselves as supporting a new resource type called management.workload.openshift.io/cores and accurately report their CPU capacity. You can enable workload partitioning only during cluster installation, by adding the cpuPartitioningMode field to the install-config.yaml file.
When workload partitioning is enabled, the management.workload.openshift.io/cores resource allows the scheduler to correctly assign pods based on the cpushares capacity of the host, not just the default cpuset. This ensures more precise allocation of resources for workload partitioning scenarios.
Workload partitioning ensures that CPU requests and limits specified in the pod’s configuration are respected. In OpenShift Container Platform 4.16 or later, accurate CPU usage limits are set for platform pods through CPU partitioning. As workload partitioning uses the custom resource type of management.workload.openshift.io/cores, the values for requests and limits are the same due to a requirement by Kubernetes for extended resources. However, the annotations modified by workload partitioning correctly reflect the desired limits.
Extended resources cannot be overcommitted, so request and limit must be equal if both are present in a container spec.
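The equality rule for extended resources can be illustrated with a small check. The following is a minimal sketch, not part of OpenShift Container Platform: the helper names `is_extended_resource` and `extended_resource_mismatches`, the container dictionaries, and the quantity values are all hypothetical. It assumes a simplified definition of an extended resource (a domain-prefixed name outside kubernetes.io) and compares quantities as plain strings.

```python
def is_extended_resource(name: str) -> bool:
    """Simplified assumption: a domain-prefixed name outside kubernetes.io."""
    return "/" in name and "kubernetes.io/" not in name


def extended_resource_mismatches(container: dict) -> list[str]:
    """Return extended resources whose request and limit are both set but differ."""
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return sorted(
        name
        for name in requests.keys() & limits.keys()
        # Quantities are compared as strings; equivalent values written in
        # different notations are not normalized in this sketch.
        if is_extended_resource(name) and str(requests[name]) != str(limits[name])
    )


MGMT_CORES = "management.workload.openshift.io/cores"

# Illustrative container spec fragments, not taken from a real cluster.
ok_container = {
    "resources": {"requests": {MGMT_CORES: "400"}, "limits": {MGMT_CORES: "400"}}
}
bad_container = {
    "resources": {"requests": {MGMT_CORES: "400"}, "limits": {MGMT_CORES: "800"}}
}

print(extended_resource_mismatches(ok_container))
print(extended_resource_mismatches(bad_container))
```

The first call reports no mismatches. The second reports the management cores resource, because its request and limit differ.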
15.1. Enabling workload partitioning
With workload partitioning, cluster management pods are annotated to correctly partition them into a specified CPU affinity. These pods operate normally within the minimum size CPU configuration specified by the reserved value in the Performance Profile. Additional Day 2 Operators that make use of workload partitioning should be taken into account when calculating how many reserved CPU cores should be set aside for the platform.
Workload partitioning isolates user workloads from platform workloads using standard Kubernetes scheduling capabilities.
You can enable workload partitioning during cluster installation only. You cannot disable workload partitioning postinstallation. However, you can change the CPU configuration for reserved and isolated CPUs postinstallation.
Use this procedure to enable workload partitioning cluster-wide:
Procedure
In the install-config.yaml file, add the additional field cpuPartitioningMode and set it to AllNodes.

    apiVersion: v1
    baseDomain: devcluster.openshift.com
    cpuPartitioningMode: AllNodes # 1
    compute:
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        platform: {}
        replicas: 3
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform: {}
      replicas: 3

1: Sets up a cluster for CPU partitioning at install time. The default value is None.
15.2. Performance profiles and workload partitioning
Applying a performance profile allows you to make use of the workload partitioning feature. An appropriately configured performance profile specifies the isolated and reserved CPUs. The recommended way to create a performance profile is to use the Performance Profile Creator (PPC) tool.
15.3. Sample performance profile configuration
    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      # if you change this name make sure the 'include' line in TunedPerformancePatch.yaml
      # matches this name: include=openshift-node-performance-${PerformanceProfile.metadata.name}
      # Also in file 'validatorCRs/informDuValidator.yaml':
      # name: 50-performance-${PerformanceProfile.metadata.name}
      name: openshift-node-performance-profile
      annotations:
        ran.openshift.io/reference-configuration: "ran-du.redhat.com"
    spec:
      additionalKernelArgs:
        - "rcupdate.rcu_normal_after_boot=0"
        - "efi=runtime"
        - "vfio_pci.enable_sriov=1"
        - "vfio_pci.disable_idle_d3=1"
        - "module_blacklist=irdma"
      cpu:
        isolated: $isolated
        reserved: $reserved
      hugepages:
        defaultHugepagesSize: $defaultHugepagesSize
        pages:
          - size: $size
            count: $count
            node: $node
      machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/$mcp: ""
      nodeSelector:
        node-role.kubernetes.io/$mcp: ''
      numa:
        topologyPolicy: "restricted"
      # To use the standard (non-realtime) kernel, set enabled to false
      realTimeKernel:
        enabled: true
      workloadHints:
        # WorkloadHints defines the set of upper level flags for different type of workloads.
        # See https://github.com/openshift/cluster-node-tuning-operator/blob/master/docs/performanceprofile/performance_profile.md#workloadhints
        # for detailed descriptions of each item.
        # The configuration below is set for a low latency, performance mode.
        realTime: true
        highPowerConsumption: false
        perPodPowerManagement: false
| PerformanceProfile CR field | Description |
|---|---|
| spec.cpu.isolated | Set the isolated CPUs. Ensure all of the Hyper-Threading pairs match. Important: The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause undefined behaviour in the system. |
| spec.cpu.reserved | Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved. |
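The rule that the reserved and isolated CPU pools must not overlap and must together span all available cores can be checked before a profile is applied. The following is a minimal sketch under stated assumptions, not a tool shipped with OpenShift Container Platform: `parse_cpu_list` and `check_partitioning` are hypothetical helpers, the CPU lists and the 64-CPU host are illustrative, and only comma-separated IDs and simple ranges are handled. The default of four reserved CPUs reflects the minimum stated at the start of this chapter.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a CPU list such as "0-1,32-33" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_partitioning(
    reserved: str, isolated: str, total_cpus: int, min_reserved: int = 4
) -> list[str]:
    """Return a list of problems; an empty list means the split looks consistent."""
    r, i = parse_cpu_list(reserved), parse_cpu_list(isolated)
    all_cpus = set(range(total_cpus))
    problems = []
    overlap = r & i
    if overlap:
        problems.append(f"overlap: {sorted(overlap)}")
    missing = all_cpus - (r | i)
    if missing:
        problems.append(f"unassigned: {sorted(missing)}")
    unknown = (r | i) - all_cpus
    if unknown:
        problems.append(f"not present on host: {sorted(unknown)}")
    if len(r) < min_reserved:
        problems.append(f"reserved has {len(r)} CPUs, fewer than {min_reserved}")
    return problems


# Hypothetical 64-CPU host; the sibling layout shown here is an assumption.
print(check_partitioning("0-1,32-33", "2-31,34-63", 64))
print(check_partitioning("0-3", "3-62", 64))
```

The first call returns an empty list. The second reports that CPU 3 is in both pools and that CPU 63 is not assigned to either. The sketch does not verify that Hyper-Threading siblings are kept in the same pool, because sibling numbering depends on the hardware.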