Chapter 12. Scheduling NUMA-aware workloads
To deploy high performance workloads with optimal efficiency, use NUMA-aware scheduling. This feature aligns pods with the underlying hardware topology in your OpenShift Container Platform cluster, minimizing latency and maximizing resource utilization.
By using the NUMA Resources Operator, you can schedule high-performance workloads in the same NUMA zone. The Operator deploys a node resources exporting agent that reports on available cluster node NUMA resources, and a secondary scheduler that manages the workloads.
12.1. About NUMA
To reduce latency in multiprocessor systems, Non-Uniform Memory Access (NUMA) architecture allows CPUs to access local memory faster than remote memory. This design optimizes performance by prioritizing memory resources that are physically closer to the processor.
A CPU with multiple memory controllers can use any available memory across CPU complexes, regardless of where the memory is located. However, this increased flexibility comes at the expense of performance.
NUMA resource topology refers to the physical locations of CPUs, memory, and PCI devices relative to each other in a NUMA zone. In a NUMA architecture, a NUMA zone is a group of hardware resources with its own processors and memory. Colocated resources are said to be in the same NUMA zone, and CPUs in a zone access that zone's local memory faster than CPUs outside of the zone.
A workload that uses memory outside the NUMA zone of its CPU runs slower than a workload processed entirely within a single NUMA zone. For I/O-constrained workloads, a network interface in a distant NUMA zone slows the rate at which data reaches the application.
Applications can achieve better performance by containing data and processing within the same NUMA zone. For high-performance workloads and applications, such as telecommunications workloads, the cluster must process pod workloads in a single NUMA zone so that the workload can operate to specification.
12.2. About NUMA-aware scheduling
To process latency-sensitive or high-performance workloads efficiently, use NUMA-aware scheduling. This feature aligns cluster compute resources, such as CPUs, memory, and devices, in the same NUMA zone, optimizing resource efficiency and improving pod density per compute node.
By integrating the performance profile of the Node Tuning Operator with NUMA-aware scheduling, you can further configure CPU affinity to optimize performance for latency-sensitive workloads.
The scheduling logic of the default OpenShift Container Platform pod scheduler considers the available resources of the entire compute node, not individual NUMA zones. If the most restrictive resource alignment is requested in the kubelet topology manager, error conditions can occur when the pod is admitted to a node.
Conversely, if the most restrictive resource alignment is not requested, the pod can be admitted to the node without proper resource alignment, leading to worse or unpredictable performance. For example, runaway pod creation with Topology Affinity Error statuses can occur when the pod scheduler makes suboptimal scheduling decisions for guaranteed pod workloads without knowing if the pod’s requested resources are available. Scheduling mismatch decisions can cause indefinite pod startup delays. Also, depending on the cluster state and resource allocation, poor pod scheduling decisions can cause extra load on the cluster because of failed startup attempts.
The NUMA Resources Operator deploys a custom NUMA resources secondary scheduler and other resources to mitigate the shortcomings of the default OpenShift Container Platform pod scheduler. The following diagram provides a high-level overview of NUMA-aware pod scheduling.
Figure 12.1. NUMA-aware scheduling overview
- NodeResourceTopology API: The NodeResourceTopology API describes the available NUMA zone resources in each compute node.
- NUMA-aware scheduler: The NUMA-aware secondary scheduler receives information about the available NUMA zones from the NodeResourceTopology API and schedules high-performance workloads on a node where they can be optimally processed.
- Node topology exporter: The node topology exporter exposes the available NUMA zone resources for each compute node to the NodeResourceTopology API. The node topology exporter daemon tracks the resource allocation from the kubelet by using the PodResources API.
- PodResources API: The PodResources API is local to each node and exposes the resource topology and available resources to the kubelet. The List endpoint of the PodResources API exposes exclusive CPUs allocated to a particular container. The API does not expose CPUs that belong to a shared pool. The GetAllocatableResources endpoint exposes allocatable resources available on a node.
12.3. NUMA resource scheduling strategies
To optimize the placement of high-performance workloads, the secondary scheduler uses NUMA-aware scoring strategies to select the most suitable compute nodes. This process assigns workloads based on resource availability while allowing local node managers to handle final resource pinning.
When scheduling high-performance workloads, the secondary scheduler determines which compute node is best suited for the task based on its internal NUMA resource distribution. While the scheduler uses NUMA-level data to score and select a compute node, the actual resource pinning within that node is managed by the local Topology Manager and CPU Manager.
When a high-performance workload is scheduled in a NUMA-aware cluster, the following steps occur:
- Node filtering: The scheduler first filters the entire cluster to find a shortlist of feasible nodes. A node is only kept if the node meets all requirements, such as matching labels, respecting taints and tolerations, and, importantly, having sufficient available resources within its specific NUMA zones. If a node cannot satisfy the NUMA affinity of the workload, the node is filtered out at this stage.
- Node selection: When a shortlist of suitable nodes is established, the scheduler evaluates them to find the best fit. The scheduler applies a NUMA-aware scoring strategy to rank these candidates based on their resource distribution. The node with the highest score is then selected for the workload.
- Local allocation: When the pod is assigned to a compute node, the node-level components (CPU, memory, device, and topology managers) perform the authoritative allocation of specific CPUs and memory. The scheduler does not influence this final selection.
The following table summarizes the different OpenShift Container Platform strategies and their outcomes:
| Strategy | Description | Outcome |
|---|---|---|
| LeastAllocated | Favors compute nodes that contain NUMA zones with the most available resources. | Distributes workloads across the cluster to nodes with the highest available headroom. |
| MostAllocated | Favors compute nodes where the requested resources fit into NUMA zones that are already highly utilized. | Consolidates workloads on already utilized nodes, potentially leaving other nodes idle. |
| BalancedAllocation | Favors compute nodes with the most balanced CPU and memory usage across NUMA zones. | Prevents skewed usage patterns where one resource type, such as CPU, is exhausted while another, such as memory, remains idle. |
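These strategies correspond to the scoring strategies of the upstream NodeResourceTopologyMatch scheduler plugin (LeastAllocated, MostAllocated, BalancedAllocation). As a sketch of how a strategy is selected through the plugin arguments in a KubeSchedulerConfiguration (this shows the upstream plugin surface and is an assumption; the NUMAResourcesScheduler CR may expose the setting differently):

```yaml
# Illustrative KubeSchedulerConfiguration fragment for the upstream
# NodeResourceTopologyMatch plugin; field names follow the scheduler-plugins
# project, not necessarily the NUMAResourcesScheduler CR.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: topo-aware-scheduler
    pluginConfig:
      - name: NodeResourceTopologyMatch
        args:
          scoringStrategy:
            type: MostAllocated   # or LeastAllocated, BalancedAllocation
```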
12.4. Installing the NUMA Resources Operator
The NUMA Resources Operator deploys resources that allow you to schedule NUMA-aware workloads and deployments. You can install the NUMA Resources Operator by using the OpenShift Container Platform CLI or the web console.
12.4.1. Installing the NUMA Resources Operator using the CLI
To enable NUMA-aware scheduling for high-performance workloads, install the NUMA Resources Operator by using the OpenShift CLI (oc). As a cluster administrator, you can deploy the Operator efficiently without using the web console.
Prerequisites

- Installed the OpenShift CLI (oc).
- Logged in as a user with cluster-admin privileges.
Procedure
Create a namespace for the NUMA Resources Operator:

Save the following YAML in the nro-namespace.yaml file:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-numaresources
# ...
```

Create the Namespace CR by running the following command:

```shell
$ oc create -f nro-namespace.yaml
```
Create the Operator group for the NUMA Resources Operator:

Save the following YAML in the nro-operatorgroup.yaml file:

Create the OperatorGroup CR by running the following command:

```shell
$ oc create -f nro-operatorgroup.yaml
```
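The OperatorGroup manifest itself did not survive extraction. A minimal sketch, assuming the conventional single-namespace OLM layout (the metadata name is illustrative):

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: numaresources-operator   # illustrative name
  namespace: openshift-numaresources
spec:
  targetNamespaces:
    - openshift-numaresources
```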
Create the subscription for the NUMA Resources Operator:

Save the following YAML in the nro-sub.yaml file:

Create the Subscription CR by running the following command:

```shell
$ oc create -f nro-sub.yaml
```
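The Subscription manifest did not survive extraction either. A minimal sketch, assuming the default Red Hat catalog source and a channel matching the cluster version:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: numaresources-operator
  namespace: openshift-numaresources
spec:
  channel: "4.19"                    # assumed; match your cluster version
  name: numaresources-operator
  source: redhat-operators           # assumed default catalog source
  sourceNamespace: openshift-marketplace
```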
Verification

Verify that the installation succeeded by inspecting the CSV resource in the openshift-numaresources namespace. Run the following command:

```shell
$ oc get csv -n openshift-numaresources
```

Example output

```
NAME                             DISPLAY                  VERSION   REPLACES   PHASE
numaresources-operator.v4.19.2   numaresources-operator   4.19.2               Succeeded
```
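If you want to script this check, you can extract the PHASE column from the listing. A sketch that operates on the sample line above; in a live cluster you would pipe `oc get csv -n openshift-numaresources --no-headers` into the awk filter instead:

```shell
# Extract the PHASE column (the last field) from a CSV listing line.
# The sample line is copied from the example output above.
sample='numaresources-operator.v4.19.2   numaresources-operator   4.19.2   Succeeded'
phase=$(printf '%s\n' "$sample" | awk '{print $NF}')
echo "$phase"
```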
12.4.2. Installing the NUMA Resources Operator using the web console
To enable NUMA-aware scheduling for high-performance workloads, install the NUMA Resources Operator by using the web console. As a cluster administrator, you can deploy the Operator through the graphical interface.
Procedure
Create a namespace for the NUMA Resources Operator:

- In the OpenShift Container Platform web console, click Administration → Namespaces.
- Click Create Namespace, enter openshift-numaresources in the Name field, and then click Create.

Install the NUMA Resources Operator:

- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose numaresources-operator from the list of available Operators, and then click Install.
- In the Installed Namespaces field, select the openshift-numaresources namespace, and then click Install.

Optional: Verify that the NUMA Resources Operator installed successfully:

- Switch to the Operators → Installed Operators page.
- Ensure that NUMA Resources Operator is listed in the openshift-numaresources namespace with a Status of InstallSucceeded.

Note: During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

If the Operator does not appear as installed, to troubleshoot further:

- Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Go to the Workloads → Pods page and check the logs for pods in the default project.
12.5. Configuring a single NUMA node policy
To enable the NUMA Resources Operator, configure a single NUMA node policy on your cluster. You can implement this policy by creating a performance profile or by configuring a KubeletConfig custom resource (CR).
The preferred way to configure a single NUMA node policy is to apply a performance profile. You can use the Performance Profile Creator (PPC) tool to create the performance profile. When a performance profile is created on the cluster, other tuning components, such as the KubeletConfig CR and the tuned profile, are created automatically.
For more information about creating a performance profile, see "About the Performance Profile Creator" in the "Additional resources" section.
12.5.1. Sample performance profile
Reference an example YAML to understand how to use the Performance Profile Creator (PPC) tool to create a performance profile.

where:

spec.pools.operator.machineconfiguration.openshift.io/worker
- Specifies the value that must match the MachineConfigPool value that you want to configure the NUMA Resources Operator on. For example, you might create a MachineConfigPool object named worker-cnf that designates a set of nodes that run telecommunications workloads. The value for MachineConfigPool must match the machineConfigPoolSelector value in the NUMAResourcesOperator CR that you configure later in "Creating the NUMAResourcesOperator custom resource".

spec.numa.topologyPolicy
- Specifies that the topologyPolicy field is set to single-numa-node by setting the topology-manager-policy argument to single-numa-node when you run the PPC tool.

Note: For hosted control plane clusters, the machineConfigPoolSelector does not have any functional effect. Node association is instead determined by the specified NodePool object.
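The sample manifest itself is missing from the extracted text. A sketch consistent with the fields described above; the profile name and CPU ranges are illustrative, and you would generate the real profile with the PPC tool:

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance   # illustrative name
spec:
  cpu:
    isolated: "4-15"   # illustrative; match your hardware
    reserved: "0-3"
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numa:
    topologyPolicy: single-numa-node
```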
12.5.2. Creating a KubeletConfig CR
To configure a single NUMA node policy, create and apply a KubeletConfig custom resource (CR). Although applying a performance profile is the recommended method, you can use this alternative method to manually manage the configuration on your cluster.
Procedure
Create the KubeletConfig custom resource (CR) that configures the pod admittance policy for the machine profile:

Save the following YAML in the nro-kubeletconfig.yaml file:

where:

spec.machineConfigPoolSelector.matchLabels.pools.operator.machineconfiguration.openshift.io/worker
- Specifies that this label matches the machineConfigPoolSelector setting in the NUMAResourcesOperator CR that you configure later in "Creating the NUMAResourcesOperator custom resource".

spec.kubeletConfig.cpuManagerPolicy
- Specifies the static value. You must use a lowercase s.

spec.kubeletConfig.reservedSystemCPUs
- Adjust the field based on the CPUs on your nodes.

spec.kubeletConfig.memoryManagerPolicy
- Specifies Static. You must use an uppercase S.

spec.kubeletConfig.topologyManagerPolicy
- Specifies the value as single-numa-node.

Note: For hosted control plane clusters, the machineConfigPoolSelector setting does not have any functional effect. Node association is instead determined by the specified NodePool object. To apply a KubeletConfig for hosted control plane clusters, you must create a ConfigMap that contains the configuration, and then reference that ConfigMap within the spec.config field of a NodePool.
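The manifest body did not survive extraction. A sketch consistent with the fields described above; the metadata name, reserved CPUs, and memory values are illustrative:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-tuning   # illustrative name
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    cpuManagerPolicy: "static"           # lowercase "s"
    cpuManagerReconcilePeriod: "5s"      # assumed value
    reservedSystemCPUs: "0,1"            # adjust to the CPUs on your nodes
    memoryManagerPolicy: "Static"        # uppercase "S"
    reservedMemory:
      - numaNode: 0
        limits:
          memory: "1124Mi"               # illustrative value
    topologyManagerPolicy: "single-numa-node"
```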
Create the KubeletConfig CR by running the following command:

```shell
$ oc create -f nro-kubeletconfig.yaml
```

Note: Applying a performance profile or KubeletConfig automatically triggers a reboot of the nodes. If no reboot is triggered, you can troubleshoot the issue by checking the labels in the KubeletConfig that address the node group.
12.6. Scheduling NUMA-aware workloads
To process latency-sensitive and high-performance workloads efficiently, configure your OpenShift Container Platform cluster for NUMA-aware scheduling. This process aligns pods with specific NUMA zones to minimize network delays and maximize compute resource utilization.
Clusters running latency-sensitive workloads typically feature performance profiles that help to minimize workload latency and optimize performance. The NUMA-aware scheduler deploys workloads based on available node NUMA resources and with respect to any performance profile settings applied to the node. The combination of NUMA-aware deployments and the performance profile of the workload ensures that workloads are scheduled in a way that maximizes performance.
For the NUMA Resources Operator to be fully operational, you must deploy the NUMAResourcesOperator custom resource and the NUMA-aware secondary pod scheduler.
12.6.1. Creating the NUMAResourcesOperator custom resource
After you have installed the NUMA Resources Operator, you can create the NUMAResourcesOperator custom resource (CR). This CR instructs the NUMA Resources Operator to install all the cluster infrastructure that is needed to support the NUMA-aware scheduler, including daemon sets and APIs.
Prerequisites

- Installed the OpenShift CLI (oc).
- Logged in as a user with cluster-admin privileges.
- Installed the NUMA Resources Operator.
Procedure
Create the NUMAResourcesOperator custom resource:

Save the following minimal required YAML file example as nrop.yaml:

where:

pools.operator.machineconfiguration.openshift.io/worker
- Specifies a value that must match the MachineConfigPool resource that you want to configure the NUMA Resources Operator on. For example, you might have created a MachineConfigPool resource named worker-cnf that designates a set of nodes expected to run telecommunications workloads. Each NodeGroup must match exactly one MachineConfigPool. Configurations where a NodeGroup matches more than one MachineConfigPool are not supported.
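The minimal manifest is missing from the extracted text. A sketch consistent with the field described above:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
    - machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker: ""
```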
Create the NUMAResourcesOperator CR by running the following command:

```shell
$ oc create -f nrop.yaml
```
Optional: To enable NUMA-aware scheduling for multiple machine config pools (MCPs), define a separate NodeGroup for each pool. For example, define three NodeGroups for worker-cnf, worker-ht, and worker-other in the NUMAResourcesOperator CR, as shown in the following example:

Example YAML definition for a NUMAResourcesOperator CR with multiple NodeGroups
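The example manifest body is missing from the extracted text. A sketch, assuming each custom MCP carries the standard pools.operator.machineconfiguration.openshift.io/<pool> label:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
    - machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker-cnf: ""
    - machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker-ht: ""
    - machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker-other: ""
```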
Verification

Verify that the NUMA Resources Operator deployed successfully by running the following command:

```shell
$ oc get numaresourcesoperators.nodetopology.openshift.io
```

Example output

```
NAME                    AGE
numaresourcesoperator   27s
```

After a few minutes, run the following command to verify that the required resources deployed successfully:

```shell
$ oc get all -n openshift-numaresources
```

Example output

```
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-7d9d84c58d-qk2mr   1/1     Running   0          12m
pod/numaresourcesoperator-worker-7d96r                  2/2     Running   0          97s
pod/numaresourcesoperator-worker-crsht                  2/2     Running   0          97s
pod/numaresourcesoperator-worker-jp9mw                  2/2     Running   0          97s
```
12.6.2. Creating the NUMAResourcesOperator custom resource for hosted control planes
After you install the NUMA Resources Operator, create the NUMAResourcesOperator custom resource (CR). The CR instructs the NUMA Resources Operator to install all the cluster infrastructure that is needed to support the NUMA-aware scheduler on hosted control planes, including daemon sets and APIs.
Creating the NUMAResourcesOperator custom resource for hosted control planes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites

- Installed the OpenShift CLI (oc).
- Logged in as a user with cluster-admin privileges.
- Installed the NUMA Resources Operator.
Procedure
Export the management cluster kubeconfig file by running the following command:

```shell
$ export KUBECONFIG=<path-to-management-cluster-kubeconfig>
```

Find the node-pool-name for your cluster by running the following command:

```shell
$ oc --kubeconfig="$MGMT_KUBECONFIG" get np -A
```

Example output

```
NAMESPACE   NAME                     CLUSTER       DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    democluster-us-east-1a   democluster   1               1               False         False        4.19.0    False             False
```

The node-pool-name is the NAME field in the output. In this example, the node-pool-name is democluster-us-east-1a.

Create a YAML file named nrop-hcp.yaml with at least the following content:

where:

spec.nodeGroups.poolName
- Specifies the pool name. The example shows the node-pool-name value that was retrieved in a previous step.
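The manifest body is missing from the extracted text. A sketch using the poolName retrieved above:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
    - poolName: democluster-us-east-1a   # the node-pool-name from the previous step
```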
On the management cluster, run the following command to list the available secrets:

```shell
$ oc get secrets -n clusters
```

Extract the kubeconfig file for the hosted cluster by running the following command:

```shell
$ oc get secret <SECRET_NAME> -n clusters -o jsonpath='{.data.kubeconfig}' | base64 -d > hosted-cluster-kubeconfig
```

Example

```shell
$ oc get secret democluster-admin-kubeconfig -n clusters -o jsonpath='{.data.kubeconfig}' | base64 -d > hosted-cluster-kubeconfig
```

Export the hosted cluster kubeconfig file by running the following command:

```shell
$ export HC_KUBECONFIG=<path_to_hosted-cluster-kubeconfig>
```

Create the NUMAResourcesOperator CR by running the following command on the hosted cluster:

```shell
$ oc create -f nrop-hcp.yaml
```
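The extraction step above works because the secret stores the kubeconfig base64-encoded: jsonpath pulls the encoded field and base64 -d recovers the plain text. A self-contained sketch of that decoding stage, using an illustrative string rather than a real kubeconfig:

```shell
# Round-trip demonstration of the encode/decode step used above.
# The sample content is illustrative, not a real kubeconfig.
encoded=$(printf 'apiVersion: v1\nkind: Config' | base64)
printf '%s' "$encoded" | base64 -d
```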
Verification

Verify that the NUMA Resources Operator deployed successfully by running the following command:

```shell
$ oc get numaresourcesoperators.nodetopology.openshift.io
```

Example output

```
NAME                    AGE
numaresourcesoperator   27s
```

After a few minutes, run the following command to verify that the required resources deployed successfully:

```shell
$ oc get all -n openshift-numaresources
```

Example output

```
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-7d9d84c58d-qk2mr   1/1     Running   0          12m
pod/numaresourcesoperator-democluster-7d96r             2/2     Running   0          97s
pod/numaresourcesoperator-democluster-crsht             2/2     Running   0          97s
pod/numaresourcesoperator-democluster-jp9mw             2/2     Running   0          97s
```
12.6.3. Deploying the NUMA-aware secondary pod scheduler
After you install the NUMA Resources Operator, follow this procedure to deploy the NUMA-aware secondary pod scheduler.
Procedure
Create the NUMAResourcesScheduler custom resource that deploys the NUMA-aware custom pod scheduler:

Save the following minimal required YAML in the nro-scheduler.yaml file:

Note: In a disconnected environment, make sure to configure the resolution of the scheduler image by either:

- Creating an ImageTagMirrorSet custom resource (CR). For more information, see "Configuring image registry repository mirroring" in the "Additional resources" section.
- Setting the URL to the disconnected registry.
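The minimal manifest is missing from the extracted text. A sketch; the image reference is illustrative, so use the scheduler image that matches your cluster version, or your disconnected registry mirror:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v4.19"   # illustrative
```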
Create the NUMAResourcesScheduler CR by running the following command:

```shell
$ oc create -f nro-scheduler.yaml
```

Note: In a hosted control plane cluster, run this command on the hosted control plane node.
After a few seconds, run the following command to confirm the successful deployment of the required resources:

```shell
$ oc get all -n openshift-numaresources
```
12.6.4. Scheduling workloads with the NUMA-aware scheduler
To schedule workloads with the NUMA-aware scheduler, use deployment CRs that specify the minimum required resources. This ensures that your cluster processes the workloads efficiently.

Before you schedule workloads with the NUMA-aware scheduler, ensure that you have deployed the topo-aware-scheduler, applied the NUMAResourcesOperator and NUMAResourcesScheduler CRs, and that your cluster has a matching performance profile or KubeletConfig.

The example in the procedure uses NUMA-aware scheduling for a sample workload.
Prerequisites

- Installed the OpenShift CLI (oc).
- Logged in as a user with cluster-admin privileges.
Procedure
Get the name of the NUMA-aware scheduler that is deployed in the cluster by running the following command:

```shell
$ oc get numaresourcesschedulers.nodetopology.openshift.io numaresourcesscheduler -o json | jq '.status.schedulerName'
```

Example output

```
"topo-aware-scheduler"
```
"topo-aware-scheduler"Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a
DeploymentCR that uses scheduler namedtopo-aware-scheduler, for example:Save the following YAML in the
nro-deployment.yamlfile:Copy to Clipboard Copied! Toggle word wrap Toggle overflow spec.schedulerName: Specifies the scheduler name that must match the name of the NUMA-aware scheduler that is deployed in your cluster, such astopo-aware-scheduler.Create the
DeploymentCR by running the following command:oc create -f nro-deployment.yaml
$ oc create -f nro-deployment.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow
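The Deployment manifest referenced above is missing from the extracted text. A sketch, assuming an illustrative test image; note the integer CPU value with requests equal to limits, which yields the Guaranteed QoS class that the verification steps below rely on:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: numa-deployment-1
  namespace: openshift-numaresources
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      schedulerName: topo-aware-scheduler    # must match the deployed scheduler name
      containers:
        - name: ctnr
          image: quay.io/openshifttest/hello-openshift:1.2.0   # illustrative image
          resources:
            limits:
              cpu: "2"          # integer value, required for NUMA pinning
              memory: 100Mi
            requests:
              cpu: "2"
              memory: 100Mi
```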
Verification

Verify that the deployment was successful by running the following command:

```shell
$ oc get pods -n openshift-numaresources
```

Verify that the topo-aware-scheduler is scheduling the deployed pod by running the following command:

```shell
$ oc describe pod numa-deployment-1-6c4f5bdb84-wgn6g -n openshift-numaresources
```

Example output

```
Events:
  Type    Reason     Age    From                  Message
  ----    ------     ----   ----                  -------
  Normal  Scheduled  4m45s  topo-aware-scheduler  Successfully assigned openshift-numaresources/numa-deployment-1-6c4f5bdb84-wgn6g to worker-1
```

Note: Deployments that request more resources than are available for scheduling fail with a MinimumReplicasUnavailable error. The deployment succeeds when the required resources become available. Pods remain in the Pending state until the required resources are available.

Verify that the expected allocated resources are listed for the node.
Identify the node that is running the deployment pod by running the following command:

```shell
$ oc get pods -n openshift-numaresources -o wide
```

Example output

```
NAME                                 READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
numa-deployment-1-6c4f5bdb84-wgn6g   0/2     Running   0          82m   10.128.2.50   worker-1   <none>           <none>
```

Run the following command with the name of the node that is running the deployment pod:

```shell
$ oc describe noderesourcetopologies.topology.node.k8s.io worker-1
```

Resources.Available
- Specifies the Available capacity that is reduced because of the resources that have been allocated to the guaranteed pod. Resources consumed by guaranteed pods are subtracted from the available node resources listed under noderesourcetopologies.topology.node.k8s.io.

Resource allocations for pods with a Best-effort or Burstable quality of service (qosClass) are not reflected in the NUMA node resources under noderesourcetopologies.topology.node.k8s.io. If a pod's consumed resources are not reflected in the node resource calculation, verify that the pod has a qosClass of Guaranteed and that the CPU request is an integer value, not a decimal value. You can verify that the pod has a qosClass of Guaranteed by running the following command:

```shell
$ oc get pod numa-deployment-1-6c4f5bdb84-wgn6g -n openshift-numaresources -o jsonpath="{ .status.qosClass }"
```

Example output

```
Guaranteed
```
12.7. Configuring polling operations for NUMA resources updates
As an optional task, you can improve scheduling behavior and troubleshoot suboptimal scheduling decisions by configuring the spec.nodeGroups specification in the NUMAResourcesOperator custom resource (CR). This configuration fine-tunes how daemons poll for available NUMA resources, providing advanced control over your polling operations.
The configuration options are listed as follows:
- infoRefreshMode: Determines the trigger condition for polling the kubelet. The NUMA Resources Operator reports the resulting information to the API server.
- infoRefreshPeriod: Determines the duration between polling updates.
- podsFingerprinting: Determines if point-in-time information for the current set of pods running on a node is exposed in polling updates.

The default value for podsFingerprinting is EnabledExclusiveResources. To optimize scheduler performance, set podsFingerprinting to either EnabledExclusiveResources or Enabled. Additionally, configure the cacheResyncPeriod in the NUMAResourcesScheduler custom resource (CR) to a value greater than 0. The cacheResyncPeriod specification helps to report more exact resource availability by monitoring pending resources on nodes.
Prerequisites
- Installed the OpenShift CLI (oc).
- Logged in as a user with cluster-admin privileges.
- Installed the NUMA Resources Operator.
Procedure
Configure the spec.nodeGroups specification in your NUMAResourcesOperator CR:

where:

spec.nodeGroups.config.infoRefreshMode
Valid values are Periodic, Events, and PeriodicAndEvents. Use Periodic to poll the kubelet at intervals that you define in infoRefreshPeriod. Use Events to poll the kubelet at every pod lifecycle event. Use PeriodicAndEvents to enable both methods.

spec.nodeGroups.config.infoRefreshPeriod
Specifies the polling interval for Periodic or PeriodicAndEvents refresh modes. The field is ignored if the refresh mode is Events.

spec.nodeGroups.config.podsFingerprinting
Valid values are Enabled, Disabled, and EnabledExclusiveResources. Setting to Enabled or EnabledExclusiveResources is a requirement for the cacheResyncPeriod specification in the NUMAResourcesScheduler.
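Putting these fields together, a NUMAResourcesOperator CR that enables combined polling might look like the following sketch. This is an illustration, not a verbatim example from this chapter: the apiVersion, the machine config pool selector label, and the interval value are assumptions that may differ in your cluster.

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - config:
      # Poll the kubelet on a timer and at every pod lifecycle event
      infoRefreshMode: PeriodicAndEvents
      # Timer interval for the Periodic part of the refresh mode
      infoRefreshPeriod: 10s
      # Enabled or EnabledExclusiveResources is required when
      # cacheResyncPeriod is set in the NUMAResourcesScheduler CR
      podsFingerprinting: Enabled
    machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
```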
Verification
After you deploy the NUMA Resources Operator, verify that the node group configurations were applied by running the following command:
$ oc get numaresop numaresourcesoperator -o json | jq '.status'

Example output
12.8. Troubleshooting NUMA-aware scheduling
To troubleshoot common problems with NUMA-aware pod scheduling, perform the following steps.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the NUMA Resources Operator and deploy the NUMA-aware secondary scheduler.
Procedure
Verify that the noderesourcetopologies CRD is deployed in the cluster by running the following command:

$ oc get crd | grep noderesourcetopologies

Example output

NAME                                          CREATED AT
noderesourcetopologies.topology.node.k8s.io   2022-01-18T08:28:06Z

Check that the NUMA-aware scheduler name matches the name specified in your NUMA-aware workloads by running the following command:
$ oc get numaresourcesschedulers.nodetopology.openshift.io numaresourcesscheduler -o json | jq '.status.schedulerName'

Example output

topo-aware-scheduler

Verify that NUMA-aware schedulable nodes have the noderesourcetopologies CR applied to them by running the following command:

$ oc get noderesourcetopologies.topology.node.k8s.io

Example output

NAME                    AGE
compute-0.example.com   17h
compute-1.example.com   17h

Note

The number of nodes should equal the number of worker nodes that are configured by the machine config pool (mcp) worker definition.

Verify the NUMA zone granularity for all schedulable nodes by running the following command:

$ oc get noderesourcetopologies.topology.node.k8s.io -o yaml

Example output
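The elided output contains one noderesourcetopologies object per schedulable node, with per-zone resource accounting. The following heavily abbreviated sketch shows the general shape only; the API version, zone names, and resource values are illustrative assumptions, not output from a real cluster:

```yaml
apiVersion: topology.node.k8s.io/v1alpha2
kind: NodeResourceTopology
metadata:
  name: compute-0.example.com
topologyPolicies:
- SingleNUMANodeContainerLevel
zones:
- name: node-0
  type: Node
  resources:
  - name: cpu
    capacity: "40"
    allocatable: "38"
    available: "38"
  - name: memory
    capacity: "97684480Ki"
    allocatable: "97684480Ki"
    available: "97684480Ki"
- name: node-1
  type: Node
  resources:
  - name: cpu
    capacity: "40"
    allocatable: "38"
    available: "38"
```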
12.8.1. Reporting more exact resource availability
Enable the cacheResyncPeriod specification to help the NUMA Resources Operator report more exact resource availability by monitoring pending resources on nodes and synchronizing this information in the scheduler cache at a defined interval. This also helps to minimize Topology Affinity Error occurrences caused by suboptimal scheduling decisions. The lower the interval, the greater the network load. The cacheResyncPeriod specification is disabled by default.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Delete the currently running NUMAResourcesScheduler resource:

Get the active NUMAResourcesScheduler by running the following command:

$ oc get NUMAResourcesScheduler

Example output

NAME                     AGE
numaresourcesscheduler   92m

Delete the secondary scheduler resource by running the following command:

$ oc delete NUMAResourcesScheduler numaresourcesscheduler

Example output

numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
Save the following YAML in the file nro-scheduler-cacheresync.yaml. This example enables the cacheResyncPeriod specification:

1 - Enter an interval value in seconds for synchronization of the scheduler cache. A value of 5s is typical for most implementations.
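A sketch of what nro-scheduler-cacheresync.yaml might contain. The imageSpec value is illustrative; use the secondary scheduler image that matches your cluster version:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  # Illustrative scheduler image; match your cluster version
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v4.19"
  # 1 - Interval for synchronizing pending resources into the scheduler cache
  cacheResyncPeriod: "5s"
```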
Create the updated NUMAResourcesScheduler resource by running the following command:

$ oc create -f nro-scheduler-cacheresync.yaml

Example output

numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
Verification steps
Check that the NUMA-aware scheduler was successfully deployed:
Run the following command to check that the CRD is created successfully:

$ oc get crd | grep numaresourcesschedulers

Example output

NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z

Check that the new custom scheduler is available by running the following command:

$ oc get numaresourcesschedulers.nodetopology.openshift.io

Example output

NAME                     AGE
numaresourcesscheduler   3h26m

Check that the logs for the scheduler show the increased log level:

Get the list of pods running in the openshift-numaresources namespace by running the following command:

$ oc get pods -n openshift-numaresources

Example output

NAME                                               READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-d87d79587-76mrm   1/1     Running   0          46h
numaresourcesoperator-worker-5wm2k                 2/2     Running   0          45h
numaresourcesoperator-worker-pb75c                 2/2     Running   0          45h
secondary-scheduler-7976c4d466-qm4sc               1/1     Running   0          21m

Get the logs for the secondary scheduler pod by running the following command:

$ oc logs secondary-scheduler-7976c4d466-qm4sc -n openshift-numaresources

Example output
12.8.2. Changing where high-performance workloads run
To optimize the processing of high-performance workloads, change the default placement behavior of the NUMA-aware secondary scheduler. With this configuration, you can assign workloads to a specific NUMA node within a compute node instead of relying on default resource availability.
If you want to change where the workloads run, you can add the scoringStrategy setting to the NUMAResourcesScheduler custom resource and set its value to either MostAllocated or BalancedAllocation.
Prerequisites
- Installed the OpenShift CLI (oc).
- Logged in as a user with cluster-admin privileges.
Procedure
Delete the currently running NUMAResourcesScheduler resource by using the following steps:

Get the active NUMAResourcesScheduler by running the following command:

$ oc get NUMAResourcesScheduler

Example output

NAME                     AGE
numaresourcesscheduler   92m

Delete the secondary scheduler resource by running the following command:

$ oc delete NUMAResourcesScheduler numaresourcesscheduler

Example output

numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
Save the following YAML in the file nro-scheduler-mostallocated.yaml. This example changes the scoringStrategy to MostAllocated:

spec.scoringStrategy: If the scoringStrategy configuration is omitted, the default of LeastAllocated applies.

Create the updated NUMAResourcesScheduler resource by running the following command:

$ oc create -f nro-scheduler-mostallocated.yaml

Example output

numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
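For reference, nro-scheduler-mostallocated.yaml might look like the following sketch. The imageSpec value is illustrative; use the secondary scheduler image that matches your cluster version:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  # Illustrative scheduler image; match your cluster version
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v4.19"
  scoringStrategy:
    # Valid types: MostAllocated, BalancedAllocation, LeastAllocated (default)
    type: MostAllocated
```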
Verification
Check that the NUMA-aware scheduler was successfully deployed by using the following steps:
Run the following command to check that the custom resource definition (CRD) is created successfully:

$ oc get crd | grep numaresourcesschedulers

Example output

NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z

Check that the new custom scheduler is available by running the following command:

$ oc get numaresourcesschedulers.nodetopology.openshift.io

Example output

NAME                     AGE
numaresourcesscheduler   3h26m

Verify that the ScoringStrategy has been applied correctly by running the following command to check the relevant ConfigMap resource for the scheduler:

$ oc get -n openshift-numaresources cm topo-aware-scheduler-config -o yaml | grep scoring -A 1

Example output

scoringStrategy:
  type: MostAllocated
12.8.3. Checking the NUMA-aware scheduler logs
Troubleshoot problems with the NUMA-aware scheduler by reviewing the logs. If required, you can increase the scheduler log level by modifying the spec.logLevel field of the NUMAResourcesScheduler resource. Acceptable values are Normal, Debug, and Trace, with Trace being the most verbose option.
To change the log level of the secondary scheduler, delete the running scheduler resource and re-deploy it with the changed log level. The scheduler is unavailable for scheduling new workloads during this downtime.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Delete the currently running NUMAResourcesScheduler resource:

Get the active NUMAResourcesScheduler by running the following command:

$ oc get NUMAResourcesScheduler

Example output

NAME                     AGE
numaresourcesscheduler   90m

Delete the secondary scheduler resource by running the following command:

$ oc delete NUMAResourcesScheduler numaresourcesscheduler

Example output

numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
Save the following YAML in the file nro-scheduler-debug.yaml. This example changes the log level to Debug:

Create the updated NUMAResourcesScheduler resource with Debug logging by running the following command:

$ oc create -f nro-scheduler-debug.yaml

Example output

numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
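For reference, the nro-scheduler-debug.yaml saved in this step might look like the following sketch. The imageSpec value is illustrative; use the secondary scheduler image that matches your cluster version:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  # Illustrative scheduler image; match your cluster version
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v4.19"
  # Acceptable values: Normal, Debug, Trace (most verbose)
  logLevel: Debug
```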
Verification steps
Check that the NUMA-aware scheduler was successfully deployed:
Run the following command to check that the CRD is created successfully:

$ oc get crd | grep numaresourcesschedulers

Example output

NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z

Check that the new custom scheduler is available by running the following command:

$ oc get numaresourcesschedulers.nodetopology.openshift.io

Example output

NAME                     AGE
numaresourcesscheduler   3h26m
Check that the logs for the scheduler show the increased log level:
Get the list of pods running in the openshift-numaresources namespace by running the following command:

$ oc get pods -n openshift-numaresources

Example output

NAME                                               READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-d87d79587-76mrm   1/1     Running   0          46h
numaresourcesoperator-worker-5wm2k                 2/2     Running   0          45h
numaresourcesoperator-worker-pb75c                 2/2     Running   0          45h
secondary-scheduler-7976c4d466-qm4sc               1/1     Running   0          21m

Get the logs for the secondary scheduler pod by running the following command:

$ oc logs secondary-scheduler-7976c4d466-qm4sc -n openshift-numaresources

Example output
12.8.4. Troubleshooting the resource topology exporter
Troubleshoot noderesourcetopologies objects where unexpected results are occurring by inspecting the corresponding resource-topology-exporter logs.
It is recommended that NUMA resource topology exporter instances in the cluster are named for nodes they refer to. For example, a worker node with the name worker should have a corresponding noderesourcetopologies object called worker.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Get the daemonsets managed by the NUMA Resources Operator. Each daemonset has a corresponding nodeGroup in the NUMAResourcesOperator CR. Run the following command:

$ oc get numaresourcesoperators.nodetopology.openshift.io numaresourcesoperator -o jsonpath="{.status.daemonsets[0]}"

Example output

{"name":"numaresourcesoperator-worker","namespace":"openshift-numaresources"}

Get the label for the daemonset of interest using the value for name from the previous step:

$ oc get ds -n openshift-numaresources numaresourcesoperator-worker -o jsonpath="{.spec.selector.matchLabels}"

Example output

{"name":"resource-topology"}

Get the pods using the resource-topology label by running the following command:

$ oc get pods -n openshift-numaresources -l name=resource-topology -o wide

Example output

NAME                                 READY   STATUS    RESTARTS   AGE    IP            NODE
numaresourcesoperator-worker-5wm2k   2/2     Running   0          2d1h   10.135.0.64   compute-0.example.com
numaresourcesoperator-worker-pb75c   2/2     Running   0          2d1h   10.132.2.33   compute-1.example.com

Examine the logs of the resource-topology-exporter container running on the worker pod that corresponds to the node you are troubleshooting. Run the following command:

$ oc logs -n openshift-numaresources -c resource-topology-exporter numaresourcesoperator-worker-pb75c

Example output
12.8.5. Correcting a missing resource topology exporter config map
If you install the NUMA Resources Operator in a cluster with misconfigured cluster settings, in some circumstances, the Operator is shown as active but the logs of the resource topology exporter (RTE) daemon set pods show that the configuration for the RTE is missing, for example:
Info: couldn't find configuration in "/etc/resource-topology-exporter/config.yaml"
This log message indicates that the kubeletconfig with the required configuration was not properly applied in the cluster, resulting in a missing RTE configmap. For example, the following cluster is missing a numaresourcesoperator-worker configmap custom resource (CR):
$ oc get configmap
Example output
NAME DATA AGE
0e2a6bd3.openshift-kni.io 0 6d21h
kube-root-ca.crt 1 6d21h
openshift-service-ca.crt 1 6d21h
topo-aware-scheduler-config 1 6d18h
In a correctly configured cluster, oc get configmap also returns a numaresourcesoperator-worker configmap CR.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the NUMA Resources Operator and deploy the NUMA-aware secondary scheduler.
Procedure
Compare the values for spec.machineConfigPoolSelector.matchLabels in kubeletconfig and metadata.labels in the MachineConfigPool (mcp) worker CR using the following commands:

Check the kubeletconfig labels by running the following command:

$ oc get kubeletconfig -o yaml

Example output

machineConfigPoolSelector:
  matchLabels:
    cnf-worker-tuning: enabled

Check the mcp labels by running the following command:

$ oc get mcp worker -o yaml

Example output

labels:
  machineconfiguration.openshift.io/mco-built-in: ""
  pools.operator.machineconfiguration.openshift.io/worker: ""

The cnf-worker-tuning: enabled label is not present in the MachineConfigPool object.

Edit the MachineConfigPool CR to include the missing label, for example:

$ oc edit mcp worker -o yaml

Example output

labels:
  machineconfiguration.openshift.io/mco-built-in: ""
  pools.operator.machineconfiguration.openshift.io/worker: ""
  cnf-worker-tuning: enabled
labels: machineconfiguration.openshift.io/mco-built-in: "" pools.operator.machineconfiguration.openshift.io/worker: "" cnf-worker-tuning: enabledCopy to Clipboard Copied! Toggle word wrap Toggle overflow - Apply the label changes and wait for the cluster to apply the updated configuration. Run the following command:
Verification
Check that the missing numaresourcesoperator-worker configmap CR is applied:

$ oc get configmap

Example output
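On a correctly configured cluster, the listing now also includes the numaresourcesoperator-worker config map. The following listing is an illustrative sketch based on the earlier output in this section; the AGE values are arbitrary:

```text
NAME                           DATA   AGE
0e2a6bd3.openshift-kni.io      0      6d21h
kube-root-ca.crt               1      6d21h
numaresourcesoperator-worker   1      5m
openshift-service-ca.crt       1      6d21h
topo-aware-scheduler-config    1      6d18h
```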
12.8.6. Collecting NUMA Resources Operator data
You can use the oc adm must-gather CLI command to collect information about your cluster, including features and objects associated with the NUMA Resources Operator.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
To collect NUMA Resources Operator data with must-gather, you must specify the NUMA Resources Operator must-gather image:

$ oc adm must-gather --image=registry.redhat.io/openshift4/numaresources-must-gather-rhel9:v4.19