Chapter 8. Scheduling NUMA-aware workloads
Learn about NUMA-aware scheduling and how you can use it to deploy high-performance workloads in an OpenShift Container Platform cluster.
The NUMA Resources Operator allows you to schedule high-performance workloads in the same NUMA zone. It deploys a node resources exporting agent that reports on available cluster node NUMA resources, and a secondary scheduler that manages the workloads.
8.1. About NUMA-aware scheduling
Introduction to NUMA
Non-Uniform Memory Access (NUMA) is a compute platform architecture that allows different CPUs to access different regions of memory at different speeds. NUMA resource topology refers to the locations of CPUs, memory, and PCI devices relative to each other in the compute node. Colocated resources are said to be in the same NUMA zone. For high-performance applications, the cluster needs to process pod workloads in a single NUMA zone.
Performance considerations
NUMA architecture allows a CPU with multiple memory controllers to use any available memory across CPU complexes, regardless of where the memory is located. This allows for increased flexibility at the expense of performance. A CPU processing a workload using memory that is outside its NUMA zone is slower than a workload processed in a single NUMA zone. Also, for I/O-constrained workloads, the network interface on a distant NUMA zone slows down how quickly information can reach the application. High-performance workloads, such as telecommunications workloads, cannot operate to specification under these conditions.
NUMA-aware scheduling
NUMA-aware scheduling aligns the requested cluster compute resources (CPUs, memory, devices) in the same NUMA zone to process latency-sensitive or high-performance workloads efficiently. NUMA-aware scheduling also improves pod density per compute node for greater resource efficiency.
Integration with Node Tuning Operator
By integrating the Node Tuning Operator’s performance profile with NUMA-aware scheduling, you can further configure CPU affinity to optimize performance for latency-sensitive workloads.
Default scheduling logic
The default OpenShift Container Platform pod scheduler scheduling logic considers the available resources of the entire compute node, not individual NUMA zones. If the most restrictive resource alignment is requested in the kubelet topology manager, error conditions can occur when admitting the pod to a node. Conversely, if the most restrictive resource alignment is not requested, the pod can be admitted to the node without proper resource alignment, leading to worse or unpredictable performance. For example, runaway pod creation with Topology Affinity Error statuses can occur when the pod scheduler makes suboptimal scheduling decisions for guaranteed pod workloads without knowing if the pod's requested resources are available. Scheduling mismatch decisions can cause indefinite pod startup delays. Also, depending on the cluster state and resource allocation, poor pod scheduling decisions can cause extra load on the cluster because of failed startup attempts.
NUMA-aware pod scheduling diagram
The NUMA Resources Operator deploys a custom NUMA resources secondary scheduler and other resources to mitigate against the shortcomings of the default OpenShift Container Platform pod scheduler. The following diagram provides a high-level overview of NUMA-aware pod scheduling.
Figure 8.1. NUMA-aware scheduling overview
- NodeResourceTopology API: The NodeResourceTopology API describes the available NUMA zone resources in each compute node.
- NUMA-aware scheduler: The NUMA-aware secondary scheduler receives information about the available NUMA zones from the NodeResourceTopology API and schedules high-performance workloads on a node where they can be optimally processed.
- Node topology exporter: The node topology exporter exposes the available NUMA zone resources for each compute node to the NodeResourceTopology API. The node topology exporter daemon tracks the resource allocation from the kubelet by using the PodResources API.
- PodResources API: The PodResources API is local to each node and exposes the resource topology and available resources to the kubelet.

Note: The List endpoint of the PodResources API exposes exclusive CPUs allocated to a particular container. The API does not expose CPUs that belong to a shared pool. The GetAllocatableResources endpoint exposes allocatable resources available on a node.
Additional resources
- For more information about running secondary pod schedulers in your cluster and how to deploy pods with a secondary pod scheduler, see Scheduling pods using a secondary scheduler.
8.2. Installing the NUMA Resources Operator
The NUMA Resources Operator deploys resources that allow you to schedule NUMA-aware workloads and deployments. You can install the NUMA Resources Operator by using the OpenShift Container Platform CLI or the web console.
8.2.1. Installing the NUMA Resources Operator using the CLI
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a namespace for the NUMA Resources Operator:
Save the following YAML in the nro-namespace.yaml file:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-numaresources

Create the Namespace CR by running the following command:

$ oc create -f nro-namespace.yaml
Create the Operator group for the NUMA Resources Operator:
Save the following YAML in the nro-operatorgroup.yaml file, and then create the OperatorGroup CR by running the following command:

$ oc create -f nro-operatorgroup.yaml
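The nro-operatorgroup.yaml file can contain a minimal OperatorGroup definition such as the following sketch; the targetNamespaces value assumes the openshift-numaresources namespace created in the previous step:

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: numaresources-operator
  namespace: openshift-numaresources
spec:
  targetNamespaces:
  - openshift-numaresources
```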
Create the subscription for the NUMA Resources Operator:
Save the following YAML in the nro-sub.yaml file, and then create the Subscription CR by running the following command:

$ oc create -f nro-sub.yaml
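The nro-sub.yaml file can contain a Subscription definition along the following lines; the channel, source, and sourceNamespace values are assumptions that depend on your cluster version and catalog:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: numaresources-operator
  namespace: openshift-numaresources
spec:
  channel: "4.13"                          # assumption: match your cluster version
  name: numaresources-operator
  source: redhat-operators                 # assumption: default Red Hat catalog source
  sourceNamespace: openshift-marketplace
```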
Verification
Verify that the installation succeeded by inspecting the CSV resource in the openshift-numaresources namespace. Run the following command:

$ oc get csv -n openshift-numaresources

Example output:
NAME                             DISPLAY                  VERSION   REPLACES   PHASE
numaresources-operator.v4.13.2   numaresources-operator   4.13.2               Succeeded
8.2.2. Installing the NUMA Resources Operator using the web console
As a cluster administrator, you can install the NUMA Resources Operator using the web console.
Procedure
Create a namespace for the NUMA Resources Operator:
- In the OpenShift Container Platform web console, click Administration → Namespaces.
- Click Create Namespace, enter openshift-numaresources in the Name field, and then click Create.
Install the NUMA Resources Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose numaresources-operator from the list of available Operators, and then click Install.
- In the Installed Namespaces field, select the openshift-numaresources namespace, and then click Install.
Optional: Verify that the NUMA Resources Operator installed successfully:
- Switch to the Operators → Installed Operators page.
- Ensure that NUMA Resources Operator is listed in the openshift-numaresources namespace with a Status of InstallSucceeded.

Note: During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, to troubleshoot further:
- Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Go to the Workloads → Pods page and check the logs for pods in the default project.
8.3. Scheduling NUMA-aware workloads
Clusters running latency-sensitive workloads typically feature performance profiles that help to minimize workload latency and optimize performance. The NUMA-aware scheduler deploys workloads based on available node NUMA resources and with respect to any performance profile settings applied to the node. The combination of NUMA-aware deployments and the performance profile of the workload ensures that workloads are scheduled in a way that maximizes performance.
For the NUMA Resources Operator to be fully operational, you must deploy the NUMAResourcesOperator custom resource and the NUMA-aware secondary pod scheduler.
8.3.1. Creating the NUMAResourcesOperator custom resource
When you have installed the NUMA Resources Operator, create the NUMAResourcesOperator custom resource (CR) that instructs the NUMA Resources Operator to install all the cluster infrastructure needed to support the NUMA-aware scheduler, including daemon sets and APIs.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the NUMA Resources Operator.
Procedure
Create the NUMAResourcesOperator custom resource:

Save the following minimal required YAML file example as nrop.yaml. The machine config pool selector must match the MachineConfigPool resource that you want to configure the NUMA Resources Operator on. For example, you might have created a MachineConfigPool resource named worker-cnf that designates a set of nodes expected to run telecommunications workloads. Each NodeGroup must match exactly one MachineConfigPool. Configurations where NodeGroup matches more than one MachineConfigPool are not supported.
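A minimal nrop.yaml along these lines satisfies the requirements described above; the API version and the pool label key are assumptions based on common NUMA Resources Operator examples:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - machineConfigPoolSelector:
      matchLabels:
        # assumption: label that selects the worker-cnf machine config pool
        pools.operator.machineconfiguration.openshift.io/worker-cnf: ""
```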
Create the NUMAResourcesOperator CR by running the following command:

$ oc create -f nrop.yaml

Note: Creating the NUMAResourcesOperator CR triggers a reboot on the corresponding machine config pool and therefore on the affected nodes.
Optional: To enable NUMA-aware scheduling for multiple machine config pools (MCPs), define a separate NodeGroup for each pool. For example, define three NodeGroups for worker-cnf, worker-ht, and worker-other in the NUMAResourcesOperator CR.
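A NUMAResourcesOperator CR with multiple NodeGroups can be sketched as follows; the pool label keys are assumptions that must match your MachineConfigPool resources:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  # assumption: one selector per machine config pool, each matching exactly one MCP
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker-cnf: ""
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker-ht: ""
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker-other: ""
```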
Verification
Verify that the NUMA Resources Operator deployed successfully by running the following command:
$ oc get numaresourcesoperators.nodetopology.openshift.io

Example output:

NAME                    AGE
numaresourcesoperator   27s

After a few minutes, run the following command to verify that the required resources deployed successfully:

$ oc get all -n openshift-numaresources
Example output:

NAME                                                    READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-7d9d84c58d-qk2mr   1/1     Running   0          12m
pod/numaresourcesoperator-worker-7d96r                  2/2     Running   0          97s
pod/numaresourcesoperator-worker-crsht                  2/2     Running   0          97s
pod/numaresourcesoperator-worker-jp9mw                  2/2     Running   0          97s
8.3.2. Deploying the NUMA-aware secondary pod scheduler
After you install the NUMA Resources Operator, do the following to deploy the NUMA-aware secondary pod scheduler:
Procedure
Create the NUMAResourcesScheduler custom resource that deploys the NUMA-aware custom pod scheduler:

Save the following minimal required YAML in the nro-scheduler.yaml file, and then create the NUMAResourcesScheduler CR by running the following command:

$ oc create -f nro-scheduler.yaml
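As a sketch, the nro-scheduler.yaml file can contain a minimal NUMAResourcesScheduler such as the following; the imageSpec value is a placeholder assumption that must point to the secondary scheduler image for your release:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  # assumption: replace with the NUMA-aware scheduler image for your release
  imageSpec: "registry.example.com/noderesourcetopology-scheduler:v4.13"
```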
After a few seconds, run the following command to confirm the successful deployment of the required resources:
$ oc get all -n openshift-numaresources
8.3.3. Configuring a single NUMA node policy
The NUMA Resources Operator requires a single NUMA node policy to be configured on the cluster. This can be achieved in two ways: by creating and applying a performance profile, or by configuring a KubeletConfig.
The preferred way to configure a single NUMA node policy is to apply a performance profile. You can use the Performance Profile Creator (PPC) tool to create the performance profile. If a performance profile is created on the cluster, it automatically creates other tuning components like KubeletConfig and the tuned profile.
For more information about creating a performance profile, see "About the Performance Profile Creator" in the "Additional resources" section.
Additional resources
8.3.4. Sample performance profile
This example YAML shows a performance profile created by using the Performance Profile Creator (PPC) tool:
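A profile of this kind can be sketched as follows; the CPU sets and selectors are illustrative assumptions from a PPC-style run, not values to copy verbatim:

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: "4-47"          # assumption: example isolated CPU set
    reserved: "0-3"           # assumption: example reserved CPU set
  machineConfigPoolSelector:
    # assumption: label that selects the worker-cnf machine config pool
    pools.operator.machineconfiguration.openshift.io/worker-cnf: ""
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: true             # assumption: depends on your workload requirements
```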
Note the following about the profile:
- The machine config pool selector must match the MachineConfigPool that you want to configure the NUMA Resources Operator on. For example, you might have created a MachineConfigPool named worker-cnf that designates a set of nodes that run telecommunications workloads.
- The topologyPolicy must be set to single-numa-node. Ensure that this is the case by setting the topology-manager-policy argument to single-numa-node when running the PPC tool.
8.3.5. Creating a KubeletConfig CR
The recommended way to configure a single NUMA node policy is to apply a performance profile. Another way is by creating and applying a KubeletConfig custom resource (CR), as shown in the following procedure.
Procedure
Create the KubeletConfig custom resource (CR) that configures the pod admittance policy for the machine profile:

Save the following YAML in the nro-kubeletconfig.yaml file. Note the following:
- Adjust the machine config pool label to match the machineConfigPoolSelector in the NUMAResourcesOperator CR.
- For cpuManagerPolicy, static must use a lowercase s.
- Adjust the reserved CPU settings based on the CPUs on your nodes.
- For memoryManagerPolicy, Static must use an uppercase S.
- topologyManagerPolicy must be set to single-numa-node.
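The nro-kubeletconfig.yaml file can be sketched as follows; the reserved CPU and memory values are illustrative assumptions to adjust for your nodes:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-tuning
spec:
  machineConfigPoolSelector:
    matchLabels:
      # assumption: adjust this label to match the machineConfigPoolSelector
      # in the NUMAResourcesOperator CR
      machineconfiguration.openshift.io/role: worker-cnf
  kubeletConfig:
    cpuManagerPolicy: "static"              # lowercase s
    cpuManagerReconcilePeriod: "5s"
    reservedSystemCPUs: "0,1"               # assumption: adjust for your CPUs
    memoryManagerPolicy: "Static"           # uppercase S
    evictionHard:
      memory.available: "100Mi"
    reservedMemory:
    - numaNode: 0
      limits:
        memory: "1124Mi"                    # assumption: example reserved memory
    systemReserved:
      memory: "512Mi"
    topologyManagerPolicy: "single-numa-node"
```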
Create the KubeletConfig CR by running the following command:

$ oc create -f nro-kubeletconfig.yaml

Note: Applying the performance profile or KubeletConfig automatically triggers rebooting of the nodes. If no reboot is triggered, you can troubleshoot the issue by looking at the labels in the KubeletConfig that address the node group.
8.3.6. Scheduling workloads with the NUMA-aware scheduler
Now that the topo-aware-scheduler is installed, the NUMAResourcesOperator and NUMAResourcesScheduler CRs are applied, and your cluster has a matching performance profile or kubeletconfig, you can schedule workloads with the NUMA-aware scheduler by using deployment CRs that specify the minimum required resources to process the workload.
The following example deployment uses NUMA-aware scheduling for a sample workload.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Get the name of the NUMA-aware scheduler that is deployed in the cluster by running the following command:
$ oc get numaresourcesschedulers.nodetopology.openshift.io numaresourcesscheduler -o json | jq '.status.schedulerName'

Example output:

"topo-aware-scheduler"

Create a Deployment CR that uses the scheduler named topo-aware-scheduler, for example:

Save the following YAML in the nro-deployment.yaml file. The schedulerName must match the name of the NUMA-aware scheduler that is deployed in your cluster, for example topo-aware-scheduler.
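The nro-deployment.yaml file can be sketched as follows; the container image and resource values are illustrative assumptions, with integer CPU requests equal to limits so that the pod gets Guaranteed QoS:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: numa-deployment-1
  namespace: openshift-numaresources
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      schedulerName: topo-aware-scheduler   # must match the deployed scheduler name
      containers:
      - name: ctnr
        image: registry.example.com/sample-workload:latest   # assumption: sample image
        resources:
          limits:
            memory: "100Mi"
            cpu: "10"
          requests:
            memory: "100Mi"
            cpu: "10"
```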
Create the Deployment CR by running the following command:

$ oc create -f nro-deployment.yaml
Verification
Verify that the deployment was successful:
$ oc get pods -n openshift-numaresources

Verify that the topo-aware-scheduler is scheduling the deployed pod by running the following command:

$ oc describe pod numa-deployment-1-6c4f5bdb84-wgn6g -n openshift-numaresources

Example output:
Events:
  Type    Reason     Age    From                  Message
  ----    ------     ----   ----                  -------
  Normal  Scheduled  4m45s  topo-aware-scheduler  Successfully assigned openshift-numaresources/numa-deployment-1-6c4f5bdb84-wgn6g to worker-1

Note: Deployments that request more resources than are available for scheduling fail with a MinimumReplicasUnavailable error. The deployment succeeds when the required resources become available. Pods remain in the Pending state until the required resources are available.

Verify that the expected allocated resources are listed for the node.
Identify the node that is running the deployment pod by running the following command:
$ oc get pods -n openshift-numaresources -o wide

Example output:

NAME                                 READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
numa-deployment-1-6c4f5bdb84-wgn6g   0/2     Running   0          82m   10.128.2.50   worker-1   <none>           <none>

Run the following command with the name of the node that is running the deployment pod:
$ oc describe noderesourcetopologies.topology.node.k8s.io worker-1

In the output of this command, the Available capacity is reduced because of the resources that have been allocated to the guaranteed pod.
Resources consumed by guaranteed pods are subtracted from the available node resources listed under noderesourcetopologies.topology.node.k8s.io.
Resource allocations for pods with a Best-effort or Burstable quality of service (qosClass) are not reflected in the NUMA node resources under noderesourcetopologies.topology.node.k8s.io. If a pod's consumed resources are not reflected in the node resource calculation, verify that the pod has a qosClass of Guaranteed and that the CPU request is an integer value, not a decimal value. You can verify that the pod has a qosClass of Guaranteed by running the following command:

$ oc get pod numa-deployment-1-6c4f5bdb84-wgn6g -n openshift-numaresources -o jsonpath="{ .status.qosClass }"

Example output:

Guaranteed
8.4. Optional: Configuring polling operations for NUMA resources updates
The daemons controlled by the NUMA Resources Operator in their nodeGroup poll resources to retrieve updates about available NUMA resources. You can fine-tune polling operations for these daemons by configuring the spec.nodeGroups specification in the NUMAResourcesOperator custom resource (CR). This provides advanced control of polling operations. Configure these specifications to improve scheduling behavior and troubleshoot suboptimal scheduling decisions.
The configuration options are the following:
- infoRefreshMode: Determines the trigger condition for polling the kubelet. The NUMA Resources Operator reports the resulting information to the API server.
- infoRefreshPeriod: Determines the duration between polling updates.
- podsFingerprinting: Determines if point-in-time information for the current set of pods running on a node is exposed in polling updates.

Note: podsFingerprinting is enabled by default. podsFingerprinting is a requirement for the cacheResyncPeriod specification in the NUMAResourcesScheduler CR. The cacheResyncPeriod specification helps to report more exact resource availability by monitoring pending resources on nodes.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the NUMA Resources Operator.
Procedure
Configure the spec.nodeGroups specification in your NUMAResourcesOperator CR. Note the following:
- Valid values for infoRefreshMode are Periodic, Events, and PeriodicAndEvents. Use Periodic to poll the kubelet at intervals that you define in infoRefreshPeriod. Use Events to poll the kubelet at every pod lifecycle event. Use PeriodicAndEvents to enable both methods.
- For infoRefreshPeriod, define the polling interval for the Periodic or PeriodicAndEvents refresh modes. The field is ignored if the refresh mode is Events.
- Valid values for podsFingerprinting are Enabled, Disabled, and EnabledExclusiveResources. Setting podsFingerprinting to Enabled is a requirement for the cacheResyncPeriod specification in the NUMAResourcesScheduler.
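A NUMAResourcesOperator CR with these fields configured can be sketched as follows; the interval and the pool label key are illustrative assumptions:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - config:
      infoRefreshMode: Periodic          # Periodic, Events, or PeriodicAndEvents
      infoRefreshPeriod: 10s             # assumption: example polling interval
      podsFingerprinting: Enabled        # required for cacheResyncPeriod
    machineConfigPoolSelector:
      matchLabels:
        # assumption: label that selects the target machine config pool
        pools.operator.machineconfiguration.openshift.io/worker-cnf: ""
```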
Verification
After you deploy the NUMA Resources Operator, verify that the node group configurations were applied by running the following command:
$ oc get numaresop numaresourcesoperator -o json | jq '.status'
8.5. Troubleshooting NUMA-aware scheduling
To troubleshoot common problems with NUMA-aware pod scheduling, perform the following steps.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the NUMA Resources Operator and deploy the NUMA-aware secondary scheduler.
Procedure
Verify that the noderesourcetopologies CRD is deployed in the cluster by running the following command:

$ oc get crd | grep noderesourcetopologies

Example output:

NAME                                          CREATED AT
noderesourcetopologies.topology.node.k8s.io   2022-01-18T08:28:06Z

Check that the NUMA-aware scheduler name matches the name specified in your NUMA-aware workloads by running the following command:
$ oc get numaresourcesschedulers.nodetopology.openshift.io numaresourcesscheduler -o json | jq '.status.schedulerName'

Example output:

topo-aware-scheduler

Verify that NUMA-aware schedulable nodes have the noderesourcetopologies CR applied to them. Run the following command:

$ oc get noderesourcetopologies.topology.node.k8s.io

Example output:
NAME                    AGE
compute-0.example.com   17h
compute-1.example.com   17h

Note: The number of nodes should equal the number of worker nodes that are configured by the machine config pool (mcp) worker definition.

Verify the NUMA zone granularity for all schedulable nodes by running the following command:

$ oc get noderesourcetopologies.topology.node.k8s.io -o yaml
8.5.1. Reporting more exact resource availability
Enable the cacheResyncPeriod specification to help the NUMA Resources Operator report more exact resource availability by monitoring pending resources on nodes and synchronizing this information in the scheduler cache at a defined interval. This also helps to minimize Topology Affinity Error errors caused by suboptimal scheduling decisions. The lower the interval, the greater the network load. The cacheResyncPeriod specification is disabled by default.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Delete the currently running NUMAResourcesScheduler resource:

Get the active NUMAResourcesScheduler by running the following command:

$ oc get NUMAResourcesScheduler

Example output:

NAME                     AGE
numaresourcesscheduler   92m

Delete the secondary scheduler resource by running the following command:

$ oc delete NUMAResourcesScheduler numaresourcesscheduler

Example output:

numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
Save the following YAML in the file nro-scheduler-cacheresync.yaml. This example changes the log level to Debug. For cacheResyncPeriod, enter an interval value in seconds for synchronization of the scheduler cache. A value of 5s is typical for most implementations.
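The nro-scheduler-cacheresync.yaml file can be sketched as follows; the imageSpec value is a placeholder assumption:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  # assumption: replace with the NUMA-aware scheduler image for your release
  imageSpec: "registry.example.com/noderesourcetopology-scheduler:v4.13"
  cacheResyncPeriod: "5s"
  logLevel: Debug
```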
Create the updated NUMAResourcesScheduler resource by running the following command:

$ oc create -f nro-scheduler-cacheresync.yaml

Example output:

numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
Verification steps
Check that the NUMA-aware scheduler was successfully deployed:
Run the following command to check that the CRD is created successfully:

$ oc get crd | grep numaresourcesschedulers

Example output:

NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z

Check that the new custom scheduler is available by running the following command:

$ oc get numaresourcesschedulers.nodetopology.openshift.io

Example output:

NAME                     AGE
numaresourcesscheduler   3h26m
Check that the logs for the scheduler show the increased log level:
Get the list of pods running in the openshift-numaresources namespace by running the following command:

$ oc get pods -n openshift-numaresources

Example output:

NAME                                               READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-d87d79587-76mrm   1/1     Running   0          46h
numaresourcesoperator-worker-5wm2k                 2/2     Running   0          45h
numaresourcesoperator-worker-pb75c                 2/2     Running   0          45h
secondary-scheduler-7976c4d466-qm4sc               1/1     Running   0          21m

Get the logs for the secondary scheduler pod by running the following command:

$ oc logs secondary-scheduler-7976c4d466-qm4sc -n openshift-numaresources
8.5.2. Checking the NUMA-aware scheduler logs
Troubleshoot problems with the NUMA-aware scheduler by reviewing the logs. If required, you can increase the scheduler log level by modifying the spec.logLevel field of the NUMAResourcesScheduler resource. Acceptable values are Normal, Debug, and Trace, with Trace being the most verbose option.
To change the log level of the secondary scheduler, delete the running scheduler resource and re-deploy it with the changed log level. The scheduler is unavailable for scheduling new workloads during this downtime.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Delete the currently running NUMAResourcesScheduler resource:

Get the active NUMAResourcesScheduler by running the following command:

$ oc get NUMAResourcesScheduler

Example output:

NAME                     AGE
numaresourcesscheduler   90m

Delete the secondary scheduler resource by running the following command:

$ oc delete NUMAResourcesScheduler numaresourcesscheduler

Example output:

numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
Save the following YAML in the file nro-scheduler-debug.yaml. This example changes the log level to Debug:
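A minimal NUMAResourcesScheduler manifest of this kind might look like the following sketch; the apiVersion and the imageSpec value are assumptions and must match the secondary scheduler image for your cluster release:

```yaml
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  # Assumption: substitute the secondary scheduler image for your release
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v4.x"
  logLevel: Debug
```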
Create the updated Debug logging NUMAResourcesScheduler resource by running the following command:

$ oc create -f nro-scheduler-debug.yaml

Example output

numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
Verification steps
Check that the NUMA-aware scheduler was successfully deployed:
Run the following command to check that the CRD is created successfully:
$ oc get crd | grep numaresourcesschedulers

Example output
NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z

Check that the new custom scheduler is available by running the following command:
$ oc get numaresourcesschedulers.nodetopology.openshift.io

Example output
NAME                     AGE
numaresourcesscheduler   3h26m
Check that the logs for the scheduler show the increased log level:
Get the list of pods running in the openshift-numaresources namespace by running the following command:

$ oc get pods -n openshift-numaresources

Example output
NAME                                               READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-d87d79587-76mrm   1/1     Running   0          46h
numaresourcesoperator-worker-5wm2k                 2/2     Running   0          45h
numaresourcesoperator-worker-pb75c                 2/2     Running   0          45h
secondary-scheduler-7976c4d466-qm4sc               1/1     Running   0          21m

Get the logs for the secondary scheduler pod by running the following command:
$ oc logs secondary-scheduler-7976c4d466-qm4sc -n openshift-numaresources

Example output
8.5.3. Troubleshooting the resource topology exporter
Troubleshoot noderesourcetopologies objects for which unexpected results occur by inspecting the corresponding resource-topology-exporter logs.

It is recommended that NUMA resource topology exporter instances in the cluster are named for the nodes they refer to. For example, a worker node with the name worker should have a corresponding noderesourcetopologies object called worker.
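Following this naming convention, a noderesourcetopologies object for a node named worker might look like the following sketch; the API version, topology policy, and all zone and resource values are illustrative assumptions, not output from a real cluster:

```yaml
apiVersion: topology.node.k8s.io/v1alpha1   # assumption: the API version varies by release
kind: NodeResourceTopology
metadata:
  name: worker                 # matches the name of the node it describes
topologyPolicies:
  - SingleNUMANodeContainerLevel
zones:
  - name: node-0               # one zone per NUMA node
    type: Node
    resources:
      - name: cpu
        capacity: "20"
        allocatable: "16"
        available: "10"
```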
Prerequisites

- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Get the daemonsets managed by the NUMA Resources Operator. Each daemonset has a corresponding nodeGroup in the NUMAResourcesOperator CR. Run the following command:

$ oc get numaresourcesoperators.nodetopology.openshift.io numaresourcesoperator -o jsonpath="{.status.daemonsets[0]}"

Example output

{"name":"numaresourcesoperator-worker","namespace":"openshift-numaresources"}
Get the label for the daemonset of interest using the value for name from the previous step:

$ oc get ds -n openshift-numaresources numaresourcesoperator-worker -o jsonpath="{.spec.selector.matchLabels}"

Example output

{"name":"resource-topology"}
Get the pods using the resource-topology label by running the following command:

$ oc get pods -n openshift-numaresources -l name=resource-topology -o wide

Example output

NAME                                 READY   STATUS    RESTARTS   AGE    IP            NODE
numaresourcesoperator-worker-5wm2k   2/2     Running   0          2d1h   10.135.0.64   compute-0.example.com
numaresourcesoperator-worker-pb75c   2/2     Running   0          2d1h   10.132.2.33   compute-1.example.com
Examine the logs of the resource-topology-exporter container running on the worker pod that corresponds to the node you are troubleshooting. Run the following command:

$ oc logs -n openshift-numaresources -c resource-topology-exporter numaresourcesoperator-worker-pb75c

Example output
8.5.4. Correcting a missing resource topology exporter config map
If you install the NUMA Resources Operator in a cluster with misconfigured settings, in some circumstances the Operator is shown as active, but the logs of the resource topology exporter (RTE) daemon set pods show that the configuration for the RTE is missing, for example:
Info: couldn't find configuration in "/etc/resource-topology-exporter/config.yaml"
This log message indicates that the kubeletconfig with the required configuration was not properly applied in the cluster, resulting in a missing RTE configmap. For example, the following cluster is missing a numaresourcesoperator-worker configmap custom resource (CR):
$ oc get configmap
Example output
NAME DATA AGE
0e2a6bd3.openshift-kni.io 0 6d21h
kube-root-ca.crt 1 6d21h
openshift-service-ca.crt 1 6d21h
topo-aware-scheduler-config 1 6d18h
In a correctly configured cluster, oc get configmap also returns a numaresourcesoperator-worker configmap CR.
Prerequisites

- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the NUMA Resources Operator and deploy the NUMA-aware secondary scheduler.
Procedure
Compare the values for spec.machineConfigPoolSelector.matchLabels in kubeletconfig and metadata.labels in the MachineConfigPool (mcp) worker CR using the following commands:

Check the kubeletconfig labels by running the following command:

$ oc get kubeletconfig -o yaml

Example output

machineConfigPoolSelector:
  matchLabels:
    cnf-worker-tuning: enabled
Check the mcp labels by running the following command:

$ oc get mcp worker -o yaml

Example output

labels:
  machineconfiguration.openshift.io/mco-built-in: ""
  pools.operator.machineconfiguration.openshift.io/worker: ""
The cnf-worker-tuning: enabled label is not present in the MachineConfigPool object.
Edit the MachineConfigPool CR to include the missing label, for example:

$ oc edit mcp worker -o yaml

Example output

labels:
  machineconfiguration.openshift.io/mco-built-in: ""
  pools.operator.machineconfiguration.openshift.io/worker: ""
  cnf-worker-tuning: enabled
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Apply the label changes and wait for the cluster to apply the updated configuration. Run the following command:
Verification
Check that the missing numaresourcesoperator-worker configmap CR is applied:

$ oc get configmap

Example output