Chapter 4. Controlling pod placement onto nodes (scheduling)
4.1. Controlling pod placement using the scheduler
Pod scheduling is an internal process that determines placement of new pods onto nodes within the cluster.
The scheduler is a cleanly separated component that watches for new pods as they are created and identifies the most suitable node to host them. It then creates pod-to-node bindings for the pods by using the master API.
- Default pod scheduling: OpenShift Container Platform comes with a default scheduler that serves the needs of most users. The default scheduler uses both inherent and customization tools to determine the best fit for a pod.
- Advanced pod scheduling: In situations where you might want more control over where new pods are placed, the OpenShift Container Platform advanced scheduling features allow you to configure a pod so that the pod is required or has a preference to run on a particular node or alongside a specific pod.
You can control pod placement by using the scheduling features described in the rest of this chapter: scheduler profiles, pod affinity and anti-affinity rules, node affinity rules, node overcommitment, and node taints and tolerations.
4.1.1. About the default scheduler
The default OpenShift Container Platform pod scheduler is responsible for determining the placement of new pods onto nodes within the cluster. It reads data from the pod and finds a node that is a good fit based on configured profiles. It is completely independent and exists as a standalone solution. It does not modify the pod; it creates a binding for the pod that ties the pod to the particular node.
4.1.1.1. Understanding default scheduling
The existing generic scheduler is the default platform-provided scheduler engine that selects a node to host the pod in a three-step operation:
- Filters the nodes: The available nodes are filtered based on the constraints or requirements specified. This is done by running each node through the list of filter functions called predicates, or filters.
- Prioritizes the filtered list of nodes: Each node is passed through a series of priority, or scoring, functions that assign it a score between 0 and 10, with 0 indicating a bad fit and 10 indicating a good fit to host the pod. The scheduler configuration can also take a simple weight (a positive numeric value) for each scoring function. The node score provided by each scoring function is multiplied by its weight (the default weight for most scoring functions is 1), and the weighted scores from all scoring functions are then added together for each node. Administrators can use this weight attribute to give higher importance to some scoring functions.
- Selects the best fit node: The nodes are sorted based on their scores, and the node with the highest score is selected to host the pod. If multiple nodes have the same high score, one of them is selected at random.
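The weighted scoring in the second step can be written compactly. The notation here is introduced only for illustration and is not part of the product documentation: s_i(n) is the score that scoring function i assigns to node n, and w_i is the weight configured for that function. The numbers in the worked line are made up.

```latex
\[
  \mathrm{score}(n) \;=\; \sum_{i} w_i \, s_i(n),
  \qquad 0 \le s_i(n) \le 10, \quad w_i > 0
\]
% Worked example with two scoring functions (illustrative values):
\[
  w_1 = 1,\; s_1(n) = 8,\; w_2 = 2,\; s_2(n) = 5
  \;\Longrightarrow\;
  \mathrm{score}(n) = 1 \cdot 8 + 2 \cdot 5 = 18
\]
```

The node with the largest total is the one selected in the third step.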
4.1.2. Scheduler use cases
One of the important use cases for scheduling within OpenShift Container Platform is to support flexible affinity and anti-affinity policies.
4.1.2.1. Infrastructure topological levels
Administrators can define multiple topological levels for their infrastructure (nodes) by specifying labels on nodes. For example: region=r1, zone=z1, rack=s1.
These label names have no particular meaning, and administrators are free to name their infrastructure levels anything, such as city/building/room. Administrators can also define any number of levels for their infrastructure topology, with three levels usually being adequate (such as regions, zones, and racks). Administrators can specify affinity and anti-affinity rules at each of these levels in any combination.
4.1.2.2. Affinity
Administrators should be able to configure the scheduler to specify affinity at any topological level, or even at multiple levels. Affinity at a particular level indicates that all pods that belong to the same service are scheduled onto nodes that belong to the same level. This handles any latency requirements of applications by allowing administrators to ensure that peer pods do not end up being too geographically separated. If no node is available within the same affinity group to host the pod, then the pod is not scheduled.
If you need greater control over where the pods are scheduled, see Controlling pod placement on nodes using node affinity rules and Placing pods relative to other pods using affinity and anti-affinity rules.
These advanced scheduling features allow administrators to specify which node a pod can be scheduled on and to force or reject scheduling relative to other pods.
4.1.2.3. Anti-affinity
Administrators should be able to configure the scheduler to specify anti-affinity at any topological level, or even at multiple levels. Anti-affinity (or 'spread') at a particular level indicates that all pods that belong to the same service are spread across nodes that belong to that level. This ensures that the application is well spread for high availability purposes. The scheduler tries to balance the service pods across all applicable nodes as evenly as possible.
If you need greater control over where the pods are scheduled, see Controlling pod placement on nodes using node affinity rules and Placing pods relative to other pods using affinity and anti-affinity rules.
These advanced scheduling features allow administrators to specify which node a pod can be scheduled on and to force or reject scheduling relative to other pods.
4.2. Scheduling pods using a scheduler profile
You can configure OpenShift Container Platform to use a scheduling profile to schedule pods onto nodes within the cluster.
4.2.1. About scheduler profiles
You can specify a scheduler profile to control how pods are scheduled onto nodes.
The following scheduler profiles are available:
- LowNodeUtilization: This profile attempts to spread pods evenly across nodes to get low resource usage per node. This profile provides the default scheduler behavior.
- HighNodeUtilization: This profile attempts to place as many pods as possible onto as few nodes as possible. This minimizes node count and has high resource usage per node.
  Switching to the HighNodeUtilization scheduler profile results in all pods of a ReplicaSet object being scheduled on the same node. This increases the risk of pod failure if the node fails.
- NoScoring: This is a low-latency profile that strives for the quickest scheduling cycle by disabling all score plugins. This might sacrifice better scheduling decisions for faster ones.
4.2.2. Configuring a scheduler profile
You can configure the scheduler to use a scheduler profile.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
Procedure
- Edit the Scheduler object:
  $ oc edit scheduler cluster
- Specify the profile to use in the spec.profile field. Set the field to LowNodeUtilization, HighNodeUtilization, or NoScoring.
- Save the file to apply the changes.
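As a rough sketch, the edited Scheduler object might look like the following. Only spec.profile is the field this procedure changes. The apiVersion and the mastersSchedulable field are assumptions about a typical cluster and are not taken from the text above.

```yaml
apiVersion: config.openshift.io/v1   # assumed API group and version
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: false          # assumed existing setting, shown for context only
  # One of LowNodeUtilization, HighNodeUtilization, or NoScoring.
  profile: HighNodeUtilization
```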
4.3. Placing pods relative to other pods using affinity and anti-affinity rules
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.
In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.
4.3.1. Understanding pod affinity
Pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key/value labels on other pods.
- Pod affinity can tell the scheduler to locate a new pod on the same node as other pods if the label selector on the new pod matches the label on the current pod.
- Pod anti-affinity can prevent the scheduler from locating a new pod on the same node as pods with the same labels if the label selector on the new pod matches the label on the current pod.
For example, using affinity rules, you could spread or pack pods within a service or relative to pods in other services. Anti-affinity rules allow you to prevent pods of a particular service from scheduling on the same nodes as pods of another service that are known to interfere with the performance of the pods of the first service. Or, you could spread the pods of a service across nodes, availability zones, or availability sets to reduce correlated failures.
A label selector might match pods from multiple deployments. Use unique combinations of labels when configuring anti-affinity rules to avoid matching unintended pods.
There are two types of pod affinity rules: required and preferred.
Required rules must be met before a pod can be scheduled on a node. Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.
Depending on your pod priority and preemption settings, the scheduler might not be able to find an appropriate node for a pod without violating affinity requirements. If so, a pod might not be scheduled.
To prevent this situation, carefully configure pod affinity with equal-priority pods.
You configure pod affinity and anti-affinity through the Pod spec files. You can specify a required rule, a preferred rule, or both. If you specify both, the node must first meet the required rule, and the scheduler then attempts to meet the preferred rule.
The following example shows a Pod spec configured for pod affinity and anti-affinity.
In this example, the pod affinity rule indicates that the pod can schedule onto a node only if that node has at least one already-running pod with a label that has the key security and value S1. The pod anti-affinity rule says that the pod prefers to not schedule onto a node if that node is already running a pod with label having key security and value S2.
Sample Pod config file with pod affinity
- Stanza to configure pod affinity.
- Defines a required rule.
- The key and value (label) that must be matched to apply the rule.
- The operator represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.
Sample Pod config file with pod anti-affinity
- Stanza to configure pod anti-affinity.
- Defines a preferred rule.
- Specifies a weight for a preferred rule. The node with the highest weight is preferred.
- Description of the pod label that determines when the anti-affinity rule applies. Specify a key and value for the label.
- The operator represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.
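A minimal sketch of a Pod spec that combines the two rules described in this section: a required pod affinity on the label security=S1 and a preferred pod anti-affinity on the label security=S2. The pod name, container image, weight, and the choice of kubernetes.io/hostname as the topology key are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity              # hypothetical name
spec:
  affinity:
    podAffinity:                       # stanza to configure pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # required rule
      - labelSelector:
          matchExpressions:
          - key: security              # label key to match on existing pods
            operator: In               # In, NotIn, Exists, or DoesNotExist
            values:
            - S1
        topologyKey: kubernetes.io/hostname   # assumed: match per node
    podAntiAffinity:                   # stanza to configure pod anti-affinity
      preferredDuringSchedulingIgnoredDuringExecution:  # preferred rule
      - weight: 100                    # 1-100, illustrative value
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: registry.example.com/sample/app:latest   # placeholder image
```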
If labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node.
4.3.2. Configuring a pod affinity rule
The following steps demonstrate a simple two-pod configuration that creates a pod with a label and a pod that uses affinity to allow scheduling with that pod.
You cannot add an affinity directly to a scheduled pod.
Procedure
- Create a pod with a specific label in the pod spec:
  - Create a YAML file that defines the pod and its label.
  - Create the pod:
    $ oc create -f <pod-spec>.yaml
- When creating other pods, configure the following parameters to add the affinity:
  - Create a YAML file that defines the pod with the following affinity settings:
    - Adds a pod affinity.
    - Configures the requiredDuringSchedulingIgnoredDuringExecution parameter or the preferredDuringSchedulingIgnoredDuringExecution parameter.
    - Specifies the key and values that must be met. If you want the new pod to be scheduled with the other pod, use the same key and values parameters as the label on the first pod.
    - Specifies an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be in the node.
    - Specifies a topologyKey, which is a prepopulated Kubernetes label that the system uses to denote such a topology domain.
  - Create the pod:
    $ oc create -f <pod-spec>.yaml
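The two files in this procedure might look roughly like the following sketch. The pod names, the label security=S1, the container image, and the zone topology key are assumptions chosen for illustration. Each document would normally live in its own file.

```yaml
# First pod: carries the label that the affinity rule will match.
apiVersion: v1
kind: Pod
metadata:
  name: security-s1                    # hypothetical name
  labels:
    security: S1                       # illustrative label
spec:
  containers:
  - name: security-s1
    image: registry.example.com/sample/app:latest   # placeholder image
---
# Second pod: must be scheduled in the same topology domain as a pod
# labeled security=S1.
apiVersion: v1
kind: Pod
metadata:
  name: security-s1-east               # hypothetical name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security              # same key as the label on the first pod
            operator: In
            values:
            - S1                       # same value as the label on the first pod
        topologyKey: topology.kubernetes.io/zone   # assumed topology domain
  containers:
  - name: security-s1-east
    image: registry.example.com/sample/app:latest   # placeholder image
```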
4.3.3. Configuring a pod anti-affinity rule
The following steps demonstrate a simple two-pod configuration that creates a pod with a label and a pod that uses an anti-affinity preferred rule to attempt to prevent scheduling with that pod.
You cannot add an affinity directly to a scheduled pod.
Procedure
- Create a pod with a specific label in the pod spec:
  - Create a YAML file that defines the pod and its label.
  - Create the pod:
    $ oc create -f <pod-spec>.yaml
- When creating other pods, configure the following parameters:
  - Create a YAML file that defines the pod with the following anti-affinity settings:
    - Adds a pod anti-affinity.
    - Configures the requiredDuringSchedulingIgnoredDuringExecution parameter or the preferredDuringSchedulingIgnoredDuringExecution parameter.
    - For a preferred rule, specifies a weight for the node, 1-100. The node with the highest weight is preferred.
    - Specifies the key and values that must be met. If you want the new pod to not be scheduled with the other pod, use the same key and values parameters as the label on the first pod.
    - Specifies an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be in the node.
    - Specifies a topologyKey, which is a prepopulated Kubernetes label that the system uses to denote such a topology domain.
  - Create the pod:
    $ oc create -f <pod-spec>.yaml
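A sketch of the second pod in this procedure, assuming the first pod was created with the illustrative label security=S1. The pod name, image, weight, and topology key are assumptions. Note that a preferred rule wraps the selector in a podAffinityTerm and carries a weight.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-s2-east               # hypothetical name
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                    # 1-100, illustrative value
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security            # same key as the label on the first pod
              operator: In
              values:
              - S1                     # same value as the label on the first pod
          topologyKey: kubernetes.io/hostname   # assumed: avoid the same node
  containers:
  - name: security-s2-east
    image: registry.example.com/sample/app:latest   # placeholder image
```

Because the rule is preferred rather than required, the scheduler can still place this pod next to the first pod if no better node is available.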
4.3.4. Sample pod affinity and anti-affinity rules
The following examples demonstrate pod affinity and pod anti-affinity.
4.3.4.1. Pod Affinity
The following example demonstrates pod affinity for pods with matching labels and label selectors.
- The pod team4 has the label team:4.
- The pod team4a has the label selector team:4 under podAffinity.
- The team4a pod is scheduled on the same node as the team4 pod.
4.3.4.2. Pod Anti-affinity
The following example demonstrates pod anti-affinity for pods with matching labels and label selectors.
- The pod pod-s1 has the label security:s1.
- The pod pod-s2 has the label selector security:s1 under podAntiAffinity.
- The pod pod-s2 cannot be scheduled on the same node as pod-s1.
4.3.4.3. Pod Affinity with no Matching Labels
The following example demonstrates pod affinity for pods without matching labels and label selectors.
- The pod pod-s1 has the label security:s1.
- The pod pod-s2 has the label selector security:s2.
- The pod pod-s2 is not scheduled unless there is a node with a pod that has the security:s2 label. If there is no other pod with that label, the new pod remains in a pending state:
Example output
NAME     READY   STATUS    RESTARTS   AGE   IP       NODE
pod-s2   0/1     Pending   0          32s   <none>
4.3.5. Using pod affinity and anti-affinity to control where an Operator is installed
By default, when you install an Operator, OpenShift Container Platform installs the Operator pod to one of your worker nodes randomly. However, there might be situations where you want that pod scheduled on a specific node or set of nodes.
The following examples describe situations where you might want to schedule an Operator pod to a specific node or set of nodes:
- If an Operator requires a particular platform, such as amd64 or arm64
- If an Operator requires a particular operating system, such as Linux or Windows
- If you want Operators that work together scheduled on the same host or on hosts located on the same rack
- If you want Operators dispersed throughout the infrastructure to avoid downtime due to network or hardware issues
You can control where an Operator pod is installed by adding a pod affinity or anti-affinity to the Operator’s Subscription object.
The following examples show how to use pod affinity and pod anti-affinity to control whether the Custom Metrics Autoscaler Operator is installed on nodes that have pods with a specific label:
Pod affinity example that places the Operator pod on one or more specific nodes
- A pod affinity that places the Operator’s pod on a node that has pods with the app=test label.
Pod anti-affinity example that prevents the Operator pod from one or more specific nodes
- A pod anti-affinity that prevents the Operator’s pod from being scheduled on a node that has pods with the cpu=high label.
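A sketch of how the pod affinity case might be expressed in a Subscription object, assuming the affinity is set under spec.config.affinity. The metadata names, package name, and catalog source values are placeholders, not values taken from this document.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-custom-metrics-autoscaler-operator   # assumed name
  namespace: openshift-keda                            # assumed namespace
spec:
  name: openshift-custom-metrics-autoscaler-operator   # assumed package name
  source: redhat-operators                             # assumed catalog source
  sourceNamespace: openshift-marketplace               # assumed catalog namespace
  config:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - test
          topologyKey: kubernetes.io/hostname          # assumed: match per node
```

The anti-affinity case would use the same structure with podAntiAffinity in place of podAffinity and a selector on the key cpu with the value high.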
Procedure
To control the placement of an Operator pod, complete the following steps:
- Install the Operator as usual.
- If needed, ensure that your nodes are labeled to properly respond to the affinity.
- Edit the Operator Subscription object to add an affinity. Add a podAffinity or podAntiAffinity.
Verification
To ensure that the pod is deployed on the specific node, run the following command:
$ oc get pods -o wide
Example output
NAME                                                  READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
custom-metrics-autoscaler-operator-5dcc45d656-bhshg   1/1     Running   0          50s   10.131.0.20   ip-10-0-185-229.ec2.internal   <none>           <none>
4.4. Controlling pod placement on nodes using node affinity rules
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled.
In OpenShift Container Platform, node affinity is a set of rules used by the scheduler to determine where a pod can be placed. The rules are defined using custom labels on the nodes and label selectors specified in pods.
4.4.1. Understanding node affinity
Node affinity allows a pod to specify an affinity towards a group of nodes it can be placed on. The node does not have control over the placement.
For example, you could configure a pod to only run on a node with a specific CPU or in a specific availability zone.
There are two types of node affinity rules: required and preferred.
Required rules must be met before a pod can be scheduled on a node. Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.
If labels on a node change at runtime such that a node affinity rule on a pod is no longer met, the pod continues to run on the node.
You configure node affinity through the Pod spec file. You can specify a required rule, a preferred rule, or both. If you specify both, the node must first meet the required rule, and the scheduler then attempts to meet the preferred rule.
The following example is a Pod spec with a rule that requires the pod be placed on a node with a label whose key is e2e-az-NorthSouth and whose value is either e2e-az-North or e2e-az-South:
Example pod configuration file with a node affinity required rule
- The stanza to configure node affinity.
- Defines a required rule.
- The key/value pair (label) that must be matched to apply the rule.
- The operator represents the relationship between the label on the node and the set of values in the matchExpression parameters in the Pod spec. This value can be In, NotIn, Exists, DoesNotExist, Lt, or Gt.
The following example is a Pod spec with a preferred rule that a node with a label whose key is e2e-az-EastWest and whose value is either e2e-az-East or e2e-az-West is preferred for the pod:
Example pod configuration file with a node affinity preferred rule
- The stanza to configure node affinity.
- Defines a preferred rule.
- Specifies a weight for a preferred rule. The node with the highest weight is preferred.
- The key/value pair (label) that must be matched to apply the rule.
- The operator represents the relationship between the label on the node and the set of values in the matchExpression parameters in the Pod spec. This value can be In, NotIn, Exists, DoesNotExist, Lt, or Gt.
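The two rules described above could be written as in the following sketch. For brevity it places the required rule and the preferred rule in a single Pod spec, which is an illustrative choice. The pod name, image, and weight are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity             # hypothetical name
spec:
  affinity:
    nodeAffinity:                      # stanza to configure node affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # required rule
        nodeSelectorTerms:
        - matchExpressions:
          - key: e2e-az-NorthSouth
            operator: In               # In, NotIn, Exists, DoesNotExist, Lt, or Gt
            values:
            - e2e-az-North
            - e2e-az-South
      preferredDuringSchedulingIgnoredDuringExecution:  # preferred rule
      - weight: 1                      # 1-100, illustrative value
        preference:
          matchExpressions:
          - key: e2e-az-EastWest
            operator: In
            values:
            - e2e-az-East
            - e2e-az-West
  containers:
  - name: with-node-affinity
    image: registry.example.com/sample/app:latest   # placeholder image
```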
There is no explicit node anti-affinity concept, but using the NotIn or DoesNotExist operator replicates that behavior.
If you are using node affinity and node selectors in the same pod configuration, note the following:
- If you configure both nodeSelector and nodeAffinity, both conditions must be satisfied for the pod to be scheduled onto a candidate node.
- If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms is satisfied.
- If you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions are satisfied.
4.4.2. Configuring a required node affinity rule
Required rules must be met before a pod can be scheduled on a node.
Procedure
The following steps demonstrate a simple configuration that creates a node and a pod that the scheduler is required to place on the node.
- Add a label to a node using the oc label node command:
  $ oc label node node1 e2e-az-name=e2e-az1
  Tip: You can alternatively apply YAML to the node to add the label.
- Create a pod with a specific label in the pod spec:
  - Create a YAML file that defines the pod with the following affinity settings:
    Note: You cannot add an affinity directly to a scheduled pod.
    - Adds a node affinity.
    - Configures the requiredDuringSchedulingIgnoredDuringExecution parameter.
    - Specifies the key and values that must be met. If you want the new pod to be scheduled on the node you edited, use the same key and values parameters as the label in the node.
    - Specifies an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be in the node.
  - Create the pod:
    $ oc create -f <file-name>.yaml
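The objects in this procedure might look roughly like the following sketch. The Node document shows only the fields relevant to the label and stands in for the YAML alternative to oc label node. The pod name and image are assumptions.

```yaml
# Node fragment: equivalent in effect to
# `oc label node node1 e2e-az-name=e2e-az1`.
kind: Node
apiVersion: v1
metadata:
  name: node1
  labels:
    e2e-az-name: e2e-az1
---
# Pod that the scheduler may place only on nodes carrying that label.
apiVersion: v1
kind: Pod
metadata:
  name: s1                             # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: e2e-az-name           # same key as the node label
            operator: In
            values:
            - e2e-az1                  # same value as the node label
  containers:
  - name: s1
    image: registry.example.com/sample/app:latest   # placeholder image
```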
4.4.3. Configuring a preferred node affinity rule
Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.
Procedure
The following steps demonstrate a simple configuration that creates a node and a pod that the scheduler tries to place on the node.
- Add a label to a node using the oc label node command:
  $ oc label node node1 e2e-az-name=e2e-az3
- Create a pod with a specific label:
  - Create a YAML file that defines the pod with the following affinity settings:
    Note: You cannot add an affinity directly to a scheduled pod.
    - Adds a node affinity.
    - Configures the preferredDuringSchedulingIgnoredDuringExecution parameter.
    - Specifies a weight for the node, as a number 1-100. The node with the highest weight is preferred.
    - Specifies the key and values that must be met. If you want the new pod to be scheduled on the node you edited, use the same key and values parameters as the label in the node.
    - Specifies an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be in the node.
  - Create the pod:
    $ oc create -f <file-name>.yaml
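A sketch of the pod file for this procedure, assuming the node was labeled e2e-az-name=e2e-az3 as in the first step. The pod name, image, and weight are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s1-preferred                   # hypothetical name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                      # 1-100, illustrative value
        preference:
          matchExpressions:
          - key: e2e-az-name           # same key as the node label
            operator: In
            values:
            - e2e-az3                  # same value as the node label
  containers:
  - name: s1-preferred
    image: registry.example.com/sample/app:latest   # placeholder image
```

Unlike the required rule, this pod can still be scheduled on a node without the label if the labeled node is unavailable.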
4.4.4. Sample node affinity rules
The following examples demonstrate node affinity.
4.4.4.1. Node affinity with matching labels
The following example demonstrates node affinity for a node and pod with matching labels:
- The Node1 node has the label zone:us:
  $ oc label node node1 zone=us
  Tip: You can alternatively apply YAML to the node to add the label.
- The pod-s1 pod has the zone and us key/value pair under a required node affinity rule:
  $ cat pod-s1.yaml
- The pod-s1 pod can be scheduled on Node1:
  $ oc get pod -o wide
  Example output
  NAME     READY   STATUS    RESTARTS   AGE   IP    NODE
  pod-s1   1/1     Running   0          4m    IP1   node1
4.4.4.2. Node affinity with no matching labels
The following example demonstrates node affinity for a node and pod without matching labels:
- The Node1 node has the label zone:emea:
  $ oc label node node1 zone=emea
  Tip: You can alternatively apply YAML to the node to add the label.
- The pod-s1 pod has the zone and us key/value pair under a required node affinity rule:
  $ cat pod-s1.yaml
- The pod-s1 pod cannot be scheduled on Node1:
  $ oc describe pod pod-s1
4.4.5. Using node affinity to control where an Operator is installed
By default, when you install an Operator, OpenShift Container Platform installs the Operator pod to one of your worker nodes randomly. However, there might be situations where you want that pod scheduled on a specific node or set of nodes.
The following examples describe situations where you might want to schedule an Operator pod to a specific node or set of nodes:
- If an Operator requires a particular platform, such as amd64 or arm64
- If an Operator requires a particular operating system, such as Linux or Windows
- If you want Operators that work together scheduled on the same host or on hosts located on the same rack
- If you want Operators dispersed throughout the infrastructure to avoid downtime due to network or hardware issues
You can control where an Operator pod is installed by adding node affinity constraints to the Operator’s Subscription object.
The following examples show how to use node affinity to install an instance of the Custom Metrics Autoscaler Operator to a specific node in the cluster:
Node affinity example that places the Operator pod on a specific node
- A node affinity that requires the Operator’s pod to be scheduled on a node named ip-10-0-163-94.us-west-2.compute.internal.
Node affinity example that places the Operator pod on a node with a specific platform
- A node affinity that requires the Operator’s pod to be scheduled on a node with the kubernetes.io/arch=arm64 and kubernetes.io/os=linux labels.
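A sketch of the platform case, assuming the affinity is set under spec.config.affinity of the Subscription object. The metadata names, package name, and catalog source values are placeholders. Both expressions sit in the same nodeSelectorTerms entry, so a node must satisfy both.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-custom-metrics-autoscaler-operator   # assumed name
  namespace: openshift-keda                            # assumed namespace
spec:
  name: openshift-custom-metrics-autoscaler-operator   # assumed package name
  source: redhat-operators                             # assumed catalog source
  sourceNamespace: openshift-marketplace               # assumed catalog namespace
  config:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
              - arm64
            - key: kubernetes.io/os
              operator: In
              values:
              - linux
```

To pin the Operator to one named node instead, the same structure could match the kubernetes.io/hostname label against that node's name.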
Procedure
To control the placement of an Operator pod, complete the following steps:
- Install the Operator as usual.
- If needed, ensure that your nodes are labeled to properly respond to the affinity.
- Edit the Operator Subscription object to add an affinity. Add a nodeAffinity.
Verification
To ensure that the pod is deployed on the specific node, run the following command:
$ oc get pods -o wide
Example output
NAME                                                  READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
custom-metrics-autoscaler-operator-5dcc45d656-bhshg   1/1     Running   0          50s   10.131.0.20   ip-10-0-185-229.ec2.internal   <none>           <none>
4.5. Placing pods onto overcommitted nodes
In an overcommitted state, the sum of the container compute resource requests and limits exceeds the resources available on the system. Overcommitment might be desirable in development environments where a trade-off of guaranteed performance for capacity is acceptable.
Requests and limits enable administrators to allow and manage the overcommitment of resources on a node. The scheduler uses requests for scheduling your container and providing a minimum service guarantee. Limits constrain the amount of compute resource that may be consumed on your node.
4.5.1. Understanding overcommitment
OpenShift Container Platform administrators can control the level of overcommit and manage container density on nodes by configuring masters to override the ratio between request and limit set on developer containers. In conjunction with a per-project LimitRange object specifying limits and defaults, this adjusts the container limit and request to achieve the desired level of overcommit.
Note that these overrides have no effect if no limits have been set on containers. Create a LimitRange object with default limits, per individual project or in the project template, to ensure that the overrides apply.
After these overrides, the container limits and requests must still be validated by any LimitRange object in the project. It is possible, for example, for developers to specify a limit close to the minimum limit, and have the request then be overridden below the minimum limit, causing the pod to be forbidden. This unfortunate user experience should be addressed with future work, but for now, configure this capability and LimitRange objects with caution.
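A minimal sketch of a per-project LimitRange object that supplies default limits and requests, so that containers without explicit limits still have values for the overrides to act on. The object name, namespace, and every quantity are illustrative assumptions. The min values show the floor that an overridden request must still respect.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits                # hypothetical name
  namespace: my-project                # hypothetical project
spec:
  limits:
  - type: Container
    default:                           # limit applied when a container sets none
      cpu: 500m
      memory: 512Mi
    defaultRequest:                    # request applied when a container sets none
      cpu: 100m
      memory: 256Mi
    min:                               # requests below these values are rejected
      cpu: 50m
      memory: 64Mi
```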
4.5.2. Understanding nodes overcommitment
In an overcommitted environment, it is important to properly configure your node to provide best system behavior.
When the node starts, it ensures that the kernel tunable flags for memory management are set properly. The kernel should never fail memory allocations unless it runs out of physical memory.
To ensure this behavior, OpenShift Container Platform configures the kernel to always overcommit memory by setting the vm.overcommit_memory parameter to 1, overriding the default operating system setting.
OpenShift Container Platform also configures the kernel not to panic when it runs out of memory by setting the vm.panic_on_oom parameter to 0. A setting of 0 instructs the kernel to call oom_killer in an Out of Memory (OOM) condition, which kills processes based on priority.
You can view the current setting by running the following commands on your nodes:
$ sysctl -a |grep commit
Example output
#...
vm.overcommit_memory = 1
#...
$ sysctl -a |grep panic
Example output
#...
vm.panic_on_oom = 0
#...
The above flags should already be set on nodes, and no further action is required.
You can also perform the following configurations for each node:
- Disable or enforce CPU limits using CPU CFS quotas
- Reserve resources for system processes
- Reserve memory across quality of service tiers
4.6. Controlling pod placement using node taints
Taints and tolerations allow a node to control which pods should or should not be scheduled on it.
4.6.1. Understanding taints and tolerations
A taint allows a node to reject a pod for scheduling unless that pod has a matching toleration.
You apply taints to a node through the Node specification (NodeSpec) and apply tolerations to a pod through the Pod specification (PodSpec). When you apply a taint to a node, the scheduler cannot place a pod on that node unless the pod can tolerate the taint.
Example taint in a node specification
Example toleration in a Pod spec
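As a minimal sketch of what these two specifications might look like (the names my-node and my-pod, the image, and the key1/value1 pair are placeholders assumed for illustration):

```yaml
# Node with a taint in its spec
apiVersion: v1
kind: Node
metadata:
  name: my-node                # hypothetical node name
spec:
  taints:
  - key: key1
    value: value1
    effect: NoExecute
---
# Pod with a toleration that matches the taint above
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                 # hypothetical pod name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoExecute
```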
Taints and tolerations consist of a key, value, and effect.
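| Parameter | Description |
|---|---|
| key | The key of the taint. A toleration applies to the taint only if its key is the same. |
| value | The value paired with the key. With the Equal operator, the toleration value must be the same as the taint value. |
| effect | The effect is one of the following: NoSchedule, PreferNoSchedule, or NoExecute. |
| operator | Equal or Exists. |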
If you add a NoSchedule taint to a control plane node, the node must have the node-role.kubernetes.io/master=:NoSchedule taint, which is added by default.
A toleration matches a taint:
- If the operator parameter is set to Equal:
  - the key parameters are the same;
  - the value parameters are the same;
  - the effect parameters are the same.
- If the operator parameter is set to Exists:
  - the key parameters are the same;
  - the effect parameters are the same.
The following taints are built into OpenShift Container Platform:
- node.kubernetes.io/not-ready: The node is not ready. This corresponds to the node condition Ready=False.
- node.kubernetes.io/unreachable: The node is unreachable from the node controller. This corresponds to the node condition Ready=Unknown.
- node.kubernetes.io/memory-pressure: The node has memory pressure issues. This corresponds to the node condition MemoryPressure=True.
- node.kubernetes.io/disk-pressure: The node has disk pressure issues. This corresponds to the node condition DiskPressure=True.
- node.kubernetes.io/network-unavailable: The node network is unavailable.
- node.kubernetes.io/unschedulable: The node is unschedulable.
- node.cloudprovider.kubernetes.io/uninitialized: When the node controller is started with an external cloud provider, this taint is set on a node to mark it as unusable. After a controller from the cloud-controller-manager initializes this node, the kubelet removes this taint.
- node.kubernetes.io/pid-pressure: The node has pid pressure. This corresponds to the node condition PIDPressure=True.
Important: OpenShift Container Platform does not set a default pid.available evictionHard.
4.6.1.1. Understanding how to use toleration seconds to delay pod evictions
You can specify how long a pod can remain bound to a node before being evicted by specifying the tolerationSeconds parameter in the Pod specification or MachineSet object. If a taint with the NoExecute effect is added to a node, a pod that tolerates the taint and specifies the tolerationSeconds parameter is not evicted until that time period expires.
Example output
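A toleration with a delay of this kind might look like the following sketch, shown as a fragment of a Pod spec and assuming the placeholder key1/value1 pair used elsewhere in this section:

```yaml
# Fragment of a Pod spec
spec:
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoExecute
    tolerationSeconds: 3600    # stay bound for one hour after the taint is added
```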
Here, if this pod is running when a matching taint is added to the node, the pod stays bound to the node for 3,600 seconds and is then evicted. If the taint is removed before that time, the pod is not evicted.
4.6.1.2. Understanding how to use multiple taints
You can put multiple taints on the same node and multiple tolerations on the same pod. OpenShift Container Platform processes multiple taints and tolerations as follows:
- Process the taints for which the pod has a matching toleration.
- The remaining unmatched taints have the indicated effects on the pod:
  - If there is at least one unmatched taint with effect NoSchedule, OpenShift Container Platform cannot schedule a pod onto that node.
  - If there is no unmatched taint with effect NoSchedule but there is at least one unmatched taint with effect PreferNoSchedule, OpenShift Container Platform tries to not schedule the pod onto the node.
  - If there is at least one unmatched taint with effect NoExecute, OpenShift Container Platform evicts the pod from the node if it is already running on the node, or the pod is not scheduled onto the node if it is not yet running on the node.
    - Pods that do not tolerate the taint are evicted immediately.
    - Pods that tolerate the taint without specifying tolerationSeconds in their Pod specification remain bound forever.
    - Pods that tolerate the taint with a specified tolerationSeconds remain bound for the specified amount of time.
For example:
Add the following taints to the node:
$ oc adm taint nodes node1 key1=value1:NoSchedule
$ oc adm taint nodes node1 key1=value1:NoExecute
$ oc adm taint nodes node1 key2=value2:NoSchedule
The pod has the following tolerations:
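A set of tolerations consistent with this scenario, tolerating the first two taints but not the third, could be sketched as the following Pod spec fragment:

```yaml
# Fragment of a Pod spec
spec:
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoSchedule         # matches the first taint
  - key: key1
    operator: Equal
    value: value1
    effect: NoExecute          # matches the second taint
  # no toleration for key2=value2:NoSchedule, the third taint
```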
In this case, the pod cannot be scheduled onto the node, because there is no toleration matching the third taint. The pod continues running if it is already running on the node when the taint is added, because the third taint is the only one of the three that is not tolerated by the pod.
4.6.1.3. Understanding pod scheduling and node conditions (taint node by condition)
The Taint Nodes By Condition feature, which is enabled by default, automatically taints nodes that report conditions such as memory pressure and disk pressure. If a node reports a condition, a taint is added until the condition clears. The taints have the NoSchedule effect, which means no pod can be scheduled on the node unless the pod has a matching toleration.
The scheduler checks for these taints on nodes before scheduling pods. If the taint is present, the pod is scheduled on a different node. Because the scheduler checks for taints and not the actual node conditions, you can configure the scheduler to ignore some of these node conditions by adding appropriate pod tolerations.
To ensure backward compatibility, the daemon set controller automatically adds the following tolerations to all daemons:
- node.kubernetes.io/memory-pressure
- node.kubernetes.io/disk-pressure
- node.kubernetes.io/unschedulable (1.10 or later)
- node.kubernetes.io/network-unavailable (host network only)
You can also add arbitrary tolerations to daemon sets.
The control plane also adds the node.kubernetes.io/memory-pressure toleration on pods that have a QoS class other than BestEffort. This is because Kubernetes treats pods in the Guaranteed or Burstable QoS classes as able to cope with memory pressure. New BestEffort pods are not scheduled onto the affected node.
4.6.1.4. Understanding evicting pods by condition (taint-based evictions)
The Taint-Based Evictions feature, which is enabled by default, evicts pods from a node that experiences specific conditions, such as not-ready and unreachable. When a node experiences one of these conditions, OpenShift Container Platform automatically adds taints to the node, and starts evicting and rescheduling the pods on different nodes.
Taint Based Evictions have a NoExecute effect, where any pod that does not tolerate the taint is evicted immediately and any pod that does tolerate the taint will never be evicted, unless the pod uses the tolerationSeconds parameter.
The tolerationSeconds parameter allows you to specify how long a pod stays bound to a node that has a node condition. If the condition still exists after the tolerationSeconds period, the taint remains on the node and the pods with a matching toleration are evicted. If the condition clears before the tolerationSeconds period, pods with matching tolerations are not removed.
If you use the tolerationSeconds parameter with no value, pods are never evicted because of the not ready and unreachable node conditions.
OpenShift Container Platform evicts pods in a rate-limited way to prevent massive pod evictions in scenarios such as the master becoming partitioned from the nodes.
By default, if more than 55% of nodes in a given zone are unhealthy, the node lifecycle controller changes that zone’s state to PartialDisruption and the rate of pod evictions is reduced. For small clusters (by default, 50 nodes or less) in this state, nodes in this zone are not tainted and evictions are stopped.
For more information, see Rate limits on eviction in the Kubernetes documentation.
OpenShift Container Platform automatically adds a toleration for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds=300, unless the Pod configuration specifies either toleration.
These tolerations ensure that the default pod behavior is to remain bound for five minutes after one of these node condition problems is detected.
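The automatically added tolerations described above would appear in a Pod spec roughly as in the following sketch (a fragment only; the rest of the pod definition is omitted):

```yaml
# Fragment of a Pod spec showing the default tolerations
spec:
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300     # five minutes
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```

To keep a pod bound for longer, declare either toleration yourself with a larger tolerationSeconds value.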
You can configure these tolerations as needed. For example, if you have an application with a lot of local state, you might want to keep the pods bound to the node for a longer time in the event of a network partition, allowing the partition to recover and avoiding pod eviction.
Pods spawned by a daemon set are created with NoExecute tolerations for the following taints with no tolerationSeconds:
- node.kubernetes.io/unreachable
- node.kubernetes.io/not-ready
As a result, daemon set pods are never evicted because of these node conditions.
4.6.1.5. Tolerating all taints
You can configure a pod to tolerate all taints by adding an operator: "Exists" toleration with no key and value parameters. Pods with this toleration are not removed from a node that has taints.
Pod spec for tolerating all taints
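A minimal sketch of such a toleration, shown as a fragment of a Pod spec:

```yaml
# Fragment of a Pod spec
spec:
  tolerations:
  - operator: "Exists"         # no key, value, or effect: matches every taint
```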
4.6.2. Adding taints and tolerations
You add tolerations to pods and taints to nodes to allow the node to control which pods should or should not be scheduled on them. For existing pods and nodes, you should add the toleration to the pod first, then add the taint to the node to avoid pods being removed from the node before you can add the toleration.
Procedure
- Add a toleration to a pod by editing the Pod spec to include a tolerations stanza:
  Sample pod configuration file with an Equal operator
  For example:
  Sample pod configuration file with an Exists operator
  The Exists operator does not take a value.
  This example places a taint on node1 that has key key1, value value1, and taint effect NoExecute.
- Add a taint to a node by using the following command with the parameters described in the Taint and toleration components table:
  $ oc adm taint nodes <node_name> <key>=<value>:<effect>
  For example:
  $ oc adm taint nodes node1 key1=value1:NoExecute
  This command places a taint on node1 that has key key1, value value1, and effect NoExecute.
  Note: If you add a NoSchedule taint to a control plane node, the node must have the node-role.kubernetes.io/master=:NoSchedule taint, which is added by default.
  The tolerations on the pod match the taint on the node. A pod with either toleration can be scheduled onto node1.
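The two sample pod configurations named in the first step could look like the following sketch. The pod names and image are hypothetical, and the tolerationSeconds value is an illustrative assumption:

```yaml
# Pod tolerating the taint with the Equal operator
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-equal           # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoExecute
    tolerationSeconds: 3600    # assumed value
---
# Pod tolerating the taint with the Exists operator (no value)
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-exists          # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
  tolerations:
  - key: key1
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 3600
```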
4.6.2.1. Adding taints and tolerations using a compute machine set
You can add taints to nodes using a compute machine set. All nodes associated with the MachineSet object are updated with the taint. Tolerations respond to taints added by a compute machine set in the same manner as taints added directly to the nodes.
Procedure
- Add a toleration to a pod by editing the Pod spec to include a tolerations stanza:
  Sample pod configuration file with Equal operator
  For example:
  Sample pod configuration file with Exists operator
- Add the taint to the MachineSet object:
  - Edit the MachineSet YAML for the nodes you want to taint, or create a new MachineSet object:
    $ oc edit machineset <machineset>
  - Add the taint to the spec.template.spec section:
    Example taint in a compute machine set specification
    This example places a taint that has the key key1, value value1, and taint effect NoExecute on the nodes.
  - Scale down the compute machine set to 0:
    $ oc scale --replicas=0 machineset <machineset> -n openshift-machine-api
    Tip: You can alternatively apply the following YAML to scale the compute machine set:
    Wait for the machines to be removed.
  - Scale up the compute machine set as needed:
    $ oc scale --replicas=2 machineset <machineset> -n openshift-machine-api
    Or:
    $ oc edit machineset <machineset> -n openshift-machine-api
    Wait for the machines to start. The taint is added to the nodes associated with the MachineSet object.
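The taint and the scale-down described in this procedure could be expressed in the MachineSet object roughly as follows. This is a sketch showing only the relevant fields; a real MachineSet object carries many more, and the placeholder name is kept from the commands above:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  replicas: 0                  # scale down first; raise again once the machines are removed
  template:
    spec:
      taints:                  # applied to every node created from this machine set
      - key: key1
        value: value1
        effect: NoExecute
```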
4.6.2.2. Binding a user to a node using taints and tolerations
If you want to dedicate a set of nodes for exclusive use by a particular set of users, add a toleration to their pods. Then, add a corresponding taint to those nodes. The pods with the tolerations are allowed to use the tainted nodes or any other nodes in the cluster.
If you want to ensure the pods are scheduled to only those tainted nodes, also add a label to the same set of nodes and add a node affinity to the pods so that the pods can only be scheduled onto nodes with that label.
Procedure
To configure a node so that users can use only that node:
- Add a corresponding taint to those nodes:
  For example:
  $ oc adm taint nodes node1 dedicated=groupName:NoSchedule
  Tip: You can alternatively apply the following YAML to add the taint:
- Add a toleration to the pods by writing a custom admission controller.
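As a sketch of the two halves of this procedure, the taint on the node and the toleration that an admission controller would need to inject into the users' pods might look like this (fragments only; the dedicated=groupName pair comes from the command above):

```yaml
# Fragment of the Node object for node1
spec:
  taints:
  - key: dedicated
    value: groupName
    effect: NoSchedule
---
# Fragment of a Pod spec that tolerates the taint
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: groupName
    effect: NoSchedule
```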
4.6.2.3. Creating a project with a node selector and toleration
You can create a project that uses a node selector and toleration, which are set as annotations, to control the placement of pods onto specific nodes. Any subsequent resources created in the project are then scheduled on nodes that have a taint matching the toleration.
Prerequisites
- A label for node selection has been added to one or more nodes by using a compute machine set or editing the node directly.
- A taint has been added to one or more nodes by using a compute machine set or editing the node directly.
Procedure
- Create a Project resource definition, specifying a node selector and toleration in the metadata.annotations section:
  Example project.yaml file
- Use the oc apply command to create the project:
  $ oc apply -f project.yaml
Any subsequent resources created in the <project_name> namespace should now be scheduled on the specified nodes.
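A project.yaml file of the kind this procedure describes might look like the following sketch. The annotation keys shown are the ones commonly used for project node selectors and default tolerations; treat them, and the placeholder values, as assumptions to verify against your cluster version:

```yaml
kind: Project
apiVersion: project.openshift.io/v1
metadata:
  name: <project_name>
  annotations:
    # Pods in the project are placed only on nodes carrying this label.
    openshift.io/node-selector: '<label>'
    # Tolerations added to pods created in the project.
    scheduler.alpha.kubernetes.io/defaultTolerations: >-
      [{"operator": "Exists", "effect": "NoSchedule", "key": "<key_name>"}]
```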
4.6.2.4. Controlling nodes with special hardware using taints and tolerations
In a cluster where a small subset of nodes have specialized hardware, you can use taints and tolerations to keep pods that do not need the specialized hardware off of those nodes, leaving the nodes for pods that do need the specialized hardware. You can also require pods that need specialized hardware to use specific nodes.
You can achieve this by adding a toleration to pods that need the special hardware and tainting the nodes that have the specialized hardware.
Procedure
To ensure nodes with specialized hardware are reserved for specific pods:
- Add a toleration to pods that need the special hardware.
  For example:
- Taint the nodes that have the specialized hardware using one of the following commands:
  $ oc adm taint nodes <node-name> disktype=ssd:NoSchedule
  Or:
  $ oc adm taint nodes <node-name> disktype=ssd:PreferNoSchedule
  Tip: You can alternatively apply the following YAML to add the taint:
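A toleration for pods that need the special hardware could be sketched as the following Pod spec fragment. Leaving out the effect is a choice made here for illustration: a toleration with no effect matches every effect for its key, so it covers either of the two taint commands above:

```yaml
# Fragment of a Pod spec
spec:
  tolerations:
  - key: disktype
    operator: Equal
    value: ssd
    # effect omitted: matches both NoSchedule and PreferNoSchedule taints
---
# Fragment of the Node object after the first taint command
spec:
  taints:
  - key: disktype
    value: ssd
    effect: NoSchedule
```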
4.6.3. Removing taints and tolerations
You can remove taints from nodes and tolerations from pods as needed. You should add the toleration to the pod first, then add the taint to the node to avoid pods being removed from the node before you can add the toleration.
Procedure
To remove taints and tolerations:
- To remove a taint from a node:
  $ oc adm taint nodes <node-name> <key>-
  For example:
  $ oc adm taint nodes ip-10-0-132-248.ec2.internal key1-
  Example output
  node/ip-10-0-132-248.ec2.internal untainted
- To remove a toleration from a pod, edit the Pod spec to remove the toleration.
4.7. Placing pods on specific nodes using node selectors
A node selector specifies a map of key/value pairs that are defined using custom labels on nodes and selectors specified in pods.
For the pod to be eligible to run on a node, the pod must have the same key/value node selector as the label on the node.
4.7.1. About node selectors
You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OpenShift Container Platform schedules the pods on nodes that contain matching labels.
You can use a node selector to place specific pods on specific nodes, cluster-wide node selectors to place new pods on specific nodes anywhere in the cluster, and project node selectors to place new pods in a project on specific nodes.
For example, as a cluster administrator, you can create an infrastructure where application developers can deploy pods only onto the nodes closest to their geographical location by including a node selector in every pod they create. In this example, the cluster consists of five data centers spread across two regions. In the U.S., label the nodes as us-east, us-central, or us-west. In the Asia-Pacific region (APAC), label the nodes as apac-east or apac-west. The developers can add a node selector to the pods they create to ensure the pods get scheduled on those nodes.
A pod is not scheduled if the Pod object contains a node selector, but no node has a matching label.
If you are using node selectors and node affinity in the same pod configuration, the following rules control pod placement onto nodes:
- If you configure both nodeSelector and nodeAffinity, both conditions must be satisfied for the pod to be scheduled onto a candidate node.
- If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms is satisfied.
- If you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions are satisfied.
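The three rules above can be sketched in a single pod. The pod name, image, and the disktype and infra-node labels are hypothetical values chosen for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: placement-demo         # hypothetical name
spec:
  nodeSelector:                # must hold in addition to the node affinity
    region: east
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:     # a node needs to satisfy only one term
        - matchExpressions:    # within a term, every expression must match
          - key: type
            operator: In
            values: [user-node]
          - key: disktype
            operator: In
            values: [ssd]
        - matchExpressions:
          - key: type
            operator: In
            values: [infra-node]
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```

With this spec, a node labeled region=east qualifies if it is either a user-node with an SSD or an infra-node.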
- Node selectors on specific pods and nodes
You can control which node a specific pod is scheduled on by using node selectors and labels.
To use node selectors and labels, first label the node to avoid pods being descheduled, then add the node selector to the pod.
Note: You cannot add a node selector directly to an existing scheduled pod. You must label the object that controls the pod, such as deployment config.
For example, the following Node object has the region: east label:
Sample Node object with a label
Labels to match the pod node selector.
A pod has the type: user-node,region: east node selector:
Sample Pod object with node selectors
Node selectors to match the node label. The node must have a label for each node selector.
When you create the pod using the example pod spec, it can be scheduled on the example node.
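A matching Node and Pod pair of the kind described here might look like the following sketch. The object names and image are hypothetical, and only the fields relevant to placement are shown:

```yaml
# Node carrying the labels
apiVersion: v1
kind: Node
metadata:
  name: my-node                # hypothetical node name
  labels:
    region: east
    type: user-node
---
# Pod whose node selectors match those labels
apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
spec:
  nodeSelector:                # the node must have a label for each entry
    region: east
    type: user-node
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```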
- Default cluster-wide node selectors
With default cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.
For example, the following Scheduler object has the default cluster-wide region=east and type=user-node node selectors:
Example Scheduler Operator Custom Resource
A node in that cluster has the type=user-node,region=east labels:
Example Node object
Example Pod object with a node selector
When you create the pod using the example pod spec in the example cluster, the pod is created with the cluster-wide node selector and is scheduled on the labeled node:
Example pod list with the pod on the labeled node
NAME     READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
pod-s1   1/1     Running   0          20s   10.131.2.6   ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4   <none>           <none>
Note: If the project where you create the pod has a project node selector, that selector takes preference over a cluster-wide node selector. Your pod is not created or scheduled if the pod does not have the project node selector.
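A Scheduler custom resource carrying the cluster-wide selectors from this example could be sketched as follows, showing only the field relevant to node selection:

```yaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  # Added to every new pod that does not fall under a project node selector.
  defaultNodeSelector: type=user-node,region=east
```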
- Project node selectors
With project node selectors, when you create a pod in this project, OpenShift Container Platform adds the node selectors to the pod and schedules the pods on a node with matching labels. If there is a cluster-wide default node selector, a project node selector takes preference.
For example, the following project has the region=east node selector:
Example Namespace object
The following node has the type=user-node,region=east labels:
Example Node object
When you create the pod using the example pod spec in this example project, the pod is created with the project node selectors and is scheduled on the labeled node:
Example Pod object
Example pod list with the pod on the labeled node
NAME     READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
pod-s1   1/1     Running   0          20s   10.131.2.6   ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4   <none>           <none>
A pod in the project is not created or scheduled if the pod contains different node selectors. For example, if you deploy the following pod into the example project, it is not created:
Example Pod object with an invalid node selector
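As a sketch of the project selector and a conflicting pod, assuming a hypothetical project named east-region and a hypothetical pod name and image:

```yaml
# Namespace with a project node selector
apiVersion: v1
kind: Namespace
metadata:
  name: east-region            # hypothetical project name
  annotations:
    openshift.io/node-selector: "region=east"
---
# Pod that is rejected in this project
apiVersion: v1
kind: Pod
metadata:
  name: west-region-pod        # hypothetical pod name
  namespace: east-region
spec:
  nodeSelector:
    region: west               # conflicts with the project's region=east selector
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```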
4.7.2. Using node selectors to control pod placement
You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OpenShift Container Platform schedules the pods on nodes that contain matching labels.
You add labels to a node, a compute machine set, or a machine config. Adding the label to the compute machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
To add node selectors to an existing pod, add a node selector to the controlling object for that pod, such as a ReplicaSet object, DaemonSet object, StatefulSet object, Deployment object, or DeploymentConfig object. Any existing pods under that controlling object are recreated on a node with a matching label. If you are creating a new pod, you can add the node selector directly to the pod spec. If the pod does not have a controlling object, you must delete the pod, edit the pod spec, and recreate the pod.
You cannot add a node selector directly to an existing scheduled pod.
Prerequisites
To add a node selector to existing pods, determine the controlling object for that pod. For example, the router-default-66d5cf9464-m2g75 pod is controlled by the router-default-66d5cf9464 replica set:
$ oc describe pod router-default-66d5cf9464-7pwkc
Example output
The web console lists the controlling object under ownerReferences in the pod YAML:
Procedure
- Add labels to a node by using a compute machine set or editing the node directly:
  - Use a MachineSet object to add labels to nodes managed by the compute machine set when a node is created:
    - Run the following command to add labels to a MachineSet object:
      $ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api
      For example:
      $ oc patch MachineSet abc612-msrtw-worker-us-east-1c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
      Tip: You can alternatively apply the following YAML to add labels to a compute machine set:
    - Verify that the labels are added to the MachineSet object by using the oc edit command:
      For example:
      $ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api
      Example MachineSet object
  - Add labels directly to a node:
    - Edit the Node object for the node:
      $ oc label nodes <name> <key>=<value>
      For example, to label a node:
      $ oc label nodes ip-10-0-142-25.ec2.internal type=user-node region=east
      Tip: You can alternatively apply the following YAML to add labels to a node:
    - Verify that the labels are added to the node:
      $ oc get nodes -l type=user-node,region=east
      Example output
      NAME                          STATUS   ROLES    AGE   VERSION
      ip-10-0-142-25.ec2.internal   Ready    worker   17m   v1.31.3
- Add the matching node selector to a pod:
  - To add a node selector to existing and future pods, add a node selector to the controlling object for the pods:
    Example ReplicaSet object with labels
    Add the node selector.
  - To add a node selector to a specific, new pod, add the selector to the Pod object directly:
    Example Pod object with a node selector
    Note: You cannot add a node selector directly to an existing scheduled pod.
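A controlling object with a node selector in its pod template might look like the following sketch. The ReplicaSet name, app label, and image are hypothetical; the selector values match the labels applied earlier in this procedure:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-node             # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-node
  template:
    metadata:
      labels:
        app: hello-node
    spec:
      nodeSelector:            # applies to existing (recreated) and future pods
        region: east
        type: user-node
      containers:
      - name: hello-node
        image: registry.example.com/hello-node:latest   # placeholder image
```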
4.7.3. Creating default cluster-wide node selectors
You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.
With cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.
You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a compute machine set, or a machine config. Adding the label to the compute machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
You can add additional key/value pairs to a pod. But you cannot add a different value for a default key.
Procedure
To add a default cluster-wide node selector:
- Edit the Scheduler Operator CR to add the default cluster-wide node selectors:
  $ oc edit scheduler cluster
  Example Scheduler Operator CR with a node selector
  Add a node selector with the appropriate <key>:<value> pairs.
  After making this change, wait for the pods in the openshift-kube-apiserver project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.
- Add labels to a node by using a compute machine set or editing the node directly:
Use a compute machine set to add labels to nodes managed by the compute machine set when a node is created:
Run the following command to add labels to a MachineSet object:
$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api
Add a <key>/<value> pair for each label.
For example:
$ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
Tip: You can alternatively apply the following YAML to add labels to a compute machine set:
Verify that the labels are added to the MachineSet object by using the oc edit command:
For example:
$ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api
Example MachineSet object
Redeploy the nodes associated with that compute machine set by scaling down to 0 and scaling up the nodes:
For example:
$ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
$ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:
$ oc get nodes -l <key>=<value>
For example:
$ oc get nodes -l type=user-node
Example output
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp   Ready    worker   61s   v1.31.3
Add labels directly to a node:
Edit the Node object for the node:
$ oc label nodes <name> <key>=<value>
For example, to label a node:
$ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east
Tip: You can alternatively apply the following YAML to add labels to a node:
Verify that the labels are added to the node using the oc get command:
$ oc get nodes -l <key>=<value>,<key>=<value>
For example:
$ oc get nodes -l type=user-node,region=east
Example output
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49   Ready    worker   17m   v1.31.3
4.7.4. Creating project-wide node selectors
You can use node selectors in a project together with labels on nodes to constrain all pods created in that project to the labeled nodes.
When you create a pod in this project, OpenShift Container Platform adds the node selectors to the pods in the project and schedules the pods on a node with matching labels in the project. If there is a cluster-wide default node selector, a project node selector takes precedence.
You add node selectors to a project by editing the Namespace object to add the openshift.io/node-selector parameter. You add labels to a node, a compute machine set, or a machine config. Adding the label to the compute machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
A pod is not scheduled if the Pod object contains a node selector, but no project has a matching node selector. When you create a pod from that spec, you receive an error similar to the following message:
Example error message
Error from server (Forbidden): error when creating "pod.yaml": pods "pod-4" is forbidden: pod node label selector conflicts with its project node label selector
You can add additional key/value pairs to a pod, but you cannot add a different value for a project key.
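The rule about additional keys and differing values can be illustrated with a small Python sketch. This is a simplified model, not the platform's admission code: the function name `merge_node_selectors`, the dictionary representation of selectors, and the `disk: ssd` label are assumptions made for illustration only.

```python
def merge_node_selectors(project_selector, pod_selector):
    """Combine a project node selector with a pod node selector.

    Simplified model: extra keys on the pod are allowed, but a pod value
    that differs from the project value for the same key is rejected.
    """
    for key, value in pod_selector.items():
        if key in project_selector and project_selector[key] != value:
            raise ValueError(
                "pod node label selector conflicts with its project "
                f"node label selector on key {key!r}"
            )
    merged = dict(project_selector)
    merged.update(pod_selector)
    return merged


project = {"type": "user-node", "region": "east"}

# Adding a new key (hypothetical label) is accepted.
print(merge_node_selectors(project, {"disk": "ssd"}))

# Changing the value of a project key is rejected.
try:
    merge_node_selectors(project, {"region": "west"})
except ValueError as err:
    print("rejected:", err)
```

In this model, the rejected case corresponds to the Forbidden error shown in the example error message.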
Procedure
To add a default project node selector:
Create a namespace or edit an existing namespace to add the openshift.io/node-selector parameter:
$ oc edit namespace <name>
Example output
- 1: Add the openshift.io/node-selector with the appropriate <key>:<value> pairs.
Add labels to a node by using a compute machine set or editing the node directly:
Use a MachineSet object to add labels to nodes managed by the compute machine set when a node is created:
Run the following command to add labels to a MachineSet object:
$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api
For example:
$ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
Tip: You can alternatively apply the following YAML to add labels to a compute machine set:
Verify that the labels are added to the MachineSet object by using the oc edit command:
For example:
$ oc edit MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
Example output
Redeploy the nodes associated with that compute machine set:
For example:
$ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
$ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:
$ oc get nodes -l <key>=<value>
For example:
$ oc get nodes -l type=user-node,region=east
Example output
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp   Ready    worker   61s   v1.31.3
Add labels directly to a node:
Edit the Node object to add labels:
$ oc label <resource> <name> <key>=<value>
For example, to label a node:
$ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-c-tgq49 type=user-node region=east
Tip: You can alternatively apply the following YAML to add labels to a node:
Verify that the labels are added to the Node object using the oc get command:
$ oc get nodes -l <key>=<value>
For example:
$ oc get nodes -l type=user-node,region=east
Example output
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49   Ready    worker   17m   v1.31.3
4.8. Controlling pod placement by using pod topology spread constraints
You can use pod topology spread constraints to provide fine-grained control over the placement of your pods across nodes, zones, regions, or other user-defined topology domains. Distributing pods across failure domains can help to achieve high availability and more efficient resource utilization.
4.8.1. Example use cases
- As an administrator, I want my workload to automatically scale between two to fifteen pods. I want to ensure that when there are only two pods, they are not placed on the same node, to avoid a single point of failure.
- As an administrator, I want to distribute my pods evenly across multiple infrastructure zones to reduce latency and network costs. I want to ensure that my cluster can self-heal if issues arise.
4.8.2. Important considerations
- Pods in an OpenShift Container Platform cluster are managed by workload controllers such as deployments, stateful sets, or daemon sets. These controllers define the desired state for a group of pods, including how they are distributed and scaled across the nodes in the cluster. You should set the same pod topology spread constraints on all pods in a group to avoid confusion. When using a workload controller, such as a deployment, the pod template typically handles this for you.
- Mixing different pod topology spread constraints can make OpenShift Container Platform behavior confusing and troubleshooting more difficult. You can avoid this by ensuring that all nodes in a topology domain are consistently labeled. OpenShift Container Platform automatically populates well-known labels, such as kubernetes.io/hostname. This helps avoid the need for manual labeling of nodes. These labels provide essential topology information, ensuring consistent node labeling across the cluster.
- Only pods within the same namespace are matched and grouped together when spreading due to a constraint.
- You can specify multiple pod topology spread constraints, but you must ensure that they do not conflict with each other. All pod topology spread constraints must be satisfied for a pod to be placed.
4.8.3. Understanding skew and maxSkew
Skew refers to the difference in the number of pods that match a specified label selector across different topology domains, such as zones or nodes.
The skew is calculated for each domain by taking the absolute difference between the number of pods in that domain and the number of pods in the domain with the fewest pods scheduled. Setting a maxSkew value guides the scheduler to maintain a balanced pod distribution.
4.8.3.1. Example skew calculation
You have three zones (A, B, and C), and you want to distribute your pods evenly across these zones. Zone A has 5 pods, zone B has 3 pods, and zone C has 2 pods. To find the skew for each zone, subtract the number of pods in the zone with the fewest pods scheduled (2, in zone C) from the number of pods currently in that zone. This means that the skew for zone A is 3, the skew for zone B is 1, and the skew for zone C is 0.
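The same arithmetic can be written as a short Python sketch. The function name `skew_per_domain` and the dictionary of pod counts are illustrative assumptions, not part of any OpenShift Container Platform API.

```python
def skew_per_domain(pod_counts):
    """Return each domain's skew: its pod count minus the lowest count."""
    lowest = min(pod_counts.values())
    return {domain: count - lowest for domain, count in pod_counts.items()}


zones = {"A": 5, "B": 3, "C": 2}
print(skew_per_domain(zones))  # {'A': 3, 'B': 1, 'C': 0}
```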
4.8.3.2. The maxSkew parameter
The maxSkew parameter defines the maximum allowable difference, or skew, in the number of pods between any two topology domains. If maxSkew is set to 1, the number of pods in any topology domain should not differ by more than 1 from any other domain. If the skew exceeds maxSkew, the scheduler attempts to place new pods in a way that reduces the skew, adhering to the constraints.
Using the previous example skew calculation, the skew for zone A (3) exceeds the default maxSkew value of 1. The scheduler places new pods in zone B and zone C to reduce the skew and achieve a more balanced distribution, ensuring that no topology domain exceeds the skew of 1.
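The following Python sketch simulates how new pods could be placed for the three-zone example. It is a simplified model, not the scheduler's implementation. It assumes the check used for hard (DoNotSchedule) constraints in upstream Kubernetes: a domain can accept a new pod only if its pod count after placement, minus the lowest count across domains, does not exceed maxSkew. Picking the least-populated eligible domain is also an assumption, and node resources, taints, and every other scheduling factor are ignored. The function names are hypothetical.

```python
def eligible_domains(pod_counts, max_skew=1):
    """Domains where adding one pod keeps the skew within max_skew."""
    lowest = min(pod_counts.values())
    return [d for d, n in pod_counts.items() if (n + 1) - lowest <= max_skew]


def place_pods(pod_counts, new_pods, max_skew=1):
    """Place pods one at a time into the least-populated eligible domain."""
    counts = dict(pod_counts)
    placements = []
    for _ in range(new_pods):
        candidates = eligible_domains(counts, max_skew)
        target = min(candidates, key=lambda d: counts[d])
        counts[target] += 1
        placements.append(target)
    return counts, placements


counts, placements = place_pods({"A": 5, "B": 3, "C": 2}, new_pods=5)
print(placements)  # ['C', 'B', 'C', 'B', 'C']
print(counts)      # {'A': 5, 'B': 5, 'C': 5}
```

In this model, zone A receives no new pods until zones B and C have caught up, which matches the behavior described above.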
4.8.4. Example configurations for pod topology spread constraints
You can specify which pods to group together, which topology domains they are spread among, and the acceptable skew.
The following examples demonstrate pod topology spread constraint configurations.
Example to distribute pods that match the specified labels based on their zone
- 1: The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
- 2: The key of a node label. Nodes with this key and identical value are considered to be in the same topology.
- 3: How to handle a pod if it does not satisfy the spread constraint. The default is DoNotSchedule, which tells the scheduler not to schedule the pod. Set to ScheduleAnyway to still schedule the pod, but the scheduler prioritizes honoring the skew to not make the cluster more imbalanced.
- 4: Pods that match this label selector are counted and recognized as a group when spreading to satisfy the constraint. Be sure to specify a label selector, otherwise no pods can be matched.
- 5: Be sure that this Pod spec also sets its labels to match this label selector if you want it to be counted properly in the future.
- 6: A list of pod label keys to select which pods to calculate spreading over.
Example demonstrating a single pod topology spread constraint
The previous example defines a Pod spec with one pod topology spread constraint. It matches on pods labeled region: us-east, distributes among zones, specifies a skew of 1, and does not schedule the pod if it does not meet these requirements.
Example demonstrating multiple pod topology spread constraints
The previous example defines a Pod spec with two pod topology spread constraints. Both match on pods labeled region: us-east, specify a skew of 1, and do not schedule the pod if it does not meet these requirements.
The first constraint distributes pods based on a user-defined label node, and the second constraint distributes pods based on a user-defined label rack. Both constraints must be met for the pod to be scheduled.
4.9. Descheduler
4.9.1. Descheduler overview
While the scheduler is used to determine the most suitable node to host a new pod, the descheduler can be used to evict a running pod so that the pod can be rescheduled onto a more suitable node.
4.9.1.1. About the descheduler
You can use the descheduler to evict pods based on specific strategies so that the pods can be rescheduled onto more appropriate nodes.
You can benefit from descheduling running pods in situations such as the following:
- Nodes are underutilized or overutilized.
- Pod and node affinity requirements, such as taints or labels, have changed and the original scheduling decisions are no longer appropriate for certain nodes.
- Node failure requires pods to be moved.
- New nodes are added to clusters.
- Pods have been restarted too many times.
The descheduler does not schedule replacement of evicted pods. The scheduler automatically performs this task for the evicted pods.
When the descheduler decides to evict pods from a node, it employs the following general mechanism:
- Pods in the openshift-* and kube-system namespaces are never evicted.
- Critical pods with priorityClassName set to system-cluster-critical or system-node-critical are never evicted.
- Static, mirrored, or stand-alone pods that are not part of a replication controller, replica set, deployment, or job are never evicted because these pods will not be recreated.
- Pods associated with daemon sets are never evicted.
- Pods with local storage are never evicted.
- Best effort pods are evicted before burstable and guaranteed pods.
- All types of pods with the descheduler.alpha.kubernetes.io/evict annotation are eligible for eviction. This annotation is used to override checks that prevent eviction, and the user can select which pod is evicted. Users should know how and if the pod will be recreated.
- Pods subject to a pod disruption budget (PDB) are not evicted if descheduling would violate the PDB. The pods are evicted by using the eviction subresource to handle the PDB.
4.9.1.2. Descheduler profiles
The following descheduler profiles are available:
AffinityAndTaints
This profile evicts pods that violate inter-pod anti-affinity, node affinity, and node taints.
It enables the following strategies:
- RemovePodsViolatingInterPodAntiAffinity: removes pods that are violating inter-pod anti-affinity.
- RemovePodsViolatingNodeAffinity: removes pods that are violating node affinity.
- RemovePodsViolatingNodeTaints: removes pods that are violating NoSchedule taints on nodes.
  Pods with a node affinity type of requiredDuringSchedulingIgnoredDuringExecution are removed.
TopologyAndDuplicates
This profile evicts pods in an effort to evenly spread similar pods, or pods of the same topology domain, among nodes.
It enables the following strategies:
- RemovePodsViolatingTopologySpreadConstraint: finds unbalanced topology domains and tries to evict pods from larger ones when DoNotSchedule constraints are violated.
- RemoveDuplicates: ensures that there is only one pod associated with a replica set, replication controller, deployment, or job running on the same node. If there are more, those duplicate pods are evicted for better pod distribution in a cluster.
Warning: Do not enable TopologyAndDuplicates with any of the following profiles: SoftTopologyAndDuplicates or CompactAndScale. Enabling these profiles together results in a conflict.
LifecycleAndUtilization
This profile evicts long-running pods and balances resource usage between nodes.
It enables the following strategies:
- RemovePodsHavingTooManyRestarts: removes pods whose containers have been restarted too many times. This applies to pods where the sum of restarts over all containers (including Init Containers) is more than 100.
- LowNodeUtilization: finds nodes that are underutilized and evicts pods, if possible, from overutilized nodes in the hope that recreation of evicted pods will be scheduled on these underutilized nodes.
  - A node is considered underutilized if its usage is below 20% for all thresholds (CPU, memory, and number of pods).
  - A node is considered overutilized if its usage is above 50% for any of the thresholds (CPU, memory, and number of pods).
  Optionally, you can adjust these underutilized/overutilized threshold percentages by setting the Technology Preview field devLowNodeUtilizationThresholds to one of the following values: Low for 10%/30%, Medium for 20%/50%, or High for 40%/70%. The default value is Medium.
- PodLifeTime: evicts pods that are too old. By default, pods that are older than 24 hours are removed. You can customize the pod lifetime value.
Warning: Do not enable LifecycleAndUtilization with any of the following profiles: LongLifecycle or CompactAndScale. Enabling these profiles together results in a conflict.
SoftTopologyAndDuplicates
This profile is the same as TopologyAndDuplicates, except that pods with soft topology constraints, such as whenUnsatisfiable: ScheduleAnyway, are also considered for eviction.
Warning: Do not enable both SoftTopologyAndDuplicates and TopologyAndDuplicates. Enabling both results in a conflict.
EvictPodsWithLocalStorage
This profile allows pods with local storage to be eligible for eviction.
EvictPodsWithPVC
This profile allows pods with persistent volume claims to be eligible for eviction. If you are using Kubernetes NFS Subdir External Provisioner, you must add an excluded namespace for the namespace where the provisioner is installed.
CompactAndScale
This profile enables the HighNodeUtilization strategy, which attempts to evict pods from underutilized nodes to allow a workload to run on a smaller set of nodes. A node is considered underutilized if its usage is below 20% for all thresholds (CPU, memory, and number of pods).
Optionally, you can adjust the underutilized percentage by setting the Technology Preview field devHighNodeUtilizationThresholds to one of the following values: Minimal for 10%, Modest for 20%, or Moderate for 30%. The default value is Modest.
Warning: Do not enable CompactAndScale with any of the following profiles: LifecycleAndUtilization, LongLifecycle, or TopologyAndDuplicates. Enabling these profiles together results in a conflict.
DevKubeVirtRelieveAndMigrate
This profile is an enhanced version of the LongLifecycle profile.
Important: The DevKubeVirtRelieveAndMigrate profile is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The DevKubeVirtRelieveAndMigrate profile evicts pods from high-cost nodes to reduce overall resource expenses and enable workload migration. It also periodically rebalances workloads to help maintain similar spare capacity across nodes, which supports better handling of sudden workload spikes. Nodes can experience the following costs:
- Resource utilization: Increased resource pressure raises the overhead for running applications.
- Node maintenance: A higher number of containers on a node increases resource consumption and maintenance costs.
The profile enables the LowNodeUtilization strategy with the EvictionsInBackground alpha feature. The profile also exposes the following customization fields:
- devActualUtilizationProfile: Enables load-aware descheduling.
- devLowNodeUtilizationThresholds: Sets experimental thresholds for the LowNodeUtilization strategy. Do not use this field with devDeviationThresholds.
- devDeviationThresholds: Treats nodes with below-average resource usage as underutilized to help redistribute workloads from overutilized nodes. Do not use this field with devLowNodeUtilizationThresholds. Supported values are: Low (10%:10%), Medium (20%:20%), High (30%:30%), AsymmetricLow (0%:10%), AsymmetricMedium (0%:20%), AsymmetricHigh (0%:30%).
- devEnableSoftTainter: Enables the soft-tainting component to dynamically apply or remove soft taints as scheduling hints.
Example configuration
The DevKubeVirtRelieveAndMigrate profile requires PSI metrics to be enabled on all worker nodes. You can enable this by applying the following MachineConfig custom resource (CR):
Example MachineConfig CR
You can use this profile with the SoftTopologyAndDuplicates profile to also rebalance pods based on soft topology constraints, which can be useful in hosted control plane environments.
LongLifecycle
This profile balances resource usage between nodes and enables the following strategies:
- RemovePodsHavingTooManyRestarts: removes pods whose containers have been restarted too many times and pods where the sum of restarts over all containers (including Init Containers) is more than 100. Restarting the VM guest operating system does not increase this count.
- LowNodeUtilization: evicts pods from overutilized nodes when there are any underutilized nodes. The destination node for the evicted pod will be determined by the scheduler.
  - A node is considered underutilized if its usage is below 20% for all thresholds (CPU, memory, and number of pods).
  - A node is considered overutilized if its usage is above 50% for any of the thresholds (CPU, memory, and number of pods).
Do not enable LongLifecycle with any of the following profiles: LifecycleAndUtilization or CompactAndScale. Enabling these profiles together results in a conflict.
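The underutilized and overutilized thresholds described for the LowNodeUtilization strategy can be sketched in Python as follows. This is an illustrative model only, not the descheduler's code: the function name `classify_node`, the usage dictionary, and the label "appropriately utilized" for nodes that fall between the two thresholds are assumptions. The percentages come from the devLowNodeUtilizationThresholds values listed for the LifecycleAndUtilization profile.

```python
# (underutilized below, overutilized above), in percent
THRESHOLDS = {
    "Low": (10, 30),
    "Medium": (20, 50),  # default
    "High": (40, 70),
}


def classify_node(usage, level="Medium"):
    """Classify a node from its CPU, memory, and pod usage percentages."""
    low, high = THRESHOLDS[level]
    # Underutilized: usage is below the low threshold for ALL resources.
    if all(value < low for value in usage.values()):
        return "underutilized"
    # Overutilized: usage is above the high threshold for ANY resource.
    if any(value > high for value in usage.values()):
        return "overutilized"
    return "appropriately utilized"


print(classify_node({"cpu": 12, "memory": 15, "pods": 8}))  # underutilized
print(classify_node({"cpu": 12, "memory": 65, "pods": 8}))  # overutilized
print(classify_node({"cpu": 12, "memory": 35, "pods": 8}))  # appropriately utilized
```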
4.9.2. Kube Descheduler Operator release notes
The Kube Descheduler Operator allows you to evict pods so that they can be rescheduled on more appropriate nodes.
These release notes track the development of the Kube Descheduler Operator.
For more information, see About the descheduler.
4.9.2.1. Release notes for Kube Descheduler Operator 5.1.3
Issued: 7 July 2025
The following advisory is available for the Kube Descheduler Operator 5.1.3:
4.9.2.1.1. Bug fixes
- Previously, the relatedImages field was not set properly in the Kube Descheduler Operator cluster service version (CSV), so the images were not properly mirrored when using oc-mirror. With this release, the relatedImages field is now set properly and the Kube Descheduler Operator images are now mirrored properly when using oc-mirror. (OCPBUGS-56485)
4.9.2.2. Release notes for Kube Descheduler Operator 5.1.2
Issued: 1 May 2025
The following advisory is available for the Kube Descheduler Operator 5.1.2:
4.9.2.2.1. New features and enhancements
- This release of the Kube Descheduler Operator adds a new Technology Preview descheduler profile called DevKubeVirtRelieveAndMigrate. This profile is only available for use with OpenShift Virtualization.
4.9.2.2.2. Bug fixes
- Previously, when the LifecycleAndUtilization profile was enabled, pods from protected namespaces (openshift-*, kube-system, hypershift) could be evicted. Pods in these namespaces should never be evicted. With this release, these protected namespaces are now properly excluded from eviction when the LifecycleAndUtilization profile is enabled. (OCPBUGS-54414)
4.9.2.3. Release notes for Kube Descheduler Operator 5.1.1
Issued: 2 December 2024
The following advisory is available for the Kube Descheduler Operator 5.1.1:
4.9.2.3.1. New features and enhancements
- This release of the Kube Descheduler Operator updates the Kubernetes version to 1.31.
4.9.2.3.2. Bug fixes
- This release of the Kube Descheduler Operator addresses several Common Vulnerabilities and Exposures (CVEs).
4.9.2.4. Release notes for Kube Descheduler Operator 5.1.0
Issued: 23 October 2024
The following advisory is available for the Kube Descheduler Operator 5.1.0:
4.9.2.4.1. New features and enhancements
- Two new descheduler profiles are now available:
  - CompactAndScale: This profile attempts to evict pods from underutilized nodes to allow a workload to run on a smaller set of nodes.
  - LongLifecycle: This profile balances resource usage between nodes and enables the RemovePodsHavingTooManyRestarts and LowNodeUtilization strategies.
- For the CompactAndScale profile, you can use the Technology Preview field devHighNodeUtilizationThresholds to adjust the underutilized threshold value.
4.9.2.4.2. Bug fixes
- This release of the Kube Descheduler Operator addresses several Common Vulnerabilities and Exposures (CVEs).
4.9.3. Evicting pods using the descheduler
You can run the descheduler in OpenShift Container Platform by installing the Kube Descheduler Operator and setting the desired profiles and other customizations.
4.9.3.1. Installing the descheduler
The descheduler is not available by default. To enable the descheduler, you must install the Kube Descheduler Operator from OperatorHub and enable one or more descheduler profiles.
By default, the descheduler runs in predictive mode, which means that it only simulates pod evictions. You must change the mode to automatic for the descheduler to perform the pod evictions.
If you have enabled hosted control planes in your cluster, set a custom priority threshold to lower the chance that pods in the hosted control plane namespaces are evicted. Set the priority threshold class name to hypershift-control-plane, because it has the lowest priority value (100000000) of the hosted control plane priority classes.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- Access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Create the required namespace for the Kube Descheduler Operator.
  - Navigate to Administration → Namespaces and click Create Namespace.
  - Enter openshift-kube-descheduler-operator in the Name field, enter openshift.io/cluster-monitoring=true in the Labels field to enable descheduler metrics, and click Create.
- Install the Kube Descheduler Operator.
  - Navigate to Operators → OperatorHub.
  - Type Kube Descheduler Operator into the filter box.
  - Select the Kube Descheduler Operator and click Install.
  - On the Install Operator page, select A specific namespace on the cluster. Select openshift-kube-descheduler-operator from the drop-down menu.
  - Adjust the values for the Update Channel and Approval Strategy to the desired values.
  - Click Install.
- Create a descheduler instance.
  - From the Operators → Installed Operators page, click the Kube Descheduler Operator.
  - Select the Kube Descheduler tab and click Create KubeDescheduler.
  - Edit the settings as necessary.
    - To evict pods instead of simulating the evictions, change the Mode field to Automatic.
4.9.3.2. Configuring descheduler profiles
You can configure which profiles the descheduler uses to evict pods.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
Procedure
Edit the KubeDescheduler object:
$ oc edit kubedeschedulers.operator.openshift.io cluster -n openshift-kube-descheduler-operator
Specify one or more profiles in the spec.profiles section.
- 1: Optional: By default, the descheduler does not evict pods. To evict pods, set mode to Automatic.
- 2: Optional: Set a list of user-created namespaces to include or exclude from descheduler operations. Use excluded to set a list of namespaces to exclude or use included to set a list of namespaces to include. Note that protected namespaces (openshift-*, kube-system, hypershift) are excluded by default.
- 3: Optional: Enable a custom pod lifetime value for the LifecycleAndUtilization profile. Valid units are s, m, or h. The default pod lifetime is 24 hours.
- 4: Optional: Specify a priority threshold to consider pods for eviction only if their priority is lower than the specified level. Use the thresholdPriority field to set a numerical priority threshold (for example, 10000) or use the thresholdPriorityClassName field to specify a certain priority class name (for example, my-priority-class-name). If you specify a priority class name, it must already exist or the descheduler will throw an error. Do not set both thresholdPriority and thresholdPriorityClassName.
- 5: Optional: Set the maximum number of pods to evict during each descheduler run.
- 6: Add one or more profiles to enable. Available profiles: AffinityAndTaints, TopologyAndDuplicates, LifecycleAndUtilization, SoftTopologyAndDuplicates, EvictPodsWithLocalStorage, EvictPodsWithPVC, CompactAndScale, and LongLifecycle. Ensure that you do not enable profiles that conflict with each other.
You can enable multiple profiles; the order that the profiles are specified in is not important.
- Save the file to apply the changes.
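The options described in the procedure above might be combined as in the following sketch of a KubeDescheduler object. The values (my-namespace, 48h, 20, and the chosen profiles) are illustrative only. The field names profileCustomizations, namespaces, podLifetime, and evictionLimits.total are not named in the text above; they are assumptions based on the Kube Descheduler Operator API and should be checked against the CRD installed in your cluster.

```yaml
# Sketch only: verify field names against the KubeDescheduler CRD in your cluster.
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  mode: Automatic                  # evict pods instead of only simulating evictions
  profileCustomizations:           # assumed parent field for the customizations below
    namespaces:
      excluded:                    # use "included" instead to list namespaces to include
      - my-namespace               # hypothetical user-created namespace
    podLifetime: 48h               # assumed field name; valid units are s, m, or h
    thresholdPriorityClassName: my-priority-class-name   # must already exist; do not also set thresholdPriority
  evictionLimits:
    total: 20                      # assumed field name; maximum evictions per descheduler run
  profiles:                        # order is not important
  - AffinityAndTaints
  - TopologyAndDuplicates
  - LifecycleAndUtilization
```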
4.9.3.3. Configuring the descheduler interval
You can configure the amount of time between descheduler runs. The default is 3600 seconds (one hour).
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
Procedure
- Edit the KubeDescheduler object:
  $ oc edit kubedeschedulers.operator.openshift.io cluster -n openshift-kube-descheduler-operator
- Update the deschedulingIntervalSeconds field to the desired number of seconds between descheduler runs. A value of 0 in this field runs the descheduler once and exits.
- Save the file to apply the changes.
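As an illustration of the step above, the relevant part of the KubeDescheduler object might look like the following sketch. Only the deschedulingIntervalSeconds field and its default of 3600 come from this section; the other fields of the object are omitted.

```yaml
# Sketch: only the interval setting is shown; other spec fields are omitted.
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600   # default: one hour; 0 runs the descheduler once and exits
```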
4.9.4. Uninstalling the Kube Descheduler Operator
You can remove the Kube Descheduler Operator from OpenShift Container Platform by uninstalling the Operator and removing its related resources.
4.9.4.1. Uninstalling the descheduler
You can remove the descheduler from your cluster by removing the descheduler instance and uninstalling the Kube Descheduler Operator. This procedure also cleans up the KubeDescheduler CRD and openshift-kube-descheduler-operator namespace.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Delete the descheduler instance.
  - From the Operators → Installed Operators page, click Kube Descheduler Operator.
  - Select the Kube Descheduler tab.
  - Click the Options menu next to the cluster entry and select Delete KubeDescheduler.
  - In the confirmation dialog, click Delete.
- Uninstall the Kube Descheduler Operator.
  - Navigate to Operators → Installed Operators.
  - Click the Options menu next to the Kube Descheduler Operator entry and select Uninstall Operator.
  - In the confirmation dialog, click Uninstall.
- Delete the openshift-kube-descheduler-operator namespace.
  - Navigate to Administration → Namespaces.
  - Enter openshift-kube-descheduler-operator into the filter box.
  - Click the Options menu next to the openshift-kube-descheduler-operator entry and select Delete Namespace.
  - In the confirmation dialog, enter openshift-kube-descheduler-operator and click Delete.
- Delete the KubeDescheduler CRD.
  - Navigate to Administration → Custom Resource Definitions.
  - Enter KubeDescheduler into the filter box.
  - Click the Options menu next to the KubeDescheduler entry and select Delete CustomResourceDefinition.
  - In the confirmation dialog, click Delete.
4.10. Secondary scheduler
4.10.1. Secondary scheduler overview
You can install the Secondary Scheduler Operator to run a custom secondary scheduler alongside the default scheduler to schedule pods.
4.10.1.1. About the Secondary Scheduler Operator
The Secondary Scheduler Operator for Red Hat OpenShift provides a way to deploy a custom secondary scheduler in OpenShift Container Platform. The secondary scheduler runs alongside the default scheduler to schedule pods. Pod configurations can specify which scheduler to use.
The custom scheduler must have the /bin/kube-scheduler binary and be based on the Kubernetes scheduling framework.
You can use the Secondary Scheduler Operator to deploy a custom secondary scheduler in OpenShift Container Platform, but Red Hat does not directly support the functionality of the custom secondary scheduler.
The Secondary Scheduler Operator creates the default roles and role bindings required by the secondary scheduler. You can specify which scheduling plugins to enable or disable by configuring the KubeSchedulerConfiguration resource for the secondary scheduler.
4.10.2. Secondary Scheduler Operator for Red Hat OpenShift release notes
The Secondary Scheduler Operator for Red Hat OpenShift allows you to deploy a custom secondary scheduler in your OpenShift Container Platform cluster.
These release notes track the development of the Secondary Scheduler Operator for Red Hat OpenShift.
For more information, see About the Secondary Scheduler Operator.
4.10.2.1. Release notes for Secondary Scheduler Operator for Red Hat OpenShift 1.4.1
Issued: 9 July 2025
The following advisory is available for the Secondary Scheduler Operator for Red Hat OpenShift 1.4.1:
4.10.2.1.1. New features and enhancements
- This release of the Secondary Scheduler Operator updates the Kubernetes version to 1.32.
4.10.2.1.2. Bug fixes
- This release of the Secondary Scheduler Operator addresses several Common Vulnerabilities and Exposures (CVEs).
- Previously, some secondary scheduler plugins could not be deployed if they needed to create temporary files. This was due to more restricted permissions that were introduced in a previous release. With this update, secondary schedulers deployed through the Operator can create temporary files again and these secondary scheduler plugins can now be deployed successfully. (OCPBUGS-58154)
4.10.2.1.3. Known issues
- Currently, you cannot deploy additional resources, such as config maps, CRDs, or RBAC policies through the Secondary Scheduler Operator. Any resources other than roles and role bindings that are required by your custom secondary scheduler must be applied externally. (WRKLDS-645)
4.10.2.2. Release notes for Secondary Scheduler Operator for Red Hat OpenShift 1.4.0
Issued: 6 May 2025
The following advisory is available for the Secondary Scheduler Operator for Red Hat OpenShift 1.4.0:
4.10.2.2.1. New features and enhancements
- This release of the Secondary Scheduler Operator updates the Kubernetes version to 1.31.
4.10.2.2.2. Bug fixes
- This release of the Secondary Scheduler Operator addresses several Common Vulnerabilities and Exposures (CVEs).
4.10.2.2.3. Known issues
- Currently, you cannot deploy additional resources, such as config maps, CRDs, or RBAC policies through the Secondary Scheduler Operator. Any resources other than roles and role bindings that are required by your custom secondary scheduler must be applied externally. (WRKLDS-645)
4.10.3. Scheduling pods using a secondary scheduler
You can run a custom secondary scheduler in OpenShift Container Platform by installing the Secondary Scheduler Operator, deploying the secondary scheduler, and setting the secondary scheduler in the pod definition.
4.10.3.1. Installing the Secondary Scheduler Operator
You can use the web console to install the Secondary Scheduler Operator for Red Hat OpenShift.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Create the required namespace for the Secondary Scheduler Operator for Red Hat OpenShift.
  - Navigate to Administration → Namespaces and click Create Namespace.
  - Enter openshift-secondary-scheduler-operator in the Name field and click Create.
- Install the Secondary Scheduler Operator for Red Hat OpenShift.
  - Navigate to Operators → OperatorHub.
  - Enter Secondary Scheduler Operator for Red Hat OpenShift into the filter box.
  - Select the Secondary Scheduler Operator for Red Hat OpenShift and click Install.
  - On the Install Operator page:
    - The Update channel is set to stable, which installs the latest stable release of the Secondary Scheduler Operator for Red Hat OpenShift.
    - Select A specific namespace on the cluster and select openshift-secondary-scheduler-operator from the drop-down menu.
    - Select an Update approval strategy.
      - The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
      - The Manual strategy requires a user with appropriate credentials to approve the Operator update.
    - Click Install.
Verification
- Navigate to Operators → Installed Operators.
- Verify that Secondary Scheduler Operator for Red Hat OpenShift is listed with a Status of Succeeded.
4.10.3.2. Deploying a secondary scheduler
After you have installed the Secondary Scheduler Operator, you can deploy a secondary scheduler.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- You have access to the OpenShift Container Platform web console.
- The Secondary Scheduler Operator for Red Hat OpenShift is installed.
Procedure
- Log in to the OpenShift Container Platform web console.
- Create a config map to hold the configuration for the secondary scheduler.
  - Navigate to Workloads → ConfigMaps.
  - Click Create ConfigMap.
  - In the YAML editor, enter the config map definition that contains the necessary KubeSchedulerConfiguration configuration. Note the following about the definition:
    - The name of the config map is used in the Scheduler Config field when creating the SecondaryScheduler CR.
    - The config map must be created in the openshift-secondary-scheduler-operator namespace.
    - The config map contains the KubeSchedulerConfiguration resource for the secondary scheduler. For more information, see KubeSchedulerConfiguration in the Kubernetes API documentation.
    - The configuration sets the name of the secondary scheduler. Pods that set their spec.schedulerName field to this value are scheduled with this secondary scheduler.
    - The configuration lists the plugins to enable or disable for the secondary scheduler. For a list of default scheduling plugins, see Scheduling plugins in the Kubernetes documentation.
  - Click Create.
- Create the SecondaryScheduler CR:
  - Navigate to Operators → Installed Operators.
  - Select Secondary Scheduler Operator for Red Hat OpenShift.
  - Select the Secondary Scheduler tab and click Create SecondaryScheduler.
  - The Name field defaults to cluster; do not change this name.
  - The Scheduler Config field defaults to secondary-scheduler-config. Ensure that this value matches the name of the config map created earlier in this procedure.
  - In the Scheduler Image field, enter the image name for your custom scheduler.
    Important: Red Hat does not directly support the functionality of your custom secondary scheduler.
  - Click Create.
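A config map that meets the requirements in the procedure above might look like the following sketch. The config map name and namespace come from the procedure. The data key config.yaml, the scheduler name secondary-scheduler, the leaderElection setting, and the disabled plugin are assumptions chosen for illustration; adjust them to match your custom scheduler.

```yaml
# Sketch only: the data key, scheduler name, and plugin selection are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: secondary-scheduler-config                    # matches the default Scheduler Config value
  namespace: openshift-secondary-scheduler-operator   # required namespace
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: secondary-scheduler            # hypothetical name; pods reference it in spec.schedulerName
        plugins:
          score:
            disabled:
              - name: NodeResourcesBalancedAllocation # example of disabling a default score plugin
```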
4.10.3.3. Scheduling a pod using the secondary scheduler
To schedule a pod using the secondary scheduler, set the schedulerName field in the pod definition.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- You have access to the OpenShift Container Platform web console.
- The Secondary Scheduler Operator for Red Hat OpenShift is installed.
- A secondary scheduler is configured.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Workloads → Pods.
- Click Create Pod.
- In the YAML editor, enter the desired pod configuration and add the schedulerName field. The schedulerName field must match the name that is defined in the config map when you configured the secondary scheduler.
- Click Create.
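A pod definition that selects the secondary scheduler might look like the following minimal sketch. The pod name nginx and the default namespace match the verification command in this section. The container image and the scheduler name secondary-scheduler are assumptions; the scheduler name must be whatever you defined in the config map.

```yaml
# Sketch only: the image and scheduler name are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  schedulerName: secondary-scheduler   # must match the schedulerName in the config map
  containers:
    - name: nginx
      image: nginx:1.14.2              # any image works for checking which scheduler placed the pod
      ports:
        - containerPort: 80
```

If schedulerName is omitted, the pod is handled by the default scheduler.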
Verification
- Log in to the OpenShift CLI.
- Describe the pod using the following command:
  $ oc describe pod nginx -n default
- In the events table, find the event with a message similar to Successfully assigned <namespace>/<pod_name> to <node_name>. In the "From" column, verify that the event was generated from the secondary scheduler and not the default scheduler.
  Note: You can also check the secondary-scheduler-* pod logs in the openshift-secondary-scheduler-operator namespace to verify that the pod was scheduled by the secondary scheduler.
4.10.4. Uninstalling the Secondary Scheduler Operator
You can remove the Secondary Scheduler Operator for Red Hat OpenShift from OpenShift Container Platform by uninstalling the Operator and removing its related resources.
4.10.4.1. Uninstalling the Secondary Scheduler Operator
You can uninstall the Secondary Scheduler Operator for Red Hat OpenShift by using the web console.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- You have access to the OpenShift Container Platform web console.
- The Secondary Scheduler Operator for Red Hat OpenShift is installed.
Procedure
- Log in to the OpenShift Container Platform web console.
- Uninstall the Secondary Scheduler Operator for Red Hat OpenShift.
  - Navigate to Operators → Installed Operators.
  - Click the Options menu next to the Secondary Scheduler Operator entry and click Uninstall Operator.
  - In the confirmation dialog, click Uninstall.
4.10.4.2. Removing Secondary Scheduler Operator resources
Optionally, after uninstalling the Secondary Scheduler Operator for Red Hat OpenShift, you can remove its related resources from your cluster.
Prerequisites
- You are logged in to OpenShift Container Platform as a user with the cluster-admin role.
- You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Remove CRDs that were installed by the Secondary Scheduler Operator:
  - Navigate to Administration → CustomResourceDefinitions.
  - Enter SecondaryScheduler in the Name field to filter the CRDs.
  - Click the Options menu next to the SecondaryScheduler CRD and select Delete Custom Resource Definition.
- Remove the openshift-secondary-scheduler-operator namespace.
  - Navigate to Administration → Namespaces.
  - Click the Options menu next to the openshift-secondary-scheduler-operator entry and select Delete Namespace.
  - In the confirmation dialog, enter openshift-secondary-scheduler-operator in the field and click Delete.