Chapter 3. Controlling pod placement onto nodes (scheduling)
3.1. Controlling pod placement using the scheduler
Pod scheduling is an internal process that determines placement of new pods onto nodes within the cluster.
The scheduler code has a clean separation that watches new pods as they get created and identifies the most suitable node to host them. It then creates bindings (pod to node bindings) for the pods using the master API.
- Default pod scheduling
- OpenShift Container Platform comes with a default scheduler that serves the needs of most users. The default scheduler uses both inherent and customization tools to determine the best fit for a pod.
- Advanced pod scheduling
In situations where you might want more control over where new pods are placed, the OpenShift Container Platform advanced scheduling features allow you to configure a pod so that the pod is required or has a preference to run on a particular node or alongside a specific pod by:
- Using pod affinity and anti-affinity rules.
- Controlling pod placement with pod affinity.
- Controlling pod placement with node affinity.
- Placing pods on overcommitted nodes.
- Controlling pod placement with node selectors.
- Controlling pod placement with taints and tolerations.
3.1.1. Scheduler Use Cases
One of the important use cases for scheduling within OpenShift Container Platform is to support flexible affinity and anti-affinity policies.
3.1.1.1. Infrastructure Topological Levels
Administrators can define multiple topological levels for their infrastructure (nodes) by specifying labels on nodes. For example:
region=r1
zone=z1
rack=s1
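These topology labels can be expressed directly in node metadata. A minimal sketch of a labeled node object, assuming a hypothetical node named node-1.example.com:

```yaml
# Hypothetical node metadata after labeling; the node name is an assumption.
apiVersion: v1
kind: Node
metadata:
  name: node-1.example.com
  labels:
    region: r1  # topological level 1
    zone: z1    # topological level 2
    rack: s1    # topological level 3
```

The same result can be achieved on an existing node with oc label node node-1.example.com region=r1 zone=z1 rack=s1.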
These label names have no particular meaning and administrators are free to name their infrastructure levels anything, such as city/building/room. Also, administrators can define any number of levels for their infrastructure topology; three levels are usually adequate, such as:
regions
zones
racks
3.1.1.2. Affinity
Administrators should be able to configure the scheduler to specify affinity at any topological level, or even at multiple levels. Affinity at a particular level indicates that all pods that belong to the same service are scheduled onto nodes that belong to the same level. This handles any latency requirements of applications by allowing administrators to ensure that peer pods do not end up being too geographically separated. If no node is available within the same affinity group to host the pod, then the pod is not scheduled.
If you need greater control over where the pods are scheduled, see Controlling pod placement on nodes using node affinity rules and Placing pods relative to other pods using affinity and anti-affinity rules.
These advanced scheduling features allow administrators to specify which node a pod can be scheduled on and to force or reject scheduling relative to other pods.
3.1.1.3. Anti-Affinity
Administrators should be able to configure the scheduler to specify anti-affinity at any topological level, or even at multiple levels. Anti-affinity (or 'spread') at a particular level indicates that all pods that belong to the same service are spread across nodes that belong to that level. This ensures that the application is well spread for high availability purposes. The scheduler tries to balance the service pods across all applicable nodes as evenly as possible.
If you need greater control over where the pods are scheduled, see Controlling pod placement on nodes using node affinity rules and Placing pods relative to other pods using affinity and anti-affinity rules.
These advanced scheduling features allow administrators to specify which node a pod can be scheduled on and to force or reject scheduling relative to other pods.
3.2. Configuring the default scheduler to control pod placement
The default OpenShift Container Platform pod scheduler is responsible for determining placement of new pods onto nodes within the cluster. It reads data from the pod and tries to find a node that is a good fit based on configured policies. It is completely independent and exists as a standalone/pluggable solution. It does not modify the pod and just creates a binding for the pod that ties the pod to the particular node.
Configuring a scheduler policy is deprecated and is planned for removal in a future release. For more information on the Technology Preview alternative, see Scheduling pods using a scheduler profile.
A selection of predicates and priorities defines the policy for the scheduler. See Modifying scheduler policy for a list of predicates and priorities.
Sample default scheduler object
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: 2019-05-20T15:39:01Z
  generation: 1
  name: cluster
  resourceVersion: "1491"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: 6435dd99-7b15-11e9-bd48-0aec821b8e34
spec:
  policy:
    name: scheduler-policy 1
  defaultNodeSelector: type=user-node,region=east 2

1 You can specify the name of a custom scheduler policy file.
2 Optional: Specify a default node selector to restrict pod placement to specific nodes. The default node selector is applied to the pods created in all namespaces. Pods can be scheduled on nodes with labels that match the default node selector and any existing pod node selectors. Namespaces having project-wide node selectors are not impacted even if this field is set.
3.2.1. Understanding default scheduling
The existing generic scheduler is the default platform-provided scheduler engine that selects a node to host the pod in a three-step operation:
- Filters the Nodes
- The available nodes are filtered based on the constraints or requirements specified. This is done by running each node through the list of filter functions called predicates.
- Prioritize the Filtered List of Nodes
- This is achieved by passing each node through a series of priority functions that assign it a score between 0 - 10, with 0 indicating a bad fit and 10 indicating a good fit to host the pod. The scheduler configuration can also take in a simple weight (positive numeric value) for each priority function. The node score provided by each priority function is multiplied by the weight (default weight for most priorities is 1) and then combined by adding the scores for each node provided by all the priorities. This weight attribute can be used by administrators to give higher importance to some priorities.
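As an illustration of the weighting arithmetic (the scores below are made up for illustration, not taken from a real scheduler run), a policy fragment that doubles the influence of ImageLocalityPriority might look like this. A node scoring 8 on ImageLocalityPriority and 5 on LeastRequestedPriority would then receive a combined score of 2*8 + 1*5 = 21:

```json
{
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "ImageLocalityPriority", "weight": 2}
  ]
}
```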
- Select the Best Fit Node
- The nodes are sorted based on their scores and the node with the highest score is selected to host the pod. If multiple nodes have the same high score, then one of them is selected at random.
3.2.1.1. Understanding Scheduler Policy
The selection of predicates and priorities defines the policy for the scheduler.
The scheduler configuration file is a JSON file, which must be named policy.cfg.
In the absence of the scheduler policy file, the default scheduler behavior is used.
The predicates and priorities defined in the scheduler configuration file completely override the default scheduler policy. If any of the default predicates and priorities are required, you must explicitly specify the functions in the policy configuration.
Sample scheduler config map
apiVersion: v1
data:
  policy.cfg: |
    {
        "kind" : "Policy",
        "apiVersion" : "v1",
        "predicates" : [
            {"name" : "MaxGCEPDVolumeCount"},
            {"name" : "GeneralPredicates"}, 1
            {"name" : "MaxAzureDiskVolumeCount"},
            {"name" : "MaxCSIVolumeCountPred"},
            {"name" : "CheckVolumeBinding"},
            {"name" : "MaxEBSVolumeCount"},
            {"name" : "MatchInterPodAffinity"},
            {"name" : "CheckNodeUnschedulable"},
            {"name" : "NoDiskConflict"},
            {"name" : "NoVolumeZoneConflict"},
            {"name" : "PodToleratesNodeTaints"}
        ],
        "priorities" : [
            {"name" : "LeastRequestedPriority", "weight" : 1},
            {"name" : "BalancedResourceAllocation", "weight" : 1},
            {"name" : "ServiceSpreadingPriority", "weight" : 1},
            {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
            {"name" : "NodeAffinityPriority", "weight" : 1},
            {"name" : "TaintTolerationPriority", "weight" : 1},
            {"name" : "ImageLocalityPriority", "weight" : 1},
            {"name" : "SelectorSpreadPriority", "weight" : 1},
            {"name" : "InterPodAffinityPriority", "weight" : 1},
            {"name" : "EqualPriority", "weight" : 1}
        ]
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-09-17T08:42:33Z"
  name: scheduler-policy
  namespace: openshift-config
  resourceVersion: "59500"
  selfLink: /api/v1/namespaces/openshift-config/configmaps/scheduler-policy
  uid: 17ee8865-d927-11e9-b213-02d1e1709840
1 The GeneralPredicates predicate represents the PodFitsResources, HostName, PodFitsHostPorts, and MatchNodeSelector predicates. Because you are not allowed to configure the same predicate multiple times, the GeneralPredicates predicate cannot be used alongside any of the four represented predicates.
3.2.2. Creating a scheduler policy file
You can change the default scheduling behavior by creating a JSON file with the desired predicates and priorities. You then generate a config map from the JSON file and point the cluster Scheduler object to use the config map.
Procedure
To configure the scheduler policy:
Create a JSON file named policy.cfg with the desired predicates and priorities.

Sample scheduler JSON file

{
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
        {"name" : "MaxGCEPDVolumeCount"},
        {"name" : "GeneralPredicates"},
        {"name" : "MaxAzureDiskVolumeCount"},
        {"name" : "MaxCSIVolumeCountPred"},
        {"name" : "CheckVolumeBinding"},
        {"name" : "MaxEBSVolumeCount"},
        {"name" : "MatchInterPodAffinity"},
        {"name" : "CheckNodeUnschedulable"},
        {"name" : "NoDiskConflict"},
        {"name" : "NoVolumeZoneConflict"},
        {"name" : "PodToleratesNodeTaints"}
    ],
    "priorities" : [
        {"name" : "LeastRequestedPriority", "weight" : 1},
        {"name" : "BalancedResourceAllocation", "weight" : 1},
        {"name" : "ServiceSpreadingPriority", "weight" : 1},
        {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
        {"name" : "NodeAffinityPriority", "weight" : 1},
        {"name" : "TaintTolerationPriority", "weight" : 1},
        {"name" : "ImageLocalityPriority", "weight" : 1},
        {"name" : "SelectorSpreadPriority", "weight" : 1},
        {"name" : "InterPodAffinityPriority", "weight" : 1},
        {"name" : "EqualPriority", "weight" : 1}
    ]
}

Create a config map based on the scheduler JSON file:
$ oc create configmap -n openshift-config --from-file=policy.cfg <configmap-name> 1

1 Enter a name for the config map.
For example:
$ oc create configmap -n openshift-config --from-file=policy.cfg scheduler-policy

Example output

configmap/scheduler-policy created

Tip: You can alternatively apply the following YAML to create the config map:
kind: ConfigMap
apiVersion: v1
metadata:
  name: scheduler-policy
  namespace: openshift-config
data: 1
  policy.cfg: |
    {
        "kind": "Policy",
        "apiVersion": "v1",
        "predicates": [
            {
                "name": "RequireRegion",
                "argument": {
                    "labelsPresence": {"labels": ["region"], "presence": true}
                }
            }
        ],
        "priorities": [
            {
                "name": "ZonePreferred",
                "weight": 1,
                "argument": {
                    "labelPreference": {"label": "zone", "presence": true}
                }
            }
        ]
    }

1 The policy.cfg file in JSON format with predicates and priorities.
Edit the Scheduler Operator custom resource to add the config map:
$ oc patch Scheduler cluster --type=merge -p '{"spec":{"policy":{"name":"<configmap-name>"}}}' 1

1 Specify the name of the config map.
For example:
$ oc patch Scheduler cluster --type=merge -p '{"spec":{"policy":{"name":"scheduler-policy"}}}'

Tip: You can alternatively apply the following YAML to add the config map:
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: scheduler-policy 1

1 Add the name of the scheduler policy config map.
After making the change to the Scheduler config resource, wait for the openshift-kube-apiserver pods to redeploy. This can take several minutes. Until the pods redeploy, the new scheduler policy does not take effect.

Verify the scheduler policy is configured by viewing the log of a scheduler pod in the openshift-kube-scheduler namespace. The following command checks for the predicates and priorities that are being registered by the scheduler:

$ oc logs <scheduler-pod> | grep predicates

For example:

$ oc logs openshift-kube-scheduler-ip-10-0-141-29.ec2.internal | grep predicates

Example output
Creating scheduler with fit predicates 'map[MaxGCEPDVolumeCount:{} MaxAzureDiskVolumeCount:{} CheckNodeUnschedulable:{} NoDiskConflict:{} NoVolumeZoneConflict:{} GeneralPredicates:{} MaxCSIVolumeCountPred:{} CheckVolumeBinding:{} MaxEBSVolumeCount:{} MatchInterPodAffinity:{} PodToleratesNodeTaints:{}]' and priority functions 'map[InterPodAffinityPriority:{} LeastRequestedPriority:{} ServiceSpreadingPriority:{} ImageLocalityPriority:{} SelectorSpreadPriority:{} EqualPriority:{} BalancedResourceAllocation:{} NodePreferAvoidPodsPriority:{} NodeAffinityPriority:{} TaintTolerationPriority:{}]'
3.2.3. Modifying scheduler policies
You change scheduling behavior by creating or editing your scheduler policy config map in the openshift-config namespace.
Procedure
To modify the current custom scheduling, use one of the following methods:
Edit the scheduler policy config map:
$ oc edit configmap <configmap-name> -n openshift-config

For example:
$ oc edit configmap scheduler-policy -n openshift-config

Example output
apiVersion: v1
data:
  policy.cfg: |
    {
        "kind" : "Policy",
        "apiVersion" : "v1",
        "predicates" : [
            {"name" : "MaxGCEPDVolumeCount"},
            {"name" : "GeneralPredicates"},
            {"name" : "MaxAzureDiskVolumeCount"},
            {"name" : "MaxCSIVolumeCountPred"},
            {"name" : "CheckVolumeBinding"},
            {"name" : "MaxEBSVolumeCount"},
            {"name" : "MatchInterPodAffinity"},
            {"name" : "CheckNodeUnschedulable"},
            {"name" : "NoDiskConflict"},
            {"name" : "NoVolumeZoneConflict"},
            {"name" : "PodToleratesNodeTaints"}
        ],
        "priorities" : [
            {"name" : "LeastRequestedPriority", "weight" : 1},
            {"name" : "BalancedResourceAllocation", "weight" : 1},
            {"name" : "ServiceSpreadingPriority", "weight" : 1},
            {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
            {"name" : "NodeAffinityPriority", "weight" : 1},
            {"name" : "TaintTolerationPriority", "weight" : 1},
            {"name" : "ImageLocalityPriority", "weight" : 1},
            {"name" : "SelectorSpreadPriority", "weight" : 1},
            {"name" : "InterPodAffinityPriority", "weight" : 1},
            {"name" : "EqualPriority", "weight" : 1}
        ]
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-09-17T17:44:19Z"
  name: scheduler-policy
  namespace: openshift-config
  resourceVersion: "15370"
  selfLink: /api/v1/namespaces/openshift-config/configmaps/scheduler-policy

It can take a few minutes for the scheduler to restart the pods with the updated policy.
Change the policies and predicates being used:
Remove the scheduler policy config map:
$ oc delete configmap -n openshift-config <name>

For example:
$ oc delete configmap -n openshift-config scheduler-policy

Edit the policy.cfg file to add and remove policies and predicates as needed.
$ vi policy.cfg

Example output
apiVersion: v1
data:
  policy.cfg: |
    {
        "kind" : "Policy",
        "apiVersion" : "v1",
        "predicates" : [
            {"name" : "MaxGCEPDVolumeCount"},
            {"name" : "GeneralPredicates"},
            {"name" : "MaxAzureDiskVolumeCount"},
            {"name" : "MaxCSIVolumeCountPred"},
            {"name" : "CheckVolumeBinding"},
            {"name" : "MaxEBSVolumeCount"},
            {"name" : "MatchInterPodAffinity"},
            {"name" : "CheckNodeUnschedulable"},
            {"name" : "NoDiskConflict"},
            {"name" : "NoVolumeZoneConflict"},
            {"name" : "PodToleratesNodeTaints"}
        ],
        "priorities" : [
            {"name" : "LeastRequestedPriority", "weight" : 1},
            {"name" : "BalancedResourceAllocation", "weight" : 1},
            {"name" : "ServiceSpreadingPriority", "weight" : 1},
            {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
            {"name" : "NodeAffinityPriority", "weight" : 1},
            {"name" : "TaintTolerationPriority", "weight" : 1},
            {"name" : "ImageLocalityPriority", "weight" : 1},
            {"name" : "SelectorSpreadPriority", "weight" : 1},
            {"name" : "InterPodAffinityPriority", "weight" : 1},
            {"name" : "EqualPriority", "weight" : 1}
        ]
    }
$ oc create configmap -n openshift-config --from-file=policy.cfg <configmap-name> 1

1 Enter a name for the config map.
For example:
$ oc create configmap -n openshift-config --from-file=policy.cfg scheduler-policy

Example output
configmap/scheduler-policy created

Example 3.1. Sample config map based on the scheduler JSON file
kind: ConfigMap
apiVersion: v1
metadata:
  name: scheduler-policy
  namespace: openshift-config
data:
  policy.cfg: |
    {
        "kind": "Policy",
        "apiVersion": "v1",
        "predicates": [
            {
                "name": "RequireRegion",
                "argument": {
                    "labelsPresence": {"labels": ["region"], "presence": true}
                }
            }
        ],
        "priorities": [
            {
                "name": "ZonePreferred",
                "weight": 1,
                "argument": {
                    "labelPreference": {"label": "zone", "presence": true}
                }
            }
        ]
    }
3.2.3.1. Understanding the scheduler predicates
Predicates are rules that filter out unqualified nodes.
There are several predicates provided by default in OpenShift Container Platform. Some of these predicates can be customized by providing certain parameters. Multiple predicates can be combined to provide additional filtering of nodes.
3.2.3.1.1. Static Predicates
These predicates do not take any configuration parameters or inputs from the user. These are specified in the scheduler configuration using their exact name.
3.2.3.1.1.1. Default Predicates
The default scheduler policy includes the following predicates:
The NoVolumeZoneConflict predicate checks that the volumes a pod requests are available in the zone.

{"name" : "NoVolumeZoneConflict"}

The MaxEBSVolumeCount predicate checks the maximum number of volumes that can be attached to an Amazon Elastic Block Store (EBS) instance.

{"name" : "MaxEBSVolumeCount"}

The MaxAzureDiskVolumeCount predicate checks the maximum number of Azure Disk Volumes.

{"name" : "MaxAzureDiskVolumeCount"}

The PodToleratesNodeTaints predicate checks if a pod can tolerate the node taints.

{"name" : "PodToleratesNodeTaints"}

The CheckNodeUnschedulable predicate checks if a pod can be scheduled on a node with the Unschedulable spec.

{"name" : "CheckNodeUnschedulable"}

The CheckVolumeBinding predicate evaluates if a pod can fit based on the volumes it requests, for both bound and unbound PVCs:

- For PVCs that are bound, the predicate checks that the corresponding PV's node affinity is satisfied by the given node.
- For PVCs that are unbound, the predicate searches for available PVs that can satisfy the PVC requirements and whose node affinity is satisfied by the given node.

The predicate returns true if all bound PVCs have compatible PVs with the node, and if all unbound PVCs can be matched with an available and node-compatible PV.

{"name" : "CheckVolumeBinding"}

The NoDiskConflict predicate checks if the volume requested by a pod is available.

{"name" : "NoDiskConflict"}

The MaxGCEPDVolumeCount predicate checks the maximum number of Google Compute Engine (GCE) Persistent Disks (PD).

{"name" : "MaxGCEPDVolumeCount"}

The MaxCSIVolumeCountPred predicate determines how many Container Storage Interface (CSI) volumes should be attached to a node and whether that number exceeds a configured limit.

{"name" : "MaxCSIVolumeCountPred"}

The MatchInterPodAffinity predicate checks if the pod affinity/anti-affinity rules permit the pod.

{"name" : "MatchInterPodAffinity"}
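The PV node affinity that the CheckVolumeBinding predicate evaluates is declared on the PersistentVolume object. A minimal sketch, assuming a hypothetical local volume that is only usable from one node (names, path, and size are assumptions):

```yaml
# Hypothetical local PersistentVolume restricted to a single node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  local:
    path: /mnt/disks/vol1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1
```

A pod whose PVC binds to this volume can only pass CheckVolumeBinding on the node labeled kubernetes.io/hostname=node-1.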
3.2.3.1.1.2. Other Static Predicates
OpenShift Container Platform also supports the following predicates:
The CheckNode-* predicates cannot be combined with the PodToleratesNodeTaints predicate and the CheckNodeUnschedulable predicate.

The CheckNodeCondition predicate checks if a pod can be scheduled on a node reporting out of disk, network unavailable, or not ready conditions.

{"name" : "CheckNodeCondition"}

The CheckNodeLabelPresence predicate checks if all of the specified labels exist on a node, regardless of their value.

{"name" : "CheckNodeLabelPresence"}

The checkServiceAffinity predicate checks that ServiceAffinity labels are homogeneous for pods that are scheduled on a node.

{"name" : "checkServiceAffinity"}

The PodToleratesNodeNoExecuteTaints predicate checks if a pod can tolerate the node NoExecute taints.

{"name" : "PodToleratesNodeNoExecuteTaints"}
3.2.3.1.2. General Predicates
The following general predicates check whether non-critical predicates and essential predicates pass. Non-critical predicates are the predicates that only non-critical pods must pass and essential predicates are the predicates that all pods must pass.
The default scheduler policy includes the general predicates.
Non-critical general predicates

The PodFitsResources predicate determines a fit based on resource availability (CPU, memory, GPU, and so forth). The nodes can declare their resource capacities and then pods can specify what resources they require. Fit is based on requested, rather than used resources.

{"name" : "PodFitsResources"}

Essential general predicates

The PodFitsHostPorts predicate determines if a node has free ports for the requested pod ports (absence of port conflicts).

{"name" : "PodFitsHostPorts"}

The HostName predicate determines fit based on the presence of the Host parameter and a string match with the name of the host.

{"name" : "HostName"}

The MatchNodeSelector predicate determines fit based on node selector (nodeSelector) queries defined in the pod.

{"name" : "MatchNodeSelector"}
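The MatchNodeSelector predicate acts on the nodeSelector field of the pod being scheduled. A minimal sketch of a pod that only fits nodes carrying a region=east label (the pod name and label values are assumptions for illustration):

```yaml
# Hypothetical pod; only nodes labeled region=east satisfy MatchNodeSelector.
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-example
spec:
  nodeSelector:
    region: east
  containers:
  - name: hello
    image: docker.io/ocpqe/hello-pod
```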
3.2.3.2. Understanding the scheduler priorities
Priorities are rules that rank nodes according to preferences.
A custom set of priorities can be specified to configure the scheduler. There are several priorities provided by default in OpenShift Container Platform. Other priorities can be customized by providing certain parameters. Multiple priorities can be combined and different weights can be given to each to impact the prioritization.
3.2.3.2.1. Static Priorities
Static priorities do not take any configuration parameters from the user, except weight. A weight is required to be specified and cannot be 0 or negative.
These are specified in the scheduler policy config map in the openshift-config namespace.
3.2.3.2.1.1. Default Priorities
The default scheduler policy includes the following priorities. Each of the priority functions has a weight of 1 except NodePreferAvoidPodsPriority, which has a weight of 10000.

The NodeAffinityPriority priority prioritizes nodes according to node affinity scheduling preferences.

{"name" : "NodeAffinityPriority", "weight" : 1}

The TaintTolerationPriority priority prioritizes nodes that have a fewer number of intolerable taints on them for a pod. An intolerable taint is one which has key PreferNoSchedule.

{"name" : "TaintTolerationPriority", "weight" : 1}

The ImageLocalityPriority priority prioritizes nodes that already have requested pod container's images.

{"name" : "ImageLocalityPriority", "weight" : 1}

The SelectorSpreadPriority priority looks for services, replication controllers (RC), replication sets (RS), and stateful sets that match the pod, then finds existing pods that match those selectors. The scheduler favors nodes that have fewer existing matching pods. Then, it schedules the pod on a node with the smallest number of pods that match those selectors as the pod being scheduled.

{"name" : "SelectorSpreadPriority", "weight" : 1}

The InterPodAffinityPriority priority computes a sum by iterating through the elements of weightedPodAffinityTerm and adding to the sum if the corresponding PodAffinityTerm is satisfied for that node. The node(s) with the highest sum are the most preferred.

{"name" : "InterPodAffinityPriority", "weight" : 1}

The LeastRequestedPriority priority favors nodes with fewer requested resources. It calculates the percentage of memory and CPU requested by pods scheduled on the node, and prioritizes nodes that have the highest available/remaining capacity.

{"name" : "LeastRequestedPriority", "weight" : 1}

The BalancedResourceAllocation priority favors nodes with balanced resource usage rate. It calculates the difference between the consumed CPU and memory as a fraction of capacity, and prioritizes the nodes based on how close the two metrics are to each other. This should always be used together with LeastRequestedPriority.

{"name" : "BalancedResourceAllocation", "weight" : 1}

The NodePreferAvoidPodsPriority priority ignores pods that are owned by a controller other than a replication controller.

{"name" : "NodePreferAvoidPodsPriority", "weight" : 10000}
3.2.3.2.1.2. Other Static Priorities
OpenShift Container Platform also supports the following priorities:
The EqualPriority priority gives an equal weight of 1 to all nodes, if no priority configurations are provided. We recommend using this priority only for testing environments.

{"name" : "EqualPriority", "weight" : 1}

The MostRequestedPriority priority prioritizes nodes with most requested resources. It calculates the percentage of memory and CPU requested by pods scheduled on the node, and prioritizes based on the maximum of the average of the fraction of requested to capacity.

{"name" : "MostRequestedPriority", "weight" : 1}

The ServiceSpreadingPriority priority spreads pods by minimizing the number of pods belonging to the same service onto the same machine.

{"name" : "ServiceSpreadingPriority", "weight" : 1}
3.2.3.2.2. Configurable Priorities
You can configure these priorities in the scheduler policy config map, in the openshift-config namespace, to add labels that affect how the priorities work.
The type of the priority function is identified by the argument that they take. Since these are configurable, multiple priorities of the same type (but different configuration parameters) can be combined as long as their user-defined names are different.
For information on using these priorities, see Modifying Scheduler Policy.
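Because configurable priorities are distinguished by their user-defined names, the same priority type can appear more than once with different parameters. A sketch that spreads service pods across regions more strongly than across zones (the names RegionSpread and ZoneSpread are assumptions; the labels reuse the earlier region/zone example):

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "priorities": [
    {
      "name": "RegionSpread",
      "weight": 2,
      "argument": {
        "serviceAntiAffinity": {"label": "region"}
      }
    },
    {
      "name": "ZoneSpread",
      "weight": 1,
      "argument": {
        "serviceAntiAffinity": {"label": "zone"}
      }
    }
  ]
}
```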
The ServiceAntiAffinity priority takes a label and ensures a good spread of the pods belonging to the same service across the group of nodes based on the label values. It gives the same score to all nodes that have the same value for the specified label, and a higher score to nodes within a group with the least concentration of pods.
{
"kind": "Policy",
"apiVersion": "v1",
"priorities":[
{
"name":"<name>",
"weight" : 1,
"argument":{
"serviceAntiAffinity":{
"label": "<label>"
}
}
}
]
}
For example:
{
"kind": "Policy",
"apiVersion": "v1",
"priorities": [
{
"name":"RackSpread",
"weight" : 1,
"argument": {
"serviceAntiAffinity": {
"label": "rack"
}
}
}
]
}
In some situations, using the ServiceAntiAffinity parameter does not work as expected.
The labelPreference priority prioritizes nodes based on the specified label. If the presence parameter is true, nodes that have the label are preferred; if false, nodes that do not have the label are preferred.
{
"kind": "Policy",
"apiVersion": "v1",
"priorities":[
{
"name":"<name>",
"weight" : 1,
"argument":{
"labelPreference":{
"label": "<label>",
"presence": true
}
}
}
]
}
3.2.4. Sample Policy Configurations
The configuration below specifies the default scheduler configuration, if it were to be specified using the scheduler policy file.
{
    "kind": "Policy",
    "apiVersion": "v1",
    "predicates": [
        {"name" : "MaxGCEPDVolumeCount"},
        {"name" : "GeneralPredicates"},
        {"name" : "MaxAzureDiskVolumeCount"},
        {"name" : "MaxCSIVolumeCountPred"},
        {"name" : "CheckVolumeBinding"},
        {"name" : "MaxEBSVolumeCount"},
        {"name" : "MatchInterPodAffinity"},
        {"name" : "CheckNodeUnschedulable"},
        {"name" : "NoDiskConflict"},
        {"name" : "NoVolumeZoneConflict"},
        {"name" : "PodToleratesNodeTaints"}
    ],
    "priorities": [
        {"name" : "LeastRequestedPriority", "weight" : 1},
        {"name" : "BalancedResourceAllocation", "weight" : 1},
        {"name" : "ServiceSpreadingPriority", "weight" : 1},
        {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
        {"name" : "NodeAffinityPriority", "weight" : 1},
        {"name" : "TaintTolerationPriority", "weight" : 1},
        {"name" : "ImageLocalityPriority", "weight" : 1},
        {"name" : "SelectorSpreadPriority", "weight" : 1},
        {"name" : "InterPodAffinityPriority", "weight" : 1},
        {"name" : "EqualPriority", "weight" : 1}
    ]
}
In all of the sample configurations below, the list of predicates and priority functions is truncated to include only the ones that pertain to the use case specified. In practice, a complete/meaningful scheduler policy should include most, if not all, of the default predicates and priorities listed above.
The following example defines three topological levels: region and zone (affinity) and rack (anti-affinity):
{
"kind": "Policy",
"apiVersion": "v1",
"predicates": [
{
"name": "RegionZoneAffinity",
"argument": {
"serviceAffinity": {
"labels": ["region", "zone"]
}
}
}
],
"priorities": [
{
"name":"RackSpread",
"weight" : 1,
"argument": {
"serviceAntiAffinity": {
"label": "rack"
}
}
}
]
}
The following example defines three topological levels: city (affinity), building (anti-affinity), and room (anti-affinity):
{
"kind": "Policy",
"apiVersion": "v1",
"predicates": [
{
"name": "CityAffinity",
"argument": {
"serviceAffinity": {
"labels": ["city"]
}
}
}
],
"priorities": [
{
"name":"BuildingSpread",
"weight" : 1,
"argument": {
"serviceAntiAffinity": {
"label": "building"
}
}
},
{
"name":"RoomSpread",
"weight" : 1,
"argument": {
"serviceAntiAffinity": {
"label": "room"
}
}
}
]
}
The following example defines a policy to only use nodes with the 'region' label defined and prefer nodes with the 'zone' label defined:
{
"kind": "Policy",
"apiVersion": "v1",
"predicates": [
{
"name": "RequireRegion",
"argument": {
"labelsPresence": {
"labels": ["region"],
"presence": true
}
}
}
],
"priorities": [
{
"name":"ZonePreferred",
"weight" : 1,
"argument": {
"labelPreference": {
"label": "zone",
"presence": true
}
}
}
]
}
The following example combines both static and configurable predicates and also priorities:
{
"kind": "Policy",
"apiVersion": "v1",
"predicates": [
{
"name": "RegionAffinity",
"argument": {
"serviceAffinity": {
"labels": ["region"]
}
}
},
{
"name": "RequireRegion",
"argument": {
"labelsPresence": {
"labels": ["region"],
"presence": true
}
}
},
{
"name": "BuildingNodesAvoid",
"argument": {
"labelsPresence": {
"labels": ["building"],
"presence": false
}
}
},
{"name" : "PodFitsPorts"},
{"name" : "MatchNodeSelector"}
],
"priorities": [
{
"name": "ZoneSpread",
"weight" : 2,
"argument": {
"serviceAntiAffinity":{
"label": "zone"
}
}
},
{
"name":"ZonePreferred",
"weight" : 1,
"argument": {
"labelPreference":{
"label": "zone",
"presence": true
}
}
},
{"name" : "ServiceSpreadingPriority", "weight" : 1}
]
}
3.3. Scheduling pods using a scheduler profile
You can configure OpenShift Container Platform to use a scheduling profile to schedule pods onto nodes within the cluster.
Enabling a scheduler profile is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
3.3.1. About scheduler profiles
You can specify a scheduler profile to control how pods are scheduled onto nodes.
Scheduler profiles are an alternative to configuring a scheduler policy. Do not set both a scheduler policy and a scheduler profile. If both are set, the scheduler policy takes precedence.
The following scheduler profiles are available:
- LowNodeUtilization: This profile attempts to spread pods evenly across nodes to get low resource usage per node. This profile provides the default scheduler behavior.
- HighNodeUtilization: This profile attempts to place as many pods as possible onto as few nodes as possible. This minimizes node count and has high resource usage per node.
- NoScoring: This is a low-latency profile that strives for the quickest scheduling cycle by disabling all score plugins. This might sacrifice better scheduling decisions for faster ones.
3.3.2. Configuring a scheduler profile
You can configure the scheduler to use a scheduler profile.
Do not set both a scheduler policy and a scheduler profile. If both are set, the scheduler policy takes precedence.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
Procedure
Edit the Scheduler object:

$ oc edit scheduler cluster

Specify the profile to use in the spec.profile field:

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  ...
  name: cluster
  resourceVersion: "601"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: b351d6d0-d06f-4a99-a26b-87af62e79f59
spec:
  mastersSchedulable: false
  policy:
    name: ""
  profile: HighNodeUtilization 1

1 Set to LowNodeUtilization, HighNodeUtilization, or NoScoring.
- Save the file to apply the changes.
3.4. Placing pods relative to other pods using affinity and anti-affinity rules
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.
In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key/value labels on other pods.
3.4.1. Understanding pod affinity
Pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key/value labels on other pods.
- Pod affinity can tell the scheduler to locate a new pod on the same node as other pods if the label selector on the new pod matches the label on the current pod.
- Pod anti-affinity can prevent the scheduler from locating a new pod on the same node as pods with the same labels if the label selector on the new pod matches the label on the current pod.
For example, using affinity rules, you could spread or pack pods within a service or relative to pods in other services. Anti-affinity rules allow you to prevent pods of a particular service from scheduling on the same nodes as pods of another service that are known to interfere with the performance of the pods of the first service. Or, you could spread the pods of a service across nodes or availability zones to reduce correlated failures.
There are two types of pod affinity rules: required and preferred.
Required rules must be met before a pod can be scheduled on a node. Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.
Depending on your pod priority and preemption settings, the scheduler might not be able to find an appropriate node for a pod without violating affinity requirements. If so, a pod might not be scheduled.
To prevent this situation, carefully configure pod affinity with equal-priority pods.
You configure pod affinity/anti-affinity through the Pod spec files.

The following example shows a Pod spec configured for pod affinity and pod anti-affinity.

In this example, the pod affinity rule indicates that the pod can schedule onto a node only if that node has at least one already-running pod with a label that has the key security and value S1. The pod anti-affinity rule says that the pod prefers to not schedule onto a node if that node is already running a pod with a label that has the key security and value S2.
Sample Pod config file with pod affinity
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity: 1
      requiredDuringSchedulingIgnoredDuringExecution: 2
      - labelSelector:
          matchExpressions:
          - key: security 3
            operator: In 4
            values:
            - S1 5
        topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: docker.io/ocpqe/hello-pod

1 Stanza to configure pod affinity.
2 Defines a required rule.
3 5 The key and value (label) that must be matched to apply the rule.
4 The operator represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.
Sample Pod config file with pod anti-affinity
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-antiaffinity
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: docker.io/ocpqe/hello-pod
- podAntiAffinity: the stanza to configure pod anti-affinity.
- preferredDuringSchedulingIgnoredDuringExecution: defines a preferred rule.
- weight: specifies a weight for a preferred rule. The node with the highest weight is preferred.
- key and values: the pod label that determines when the anti-affinity rule applies. Specify a key and value for the label.
- operator: represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.
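A preferred anti-affinity rule does not reject nodes; it influences scoring. The following sketch (illustrative only; names are invented, and the real scheduler combines many scoring plugins) shows how a weighted anti-affinity preference steers the pod away from nodes that host a matching pod:

```python
def score_nodes(nodes_with_pods, weight, key, values):
    """nodes_with_pods: {node_name: [pod_label_dict, ...]}.
    Nodes hosting a pod that matches the anti-affinity selector lose `weight`."""
    scores = {}
    for node, pod_label_dicts in nodes_with_pods.items():
        penalty = any(labels.get(key) in values for labels in pod_label_dicts)
        scores[node] = -weight if penalty else 0
    return scores

# node1 already runs a pod labeled security=S2; node2 runs none.
scores = score_nodes({"node1": [{"security": "S2"}], "node2": []},
                     weight=100, key="security", values=["S2"])
best = max(scores, key=scores.get)
print(scores)  # {'node1': -100, 'node2': 0}
print(best)    # node2: the preference steers the pod away from node1
```

If every node hosted a matching pod, all scores would carry the penalty and the pod would still schedule somewhere, which is the difference between a preferred rule and a required one.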
If labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node.
3.4.2. Configuring a pod affinity rule
The following steps demonstrate a simple two-pod configuration that creates a pod with a label and a pod that uses affinity to allow scheduling with that pod.
Procedure
1. Create a pod with a specific label in the Pod spec:

$ cat team4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-s1
  labels:
    security: S1
spec:
  containers:
  - name: security-s1
    image: docker.io/ocpqe/hello-pod

2. When creating other pods, edit the Pod spec as follows:
- Use the podAffinity stanza to configure the requiredDuringSchedulingIgnoredDuringExecution parameter or preferredDuringSchedulingIgnoredDuringExecution parameter.
- Specify the key and value that must be met. If you want the new pod to be scheduled with the other pod, use the same key and value parameters as the label on the first pod.

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: security
        operator: In
        values:
        - S1
    topologyKey: failure-domain.beta.kubernetes.io/zone

- Specify an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be present.
- Specify a topologyKey, which is a prepopulated Kubernetes label that the system uses to denote such a topology domain.

3. Create the pod:

$ oc create -f <pod-spec>.yaml
3.4.3. Configuring a pod anti-affinity rule
The following steps demonstrate a simple two-pod configuration that creates a pod with a label and a pod that uses an anti-affinity preferred rule to attempt to prevent scheduling with that pod.
Procedure
1. Create a pod with a specific label in the Pod spec:

$ cat team4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-s2
  labels:
    security: S2
spec:
  containers:
  - name: security-s2
    image: docker.io/ocpqe/hello-pod

2. When creating other pods, edit the Pod spec to set the following parameters:
- Use the podAntiAffinity stanza to configure the requiredDuringSchedulingIgnoredDuringExecution parameter or preferredDuringSchedulingIgnoredDuringExecution parameter.
- For a preferred rule, specify a weight, 1-100. The node with the highest weight is preferred.
- Specify the key and values that must be met. If you want the new pod to not be scheduled with the other pod, use the same key and value parameters as the label on the first pod.

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S2
      topologyKey: kubernetes.io/hostname

- Specify an operator. The operator can be In, NotIn, Exists, or DoesNotExist.
- Specify a topologyKey, which is a prepopulated Kubernetes label that the system uses to denote such a topology domain.

3. Create the pod:

$ oc create -f <pod-spec>.yaml
3.4.4. Sample pod affinity and anti-affinity rules
The following examples demonstrate pod affinity and pod anti-affinity.
3.4.4.1. Pod Affinity
The following example demonstrates pod affinity for pods with matching labels and label selectors.
The pod team4 has the label team:4.

$ cat team4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: team4
  labels:
    team: "4"
spec:
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod

The pod team4a has the label selector team:4 under podAffinity.

$ cat pod-team4a.yaml
apiVersion: v1
kind: Pod
metadata:
  name: team4a
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: team
            operator: In
            values:
            - "4"
        topologyKey: kubernetes.io/hostname
  containers:
  - name: pod-affinity
    image: docker.io/ocpqe/hello-pod

The team4a pod is scheduled on the same node as the team4 pod.
3.4.4.2. Pod Anti-affinity
The following example demonstrates pod anti-affinity for pods with matching labels and label selectors.
The pod pod-s1 has the label security:s1.

$ cat pod-s1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
  labels:
    security: s1
spec:
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod

The pod pod-s2 has the label selector security:s1 under podAntiAffinity.

$ cat pod-s2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-s2
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - s1
        topologyKey: kubernetes.io/hostname
  containers:
  - name: pod-antiaffinity
    image: docker.io/ocpqe/hello-pod

The pod pod-s2 cannot be scheduled on the same node as pod-s1.
3.4.4.3. Pod Affinity with no Matching Labels
The following example demonstrates pod affinity for pods without matching labels and label selectors.
The pod pod-s1 has the label security:s1.

$ cat pod-s1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
  labels:
    security: s1
spec:
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod

The pod pod-s2 has the label selector security:s2.

$ cat pod-s2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-s2
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - s2
        topologyKey: kubernetes.io/hostname
  containers:
  - name: pod-affinity
    image: docker.io/ocpqe/hello-pod

The pod pod-s2 is not scheduled unless there is a node with a pod that has the security:s2 label. If there is no other pod with that label, the new pod remains in a pending state:

Example output
NAME READY STATUS RESTARTS AGE IP NODE pod-s2 0/1 Pending 0 32s <none>
3.5. Controlling pod placement on nodes using node affinity rules
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled.
In OpenShift Container Platform, node affinity is a set of rules used by the scheduler to determine where a pod can be placed. The rules are defined using custom labels on the nodes and label selectors specified in pods.
3.5.1. Understanding node affinity
Node affinity allows a pod to specify an affinity towards a group of nodes it can be placed on. The node does not have control over the placement.
For example, you could configure a pod to only run on a node with a specific CPU or in a specific availability zone.
There are two types of node affinity rules: required and preferred.
Required rules must be met before a pod can be scheduled on a node. Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.
If labels on a node change at runtime such that a node affinity rule on a pod is no longer met, the pod continues to run on the node.
You configure node affinity through the Pod spec.
The following example is a Pod spec with a required rule that requires the pod be placed on a node with a label whose key is e2e-az-NorthSouth and whose value is either e2e-az-North or e2e-az-South:
Example pod configuration file with a node affinity required rule
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: e2e-az-NorthSouth
            operator: In
            values:
            - e2e-az-North
            - e2e-az-South
  containers:
  - name: with-node-affinity
    image: docker.io/ocpqe/hello-pod
- nodeAffinity: the stanza to configure node affinity.
- requiredDuringSchedulingIgnoredDuringExecution: defines a required rule.
- key and values: the key/value pair (label) that must be matched to apply the rule.
- operator: represents the relationship between the label on the node and the set of values in the matchExpression parameters in the Pod spec. This value can be In, NotIn, Exists, DoesNotExist, Lt, or Gt.
The following example is a Pod spec with a preferred rule that prefers a node with a label whose key is e2e-az-EastWest and whose value is either e2e-az-East or e2e-az-West:
Example pod configuration file with a node affinity preferred rule
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: e2e-az-EastWest
            operator: In
            values:
            - e2e-az-East
            - e2e-az-West
  containers:
  - name: with-node-affinity
    image: docker.io/ocpqe/hello-pod
- nodeAffinity: the stanza to configure node affinity.
- preferredDuringSchedulingIgnoredDuringExecution: defines a preferred rule.
- weight: specifies a weight for a preferred rule. The node with the highest weight is preferred.
- key and values: the key/value pair (label) that must be matched to apply the rule.
- operator: represents the relationship between the label on the node and the set of values in the matchExpression parameters in the Pod spec. This value can be In, NotIn, Exists, DoesNotExist, Lt, or Gt.
There is no explicit node anti-affinity concept, but using the NotIn or DoesNotExist operator replicates that behavior.
If you are using node affinity and node selectors in the same pod configuration, note the following:
- If you configure both nodeSelector and nodeAffinity, both conditions must be satisfied for the pod to be scheduled onto a candidate node.
- If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms is satisfied.
- If you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions are satisfied.
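The OR-across-terms, AND-within-a-term semantics described above can be sketched in a few lines. This is an illustrative model of the documented behavior, not API code; the function names are invented for the example:

```python
def expr_matches(expr, node_labels):
    """Evaluate one matchExpressions entry against a node's label dict."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return node_labels.get(key) in values
    if op == "NotIn":
        return node_labels.get(key) not in values
    if op == "Exists":
        return key in node_labels
    if op == "DoesNotExist":
        return key not in node_labels
    raise ValueError(f"unhandled operator: {op}")

def node_matches(node_selector_terms, node_labels):
    return any(                                   # OR across nodeSelectorTerms
        all(expr_matches(e, node_labels)          # AND within one term
            for e in term["matchExpressions"])
        for term in node_selector_terms)

terms = [
    {"matchExpressions": [                        # term 1: both must hold
        {"key": "zone", "operator": "In", "values": ["us"]},
        {"key": "disktype", "operator": "Exists"}]},
    {"matchExpressions": [                        # term 2: an alternative
        {"key": "zone", "operator": "In", "values": ["emea"]}]},
]
print(node_matches(terms, {"zone": "us", "disktype": "ssd"}))  # True: term 1
print(node_matches(terms, {"zone": "emea"}))                   # True: term 2
print(node_matches(terms, {"zone": "us"}))                     # False: term 1 needs both expressions
```

The Lt and Gt operators are omitted here for brevity; they compare the label value numerically.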
3.5.2. Configuring a required node affinity rule
Required rules must be met before a pod can be scheduled on a node.
Procedure
The following steps demonstrate a simple configuration that creates a node and a pod that the scheduler is required to place on the node.
1. Add a label to a node using the oc label node command:

$ oc label node node1 e2e-az-name=e2e-az1

Tip: You can alternatively apply the following YAML to add the label:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    e2e-az-name: e2e-az1

2. In the Pod spec, use the nodeAffinity stanza to configure the requiredDuringSchedulingIgnoredDuringExecution parameter:
- Specify the key and values that must be met. If you want the new pod to be scheduled on the node you edited, use the same key and value parameters as the label in the node.
- Specify an operator. The operator can be In, NotIn, Exists, DoesNotExist, Lt, or Gt. For example, use the operator In to require the label to be on the node:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2

3. Create the pod:

$ oc create -f e2e-az2.yaml
3.5.3. Configuring a preferred node affinity rule
Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.
Procedure
The following steps demonstrate a simple configuration that creates a node and a pod that the scheduler tries to place on the node.
1. Add a label to a node using the oc label node command:

$ oc label node node1 e2e-az-name=e2e-az3

2. In the Pod spec, use the nodeAffinity stanza to configure the preferredDuringSchedulingIgnoredDuringExecution parameter:
- Specify a weight for the node, as a number 1-100. The node with the highest weight is preferred.
- Specify the key and values that must be met. If you want the new pod to be scheduled on the node you edited, use the same key and value parameters as the label in the node:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: e2e-az-name
            operator: In
            values:
            - e2e-az3

- Specify an operator. The operator can be In, NotIn, Exists, DoesNotExist, Lt, or Gt. For example, use the operator In to require the label to be on the node.

3. Create the pod:

$ oc create -f e2e-az3.yaml
3.5.4. Sample node affinity rules
The following examples demonstrate node affinity.
3.5.4.1. Node affinity with matching labels
The following example demonstrates node affinity for a node and pod with matching labels:
The Node1 node has the label zone:us:

$ oc label node node1 zone=us

Tip: You can alternatively apply the following YAML to add the label:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    zone: us

The pod-s1 pod has the zone and us key/value pair under a required node affinity rule:

$ cat pod-s1.yaml

Example output

apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
spec:
  containers:
  - image: "docker.io/ocpqe/hello-pod"
    name: hello-pod
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "zone"
            operator: In
            values:
            - us

The pod-s1 pod can be scheduled on Node1:

$ oc get pod -o wide

Example output

NAME     READY     STATUS    RESTARTS   AGE       IP        NODE
pod-s1   1/1       Running   0          4m        IP1       node1
3.5.4.2. Node affinity with no matching labels
The following example demonstrates node affinity for a node and pod without matching labels:
The Node1 node has the label zone:emea:

$ oc label node node1 zone=emea

Tip: You can alternatively apply the following YAML to add the label:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    zone: emea

The pod-s1 pod has the zone and us key/value pair under a required node affinity rule:

$ cat pod-s1.yaml

Example output

apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
spec:
  containers:
  - image: "docker.io/ocpqe/hello-pod"
    name: hello-pod
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "zone"
            operator: In
            values:
            - us

The pod-s1 pod cannot be scheduled on Node1:

$ oc describe pod pod-s1

Example output

...
Events:
 FirstSeen LastSeen Count From              SubObjectPath  Type     Reason
 --------- -------- ----- ----              -------------  ----     ------
 1m        33s      8     default-scheduler                Warning  FailedScheduling No nodes are available that match all of the following predicates:: MatchNodeSelector (1).
3.6. Placing pods onto overcommitted nodes
In an overcommitted state, the sum of the container compute resource requests and limits exceeds the resources available on the system. Overcommitment might be desirable in development environments where a trade-off of guaranteed performance for capacity is acceptable.
Requests and limits enable administrators to allow and manage the overcommitment of resources on a node. The scheduler uses requests for scheduling your container and providing a minimum service guarantee. Limits constrain the amount of compute resource that may be consumed on your node.
3.6.1. Understanding overcommitment
Requests and limits enable administrators to allow and manage the overcommitment of resources on a node. The scheduler uses requests for scheduling your container and providing a minimum service guarantee. Limits constrain the amount of compute resource that may be consumed on your node.
OpenShift Container Platform administrators can control the level of overcommit and manage container density on nodes by configuring masters to override the ratio between request and limit set on developer containers. In conjunction with a per-project LimitRange object that specifies limits and defaults, this adjusts the container limit and request to achieve the desired level of overcommit.

Note that these overrides have no effect if no limits have been set on containers. Create a LimitRange object with default limits, per individual project or in the project template, to ensure that the overrides apply.

After these overrides, the container limits and requests must still be validated by any LimitRange object in the project. For example, if a developer specifies a limit close to the minimum limit, and the request is then overridden below the minimum limit, the pod is forbidden.
3.6.2. Understanding node overcommitment
In an overcommitted environment, it is important to properly configure your node to provide best system behavior.
When the node starts, it ensures that the kernel tunable flags for memory management are set properly. The kernel should never fail memory allocations unless it runs out of physical memory.
To ensure this behavior, OpenShift Container Platform configures the kernel to always overcommit memory by setting the vm.overcommit_memory parameter to 1, overriding the default operating system setting.

OpenShift Container Platform also configures the kernel not to panic when it runs out of memory by setting the vm.panic_on_oom parameter to 0. A setting of 0 instructs the kernel to call oom_killer in an Out of Memory (OOM) condition, which kills processes based on priority.
You can view the current setting by running the following commands on your nodes:
$ sysctl -a |grep commit
Example output
vm.overcommit_memory = 1
$ sysctl -a |grep panic
Example output
vm.panic_on_oom = 0
The above flags should already be set on nodes, and no further action is required.
You can also perform the following configurations for each node:
- Disable or enforce CPU limits using CPU CFS quotas
- Reserve resources for system processes
- Reserve memory across quality of service tiers
3.7. Controlling pod placement using node taints
Taints and tolerations allow nodes to control which pods should (or should not) be scheduled on them.
3.7.1. Understanding taints and tolerations
A taint allows a node to refuse a pod to be scheduled unless that pod has a matching toleration.
You apply taints to a node through the Node specification (NodeSpec) and apply tolerations to a pod through the Pod specification (PodSpec).
Example taint in a node specification
spec:
  taints:
  - effect: NoExecute
    key: key1
    value: value1
....
Example toleration in a Pod spec
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600
....
Taints and tolerations consist of a key, value, and effect.
- key: Any string, up to 253 characters. The key must begin with a letter or number, and can contain letters, numbers, hyphens, dots, and underscores.
- value: Any string, up to 63 characters. The value must begin with a letter or number, and can contain letters, numbers, hyphens, dots, and underscores.
- effect: One of the following:
  - NoSchedule: New pods that do not match the taint are not scheduled onto that node. Existing pods on the node remain.
  - PreferNoSchedule: New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to. Existing pods on the node remain.
  - NoExecute: New pods that do not match the taint cannot be scheduled onto that node. Existing pods on the node that do not have a matching toleration are removed.
- operator: One of the following:
  - Equal: The key/value/effect parameters must match. This is the default.
  - Exists: The key/effect parameters must match. You must leave a blank value parameter, which matches any.
Note: If you add a NoSchedule taint to a control plane node (also known as the master node), the node must have the node-role.kubernetes.io/master=:NoSchedule taint, which is added by default.

For example:

apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/ci-ln-62s7gtb-f76d1-v8jxv-master-0
    machineconfiguration.openshift.io/currentConfig: rendered-master-cdc1ab7da414629332cc4c3926e6e59c
...
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
...
A toleration matches a taint:

- If the operator parameter is set to Equal:
  - the key parameters are the same;
  - the value parameters are the same;
  - the effect parameters are the same.
- If the operator parameter is set to Exists:
  - the key parameters are the same;
  - the effect parameters are the same.
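These matching rules can be sketched as a small predicate. This is an illustrative model, not product code, and it deliberately omits some real-world wildcards (for example, in Kubernetes an empty key with Exists tolerates everything):

```python
def toleration_matches(toleration, taint):
    """Model of the Equal/Exists matching rules described above."""
    if toleration.get("operator", "Equal") == "Exists":
        # Exists ignores the value: only key and effect must match.
        return (toleration["key"] == taint["key"]
                and toleration["effect"] == taint["effect"])
    # Equal (the default): key, value, and effect must all match.
    return (toleration["key"] == taint["key"]
            and toleration.get("value") == taint["value"]
            and toleration["effect"] == taint["effect"])

taint = {"key": "key1", "value": "value1", "effect": "NoExecute"}
print(toleration_matches(
    {"key": "key1", "operator": "Equal", "value": "value1",
     "effect": "NoExecute"}, taint))                                   # True
print(toleration_matches(
    {"key": "key1", "operator": "Exists", "effect": "NoExecute"}, taint))  # True
print(toleration_matches(
    {"key": "key1", "operator": "Equal", "value": "other",
     "effect": "NoExecute"}, taint))                                   # False
```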
The following taints are built into OpenShift Container Platform:
- node.kubernetes.io/not-ready: The node is not ready. This corresponds to the node condition Ready=False.
- node.kubernetes.io/unreachable: The node is unreachable from the node controller. This corresponds to the node condition Ready=Unknown.
- node.kubernetes.io/memory-pressure: The node has memory pressure issues. This corresponds to the node condition MemoryPressure=True.
- node.kubernetes.io/disk-pressure: The node has disk pressure issues. This corresponds to the node condition DiskPressure=True.
- node.kubernetes.io/network-unavailable: The node network is unavailable.
- node.kubernetes.io/unschedulable: The node is unschedulable.
- node.cloudprovider.kubernetes.io/uninitialized: When the node controller is started with an external cloud provider, this taint is set on a node to mark it as unusable. After a controller from the cloud-controller-manager initializes this node, the kubelet removes this taint.
- node.kubernetes.io/pid-pressure: The node has pid pressure. This corresponds to the node condition PIDPressure=True.

Important: OpenShift Container Platform does not set a default pid.available evictionHard.
3.7.1.1. Understanding how to use toleration seconds to delay pod evictions
You can specify how long a pod can remain bound to a node before being evicted by specifying the tolerationSeconds parameter in the Pod specification or MachineSet object. If a taint with the NoExecute effect is added to a node, a pod that does tolerate the taint, and which has the tolerationSeconds parameter, remains bound to the node for that time period.

For example:
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600
Here, if this pod is running but does not have a matching toleration, the pod stays bound to the node for 3,600 seconds and is then evicted. If the taint is removed before that time, the pod is not evicted.
3.7.1.2. Understanding how to use multiple taints
You can put multiple taints on the same node and multiple tolerations on the same pod. OpenShift Container Platform processes multiple taints and tolerations as follows:
- Process the taints for which the pod has a matching toleration.
The remaining unmatched taints have the indicated effects on the pod:
- If there is at least one unmatched taint with effect NoSchedule, OpenShift Container Platform cannot schedule a pod onto that node.
- If there is no unmatched taint with effect NoSchedule but there is at least one unmatched taint with effect PreferNoSchedule, OpenShift Container Platform tries to not schedule the pod onto the node.
- If there is at least one unmatched taint with effect NoExecute, OpenShift Container Platform evicts the pod from the node if it is already running on the node, or the pod is not scheduled onto the node if it is not yet running on the node.
  - Pods that do not tolerate the taint are evicted immediately.
  - Pods that tolerate the taint without specifying tolerationSeconds in their Pod specification remain bound forever.
  - Pods that tolerate the taint with a specified tolerationSeconds remain bound for the specified amount of time.
For example:
Add the following taints to the node:

$ oc adm taint nodes node1 key1=value1:NoSchedule

$ oc adm taint nodes node1 key1=value1:NoExecute

$ oc adm taint nodes node1 key2=value2:NoSchedule

The pod has the following tolerations:

spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
In this case, the pod cannot be scheduled onto the node, because there is no toleration matching the third taint. The pod continues running if it is already running on the node when the taint is added, because the third taint is the only one of the three that is not tolerated by the pod.
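The evaluation steps above can be sketched as follows. This is an illustrative model (invented function names, simplified matching), not scheduler source: tolerated taints are filtered out first, then the outcome follows from the strongest remaining effect:

```python
def toleration_matches(tol, taint):
    """Simplified Equal/Exists matching, as described earlier in this section."""
    if tol.get("operator", "Equal") == "Exists":
        return tol["key"] == taint["key"] and tol["effect"] == taint["effect"]
    return (tol["key"] == taint["key"] and tol.get("value") == taint["value"]
            and tol["effect"] == taint["effect"])

def placement(taints, tolerations, already_running):
    # Step 1: ignore the taints for which the pod has a matching toleration.
    unmatched = [t for t in taints
                 if not any(toleration_matches(tol, t) for tol in tolerations)]
    effects = {t["effect"] for t in unmatched}
    # Step 2: the strongest remaining effect decides the outcome.
    if "NoExecute" in effects:
        return "evicted" if already_running else "not scheduled"
    if "NoSchedule" in effects:
        return "running" if already_running else "not scheduled"
    if "PreferNoSchedule" in effects:
        return "scheduled if no better node"
    return "scheduled"

# The three taints and two tolerations from the example above.
taints = [
    {"key": "key1", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "value": "value1", "effect": "NoExecute"},
    {"key": "key2", "value": "value2", "effect": "NoSchedule"},
]
tolerations = [
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoExecute"},
]
print(placement(taints, tolerations, already_running=False))  # not scheduled (key2 taint unmatched)
print(placement(taints, tolerations, already_running=True))   # running (unmatched taint is only NoSchedule)
```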
3.7.1.3. Understanding pod scheduling and node conditions (taint node by condition)
The Taint Nodes By Condition feature, which is enabled by default, automatically taints nodes that report conditions such as memory pressure and disk pressure. If a node reports a condition, a taint is added until the condition clears. The taints have the NoSchedule effect, which means no pod can be scheduled on the node unless the pod has a matching toleration.
The scheduler checks for these taints on nodes before scheduling pods. If the taint is present, the pod is scheduled on a different node. Because the scheduler checks for taints and not the actual node conditions, you configure the scheduler to ignore some of these node conditions by adding appropriate pod tolerations.
To ensure backward compatibility, the daemon set controller automatically adds the following tolerations to all daemons:
- node.kubernetes.io/memory-pressure
- node.kubernetes.io/disk-pressure
- node.kubernetes.io/unschedulable (1.10 or later)
- node.kubernetes.io/network-unavailable (host network only)
You can also add arbitrary tolerations to daemon sets.
The control plane also adds the node.kubernetes.io/memory-pressure toleration on pods that have a QoS class. This is because Kubernetes manages pods in the Guaranteed or Burstable QoS classes. The new BestEffort pods do not get scheduled onto the affected node.
3.7.1.4. Understanding evicting pods by condition (taint-based evictions)
The Taint-Based Evictions feature, which is enabled by default, evicts pods from a node that experiences specific conditions, such as not-ready and unreachable. When a node experiences one of these conditions, OpenShift Container Platform automatically adds taints to the node and starts evicting and rescheduling the pods on different nodes.

Taint-Based Evictions have a NoExecute effect, where any pod that does not tolerate the taint is evicted immediately, and any pod that does tolerate the taint will never be evicted unless the pod uses the tolerationSeconds parameter.

The tolerationSeconds parameter allows you to specify how long a pod stays bound to a node that has a node condition. If the condition still exists after the tolerationSeconds period, the taint remains on the node and the pods with a matching toleration are evicted. If the condition clears before the tolerationSeconds period, pods with matching tolerations are not removed.

If you use the tolerationSeconds parameter with no value, pods are never evicted because of the not-ready and unreachable node conditions.
OpenShift Container Platform evicts pods in a rate-limited way to prevent massive pod evictions in scenarios such as the master becoming partitioned from the nodes.
By default, if more than 55% of nodes in a given zone are unhealthy, the node lifecycle controller changes that zone's state to PartialDisruption and the rate of pod evictions is reduced.
For more information, see Rate limits on eviction in the Kubernetes documentation.
OpenShift Container Platform automatically adds a toleration for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds=300, unless the Pod configuration specifies either toleration.
spec:
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
These tolerations ensure that the default pod behavior is to remain bound for five minutes after one of these node conditions is detected.
You can configure these tolerations as needed. For example, if you have an application with a lot of local state, you might want to keep the pods bound to the node for a longer time in the event of a network partition, allowing the partition to recover and avoiding pod eviction.
Pods spawned by a daemon set are created with NoExecute tolerations for the following taints with no tolerationSeconds:

- node.kubernetes.io/unreachable
- node.kubernetes.io/not-ready
As a result, daemon set pods are never evicted because of these node conditions.
3.7.1.5. Tolerating all taints
You can configure a pod to tolerate all taints by adding an operator: "Exists" toleration with no key and value parameters.
Pod spec for tolerating all taints
spec:
  tolerations:
  - operator: "Exists"
3.7.2. Adding taints and tolerations
You add tolerations to pods and taints to nodes to allow the node to control which pods should or should not be scheduled on them. For existing pods and nodes, you should add the toleration to the pod first, then add the taint to the node to avoid pods being removed from the node before you can add the toleration.
Procedure
1. Add a toleration to a pod by editing the Pod spec to include a tolerations stanza:

Sample pod configuration file with an Equal operator

spec:
  tolerations:
  - key: "key1"
    value: "value1"
    operator: "Equal"
    effect: "NoExecute"
    tolerationSeconds: 3600

The tolerationSeconds parameter specifies how long a pod stays bound to a node that has the taint.

Sample pod configuration file with an Exists operator

spec:
  tolerations:
  - key: "key1"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 3600

The Exists operator does not take a value.

2. Add a taint to a node by using the following command with the parameters described in the Taint and toleration components table:

$ oc adm taint nodes <node_name> <key>=<value>:<effect>

For example:

$ oc adm taint nodes node1 key1=value1:NoExecute

This command places a taint on node1 that has key key1, value value1, and effect NoExecute.

Note: If you add a NoSchedule taint to a control plane node (also known as the master node), the node must have the node-role.kubernetes.io/master=:NoSchedule taint, which is added by default.

For example:

apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/ci-ln-62s7gtb-f76d1-v8jxv-master-0
    machineconfiguration.openshift.io/currentConfig: rendered-master-cdc1ab7da414629332cc4c3926e6e59c
...
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
...

The tolerations on the pod match the taint on the node. A pod with either toleration can be scheduled onto node1.
3.7.2.1. Adding taints and tolerations using a machine set
You can add taints to nodes using a machine set. All nodes associated with the MachineSet object are updated with the taint. Tolerations respond to taints added by a machine set in the same manner as taints added directly to the nodes.
Procedure
1. Add a toleration to a pod by editing the Pod spec to include a tolerations stanza:

Sample pod configuration file with Equal operator

spec:
  tolerations:
  - key: "key1"
    value: "value1"
    operator: "Equal"
    effect: "NoExecute"
    tolerationSeconds: 3600

Sample pod configuration file with Exists operator

spec:
  tolerations:
  - key: "key1"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 3600

2. Add the taint to the MachineSet object:

- Edit the MachineSet YAML for the nodes you want to taint, or create a new MachineSet object:

$ oc edit machineset <machineset>

- Add the taint to the spec.template.spec section:

Example taint in a machine set specification

spec:
....
  template:
....
    spec:
      taints:
      - effect: NoExecute
        key: key1
        value: value1
....

This example places a taint that has the key key1, value value1, and taint effect NoExecute on the nodes.

- Scale down the machine set to 0:

$ oc scale --replicas=0 machineset <machineset> -n openshift-machine-api

Tip: You can alternatively apply the following YAML to scale the machine set:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  replicas: 0

Wait for the machines to be removed.

- Scale up the machine set as needed:

$ oc scale --replicas=2 machineset <machineset> -n openshift-machine-api

Or:

$ oc edit machineset <machineset> -n openshift-machine-api

Wait for the machines to start. The taint is added to the nodes associated with the MachineSet object.
3.7.2.2. Binding a user to a node using taints and tolerations
If you want to dedicate a set of nodes for exclusive use by a particular set of users, add a toleration to their pods. Then, add a corresponding taint to those nodes. The pods with the tolerations are allowed to use the tainted nodes or any other nodes in the cluster.
If you want to ensure the pods are scheduled only to those tainted nodes, also add a label to the same set of nodes and add a node affinity to the pods so that the pods can only be scheduled onto nodes with that label.
Procedure
To configure a node so that users can use only that node:
1. Add a corresponding taint to those nodes:

For example:

$ oc adm taint nodes node1 dedicated=groupName:NoSchedule

Tip: You can alternatively apply the following YAML to add the taint:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    ...
spec:
  taints:
  - key: dedicated
    value: groupName
    effect: NoSchedule

2. Add a toleration to the pods by writing a custom admission controller.
3.7.2.3. Creating a project with a node selector and toleration
You can create a project that uses a node selector and toleration, which are set as annotations, to control the placement of pods onto specific nodes. Any subsequent resources created in the project are then scheduled on nodes that have a taint matching the toleration.
Prerequisites
- A label for node selection has been added to one or more nodes by using a machine set or editing the node directly.
- A taint has been added to one or more nodes by using a machine set or editing the node directly.
Procedure
1. Create a Project resource definition, specifying a node selector and toleration in the metadata.annotations section:

Example project.yaml file

kind: Project
apiVersion: project.openshift.io/v1
metadata:
  name: <project_name>
  annotations:
    openshift.io/node-selector: '<label>'
    scheduler.alpha.kubernetes.io/defaultTolerations: >-
      [{"operator": "Exists", "effect": "NoSchedule", "key": "<key_name>"}
      ]

2. Use the oc apply command to create the project:

$ oc apply -f project.yaml
Any subsequent resources created in the
<project_name>
3.7.2.4. Controlling nodes with special hardware using taints and tolerations
In a cluster where a small subset of nodes have specialized hardware, you can use taints and tolerations to keep pods that do not need the specialized hardware off of those nodes, leaving the nodes for pods that do need the specialized hardware. You can also require pods that need specialized hardware to use specific nodes.
You can achieve this by adding a toleration to pods that need the special hardware and tainting the nodes that have the specialized hardware.
Procedure
To ensure nodes with specialized hardware are reserved for specific pods:
Add a toleration to pods that need the special hardware.
For example:
spec:
  tolerations:
  - key: "disktype"
    value: "ssd"
    operator: "Equal"
    effect: "NoSchedule"
    tolerationSeconds: 3600

Taint the nodes that have the specialized hardware using one of the following commands:

$ oc adm taint nodes <node-name> disktype=ssd:NoSchedule

Or:

$ oc adm taint nodes <node-name> disktype=ssd:PreferNoSchedule

Tip: You can alternatively apply the following YAML to add the taint:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    ...
spec:
  taints:
  - key: disktype
    value: ssd
    effect: PreferNoSchedule
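The taint and toleration matching that the steps above rely on can be sketched in a few lines. This is a simplified, illustrative Python model of the Equal and Exists operators, not OpenShift or Kubernetes source code:

```python
def toleration_matches_taint(toleration: dict, taint: dict) -> bool:
    """Simplified model of Kubernetes toleration/taint matching.

    A toleration with operator "Exists" matches any taint with the same
    key (an empty key with "Exists" matches every taint); a toleration
    with operator "Equal" also requires the values to match. An empty
    effect on the toleration matches all effects.
    """
    if toleration.get("operator", "Equal") == "Exists":
        key_ok = not toleration.get("key") or toleration.get("key") == taint["key"]
        value_ok = True
    else:  # operator "Equal"
        key_ok = toleration.get("key") == taint["key"]
        value_ok = toleration.get("value") == taint.get("value")
    effect_ok = not toleration.get("effect") or toleration["effect"] == taint["effect"]
    return key_ok and value_ok and effect_ok

# The disktype=ssd example from this section: the pod tolerates the taint.
taint = {"key": "disktype", "value": "ssd", "effect": "NoSchedule"}
toleration = {"key": "disktype", "operator": "Equal", "value": "ssd", "effect": "NoSchedule"}
print(toleration_matches_taint(toleration, taint))  # True
```

A pod without a matching toleration is repelled by a NoSchedule taint, which is why adding the toleration to the pods before tainting the nodes avoids disruption.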
3.7.3. Removing taints and tolerations
You can remove taints from nodes and tolerations from pods as needed. You should add the toleration to the pod first, then add the taint to the node to avoid pods being removed from the node before you can add the toleration.
Procedure
To remove taints and tolerations:
To remove a taint from a node:
$ oc adm taint nodes <node-name> <key>-

For example:

$ oc adm taint nodes ip-10-0-132-248.ec2.internal key1-

Example output

node/ip-10-0-132-248.ec2.internal untainted

To remove a toleration from a pod, edit the Pod spec to remove the toleration:

spec:
  tolerations:
  - key: "key2"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 3600
3.8. Placing pods on specific nodes using node selectors
A node selector specifies a map of key/value pairs that are defined using custom labels on nodes and selectors specified in pods.
For the pod to be eligible to run on a node, the pod must have the same key/value node selector as the label on the node.
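The eligibility rule stated above is a strict subset match: every key/value pair in the pod's node selector must appear in the node's labels. A minimal, illustrative Python sketch (not OpenShift code):

```python
def node_selector_matches(node_labels: dict, node_selector: dict) -> bool:
    """A pod's nodeSelector matches a node only if every key/value pair
    in the selector is present, with the same value, in the node labels.
    Extra labels on the node are ignored."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())

node_labels = {"region": "east", "type": "user-node", "kubernetes.io/os": "linux"}
print(node_selector_matches(node_labels, {"region": "east", "type": "user-node"}))  # True
print(node_selector_matches(node_labels, {"region": "west"}))                       # False
```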
3.8.1. About node selectors
You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OpenShift Container Platform schedules the pods on nodes that contain matching labels.
You can use a node selector to place specific pods on specific nodes, cluster-wide node selectors to place new pods on specific nodes anywhere in the cluster, and project node selectors to place new pods in a project on specific nodes.
For example, as a cluster administrator, you can create an infrastructure where application developers can deploy pods only onto the nodes closest to their geographical location by including a node selector in every pod they create. In this example, the cluster consists of five data centers spread across two regions. In the U.S., label the nodes as us-east, us-central, or us-west. In the Asia-Pacific region (APAC), label the nodes as apac-east or apac-west.
A pod is not scheduled if the Pod object contains a node selector, but no node has a matching label.
If you are using node selectors and node affinity in the same pod configuration, the following rules control pod placement onto nodes:
- If you configure both nodeSelector and nodeAffinity, both conditions must be satisfied for the pod to be scheduled onto a candidate node.
- If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms is satisfied.
- If you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions are satisfied.
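The rules above combine as AND between nodeSelector and nodeAffinity, OR across nodeSelectorTerms, and AND across matchExpressions within a term. The following illustrative Python sketch models that boolean structure (supporting only the In operator, and not the real scheduler implementation):

```python
def match_expression(labels: dict, expr: dict) -> bool:
    # Minimal model: support only the "In" operator for brevity.
    return expr["operator"] == "In" and labels.get(expr["key"]) in expr["values"]

def node_affinity_matches(labels: dict, node_selector_terms: list) -> bool:
    # nodeSelectorTerms are ORed; matchExpressions within a term are ANDed.
    return any(
        all(match_expression(labels, e) for e in term["matchExpressions"])
        for term in node_selector_terms
    )

def schedulable(labels: dict, node_selector: dict, node_selector_terms: list) -> bool:
    # nodeSelector and nodeAffinity must both be satisfied.
    selector_ok = all(labels.get(k) == v for k, v in node_selector.items())
    return selector_ok and node_affinity_matches(labels, node_selector_terms)

labels = {"region": "east", "type": "user-node"}
terms = [
    {"matchExpressions": [{"key": "region", "operator": "In", "values": ["east", "west"]}]},
    {"matchExpressions": [{"key": "zone", "operator": "In", "values": ["z1"]}]},
]
print(schedulable(labels, {"type": "user-node"}, terms))  # True: selector and first term match
```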
- Node selectors on specific pods and nodes
You can control which node a specific pod is scheduled on by using node selectors and labels.
To use node selectors and labels, first label the node to avoid pods being descheduled, then add the node selector to the pod.
Note: You cannot add a node selector directly to an existing scheduled pod. You must label the object that controls the pod, such as a deployment config.
For example, the following Node object has the region: east label:

Sample Node object with a label

kind: Node
apiVersion: v1
metadata:
  name: ip-10-0-131-14.ec2.internal
  selfLink: /api/v1/nodes/ip-10-0-131-14.ec2.internal
  uid: 7bc2580a-8b8e-11e9-8e01-021ab4174c74
  resourceVersion: '478704'
  creationTimestamp: '2019-06-10T14:46:08Z'
  labels:
    kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/zone: us-east-1a
    node.openshift.io/os_version: '4.5'
    node-role.kubernetes.io/worker: ''
    failure-domain.beta.kubernetes.io/region: us-east-1
    node.openshift.io/os_id: rhcos
    beta.kubernetes.io/instance-type: m4.large
    kubernetes.io/hostname: ip-10-0-131-14
    beta.kubernetes.io/arch: amd64
    region: east 1

1 Label to match the pod node selector.

A pod has the type: user-node,region: east node selector:

Sample Pod object with node selectors

apiVersion: v1
kind: Pod
....
spec:
  nodeSelector: 1
    region: east
    type: user-node

1 Node selectors to match the node label.
When you create the pod using the example pod spec, it can be scheduled on the example node.
- Default cluster-wide node selectors
With default cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.
For example, the following Scheduler object has the default cluster-wide region=east and type=user-node node selectors:

Example Scheduler Operator Custom Resource

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
...
spec:
  defaultNodeSelector: type=user-node,region=east
...

A node in that cluster has the type=user-node,region=east labels:

Example Node object

apiVersion: v1
kind: Node
metadata:
  name: ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4
...
  labels:
    region: east
    type: user-node
...

Example Pod object with a node selector

apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    region: east
...

When you create the pod using the example pod spec in the example cluster, the pod is created with the cluster-wide node selector and is scheduled on the labeled node:
Example pod list with the pod on the labeled node
NAME     READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
pod-s1   1/1     Running   0          20s   10.131.2.6   ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4   <none>           <none>

Note: If the project where you create the pod has a project node selector, that selector takes preference over a cluster-wide node selector. Your pod is not created or scheduled if the pod does not have the project node selector.
- Project node selectors
With project node selectors, when you create a pod in this project, OpenShift Container Platform adds the node selectors to the pod and schedules the pods on a node with matching labels. If there is a cluster-wide default node selector, a project node selector takes preference.
For example, the following project has the region=east node selector:

Example Namespace object

apiVersion: v1
kind: Namespace
metadata:
  name: east-region
  annotations:
    openshift.io/node-selector: "region=east"
...

The following node has the type=user-node,region=east labels:

Example Node object

apiVersion: v1
kind: Node
metadata:
  name: ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4
...
  labels:
    region: east
    type: user-node
...

When you create the pod using the example pod spec in this example project, the pod is created with the project node selectors and is scheduled on the labeled node:
Example Pod object

apiVersion: v1
kind: Pod
metadata:
  namespace: east-region
...
spec:
  nodeSelector:
    region: east
    type: user-node
...

Example pod list with the pod on the labeled node

NAME     READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
pod-s1   1/1     Running   0          20s   10.131.2.6   ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4   <none>           <none>

A pod in the project is not created or scheduled if the pod contains different node selectors. For example, if you deploy the following pod into the example project, it is not created:

Example Pod object with an invalid node selector

apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    region: west
...
3.8.2. Using node selectors to control pod placement
You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OpenShift Container Platform schedules the pods on nodes that contain matching labels.
You add labels to a node, a machine set, or a machine config. Adding the label to the machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
To add node selectors to an existing pod, add a node selector to the controlling object for that pod, such as a ReplicaSet, DaemonSet, StatefulSet, Deployment, or DeploymentConfig object. If you are creating a new pod, you can add the node selector directly to the Pod spec.

You cannot add a node selector directly to an existing scheduled pod.
Prerequisites
To add a node selector to existing pods, determine the controlling object for that pod. For example, the router-default-66d5cf9464-7pwkc pod is controlled by the router-default-66d5cf9464 replica set:
$ oc describe pod router-default-66d5cf9464-7pwkc
Name: router-default-66d5cf9464-7pwkc
Namespace: openshift-ingress
....
Controlled By: ReplicaSet/router-default-66d5cf9464
The web console lists the controlling object under ownerReferences in the pod YAML:

ownerReferences:
- apiVersion: apps/v1
  kind: ReplicaSet
  name: router-default-66d5cf9464
  uid: d81dd094-da26-11e9-a48a-128e7edf0312
  controller: true
  blockOwnerDeletion: true
Procedure
Add labels to a node by using a machine set or editing the node directly:
Use a MachineSet object to add labels to nodes managed by the machine set when a node is created:

Run the following command to add labels to a MachineSet object:

$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api

For example:

$ oc patch MachineSet abc612-msrtw-worker-us-east-1c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api

Tip: You can alternatively apply the following YAML to add labels to a machine set:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  template:
    spec:
      metadata:
        labels:
          region: "east"
          type: "user-node"

Verify that the labels are added to the MachineSet object by using the oc edit command:

For example:

$ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api

Example MachineSet object

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
....
spec:
...
  template:
    metadata:
...
    spec:
      metadata:
        labels:
          region: east
          type: user-node
....
Add labels directly to a node:
Edit the Node object for the node:

$ oc label nodes <name> <key>=<value>

For example, to label a node:

$ oc label nodes ip-10-0-142-25.ec2.internal type=user-node region=east

Tip: You can alternatively apply the following YAML to add labels to a node:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    type: "user-node"
    region: "east"

Verify that the labels are added to the node:

$ oc get nodes -l type=user-node,region=east

Example output
NAME STATUS ROLES AGE VERSION ip-10-0-142-25.ec2.internal Ready worker 17m v1.18.3+002a51f
Add the matching node selector to a pod:
To add a node selector to existing and future pods, add a node selector to the controlling object for the pods:
Example ReplicaSet object with labels

kind: ReplicaSet
....
spec:
....
  template:
    metadata:
      creationTimestamp: null
      labels:
        ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        pod-template-hash: 66d5cf9464
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/worker: ''
        type: user-node 1

1 Add the node selector.

To add a node selector to a specific, new pod, add the selector to the Pod object directly:

Example Pod object with a node selector

apiVersion: v1
kind: Pod
....
spec:
  nodeSelector:
    region: east
    type: user-node

Note: You cannot add a node selector directly to an existing scheduled pod.
3.8.3. Creating default cluster-wide node selectors
You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.
With cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.
You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a machine set, or a machine config. Adding the label to the machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
You can add additional key/value pairs to a pod. But you cannot add a different value for a default key.
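The merge behavior described above, where a pod may add keys but not override a default key, can be sketched as follows. This is an illustrative Python model of the rule, not OpenShift code:

```python
def merge_node_selectors(default_selector: dict, pod_selector: dict) -> dict:
    """Illustrative model: a pod may add new keys on top of the cluster
    default, but may not override a default key with a different value."""
    for key, value in pod_selector.items():
        if key in default_selector and default_selector[key] != value:
            raise ValueError(f"conflicting value for default key {key!r}")
    return {**default_selector, **pod_selector}

default = {"type": "user-node", "region": "east"}
# Adding a new key is allowed; the default keys are preserved.
print(merge_node_selectors(default, {"disktype": "ssd"}))
# {'type': 'user-node', 'region': 'east', 'disktype': 'ssd'}
```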
Procedure
To add a default cluster-wide node selector:
Edit the Scheduler Operator CR to add the default cluster-wide node selectors:
$ oc edit scheduler cluster

Example Scheduler Operator CR with a node selector

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
...
spec:
  defaultNodeSelector: type=user-node,region=east 1
  mastersSchedulable: false
  policy:
    name: ""

1 Add a node selector with the appropriate <key>:<value> pairs.
After making this change, wait for the pods in the openshift-kube-apiserver project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.

Add labels to a node by using a machine set or editing the node directly:
Use a machine set to add labels to nodes managed by the machine set when a node is created:
Run the following command to add labels to a MachineSet object:

$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api 1

1 Add a <key>/<value> pair for each label.

For example:

$ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api

Tip: You can alternatively apply the following YAML to add labels to a machine set:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  template:
    spec:
      metadata:
        labels:
          region: "east"
          type: "user-node"

Verify that the labels are added to the MachineSet object by using the oc edit command:

For example:

$ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api

Example MachineSet object

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
...
spec:
...
  template:
    metadata:
...
    spec:
      metadata:
        labels:
          region: east
          type: user-node
...

Redeploy the nodes associated with that machine set by scaling down to 0 and scaling up the nodes:

For example:

$ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

$ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:

$ oc get nodes -l <key>=<value>

For example:

$ oc get nodes -l type=user-node

Example output
NAME STATUS ROLES AGE VERSION ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp Ready worker 61s v1.18.3+002a51f
Add labels directly to a node:
Edit the Node object for the node:

$ oc label nodes <name> <key>=<value>

For example, to label a node:

$ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east

Tip: You can alternatively apply the following YAML to add labels to a node:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    type: "user-node"
    region: "east"

Verify that the labels are added to the node using the oc get command:

$ oc get nodes -l <key>=<value>,<key>=<value>

For example:

$ oc get nodes -l type=user-node,region=east

Example output
NAME STATUS ROLES AGE VERSION ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 Ready worker 17m v1.18.3+002a51f
3.8.4. Creating project-wide node selectors
You can use node selectors in a project together with labels on nodes to constrain all pods created in that project to the labeled nodes.
When you create a pod in this project, OpenShift Container Platform adds the node selectors to the pod and schedules the pod on a node with matching labels. If there is a cluster-wide default node selector, the project node selector takes preference.
You add node selectors to a project by editing the Namespace object to add the openshift.io/node-selector parameter.

A pod is not scheduled if the Pod object contains a node selector that conflicts with the project node selector.
Example error message
Error from server (Forbidden): error when creating "pod.yaml": pods "pod-4" is forbidden: pod node label selector conflicts with its project node label selector
You can add additional key/value pairs to a pod. But you cannot add a different value for a project key.
Procedure
To add a default project node selector:
Create a namespace or edit an existing namespace to add the openshift.io/node-selector parameter:

$ oc edit namespace <name>

Example output

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "type=user-node,region=east" 1
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/requester: kube:admin
    openshift.io/sa.scc.mcs: s0:c30,c5
    openshift.io/sa.scc.supplemental-groups: 1000880000/10000
    openshift.io/sa.scc.uid-range: 1000880000/10000
  creationTimestamp: "2021-05-10T12:35:04Z"
  labels:
    kubernetes.io/metadata.name: demo
  name: demo
  resourceVersion: "145537"
  uid: 3f8786e3-1fcb-42e3-a0e3-e2ac54d15001
spec:
  finalizers:
  - kubernetes

1 Add openshift.io/node-selector with the appropriate <key>:<value> pairs.
Add labels to a node by using a machine set or editing the node directly:
Use a MachineSet object to add labels to nodes managed by the machine set when a node is created:

Run the following command to add labels to a MachineSet object:

$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api

For example:

$ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api

Tip: You can alternatively apply the following YAML to add labels to a machine set:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  template:
    spec:
      metadata:
        labels:
          region: "east"
          type: "user-node"

Verify that the labels are added to the MachineSet object by using the oc edit command:

For example:

$ oc edit MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

Example output

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
...
spec:
...
  template:
    metadata:
...
    spec:
      metadata:
        labels:
          region: east
          type: user-node

Redeploy the nodes associated with that machine set:

For example:

$ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

$ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:

$ oc get nodes -l <key>=<value>

For example:

$ oc get nodes -l type=user-node,region=east

Example output
NAME STATUS ROLES AGE VERSION ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp Ready worker 61s v1.18.3+002a51f
Add labels directly to a node:
Edit the Node object to add labels:

$ oc label <resource> <name> <key>=<value>

For example, to label a node:

$ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-c-tgq49 type=user-node region=east

Tip: You can alternatively apply the following YAML to add labels to a node:

kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    type: "user-node"
    region: "east"

Verify that the labels are added to the Node object using the oc get command:

$ oc get nodes -l <key>=<value>

For example:

$ oc get nodes -l type=user-node,region=east

Example output
NAME STATUS ROLES AGE VERSION ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 Ready worker 17m v1.18.3+002a51f
3.9. Controlling pod placement by using pod topology spread constraints
You can use pod topology spread constraints to control the placement of your pods across nodes, zones, regions, or other user-defined topology domains.
3.9.1. About pod topology spread constraints
By using a pod topology spread constraint, you provide fine-grained control over the distribution of pods across failure domains to help achieve high availability and more efficient resource utilization.
OpenShift Container Platform administrators can label nodes to provide topology information, such as regions, zones, nodes, or other user-defined domains. After these labels are set on nodes, users can then define pod topology spread constraints to control the placement of pods across these topology domains.
You specify which pods to group together, which topology domains they are spread among, and the acceptable skew. Only pods within the same namespace are matched and grouped together when spreading due to a constraint.
3.9.2. Configuring pod topology spread constraints
The following steps demonstrate how to configure pod topology spread constraints to distribute pods that match the specified labels based on their zone.
You can specify multiple pod topology spread constraints, but you must ensure that they do not conflict with each other. All pod topology spread constraints must be satisfied for a pod to be placed.
Prerequisites
- A cluster administrator has added the required labels to nodes.
Procedure
Create a Pod spec and specify a pod topology spread constraint:

Example pod-spec.yaml file

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1 1
    topologyKey: topology.kubernetes.io/zone 2
    whenUnsatisfiable: DoNotSchedule 3
    labelSelector: 4
      matchLabels:
        foo: bar 5
  containers:
  - image: "docker.io/ocpqe/hello-pod"
    name: hello-pod

1 The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
2 The key of a node label. Nodes with this key and identical value are considered to be in the same topology.
3 How to handle a pod if it does not satisfy the spread constraint. The default is DoNotSchedule, which tells the scheduler not to schedule the pod. Set to ScheduleAnyway to still schedule the pod, but the scheduler prioritizes honoring the skew to not make the cluster more imbalanced.
4 Pods that match this label selector are counted and recognized as a group when spreading to satisfy the constraint. Be sure to specify a label selector, otherwise no pods can be matched.
5 Be sure that this Pod spec also sets its labels to match this label selector if you want it to be counted properly in the future.
Create the pod:
$ oc create -f pod-spec.yaml
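The maxSkew check above can be sketched numerically: count the matching pods per topology domain, tentatively place the new pod, and compare the max-min difference against maxSkew. This is a simplified, illustrative Python model, not the scheduler's actual algorithm:

```python
from collections import Counter

def skew_after_placement(pod_zones, candidate_zone, all_zones, max_skew=1):
    """Illustrative maxSkew check: count matching pods per topology domain,
    tentatively place the new pod in candidate_zone, and compare the
    resulting max-min spread against max_skew."""
    counts = Counter({zone: 0 for zone in all_zones})
    counts.update(pod_zones)          # count existing matching pods per zone
    counts[candidate_zone] += 1       # tentatively place the new pod
    skew = max(counts.values()) - min(counts.values())
    return skew, skew <= max_skew

# Three zones; matching pods currently in zone-a (2) and zone-b (1).
existing = ["zone-a", "zone-a", "zone-b"]
zones = ["zone-a", "zone-b", "zone-c"]
print(skew_after_placement(existing, "zone-c", zones))  # (1, True)
print(skew_after_placement(existing, "zone-a", zones))  # (3, False)
```

With whenUnsatisfiable: DoNotSchedule, a placement like zone-a above would be rejected because it pushes the skew past maxSkew.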
3.9.3. Example pod topology spread constraints
The following examples demonstrate pod topology spread constraint configurations.
3.9.3.1. Single pod topology spread constraint example
This example Pod spec defines one pod topology spread constraint. It matches on pods labeled foo:bar, distributes the pods among zones, and uses a maxSkew of 1.
kind: Pod
apiVersion: v1
metadata:
  name: my-pod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - image: "docker.io/ocpqe/hello-pod"
    name: hello-pod
3.9.3.2. Multiple pod topology spread constraints example
This example Pod spec defines two pod topology spread constraints. Both match on pods labeled foo:bar and use a maxSkew of 1. The first constraint distributes pods based on the user-defined label node, and the second constraint distributes pods based on the user-defined label rack.
kind: Pod
apiVersion: v1
metadata:
  name: my-pod-2
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: node
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  - maxSkew: 1
    topologyKey: rack
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - image: "docker.io/ocpqe/hello-pod"
    name: hello-pod
3.10. Running a custom scheduler
You can run multiple custom schedulers alongside the default scheduler and configure which scheduler to use for each pod.
It is supported to use a custom scheduler with OpenShift Container Platform, but Red Hat does not directly support the functionality of the custom scheduler.
For information on how to configure the default scheduler, see Configuring the default scheduler to control pod placement.
To schedule a given pod using a specific scheduler, specify the name of the scheduler in that Pod specification.
3.10.1. Deploying a custom scheduler
To include a custom scheduler in your cluster, include the image for a custom scheduler in a deployment.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
cluster-admin You have a scheduler binary.
NoteInformation on how to create a scheduler binary is outside the scope of this document. For an example, see Configure Multiple Schedulers in the Kubernetes documentation. Note that the actual functionality of your custom scheduler is not supported by Red Hat.
- You have created an image containing the scheduler binary and pushed it to a registry.
Procedure
Create a file that contains a config map that holds the scheduler configuration file:
Example scheduler-config-map.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system 1
data:
  scheduler-config.yaml: | 2
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: custom-scheduler 3
    leaderElection:
      leaderElect: false

1 This procedure uses the kube-system namespace, but you can use the namespace of your choosing.
2 When you define your Deployment resource later in this procedure, you pass this file in to the scheduler command by using the --config argument.
3 Define a scheduler profile for your custom scheduler. This scheduler name is used when defining the schedulerName in the Pod configuration.
Create the config map:
$ oc create -f scheduler-config-map.yaml

Create a file that contains the deployment resources for the custom scheduler:

Example custom-scheduler.yaml file

apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system 1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system 2
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system 3
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: custom-scheduler
  namespace: kube-system 4
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: custom-scheduler
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --config=/etc/config/scheduler-config.yaml 5
        image: "<namespace>/<image_name>:<tag>" 6
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
      hostNetwork: false
      hostPID: false
      volumes:
      - name: config-volume
        configMap:
          name: scheduler-config

Create the deployment resources in the cluster:
$ oc create -f custom-scheduler.yaml
Verification
Verify that the scheduler pod is running:
$ oc get pods -n kube-system

The custom scheduler pod is listed as Running:

NAME                                READY   STATUS    RESTARTS   AGE
custom-scheduler-6cd7c4b8bc-854zb   1/1     Running   0          2m
3.10.2. Deploying pods using a custom scheduler
After the custom scheduler is deployed in your cluster, you can configure pods to use that scheduler instead of the default scheduler.
Each scheduler has a separate view of resources in a cluster. For that reason, each scheduler should operate over its own set of nodes.
If two or more schedulers operate on the same node, they might interfere with each other and schedule more pods on the same node than there are available resources for. Pods might get rejected due to insufficient resources in this case.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- The custom scheduler has been deployed in the cluster.
Procedure
If your cluster uses role-based access control (RBAC), add the custom scheduler name to the system:kube-scheduler cluster role.

Edit the system:kube-scheduler cluster role:

$ oc edit clusterrole system:kube-scheduler

Add the name of the custom scheduler to the resourceNames lists for the leases and endpoints resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2021-07-07T10:19:14Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-scheduler
  resourceVersion: "125"
  uid: 53896c70-b332-420a-b2a4-f72c822313f2
rules:
...
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
- apiGroups:
  - coordination.k8s.io
  resourceNames:
  - kube-scheduler
  - custom-scheduler 1
  resources:
  - leases
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - create
- apiGroups:
  - ""
  resourceNames:
  - kube-scheduler
  - custom-scheduler 2
  resources:
  - endpoints
  verbs:
  - get
  - update
...
Create a Pod configuration and specify the name of the custom scheduler in the schedulerName parameter:

Example custom-scheduler-example.yaml file

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduler-example
  labels:
    name: custom-scheduler-example
spec:
  schedulerName: custom-scheduler 1
  containers:
  - name: pod-with-second-annotation-container
    image: docker.io/ocpqe/hello-pod

1 The name of the custom scheduler to use, which is custom-scheduler in this example. When no scheduler name is supplied, the pod is automatically scheduled using the default scheduler.
Create the pod:
$ oc create -f custom-scheduler-example.yaml
Verification
Enter the following command to check that the pod was created:
$ oc get pod custom-scheduler-example

The custom-scheduler-example pod is listed in the output:

NAME                       READY   STATUS    RESTARTS   AGE
custom-scheduler-example   1/1     Running   0          4m

Enter the following command to check that the custom scheduler has scheduled the pod:

$ oc describe pod custom-scheduler-example

The scheduler, custom-scheduler, is listed as shown in the following truncated output:

Events:
  Type    Reason     Age        From              Message
  ----    ------     ----       ----              -------
  Normal  Scheduled  <unknown>  custom-scheduler  Successfully assigned default/custom-scheduler-example to <node_name>
3.11. Evicting pods using the descheduler
While the scheduler is used to determine the most suitable node to host a new pod, the descheduler can be used to evict a running pod so that the pod can be rescheduled onto a more suitable node.
3.11.1. About the descheduler
You can use the descheduler to evict pods based on specific strategies so that the pods can be rescheduled onto more appropriate nodes.
You can benefit from descheduling running pods in situations such as the following:
- Nodes are underutilized or overutilized.
- Pod and node affinity requirements, such as taints or labels, have changed and the original scheduling decisions are no longer appropriate for certain nodes.
- Node failure requires pods to be moved.
- New nodes are added to clusters.
- Pods have been restarted too many times.
The descheduler does not schedule replacement of evicted pods. The scheduler automatically performs this task for the evicted pods.
When the descheduler decides to evict pods from a node, it employs the following general mechanism:
- Pods in the `openshift-*` and `kube-system` namespaces are never evicted.
- Critical pods with `priorityClassName` set to `system-cluster-critical` or `system-node-critical` are never evicted.
- Static, mirrored, or stand-alone pods that are not part of a replication controller, replica set, deployment, or job are never evicted because these pods will not be recreated.
- Pods associated with daemon sets are never evicted.
- Pods with local storage are never evicted.
- Best effort pods are evicted before burstable and guaranteed pods.
- All types of pods with the `descheduler.alpha.kubernetes.io/evict` annotation are eligible for eviction. This annotation is used to override checks that prevent eviction, and the user can select which pod is evicted. Users should know how and if the pod will be recreated.
- Pods subject to a pod disruption budget (PDB) are not evicted if descheduling would violate the PDB. The pods are evicted by using the eviction subresource, which honors the PDB.
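The eviction-override annotation described above is set in pod metadata. The following is a minimal sketch; the pod name and image are illustrative placeholders, not values from this document:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: evictable-example        # hypothetical pod name
  annotations:
    # Marks the pod as eligible for eviction, overriding checks that
    # would otherwise prevent it (for example, local storage).
    descheduler.alpha.kubernetes.io/evict: "true"
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```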
3.11.2. Descheduler profiles
The following descheduler profiles are available:
`AffinityAndTaints`: This profile evicts pods that violate inter-pod anti-affinity, node affinity, and node taints.
It enables the following strategies:
- `RemovePodsViolatingInterPodAntiAffinity`: removes pods that are violating inter-pod anti-affinity.
- `RemovePodsViolatingNodeAffinity`: removes pods that are violating node affinity. Pods with a node affinity type of `requiredDuringSchedulingIgnoredDuringExecution` are removed.
- `RemovePodsViolatingNodeTaints`: removes pods that are violating `NoSchedule` taints on nodes.
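The node affinity rule that this profile re-checks looks like the following pod fragment. This is an illustrative sketch (the pod name, label key, and zone value are placeholders): `required...IgnoredDuringExecution` rules are normally ignored after scheduling, but if the node's labels later change so the rule no longer matches, the descheduler evicts the pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-example         # hypothetical pod name
spec:
  affinity:
    nodeAffinity:
      # Enforced only at scheduling time by the scheduler; the descheduler
      # re-evaluates it for running pods and evicts violators.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east-1a          # placeholder zone
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```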
`TopologyAndDuplicates`: This profile evicts pods in an effort to evenly spread similar pods, or pods of the same topology domain, among nodes.
It enables the following strategies:
- `RemovePodsViolatingTopologySpreadConstraint`: finds unbalanced topology domains and tries to evict pods from larger ones when `DoNotSchedule` constraints are violated.
- `RemoveDuplicates`: ensures that there is only one pod associated with a replica set, replication controller, deployment, or job running on the same node. If there are more, those duplicate pods are evicted for better pod distribution in a cluster.
`LifecycleAndUtilization`: This profile evicts long-running pods and balances resource usage between nodes.
It enables the following strategies:
- `RemovePodsHavingTooManyRestarts`: removes pods whose containers have been restarted too many times. Pods where the sum of restarts over all containers (including init containers) is more than 100 are removed.
- `LowNodeUtilization`: finds nodes that are underutilized and evicts pods, if possible, from overutilized nodes in the hope that recreation of the evicted pods will be scheduled on the underutilized nodes. A node is considered underutilized if its usage is below 20% for all thresholds (CPU, memory, and number of pods). A node is considered overutilized if its usage is above 50% for any of the thresholds (CPU, memory, and number of pods).
- `PodLifeTime`: evicts pods that are too old. Pods that are older than 24 hours are removed.
3.11.3. Installing the descheduler
The descheduler is not available by default. To enable the descheduler, you must install the Kube Descheduler Operator from OperatorHub and enable one or more descheduler profiles.
Prerequisites
- Cluster administrator privileges.
- Access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
Create the required namespace for the Kube Descheduler Operator.
- Navigate to Administration → Namespaces and click Create Namespace.
- Enter `openshift-kube-descheduler-operator` in the Name field, enter `openshift.io/cluster-monitoring=true` in the Labels field to enable descheduler metrics, and click Create.
Install the Kube Descheduler Operator.
- Navigate to Operators → OperatorHub.
- Type Kube Descheduler Operator into the filter box.
- Select the Kube Descheduler Operator and click Install.
- On the Install Operator page, select A specific namespace on the cluster. Select openshift-kube-descheduler-operator from the drop-down menu.
- Adjust the values for the Update Channel and Approval Strategy to the desired values.
- Click Install.
Create a descheduler instance.
- From the Operators → Installed Operators page, click the Kube Descheduler Operator.
- Select the Kube Descheduler tab and click Create KubeDescheduler.
Edit the settings as necessary.
- Expand the Profiles section to select one or more profiles to enable. The `AffinityAndTaints` profile is enabled by default. Click Add Profile to select additional profiles.
- Optional: Use the Descheduling Interval Seconds field to change the number of seconds between descheduler runs. The default is `3600` seconds.
- Click Create.
You can also configure the profiles and settings for the descheduler later by using the OpenShift CLI (`oc`).
3.11.4. Configuring descheduler profiles
You can configure which profiles the descheduler uses to evict pods.
Prerequisites
- Cluster administrator privileges
Procedure
Edit the `KubeDescheduler` object:

$ oc edit kubedeschedulers.operator.openshift.io cluster -n openshift-kube-descheduler-operator

Specify one or more profiles in the `spec.profiles` section:

```yaml
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
  profiles:
  - AffinityAndTaints
  - TopologyAndDuplicates
  - LifecycleAndUtilization
```

- `AffinityAndTaints` evicts pods that violate inter-pod anti-affinity, node affinity, and node taints.
- `TopologyAndDuplicates` evicts pods in an effort to evenly spread similar pods, or pods of the same topology domain, among nodes.
- `LifecycleAndUtilization` evicts long-running pods and balances resource usage between nodes.
You can enable multiple profiles; the order that the profiles are specified in is not important.
- Save the file to apply the changes.
3.11.5. Configuring the descheduler interval
You can configure the amount of time between descheduler runs. The default is 3600 seconds (one hour).
Prerequisites
- Cluster administrator privileges
Procedure
Edit the `KubeDescheduler` object:

$ oc edit kubedeschedulers.operator.openshift.io cluster -n openshift-kube-descheduler-operator

Update the `deschedulingIntervalSeconds` field to the desired value:

```yaml
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600
...
```

Set the number of seconds between descheduler runs. A value of `0` runs the descheduler once and exits.
- Save the file to apply the changes.
3.11.6. Uninstalling the descheduler
You can remove the descheduler from your cluster by removing the descheduler instance and uninstalling the Kube Descheduler Operator. This procedure also cleans up the `KubeDescheduler` CRD and the `openshift-kube-descheduler-operator` namespace.
Prerequisites
- Cluster administrator privileges.
- Access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
Delete the descheduler instance.
- From the Operators → Installed Operators page, click Kube Descheduler Operator.
- Select the Kube Descheduler tab.
- Click the Options menu next to the cluster entry and select Delete KubeDescheduler.
- In the confirmation dialog, click Delete.
Uninstall the Kube Descheduler Operator.
- Navigate to Operators → Installed Operators.
Click the Options menu
next to the Kube Descheduler Operator entry and select Uninstall Operator.
- In the confirmation dialog, click Uninstall.
Delete the `openshift-kube-descheduler-operator` namespace.
Navigate to Administration
Namespaces. -
Enter into the filter box.
openshift-kube-descheduler-operator -
Click the Options menu
next to the openshift-kube-descheduler-operator entry and select Delete Namespace.
- In the confirmation dialog, enter `openshift-kube-descheduler-operator` and click Delete.
Delete the `KubeDescheduler` CRD.
Navigate to Administration
Custom Resource Definitions. -
Enter into the filter box.
KubeDescheduler -
Click the Options menu
next to the KubeDescheduler entry and select Delete CustomResourceDefinition.
- In the confirmation dialog, click Delete.