Chapter 7. Applying autoscaling to an OpenShift Container Platform cluster


Applying autoscaling to an OpenShift Container Platform cluster involves deploying a cluster autoscaler and then deploying machine autoscalers for each machine type in your cluster.

Important

You can configure the cluster autoscaler only in clusters where the Machine API Operator is operational.

7.1. About the cluster autoscaler

The cluster autoscaler adjusts the size of an OpenShift Container Platform cluster to meet its current deployment needs. It uses declarative, Kubernetes-style arguments to provide infrastructure management that does not rely on objects of a specific cloud provider. The cluster autoscaler has a cluster scope, and is not associated with a particular namespace.

The cluster autoscaler increases the size of the cluster when there are pods that fail to schedule on any of the current worker nodes due to insufficient resources or when another node is necessary to meet deployment needs. The cluster autoscaler does not increase the cluster resources beyond the limits that you specify.

The cluster autoscaler computes the total memory, CPU, and GPU resources on all nodes in the cluster, even though it does not manage the control plane nodes. These values are not single-machine oriented. They are an aggregation of all the resources in the entire cluster. For example, if you set the maximum memory resource limit, the cluster autoscaler includes all the nodes in the cluster when calculating the current memory usage. That calculation is then used to determine if the cluster autoscaler has the capacity to add more worker resources.

Similarly, suppose that you set the maximum CPU limit to 64 cores and configure the cluster autoscaler to create only machines that have 8 cores each. If your cluster starts with 30 cores, the cluster autoscaler can add up to 4 more nodes with a combined 32 cores, for a total of 62 cores.

Important

Ensure that the maxNodesTotal value in the ClusterAutoscaler resource definition that you create is large enough to account for the total possible number of machines in your cluster. This value must encompass the number of control plane machines and the possible number of compute machines that you might scale to.
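For example, a hypothetical cluster with 3 control plane machines and 3 compute machine sets that each allow a maximum of 12 replicas needs a maxNodesTotal value of at least 3 + (3 × 12) = 39.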

Automatic node removal

Every 10 seconds, the cluster autoscaler checks which nodes are unnecessary in the cluster and removes them. The cluster autoscaler considers a node for removal if the following conditions apply:

  • The node utilization is less than the node utilization level threshold for the cluster. The node utilization level is the sum of the requested resources divided by the allocated resources for the node. If you do not specify a value in the ClusterAutoscaler custom resource, the cluster autoscaler uses a default value of 0.5, which corresponds to 50% utilization. A worked example follows this list.
  • The cluster autoscaler can move all pods running on the node to the other nodes. The Kubernetes scheduler is responsible for scheduling pods on the nodes.
  • The node does not have the scale-down disabled annotation (cluster-autoscaler.kubernetes.io/scale-down-disabled: "true").
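
For example, consider a hypothetical node with 8 allocatable CPU cores where the scheduled pods request a total of 3 cores. The node utilization level for CPU is 3 ÷ 8 = 0.375, which is below the default threshold of 0.5, so the node is a candidate for removal if the other conditions in this list are also met.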

If the following types of pods are present on a node, the cluster autoscaler will not remove the node:

  • Pods with restrictive pod disruption budgets (PDBs).
  • Kube-system pods that do not run on the node by default.
  • Kube-system pods that do not have a PDB or have a PDB that is too restrictive.
  • Pods that are not backed by a controller object such as a deployment, replica set, or stateful set.
  • Pods with local storage.
  • Pods that cannot be moved elsewhere because of a lack of resources, incompatible node selectors or affinity, matching anti-affinity, and so on.
  • Pods that have the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation, as shown in the sketch after this list.
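
The safe-to-evict annotation is set on the pod itself. The following is a minimal sketch of a pod that the cluster autoscaler does not evict; the pod name and image are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: local-data-pod                                     # hypothetical name
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest                 # hypothetical image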

Limitations

If you configure the cluster autoscaler, additional usage restrictions apply:

  • Do not modify the nodes that are in autoscaled node groups directly. All nodes within the same node group have the same capacity and labels and run the same system pods.
  • Specify resource requests for your pods.
  • If you have to prevent pods from being deleted too quickly, configure appropriate PDBs. A sketch of pod resource requests and a PDB follows this list.
  • Confirm that your cloud provider quota is large enough to support the maximum node pools that you configure.
  • Do not run additional node group autoscalers, especially the ones offered by your cloud provider.
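
As an illustration of the resource request and PDB recommendations, the following sketch shows a deployment whose pod template specifies resource requests, together with a PDB that limits how many of its pods can be disrupted at once. All names, labels, and values are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: registry.example.com/app:latest   # hypothetical image
        resources:
          requests:
            cpu: 500m                        # the cluster autoscaler sizes new nodes based on requests
            memory: 1Gi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb                      # hypothetical name
spec:
  minAvailable: 2                            # keep at least 2 pods running during node removal
  selector:
    matchLabels:
      app: example-app
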
Note

The cluster autoscaler only adds nodes in autoscaled node groups if doing so would result in a schedulable pod. If the available node types cannot meet the requirements for a pod request, or if the node groups that could meet these requirements are at their maximum size, the cluster autoscaler cannot scale up.

Interaction with other scheduling features

The horizontal pod autoscaler (HPA) and the cluster autoscaler modify cluster resources in different ways. The HPA changes the deployment’s or replica set’s number of replicas based on the current CPU load. If the load increases, the HPA creates new replicas, regardless of the amount of resources available to the cluster. If there are not enough resources, the cluster autoscaler adds resources so that the HPA-created pods can run. If the load decreases, the HPA stops some replicas. If this action causes some nodes to be underutilized or completely empty, the cluster autoscaler deletes the unnecessary nodes.
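
The HPA is configured independently of the cluster autoscaler. The following is a minimal sketch of an HPA that scales a hypothetical deployment based on CPU utilization; when new replicas cannot be scheduled on existing nodes, the cluster autoscaler adds nodes so that they can run:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app              # hypothetical deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70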

The cluster autoscaler takes pod priorities into account. The Pod Priority and Preemption feature enables scheduling pods based on priorities if the cluster does not have enough resources, but the cluster autoscaler ensures that the cluster has resources to run all pods. To honor the intention of both features, the cluster autoscaler includes a priority cutoff function. You can use this cutoff to schedule "best-effort" pods, which do not cause the cluster autoscaler to increase resources but instead run only when spare resources are available.

Pods with priority lower than the cutoff value do not cause the cluster to scale up or prevent the cluster from scaling down. No new nodes are added to run the pods, and nodes running these pods might be deleted to free resources.
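
For example, with the default podPriorityThreshold of -10 that is shown in the resource definition in the next section, pods that use a priority class similar to the following hypothetical one do not trigger a scale-up:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: best-effort-batch          # hypothetical name
value: -20                         # lower than the podPriorityThreshold of -10
globalDefault: false
description: "Best-effort workloads that run only on spare cluster capacity."

Pods opt in to the class by setting spec.priorityClassName to the class name.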

7.2. Configuring the cluster autoscaler

First, deploy the cluster autoscaler to manage automatic resource scaling in your OpenShift Container Platform cluster.

Note

Because the cluster autoscaler is scoped to the entire cluster, you can create only one cluster autoscaler for the cluster.

7.2.1. Cluster autoscaler resource definition

This ClusterAutoscaler resource definition shows the parameters and sample values for the cluster autoscaler.

Note

When you change the configuration of an existing cluster autoscaler, it restarts.

apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10 1
  resourceLimits:
    maxNodesTotal: 24 2
    cores:
      min: 8 3
      max: 128 4
    memory:
      min: 4 5
      max: 256 6
    gpus:
    - type: <gpu_type> 7
      min: 0 8
      max: 16 9
  logVerbosity: 4 10
  scaleDown: 11
    enabled: true 12
    delayAfterAdd: 10m 13
    delayAfterDelete: 5m 14
    delayAfterFailure: 30s 15
    unneededTime: 5m 16
    utilizationThreshold: "0.4" 17
1
Specify the priority that a pod must exceed to cause the cluster autoscaler to deploy additional nodes. Enter a 32-bit integer value. The podPriorityThreshold value is compared to the value of the PriorityClass that you assign to each pod.
2
Specify the maximum number of nodes to deploy. This value is the total number of machines that are deployed in your cluster, not just the ones that the autoscaler controls. Ensure that this value is large enough to account for all of your control plane and compute machines and the total number of replicas that you specify in your MachineAutoscaler resources.
3
Specify the minimum number of cores to deploy in the cluster.
4
Specify the maximum number of cores to deploy in the cluster.
5
Specify the minimum amount of memory, in GiB, in the cluster.
6
Specify the maximum amount of memory, in GiB, in the cluster.
7
Optional: To configure the cluster autoscaler to deploy GPU-enabled nodes, specify a type value. This value must match the value of the spec.template.spec.metadata.labels[cluster-api/accelerator] label in the machine set that manages the GPU-enabled nodes of that type. For example, this value might be nvidia-t4 to represent Nvidia T4 GPUs, or nvidia-a10g for A10G GPUs. For more information, see "Labeling GPU machine sets for the cluster autoscaler".
8
Specify the minimum number of GPUs of the specified type to deploy in the cluster.
9
Specify the maximum number of GPUs of the specified type to deploy in the cluster.
10
Specify the logging verbosity level between 0 and 10. The following log level thresholds are provided for guidance:
  • 1: (Default) Basic information about changes.
  • 4: Debug-level verbosity for troubleshooting typical issues.
  • 9: Extensive, protocol-level debugging information.

If you do not specify a value, the default value of 1 is used.

11
In this section, you can specify the period to wait for each action by using any valid ParseDuration interval, including ns, us, ms, s, m, and h.
12
Specify whether the cluster autoscaler can remove unnecessary nodes.
13
Optional: Specify the period to wait before deleting a node after a node has recently been added. If you do not specify a value, the default value of 10m is used.
14
Optional: Specify the period to wait before deleting a node after a node has recently been deleted. If you do not specify a value, the default value of 0s is used.
15
Optional: Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of 3m is used.
16
Optional: Specify a period of time before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of 10m is used.
17
Optional: Specify the node utilization level. Nodes below this utilization level are eligible for deletion.

The node utilization level is the sum of the requested resources divided by the allocated resources for the node, and must be a value greater than "0" but less than "1". If you do not specify a value, the cluster autoscaler uses a default value of "0.5", which corresponds to 50% utilization. You must express this value as a string.

Note

When performing a scaling operation, the cluster autoscaler remains within the ranges set in the ClusterAutoscaler resource definition, such as the minimum and maximum number of cores to deploy or the amount of memory in the cluster. However, the cluster autoscaler does not correct the current values in your cluster to be within those ranges.

The minimum and maximum CPUs, memory, and GPU values are determined by calculating those resources on all nodes in the cluster, even if the cluster autoscaler does not manage the nodes. For example, the control plane nodes are considered in the total memory in the cluster, even though the cluster autoscaler does not manage the control plane nodes.

7.2.1.1. Labeling GPU machine sets for the cluster autoscaler

You can use a machine set label to indicate which machines the cluster autoscaler can use to deploy GPU-enabled nodes.

Prerequisites

  • Your cluster uses a cluster autoscaler.

Procedure

  • Add a cluster-api/accelerator label to the compute machine set that creates the machines that you want the cluster autoscaler to use to deploy GPU-enabled nodes:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      name: machine-set-name
    spec:
      template:
        spec:
          metadata:
            labels:
              cluster-api/accelerator: nvidia-t4 1
    1
    Specify a label of your choice that consists of alphanumeric characters, -, _, or . and starts and ends with an alphanumeric character. For example, you might use nvidia-t4 to represent Nvidia T4 GPUs, or nvidia-a10g for A10G GPUs.
    Note

    You must specify the value of this label for the spec.resourceLimits.gpus.type parameter in your ClusterAutoscaler CR. For more information, see "Cluster autoscaler resource definition".
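
    For example, the nvidia-t4 label value shown above would correspond to a GPU resource limit entry similar to the following sketch in the ClusterAutoscaler CR; the min and max values are hypothetical:

    apiVersion: "autoscaling.openshift.io/v1"
    kind: "ClusterAutoscaler"
    metadata:
      name: "default"
    spec:
      resourceLimits:
        gpus:
        - type: nvidia-t4   # must match the cluster-api/accelerator label value
          min: 0
          max: 4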

7.2.2. Deploying a cluster autoscaler

To deploy a cluster autoscaler, you create an instance of the ClusterAutoscaler resource.

Procedure

  1. Create a YAML file for a ClusterAutoscaler resource that contains the custom resource definition.
  2. Create the custom resource in the cluster by running the following command:

    $ oc create -f <filename>.yaml 1
    1
    <filename> is the name of the custom resource file.
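
    For example, if the custom resource file is named cluster-autoscaler.yaml, a hypothetical file name:

    $ oc create -f cluster-autoscaler.yaml

    To confirm that the resource was created, run oc get ClusterAutoscaler and verify that the cluster autoscaler is listed.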

Next steps

  • After you configure the cluster autoscaler, configure at least one machine autoscaler so that the cluster autoscaler can scale your machines.

7.3. About the machine autoscaler

The machine autoscaler adjusts the number of Machines in the compute machine sets that you deploy in an OpenShift Container Platform cluster. You can scale both the default worker compute machine set and any other compute machine sets that you create. The machine autoscaler creates more Machines when the cluster runs out of resources to support more deployments. Any changes to the values in MachineAutoscaler resources, such as the minimum or maximum number of instances, are immediately applied to the compute machine set that they target.

Important

You must deploy a machine autoscaler for the cluster autoscaler to scale your machines. The cluster autoscaler uses the annotations on compute machine sets that the machine autoscaler sets to determine the resources that it can scale. If you define a cluster autoscaler without also defining machine autoscalers, the cluster autoscaler will never scale your cluster.

7.4. Configuring machine autoscalers

After you deploy the cluster autoscaler, deploy MachineAutoscaler resources that reference the compute machine sets that are used to scale the cluster.

Important

You must deploy at least one MachineAutoscaler resource after you deploy the ClusterAutoscaler resource.

Note

You must configure separate resources for each compute machine set. Remember that compute machine sets are different in each region, so consider whether you want to enable machine scaling in multiple regions. The compute machine set that you scale must have at least one machine in it.
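
To review the compute machine sets that exist in your cluster and the number of machines in each one, you can list them in the openshift-machine-api namespace. For example:

$ oc get machinesets -n openshift-machine-api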

7.4.1. Machine autoscaler resource definition

This MachineAutoscaler resource definition shows the parameters and sample values for the machine autoscaler.

apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1a" 1
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1 2
  maxReplicas: 12 3
  scaleTargetRef: 4
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet 5
    name: worker-us-east-1a 6
1
Specify the machine autoscaler name. To make it easier to identify which compute machine set this machine autoscaler scales, specify or include the name of the compute machine set to scale. The compute machine set name takes the following form: <clusterid>-<machineset>-<region>.
2
Specify the minimum number of machines of the specified type that must remain in the specified zone after the cluster autoscaler initiates cluster scaling. If running on AWS, GCP, Azure, RHOSP, or vSphere, this value can be set to 0. For other providers, do not set this value to 0.

You can save on costs by setting this value to 0 for use cases such as running expensive or limited-usage hardware that is used for specialized workloads, or by scaling a compute machine set with extra large machines. The cluster autoscaler scales the compute machine set down to zero if the machines are not in use.

Important

Do not set the spec.minReplicas value to 0 for the three compute machine sets that are created during the OpenShift Container Platform installation process on installer-provisioned infrastructure.

3
Specify the maximum number of machines of the specified type that the cluster autoscaler can deploy in the specified zone after it initiates cluster scaling. Ensure that the maxNodesTotal value in the ClusterAutoscaler resource definition is large enough to allow the machine autoscaler to deploy this number of machines.
4
In this section, provide values that describe the existing compute machine set to scale.
5
The kind parameter value is always MachineSet.
6
The name value must match the name of an existing compute machine set, as shown in the metadata.name parameter value.

7.4.2. Deploying a machine autoscaler

To deploy a machine autoscaler, you create an instance of the MachineAutoscaler resource.

Procedure

  1. Create a YAML file for a MachineAutoscaler resource that contains the custom resource definition.
  2. Create the custom resource in the cluster by running the following command:

    $ oc create -f <filename>.yaml 1
    1
    <filename> is the name of the custom resource file.
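
    For example, if the custom resource file is named worker-us-east-1a-autoscaler.yaml, a hypothetical file name:

    $ oc create -f worker-us-east-1a-autoscaler.yaml

    To confirm that the resource was created, run oc get MachineAutoscaler -n openshift-machine-api and verify that the machine autoscaler is listed.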

7.5. Disabling autoscaling

You can disable an individual machine autoscaler in your cluster or disable autoscaling on the cluster entirely.

7.5.1. Disabling a machine autoscaler

To disable a machine autoscaler, you delete the corresponding MachineAutoscaler custom resource (CR).

Note

Disabling a machine autoscaler does not disable the cluster autoscaler. To disable the cluster autoscaler, follow the instructions in "Disabling the cluster autoscaler".

Procedure

  1. List the MachineAutoscaler CRs for the cluster by running the following command:

    $ oc get MachineAutoscaler -n openshift-machine-api

    Example output

    NAME                 REF KIND     REF NAME             MIN   MAX   AGE
    compute-us-east-1a   MachineSet   compute-us-east-1a   1     12    39m
    compute-us-west-1a   MachineSet   compute-us-west-1a   2     4     37m

  2. Optional: Create a YAML file backup of the MachineAutoscaler CR by running the following command:

    $ oc get MachineAutoscaler/<machine_autoscaler_name> \ 1
      -n openshift-machine-api \
      -o yaml > <machine_autoscaler_name_backup>.yaml 2
    1
    <machine_autoscaler_name> is the name of the CR that you want to delete.
    2
    <machine_autoscaler_name_backup> is the name for the backup of the CR.
  3. Delete the MachineAutoscaler CR by running the following command:

    $ oc delete MachineAutoscaler/<machine_autoscaler_name> -n openshift-machine-api

    Example output

    machineautoscaler.autoscaling.openshift.io "compute-us-east-1a" deleted

Verification

  • To verify that the machine autoscaler is disabled, run the following command:

    $ oc get MachineAutoscaler -n openshift-machine-api

    The disabled machine autoscaler does not appear in the list of machine autoscalers.

Next steps

  • If you need to re-enable the machine autoscaler, use the <machine_autoscaler_name_backup>.yaml backup file and follow the instructions in "Deploying a machine autoscaler".

7.5.2. Disabling the cluster autoscaler

To disable the cluster autoscaler, you delete the corresponding ClusterAutoscaler resource.

Note

Disabling the cluster autoscaler disables autoscaling on the cluster, even if the cluster has existing machine autoscalers.

Procedure

  1. List the ClusterAutoscaler resource for the cluster by running the following command:

    $ oc get ClusterAutoscaler

    Example output

    NAME      AGE
    default   42m

  2. Optional: Create a YAML file backup of the ClusterAutoscaler CR by running the following command:

    $ oc get ClusterAutoscaler/default \ 1
      -o yaml > <cluster_autoscaler_backup_name>.yaml 2
    1
    default is the name of the ClusterAutoscaler CR.
    2
    <cluster_autoscaler_backup_name> is the name for the backup of the CR.
  3. Delete the ClusterAutoscaler CR by running the following command:

    $ oc delete ClusterAutoscaler/default

    Example output

    clusterautoscaler.autoscaling.openshift.io "default" deleted

Verification

  • To verify that the cluster autoscaler is disabled, run the following command:

    $ oc get ClusterAutoscaler

    Expected output

    No resources found

Next steps

  • Disabling the cluster autoscaler by deleting the ClusterAutoscaler CR prevents the cluster from autoscaling but does not delete any existing machine autoscalers on the cluster. To clean up unneeded machine autoscalers, see "Disabling a machine autoscaler".
  • If you need to re-enable the cluster autoscaler, use the <cluster_autoscaler_backup_name>.yaml backup file and follow the instructions in "Deploying a cluster autoscaler".

7.6. Additional resources
