Chapter 4. Managing compute nodes using machine pools


4.1. About machine pools

Red Hat OpenShift Service on AWS classic architecture uses machine pools as an elastic, dynamic provisioning method on top of your cloud infrastructure.

The primary resources are machines, compute machine sets, and machine pools.

4.1.1. Machines

A machine is a fundamental unit that describes the host for a worker node.

4.1.2. Machine sets

MachineSet resources are groups of compute machines. If you need more machines or must scale them down, change the number of replicas in the machine pool to which the compute machine sets belong.

Machine sets are not directly modifiable in Red Hat OpenShift Service on AWS classic architecture.

4.1.3. Machine pools

Machine pools are a higher-level construct than compute machine sets.

A machine pool creates compute machine sets that are all clones of the same configuration across availability zones. Machine pools perform all of the host node provisioning management actions on a worker node. If you need more machines or must scale them down, change the number of replicas in the machine pool to meet your compute needs. You can manually configure scaling or set autoscaling.

Important

Worker nodes are not guaranteed longevity, and may be replaced at any time as part of the normal operation and management of OpenShift. For more details about the node lifecycle, refer to additional resources.

Multiple machine pools can exist on a single cluster, and each machine pool can contain a unique node type and node size (AWS EC2 instance type and size) configuration.

4.1.3.1. Machine pools during cluster installation

By default, a cluster has one machine pool. During cluster installation, you can define the instance type or size, add labels to this machine pool, and define the size of the root disk.

After a cluster’s installation:

  • You can remove or add labels to any machine pool.
  • You can add additional machine pools to an existing cluster.
  • You can add taints to any machine pool if there is one machine pool without any taints.
  • You can create or delete a machine pool if there is one machine pool without any taints and at least two replicas for a Single-AZ cluster or three replicas for a Multi-AZ cluster.

    Note

    You cannot change the machine pool node type or size. The node type or size is specified only when the machine pool is created. If you need a different node type or size, you must re-create the machine pool and specify the required node type or size values.

  • You can add a label to each added machine pool.

Procedure

  • Optional: Add a label to the default machine pool after configuration by running the following command:

    $ rosa edit machinepool -c <cluster_name> <machinepool_name> -i

    Example input

    $ rosa edit machinepool -c mycluster worker -i
    ? Enable autoscaling: No
    ? Replicas: 3
    ? Labels: mylabel=true
    I: Updated machine pool 'worker' on cluster 'mycluster'

4.1.4. Machine pools in multiple zone clusters

In a cluster created across multiple Availability Zones (AZ), machine pools can be created across either all three AZs or any single AZ of your choice. The machine pool created by default at the time of cluster creation is created with machines in all three AZs and scales in multiples of three.

If you create a new Multi-AZ cluster, the machine pools are replicated to those zones automatically. By default, if you add a machine pool to an existing Multi-AZ cluster, the new machine pool is automatically created in all of the zones.

Note

You can override this default setting and create a machine pool in a Single-AZ of your choice.

Similarly, deleting a machine pool deletes it from all zones. Due to this multiplicative effect, using machine pools in a Multi-AZ cluster can consume more of your project’s quota for a specific region when creating machine pools.
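
As a rough illustration of the multiplicative effect described above, the following shell sketch converts a per-zone node count into the total number of nodes that a machine pool spanning three zones would consume. The function name multi_az_total is hypothetical and is not part of the rosa CLI.

```shell
# Hypothetical helper (not part of the rosa CLI): convert a per-zone node
# count into the total node count for a machine pool that spans three
# availability zones.
multi_az_total() {
  per_zone=$1
  zones=3
  echo $((per_zone * zones))
}

multi_az_total 2   # prints 6
```

A pool that needs 2 nodes per zone therefore counts as 6 instances against the regional quota.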

4.1.5. Additional resources

4.2. Managing compute nodes

With Red Hat OpenShift Service on AWS classic architecture, you can manage compute (also known as worker) nodes to create and configure optimal compute capacity for your workloads.

The majority of changes for compute nodes are configured on machine pools. A machine pool is a group of compute nodes in a cluster that have the same configuration, providing ease of management.

You can edit machine pool configuration options such as scaling, adding node labels, and adding taints.

4.2.1. Creating a machine pool

A machine pool is created when you install a Red Hat OpenShift Service on AWS classic architecture cluster. After installation, you can create additional machine pools for your cluster by using OpenShift Cluster Manager or the ROSA command-line interface (CLI) (rosa).

Note

For users of rosa version 1.2.25 and earlier versions, the machine pool created along with the cluster is identified as Default. For users of rosa version 1.2.26 and later, the machine pool created along with the cluster is identified as worker.
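
The version rule in the preceding note could be sketched in a script as follows. This is only an illustration: the function name default_pool_id is hypothetical, and the sketch assumes that the version is supplied as a plain MAJOR.MINOR.PATCH string such as 1.2.26.

```shell
# Hypothetical helper: print the ID of the machine pool that is created
# along with the cluster, based on the rosa CLI version.
# Assumes a plain MAJOR.MINOR.PATCH version string such as 1.2.26.
default_pool_id() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the version string on dots
  IFS=$old_ifs
  major=$1 minor=$2 patch=$3
  # Version 1.2.26 and later use "worker"; earlier versions use "Default".
  if [ "$major" -gt 1 ] ||
     { [ "$major" -eq 1 ] && [ "$minor" -gt 2 ]; } ||
     { [ "$major" -eq 1 ] && [ "$minor" -eq 2 ] && [ "$patch" -ge 26 ]; }; then
    echo worker
  else
    echo Default
  fi
}

default_pool_id 1.2.25   # prints Default
default_pool_id 1.2.26   # prints worker
```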

You can create additional machine pools for your Red Hat OpenShift Service on AWS classic architecture cluster by using OpenShift Cluster Manager.

Prerequisites

  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.

Procedure

  1. Navigate to OpenShift Cluster Manager and select your cluster.
  2. Under the Machine pools tab, click Add machine pool.
  3. Add a Machine pool name.
  4. Select a Compute node instance type from the list. The instance type defines the vCPU and memory allocation for each compute node in the machine pool.

    Note

    You cannot change the instance type for a machine pool after the pool is created.

  5. Optional: Configure autoscaling for the machine pool:

    1. Select Enable autoscaling to automatically scale the number of machines in your machine pool to meet the deployment needs.
    2. Set the minimum and maximum node count limits for autoscaling. The cluster autoscaler does not reduce or increase the machine pool node count beyond the limits that you specify.

      • If you deployed your cluster using a single availability zone, set the Minimum and maximum node count. This defines the minimum and maximum compute node limits in the availability zone.
      • If you deployed your cluster using multiple availability zones, set the Minimum nodes per zone and Maximum nodes per zone. This defines the minimum and maximum compute node limits per zone.

        Note

        Alternatively, you can set your autoscaling preferences for the machine pool after the machine pool is created.

  6. If you did not enable autoscaling, select a compute node count:

    • If you deployed your cluster using a single availability zone, select a Compute node count from the drop-down menu. This defines the number of compute nodes to provision to the machine pool for the zone.
    • If you deployed your cluster using multiple availability zones, select a Compute node count (per zone) from the drop-down menu. This defines the number of compute nodes to provision to the machine pool per zone.
  7. Optional: Configure Root disk size.
  8. Optional: Add node labels and taints for your machine pool:

    1. Expand the Edit node labels and taints menu.
    2. Under Node labels, add Key and Value entries for your node labels.
    3. Under Taints, add Key and Value entries for your taints.

      Note

      Creating a machine pool with taints is only possible if the cluster already has at least one machine pool without a taint.

    4. For each taint, select an Effect from the drop-down menu. Available options include NoSchedule, PreferNoSchedule, and NoExecute.

      Note

      Alternatively, you can add the node labels and taints after you create the machine pool.

  9. Optional: Select additional custom security groups to use for nodes in this machine pool. You must have already created the security groups and associated them with the VPC that you selected for this cluster. You cannot add or edit security groups after you create the machine pool. For more information, see the requirements for security groups in the "Additional resources" section.
  10. Optional: Use Amazon EC2 Spot Instances if you want to configure your machine pool to deploy machines as non-guaranteed AWS Spot Instances:

    1. Select Use Amazon EC2 Spot Instances.
    2. Leave Use On-Demand instance price selected to use the on-demand instance price. Alternatively, select Set maximum price to define a maximum hourly price for a Spot Instance.

      For more information about Amazon EC2 Spot Instances, see the AWS documentation.

      Important

      Your Amazon EC2 Spot Instances might be interrupted at any time. Use Amazon EC2 Spot Instances only for workloads that can tolerate interruptions.

      Note

      If you select Use Amazon EC2 Spot Instances for a machine pool, you cannot disable the option after the machine pool is created.

  11. Click Add machine pool to create the machine pool.

Verification

  • Verify that the machine pool is visible on the Machine pools page and the configuration is as expected.

You can create additional machine pools for your Red Hat OpenShift Service on AWS classic architecture cluster by using the ROSA command-line interface (CLI) (rosa).

Prerequisites

  • You installed and configured the latest ROSA CLI on your workstation.
  • You logged in to your Red Hat account using the ROSA CLI.
  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.

Procedure

  • To add a machine pool that does not use autoscaling, create the machine pool and define the instance type, compute (also known as worker) node count, and node labels:

    $ rosa create machinepool --cluster=<cluster-name> \
                              --name=<machine_pool_id> \
                              --replicas=<replica_count> \
                              --instance-type=<instance_type> \
                              --labels=<key>=<value>,<key>=<value> \
                              --taints=<key>=<value>:<effect>,<key>=<value>:<effect> \
                              --use-spot-instances \
                              --spot-max-price=<price> \
                              --disk-size=<disk_size> \
                              --availability-zone=<availability_zone_name> \
                              --additional-security-group-ids <sec_group_id> \
                              --subnet <subnet_id>

    where:

    --name=<machine_pool_id>
    Specifies the name of the machine pool.
    --replicas=<replica_count>
    Specifies the number of compute nodes to provision. If you deployed Red Hat OpenShift Service on AWS classic architecture using a single availability zone, this defines the number of compute nodes to provision to the machine pool for the zone. If you deployed your cluster using multiple availability zones, this defines the number of compute nodes to provision in total across all zones and the count must be a multiple of 3. The --replicas argument is required when autoscaling is not configured.
    --instance-type=<instance_type>
    Optional: Sets the instance type for the compute nodes in your machine pool. The instance type defines the vCPU and memory allocation for each compute node in the pool. Replace <instance_type> with an instance type. The default is m5.xlarge. You cannot change the instance type for a machine pool after the pool is created.
    --labels=<key>=<value>,<key>=<value>
    Optional: Defines the labels for the machine pool. Replace <key>=<value>,<key>=<value> with a comma-delimited list of key-value pairs, for example --labels=key1=value1,key2=value2.
    --taints=<key>=<value>:<effect>,<key>=<value>:<effect>
    Optional: Defines the taints for the machine pool. Replace <key>=<value>:<effect>,<key>=<value>:<effect> with a key, value, and effect for each taint, for example --taints=key1=value1:NoSchedule,key2=value2:NoExecute. Available effects include NoSchedule, PreferNoSchedule, and NoExecute.
    --use-spot-instances
    Optional: Configures your machine pool to deploy machines as non-guaranteed AWS Spot Instances. For information, see Amazon EC2 Spot Instances in the AWS documentation. If you specify --use-spot-instances for a machine pool, you cannot disable the option after the machine pool is created.
    --spot-max-price=<price>

    Optional: If you choose to use Spot Instances, you can specify this argument to define a maximum hourly price for a Spot Instance. If this argument is not specified, the on-demand price is used.

    Important

    Your Amazon EC2 Spot Instances might be interrupted at any time. Use Amazon EC2 Spot Instances only for workloads that can tolerate interruptions.

    --disk-size=<disk_size>
    Optional: Specifies the worker node disk size. The value can be in GB, GiB, TB, or TiB. Replace <disk_size> with a numeric value and unit, for example --disk-size=200GiB.
    --availability-zone=<availability_zone_name>

    Optional: For Multi-AZ clusters, you can create a machine pool in a Single-AZ of your choice. Replace <availability_zone_name> with a Single-AZ name.

    Note

    Multi-AZ clusters retain a Multi-AZ control plane and can have worker machine pools across a Single-AZ or Multi-AZ. Machine pools distribute machines (nodes) evenly across availability zones.

    Warning

    If you choose a worker machine pool with a Single-AZ, there is no fault tolerance for that machine pool, regardless of machine replica count. For fault-tolerant worker machine pools, choosing a Multi-AZ machine pool distributes machines in multiples of 3 across availability zones.

    • A Multi-AZ machine pool with three availability zones can have a machine count in multiples of 3 only, such as 3, 6, 9, and so on.
    • A Single-AZ machine pool with one availability zone can have a machine count in multiples of 1, such as 1, 2, 3, 4, and so on.
    --additional-security-group-ids <sec_group_id>
    Optional: For machine pools in clusters that do not have Red Hat managed VPCs, you can select additional custom security groups to use in your machine pools. You must have already created the security groups and associated them with the VPC that you selected for this cluster. You cannot add or edit security groups after you create the machine pool. For more information, see the requirements for security groups in the "Additional resources" section.
    --subnet <subnet_id>

    Optional: For BYO VPC clusters, you can select a subnet to create a Single-AZ machine pool. If the subnet is not one of the subnets that you specified at cluster creation, it must have a tag with the key kubernetes.io/cluster/<infra-id> and the value shared. You can obtain the Infra ID by running the following command:

    $ rosa describe cluster -c <cluster_name> | grep "Infra ID:"

    Example output

    Infra ID:                   mycluster-xqvj7

    Note

    You cannot set both --subnet and --availability-zone at the same time. Only one of them is allowed when you create a Single-AZ machine pool.

    The following example creates a machine pool called mymachinepool that uses the m5.xlarge instance type and has 2 compute node replicas. The example also adds 2 workload-specific labels:

    $ rosa create machinepool --cluster=mycluster --name=mymachinepool --replicas=2 --instance-type=m5.xlarge --labels=app=db,tier=backend

    Example output

    I: Machine pool 'mymachinepool' created successfully on cluster 'mycluster'
    I: To view all machine pools, run 'rosa list machinepools -c mycluster'

  • To add a machine pool that uses autoscaling, create the machine pool and define the autoscaling configuration, instance type and node labels:

    $ rosa create machinepool --cluster=<cluster-name> \
                              --name=<machine_pool_id> \
                              --enable-autoscaling \
                              --min-replicas=<minimum_replica_count> \
                              --max-replicas=<maximum_replica_count> \
                              --instance-type=<instance_type> \
                              --labels=<key>=<value>,<key>=<value> \
                              --taints=<key>=<value>:<effect>,<key>=<value>:<effect> \
                              --availability-zone=<availability_zone_name> \
                              --use-spot-instances \
                              --spot-max-price=<price>

    where:

    --name=<machine_pool_id>
    Specifies the name of the machine pool. Replace <machine_pool_id> with the name of your machine pool.
    --enable-autoscaling
    Enables autoscaling in the machine pool to meet the deployment needs.
    --min-replicas=<minimum_replica_count> and --max-replicas=<maximum_replica_count>

    Defines the minimum and maximum compute node limits. The cluster autoscaler does not reduce or increase the machine pool node count beyond the limits that you specify.

    If you deployed Red Hat OpenShift Service on AWS classic architecture using a single availability zone, the --min-replicas and --max-replicas arguments define the autoscaling limits in the machine pool for the zone. If you deployed your cluster using multiple availability zones, the arguments define the autoscaling limits in total across all zones and the counts must be multiples of 3.

    --instance-type=<instance_type>
    Optional: Sets the instance type for the compute nodes in your machine pool. The instance type defines the vCPU and memory allocation for each compute node in the pool. Replace <instance_type> with an instance type. The default is m5.xlarge. You cannot change the instance type for a machine pool after the pool is created.
    --labels=<key>=<value>,<key>=<value>
    Optional: Defines the labels for the machine pool. Replace <key>=<value>,<key>=<value> with a comma-delimited list of key-value pairs, for example --labels=key1=value1,key2=value2.
    --taints=<key>=<value>:<effect>,<key>=<value>:<effect>
    Optional: Defines the taints for the machine pool. Replace <key>=<value>:<effect>,<key>=<value>:<effect> with a key, value, and effect for each taint, for example --taints=key1=value1:NoSchedule,key2=value2:NoExecute. Available effects include NoSchedule, PreferNoSchedule, and NoExecute.
    --availability-zone=<availability_zone_name>
    Optional: For Multi-AZ clusters, you can create a machine pool in a Single-AZ of your choice. Replace <availability_zone_name> with a Single-AZ name.
    --use-spot-instances

    Optional: Configures your machine pool to deploy machines as non-guaranteed AWS Spot Instances. For information, see Amazon EC2 Spot Instances in the AWS documentation. If you specify --use-spot-instances for a machine pool, you cannot disable the option after the machine pool is created.

    Important

    Your Amazon EC2 Spot Instances might be interrupted at any time. Use Amazon EC2 Spot Instances only for workloads that can tolerate interruptions.

    --spot-max-price=<price>
    Optional: If you choose to use Spot Instances, you can specify this argument to define a maximum hourly price for a Spot Instance. If this argument is not specified, the on-demand price is used.

    The following example creates a machine pool called mymachinepool that uses the m5.xlarge instance type and has autoscaling enabled. The minimum compute node limit is 3 and the maximum is 6 overall. The example also adds 2 workload-specific labels:

    $ rosa create machinepool --cluster=mycluster --name=mymachinepool --enable-autoscaling --min-replicas=3 --max-replicas=6 --instance-type=m5.xlarge --labels=app=db,tier=backend

    Example output

    I: Machine pool 'mymachinepool' created successfully on cluster 'mycluster'
    I: To view all machine pools, run 'rosa list machinepools -c mycluster'
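
Before passing a value to --taints, it can help to check that each entry follows the <key>=<value>:<effect> form with one of the three listed effects. The following is a minimal sketch of such a check. The function names valid_taint and valid_taints are hypothetical and are not part of the rosa CLI, which performs its own validation.

```shell
# Hypothetical helper: succeed if a single taint has the form
# <key>=<value>:<effect> with one of the three documented effects.
valid_taint() {
  case $1 in
    ?*=*:NoSchedule|?*=*:PreferNoSchedule|?*=*:NoExecute) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check every entry of a comma-delimited taint list.
valid_taints() {
  old_ifs=$IFS
  IFS=,
  for taint in $1; do
    valid_taint "$taint" || { IFS=$old_ifs; return 1; }
  done
  IFS=$old_ifs
}

valid_taints "key1=value1:NoSchedule,key2=value2:NoExecute" && echo "taints ok"
```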

Verification

You can list all machine pools on your cluster or describe individual machine pools.

  1. List the available machine pools on your cluster:

    $ rosa list machinepools --cluster=<cluster_name>

    Example output

    ID             AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS                  TAINTS    AVAILABILITY ZONES                    SPOT INSTANCES
    Default        No           3         m5.xlarge                                        us-east-1a, us-east-1b, us-east-1c    N/A
    mymachinepool  Yes          3-6       m5.xlarge      app=db, tier=backend              us-east-1a, us-east-1b, us-east-1c    No

  2. Describe the information of a specific machine pool in your cluster:

    $ rosa describe machinepool --cluster=<cluster_name> --machinepool=mymachinepool

    Example output

    ID:                         mymachinepool
    Cluster ID:                 27iimopsg1mge0m81l0sqivkne2qu6dr
    Autoscaling:                Yes
    Replicas:                   3-6
    Instance type:              m5.xlarge
    Image type:                 Windows
    Labels:                     app=db, tier=backend
    Taints:
    Availability zones:         us-east-1a, us-east-1b, us-east-1c
    Subnets:
    Spot instances:             No
    Disk size:                  300 GiB
    Security Group IDs:

  3. Verify that the machine pool is included in the output and the configuration is as expected.
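
The check in the last step could also be scripted. The following sketch reads the table printed by rosa list machinepools on standard input and prints the REPLICAS value for one machine pool. The function name pool_replicas is hypothetical, and the sketch assumes that ID, AUTOSCALING, and REPLICAS are the first three columns and contain no spaces, as in the example output above.

```shell
# Hypothetical helper: read `rosa list machinepools` output on stdin and
# print the REPLICAS column for the machine pool with the given ID.
# Exits with a non-zero status if the machine pool is not listed.
# Assumes ID, AUTOSCALING, and REPLICAS are the first three columns.
pool_replicas() {
  awk -v id="$1" '
    NR > 1 && $1 == id { print $3; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Possible usage (requires a logged-in rosa CLI):
#   rosa list machinepools --cluster=<cluster_name> | pool_replicas mymachinepool
```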

4.2.2. Configuring machine pool disk volume

Machine pool disk volume size can be configured for additional flexibility. The default disk size is 300 GiB.

For Red Hat OpenShift Service on AWS classic architecture clusters version 4.13 or earlier, the disk size can be configured from a minimum of 128 GiB to a maximum of 1 TiB. For version 4.14 and later, the disk size can be configured from a minimum of 128 GiB to a maximum of 16 TiB.
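
The limits above can be expressed as a small range check. The following is a sketch only: the function name valid_disk_gib is hypothetical, and it assumes that the size is already expressed in GiB and that the cluster is a 4.x release identified by its minor version.

```shell
# Hypothetical helper: check a disk size in GiB against the documented limits.
# $1 = cluster minor version (for example 13 for version 4.13)
# $2 = requested disk size in GiB
valid_disk_gib() {
  minor=$1
  size=$2
  if [ "$minor" -le 13 ]; then
    max=1024     # 1 TiB for version 4.13 or earlier
  else
    max=16384    # 16 TiB for version 4.14 and later
  fi
  [ "$size" -ge 128 ] && [ "$size" -le "$max" ]
}

valid_disk_gib 14 2048 && echo "size allowed"
```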

You can configure the machine pool disk size for your cluster by using OpenShift Cluster Manager or the ROSA command-line interface (CLI) (rosa).

Note

Existing cluster and machine pool node volumes cannot be resized.

Prerequisite for cluster creation

  • You have the option to select the node disk sizing for the default machine pool during cluster installation.

Procedure for cluster creation

  1. From the Red Hat OpenShift Service on AWS classic architecture cluster wizard, navigate to Cluster settings.
  2. Navigate to the Machine pool step.
  3. Select the desired Root disk size.
  4. Select Next to continue creating your cluster.

Prerequisite for machine pool creation

  • You have the option to select the node disk sizing for the new machine pool after the cluster has been installed.

Procedure for machine pool creation

  1. Navigate to OpenShift Cluster Manager and select your cluster.
  2. Navigate to the Machine pool tab.
  3. Click Add machine pool.
  4. Select the desired Root disk size.
  5. Select Add machine pool to create the machine pool.

Prerequisite for cluster creation

  • You have the option to select the root disk sizing for the default machine pool during cluster installation.

Procedure for cluster creation

  • To set the desired root disk size, run the following command when you create your cluster:

    $ rosa create cluster --worker-disk-size=<disk_size>

    The value can be in GB, GiB, TB, or TiB. Replace <disk_size> with a numeric value and unit, for example --worker-disk-size=200GiB. You cannot separate the digit and the unit. No spaces are allowed.

Prerequisite for machine pool creation

  • You have the option to select the root disk sizing for the new machine pool after the cluster has been installed.

Procedure for machine pool creation

  1. Scale up the cluster by executing the following command:

    $ rosa create machinepool --cluster=<cluster_id> \
                              --disk-size=<disk_size>

    where:

    --cluster=<cluster_id>
    Specifies the ID or name of your existing OpenShift cluster.
    --disk-size=<disk_size>
    Specifies the worker node disk size. The value can be in GB, GiB, TB, or TiB. Replace <disk_size> with a numeric value and unit, for example --disk-size=200GiB. You cannot separate the digit and the unit. No spaces are allowed.
  2. Confirm the new machine pool disk volume size by logging in to the AWS console and finding the EC2 virtual machine root volume size.
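
Because the digits and the unit must be written together without spaces, a script can check the format of a value before passing it to --disk-size or --worker-disk-size. The following is a minimal sketch. The function name valid_disk_size_format is hypothetical, and it checks only the format, not the allowed size range.

```shell
# Hypothetical helper: succeed if the value is digits immediately followed
# by one of the accepted units (GB, GiB, TB, or TiB), with no spaces.
valid_disk_size_format() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(GB|GiB|TB|TiB)$'
}

valid_disk_size_format "200GiB" && echo "format ok"
valid_disk_size_format "200 GiB" || echo "format rejected"
```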

4.2.3. Deleting a machine pool

You can delete a machine pool if your workload requirements have changed and your current machine pools no longer meet your needs.

You can delete machine pools using Red Hat OpenShift Cluster Manager or the ROSA command-line interface (CLI) (rosa).

You can delete a machine pool for your Red Hat OpenShift Service on AWS classic architecture cluster by using Red Hat OpenShift Cluster Manager.

Prerequisites

  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.
  • The cluster is in the ready state.
  • You have an existing machine pool without any taints and with at least two instances for a Single-AZ cluster or three instances for a Multi-AZ cluster.

Procedure

  1. From OpenShift Cluster Manager, navigate to the Cluster List page and select the cluster that contains the machine pool that you want to delete.
  2. On the selected cluster, select the Machine pools tab.
  3. Under the Machine pools tab, click the Options menu (kebab icon) for the machine pool that you want to delete.
  4. Click Delete.

    The selected machine pool is deleted.

You can delete a machine pool for your Red Hat OpenShift Service on AWS classic architecture cluster by using the ROSA command-line interface (CLI) (rosa).

Note

For users of rosa version 1.2.25 and earlier versions, the machine pool (ID='Default') that is created along with the cluster cannot be deleted. For users of rosa version 1.2.26 and later, the machine pool (ID='worker') that is created along with the cluster can be deleted if there is one machine pool within the cluster that contains no taints, and at least two replicas for a Single-AZ cluster or three replicas for a Multi-AZ cluster.

Prerequisites

  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.
  • The cluster is in the ready state.
  • You have an existing machine pool without any taints and with at least two instances for a Single-AZ cluster or three instances for a Multi-AZ cluster.

Procedure

  1. From the ROSA CLI, run the following command:

    $ rosa delete machinepool -c=<cluster_name> <machine_pool_ID>

    Example output

    ? Are you sure you want to delete machine pool <machine_pool_ID> on cluster <cluster_name>? (y/N)

  2. Enter y to delete the machine pool.

    The selected machine pool is deleted.
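
The prerequisite about the remaining machine pool can be sketched as a simple check. The function name meets_delete_prereq is hypothetical. The sketch assumes that you already know the taint count and replica count of the machine pool that will remain, for example from the output of rosa list machinepools.

```shell
# Hypothetical helper: decide whether a remaining machine pool satisfies
# the documented prerequisite for deleting another machine pool.
# $1 = "single" or "multi" (cluster availability zone layout)
# $2 = number of taints on the remaining machine pool
# $3 = replica count of the remaining machine pool
meets_delete_prereq() {
  case $1 in
    single) min=2 ;;
    multi)  min=3 ;;
    *)      return 2 ;;   # unknown layout
  esac
  [ "$2" -eq 0 ] && [ "$3" -ge "$min" ]
}

meets_delete_prereq multi 0 3 && echo "prerequisite met"
```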

4.2.4. Scaling compute nodes manually

If you have not enabled autoscaling for your machine pool, you can manually scale the number of compute (also known as worker) nodes in the pool to meet your deployment needs.

You must scale each machine pool separately.

Prerequisites

  • You installed and configured the latest ROSA command-line interface (CLI) (rosa) on your workstation.
  • You logged in to your Red Hat account using the ROSA CLI.
  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.
  • You have an existing machine pool.

Procedure

  1. List the machine pools in the cluster:

    $ rosa list machinepools --cluster=<cluster_name>

    Example output

    ID        AUTOSCALING   REPLICAS    INSTANCE TYPE  LABELS    TAINTS   AVAILABILITY ZONES    DISK SIZE   SG IDs
    default   No            2           m5.xlarge                         us-east-1a            300GiB      sg-0e375ff0ec4a6cfa2
    mp1       No            2           m5.xlarge                         us-east-1a            300GiB      sg-0e375ff0ec4a6cfa2

  2. Increase or decrease the number of compute node replicas in a machine pool:

    $ rosa edit machinepool --cluster=<cluster_name> \
                            --replicas=<replica_count> \
                            <machine_pool_id>

    where:

    --replicas=<replica_count>
    If you deployed Red Hat OpenShift Service on AWS classic architecture using a single availability zone, the replica count defines the number of compute nodes to provision to the machine pool for the zone. If you deployed your cluster using multiple availability zones, the count defines the total number of compute nodes in the machine pool across all zones and must be a multiple of 3.
    <machine_pool_id>
    Replace <machine_pool_id> with the ID of your machine pool, as listed in the output of the preceding command.

Verification

  1. List the available machine pools in your cluster:

    $ rosa list machinepools --cluster=<cluster_name>

    Example output

    ID        AUTOSCALING   REPLICAS    INSTANCE TYPE  LABELS    TAINTS   AVAILABILITY ZONES    DISK SIZE   SG IDs
    default   No            2           m5.xlarge                         us-east-1a            300GiB      sg-0e375ff0ec4a6cfa2
    mp1       No            3           m5.xlarge                         us-east-1a            300GiB      sg-0e375ff0ec4a6cfa2

  2. In the output of the preceding command, verify that the compute node replica count is as expected for your machine pool. In the example output, the compute node replica count for the mp1 machine pool is scaled to 3.
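
The replica count rule for --replicas can be checked before running rosa edit machinepool. The following is a sketch under stated assumptions: the function name valid_replica_count is hypothetical, the first argument describes whether the machine pool spans a single zone or multiple zones, and counts are written in plain decimal without leading zeros.

```shell
# Hypothetical helper: check a replica count against the documented rule.
# $1 = "single" or "multi" (zones spanned by the machine pool)
# $2 = requested replica count
valid_replica_count() {
  case $2 in
    ''|*[!0-9]*) return 1 ;;            # not a non-negative integer
  esac
  case $1 in
    single) return 0 ;;                  # any count is accepted
    multi)  [ $(( $2 % 3 )) -eq 0 ] ;;   # must be a multiple of 3
    *)      return 2 ;;                  # unknown layout
  esac
}

valid_replica_count multi 6 && echo "count accepted"
valid_replica_count multi 4 || echo "count rejected"
```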

4.2.5. Node labels

A label is a key-value pair applied to a Node object. You can use labels to organize sets of objects and control the scheduling of pods.

You can add labels during cluster creation or after. Labels can be modified or updated at any time.

4.2.5.1. Adding node labels to a machine pool

Add or edit labels for compute (also known as worker) nodes at any time to manage the nodes in a manner that is relevant to you. For example, you can assign types of workloads to specific nodes.

Labels are assigned as key-value pairs. Each key must be unique to the object it is assigned to.

Prerequisites

  • You installed and configured the latest ROSA command-line interface (CLI) (rosa) on your workstation.
  • You logged in to your Red Hat account using the ROSA CLI.
  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.
  • You have an existing machine pool.

Procedure

  1. List the machine pools in the cluster:

    $ rosa list machinepools --cluster=<cluster_name>

    Example output

    ID           AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS    TAINTS    AVAILABILITY ZONES    SPOT INSTANCES
    Default      No           2         m5.xlarge                          us-east-1a            N/A
    db-nodes-mp  No           2         m5.xlarge                          us-east-1a            No

  2. Add or update the node labels for a machine pool:

    • To add or update node labels for a machine pool that does not use autoscaling, run the following command:

      $ rosa edit machinepool --cluster=<cluster_name> \
                              --labels=<key>=<value>,<key>=<value> \
                              <machine_pool_id>

      where:

      --labels=<key>=<value>,<key>=<value>
      Replace <key>=<value>,<key>=<value> with a comma-delimited list of key-value pairs, for example --labels=key1=value1,key2=value2. This list overwrites any modifications made to node labels on an ongoing basis.

      The following example adds labels to the db-nodes-mp machine pool:

      $ rosa edit machinepool --cluster=mycluster --replicas=2 --labels=app=db,tier=backend db-nodes-mp

      Example output

      I: Updated machine pool 'db-nodes-mp' on cluster 'mycluster'

Verification

  1. Describe the details of the machine pool with the new labels:

    $ rosa describe machinepool --cluster=<cluster_name> --machinepool=<machine-pool-name>

    Example output

    ID:                         db-nodes-mp
    Cluster ID:                 <ID_of_cluster>
    Autoscaling:                No
    Replicas:                   2
    Instance type:              m5.xlarge
    Labels:                     app=db, tier=backend
    Taints:
    Availability zones:         us-east-1a
    Subnets:
    Spot instances:             No
    Disk size:                  300 GiB
    Security Group IDs:

  2. Verify that the labels are included for your machine pool in the output.
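Because the labels are applied to the nodes in the machine pool, a workload can target the pool with a node selector. The following is a minimal sketch that prints a pod manifest selecting the app=db and tier=backend labels from the example above. The function name labeled_pod_manifest, the pod name db-client, and the image are assumptions made for illustration only.

```shell
# Print a pod manifest whose nodeSelector matches the machine pool labels.
# The pod name "db-client" and the image are placeholders for illustration.
labeled_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: db-client
spec:
  nodeSelector:
    app: db
    tier: backend
  containers:
  - name: db-client
    image: fedora:latest
    command: ["sleep", "3600"]
EOF
}

# Print the manifest so that it can be reviewed or piped to another command.
labeled_pod_manifest
```

To create the pod, you could pipe the output to oc create -f -. To list the nodes that carry both labels, you could run oc get nodes -l app=db,tier=backend.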

4.2.6. Adding tags to a machine pool

You can add tags for compute nodes, also known as worker nodes, in a machine pool to introduce custom user tags for the AWS resources that are generated when you provision the machine pool.

You can add tags to a machine pool for your Red Hat OpenShift Service on AWS classic architecture cluster by using the ROSA command-line interface (CLI) (rosa). You cannot edit the tags after you create the machine pool.

Important

You must ensure that your tag keys are not aws, red-hat-managed, red-hat-clustertype, or Name. In addition, you must not set a tag key that begins with kubernetes.io/cluster/. Your tag’s key cannot be longer than 128 characters, while your tag’s value cannot be longer than 256 characters. Red Hat reserves the right to add additional reserved tags in the future.
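Because tags cannot be edited after the machine pool is created, it can help to check them before you run the create command. The following is a minimal sketch of such a check, covering only the constraints listed above. The validate_tag function is a hypothetical helper and is not part of the ROSA CLI, and the length check counts bytes in some shells, so it assumes ASCII keys and values.

```shell
# validate_tag KEY VALUE
# Returns 0 if the pair satisfies the constraints listed above; otherwise
# prints the reason and returns 1. Hypothetical helper, not a rosa command.
validate_tag() {
  key=$1
  value=$2
  case $key in
    aws|red-hat-managed|red-hat-clustertype|Name)
      echo "reserved tag key: $key"
      return 1 ;;
    kubernetes.io/cluster/*)
      echo "reserved tag key prefix: kubernetes.io/cluster/"
      return 1 ;;
  esac
  if [ "${#key}" -gt 128 ]; then
    echo "tag key longer than 128 characters"
    return 1
  fi
  if [ "${#value}" -gt 256 ]; then
    echo "tag value longer than 256 characters"
    return 1
  fi
  return 0
}

validate_tag tagkey1 tagvalue1 && echo "tagkey1 is valid"
validate_tag Name web || true   # prints "reserved tag key: Name"
```

The sketch checks only the reserved keys named in this section. Red Hat might reserve additional keys in the future, so a passing check does not guarantee that the service accepts the tag.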

Prerequisites

  • You installed and configured the latest AWS (aws), ROSA (rosa), and OpenShift (oc) CLIs on your workstation.
  • You logged in to your Red Hat account by using the ROSA CLI.
  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.

Procedure

  • Create a machine pool with a custom tag by running the following command:

    $ rosa create machinepools --cluster=<name> --replicas=<replica_count> \
         --name <mp_name> --tags='<key> <value>,<key> <value>'

    Replace <key> <value>,<key> <value> with a key and value for each tag.

    Example output

    $ rosa create machinepools --cluster=mycluster --replicas 2 --name mp-1 --tags='tagkey1 tagvalue1,tagkey2 tagvaluev2'
    
    I: Checking available instance types for machine pool 'mp-1'
    I: Machine pool 'mp-1' created successfully on cluster 'mycluster'
    I: To view the machine pool details, run 'rosa describe machinepool --cluster mycluster --machinepool mp-1'
    I: To view all machine pools, run 'rosa list machinepools --cluster mycluster'

Verification

  • Use the describe command to see the details of the machine pool with the tags, and verify that the tags are included for your machine pool in the output:

    $ rosa describe machinepool --cluster=<cluster_name> --machinepool=<machinepool_name>

    Example output

    ID:                                    mp-1
    Cluster ID:                            2baiirqa2141oreotoivp4sipq84vp5g
    Autoscaling:                           No
    Replicas:                              2
    Instance type:                         m5.xlarge
    Labels:
    Taints:
    Availability zones:                    us-east-1a
    Subnets:
    Spot instances:                        No
    Disk size:                             300 GiB
    Additional Security Group IDs:
    Tags:                                  red-hat-clustertype=rosa, red-hat-managed=true, tagkey1=tagvalue1, tagkey2=tagvaluev2

4.2.7. Adding taints to a machine pool

You can add taints for compute (also known as worker) nodes in a machine pool to control which pods are scheduled to them. When you apply a taint to a machine pool, the scheduler cannot place a pod on the nodes in the pool unless the pod specification includes a toleration for the taint. Taints can be added to a machine pool using Red Hat OpenShift Cluster Manager or the ROSA command-line interface (CLI) (rosa).

Note

A cluster must have at least one machine pool that does not contain any taints.

You can add taints to a machine pool for your Red Hat OpenShift Service on AWS classic architecture cluster by using Red Hat OpenShift Cluster Manager.

Prerequisites

  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.
  • You have an existing machine pool that does not contain any taints and contains at least two instances.

Procedure

  1. Navigate to OpenShift Cluster Manager and select your cluster.
  2. Under the Machine pools tab, click the Options menu (kebab icon) for the machine pool that you want to add a taint to.
  3. Select Edit taints.
  4. Add Key and Value entries for your taint.
  5. Select an Effect for your taint from the list. Available options include NoSchedule, PreferNoSchedule, and NoExecute.
  6. Optional: Select Add taint if you want to add more taints to the machine pool.
  7. Click Save to apply the taints to the machine pool.

Verification

  1. Under the Machine pools tab, select > next to your machine pool to expand the view.
  2. Verify that your taints are listed under Taints in the expanded view.

You can add taints to a machine pool for your Red Hat OpenShift Service on AWS classic architecture cluster by using the ROSA command-line interface (CLI) (rosa).

Note

For users of rosa version 1.2.25 and prior versions, the number of taints cannot be changed within the machine pool (ID=Default) created along with the cluster. For users of rosa version 1.2.26 and beyond, the number of taints can be changed within the machine pool (ID=worker) created along with the cluster. There must be at least one machine pool without any taints and with at least two replicas for a Single-AZ cluster or three replicas for a Multi-AZ cluster.

Prerequisites

  • You installed and configured the latest AWS (aws), ROSA (rosa), and OpenShift (oc) CLIs on your workstation.
  • You logged in to your Red Hat account by using the rosa CLI.
  • You created a Red Hat OpenShift Service on AWS classic architecture cluster.
  • You have an existing machine pool that does not contain any taints and contains at least two instances.

Procedure

  1. List the machine pools in the cluster by running the following command:

    $ rosa list machinepools --cluster=<cluster_name>
  2. Add or update the taints for a machine pool:

    • To add or update taints for a machine pool that does not use autoscaling, run the following command:

      $ rosa edit machinepool --cluster=<cluster_name> \
                              --taints=<key>=<value>:<effect>,<key>=<value>:<effect> \
                              <machine_pool_id>

      Replace <key>=<value>:<effect>,<key>=<value>:<effect> with a key, value, and effect for each taint, for example --taints=key1=value1:NoSchedule,key2=value2:NoExecute. Available effects include NoSchedule, PreferNoSchedule, and NoExecute. This list overwrites any modifications made to node taints on an ongoing basis.

      The following example adds taints to the db-nodes-mp machine pool:

      $ rosa edit machinepool --cluster=mycluster --replicas 2 --taints=key1=value1:NoSchedule,key2=value2:NoExecute db-nodes-mp

      Example output

      I: Updated machine pool 'db-nodes-mp' on cluster 'mycluster'

Verification

  1. Describe the details of the machine pool with the new taints:

    $ rosa describe machinepool --cluster=<cluster_name> --machinepool=<machinepool_name>

    Example output

    ID:                         db-nodes-mp
    Cluster ID:                 <ID_of_cluster>
    Autoscaling:                No
    Replicas:                   2
    Instance type:              m5.xlarge
    Labels:
    Taints:                     key1=value1:NoSchedule, key2=value2:NoExecute
    Availability zones:         us-east-1a
    Subnets:
    Spot instances:             No
    Disk size:                  300 GiB
    Security Group IDs:

  2. Verify that the taints are included for your machine pool in the output.
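After the taints are applied, only pods that tolerate them can be scheduled to the nodes in the machine pool. The following is a minimal sketch that prints a pod manifest with tolerations matching the key1=value1:NoSchedule and key2=value2:NoExecute taints from the example above. The function name tolerating_pod_manifest, the pod name, and the image are assumptions made for illustration only.

```shell
# Print a pod manifest that tolerates both example taints.
# The pod name "tolerant-pod" and the image are placeholders for illustration.
tolerating_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoSchedule
  - key: key2
    operator: Equal
    value: value2
    effect: NoExecute
  containers:
  - name: tolerant-pod
    image: fedora:latest
    command: ["sleep", "3600"]
EOF
}

# Print the manifest so that it can be reviewed or piped to another command.
tolerating_pod_manifest
```

A toleration only permits scheduling on the tainted nodes; it does not force the pod onto them. To pin a workload to the machine pool, you would typically combine tolerations with node labels and a node selector.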

4.2.8. Additional resources

4.3. Configuring machine pools in Local Zones

This document describes how to configure Local Zones in machine pools with Red Hat OpenShift Service on AWS classic architecture.

4.3.1. Configuring machine pools in Local Zones

Use the following steps to configure machine pools in Local Zones.

Important

AWS Local Zones are supported on Red Hat OpenShift Service on AWS classic architecture 4.12. See the Red Hat Knowledgebase article for information on how to enable Local Zones.

Prerequisites

  • Red Hat OpenShift Service on AWS classic architecture is generally available in the parent region of choice. See the AWS generally available locations list to determine the Local Zone available to specific AWS regions.
  • The Red Hat OpenShift Service on AWS classic architecture cluster was initially built in an existing Amazon VPC (BYO-VPC).
  • The maximum transmission unit (MTU) for the Red Hat OpenShift Service on AWS classic architecture cluster is set at 1200.

    Important

    Generally, the Maximum Transmission Unit (MTU) between an Amazon EC2 instance in a Local Zone and an Amazon EC2 instance in the Region is 1300. See How Local Zones work in the AWS documentation. The cluster network MTU must always be less than the EC2 MTU to account for the overhead. The specific overhead is determined by your network plugin, for example:

      • OVN-Kubernetes: 100 bytes
      • OpenShift SDN: 50 bytes

    The network plugin could provide additional features that may also decrease the MTU. Check the documentation for additional information.

  • The AWS account has Local Zones enabled.
  • The AWS account has a Local Zone subnet for the same VPC as the cluster.
  • The AWS account has a subnet that is associated with a routing table that has a route to a NAT gateway.
  • The AWS account has the tag kubernetes.io/cluster/<infra_id>: shared on the associated subnet.
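The MTU prerequisite above follows from subtracting the network plugin overhead from the EC2 MTU. The following is a minimal sketch of that arithmetic, using the 1300-byte EC2 MTU and the plugin overheads stated in the prerequisites. The cluster_mtu function is a hypothetical helper.

```shell
# cluster_mtu EC2_MTU OVERHEAD
# Prints the largest cluster network MTU, in bytes, that leaves room
# for the network plugin overhead.
cluster_mtu() {
  echo $(( $1 - $2 ))
}

EC2_MTU=1300   # MTU between a Local Zone instance and the Region

echo "OVN-Kubernetes: $(cluster_mtu "$EC2_MTU" 100)"   # prints "OVN-Kubernetes: 1200"
echo "OpenShift SDN: $(cluster_mtu "$EC2_MTU" 50)"     # prints "OpenShift SDN: 1250"
```

The OVN-Kubernetes result matches the cluster MTU of 1200 that the prerequisites require. Additional network plugin features might reduce the usable MTU further.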

Procedure

  1. Create a machine pool on the cluster by running the following ROSA CLI (rosa) command.

    $ rosa create machinepool -c <cluster-name> -i
  2. Add the subnet and instance type for the machine pool in the ROSA CLI. After several minutes, the cluster will provision the nodes.

    I: Enabling interactive mode
    ? Machine pool name: xx-lz-xx
    ? Create multi-AZ machine pool: No
    ? Select subnet for a single AZ machine pool (optional): Yes
    ? Subnet ID: subnet-<a> (region-info)
    ? Enable autoscaling (optional): No
    ? Replicas: 2
    I: Fetching instance types
    ? disk-size (optional):

    where:

    Enabling interactive mode
    Enables interactive mode.
    Machine pool name
    Names the machine pool. The name is limited to alphanumeric characters and a maximum length of 30 characters.
    Create multi-AZ machine pool
    Set this option to no.
    Select subnet for a single AZ machine pool
    Set this option to yes.
    Subnet ID
    Selects a subnet ID from the list.
    Enable autoscaling
    Select yes to enable autoscaling or no to disable autoscaling.
    Replicas
    Selects the number of machines for the machine pool. This number can be anywhere from 1 to 180.
    Fetching instance types
    Selects an instance type from the list. Only instance types that are supported in the selected Local Zone appear.
    disk-size
    Optional: Specifies the worker node disk size. The value can be in GB, GiB, TB, or TiB. Set a numeric value and unit, for example '200GiB'. You cannot separate the digit and the unit. No spaces are allowed.
  3. Provide the subnet ID to provision the machine pool in the Local Zone.

See the AWS Local Zones locations list on AWS for generally available and announced AWS Local Zone locations.

4.4. About autoscaling nodes on a cluster

The autoscaler option can be configured to automatically scale the number of machines in a machine pool.

The cluster autoscaler increases the size of the machine pool when there are pods that failed to schedule on any of the current nodes due to insufficient resources or when another node is necessary to meet deployment needs. The cluster autoscaler does not increase the cluster resources beyond the limits that you specify.

Additionally, the cluster autoscaler decreases the size of the machine pool when some nodes are consistently not needed for a significant period, such as when a node has low resource use and all of its important pods can fit on other nodes.

When you enable autoscaling, you must also set a minimum and maximum number of worker nodes.

Note

Only cluster owners and organization admins can scale or delete a cluster.

4.4.1. Enabling autoscaling nodes on a cluster

You can enable autoscaling on worker nodes to increase or decrease the number of nodes available by editing the machine pool definition for an existing cluster.

Enable autoscaling for worker nodes in the machine pool definition from OpenShift Cluster Manager console.

Procedure

  1. From OpenShift Cluster Manager, navigate to the Cluster List page and select the cluster that you want to enable autoscaling for.
  2. On the selected cluster, select the Machine pools tab.
  3. Click the Options menu (kebab icon) at the end of the machine pool that you want to enable autoscaling for and select Edit.
  4. On the Edit machine pool dialog, select the Enable autoscaling checkbox.
  5. Select Save to save these changes and enable autoscaling for the machine pool.

Note

Additionally, you can configure autoscaling on the default machine pool when you create the cluster using interactive mode.

Configure autoscaling to dynamically scale the number of worker nodes up or down based on load.

Successful autoscaling is dependent on having the correct AWS resource quotas in your AWS account. Verify resource quotas and request quota increases from the AWS console.

Procedure

  1. To identify the machine pool IDs in a cluster, enter the following command:

    $ rosa list machinepools --cluster=<cluster_name>

    Example output

    ID      AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS    TAINTS    AVAILABILITY ZONES    SUBNETS    SPOT INSTANCES  DISK SIZE  SG IDs
    worker  No           2         m5.xlarge                          us-east-2a                       No              300 GiB
    mp1     No           2         m5.xlarge                          us-east-2a                       No              300 GiB

  2. Get the ID of the machine pools that you want to configure.
  3. To enable autoscaling on a machine pool, enter the following command:

    $ rosa edit machinepool --cluster=<cluster_name> <machinepool_ID> --enable-autoscaling --min-replicas=<number> --max-replicas=<number>

    Example

    Enable autoscaling on a machine pool with the ID mp1 on a cluster named mycluster, with the number of replicas set to scale between 2 and 5 worker nodes:

    $ rosa edit machinepool --cluster=mycluster mp1 --enable-autoscaling --min-replicas=2 --max-replicas=5
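If you script this step, you might check the replica bounds before calling the CLI. The following is a minimal sketch that validates the bounds and prints the command shown above without running it. The autoscaling_command function is a hypothetical helper and is not part of the ROSA CLI, and the check that the minimum does not exceed the maximum is an assumption about sensible input rather than documented CLI behavior.

```shell
# autoscaling_command CLUSTER MACHINEPOOL_ID MIN MAX
# Prints the rosa command from the step above after checking the bounds.
# Hypothetical helper; it does not call rosa itself.
autoscaling_command() {
  cluster=$1
  pool=$2
  min=$3
  max=$4
  if [ "$min" -gt "$max" ]; then
    echo "min replicas ($min) must not exceed max replicas ($max)" >&2
    return 1
  fi
  echo "rosa edit machinepool --cluster=$cluster $pool --enable-autoscaling --min-replicas=$min --max-replicas=$max"
}

autoscaling_command mycluster mp1 2 5
```

The printed line can be reviewed and then run as is.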

4.4.2. Disabling autoscaling nodes on a cluster

You can disable autoscaling on worker nodes by editing the machine pool definition for an existing cluster. After you disable autoscaling, the machine pool uses a fixed number of replicas.

You can disable autoscaling on a cluster using Red Hat OpenShift Cluster Manager or the ROSA command-line interface (CLI) (rosa).

Note

Additionally, you can configure autoscaling on the default machine pool when you create the cluster using interactive mode.

Disable autoscaling for worker nodes in the machine pool definition from OpenShift Cluster Manager.

Procedure

  1. From OpenShift Cluster Manager, navigate to the Cluster List page and select the cluster with autoscaling that must be disabled.
  2. On the selected cluster, select the Machine pools tab.
  3. Click the Options menu (kebab icon) at the end of the machine pool with autoscaling and select Edit.
  4. On the Edit machine pool dialog, deselect the Enable autoscaling checkbox.
  5. Select Save to save these changes and disable autoscaling from the machine pool.

Disable autoscaling for worker nodes in the machine pool definition using the ROSA command-line interface (CLI) (rosa).

Procedure

  • Enter the following command:

    $ rosa edit machinepool --cluster=<cluster_name> <machinepool_ID> --enable-autoscaling=false --replicas=<number>

    Example

    Disable autoscaling on the default machine pool on a cluster named mycluster:

    $ rosa edit machinepool --cluster=mycluster default --enable-autoscaling=false --replicas=3

As a cluster administrator, you can help your clusters operate efficiently through managing application memory by:

  • Determining the memory and risk requirements of a containerized application component and configuring the container memory parameters to suit those requirements.
  • Configuring containerized application runtimes (for example, OpenJDK) to adhere optimally to the configured container memory parameters.
  • Diagnosing and resolving memory-related error conditions associated with running in a container.

You can review the following concepts to learn how Red Hat OpenShift Service on AWS classic architecture manages compute resources so that you can learn how to keep your cluster running efficiently.

For each kind of resource (memory, CPU, storage), Red Hat OpenShift Service on AWS classic architecture allows optional request and limit values to be placed on each container in a pod.

Note the following information about memory requests and memory limits:

  • Memory request

    • The memory request value, if specified, influences the Red Hat OpenShift Service on AWS classic architecture scheduler. The scheduler considers the memory request when scheduling a container to a node, then fences off the requested memory on the chosen node for the use of the container.
    • If a node’s memory is exhausted, Red Hat OpenShift Service on AWS classic architecture prioritizes evicting its containers whose memory usage most exceeds their memory request. In serious cases of memory exhaustion, the node OOM killer might select and kill a process in a container based on a similar metric.
    • The cluster administrator can assign quota or assign default values for the memory request value.
    • The cluster administrator can override the memory request values that a developer specifies, to manage cluster overcommit.
  • Memory limit

    • The memory limit value, if specified, provides a hard limit on the memory that can be allocated across all the processes in a container.
    • If the memory allocated by all of the processes in a container exceeds the memory limit, the node Out of Memory (OOM) killer immediately selects and kills a process in the container.
    • If both memory request and limit are specified, the memory limit value must be greater than or equal to the memory request.
    • The cluster administrator can assign quota or assign default values for the memory limit value.
    • The minimum memory limit is 12 MB. If a container fails to start due to a Cannot allocate memory pod event, the memory limit is too low. Either increase or remove the memory limit. Removing the limit allows pods to consume unbounded node resources.

The steps for sizing application memory on Red Hat OpenShift Service on AWS classic architecture are as follows:

  1. Determine expected container memory usage

    Determine expected mean and peak container memory usage. For example, you could perform separate load testing. Remember to consider all the processes that could potentially run in parallel in the container, such as any ancillary scripts that might be spawned by the main application.

  2. Determine risk appetite

    Determine risk appetite for eviction. If the risk appetite is low, the container should request memory according to the expected peak usage plus a percentage safety margin. If the risk appetite is higher, it might be more appropriate to request memory according to the expected mean usage.

  3. Set container memory request

    Set the container memory request based on the above. The request should represent the application memory usage as accurately as possible. If the request is too high, cluster and quota usage will be inefficient. If the request is too low, the chances of application eviction increase.

  4. Set container memory limit, if required

    Set the container memory limit, if required. Setting a limit has the effect of immediately killing a container process if the combined memory usage of all processes in the container exceeds the limit. Setting a limit might make unanticipated excess memory usage obvious early (fail fast). However, setting a limit also terminates processes abruptly.

    Note that some Red Hat OpenShift Service on AWS classic architecture clusters might require a limit value to be set; some might override the request based on the limit; and some application images rely on a limit value being set as this is easier to detect than a request value.

    If the memory limit is set, it should not be set to less than the expected peak container memory usage plus a percentage safety margin.

  5. Ensure applications are tuned

    Ensure your applications are tuned with respect to configured request and limit values, if appropriate. This step is particularly relevant to applications which pool memory, such as the JVM. The rest of this page discusses this.
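The sizing steps above can be sketched as simple arithmetic. The following example is a minimal sketch, assuming memory values in MiB and a two-level risk appetite (low or high). The function names and the sample figures of 300 MiB mean usage, 400 MiB peak usage, and a 20% safety margin are hypothetical.

```shell
# memory_request_mib MEAN PEAK MARGIN_PCT RISK
# Prints a suggested memory request in MiB:
#   RISK=low  -> expected peak usage plus a percentage safety margin
#   any other -> expected mean usage
memory_request_mib() {
  mean=$1
  peak=$2
  margin_pct=$3
  risk=$4
  if [ "$risk" = "low" ]; then
    echo $(( peak + peak * margin_pct / 100 ))
  else
    echo "$mean"
  fi
}

# memory_limit_floor_mib PEAK MARGIN_PCT
# Prints the lowest limit suggested by step 4: peak usage plus the margin.
memory_limit_floor_mib() {
  echo $(( $1 + $1 * $2 / 100 ))
}

echo "request (low risk appetite):  $(memory_request_mib 300 400 20 low) MiB"    # 480
echo "request (high risk appetite): $(memory_request_mib 300 400 20 high) MiB"   # 300
echo "limit, if set, at least:      $(memory_limit_floor_mib 400 20) MiB"        # 480
```

The results are starting points only. Load testing remains the way to confirm the mean and peak figures that go into the calculation.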

You can review the following concepts to learn about how to deploy OpenJDK applications in your cluster effectively.

The default OpenJDK settings do not work well with containerized environments. As a result, some additional Java memory settings must always be provided whenever running the OpenJDK in a container.

The JVM memory layout is complex, version dependent, and describing it in detail is beyond the scope of this documentation. However, as a starting point for running OpenJDK in a container, at least the following three memory-related tasks are key:

Overriding the JVM maximum heap size

OpenJDK defaults to using a maximum of 25% of available memory (recognizing any container memory limits in place) for heap memory. This default value is conservative, and, in a properly-configured container environment, would result in 75% of the memory assigned to a container being mostly unused. A much higher percentage for the JVM to use for heap memory, such as 80%, is more suitable in a container context where memory limits are imposed on the container level.

Most of the Red Hat containers include a startup script that replaces the OpenJDK default by setting updated values when the JVM launches.

For example, the Red Hat build of OpenJDK containers have a default value of 80%. This value can be set to a different percentage by defining the JAVA_MAX_RAM_RATIO environment variable.

For other OpenJDK deployments, the default value of 25% can be changed using the following command:

Example

$ java -XX:MaxRAMPercentage=80.0
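To see what the percentage means in practice, the following minimal sketch estimates the maximum heap size for a container memory limit of 512 MiB, which is an assumed value for illustration. The max_heap_mib function is a hypothetical helper, and the result is rounded down. The heap size that the JVM actually selects can differ slightly.

```shell
# max_heap_mib LIMIT_MIB PERCENT
# Prints the approximate maximum heap size in MiB, rounded down.
max_heap_mib() {
  echo $(( $1 * $2 / 100 ))
}

echo "25% of 512 MiB: $(max_heap_mib 512 25) MiB"   # prints "25% of 512 MiB: 128 MiB"
echo "80% of 512 MiB: $(max_heap_mib 512 80) MiB"   # prints "80% of 512 MiB: 409 MiB"
```

With the 25% default, roughly three quarters of the container memory stays outside the heap, which is the unused capacity that the text above describes.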

Encouraging the JVM to release unused memory to the operating system, if appropriate

By default, the OpenJDK does not aggressively return unused memory to the operating system. This could be appropriate for many containerized Java workloads, but notable exceptions include workloads where additional active processes co-exist with a JVM within a container, whether those additional processes are native, additional JVMs, or a combination of the two.

Java-based agents can use the following JVM arguments to encourage the JVM to release unused memory to the operating system:

-XX:+UseParallelGC
-XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4
-XX:AdaptiveSizePolicyWeight=90

These arguments are intended to return heap memory to the operating system whenever allocated memory exceeds 110% of in-use memory (-XX:MaxHeapFreeRatio), spending up to 20% of CPU time in the garbage collector (-XX:GCTimeRatio). At no time will the application heap allocation be less than the initial heap allocation (overridden by -XX:InitialHeapSize / -Xms). Detailed additional information is available in Tuning Java’s footprint in OpenShift (Part 1), Tuning Java’s footprint in OpenShift (Part 2), and OpenJDK and Containers.
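The percentages follow from the flag values. The following minimal sketch shows the arithmetic, assuming the usual reading of the flags: -XX:GCTimeRatio=N targets a garbage collection share of 1 / (1 + N) of total time, and -XX:MaxHeapFreeRatio=M caps free heap at M percent of the allocated heap. The function names are hypothetical, and the results are rounded down.

```shell
# gc_time_percent GC_TIME_RATIO
# Prints the share of time that may be spent in garbage collection, in percent.
gc_time_percent() {
  echo $(( 100 / (1 + $1) ))
}

# max_allocated_percent MAX_HEAP_FREE_RATIO
# Prints allocated heap as a percentage of in-use heap when free heap
# reaches the configured maximum share of the allocated heap.
max_allocated_percent() {
  echo $(( 100 * 100 / (100 - $1) ))
}

echo "GCTimeRatio=4: up to $(gc_time_percent 4)% of time in GC"                         # 20
echo "MaxHeapFreeRatio=10: about $(max_allocated_percent 10)% of in-use memory"         # 111
```

The second result is approximately the 110% threshold quoted above.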

Ensuring all JVM processes within a container are appropriately configured

In the case that multiple JVMs run in the same container, it is essential to ensure that they are all configured appropriately. For many workloads it will be necessary to grant each JVM a percentage memory budget, leaving a perhaps substantial additional safety margin.

Many Java tools use different environment variables (JAVA_OPTS, GRADLE_OPTS, and so on) to configure their JVMs and it can be challenging to ensure that the right settings are being passed to the right JVM.

The JAVA_TOOL_OPTIONS environment variable is always respected by the OpenJDK, and values specified in JAVA_TOOL_OPTIONS will be overridden by other options specified on the JVM command line. To ensure that these options are used by default for all JVM workloads run in the Java-based agent image, the Red Hat OpenShift Service on AWS classic architecture Jenkins Maven agent image sets the following variable:

JAVA_TOOL_OPTIONS="-Dsun.zip.disableMemoryMapping=true"

This does not guarantee that additional options are not required, but is intended to be a helpful starting point. Optimally tuning JVM workloads for running in a container is beyond the scope of this documentation, and may involve setting multiple additional JVM options.

You can configure your container to use the Downward API to dynamically discover its memory request and limit from within a pod. This allows your applications to better manage these resources without needing to use the API server.

Procedure

  • Configure the pod to add the MEMORY_REQUEST and MEMORY_LIMIT stanzas:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        name: test
      spec:
        securityContext:
          runAsNonRoot: false
          seccompProfile:
            type: RuntimeDefault
        containers:
        - name: test
          image: fedora:latest
          command:
          - sleep
          - "3600"
          env:
          - name: MEMORY_REQUEST
            valueFrom:
              resourceFieldRef:
                containerName: test
                resource: requests.memory
          - name: MEMORY_LIMIT
            valueFrom:
              resourceFieldRef:
                containerName: test
                resource: limits.memory
          resources:
            requests:
              memory: 384Mi
            limits:
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: [ALL]

      where:

      spec.containers.env.name.MEMORY_REQUEST
      This stanza discovers the application memory request value.
      spec.containers.env.name.MEMORY_LIMIT
      This stanza discovers the application memory limit value.
    2. Create the pod by running the following command:

      $ oc create -f <file_name>.yaml

Verification

  1. Access the pod using a remote shell:

    $ oc rsh test
  2. Check that the requested values were applied:

    $ env | grep MEMORY | sort

    Example output

    MEMORY_LIMIT=536870912
    MEMORY_REQUEST=402653184
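The values in the example output are byte counts. The following minimal sketch converts the 384Mi request and 512Mi limit from the pod specification into bytes to show where the numbers come from. The mib_to_bytes function is a hypothetical helper.

```shell
# mib_to_bytes MIB
# Prints the number of bytes in the given number of mebibytes (Mi).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

echo "MEMORY_LIMIT=$(mib_to_bytes 512)"     # prints "MEMORY_LIMIT=536870912"
echo "MEMORY_REQUEST=$(mib_to_bytes 384)"   # prints "MEMORY_REQUEST=402653184"
```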

Note

The memory limit value can also be read from inside the container from the /sys/fs/cgroup/memory/memory.limit_in_bytes file.

4.5.4. Understanding OOM kill policy

Red Hat OpenShift Service on AWS classic architecture can kill a process in a container if the total memory usage of all the processes in the container exceeds the memory limit, or in serious cases of node memory exhaustion.

If a process is Out of Memory (OOM) killed, the container could exit immediately. If the container PID 1 process receives the SIGKILL, the container does exit immediately. Otherwise, the container behavior is dependent on the behavior of the other processes.

For example, a container process exited with code 137, indicating it received a SIGKILL signal.
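Exit codes above 128 conventionally encode the terminating signal as 128 plus the signal number, and SIGKILL is signal 9 on Linux. The following minimal sketch decodes an exit code on that basis. The signal_from_exit_code function is a hypothetical helper.

```shell
# signal_from_exit_code EXIT_CODE
# Prints the signal number encoded in the exit code, or 0 if the code
# does not indicate termination by a signal.
signal_from_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo $(( code - 128 ))
  else
    echo 0
  fi
}

signal_from_exit_code 137   # prints 9, the number of SIGKILL
```

An exit code of 137 shows only that the process received SIGKILL. It does not by itself prove that the OOM killer sent the signal, which is why the procedure that follows checks the OOM kill counter.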

If the container does not exit immediately, use the following steps to detect if an OOM kill occurred.

Procedure

  1. Access the pod using a remote shell:

    $ oc rsh <pod_name>
  2. Run the following command to see the current OOM kill count in /sys/fs/cgroup/memory/memory.oom_control:

    $ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control

    Example output

    oom_kill 0

  3. Run the following command to provoke an OOM kill:

    $ sed -e '' </dev/zero

    Example output

    Killed

  4. Run the following command to see that the OOM kill counter in /sys/fs/cgroup/memory/memory.oom_control incremented:

    $ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control

    Example output

    oom_kill 1

    If one or more processes in a pod are OOM killed, when the pod subsequently exits, whether immediately or not, it will have phase Failed and reason OOMKilled. An OOM-killed pod might be restarted depending on the value of restartPolicy. If not restarted, controllers such as the replication controller will notice the pod’s failed status and create a new pod to replace the old one.

    Use the following command to get the pod status:

    $ oc get pod test

    Example output

    NAME      READY     STATUS      RESTARTS   AGE
    test      0/1       OOMKilled   0          1m

    • If the pod has not restarted, run the following command to view the pod:

      $ oc get pod test -o yaml

      Example output

      apiVersion: v1
      kind: Pod
      metadata:
        name: test
      # ...
      status:
        containerStatuses:
        - name: test
          ready: false
          restartCount: 0
          state:
            terminated:
              exitCode: 137
              reason: OOMKilled
        phase: Failed

    • If restarted, run the following command to view the pod:

      $ oc get pod test -o yaml

      Example output

      apiVersion: v1
      kind: Pod
      metadata:
        name: test
      # ...
      status:
        containerStatuses:
        - name: test
          ready: true
          restartCount: 1
          lastState:
            terminated:
              exitCode: 137
              reason: OOMKilled
          state:
            running:
        phase: Running

4.5.5. Understanding pod eviction

You can review the following concepts to learn the Red Hat OpenShift Service on AWS classic architecture pod eviction policy.

Red Hat OpenShift Service on AWS classic architecture can evict a pod from its node when the node’s memory is exhausted. Depending on the extent of memory exhaustion, the eviction might or might not be graceful. Graceful eviction implies the main process (PID 1) of each container receiving a SIGTERM signal, then some time later a SIGKILL signal if the process has not exited already. Non-graceful eviction implies the main process of each container immediately receiving a SIGKILL signal.

An evicted pod has phase Failed and reason Evicted. It is not restarted, regardless of the value of restartPolicy. However, controllers such as the replication controller will notice the pod’s failed status and create a new pod to replace the old one.

$ oc get pod test

Example output

NAME      READY     STATUS    RESTARTS   AGE
test      0/1       Evicted   0          1m

$ oc get pod test -o yaml

Example output

apiVersion: v1
kind: Pod
metadata:
  name: test
...
status:
  message: 'Pod The node was low on resource: [MemoryPressure].'
  phase: Failed
  reason: Evicted
