Postinstallation configuration


OpenShift Container Platform 4.17

Day 2 operations for OpenShift Container Platform

Red Hat OpenShift Documentation Team

Abstract

This document provides instructions and guidance on post installation activities for OpenShift Container Platform.

Chapter 1. Postinstallation configuration overview

After installing OpenShift Container Platform, a cluster administrator can configure and customize the following components:

  • Machine
  • Bare metal
  • Cluster
  • Node
  • Network
  • Storage
  • Users
  • Alerts and notifications

1.1. Post-installation configuration tasks

You can perform the post-installation configuration tasks to configure your environment to meet your need.

The following lists details these configurations:

  • Configure operating system features: The Machine Config Operator (MCO) manages MachineConfig objects. By using the MCO, you can configure nodes and custom resources.
  • Configure cluster features. You can modify the following features of an OpenShift Container Platform cluster:

    • Image registry
    • Networking configuration
    • Image build behavior
    • Identity provider
    • The etcd configuration
    • Machine set creation to handle the workloads
    • Cloud provider credential management
  • Configuring a private cluster: By default, the installation program provisions OpenShift Container Platform by using a publicly accessible DNS and endpoints. To make your cluster accessible only from within an internal network, configure the following components to make them private:

    • DNS
    • Ingress Controller
    • API server
  • Perform node operations: By default, OpenShift Container Platform uses Red Hat Enterprise Linux CoreOS (RHCOS) compute machines. You can perform the following node operations:

    • Add and remove compute machines.
    • Add and remove taints and tolerations.
    • Configure the maximum number of pods per node.
    • Enable Device Manager.
  • Configure users: OAuth access tokens allow users to authenticate themselves to the API. You can configure OAuth to perform the following tasks:
  • Specify an identity provider
  • Use role-based access control to define and supply permissions to users
  • Install an Operator from OperatorHub
  • Configuring alert notifications: By default, firing alerts are displayed on the Alerting UI of the web console. You can also configure OpenShift Container Platform to send alert notifications to external systems.

Chapter 2. Configuring a private cluster

After you install an OpenShift Container Platform version 4.17 cluster, you can set some of its core components to be private.

2.1. About private clusters

By default, OpenShift Container Platform is provisioned using publicly-accessible DNS and endpoints. You can set the DNS, Ingress Controller, and API server to private after you deploy your private cluster.

Important

If the cluster has any public subnets, load balancer services created by administrators might be publicly accessible. To ensure cluster security, verify that these services are explicitly annotated as private.

DNS

If you install OpenShift Container Platform on installer-provisioned infrastructure, the installation program creates records in a pre-existing public zone and, where possible, creates a private zone for the cluster’s own DNS resolution. In both the public zone and the private zone, the installation program or cluster creates DNS entries for *.apps, for the Ingress object, and api, for the API server.

The *.apps records in the public and private zone are identical, so when you delete the public zone, the private zone seamlessly provides all DNS resolution for the cluster.

Ingress Controller

Because the default Ingress object is created as public, the load balancer is internet-facing and in the public subnets.

The Ingress Operator generates a default certificate for an Ingress Controller to serve as a placeholder until you configure a custom default certificate. Do not use Operator-generated default certificates in production clusters. The Ingress Operator does not rotate its own signing certificate or the default certificates that it generates. Operator-generated default certificates are intended as placeholders for custom default certificates that you configure.

API server

By default, the installation program creates appropriate network load balancers for the API server to use for both internal and external traffic.

On Amazon Web Services (AWS), separate public and private load balancers are created. The load balancers are identical except that an additional port is available on the internal one for use within the cluster. Although the installation program automatically creates or destroys the load balancer based on API server requirements, the cluster does not manage or maintain them. As long as you preserve the cluster’s access to the API server, you can manually modify or move the load balancers. For the public load balancer, port 6443 is open and the health check is configured for HTTPS against the /readyz path.

On Google Cloud Platform, a single load balancer is created to manage both internal and external API traffic, so you do not need to modify the load balancer.

On Microsoft Azure, both public and private load balancers are created. However, because of limitations in current implementation, you just retain both load balancers in a private cluster.

2.2. Configuring DNS records to be published in a private zone

For all OpenShift Container Platform clusters, whether public or private, DNS records are published in a public zone by default.

You can remove the public zone from the cluster DNS configuration to avoid exposing DNS records to the public. You might want to avoid exposing sensitive information, such as internal domain names, internal IP addresses, or the number of clusters at an organization, or you might simply have no need to publish records publicly. If all the clients that should be able to connect to services within the cluster use a private DNS service that has the DNS records from the private zone, then there is no need to have a public DNS record for the cluster.

After you deploy a cluster, you can modify its DNS to use only a private zone by modifying the DNS custom resource (CR). Modifying the DNS CR in this way means that any DNS records that are subsequently created are not published to public DNS servers, which keeps knowledge of the DNS records isolated to internal users. This can be done when you configure the cluster to be private, or if you never want DNS records to be publicly resolvable.

Alternatively, even in a private cluster, you might keep the public zone for DNS records because it allows clients to resolve DNS names for applications running on that cluster. For example, an organization can have machines that connect to the public internet and then establish VPN connections for certain private IP ranges in order to connect to private IP addresses. The DNS lookups from these machines use the public DNS to determine the private addresses of those services, and then connect to the private addresses over the VPN.

Procedure

  1. Review the DNS CR for your cluster by running the following command and observing the output:

    $ oc get dnses.config.openshift.io/cluster -o yaml

    Example output

    apiVersion: config.openshift.io/v1
    kind: DNS
    metadata:
      creationTimestamp: "2019-10-25T18:27:09Z"
      generation: 2
      name: cluster
      resourceVersion: "37966"
      selfLink: /apis/config.openshift.io/v1/dnses/cluster
      uid: 0e714746-f755-11f9-9cb1-02ff55d8f976
    spec:
      baseDomain: <base_domain>
      privateZone:
        tags:
          Name: <infrastructure_id>-int
          kubernetes.io/cluster/<infrastructure_id>: owned
      publicZone:
        id: Z2XXXXXXXXXXA4
    status: {}

    Note that the spec section contains both a private and a public zone.

  2. Patch the DNS CR to remove the public zone by running the following command:

    $ oc patch dnses.config.openshift.io/cluster --type=merge --patch='{"spec": {"publicZone": null}}'

    Example output

    dns.config.openshift.io/cluster patched

    The Ingress Operator consults the DNS CR definition when it creates DNS records for IngressController objects. If only private zones are specified, only private records are created.

    Important

    Existing DNS records are not modified when you remove the public zone. You must manually delete previously published public DNS records if you no longer want them to be published publicly.

Verification

  • Review the DNS CR for your cluster and confirm that the public zone was removed, by running the following command and observing the output:

    $ oc get dnses.config.openshift.io/cluster -o yaml

    Example output

    apiVersion: config.openshift.io/v1
    kind: DNS
    metadata:
      creationTimestamp: "2019-10-25T18:27:09Z"
      generation: 2
      name: cluster
      resourceVersion: "37966"
      selfLink: /apis/config.openshift.io/v1/dnses/cluster
      uid: 0e714746-f755-11f9-9cb1-02ff55d8f976
    spec:
      baseDomain: <base_domain>
      privateZone:
        tags:
          Name: <infrastructure_id>-int
          kubernetes.io/cluster/<infrastructure_id>-wfpg4: owned
    status: {}

2.3. Setting the Ingress Controller to private

After you deploy a cluster, you can modify its Ingress Controller to use only a private zone.

Procedure

  1. Modify the default Ingress Controller to use only an internal endpoint:

    $ oc replace --force --wait --filename - <<EOF
    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: default
    spec:
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: Internal
    EOF

    Example output

    ingresscontroller.operator.openshift.io "default" deleted
    ingresscontroller.operator.openshift.io/default replaced

    The public DNS entry is removed, and the private zone entry is updated.

2.4. Restricting the API server to private

After you deploy a cluster to Amazon Web Services (AWS) or Microsoft Azure, you can reconfigure the API server to use only the private zone.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Have access to the web console as a user with admin privileges.

Procedure

  1. In the web portal or console for your cloud provider, take the following actions:

    1. Locate and delete the appropriate load balancer component:

      • For AWS, delete the external load balancer. The API DNS entry in the private zone already points to the internal load balancer, which uses an identical configuration, so you do not need to modify the internal load balancer.
      • For Azure, delete the api-internal-v4 rule for the public load balancer.
    2. For Azure, configure the Ingress Controller endpoint publishing scope to Internal. For more information, see "Configuring the Ingress Controller endpoint publishing scope to Internal".
    3. For the Azure public load balancer, if you configure the Ingress Controller endpoint publishing scope to Internal and there are no existing inbound rules in the public load balancer, you must create an outbound rule explicitly to provide outbound traffic for the backend address pool. For more information, see the Microsoft Azure documentation about adding outbound rules.
    4. Delete the api.$clustername.$yourdomain or api.$clustername DNS entry in the public zone.
  2. AWS clusters: Remove the external load balancers:

    Important

    You can run the following steps only for an installer-provisioned infrastructure (IPI) cluster. For a user-provisioned infrastructure (UPI) cluster, you must manually remove or disable the external load balancers.

    • If your cluster uses a control plane machine set, delete the lines in the control plane machine set custom resource that configure your public or external load balancer:

      # ...
      providerSpec:
        value:
      # ...
          loadBalancers:
          - name: lk4pj-ext 1
            type: network 2
          - name: lk4pj-int
            type: network
      # ...
      1
      Delete the name value for the external load balancer, which ends in -ext.
      2
      Delete the type value for the external load balancer.
    • If your cluster does not use a control plane machine set, you must delete the external load balancers from each control plane machine.

      1. From your terminal, list the cluster machines by running the following command:

        $ oc get machine -n openshift-machine-api

        Example output

        NAME                            STATE     TYPE        REGION      ZONE         AGE
        lk4pj-master-0                  running   m4.xlarge   us-east-1   us-east-1a   17m
        lk4pj-master-1                  running   m4.xlarge   us-east-1   us-east-1b   17m
        lk4pj-master-2                  running   m4.xlarge   us-east-1   us-east-1a   17m
        lk4pj-worker-us-east-1a-5fzfj   running   m4.xlarge   us-east-1   us-east-1a   15m
        lk4pj-worker-us-east-1a-vbghs   running   m4.xlarge   us-east-1   us-east-1a   15m
        lk4pj-worker-us-east-1b-zgpzg   running   m4.xlarge   us-east-1   us-east-1b   15m

        The control plane machines contain master in the name.

      2. Remove the external load balancer from each control plane machine:

        1. Edit a control plane machine object to by running the following command:

          $ oc edit machines -n openshift-machine-api <control_plane_name> 1
          1
          Specify the name of the control plane machine object to modify.
        2. Remove the lines that describe the external load balancer, which are marked in the following example:

          # ...
          providerSpec:
            value:
          # ...
              loadBalancers:
              - name: lk4pj-ext 1
                type: network 2
              - name: lk4pj-int
                type: network
          # ...
          1
          Delete the name value for the external load balancer, which ends in -ext.
          2
          Delete the type value for the external load balancer.
        3. Save your changes and exit the object specification.
        4. Repeat this process for each of the control plane machines.

2.5. Configuring a private storage endpoint on Azure

You can leverage the Image Registry Operator to use private endpoints on Azure, which enables seamless configuration of private storage accounts when OpenShift Container Platform is deployed on private Azure clusters. This allows you to deploy the image registry without exposing public-facing storage endpoints.

Important

Do not configure a private storage endpoint on Microsoft Azure Red Hat OpenShift (ARO), because the endpoint can put your Microsoft Azure Red Hat OpenShift cluster in an unrecoverable state.

You can configure the Image Registry Operator to use private storage endpoints on Azure in one of two ways:

  • By configuring the Image Registry Operator to discover the VNet and subnet names
  • With user-provided Azure Virtual Network (VNet) and subnet names

2.5.1. Limitations for configuring a private storage endpoint on Azure

The following limitations apply when configuring a private storage endpoint on Azure:

  • When configuring the Image Registry Operator to use a private storage endpoint, public network access to the storage account is disabled. Consequently, pulling images from the registry outside of OpenShift Container Platform only works by setting disableRedirect: true in the registry Operator configuration. With redirect enabled, the registry redirects the client to pull images directly from the storage account, which will no longer work due to disabled public network access. For more information, see "Disabling redirect when using a private storage endpoint on Azure".
  • This operation cannot be undone by the Image Registry Operator.

2.5.2. Configuring a private storage endpoint on Azure by enabling the Image Registry Operator to discover VNet and subnet names

The following procedure shows you how to set up a private storage endpoint on Azure by configuring the Image Registry Operator to discover VNet and subnet names.

Prerequisites

  • You have configured the image registry to run on Azure.
  • Your network has been set up using the Installer Provisioned Infrastructure installation method.

    For users with a custom network setup, see "Configuring a private storage endpoint on Azure with user-provided VNet and subnet names".

Procedure

  1. Edit the Image Registry Operator config object and set networkAccess.type to Internal:

    $ oc edit configs.imageregistry/cluster
    # ...
    spec:
      # ...
       storage:
          azure:
            # ...
            networkAccess:
              type: Internal
    # ...
  2. Optional: Enter the following command to confirm that the Operator has completed provisioning. This might take a few minutes.

    $ oc get configs.imageregistry/cluster -o=jsonpath="{.spec.storage.azure.privateEndpointName}" -w
  3. Optional: If the registry is exposed by a route, and you are configuring your storage account to be private, you must disable redirect if you want pulls external to the cluster to continue to work. Enter the following command to disable redirect on the Image Operator configuration:

    $ oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"disableRedirect": true}}'
    Note

    When redirect is enabled, pulling images from outside of the cluster will not work.

Verification

  1. Fetch the registry service name by running the following command:

    $ oc get imagestream -n openshift

    Example output

    NAME   IMAGE REPOSITORY                                                 TAGS     UPDATED
    cli    image-registry.openshift-image-registry.svc:5000/openshift/cli   latest   8 hours ago
    ...

  2. Enter debug mode by running the following command:

    $ oc debug node/<node_name>
  3. Run the suggested chroot command. For example:

    $ chroot /host
  4. Enter the following command to log in to your container registry:

    $ podman login --tls-verify=false -u unused -p $(oc whoami -t) image-registry.openshift-image-registry.svc:5000

    Example output

    Login Succeeded!

  5. Enter the following command to verify that you can pull an image from the registry:

    $ podman pull --tls-verify=false image-registry.openshift-image-registry.svc:5000/openshift/tools

    Example output

    Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/tools/openshift/tools...
    Getting image source signatures
    Copying blob 6b245f040973 done
    Copying config 22667f5368 done
    Writing manifest to image destination
    Storing signatures
    22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9

2.5.3. Configuring a private storage endpoint on Azure with user-provided VNet and subnet names

Use the following procedure to configure a storage account that has public network access disabled and is exposed behind a private storage endpoint on Azure.

Prerequisites

  • You have configured the image registry to run on Azure.
  • You must know the VNet and subnet names used for your Azure environment.
  • If your network was configured in a separate resource group in Azure, you must also know its name.

Procedure

  1. Edit the Image Registry Operator config object and configure the private endpoint using your VNet and subnet names:

    $ oc edit configs.imageregistry/cluster
    # ...
    spec:
      # ...
       storage:
          azure:
            # ...
            networkAccess:
              type: Internal
              internal:
                subnetName: <subnet_name>
                vnetName: <vnet_name>
                networkResourceGroupName: <network_resource_group_name>
    # ...
  2. Optional: Enter the following command to confirm that the Operator has completed provisioning. This might take a few minutes.

    $ oc get configs.imageregistry/cluster -o=jsonpath="{.spec.storage.azure.privateEndpointName}" -w
    Note

    When redirect is enabled, pulling images from outside of the cluster will not work.

Verification

  1. Fetch the registry service name by running the following command:

    $ oc get imagestream -n openshift

    Example output

    NAME   IMAGE REPOSITORY                                                 TAGS     UPDATED
    cli    image-registry.openshift-image-registry.svc:5000/openshift/cli   latest   8 hours ago
    ...

  2. Enter debug mode by running the following command:

    $ oc debug node/<node_name>
  3. Run the suggested chroot command. For example:

    $ chroot /host
  4. Enter the following command to log in to your container registry:

    $ podman login --tls-verify=false -u unused -p $(oc whoami -t) image-registry.openshift-image-registry.svc:5000

    Example output

    Login Succeeded!

  5. Enter the following command to verify that you can pull an image from the registry:

    $ podman pull --tls-verify=false image-registry.openshift-image-registry.svc:5000/openshift/tools

    Example output

    Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/tools/openshift/tools...
    Getting image source signatures
    Copying blob 6b245f040973 done
    Copying config 22667f5368 done
    Writing manifest to image destination
    Storing signatures
    22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9

2.5.4. Optional: Disabling redirect when using a private storage endpoint on Azure

By default, redirect is enabled when using the image registry. Redirect allows off-loading of traffic from the registry pods into the object storage, which makes pull faster. When redirect is enabled and the storage account is private, users from outside of the cluster are unable to pull images from the registry.

In some cases, users might want to disable redirect so that users from outside of the cluster can pull images from the registry.

Use the following procedure to disable redirect.

Prerequisites

  • You have configured the image registry to run on Azure.
  • You have configured a route.

Procedure

  • Enter the following command to disable redirect on the image registry configuration:

    $ oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"disableRedirect": true}}'

Verification

  1. Fetch the registry service name by running the following command:

    $ oc get imagestream -n openshift

    Example output

    NAME   IMAGE REPOSITORY                                           TAGS     UPDATED
    cli    default-route-openshift-image-registry.<cluster_dns>/cli   latest   8 hours ago
    ...

  2. Enter the following command to log in to your container registry:

    $ podman login --tls-verify=false -u unused -p $(oc whoami -t) default-route-openshift-image-registry.<cluster_dns>

    Example output

    Login Succeeded!

  3. Enter the following command to verify that you can pull an image from the registry:

    $ podman pull --tls-verify=false default-route-openshift-image-registry.<cluster_dns>
    /openshift/tools

    Example output

    Trying to pull default-route-openshift-image-registry.<cluster_dns>/openshift/tools...
    Getting image source signatures
    Copying blob 6b245f040973 done
    Copying config 22667f5368 done
    Writing manifest to image destination
    Storing signatures
    22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9

Chapter 3. Configuring multi-architecture compute machines on an OpenShift cluster

3.1. About clusters with multi-architecture compute machines

An OpenShift Container Platform cluster with multi-architecture compute machines is a cluster that supports compute machines with different architectures.

Note

When there are nodes with multiple architectures in your cluster, the architecture of your image must be consistent with the architecture of the node. You need to ensure that the pod is assigned to the node with the appropriate architecture and that it matches the image architecture. For more information on assigning pods to nodes, see Assigning pods to nodes.

Important

The Cluster Samples Operator is not supported on clusters with multi-architecture compute machines. Your cluster can be created without this capability. For more information, see Cluster capabilities.

For information on migrating your single-architecture cluster to a cluster that supports multi-architecture compute machines, see Migrating to a cluster with multi-architecture compute machines.

3.1.1. Configuring your cluster with multi-architecture compute machines

To create a cluster with multi-architecture compute machines with different installation options and platforms, you can use the documentation in the following table:

Table 3.1. Cluster with multi-architecture compute machine installation options
Documentation sectionPlatformUser-provisioned installationInstaller-provisioned installationControl PlaneCompute node

Creating a cluster with multi-architecture compute machines on Azure

Microsoft Azure

 

aarch64 or x86_64

aarch64, x86_64

Creating a cluster with multi-architecture compute machines on AWS

Amazon Web Services (AWS)

 

aarch64 or x86_64

aarch64, x86_64

Creating a cluster with multi-architecture compute machines on GCP

Google Cloud Platform (GCP)

 

aarch64 or x86_64

aarch64, x86_64

Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z

Bare metal

 

aarch64 or x86_64

aarch64, x86_64

IBM Power

 

x86_64 or ppc64le

x86_64, ppc64le

IBM Z

 

x86_64 or s390x

x86_64, s390x

Creating a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE with z/VM

IBM Z® and IBM® LinuxONE

 

x86_64

x86_64, s390x

Creating a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE with RHEL KVM

IBM Z® and IBM® LinuxONE

 

x86_64

x86_64, s390x

Creating a cluster with multi-architecture compute machines on IBM Power®

IBM Power®

 

x86_64

x86_64, ppc64le

Important

Autoscaling from zero is currently not supported on Google Cloud Platform (GCP).

3.2. Creating a cluster with multi-architecture compute machine on Azure

To deploy an Azure cluster with multi-architecture compute machines, you must first create a single-architecture Azure installer-provisioned cluster that uses the multi-architecture installer binary. For more information on Azure installations, see Installing a cluster on Azure with customizations.

You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.

After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.

3.2.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.2.2. Creating a 64-bit ARM boot image using the Azure image gallery

The following procedure describes how to manually generate a 64-bit ARM boot image.

Prerequisites

  • You installed the Azure CLI (az).
  • You created a single-architecture Azure installer-provisioned cluster with the multi-architecture installer binary.

Procedure

  1. Log in to your Azure account:

    $ az login
  2. Create a storage account and upload the aarch64 virtual hard disk (VHD) to your storage account. The OpenShift Container Platform installation program creates a resource group, however, the boot image can also be uploaded to a custom named resource group:

    $ az storage account create -n ${STORAGE_ACCOUNT_NAME} -g ${RESOURCE_GROUP} -l westus --sku Standard_LRS 1
    1
    The westus object is an example region.
  3. Create a storage container using the storage account you generated:

    $ az storage container create -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME}
  4. You must use the OpenShift Container Platform installation program JSON file to extract the URL and aarch64 VHD name:

    1. Extract the URL field and set it to RHCOS_VHD_ORIGIN_URL as the file name by running the following command:

      $ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".url')
    2. Extract the aarch64 VHD name and set it to BLOB_NAME as the file name by running the following command:

      $ BLOB_NAME=rhcos-$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".release')-azure.aarch64.vhd
  5. Generate a shared access signature (SAS) token. Use this token to upload the RHCOS VHD to your storage container with the following commands:

    $ end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
    $ sas=`az storage container generate-sas -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME} --https-only --permissions dlrw --expiry $end -o tsv`
  6. Copy the RHCOS VHD into the storage container:

    $ az storage blob copy start --account-name ${STORAGE_ACCOUNT_NAME} --sas-token "$sas" \
     --source-uri "${RHCOS_VHD_ORIGIN_URL}" \
     --destination-blob "${BLOB_NAME}" --destination-container ${CONTAINER_NAME}

    You can check the status of the copying process with the following command:

    $ az storage blob show -c ${CONTAINER_NAME} -n ${BLOB_NAME} --account-name ${STORAGE_ACCOUNT_NAME} | jq .properties.copy

    Example output

    {
     "completionTime": null,
     "destinationSnapshot": null,
     "id": "1fd97630-03ca-489a-8c4e-cfe839c9627d",
     "incrementalCopy": null,
     "progress": "17179869696/17179869696",
     "source": "https://rhcos.blob.core.windows.net/imagebucket/rhcos-411.86.202207130959-0-azure.aarch64.vhd",
     "status": "success", 1
     "statusDescription": null
    }

    1
    If the status parameter displays the success object, the copying process is complete.
  7. Create an image gallery using the following command:

    $ az sig create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME}

    Use the image gallery to create an image definition. In the following example command, rhcos-arm64 is the name of the image definition.

    $ az sig image-definition create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --publisher RedHat --offer arm --sku arm64 --os-type linux --architecture Arm64 --hyper-v-generation V2
  8. To get the URL of the VHD and set it to RHCOS_VHD_URL as the file name, run the following command:

    $ RHCOS_VHD_URL=$(az storage blob url --account-name ${STORAGE_ACCOUNT_NAME} -c ${CONTAINER_NAME} -n "${BLOB_NAME}" -o tsv)
  9. Use the RHCOS_VHD_URL file, your storage account, resource group, and image gallery to create an image version. In the following example, 1.0.0 is the image version.

    $ az sig image-version create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --gallery-image-version 1.0.0 --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} --os-vhd-uri ${RHCOS_VHD_URL}
  10. Your arm64 boot image is now generated. You can access the ID of your image with the following command:

    $ az sig image-version show -r $GALLERY_NAME -g $RESOURCE_GROUP -i rhcos-arm64 -e 1.0.0

    The following example image ID is used in the recourseID parameter of the compute machine set:

    Example resourceID

    /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-arm64/versions/1.0.0

3.2.3. Creating a 64-bit x86 boot image using the Azure image gallery

The following procedure describes how to manually generate a 64-bit x86 boot image.

Prerequisites

  • You installed the Azure CLI (az).
  • You created a single-architecture Azure installer-provisioned cluster with the multi-architecture installer binary.

Procedure

  1. Log in to your Azure account by running the following command:

    $ az login
  2. Create a storage account and upload the x86_64 virtual hard disk (VHD) to your storage account by running the following command. The OpenShift Container Platform installation program creates a resource group. However, the boot image can also be uploaded to a custom named resource group:

    $ az storage account create -n ${STORAGE_ACCOUNT_NAME} -g ${RESOURCE_GROUP} -l westus --sku Standard_LRS 1
    1
    The westus object is an example region.
  3. Create a storage container using the storage account you generated by running the following command:

    $ az storage container create -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME}
  4. Use the OpenShift Container Platform installation program JSON file to extract the URL and x86_64 VHD name:

    1. Extract the URL field and set it to RHCOS_VHD_ORIGIN_URL as the file name by running the following command:

      $ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64."rhel-coreos-extensions"."azure-disk".url')
    2. Extract the x86_64 VHD name and set it to BLOB_NAME as the file name by running the following command:

      $ BLOB_NAME=rhcos-$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64."rhel-coreos-extensions"."azure-disk".release')-azure.x86_64.vhd
  5. Generate a shared access signature (SAS) token. Use this token to upload the RHCOS VHD to your storage container by running the following commands:

    $ end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
    $ sas=`az storage container generate-sas -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME} --https-only --permissions dlrw --expiry $end -o tsv`
  6. Copy the RHCOS VHD into the storage container by running the following command:

    $ az storage blob copy start --account-name ${STORAGE_ACCOUNT_NAME} --sas-token "$sas" \
     --source-uri "${RHCOS_VHD_ORIGIN_URL}" \
     --destination-blob "${BLOB_NAME}" --destination-container ${CONTAINER_NAME}

    You can check the status of the copying process by running the following command:

    $ az storage blob show -c ${CONTAINER_NAME} -n ${BLOB_NAME} --account-name ${STORAGE_ACCOUNT_NAME} | jq .properties.copy

    Example output

    {
     "completionTime": null,
     "destinationSnapshot": null,
     "id": "1fd97630-03ca-489a-8c4e-cfe839c9627d",
     "incrementalCopy": null,
     "progress": "17179869696/17179869696",
     "source": "https://rhcos.blob.core.windows.net/imagebucket/rhcos-411.86.202207130959-0-azure.aarch64.vhd",
     "status": "success", 1
     "statusDescription": null
    }

    1
    If the status parameter displays the success object, the copying process is complete.
  7. Create an image gallery by running the following command:

    $ az sig create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME}
  8. Use the image gallery to create an image definition by running the following command:

    $ az sig image-definition create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-x86_64 --publisher RedHat --offer x86_64 --sku x86_64 --os-type linux --architecture x64 --hyper-v-generation V2

    In this example command, rhcos-x86_64 is the name of the image definition.

  9. To get the URL of the VHD and set it to RHCOS_VHD_URL as the file name, run the following command:

    $ RHCOS_VHD_URL=$(az storage blob url --account-name ${STORAGE_ACCOUNT_NAME} -c ${CONTAINER_NAME} -n "${BLOB_NAME}" -o tsv)
  10. Use the RHCOS_VHD_URL file, your storage account, resource group, and image gallery to create an image version by running the following command:

    $ az sig image-version create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --gallery-image-version 1.0.0 --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} --os-vhd-uri ${RHCOS_VHD_URL}

    In this example, 1.0.0 is the image version.

  11. Optional: Access the ID of the generated x86_64 boot image by running the following command:

    $ az sig image-version show -r $GALLERY_NAME -g $RESOURCE_GROUP -i rhcos-x86_64 -e 1.0.0

    The following example image ID is used in the recourseID parameter of the compute machine set:

    Example resourceID

    /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-x86_64/versions/1.0.0

3.2.4. Adding a multi-architecture compute machine set to your Azure cluster

After creating a multi-architecture cluster, you can add nodes with different architectures.

You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:

  • Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
  • Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.

To create a custom compute machine set on Azure, see "Creating a compute machine set on Azure".

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You created a 64-bit ARM or 64-bit x86 boot image.
  • You used the installation program to create a 64-bit ARM or 64-bit x86 single-architecture Azure cluster with the multi-architecture installer binary.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.

    Example MachineSet object for an Azure 64-bit ARM or 64-bit x86 compute node

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
      name: <infrastructure_id>-machine-set-0
      namespace: openshift-machine-api
    spec:
      replicas: 2
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id>
          machine.openshift.io/cluster-api-machineset: <infrastructure_id>-machine-set-0
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machine-role: worker
            machine.openshift.io/cluster-api-machine-type: worker
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-machine-set-0
        spec:
          lifecycleHooks: {}
          metadata: {}
          providerSpec:
            value:
              acceleratedNetworking: true
              apiVersion: machine.openshift.io/v1beta1
              credentialsSecret:
                name: azure-cloud-credentials
                namespace: openshift-machine-api
              image:
                offer: ""
                publisher: ""
                resourceID: /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-arm64/versions/1.0.0 1
                sku: ""
                version: ""
              kind: AzureMachineProviderSpec
              location: <region>
              managedIdentity: <infrastructure_id>-identity
              networkResourceGroup: <infrastructure_id>-rg
              osDisk:
                diskSettings: {}
                diskSizeGB: 128
                managedDisk:
                  storageAccountType: Premium_LRS
                osType: Linux
              publicIP: false
              publicLoadBalancer: <infrastructure_id>
              resourceGroup: <infrastructure_id>-rg
              subnet: <infrastructure_id>-worker-subnet
              userDataSecret:
                name: worker-user-data
              vmSize: Standard_D4ps_v5 2
              vnet: <infrastructure_id>-vnet
              zone: "<zone>"

    1
    Set the resourceID parameter to either arm64 or amd64 boot image.
    2
    Set the vmSize parameter to the instance type used in your installation. Some example instance types are Standard_D4ps_v5 or D8ps.
  3. Create the compute machine set by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: arm64-machine-set-0.yaml, or amd64-machine-set-0.yaml.

Verification

  1. Verify that the new machines are running by running the following command:

    $ oc get machineset -n openshift-machine-api

    The output must include the machine set that you created.

    Example output

    NAME                                                DESIRED  CURRENT  READY  AVAILABLE  AGE
    <infrastructure_id>-machine-set-0                   2        2      2          2  10m

  2. You can check if the nodes are ready and schedulable by running the following command:

    $ oc get nodes

3.3. Creating a cluster with multi-architecture compute machines on AWS

To create an AWS cluster with multi-architecture compute machines, you must first create a single-architecture AWS installer-provisioned cluster with the multi-architecture installer binary. For more information on AWS installations, see Installing a cluster on AWS with customizations.

You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.

After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.

3.3.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.3.2. Adding a multi-architecture compute machine set to your AWS cluster

After creating a multi-architecture cluster, you can add nodes with different architectures.

You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:

  • Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
  • Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You used the installation program to create an 64-bit ARM or 64-bit x86 single-architecture AWS cluster with the multi-architecture installer binary.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.

    Example MachineSet object for an AWS 64-bit ARM or x86 compute node

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
      name: <infrastructure_id>-aws-machine-set-0 2
      namespace: openshift-machine-api
    spec:
      replicas: 1
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id> 3
          machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>-<zone> 4
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machine-role: <role> 5
            machine.openshift.io/cluster-api-machine-type: <role> 6
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>-<zone> 7
        spec:
          metadata:
            labels:
              node-role.kubernetes.io/<role>: ""
          providerSpec:
            value:
              ami:
                id: ami-02a574449d4f4d280 8
              apiVersion: awsproviderconfig.openshift.io/v1beta1
              blockDevices:
                - ebs:
                    iops: 0
                    volumeSize: 120
                    volumeType: gp2
              credentialsSecret:
                name: aws-cloud-credentials
              deviceIndex: 0
              iamInstanceProfile:
                id: <infrastructure_id>-worker-profile 9
              instanceType: m6g.xlarge 10
              kind: AWSMachineProviderConfig
              placement:
                availabilityZone: us-east-1a 11
                region: <region> 12
              securityGroups:
                - filters:
                    - name: tag:Name
                      values:
                        - <infrastructure_id>-worker-sg 13
              subnet:
                filters:
                  - name: tag:Name
                    values:
                      - <infrastructure_id>-private-<zone>
              tags:
                - name: kubernetes.io/cluster/<infrastructure_id> 14
                  value: owned
                - name: <custom_tag_name>
                  value: <custom_tag_value>
              userDataSecret:
                name: worker-user-data

    1 2 3 9 13 14
    Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI (oc) installed, you can obtain the infrastructure ID by running the following command:
    $ oc get -o jsonpath=‘{.status.infrastructureName}{“\n”}’ infrastructure cluster
    4 7
    Specify the infrastructure ID, role node label, and zone.
    5 6
    Specify the role node label to add.
    8
    Specify a Red Hat Enterprise Linux CoreOS (RHCOS) Amazon Machine Image (AMI) for your AWS zone for the nodes. The RHCOS AMI must be compatible with the machine architecture.
    $ oc get configmap/coreos-bootimages \
    	  -n openshift-machine-config-operator \
    	  -o jsonpath='{.data.stream}' | jq \
    	  -r '.architectures.<arch>.images.aws.regions."<region>".image'
    10
    Specify a machine type that aligns with the CPU architecture of the chosen AMI. For more information, see "Tested instance types for AWS 64-bit ARM"
    11
    Specify the zone. For example, us-east-1a. Ensure that the zone you select has machines with the required architecture.
    12
    Specify the region. For example, us-east-1. Ensure that the zone you select has machines with the required architecture.
  3. Create the compute machine set by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: aws-arm64-machine-set-0.yaml, or aws-amd64-machine-set-0.yaml.

Verification

  1. View the list of compute machine sets by running the following command:

    $ oc get machineset -n openshift-machine-api

    The output must include the machine set that you created.

    Example output

    NAME                                                DESIRED  CURRENT  READY  AVAILABLE  AGE
    <infrastructure_id>-aws-machine-set-0                   2        2      2          2  10m

  2. You can check if the nodes are ready and schedulable by running the following command:

    $ oc get nodes

3.4. Creating a cluster with multi-architecture compute machines on GCP

To create a Google Cloud Platform (GCP) cluster with multi-architecture compute machines, you must first create a single-architecture GCP installer-provisioned cluster with the multi-architecture installer binary. For more information on AWS installations, see Installing a cluster on GCP with customizations.

You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.

After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.

Note

Secure booting is currently not supported on 64-bit ARM machines for GCP

3.4.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.4.2. Adding a multi-architecture compute machine set to your GCP cluster

After creating a multi-architecture cluster, you can add nodes with different architectures.

You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:

  • Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
  • Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You used the installation program to create a 64-bit x86 or 64-bit ARM single-architecture GCP cluster with the multi-architecture installer binary.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.

    Example MachineSet object for a GCP 64-bit ARM or 64-bit x86 compute node

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
      name: <infrastructure_id>-w-a
      namespace: openshift-machine-api
    spec:
      replicas: 1
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id>
          machine.openshift.io/cluster-api-machineset: <infrastructure_id>-w-a
      template:
        metadata:
          creationTimestamp: null
          labels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machine-role: <role> 2
            machine.openshift.io/cluster-api-machine-type: <role>
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-w-a
        spec:
          metadata:
            labels:
              node-role.kubernetes.io/<role>: ""
          providerSpec:
            value:
              apiVersion: gcpprovider.openshift.io/v1beta1
              canIPForward: false
              credentialsSecret:
                name: gcp-cloud-credentials
              deletionProtection: false
              disks:
              - autoDelete: true
                boot: true
                image: <path_to_image> 3
                labels: null
                sizeGb: 128
                type: pd-ssd
              gcpMetadata: 4
              - key: <custom_metadata_key>
                value: <custom_metadata_value>
              kind: GCPMachineProviderSpec
              machineType: n1-standard-4 5
              metadata:
                creationTimestamp: null
              networkInterfaces:
              - network: <infrastructure_id>-network
                subnetwork: <infrastructure_id>-worker-subnet
              projectID: <project_name> 6
              region: us-central1 7
              serviceAccounts:
              - email: <infrastructure_id>-w@<project_name>.iam.gserviceaccount.com
                scopes:
                - https://www.googleapis.com/auth/cloud-platform
              tags:
                - <infrastructure_id>-worker
              userDataSecret:
                name: worker-user-data
              zone: us-central1-a

    1
    Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. You can obtain the infrastructure ID by running the following command:
    $ oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
    2
    Specify the role node label to add.
    3
    Specify the path to the image that is used in current compute machine sets. You need the project and image name for your path to image.

    To access the project and image name, run the following command:

    $ oc get configmap/coreos-bootimages \
      -n openshift-machine-config-operator \
      -o jsonpath='{.data.stream}' | jq \
      -r '.architectures.aarch64.images.gcp'

    Example output

      "gcp": {
        "release": "415.92.202309142014-0",
        "project": "rhcos-cloud",
        "name": "rhcos-415-92-202309142014-0-gcp-aarch64"
      }

    Use the project and name parameters from the output to create the path to image field in your machine set. The path to the image should follow the following format:

    $ projects/<project>/global/images/<image_name>
    4
    Optional: Specify custom metadata in the form of a key:value pair. For example use cases, see the GCP documentation for setting custom metadata.
    5
    Specify a machine type that aligns with the CPU architecture of the chosen OS image. For more information, see "Tested instance types for GCP on 64-bit ARM infrastructures".
    6
    Specify the name of the GCP project that you use for your cluster.
    7
    Specify the region. For example, us-central1. Ensure that the zone you select has machines with the required architecture.
  3. Create the compute machine set by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: gcp-arm64-machine-set-0.yaml, or gcp-amd64-machine-set-0.yaml.

Verification

  1. View the list of compute machine sets by running the following command:

    $ oc get machineset -n openshift-machine-api

    The output must include the machine set that you created.

    Example output

    NAME                                                DESIRED  CURRENT  READY  AVAILABLE  AGE
    <infrastructure_id>-gcp-machine-set-0                   2        2      2          2  10m

  2. You can check if the nodes are ready and schedulable by running the following command:

    $ oc get nodes

3.5. Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z

To create a cluster with multi-architecture compute machines on bare metal (x86_64 or aarch64), IBM Power® (ppc64le), or IBM Z® (s390x) you must have an existing single-architecture cluster on one of these platforms. Follow the installations procedures for your platform:

Important

The bare metal installer-provisioned infrastructure and the Bare Metal Operator do not support adding secondary architecture nodes during the initial cluster setup. You can add secondary architecture nodes manually only after the initial cluster setup.

Before you can add additional compute nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using an ISO image or network PXE booting. This allows you to add additional nodes to your cluster and deploy a cluster with multi-architecture compute machines.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

3.5.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.5.2. Creating RHCOS machines using an ISO image

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using an ISO image to create the machines.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • You must have the OpenShift CLI (oc) installed.

Procedure

  1. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  2. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URLs of these files.
  3. You can validate that the ignition files are available on the URLs. The following example gets the Ignition config files for the compute node:

    $ curl -k http://<HTTP_server>/worker.ign
  4. You can access the ISO image for booting your new machine by running to following command:

    RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.<architecture>.artifacts.metal.formats.iso.disk.location')
  5. Use the ISO file to install RHCOS on more compute machines. Use the same method that you used when you created machines before you installed the cluster:

    • Burn the ISO image to a disk and boot it directly.
    • Use ISO redirection with a LOM interface.
  6. Boot the RHCOS ISO image without specifying any options, or interrupting the live boot sequence. Wait for the installer to boot into a shell prompt in the RHCOS live environment.

    Note

    You can interrupt the RHCOS installation boot process to add kernel arguments. However, for this ISO procedure you must use the coreos-installer command as outlined in the following steps, instead of adding kernel arguments.

  7. Run the coreos-installer command and specify the options that meet your installation requirements. At a minimum, you must specify the URL that points to the Ignition config file for the node type, and the device that you are installing to:

    $ sudo coreos-installer install --ignition-url=http://<HTTP_server>/<node_type>.ign <device> --ignition-hash=sha512-<digest> 12
    1
    You must run the coreos-installer command by using sudo, because the core user does not have the required root privileges to perform the installation.
    2
    The --ignition-hash option is required when the Ignition config file is obtained through an HTTP URL to validate the authenticity of the Ignition config file on the cluster node. <digest> is the Ignition config file SHA512 digest obtained in a preceding step.
    Note

    If you want to provide your Ignition config files through an HTTPS server that uses TLS, you can add the internal certificate authority (CA) to the system trust store before running coreos-installer.

    The following example initializes a bootstrap node installation to the /dev/sda device. The Ignition config file for the bootstrap node is obtained from an HTTP web server with the IP address 192.168.1.2:

    $ sudo coreos-installer install --ignition-url=http://192.168.1.2:80/installation_directory/bootstrap.ign /dev/sda --ignition-hash=sha512-a5a2d43879223273c9b60af66b44202a1d1248fc01cf156c46d4a79f552b6bad47bc8cc78ddf0116e80c59d2ea9e32ba53bc807afbca581aa059311def2c3e3b
  8. Monitor the progress of the RHCOS installation on the console of the machine.

    Important

    Ensure that the installation is successful on each node before commencing with the OpenShift Container Platform installation. Observing the installation process can also help to determine the cause of RHCOS installation issues that might arise.

  9. Continue to create more compute machines for your cluster.

3.5.3. Creating RHCOS machines by PXE or iPXE booting

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using PXE or iPXE booting.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • Obtain the URLs of the RHCOS ISO image, compressed metal BIOS, kernel, and initramfs files that you uploaded to your HTTP server during cluster installation.
  • You have access to the PXE booting infrastructure that you used to create the machines for your OpenShift Container Platform cluster during installation. The machines must boot from their local disks after RHCOS is installed on them.
  • If you use UEFI, you have access to the grub.conf file that you modified during OpenShift Container Platform installation.

Procedure

  1. Confirm that your PXE or iPXE installation for the RHCOS images is correct.

    • For PXE:

      DEFAULT pxeboot
      TIMEOUT 20
      PROMPT 0
      LABEL pxeboot
          KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> 1
          APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img 2
      1
      Specify the location of the live kernel file that you uploaded to your HTTP server.
      2
      Specify locations of the RHCOS files that you uploaded to your HTTP server. The initrd parameter value is the location of the live initramfs file, the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file, and the coreos.live.rootfs_url parameter value is the location of the live rootfs file. The coreos.inst.ignition_url and coreos.live.rootfs_url parameters only support HTTP and HTTPS.
      Note

      This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the APPEND line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?.

    • For iPXE (x86_64 + aarch64):

      kernel http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> initrd=main coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
      initrd --name main http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img 3
      boot
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP server. The kernel parameter value is the location of the kernel file, the initrd=main argument is needed for booting on UEFI systems, the coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your HTTP server.
      Note

      This configuration does not enable serial console access on machines with a graphical console To configure a different console, add one or more console= arguments to the kernel line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux? and "Enabling the serial console for PXE and ISO installation" in the "Advanced RHCOS installation configuration" section.

      Note

      To network boot the CoreOS kernel on aarch64 architecture, you need to use a version of iPXE build with the IMAGE_GZIP option enabled. See IMAGE_GZIP option in iPXE.

    • For PXE (with UEFI and GRUB as second stage) on aarch64:

      menuentry 'Install CoreOS' {
          linux rhcos-<version>-live-kernel-<architecture>  coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
          initrd rhcos-<version>-live-initramfs.<architecture>.img 3
      }
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP/TFTP server. The kernel parameter value is the location of the kernel file on your TFTP server. The coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file on your HTTP Server.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your TFTP server.
  2. Use the PXE or iPXE infrastructure to create the required compute machines for your cluster.

3.5.4. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.30.3
    master-1  Ready     master  63m  v1.30.3
    master-2  Ready     master  64m  v1.30.3

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.30.3
    master-1  Ready     master  73m  v1.30.3
    master-2  Ready     master  74m  v1.30.3
    worker-0  Ready     worker  11m  v1.30.3
    worker-1  Ready     worker  11m  v1.30.3

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

3.6. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with z/VM

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with z/VM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using a z/VM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

3.6.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.6.2. Creating RHCOS machines on IBM Z with z/VM

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines running on IBM Z® with z/VM and attach them to your existing cluster.

Prerequisites

  • You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
  • You have an HTTP or HTTPS server running on your provisioning machine that is accessible to the machines you create.

Procedure

  1. Disable UDP aggregation.

    Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.

    1. Create a YAML file udp-aggregation-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      data:
        disable-udp-aggregation: "true"
      metadata:
        name: udp-aggregation-config
        namespace: openshift-network-operator
    2. Create the ConfigMap resource by running the following command:

      $ oc create -f udp-aggregation-config.yaml
  2. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  3. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
  4. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:

    $ curl -k http://<http_server>/worker.ign
  5. Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
  6. Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server that is accessible from the z/VM guest you want to add.
  7. Create a parameter file for the z/VM guest. The following parameters are specific for the virtual machine:

    • Optional: To specify a static IP address, add an ip= parameter with the following entries, with each separated by a colon:

      1. The IP address for the machine.
      2. An empty string.
      3. The gateway.
      4. The netmask.
      5. The machine host and domain name in the form hostname.domainname. Omit this value to let RHCOS decide.
      6. The network interface name. Omit this value to let RHCOS decide.
      7. The value none.
    • For coreos.inst.ignition_url=, specify the URL to the worker.ign file. Only HTTP and HTTPS protocols are supported.
    • For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
    • For installations on DASD-type disks, complete the following tasks:

      1. For coreos.inst.install_dev=, specify /dev/dasda.
      2. Use rd.dasd= to specify the DASD where RHCOS is to be installed.
      3. You can adjust further parameters if required.

        The following is an example parameter file, additional-worker-dasd.parm:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/dasda \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        rd.dasd=0.0.3490 \
        zfcp.allow_lun_scan=0

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

    • For installations on FCP-type disks, complete the following tasks:

      1. Use rd.zfcp=<adapter>,<wwpn>,<lun> to specify the FCP disk where RHCOS is to be installed. For multipathing, repeat this step for each additional path.

        Note

        When you install with multiple paths, you must enable multipathing directly after the installation, not at a later point in time, as this can cause problems.

      2. Set the install device as: coreos.inst.install_dev=/dev/sda.

        Note

        If additional LUNs are configured with NPIV, FCP requires zfcp.allow_lun_scan=0. If you must enable zfcp.allow_lun_scan=1 because you use a CSI driver, for example, you must configure your NPIV so that each node cannot access the boot partition of another node.

      3. You can adjust further parameters if required.

        Important

        Additional postinstallation steps are required to fully enable multipathing. For more information, see “Enabling multipathing with kernel arguments on RHCOS" in Postinstallation machine configuration tasks.

        The following is an example parameter file, additional-worker-fcp.parm for a worker node with multipathing:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/sda \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        zfcp.allow_lun_scan=0 \
        rd.zfcp=0.0.1987,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.1987,0x50050763071bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763071bc5e3,0x4008400B00000000

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

  8. Transfer the initramfs, kernel, parameter files, and RHCOS images to z/VM, for example, by using FTP. For details about how to transfer the files with FTP and boot from the virtual reader, see Installing under Z/VM.
  9. Punch the files to the virtual reader of the z/VM guest virtual machine.

    See PUNCH in IBM® Documentation.

    Tip

    You can use the CP PUNCH command or, if you use Linux, the vmur command to transfer files between two z/VM guest virtual machines.

  10. Log in to CMS on the bootstrap machine.
  11. IPL the bootstrap machine from the reader by running the following command:

    $ ipl c

    See IPL in IBM® Documentation.

3.6.3. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.30.3
    master-1  Ready     master  63m  v1.30.3
    master-2  Ready     master  64m  v1.30.3

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.30.3
    master-1  Ready     master  73m  v1.30.3
    master-2  Ready     master  74m  v1.30.3
    worker-0  Ready     worker  11m  v1.30.3
    worker-1  Ready     worker  11m  v1.30.3

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

3.7. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE in an LPAR

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) in an LPAR, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using an LPAR instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

Note

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

3.7.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.7.2. Creating RHCOS machines on IBM Z with z/VM

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines running on IBM Z® with z/VM and attach them to your existing cluster.

Prerequisites

  • You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
  • You have an HTTP or HTTPS server running on your provisioning machine that is accessible to the machines you create.

Procedure

  1. Disable UDP aggregation.

    Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.

    1. Create a YAML file udp-aggregation-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      data:
        disable-udp-aggregation: "true"
      metadata:
        name: udp-aggregation-config
        namespace: openshift-network-operator
    2. Create the ConfigMap resource by running the following command:

      $ oc create -f udp-aggregation-config.yaml
  2. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  3. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
  4. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:

    $ curl -k http://<http_server>/worker.ign
  5. Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
  6. Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server that is accessible from the z/VM guest you want to add.
  7. Create a parameter file for the z/VM guest. The following parameters are specific for the virtual machine:

    • Optional: To specify a static IP address, add an ip= parameter with the following entries, with each separated by a colon:

      1. The IP address for the machine.
      2. An empty string.
      3. The gateway.
      4. The netmask.
      5. The machine host and domain name in the form hostname.domainname. Omit this value to let RHCOS decide.
      6. The network interface name. Omit this value to let RHCOS decide.
      7. The value none.
    • For coreos.inst.ignition_url=, specify the URL to the worker.ign file. Only HTTP and HTTPS protocols are supported.
    • For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
    • For installations on DASD-type disks, complete the following tasks:

      1. For coreos.inst.install_dev=, specify /dev/dasda.
      2. Use rd.dasd= to specify the DASD where RHCOS is to be installed.
      3. You can adjust further parameters if required.

        The following is an example parameter file, additional-worker-dasd.parm:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/dasda \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        rd.dasd=0.0.3490 \
        zfcp.allow_lun_scan=0

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

    • For installations on FCP-type disks, complete the following tasks:

      1. Use rd.zfcp=<adapter>,<wwpn>,<lun> to specify the FCP disk where RHCOS is to be installed. For multipathing, repeat this step for each additional path.

        Note

        When you install with multiple paths, you must enable multipathing directly after the installation, not at a later point in time, as this can cause problems.

      2. Set the install device as: coreos.inst.install_dev=/dev/sda.

        Note

        If additional LUNs are configured with NPIV, FCP requires zfcp.allow_lun_scan=0. If you must enable zfcp.allow_lun_scan=1 because you use a CSI driver, for example, you must configure your NPIV so that each node cannot access the boot partition of another node.

      3. You can adjust further parameters if required.

        Important

        Additional postinstallation steps are required to fully enable multipathing. For more information, see “Enabling multipathing with kernel arguments on RHCOS" in Postinstallation machine configuration tasks.

        The following is an example parameter file, additional-worker-fcp.parm for a worker node with multipathing:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/sda \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        zfcp.allow_lun_scan=0 \
        rd.zfcp=0.0.1987,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.1987,0x50050763071bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763071bc5e3,0x4008400B00000000

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

  8. Transfer the initramfs, kernel, parameter files, and RHCOS images to z/VM, for example, by using FTP. For details about how to transfer the files with FTP and boot from the virtual reader, see Installing under Z/VM.
  9. Punch the files to the virtual reader of the z/VM guest virtual machine.

    See PUNCH in IBM® Documentation.

    Tip

    You can use the CP PUNCH command or, if you use Linux, the vmur command to transfer files between two z/VM guest virtual machines.

  10. Log in to CMS on the bootstrap machine.
  11. IPL the bootstrap machine from the reader by running the following command:

    $ ipl c

    See IPL in IBM® Documentation.

3.7.3. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.30.3
    master-1  Ready     master  63m  v1.30.3
    master-2  Ready     master  64m  v1.30.3

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.30.3
    master-1  Ready     master  73m  v1.30.3
    master-2  Ready     master  74m  v1.30.3
    worker-0  Ready     worker  11m  v1.30.3
    worker-1  Ready     worker  11m  v1.30.3

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

3.8. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with RHEL KVM

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with RHEL KVM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using a RHEL KVM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

3.8.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.8.2. Creating RHCOS machines using virt-install

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using virt-install.

Prerequisites

  • You have at least one LPAR running on RHEL 8.7 or later with KVM, referred to as RHEL KVM host in this procedure.
  • The KVM/QEMU hypervisor is installed on the RHEL KVM host.
  • You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
  • An HTTP or HTTPS server is set up.

Procedure

  1. Disable UDP aggregation.

    Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.

    1. Create a YAML file udp-aggregation-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      data:
        disable-udp-aggregation: "true"
      metadata:
        name: udp-aggregation-config
        namespace: openshift-network-operator
    2. Create the ConfigMap resource by running the following command:

      $ oc create -f udp-aggregation-config.yaml
  2. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  3. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
  4. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:

    $ curl -k http://<HTTP_server>/worker.ign
  5. Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

     $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
  6. Move the downloaded RHEL live kernel, initramfs and rootfs files to an HTTP or HTTPS server before you launch virt-install.
  7. Create the new KVM guest nodes using the RHEL kernel, initramfs, and Ignition files; the new disk image; and adjusted parm line arguments.

    $ virt-install \
       --connect qemu:///system \
       --name <vm_name> \
       --autostart \
       --os-variant rhel9.4 \ 1
       --cpu host \
       --vcpus <vcpus> \
       --memory <memory_mb> \
       --disk <vm_name>.qcow2,size=<image_size> \
       --network network=<virt_network_parm> \
       --location <media_location>,kernel=<rhcos_kernel>,initrd=<rhcos_initrd> \ 2
       --extra-args "rd.neednet=1" \
       --extra-args "coreos.inst.install_dev=/dev/vda" \
       --extra-args "coreos.inst.ignition_url=http://<http_server>/worker.ign " \ 3
       --extra-args "coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img" \ 4
       --extra-args "ip=<ip>::<gateway>:<netmask>:<hostname>::none" \ 5
       --extra-args "nameserver=<dns>" \
       --extra-args "console=ttysclp0" \
       --noautoconsole \
       --wait
    1
    For os-variant, specify the RHEL version for the RHCOS compute machine. rhel9.4 is the recommended version. To query the supported RHEL version of your operating system, run the following command:
    $ osinfo-query os -f short-id
    Note

    The os-variant is case sensitive.

    2
    For --location, specify the location of the kernel/initrd on the HTTP or HTTPS server.
    3
    Specify the location of the worker.ign config file. Only HTTP and HTTPS protocols are supported.
    4
    Specify the location of the rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported
    5
    Optional: For hostname, specify the fully qualified hostname of the client machine.
    Note

    If you are using HAProxy as a load balancer, update your HAProxy rules for ingress-router-443 and ingress-router-80 in the /etc/haproxy/haproxy.cfg configuration file.

  8. Continue to create more compute machines for your cluster.

3.8.3. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.30.3
    master-1  Ready     master  63m  v1.30.3
    master-2  Ready     master  64m  v1.30.3

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.30.3
    master-1  Ready     master  73m  v1.30.3
    master-2  Ready     master  74m  v1.30.3
    worker-0  Ready     worker  11m  v1.30.3
    worker-1  Ready     worker  11m  v1.30.3

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

3.9. Creating a cluster with multi-architecture compute machines on IBM Power

To create a cluster with multi-architecture compute machines on IBM Power® (ppc64le), you must have an existing single-architecture (x86_64) cluster. You can then add ppc64le compute machines to your OpenShift Container Platform cluster.

Important

Before you can add ppc64le nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using an ISO image or network PXE booting. This will allow you to add ppc64le nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Power® (ppc64le) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Power®. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

3.9.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).
Note

When using multiple architectures, hosts for OpenShift Container Platform nodes must share the same storage layer. If they do not have the same storage layer, use a storage provider such as nfs-provisioner.

Note

You should limit the number of network hops between the compute and control plane as much as possible.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

3.9.2. Creating RHCOS machines using an ISO image

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using an ISO image to create the machines.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • You must have the OpenShift CLI (oc) installed.

Procedure

  1. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  2. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URLs of these files.
  3. You can validate that the ignition files are available on the URLs. The following example gets the Ignition config files for the compute node:

    $ curl -k http://<HTTP_server>/worker.ign
  4. You can access the ISO image for booting your new machine by running to following command:

    RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.<architecture>.artifacts.metal.formats.iso.disk.location')
  5. Use the ISO file to install RHCOS on more compute machines. Use the same method that you used when you created machines before you installed the cluster:

    • Burn the ISO image to a disk and boot it directly.
    • Use ISO redirection with a LOM interface.
  6. Boot the RHCOS ISO image without specifying any options, or interrupting the live boot sequence. Wait for the installer to boot into a shell prompt in the RHCOS live environment.

    Note

    You can interrupt the RHCOS installation boot process to add kernel arguments. However, for this ISO procedure you must use the coreos-installer command as outlined in the following steps, instead of adding kernel arguments.

  7. Run the coreos-installer command and specify the options that meet your installation requirements. At a minimum, you must specify the URL that points to the Ignition config file for the node type, and the device that you are installing to:

    $ sudo coreos-installer install --ignition-url=http://<HTTP_server>/<node_type>.ign <device> --ignition-hash=sha512-<digest> 12
    1
    You must run the coreos-installer command by using sudo, because the core user does not have the required root privileges to perform the installation.
    2
    The --ignition-hash option is required when the Ignition config file is obtained through an HTTP URL to validate the authenticity of the Ignition config file on the cluster node. <digest> is the Ignition config file SHA512 digest obtained in a preceding step.
    Note

    If you want to provide your Ignition config files through an HTTPS server that uses TLS, you can add the internal certificate authority (CA) to the system trust store before running coreos-installer.

    The following example initializes a bootstrap node installation to the /dev/sda device. The Ignition config file for the bootstrap node is obtained from an HTTP web server with the IP address 192.168.1.2:

    $ sudo coreos-installer install --ignition-url=http://192.168.1.2:80/installation_directory/bootstrap.ign /dev/sda --ignition-hash=sha512-a5a2d43879223273c9b60af66b44202a1d1248fc01cf156c46d4a79f552b6bad47bc8cc78ddf0116e80c59d2ea9e32ba53bc807afbca581aa059311def2c3e3b
  8. Monitor the progress of the RHCOS installation on the console of the machine.

    Important

    Ensure that the installation is successful on each node before commencing with the OpenShift Container Platform installation. Observing the installation process can also help to determine the cause of RHCOS installation issues that might arise.

  9. Continue to create more compute machines for your cluster.

3.9.3. Creating RHCOS machines by PXE or iPXE booting

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using PXE or iPXE booting.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • Obtain the URLs of the RHCOS ISO image, compressed metal BIOS, kernel, and initramfs files that you uploaded to your HTTP server during cluster installation.
  • You have access to the PXE booting infrastructure that you used to create the machines for your OpenShift Container Platform cluster during installation. The machines must boot from their local disks after RHCOS is installed on them.
  • If you use UEFI, you have access to the grub.conf file that you modified during OpenShift Container Platform installation.

Procedure

  1. Confirm that your PXE or iPXE installation for the RHCOS images is correct.

    • For PXE:

      DEFAULT pxeboot
      TIMEOUT 20
      PROMPT 0
      LABEL pxeboot
          KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> 1
          APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img 2
      1
      Specify the location of the live kernel file that you uploaded to your HTTP server.
      2
      Specify locations of the RHCOS files that you uploaded to your HTTP server. The initrd parameter value is the location of the live initramfs file, the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file, and the coreos.live.rootfs_url parameter value is the location of the live rootfs file. The coreos.inst.ignition_url and coreos.live.rootfs_url parameters only support HTTP and HTTPS.
      Note

      This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the APPEND line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?.

    • For iPXE (x86_64 + ppc64le):

      kernel http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> initrd=main coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
      initrd --name main http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img 3
      boot
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP server. The kernel parameter value is the location of the kernel file, the initrd=main argument is needed for booting on UEFI systems, the coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your HTTP server.
      Note

      This configuration does not enable serial console access on machines with a graphical console To configure a different console, add one or more console= arguments to the kernel line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux? and "Enabling the serial console for PXE and ISO installation" in the "Advanced RHCOS installation configuration" section.

      Note

      To network boot the CoreOS kernel on ppc64le architecture, you need to use a version of iPXE build with the IMAGE_GZIP option enabled. See IMAGE_GZIP option in iPXE.

    • For PXE (with UEFI and GRUB as second stage) on ppc64le:

      menuentry 'Install CoreOS' {
          linux rhcos-<version>-live-kernel-<architecture>  coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
          initrd rhcos-<version>-live-initramfs.<architecture>.img 3
      }
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP/TFTP server. The kernel parameter value is the location of the kernel file on your TFTP server. The coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file on your HTTP Server.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your TFTP server.
  2. Use the PXE or iPXE infrastructure to create the required compute machines for your cluster.

3.9.4. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.30.3
    master-1  Ready     master  63m  v1.30.3
    master-2  Ready     master  64m  v1.30.3

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes -o wide

    Example output

    NAME               STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                  CONTAINER-RUNTIME
    worker-0-ppc64le   Ready    worker                 42d   v1.30.3   192.168.200.21   <none>        Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.ppc64le   cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
    worker-1-ppc64le   Ready    worker                 42d   v1.30.3   192.168.200.20   <none>        Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.ppc64le   cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
    master-0-x86       Ready    control-plane,master   75d   v1.30.3   10.248.0.38      10.248.0.38   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
    master-1-x86       Ready    control-plane,master   75d   v1.30.3   10.248.0.39      10.248.0.39   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
    master-2-x86       Ready    control-plane,master   75d   v1.30.3   10.248.0.40      10.248.0.40   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
    worker-0-x86       Ready    worker                 75d   v1.30.3   10.248.0.43      10.248.0.43   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
    worker-1-x86       Ready    worker                 75d   v1.30.3   10.248.0.44      10.248.0.44   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

3.10. Managing a cluster with multi-architecture compute machines

Managing a cluster that has nodes with multiple architectures requires you to consider node architecture as you monitor the cluster and manage your workloads. This requires you to take additional considerations into account when you configure cluster resource requirements and behavior, or schedule workloads in a multi-architecture cluster.

3.10.1. Scheduling workloads on clusters with multi-architecture compute machines

When you deploy workloads on a cluster with compute nodes that use different architectures, you must align pod architecture with the architecture of the underlying node. Your workload may also require additional configuration to particular resources depending on the underlying node architecture.

You can use the Multiarch Tuning Operator to enable architecture-aware scheduling of workloads on clusters with multi-architecture compute machines. The Multiarch Tuning Operator implements additional scheduler predicates in the pods specifications based on the architectures that the pods can support at creation time.

3.10.1.1. Sample multi-architecture node workload deployments

Scheduling a workload to an appropriate node based on architecture works in the same way as scheduling based on any other node characteristic. Consider the following options when determining how to schedule your workloads.

Using nodeAffinity to schedule nodes with specific architectures

You can allow a workload to be scheduled on only a set of nodes with architectures supported by its images, you can set the spec.affinity.nodeAffinity field in your pod’s template specification.

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
   # ...
  template:
     # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: 1
                - amd64
                - arm64
1
Specify the supported architectures. Valid values include amd64,arm64, or both values.
Tainting each node for a specific architecture

You can taint a node to avoid the node scheduling workloads that are incompatible with its architecture. When your cluster uses a MachineSet object, you can add parameters to the .spec.template.spec.taints field to avoid workloads being scheduled on nodes with non-supported architectures.

Before you add a taint to a node, you must scale down the MachineSet object or remove existing available machines. For more information, see Modifying a compute machine set.

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      taints:
      - effect: NoSchedule
        key: multiarch.openshift.io/arch
        value: arm64

You can also set a taint on a specific node by running the following command:

$ oc adm taint nodes <node-name> multiarch.openshift.io/arch=arm64:NoSchedule
Creating a default toleration in a namespace

When a node or machine set has a taint, only workloads that tolerate that taint can be scheduled. You can annotate a namespace so all of the workloads get the same default toleration by running the following command:

$ oc annotate namespace my-namespace \
  'scheduler.alpha.kubernetes.io/defaultTolerations'='[{"operator": "Exists", "effect": "NoSchedule", "key": "multiarch.openshift.io/arch"}]'
Tolerating architecture taints in workloads

When a node or machine set has a taint, only workloads that tolerate that taint can be scheduled. You can configure your workload with a toleration so that it is scheduled on nodes with specific architecture taints.

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      tolerations:
      - key: "multiarch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"

This example deployment can be scheduled on nodes and machine sets that have the multiarch.openshift.io/arch=arm64 taint specified.

Using node affinity with taints and tolerations

When a scheduler computes the set of nodes to schedule a pod, tolerations can broaden the set while node affinity restricts the set. If you set a taint on nodes that have a specific architecture, you must also add a toleration to workloads that you want to be scheduled there.

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      tolerations:
      - key: "multiarch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"

3.10.2. Enabling 64k pages on the Red Hat Enterprise Linux CoreOS (RHCOS) kernel

You can enable the 64k memory page in the Red Hat Enterprise Linux CoreOS (RHCOS) kernel on the 64-bit ARM compute machines in your cluster. The 64k page size kernel specification can be used for large GPU or high memory workloads. This is done using the Machine Config Operator (MCO) which uses a machine config pool to update the kernel. To enable 64k page sizes, you must dedicate a machine config pool for ARM64 to enable on the kernel.

Important

Using 64k pages is exclusive to 64-bit ARM architecture compute nodes or clusters installed on 64-bit ARM machines. If you configure the 64k pages kernel on a machine config pool using 64-bit x86 machines, the machine config pool and MCO will degrade.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You created a cluster with compute nodes of different architecture on one of the supported platforms.

Procedure

  1. Label the nodes where you want to run the 64k page size kernel:

    $ oc label node <node_name> <label>

    Example command

    $ oc label node worker-arm64-01 node-role.kubernetes.io/worker-64k-pages=

  2. Create a machine config pool that contains the worker role that uses the ARM64 architecture and the worker-64k-pages role:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-64k-pages
    spec:
      machineConfigSelector:
        matchExpressions:
          - key: machineconfiguration.openshift.io/role
            operator: In
            values:
            - worker
            - worker-64k-pages
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-64k-pages: ""
          kubernetes.io/arch: arm64
  3. Create a machine config on your compute node to enable 64k-pages with the 64k-pages parameter.

    $ oc create -f <filename>.yaml

    Example MachineConfig

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker-64k-pages" 1
      name: 99-worker-64kpages
    spec:
      kernelType: 64k-pages 2

    1
    Specify the value of the machineconfiguration.openshift.io/role label in the custom machine config pool. The example MachineConfig uses the worker-64k-pages label to enable 64k pages in the worker-64k-pages pool.
    2
    Specify your desired kernel type. Valid values are 64k-pages and default
    Note

    The 64k-pages type is supported on only 64-bit ARM architecture based compute nodes. The realtime type is supported on only 64-bit x86 architecture based compute nodes.

Verification

  • To view your new worker-64k-pages machine config pool, run the following command:

    $ oc get mcp

    Example output

    NAME     CONFIG                                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-9d55ac9a91127c36314e1efe7d77fbf8                      True      False      False      3              3                   3                     0                      361d
    worker   rendered-worker-e7b61751c4a5b7ff995d64b967c421ff                      True      False      False      7              7                   7                     0                      361d
    worker-64k-pages  rendered-worker-64k-pages-e7b61751c4a5b7ff995d64b967c421ff   True      False      False      2              2                   2                     0                      35m

3.10.3. Importing manifest lists in image streams on your multi-architecture compute machines

On an OpenShift Container Platform 4.17 cluster with multi-architecture compute machines, the image streams in the cluster do not import manifest lists automatically. You must manually change the default importMode option to the PreserveOriginal option in order to import the manifest list.

Prerequisites

  • You installed the OpenShift Container Platform CLI (oc).

Procedure

  • The following example command shows how to patch the ImageStream cli-artifacts so that the cli-artifacts:latest image stream tag is imported as a manifest list.

    $ oc patch is/cli-artifacts -n openshift -p '{"spec":{"tags":[{"name":"latest","importPolicy":{"importMode":"PreserveOriginal"}}]}}'

Verification

  • You can check that the manifest lists imported properly by inspecting the image stream tag. The following command will list the individual architecture manifests for a particular tag.

    $ oc get istag cli-artifacts:latest -n openshift -oyaml

    If the dockerImageManifests object is present, then the manifest list import was successful.

    Example output of the dockerImageManifests object

    dockerImageManifests:
      - architecture: amd64
        digest: sha256:16d4c96c52923a9968fbfa69425ec703aff711f1db822e4e9788bf5d2bee5d77
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux
      - architecture: arm64
        digest: sha256:6ec8ad0d897bcdf727531f7d0b716931728999492709d19d8b09f0d90d57f626
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux
      - architecture: ppc64le
        digest: sha256:65949e3a80349cdc42acd8c5b34cde6ebc3241eae8daaeea458498fedb359a6a
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux
      - architecture: s390x
        digest: sha256:75f4fa21224b5d5d511bea8f92dfa8e1c00231e5c81ab95e83c3013d245d1719
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux

3.11. Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator

The Multiarch Tuning Operator optimizes workload management within multi-architecture clusters and in single-architecture clusters transitioning to multi-architecture environments.

Architecture-aware workload scheduling allows the scheduler to place pods onto nodes that match the architecture of the pod images.

By default, the scheduler does not consider the architecture of a pod’s container images when determining the placement of new pods onto nodes.

To enable architecture-aware workload scheduling, you must create the ClusterPodPlacementConfig object. When you create the ClusterPodPlacementConfig object, the Multiarch Tuning Operator deploys the necessary operands to support architecture-aware workload scheduling.

When a pod is created, the operands perform the following actions:

  1. Add the multiarch.openshift.io/scheduling-gate scheduling gate that prevents the scheduling of the pod.
  2. Compute a scheduling predicate that includes the supported architecture values for the kubernetes.io/arch label.
  3. Integrate the scheduling predicate as a nodeAffinity requirement in the pod specification.
  4. Remove the scheduling gate from the pod.
Important

Note the following operand behaviors:

  • If the nodeSelector field is already configured with the kubernetes.io/arch label for a workload, the operand does not update the nodeAffinity field for that workload.
  • If the nodeSelector field is not configured with the kubernetes.io/arch label for a workload, the operand updates the nodeAffinity field for that workload. However, in that nodeAffinity field, the operand updates only the node selector terms that are not configured with the kubernetes.io/arch label.
  • If the nodeName field is already set, the Multiarch Tuning Operator does not process the pod.

3.11.1. Installing the Multiarch Tuning Operator by using the CLI

You can install the Multiarch Tuning Operator by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.

Procedure

  1. Create a new project named openshift-multiarch-tuning-operator by running the following command:

    $ oc create ns openshift-multiarch-tuning-operator
  2. Create an OperatorGroup object:

    1. Create a YAML file with the configuration for creating an OperatorGroup object.

      Example YAML configuration for creating an OperatorGroup object

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: openshift-multiarch-tuning-operator
        namespace: openshift-multiarch-tuning-operator
      spec: {}

    2. Create the OperatorGroup object by running the following command:

      $ oc create -f <file_name> 1
      1
      Replace <file_name> with the name of the YAML file that contains the OperatorGroup object configuration.
  3. Create a Subscription object:

    1. Create a YAML file with the configuration for creating a Subscription object.

      Example YAML configuration for creating a Subscription object

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-multiarch-tuning-operator
        namespace: openshift-multiarch-tuning-operator
      spec:
        channel: stable
        name: multiarch-tuning-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        installPlanApproval: Automatic
        startingCSV: multiarch-tuning-operator.v1.0.0

    2. Create the Subscription object by running the following command:

      $ oc create -f <file_name> 1
      1
      Replace <file_name> with the name of the YAML file that contains the Subscription object configuration.
Note

For more details about configuring the Subscription object and OperatorGroup object, see "Installing from OperatorHub using the CLI".

Verification

  1. To verify that the Multiarch Tuning Operator is installed, run the following command:

    $ oc get csv -n openshift-multiarch-tuning-operator

    Example output

    NAME                               DISPLAY                     VERSION   REPLACES                              PHASE
    multiarch-tuning-operator.v1.0.0   Multiarch Tuning Operator   1.0.0     multiarch-tuning-operator.v0.9.0      Succeeded

    The installation is successful if the Operator is in Succeeded phase.

  2. Optional: To verify that the OperatorGroup object is created, run the following command:

    $ oc get operatorgroup -n openshift-multiarch-tuning-operator

    Example output

    NAME                                        AGE
    openshift-multiarch-tuning-operator-q8zbb   133m

  3. Optional: To verify that the Subscription object is created, run the following command:

    $ oc get subscription -n openshift-multiarch-tuning-operator

    Example output

    NAME                        PACKAGE                     SOURCE                  CHANNEL
    multiarch-tuning-operator   multiarch-tuning-operator   redhat-operators        stable

3.11.2. Installing the Multiarch Tuning Operator by using the web console

You can install the Multiarch Tuning Operator by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → OperatorHub.
  3. Enter Multiarch Tuning Operator in the search field.
  4. Click Multiarch Tuning Operator.
  5. Select the Multiarch Tuning Operator version from the Version list.
  6. Click Install
  7. Set the following options on the Operator Installation page:

    1. Set Update Channel to stable.
    2. Set Installation Mode to All namespaces on the cluster.
    3. Set Installed Namespace to Operator recommended Namespace or Select a Namespace.

      The recommended Operator namespace is openshift-multiarch-tuning-operator. If the openshift-multiarch-tuning-operator namespace does not exist, it is created during the operator installation.

      If you select Select a namespace, you must select a namespace for the Operator from the Select Project list.

    4. Update approval as Automatic or Manual.

      If you select Automatic updates, Operator Lifecycle Manager (OLM) automatically updates the running instance of the Multiarch Tuning Operator without any intervention.

      If you select Manual updates, OLM creates an update request. As a cluster administrator, you must manually approve the update request to update the Multiarch Tuning Operator to a newer version.

  8. Optional: Select the Enable Operator recommended cluster monitoring on this Namespace checkbox.
  9. Click Install.

Verification

  1. Navigate to OperatorsInstalled Operators.
  2. Verify that the Multiarch Tuning Operator is listed with the Status field as Succeeded in the openshift-multiarch-tuning-operator namespace.

3.11.3. Multiarch Tuning Operator pod labels and architecture support overview

After installing the Multiarch Tuning Operator, you can verify the multi-architecture support for workloads in your cluster. You can identify and manage pods based on their architecture compatibility by using the pod labels. These labels are automatically set on the newly created pods to provide insights into their architecture support.

The following table describes the labels that the Multiarch Tuning Operator adds when you create a pod:

Table 3.2. Pod labels that the Multiarch Tuning Operator adds when you create a pod
LabelDescription

multiarch.openshift.io/multi-arch: ""

The pod supports multiple architectures.

multiarch.openshift.io/single-arch: ""

The pod supports only a single architecture.

multiarch.openshift.io/arm64: ""

The pod supports the arm64 architecture.

multiarch.openshift.io/amd64: ""

The pod supports the amd64 architecture.

multiarch.openshift.io/ppc64le: ""

The pod supports the ppc64le architecture.

multiarch.openshift.io/s390x: ""

The pod supports the s390x architecture.

multirach.openshift.io/node-affinity: set

The Operator has set the node affinity requirement for the architecture.

multirach.openshift.io/node-affinity: not-set

The Operator did not set the node affinity requirement. For example, when the pod already has a node affinity for the architecture, the Multiarch Tuning Operator adds this label to the pod.

multiarch.openshift.io/scheduling-gate: gated

The pod is gated.

multiarch.openshift.io/scheduling-gate: removed

The pod gate has been removed.

multiarch.openshift.io/inspection-error: ""

An error has occurred while building the node affinity requirements.

3.11.4. Creating the ClusterPodPlacementConfig object

After installing the Multiarch Tuning Operator, you must create the ClusterPodPlacementConfig object. When you create this object, the Multiarch Tuning Operator deploys an operand that enables architecture-aware workload scheduling.

Note

You can create only one instance of the ClusterPodPlacementConfig object.

Example ClusterPodPlacementConfig object configuration

apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
  name: cluster 1
spec:
  logVerbosityLevel: Normal 2
  namespaceSelector: 3
    matchExpressions:
      - key: multiarch.openshift.io/exclude-pod-placement
        operator: DoesNotExist

1
You must set this field value to cluster.
2
Optional: You can set the field value to Normal, Debug, Trace, or TraceAll. The value is set to Normal by default.
3
Optional: You can configure the namespaceSelector to select the namespaces in which the Multiarch Tuning Operator’s pod placement operand must process the nodeAffinity of the pods. All namespaces are considered by default.

In this example, the operator field value is set to DoesNotExist. Therefore, if the key field value (multiarch.openshift.io/exclude-pod-placement) is set as a label in a namespace, the operand does not process the nodeAffinity of the pods in that namespace. Instead, the operand processes the nodeAffinity of the pods in namespaces that do not contain the label.

If you want the operand to process the nodeAffinity of the pods only in specific namespaces, you can configure the namespaceSelector as follows:

namespaceSelector:
  matchExpressions:
    - key: multiarch.openshift.io/include-pod-placement
      operator: Exists

In this example, the operator field value is set to Exists. Therefore, the operand processes the nodeAffinity of the pods only in namespaces that contain the multiarch.openshift.io/include-pod-placement label.

Important

This Operator excludes pods in namespaces starting with kube-. It also excludes pods that are expected to be scheduled on control plane nodes.

3.11.4.1. Creating the ClusterPodPlacementConfig object by using the CLI

To deploy the pod placement operand that enables architecture-aware workload scheduling, you can create the ClusterPodPlacementConfig object by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.
  • You have installed the Multiarch Tuning Operator.

Procedure

  1. Create a ClusterPodPlacementConfig object YAML file:

    Example ClusterPodPlacementConfig object configuration

    apiVersion: multiarch.openshift.io/v1beta1
    kind: ClusterPodPlacementConfig
    metadata:
      name: cluster
    spec:
      logVerbosityLevel: Normal
      namespaceSelector:
        matchExpressions:
          - key: multiarch.openshift.io/exclude-pod-placement
            operator: DoesNotExist

  2. Create the ClusterPodPlacementConfig object by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the ClusterPodPlacementConfig object YAML file.

Verification

  • To check that the ClusterPodPlacementConfig object is created, run the following command:

    $ oc get clusterpodplacementconfig

    Example output

    NAME      AGE
    cluster   29s

3.11.4.2. Creating the ClusterPodPlacementConfig object by using the web console

To deploy the pod placement operand that enables architecture-aware workload scheduling, you can create the ClusterPodPlacementConfig object by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the Multiarch Tuning Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to OperatorsInstalled Operators.
  3. On the Installed Operators page, click Multiarch Tuning Operator.
  4. Click the Cluster Pod Placement Config tab.
  5. Select either Form view or YAML view.
  6. Configure the ClusterPodPlacementConfig object parameters.
  7. Click Create.
  8. Optional: If you want to edit the ClusterPodPlacementConfig object, perform the following actions:

    1. Click the Cluster Pod Placement Config tab.
    2. Select Edit ClusterPodPlacementConfig from the options menu.
    3. Click YAML and edit the ClusterPodPlacementConfig object parameters.
    4. Click Save.

Verification

  • On the Cluster Pod Placement Config page, check that the ClusterPodPlacementConfig object is in the Ready state.

3.11.5. Deleting the ClusterPodPlacementConfig object by using the CLI

You can create only one instance of the ClusterPodPlacementConfig object. If you want to re-create this object, you must first delete the existing instance.

You can delete this object by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Delete the ClusterPodPlacementConfig object by running the following command:

    $ oc delete clusterpodplacementconfig cluster

Verification

  • To check that the ClusterPodPlacementConfig object is deleted, run the following command:

    $ oc get clusterpodplacementconfig

    Example output

    No resources found

3.11.6. Deleting the ClusterPodPlacementConfig object by using the web console

You can create only one instance of the ClusterPodPlacementConfig object. If you want to re-create this object, you must first delete the existing instance.

You can delete this object by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have created the ClusterPodPlacementConfig object.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to OperatorsInstalled Operators.
  3. On the Installed Operators page, click Multiarch Tuning Operator.
  4. Click the Cluster Pod Placement Config tab.
  5. Select Delete ClusterPodPlacementConfig from the options menu.
  6. Click Delete.

Verification

  • On the Cluster Pod Placement Config page, check that the ClusterPodPlacementConfig object has been deleted.

3.11.7. Uninstalling the Multiarch Tuning Operator by using the CLI

You can uninstall the Multiarch Tuning Operator by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.
  • You deleted the ClusterPodPlacementConfig object.

    Important

    You must delete the ClusterPodPlacementConfig object before uninstalling the Multiarch Tuning Operator. Uninstalling the Operator without deleting the ClusterPodPlacementConfig object can lead to unexpected behavior.

Procedure

  1. Get the Subscription object name for the Multiarch Tuning Operator by running the following command:

    $ oc get subscription.operators.coreos.com -n <namespace> 1
    1
    Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    NAME                                  PACKAGE                     SOURCE             CHANNEL
    openshift-multiarch-tuning-operator   multiarch-tuning-operator   redhat-operators   stable

  2. Get the currentCSV value for the Multiarch Tuning Operator by running the following command:

    $ oc get subscription.operators.coreos.com <subscription_name> -n <namespace> -o yaml | grep currentCSV 1
    1
    Replace <subscription_name> with the Subscription object name. For example: openshift-multiarch-tuning-operator. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    currentCSV: multiarch-tuning-operator.v1.0.0

  3. Delete the Subscription object by running the following command:

    $ oc delete subscription.operators.coreos.com <subscription_name> -n <namespace> 1
    1
    Replace <subscription_name> with the Subscription object name. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    subscription.operators.coreos.com "openshift-multiarch-tuning-operator" deleted

  4. Delete the CSV for the Multiarch Tuning Operator in the target namespace using the currentCSV value by running the following command:

    $ oc delete clusterserviceversion <currentCSV_value> -n <namespace> 1
    1
    Replace <currentCSV> with the currentCSV value for the Multiarch Tuning Operator. For example: multiarch-tuning-operator.v1.0.0. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    clusterserviceversion.operators.coreos.com "multiarch-tuning-operator.v1.0.0" deleted

Verification

  • To verify that the Multiarch Tuning Operator is uninstalled, run the following command:

    $ oc get csv -n <namespace> 1
    1
    Replace <namespace> with the name of the namespace where you have uninstalled the Multiarch Tuning Operator.

    Example output

    No resources found in openshift-multiarch-tuning-operator namespace.

3.11.8. Uninstalling the Multiarch Tuning Operator by using the web console

You can uninstall the Multiarch Tuning Operator by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin permissions.
  • You deleted the ClusterPodPlacementConfig object.

    Important

    You must delete the ClusterPodPlacementConfig object before uninstalling the Multiarch Tuning Operator. Uninstalling the Operator without deleting the ClusterPodPlacementConfig object can lead to unexpected behavior.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → OperatorHub.
  3. Enter Multiarch Tuning Operator in the search field.
  4. Click Multiarch Tuning Operator.
  5. Click the Details tab.
  6. From the Actions menu, select Uninstall Operator.
  7. When prompted, click Uninstall.

Verification

  1. Navigate to OperatorsInstalled Operators.
  2. On the Installed Operators page, verify that the Multiarch Tuning Operator is not listed.

3.12. Multiarch Tuning Operator release notes

The Multiarch Tuning Operator optimizes workload management within multi-architecture clusters and in single-architecture clusters transitioning to multi-architecture environments.

These release notes track the development of the Multiarch Tuning Operator.

For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

3.12.1. Release notes for the Multiarch Tuning Operator 1.0.0

Issued: 31 October 2024

3.12.1.1. New features and enhancements
  • With this release, the Multiarch Tuning Operator supports custom network scenarios and cluster-wide custom registries configurations.
  • With this release, you can identify pods based on their architecture compatibility by using the pod labels that the Multiarch Tuning Operator adds to newly created pods.
  • With this release, you can monitor the behavior of the Multiarch Tuning Operator by using the metrics and alerts that are registered in the Cluster Monitoring Operator.

Chapter 4. Postinstallation cluster tasks

After installing OpenShift Container Platform, you can further expand and customize your cluster to your requirements.

4.1. Available cluster customizations

You complete most of the cluster configuration and customization after you deploy your OpenShift Container Platform cluster. A number of configuration resources are available.

Note

If you install your cluster on IBM Z®, not all features and functions are available.

You modify the configuration resources to configure the major features of the cluster, such as the image registry, networking configuration, image build behavior, and the identity provider.

For current documentation of the settings that you control by using these resources, use the oc explain command, for example oc explain builds --api-version=config.openshift.io/v1

4.1.1. Cluster configuration resources

All cluster configuration resources are globally scoped (not namespaced) and named cluster.

Resource nameDescription

apiserver.config.openshift.io

Provides API server configuration such as certificates and certificate authorities.

authentication.config.openshift.io

Controls the identity provider and authentication configuration for the cluster.

build.config.openshift.io

Controls default and enforced configuration for all builds on the cluster.

console.config.openshift.io

Configures the behavior of the web console interface, including the logout behavior.

featuregate.config.openshift.io

Enables FeatureGates so that you can use Tech Preview features.

image.config.openshift.io

Configures how specific image registries should be treated (allowed, disallowed, insecure, CA details).

ingress.config.openshift.io

Configuration details related to routing such as the default domain for routes.

oauth.config.openshift.io

Configures identity providers and other behavior related to internal OAuth server flows.

project.config.openshift.io

Configures how projects are created including the project template.

proxy.config.openshift.io

Defines proxies to be used by components needing external network access. Note: not all components currently consume this value.

scheduler.config.openshift.io

Configures scheduler behavior such as profiles and default node selectors.

4.1.2. Operator configuration resources

These configuration resources are cluster-scoped instances, named cluster, which control the behavior of a specific component as owned by a particular Operator.

Resource nameDescription

consoles.operator.openshift.io

Controls console appearance such as branding customizations

config.imageregistry.operator.openshift.io

Configures OpenShift image registry settings such as public routing, log levels, proxy settings, resource constraints, replica counts, and storage type.

config.samples.operator.openshift.io

Configures the Samples Operator to control which example image streams and templates are installed on the cluster.

4.1.3. Additional configuration resources

These configuration resources represent a single instance of a particular component. In some cases, you can request multiple instances by creating multiple instances of the resource. In other cases, the Operator can use only a specific resource instance name in a specific namespace. Reference the component-specific documentation for details on how and when you can create additional resource instances.

Resource nameInstance nameNamespaceDescription

alertmanager.monitoring.coreos.com

main

openshift-monitoring

Controls the Alertmanager deployment parameters.

ingresscontroller.operator.openshift.io

default

openshift-ingress-operator

Configures Ingress Operator behavior such as domain, number of replicas, certificates, and controller placement.

4.1.4. Informational Resources

You use these resources to retrieve information about the cluster. Some configurations might require you to edit these resources directly.

Resource nameInstance nameDescription

clusterversion.config.openshift.io

version

In OpenShift Container Platform 4.17, you must not customize the ClusterVersion resource for production clusters. Instead, follow the process to update a cluster.

dns.config.openshift.io

cluster

You cannot modify the DNS settings for your cluster. You can check the DNS Operator status.

infrastructure.config.openshift.io

cluster

Configuration details allowing the cluster to interact with its cloud provider.

network.config.openshift.io

cluster

You cannot modify your cluster networking after installation. To customize your network, follow the process to customize networking during installation.

4.2. Adding worker nodes

After you deploy your OpenShift Container Platform cluster, you can add worker nodes to scale cluster resources. There are different ways you can add worker nodes depending on the installation method and the environment of your cluster.

4.2.1. Adding worker nodes to an on-premise cluster

For on-premise clusters, you can add worker nodes by using the OpenShift Container Platform CLI (oc) to generate an ISO image, which can then be used to boot one or more nodes in your target cluster. This process can be used regardless of how you installed your cluster.

You can add one or more nodes at a time while customizing each node with more complex configurations, such as static network configuration, or you can specify only the MAC address of each node. Any configurations that are not specified during ISO generation are retrieved from the target cluster and applied to the new nodes.

Preflight validation checks are also performed when booting the ISO image to inform you of failure-causing issues before you attempt to boot each node.

Adding worker nodes to an on-premise cluster

4.2.2. Adding worker nodes to installer-provisioned infrastructure clusters

For installer-provisioned infrastructure clusters, you can manually or automatically scale the MachineSet object to match the number of available bare-metal hosts.

To add a bare-metal host, you must configure all network prerequisites, configure an associated baremetalhost object, then provision the worker node to the cluster. You can add a bare-metal host manually or by using the web console.

4.2.3. Adding worker nodes to user-provisioned infrastructure clusters

For user-provisioned infrastructure clusters, you can add worker nodes by using a RHEL or RHCOS ISO image and connecting it to your cluster using cluster Ignition config files. For RHEL worker nodes, the following example uses Ansible playbooks to add worker nodes to the cluster. For RHCOS worker nodes, the following example uses an ISO image and network booting to add worker nodes to the cluster.

4.2.4. Adding worker nodes to clusters managed by the Assisted Installer

For clusters managed by the Assisted Installer, you can add worker nodes by using the Red Hat OpenShift Cluster Manager console, the Assisted Installer REST API or you can manually add worker nodes using an ISO image and cluster Ignition config files.

4.2.5. Adding worker nodes to clusters managed by the multicluster engine for Kubernetes

For clusters managed by the multicluster engine for Kubernetes, you can add worker nodes by using the dedicated multicluster engine console.

4.3. Adjust worker nodes

If you incorrectly sized the worker nodes during deployment, adjust them by creating one or more new compute machine sets, scale them up, then scale the original compute machine set down before removing them.

4.3.1. Understanding the difference between compute machine sets and the machine config pool

MachineSet objects describe OpenShift Container Platform nodes with respect to the cloud or machine provider.

The MachineConfigPool object allows MachineConfigController components to define and provide the status of machines in the context of upgrades.

The MachineConfigPool object allows users to configure how upgrades are rolled out to the OpenShift Container Platform nodes in the machine config pool.

The NodeSelector object can be replaced with a reference to the MachineSet object.

4.3.2. Scaling a compute machine set manually

To add or remove an instance of a machine in a compute machine set, you can manually scale the compute machine set.

This guidance is relevant to fully automated, installer-provisioned infrastructure installations. Customized, user-provisioned infrastructure installations do not have compute machine sets.

Prerequisites

  • Install an OpenShift Container Platform cluster and the oc command line.
  • Log in to oc as a user with cluster-admin permission.

Procedure

  1. View the compute machine sets that are in the cluster by running the following command:

    $ oc get machinesets.machine.openshift.io -n openshift-machine-api

    The compute machine sets are listed in the form of <clusterid>-worker-<aws-region-az>.

  2. View the compute machines that are in the cluster by running the following command:

    $ oc get machines.machine.openshift.io -n openshift-machine-api
  3. Set the annotation on the compute machine that you want to delete by running the following command:

    $ oc annotate machines.machine.openshift.io/<machine_name> -n openshift-machine-api machine.openshift.io/delete-machine="true"
  4. Scale the compute machine set by running one of the following commands:

    $ oc scale --replicas=2 machinesets.machine.openshift.io <machineset> -n openshift-machine-api

    Or:

    $ oc edit machinesets.machine.openshift.io <machineset> -n openshift-machine-api
    Tip

    You can alternatively apply the following YAML to scale the compute machine set:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      name: <machineset>
      namespace: openshift-machine-api
    spec:
      replicas: 2

    You can scale the compute machine set up or down. It takes several minutes for the new machines to be available.

    Important

    By default, the machine controller tries to drain the node that is backed by the machine until it succeeds. In some situations, such as with a misconfigured pod disruption budget, the drain operation might not be able to succeed. If the drain operation fails, the machine controller cannot proceed removing the machine.

    You can skip draining the node by annotating machine.openshift.io/exclude-node-draining in a specific machine.

Verification

  • Verify the deletion of the intended machine by running the following command:

    $ oc get machines.machine.openshift.io

4.3.3. The compute machine set deletion policy

Random, Newest, and Oldest are the three supported deletion options. The default is Random, meaning that random machines are chosen and deleted when scaling compute machine sets down. The deletion policy can be set according to the use case by modifying the particular compute machine set:

spec:
  deletePolicy: <delete_policy>
  replicas: <desired_replica_count>

Specific machines can also be prioritized for deletion by adding the annotation machine.openshift.io/delete-machine=true to the machine of interest, regardless of the deletion policy.

Important

By default, the OpenShift Container Platform router pods are deployed on workers. Because the router is required to access some cluster resources, including the web console, do not scale the worker compute machine set to 0 unless you first relocate the router pods.

Note

Custom compute machine sets can be used for use cases requiring that services run on specific nodes and that those services are ignored by the controller when the worker compute machine sets are scaling down. This prevents service disruption.

4.3.4. Creating default cluster-wide node selectors

You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.

With cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.

You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a compute machine set, or a machine config. Adding the label to the compute machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.

Note

You can add additional key/value pairs to a pod. But you cannot add a different value for a default key.

Procedure

To add a default cluster-wide node selector:

  1. Edit the Scheduler Operator CR to add the default cluster-wide node selectors:

    $ oc edit scheduler cluster

    Example Scheduler Operator CR with a node selector

    apiVersion: config.openshift.io/v1
    kind: Scheduler
    metadata:
      name: cluster
    ...
    spec:
      defaultNodeSelector: type=user-node,region=east 1
      mastersSchedulable: false

    1
    Add a node selector with the appropriate <key>:<value> pairs.

    After making this change, wait for the pods in the openshift-kube-apiserver project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.

  2. Add labels to a node by using a compute machine set or editing the node directly:

    • Use a compute machine set to add labels to nodes managed by the compute machine set when a node is created:

      1. Run the following command to add labels to a MachineSet object:

        $ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>"="<value>","<key>"="<value>"}}]'  -n openshift-machine-api 1
        1
        Add a <key>/<value> pair for each label.

        For example:

        $ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]'  -n openshift-machine-api
        Tip

        You can alternatively apply the following YAML to add labels to a compute machine set:

        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        metadata:
          name: <machineset>
          namespace: openshift-machine-api
        spec:
          template:
            spec:
              metadata:
                labels:
                  region: "east"
                  type: "user-node"
      2. Verify that the labels are added to the MachineSet object by using the oc edit command:

        For example:

        $ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api

        Example MachineSet object

        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
          ...
        spec:
          ...
          template:
            metadata:
          ...
            spec:
              metadata:
                labels:
                  region: east
                  type: user-node
          ...

      3. Redeploy the nodes associated with that compute machine set by scaling down to 0 and scaling up the nodes:

        For example:

        $ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
        $ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
      4. When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:

        $ oc get nodes -l <key>=<value>

        For example:

        $ oc get nodes -l type=user-node

        Example output

        NAME                                       STATUS   ROLES    AGE   VERSION
        ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp   Ready    worker   61s   v1.30.3

    • Add labels directly to a node:

      1. Edit the Node object for the node:

        $ oc label nodes <name> <key>=<value>

        For example, to label a node:

        $ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east
        Tip

        You can alternatively apply the following YAML to add labels to a node:

        kind: Node
        apiVersion: v1
        metadata:
          name: <node_name>
          labels:
            type: "user-node"
            region: "east"
      2. Verify that the labels are added to the node using the oc get command:

        $ oc get nodes -l <key>=<value>,<key>=<value>

        For example:

        $ oc get nodes -l type=user-node,region=east

        Example output

        NAME                                       STATUS   ROLES    AGE   VERSION
        ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49   Ready    worker   17m   v1.30.3

4.4. Improving cluster stability in high latency environments using worker latency profiles

If the cluster administrator has performed latency tests for platform verification, they can discover the need to adjust the operation of the cluster to ensure stability in cases of high latency. The cluster administrator needs to change only one parameter, recorded in a file, which controls four parameters affecting how supervisory processes read status and interpret the health of the cluster. Changing only the one parameter provides cluster tuning in an easy, supportable manner.

The Kubelet process provides the starting point for monitoring cluster health. The Kubelet sets status values for all nodes in the OpenShift Container Platform cluster. The Kubernetes Controller Manager (kube controller) reads the status values every 10 seconds, by default. If the kube controller cannot read a node status value, it loses contact with that node after a configured period. The default behavior is:

  1. The node controller on the control plane updates the node health to Unhealthy and marks the node Ready condition`Unknown`.
  2. In response, the scheduler stops scheduling pods to that node.
  3. The Node Lifecycle Controller adds a node.kubernetes.io/unreachable taint with a NoExecute effect to the node and schedules any pods on the node for eviction after five minutes, by default.

This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager might not receive an update from a healthy node due to network latency. The Kubelet evicts pods from the node even though the node is healthy.

To avoid this problem, you can use worker latency profiles to adjust the frequency that the Kubelet and the Kubernetes Controller Manager wait for status updates before taking action. These adjustments help to ensure that your cluster runs properly if network latency between the control plane and the worker nodes is not optimal.

These worker latency profiles contain three sets of parameters that are predefined with carefully tuned values to control the reaction of the cluster to increased latency. There is no need to experimentally find the best values manually.

You can configure worker latency profiles when installing a cluster or at any time you notice increased latency in your cluster network.

4.4.1. Understanding worker latency profiles

Worker latency profiles are four different categories of carefully-tuned parameters. The four parameters which implement these values are node-status-update-frequency, node-monitor-grace-period, default-not-ready-toleration-seconds and default-unreachable-toleration-seconds. These parameters can use values which allow you to control the reaction of the cluster to latency issues without needing to determine the best values by using manual methods.

Important

Setting these parameters manually is not supported. Incorrect parameter settings adversely affect cluster stability.

All worker latency profiles configure the following parameters:

node-status-update-frequency
Specifies how often the kubelet posts node status to the API server.
node-monitor-grace-period
Specifies the amount of time in seconds that the Kubernetes Controller Manager waits for an update from a kubelet before marking the node unhealthy and adding the node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taint to the node.
default-not-ready-toleration-seconds
Specifies the amount of time in seconds after marking a node unhealthy that the Kube API Server Operator waits before evicting pods from that node.
default-unreachable-toleration-seconds
Specifies the amount of time in seconds after marking a node unreachable that the Kube API Server Operator waits before evicting pods from that node.

The following Operators monitor the changes to the worker latency profiles and respond accordingly:

  • The Machine Config Operator (MCO) updates the node-status-update-frequency parameter on the worker nodes.
  • The Kubernetes Controller Manager updates the node-monitor-grace-period parameter on the control plane nodes.
  • The Kubernetes API Server Operator updates the default-not-ready-toleration-seconds and default-unreachable-toleration-seconds parameters on the control plane nodes.

Although the default configuration works in most cases, OpenShift Container Platform offers two other worker latency profiles for situations where the network is experiencing higher latency than usual. The three worker latency profiles are described in the following sections:

Default worker latency profile

With the Default profile, each Kubelet updates its status every 10 seconds (node-status-update-frequency). The Kube Controller Manager checks the statuses of Kubelet every 5 seconds.

The Kubernetes Controller Manager waits 40 seconds (node-monitor-grace-period) for a status update from Kubelet before considering the Kubelet unhealthy. If no status is made available to the Kubernetes Controller Manager, it then marks the node with the node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taint and evicts the pods on that node.

If a pod is on a node that has the NoExecute taint, the pod runs according to tolerationSeconds. If the node has no taint, it will be evicted in 300 seconds (default-not-ready-toleration-seconds and default-unreachable-toleration-seconds settings of the Kube API Server).

ProfileComponentParameterValue

Default

kubelet

node-status-update-frequency

10s

Kubelet Controller Manager

node-monitor-grace-period

40s

Kubernetes API Server Operator

default-not-ready-toleration-seconds

300s

Kubernetes API Server Operator

default-unreachable-toleration-seconds

300s

Medium worker latency profile

Use the MediumUpdateAverageReaction profile if the network latency is slightly higher than usual.

The MediumUpdateAverageReaction profile reduces the frequency of kubelet updates to 20 seconds and changes the period that the Kubernetes Controller Manager waits for those updates to 2 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the tolerationSeconds parameter, the eviction waits for the period specified by that parameter.

The Kubernetes Controller Manager waits for 2 minutes to consider a node unhealthy. In another minute, the eviction process starts.

ProfileComponentParameterValue

MediumUpdateAverageReaction

kubelet

node-status-update-frequency

20s

Kubelet Controller Manager

node-monitor-grace-period

2m

Kubernetes API Server Operator

default-not-ready-toleration-seconds

60s

Kubernetes API Server Operator

default-unreachable-toleration-seconds

60s

Low worker latency profile

Use the LowUpdateSlowReaction profile if the network latency is extremely high.

The LowUpdateSlowReaction profile reduces the frequency of kubelet updates to 1 minute and changes the period that the Kubernetes Controller Manager waits for those updates to 5 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the tolerationSeconds parameter, the eviction waits for the period specified by that parameter.

The Kubernetes Controller Manager waits for 5 minutes to consider a node unhealthy. In another minute, the eviction process starts.

ProfileComponentParameterValue

LowUpdateSlowReaction

kubelet

node-status-update-frequency

1m

Kubelet Controller Manager

node-monitor-grace-period

5m

Kubernetes API Server Operator

default-not-ready-toleration-seconds

60s

Kubernetes API Server Operator

default-unreachable-toleration-seconds

60s

4.4.2. Using and changing worker latency profiles

To change a worker latency profile to deal with network latency, edit the node.config object to add the name of the profile. You can change the profile at any time as latency increases or decreases.

You must move one worker latency profile at a time. For example, you cannot move directly from the Default profile to the LowUpdateSlowReaction worker latency profile. You must move from the Default worker latency profile to the MediumUpdateAverageReaction profile first, then to LowUpdateSlowReaction. Similarly, when returning to the Default profile, you must move from the low profile to the medium profile first, then to Default.

Note

You can also configure worker latency profiles upon installing an OpenShift Container Platform cluster.

Procedure

To move from the default worker latency profile:

  1. Move to the medium worker latency profile:

    1. Edit the node.config object:

      $ oc edit nodes.config/cluster
    2. Add spec.workerLatencyProfile: MediumUpdateAverageReaction:

      Example node.config object

      apiVersion: config.openshift.io/v1
      kind: Node
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: "true"
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
          release.openshift.io/create-only: "true"
        creationTimestamp: "2022-07-08T16:02:51Z"
        generation: 1
        name: cluster
        ownerReferences:
        - apiVersion: config.openshift.io/v1
          kind: ClusterVersion
          name: version
          uid: 36282574-bf9f-409e-a6cd-3032939293eb
        resourceVersion: "1865"
        uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
      spec:
        workerLatencyProfile: MediumUpdateAverageReaction 1
      
      # ...

      1
      Specifies the medium worker latency policy.

      Scheduling on each worker node is disabled as the change is being applied.

  2. Optional: Move to the low worker latency profile:

    1. Edit the node.config object:

      $ oc edit nodes.config/cluster
    2. Change the spec.workerLatencyProfile value to LowUpdateSlowReaction:

      Example node.config object

      apiVersion: config.openshift.io/v1
      kind: Node
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: "true"
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
          release.openshift.io/create-only: "true"
        creationTimestamp: "2022-07-08T16:02:51Z"
        generation: 1
        name: cluster
        ownerReferences:
        - apiVersion: config.openshift.io/v1
          kind: ClusterVersion
          name: version
          uid: 36282574-bf9f-409e-a6cd-3032939293eb
        resourceVersion: "1865"
        uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
      spec:
        workerLatencyProfile: LowUpdateSlowReaction 1
      
      # ...

      1
      Specifies use of the low worker latency policy.

Scheduling on each worker node is disabled as the change is being applied.

Verification

  • When all nodes return to the Ready condition, you can use the following command to look in the Kubernetes Controller Manager to ensure it was applied:

    $ oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5

    Example output

    # ...
        - lastTransitionTime: "2022-07-11T19:47:10Z"
          reason: ProfileUpdated
          status: "False"
          type: WorkerLatencyProfileProgressing
        - lastTransitionTime: "2022-07-11T19:47:10Z" 1
          message: all static pod revision(s) have updated latency profile
          reason: ProfileUpdated
          status: "True"
          type: WorkerLatencyProfileComplete
        - lastTransitionTime: "2022-07-11T19:20:11Z"
          reason: AsExpected
          status: "False"
          type: WorkerLatencyProfileDegraded
        - lastTransitionTime: "2022-07-11T19:20:36Z"
          status: "False"
    # ...

    1
    Specifies that the profile is applied and active.

To change the medium profile to default or change the default to medium, edit the node.config object and set the spec.workerLatencyProfile parameter to the appropriate value.

4.5. Managing control plane machines

Control plane machine sets provide management capabilities for control plane machines that are similar to what compute machine sets provide for compute machines. The availability and initial status of control plane machine sets on your cluster depend on your cloud provider and the version of OpenShift Container Platform that you installed. For more information, see Getting started with control plane machine sets.

4.6. Creating infrastructure machine sets for production environments

You can create a compute machine set to create machines that host only infrastructure components, such as the default router, the integrated container image registry, and components for cluster metrics and monitoring. These infrastructure machines are not counted toward the total number of subscriptions that are required to run the environment.

In a production deployment, it is recommended that you deploy at least three compute machine sets to hold infrastructure components. Both OpenShift Logging and Red Hat OpenShift Service Mesh deploy Elasticsearch, which requires three instances to be installed on different nodes. Each of these nodes can be deployed to different availability zones for high availability. A configuration like this requires three different compute machine sets, one for each availability zone. In global Azure regions that do not have multiple availability zones, you can use availability sets to ensure high availability.

For information on infrastructure nodes and which components can run on infrastructure nodes, see Creating infrastructure machine sets.

To create an infrastructure node, you can use a machine set, assign a label to the nodes, or use a machine config pool.

For sample machine sets that you can use with these procedures, see Creating machine sets for different clouds.

Applying a specific node selector to all infrastructure components causes OpenShift Container Platform to schedule those workloads on nodes with that label.

4.6.1. Creating a compute machine set

In addition to the compute machine sets created by the installation program, you can create your own to dynamically manage the machine compute resources for specific workloads of your choice.

Prerequisites

  • Deploy an OpenShift Container Platform cluster.
  • Install the OpenShift CLI (oc).
  • Log in to oc as a user with cluster-admin permission.

Procedure

  1. Create a new YAML file that contains the compute machine set custom resource (CR) sample and is named <file_name>.yaml.

    Ensure that you set the <clusterID> and <role> parameter values.

  2. Optional: If you are not sure which value to set for a specific field, you can check an existing compute machine set from your cluster.

    1. To list the compute machine sets in your cluster, run the following command:

      $ oc get machinesets -n openshift-machine-api

      Example output

      NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
      agl030519-vplxk-worker-us-east-1a   1         1         1       1           55m
      agl030519-vplxk-worker-us-east-1b   1         1         1       1           55m
      agl030519-vplxk-worker-us-east-1c   1         1         1       1           55m
      agl030519-vplxk-worker-us-east-1d   0         0                             55m
      agl030519-vplxk-worker-us-east-1e   0         0                             55m
      agl030519-vplxk-worker-us-east-1f   0         0                             55m

    2. To view values of a specific compute machine set custom resource (CR), run the following command:

      $ oc get machineset <machineset_name> \
        -n openshift-machine-api -o yaml

      Example output

      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
        name: <infrastructure_id>-<role> 2
        namespace: openshift-machine-api
      spec:
        replicas: 1
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>
        template:
          metadata:
            labels:
              machine.openshift.io/cluster-api-cluster: <infrastructure_id>
              machine.openshift.io/cluster-api-machine-role: <role>
              machine.openshift.io/cluster-api-machine-type: <role>
              machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>
          spec:
            providerSpec: 3
              ...

      1
      The cluster infrastructure ID.
      2
      A default node label.
      Note

      For clusters that have user-provisioned infrastructure, a compute machine set can only create worker and infra type machines.

      3
      The values in the <providerSpec> section of the compute machine set CR are platform-specific. For more information about <providerSpec> parameters in the CR, see the sample compute machine set CR configuration for your provider.
  3. Create a MachineSet CR by running the following command:

    $ oc create -f <file_name>.yaml

Verification

  • View the list of compute machine sets by running the following command:

    $ oc get machineset -n openshift-machine-api

    Example output

    NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
    agl030519-vplxk-infra-us-east-1a    1         1         1       1           11m
    agl030519-vplxk-worker-us-east-1a   1         1         1       1           55m
    agl030519-vplxk-worker-us-east-1b   1         1         1       1           55m
    agl030519-vplxk-worker-us-east-1c   1         1         1       1           55m
    agl030519-vplxk-worker-us-east-1d   0         0                             55m
    agl030519-vplxk-worker-us-east-1e   0         0                             55m
    agl030519-vplxk-worker-us-east-1f   0         0                             55m

    When the new compute machine set is available, the DESIRED and CURRENT values match. If the compute machine set is not available, wait a few minutes and run the command again.

4.6.2. Creating an infrastructure node

Important

See Creating infrastructure machine sets for installer-provisioned infrastructure environments or for any cluster where the control plane nodes are managed by the machine API.

Requirements of the cluster dictate that infrastructure, also called infra nodes, be provisioned. The installer only provides provisions for control plane and worker nodes. Worker nodes can be designated as infrastructure nodes or application, also called app, nodes through labeling.

Procedure

  1. Add a label to the worker node that you want to act as application node:

    $ oc label node <node-name> node-role.kubernetes.io/app=""
  2. Add a label to the worker nodes that you want to act as infrastructure nodes:

    $ oc label node <node-name> node-role.kubernetes.io/infra=""
  3. Check to see if applicable nodes now have the infra role and app roles:

    $ oc get nodes
  4. Create a default cluster-wide node selector. The default node selector is applied to pods created in all namespaces. This creates an intersection with any existing node selectors on a pod, which additionally constrains the pod’s selector.

    Important

    If the default node selector key conflicts with the key of a pod’s label, then the default node selector is not applied.

    However, do not set a default node selector that might cause a pod to become unschedulable. For example, setting the default node selector to a specific node role, such as node-role.kubernetes.io/infra="", when a pod’s label is set to a different node role, such as node-role.kubernetes.io/master="", can cause the pod to become unschedulable. For this reason, use caution when setting the default node selector to specific node roles.

    You can alternatively use a project node selector to avoid cluster-wide node selector key conflicts.

    1. Edit the Scheduler object:

      $ oc edit scheduler cluster
    2. Add the defaultNodeSelector field with the appropriate node selector:

      apiVersion: config.openshift.io/v1
      kind: Scheduler
      metadata:
        name: cluster
      spec:
        defaultNodeSelector: node-role.kubernetes.io/infra="" 1
      # ...
      1
      This example node selector deploys pods on infrastructure nodes by default.
    3. Save the file to apply the changes.

You can now move infrastructure resources to the newly labeled infra nodes.

Additional resources

  • For information on how to configure project node selectors to avoid cluster-wide node selector key conflicts, see Project node selectors.

4.6.3. Creating a machine config pool for infrastructure machines

If you need infrastructure machines to have dedicated configurations, you must create an infra pool.

Important

Creating a custom machine configuration pool overrides default worker pool configurations if they refer to the same file or unit.

Procedure

  1. Add a label to the node you want to assign as the infra node with a specific label:

    $ oc label node <node_name> <label>
    $ oc label node ci-ln-n8mqwr2-f76d1-xscn2-worker-c-6fmtx node-role.kubernetes.io/infra=
  2. Create a machine config pool that contains both the worker role and your custom role as machine config selector:

    $ cat infra.mcp.yaml

    Example output

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: infra
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]} 1
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/infra: "" 2

    1
    Add the worker role and your custom role.
    2
    Add the label you added to the node as a nodeSelector.
    Note

    Custom machine config pools inherit machine configs from the worker pool. Custom pools use any machine config targeted for the worker pool, but add the ability to also deploy changes that are targeted at only the custom pool. Because a custom pool inherits resources from the worker pool, any change to the worker pool also affects the custom pool.

  3. After you have the YAML file, you can create the machine config pool:

    $ oc create -f infra.mcp.yaml
  4. Check the machine configs to ensure that the infrastructure configuration rendered successfully:

    $ oc get machineconfig

    Example output

    NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
    00-master                                                   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    00-worker                                                   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    01-master-container-runtime                                 365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    01-master-kubelet                                           365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    01-worker-container-runtime                                 365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    01-worker-kubelet                                           365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    99-master-1ae2a1e0-a115-11e9-8f14-005056899d54-registries   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    99-master-ssh                                                                                          3.2.0             31d
    99-worker-1ae64748-a115-11e9-8f14-005056899d54-registries   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
    99-worker-ssh                                                                                          3.2.0             31d
    rendered-infra-4e48906dca84ee702959c71a53ee80e7             365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             23m
    rendered-master-072d4b2da7f88162636902b074e9e28e            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
    rendered-master-3e88ec72aed3886dec061df60d16d1af            02c07496ba0417b3e12b78fb32baf6293d314f79   3.2.0             31d
    rendered-master-419bee7de96134963a15fdf9dd473b25            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             17d
    rendered-master-53f5c91c7661708adce18739cc0f40fb            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             13d
    rendered-master-a6a357ec18e5bce7f5ac426fc7c5ffcd            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             7d3h
    rendered-master-dc7f874ec77fc4b969674204332da037            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
    rendered-worker-1a75960c52ad18ff5dfa6674eb7e533d            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
    rendered-worker-2640531be11ba43c61d72e82dc634ce6            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
    rendered-worker-4e48906dca84ee702959c71a53ee80e7            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             7d3h
    rendered-worker-4f110718fe88e5f349987854a1147755            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             17d
    rendered-worker-afc758e194d6188677eb837842d3b379            02c07496ba0417b3e12b78fb32baf6293d314f79   3.2.0             31d
    rendered-worker-daa08cc1e8f5fcdeba24de60cd955cc3            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             13d

    You should see a new machine config, with the rendered-infra-* prefix.

  5. Optional: To deploy changes to a custom pool, create a machine config that uses the custom pool name as the label, such as infra. Note that this is not required and only shown for instructional purposes. In this manner, you can apply any custom configurations specific to only your infra nodes.

    Note

    After you create the new machine config pool, the MCO generates a new rendered config for that pool, and associated nodes of that pool reboot to apply the new configuration.

    1. Create a machine config:

      $ cat infra.mc.yaml

      Example output

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        name: 51-infra
        labels:
          machineconfiguration.openshift.io/role: infra 1
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
            - path: /etc/infratest
              mode: 0644
              contents:
                source: data:,infra

      1
      Add the label you added to the node as a nodeSelector.
    2. Apply the machine config to the infra-labeled nodes:

      $ oc create -f infra.mc.yaml
  6. Confirm that your new machine config pool is available:

    $ oc get mcp

    Example output

    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    infra    rendered-infra-60e35c2e99f42d976e084fa94da4d0fc    True      False      False      1              1                   1                     0                      4m20s
    master   rendered-master-9360fdb895d4c131c7c4bebbae099c90   True      False      False      3              3                   3                     0                      91m
    worker   rendered-worker-60e35c2e99f42d976e084fa94da4d0fc   True      False      False      2              2                   2                     0                      91m

    In this example, a worker node was changed to an infra node.

Additional resources

4.7. Assigning machine set resources to infrastructure nodes

After creating an infrastructure machine set, the worker and infra roles are applied to new infra nodes. Nodes with the infra role are not counted toward the total number of subscriptions that are required to run the environment, even when the worker role is also applied.

However, when an infra node is assigned the worker role, there is a chance that user workloads can get assigned inadvertently to the infra node. To avoid this, you can apply a taint to the infra node and tolerations for the pods that you want to control.

4.7.1. Binding infrastructure node workloads using taints and tolerations

If you have an infra node that has the infra and worker roles assigned, you must configure the node so that user workloads are not assigned to it.

Important

It is recommended that you preserve the dual infra,worker label that is created for infra nodes and use taints and tolerations to manage nodes that user workloads are scheduled on. If you remove the worker label from the node, you must create a custom pool to manage it. A node with a label other than master or worker is not recognized by the MCO without a custom pool. Maintaining the worker label allows the node to be managed by the default worker machine config pool, if no custom pools that select the custom label exists. The infra label communicates to the cluster that it does not count toward the total number of subscriptions.

Prerequisites

  • Configure additional MachineSet objects in your OpenShift Container Platform cluster.

Procedure

  1. Add a taint to the infra node to prevent scheduling user workloads on it:

    1. Determine if the node has the taint:

      $ oc describe nodes <node_name>

      Sample output

      oc describe node ci-ln-iyhx092-f76d1-nvdfm-worker-b-wln2l
      Name:               ci-ln-iyhx092-f76d1-nvdfm-worker-b-wln2l
      Roles:              worker
       ...
      Taints:             node-role.kubernetes.io/infra:NoSchedule
       ...

      This example shows that the node has a taint. You can proceed with adding a toleration to your pod in the next step.

    2. If you have not configured a taint to prevent scheduling user workloads on it:

      $ oc adm taint nodes <node_name> <key>=<value>:<effect>

      For example:

      $ oc adm taint nodes node1 node-role.kubernetes.io/infra=reserved:NoSchedule
      Tip

      You can alternatively apply the following YAML to add the taint:

      kind: Node
      apiVersion: v1
      metadata:
        name: <node_name>
        labels:
          ...
      spec:
        taints:
          - key: node-role.kubernetes.io/infra
            effect: NoSchedule
            value: reserved
        ...

      This example places a taint on node1 that has key node-role.kubernetes.io/infra and taint effect NoSchedule. Nodes with the NoSchedule effect schedule only pods that tolerate the taint, but allow existing pods to remain scheduled on the node.

      Note

      If a descheduler is used, pods violating node taints could be evicted from the cluster.

    3. Add the taint with NoExecute Effect along with the above taint with NoSchedule Effect:

      $ oc adm taint nodes <node_name> <key>=<value>:<effect>

      For example:

      $ oc adm taint nodes node1 node-role.kubernetes.io/infra=reserved:NoExecute
      Tip

      You can alternatively apply the following YAML to add the taint:

      kind: Node
      apiVersion: v1
      metadata:
        name: <node_name>
        labels:
          ...
      spec:
        taints:
          - key: node-role.kubernetes.io/infra
            effect: NoExecute
            value: reserved
        ...

      This example places a taint on node1 that has the key node-role.kubernetes.io/infra and taint effect NoExecute. Nodes with the NoExecute effect schedule only pods that tolerate the taint. The effect will remove any existing pods from the node that do not have a matching toleration.

  2. Add tolerations for the pod configurations you want to schedule on the infra node, like router, registry, and monitoring workloads. Add the following code to the Pod object specification:

    tolerations:
      - effect: NoSchedule 1
        key: node-role.kubernetes.io/infra 2
        value: reserved 3
      - effect: NoExecute 4
        key: node-role.kubernetes.io/infra 5
        operator: Exists 6
        value: reserved 7
    1
    Specify the effect that you added to the node.
    2
    Specify the key that you added to the node.
    3
    Specify the value of the key-value pair taint that you added to the node.
    4
    Specify the effect that you added to the node.
    5
    Specify the key that you added to the node.
    6
    Specify the Exists Operator to require a taint with the key node-role.kubernetes.io/infra to be present on the node.
    7
    Specify the value of the key-value pair taint that you added to the node.

    This toleration matches the taint created by the oc adm taint command. A pod with this toleration can be scheduled onto the infra node.

    Note

    Moving pods for an Operator installed via OLM to an infra node is not always possible. The capability to move Operator pods depends on the configuration of each Operator.

  3. Schedule the pod to the infra node using a scheduler. See the documentation for Controlling pod placement onto nodes for details.

Additional resources

4.8. Moving resources to infrastructure machine sets

Some of the infrastructure resources are deployed in your cluster by default. You can move them to the infrastructure machine sets that you created.

4.8.1. Moving the router

You can deploy the router pod to a different compute machine set. By default, the pod is deployed to a worker node.

Prerequisites

  • Configure additional compute machine sets in your OpenShift Container Platform cluster.

Procedure

  1. View the IngressController custom resource for the router Operator:

    $ oc get ingresscontroller default -n openshift-ingress-operator -o yaml

    The command output resembles the following text:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      creationTimestamp: 2019-04-18T12:35:39Z
      finalizers:
      - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
      generation: 1
      name: default
      namespace: openshift-ingress-operator
      resourceVersion: "11341"
      selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/default
      uid: 79509e05-61d6-11e9-bc55-02ce4781844a
    spec: {}
    status:
      availableReplicas: 2
      conditions:
      - lastTransitionTime: 2019-04-18T12:36:15Z
        status: "True"
        type: Available
      domain: apps.<cluster>.example.com
      endpointPublishingStrategy:
        type: LoadBalancerService
      selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
  2. Edit the ingresscontroller resource and change the nodeSelector to use the infra label:

    $ oc edit ingresscontroller default -n openshift-ingress-operator
      spec:
        nodePlacement:
          nodeSelector: 1
            matchLabels:
              node-role.kubernetes.io/infra: ""
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/infra
            value: reserved
          - effect: NoExecute
            key: node-role.kubernetes.io/infra
            value: reserved
    1
    Add a nodeSelector parameter with the appropriate value to the component you want to move. You can use a nodeSelector in the format shown or use <key>: <value> pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
  3. Confirm that the router pod is running on the infra node.

    1. View the list of router pods and note the node name of the running pod:

      $ oc get pod -n openshift-ingress -o wide

      Example output

      NAME                              READY     STATUS        RESTARTS   AGE       IP           NODE                           NOMINATED NODE   READINESS GATES
      router-default-86798b4b5d-bdlvd   1/1      Running       0          28s       10.130.2.4   ip-10-0-217-226.ec2.internal   <none>           <none>
      router-default-955d875f4-255g8    0/1      Terminating   0          19h       10.129.2.4   ip-10-0-148-172.ec2.internal   <none>           <none>

      In this example, the running pod is on the ip-10-0-217-226.ec2.internal node.

    2. View the node status of the running pod:

      $ oc get node <node_name> 1
      1
      Specify the <node_name> that you obtained from the pod list.

      Example output

      NAME                          STATUS  ROLES         AGE   VERSION
      ip-10-0-217-226.ec2.internal  Ready   infra,worker  17h   v1.30.3

      Because the role list includes infra, the pod is running on the correct node.

4.8.2. Moving the default registry

You configure the registry Operator to deploy its pods to different nodes.

Prerequisites

  • Configure additional compute machine sets in your OpenShift Container Platform cluster.

Procedure

  1. View the config/instance object:

    $ oc get configs.imageregistry.operator.openshift.io/cluster -o yaml

    Example output

    apiVersion: imageregistry.operator.openshift.io/v1
    kind: Config
    metadata:
      creationTimestamp: 2019-02-05T13:52:05Z
      finalizers:
      - imageregistry.operator.openshift.io/finalizer
      generation: 1
      name: cluster
      resourceVersion: "56174"
      selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/cluster
      uid: 36fd3724-294d-11e9-a524-12ffeee2931b
    spec:
      httpSecret: d9a012ccd117b1e6616ceccb2c3bb66a5fed1b5e481623
      logging: 2
      managementState: Managed
      proxy: {}
      replicas: 1
      requests:
        read: {}
        write: {}
      storage:
        s3:
          bucket: image-registry-us-east-1-c92e88cad85b48ec8b312344dff03c82-392c
          region: us-east-1
    status:
    ...

  2. Edit the config/instance object:

    $ oc edit configs.imageregistry.operator.openshift.io/cluster
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              namespaces:
              - openshift-image-registry
              topologyKey: kubernetes.io/hostname
            weight: 100
      logLevel: Normal
      managementState: Managed
      nodeSelector: 1
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    1
    Add a nodeSelector parameter with the appropriate value to the component you want to move. You can use a nodeSelector in the format shown or use <key>: <value> pairs, based on the value specified for the node. If you added a taint to the infrasructure node, also add a matching toleration.
  3. Verify the registry pod has been moved to the infrastructure node.

    1. Run the following command to identify the node where the registry pod is located:

      $ oc get pods -o wide -n openshift-image-registry
    2. Confirm the node has the label you specified:

      $ oc describe node <node_name>

      Review the command output and confirm that node-role.kubernetes.io/infra is in the LABELS list.

4.8.3. Moving the monitoring solution

The monitoring stack includes multiple components, including Prometheus, Thanos Querier, and Alertmanager. The Cluster Monitoring Operator manages this stack. To redeploy the monitoring stack to infrastructure nodes, you can create and apply a custom config map.

Procedure

  1. Edit the cluster-monitoring-config config map and change the nodeSelector to use the infra label:

    $ oc edit configmap cluster-monitoring-config -n openshift-monitoring
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |+
        alertmanagerMain:
          nodeSelector: 1
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        prometheusK8s:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        prometheusOperator:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        metricsServer:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        kubeStateMetrics:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        telemeterClient:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        openshiftStateMetrics:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        thanosQuerier:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
        monitoringPlugin:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
          tolerations:
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoSchedule
          - key: node-role.kubernetes.io/infra
            value: reserved
            effect: NoExecute
    1
    Add a nodeSelector parameter with the appropriate value to the component you want to move. You can use a nodeSelector in the format shown or use <key>: <value> pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
  2. Watch the monitoring pods move to the new machines:

    $ watch 'oc get pod -n openshift-monitoring -o wide'
  3. If a component has not moved to the infra node, delete the pod with this component:

    $ oc delete pod -n openshift-monitoring <pod>

    The component from the deleted pod is re-created on the infra node.

4.9. Applying autoscaling to your cluster

Applying autoscaling to an OpenShift Container Platform cluster involves deploying a cluster autoscaler and then deploying machine autoscalers for each machine type in your cluster.

For more information, see Applying autoscaling to an OpenShift Container Platform cluster.

4.10. Configuring Linux cgroup

As of OpenShift Container Platform 4.14, OpenShift Container Platform uses Linux control group version 2 (cgroup v2) in your cluster. If you are using cgroup v1 on OpenShift Container Platform 4.13 or earlier, migrating to OpenShift Container Platform 4.14 or later will not automatically update your cgroup configuration to version 2. A fresh installation of OpenShift Container Platform 4.14 or later will use cgroup v2 by default. However, you can enable Linux control group version 1 (cgroup v1) upon installation.

cgroup v2 is the current version of the Linux cgroup API. cgroup v2 offers several improvements over cgroup v1, including a unified hierarchy, safer sub-tree delegation, new features such as Pressure Stall Information, and enhanced resource management and isolation. However, cgroup v2 has different CPU, memory, and I/O management characteristics than cgroup v1. Therefore, some workloads might experience slight differences in memory or CPU usage on clusters that run cgroup v2.

You can change between cgroup v1 and cgroup v2, as needed. Enabling cgroup v1 in OpenShift Container Platform disables all cgroup v2 controllers and hierarchies in your cluster.

Important

cgroup v1 is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.

For the most recent list of major functionality that has been deprecated or removed within OpenShift Container Platform, refer to the Deprecated and removed features section of the OpenShift Container Platform release notes.

Prerequisites

  • You have a running OpenShift Container Platform cluster that uses version 4.12 or later.
  • You are logged in to the cluster as a user with administrative privileges.

Procedure

  1. Enable cgroup v1 on nodes:

    1. Edit the node.config object:

      $ oc edit nodes.config/cluster
    2. Add spec.cgroupMode: "v1":

      Example node.config object

      apiVersion: config.openshift.io/v2
      kind: Node
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: "true"
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
          release.openshift.io/create-only: "true"
        creationTimestamp: "2022-07-08T16:02:51Z"
        generation: 1
        name: cluster
        ownerReferences:
        - apiVersion: config.openshift.io/v2
          kind: ClusterVersion
          name: version
          uid: 36282574-bf9f-409e-a6cd-3032939293eb
        resourceVersion: "1865"
        uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
      spec:
        cgroupMode: "v1" 1
      ...

      1
      Enables cgroup v1.

Verification

  1. Check the machine configs to see that the new machine configs were added:

    $ oc get mc

    Example output

    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    97-master-generated-kubelet                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-generated-kubelet                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23d4317815a5f854bd3553d689cfe2e9   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             10s 1
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-dcc7f1b92892d34db74d6832bcc9ccd4   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             10s

    1
    New machine configs are created, as expected.
  2. Check that the new kernelArguments were added to the new machine configs:

    $ oc describe mc <name>

    Example output for cgroup v1

    apiVersion: machineconfiguration.openshift.io/v2
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 05-worker-kernelarg-selinuxpermissive
    spec:
      kernelArguments:
        systemd.unified_cgroup_hierarchy=0 1
        systemd.legacy_systemd_cgroup_controller=1 2

    1
    Disables cgroup v2.
    2
    Enables cgroup v1 in systemd.
  3. Check the nodes to see that scheduling on the nodes is disabled. This indicates that the change is being applied:

    $ oc get nodes

    Example output

    NAME                                       STATUS                     ROLES    AGE   VERSION
    ci-ln-fm1qnwt-72292-99kt6-master-0         Ready,SchedulingDisabled   master   58m   v1.30.3
    ci-ln-fm1qnwt-72292-99kt6-master-1         Ready                      master   58m   v1.30.3
    ci-ln-fm1qnwt-72292-99kt6-master-2         Ready                      master   58m   v1.30.3
    ci-ln-fm1qnwt-72292-99kt6-worker-a-h5gt4   Ready,SchedulingDisabled   worker   48m   v1.30.3
    ci-ln-fm1qnwt-72292-99kt6-worker-b-7vtmd   Ready                      worker   48m   v1.30.3
    ci-ln-fm1qnwt-72292-99kt6-worker-c-rhzkv   Ready                      worker   48m   v1.30.3

  4. After a node returns to the Ready state, start a debug session for that node:

    $ oc debug node/<node_name>
  5. Set /host as the root directory within the debug shell:

    sh-4.4# chroot /host
  6. Check that the sys/fs/cgroup/cgroup2fs file is present on your nodes. This file is created by cgroup v1:

    $ stat -c %T -f /sys/fs/cgroup

    Example output

    cgroup2fs

4.11. Enabling Technology Preview features using FeatureGates

You can turn on a subset of the current Technology Preview features on for all nodes in the cluster by editing the FeatureGate custom resource (CR).

4.11.1. Understanding feature gates

You can use the FeatureGate custom resource (CR) to enable specific feature sets in your cluster. A feature set is a collection of OpenShift Container Platform features that are not enabled by default.

You can activate the following feature set by using the FeatureGate CR:

  • TechPreviewNoUpgrade. This feature set is a subset of the current Technology Preview features. This feature set allows you to enable these Technology Preview features on test clusters, where you can fully test them, while leaving the features disabled on production clusters.

    Warning

    Enabling the TechPreviewNoUpgrade feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.

    The following Technology Preview features are enabled by this feature set:

    • External cloud providers. Enables support for external cloud providers for clusters on vSphere, AWS, Azure, and GCP. Support for OpenStack is GA. This is an internal feature that most users do not need to interact with. (ExternalCloudProvider)
    • Swap memory on nodes. Enables swap memory use for OpenShift Container Platform workloads on a per-node basis. (NodeSwap)
    • OpenStack Machine API Provider. This gate has no effect and is planned to be removed from this feature set in a future release. (MachineAPIProviderOpenStack)
    • Insights Operator. Enables the InsightsDataGather CRD, which allows users to configure some Insights data gathering options. The feature set also enables the DataGather CRD, which allows users to run Insights data gathering on-demand. (InsightsConfigAPI)
    • Dynamic Resource Allocation API. Enables a new API for requesting and sharing resources between pods and containers. This is an internal feature that most users do not need to interact with. (DynamicResourceAllocation)
    • Pod security admission enforcement. Enables the restricted enforcement mode for pod security admission. Instead of only logging a warning, pods are rejected if they violate pod security standards. (OpenShiftPodSecurityAdmission)
    • StatefulSet pod availability upgrading limits. Enables users to define the maximum number of statefulset pods unavailable during updates which reduces application downtime. (MaxUnavailableStatefulSet)
    • gcpLabelsTags
    • vSphereStaticIPs
    • routeExternalCertificate
    • automatedEtcdBackup
    • gcpClusterHostedDNS
    • vSphereControlPlaneMachineset
    • dnsNameResolver
    • machineConfigNodes
    • metricsServer
    • installAlternateInfrastructureAWS
    • mixedCPUsAllocation
    • managedBootImages
    • onClusterBuild
    • signatureStores
    • SigstoreImageVerification
    • DisableKubeletCloudCredentialProviders
    • BareMetalLoadBalancer
    • ClusterAPIInstallAWS
    • ClusterAPIInstallAzure
    • ClusterAPIInstallNutanix
    • ClusterAPIInstallOpenStack
    • ClusterAPIInstallVSphere
    • HardwareSpeed
    • KMSv1
    • NetworkDiagnosticsConfig
    • VSphereDriverConfiguration
    • ExternalOIDC
    • ChunkSizeMiB
    • ClusterAPIInstallGCP
    • ClusterAPIInstallPowerVS
    • EtcdBackendQuota
    • InsightsConfig
    • InsightsOnDemandDataGather
    • MetricsCollectionProfiles
    • NewOLM
    • NodeDisruptionPolicy
    • PinnedImages
    • PlatformOperators
    • ServiceAccountTokenNodeBinding
    • TranslateStreamCloseWebsocketRequests
    • UpgradeStatus
    • VSphereMultiVCenters
    • VolumeGroupSnapshot
    • AdditionalRoutingCapabilities
    • BootcNodeManagement
    • ClusterMonitoringConfig
    • DNSNameResolver
    • ManagedBootImagesAWS
    • NetworkSegmentation
    • OVNObservability
    • PersistentIPsForVirtualization
    • ProcMountType
    • RouteAdvertisements
    • UserNamespacesSupport
    • AWSEFSDriverVolumeMetrics
    • AlibabaPlatform
    • AzureWorkloadIdentity
    • BuildCSIVolumes
    • CloudDualStackNodeIPs
    • ExternalCloudProviderAzure
    • ExternalCloudProviderExternal
    • ExternalCloudProviderGCP
    • IngressControllerLBSubnetsAWS
    • MultiArchInstallAWS
    • MultiArchInstallGCP
    • NetworkLiveMigration
    • PrivateHostedZoneAWS
    • SetEIPForNLBIngressController
    • ValidatingAdmissionPolicy

4.11.2. Enabling feature sets using the web console

You can use the OpenShift Container Platform web console to enable feature sets for all of the nodes in a cluster by editing the FeatureGate custom resource (CR).

Procedure

To enable feature sets:

  1. In the OpenShift Container Platform web console, switch to the AdministrationCustom Resource Definitions page.
  2. On the Custom Resource Definitions page, click FeatureGate.
  3. On the Custom Resource Definition Details page, click the Instances tab.
  4. Click the cluster feature gate, then click the YAML tab.
  5. Edit the cluster instance to add specific feature sets:

    Warning

    Enabling the TechPreviewNoUpgrade feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.

    Sample Feature Gate custom resource

    apiVersion: config.openshift.io/v1
    kind: FeatureGate
    metadata:
      name: cluster 1
    # ...
    spec:
      featureSet: TechPreviewNoUpgrade 2

    1
    The name of the FeatureGate CR must be cluster.
    2
    Add the feature set that you want to enable:
    • TechPreviewNoUpgrade enables specific Technology Preview features.

    After you save the changes, new machine configs are created, the machine config pools are updated, and scheduling on each node is disabled while the change is being applied.

Verification

You can verify that the feature gates are enabled by looking at the kubelet.conf file on a node after the nodes return to the ready state.

  1. From the Administrator perspective in the web console, navigate to ComputeNodes.
  2. Select a node.
  3. In the Node details page, click Terminal.
  4. In the terminal window, change your root directory to /host:

    sh-4.2# chroot /host
  5. View the kubelet.conf file:

    sh-4.2# cat /etc/kubernetes/kubelet.conf

    Sample output

    # ...
    featureGates:
      InsightsOperatorPullingSCA: true,
      LegacyNodeRoleBehavior: false
    # ...

    The features that are listed as true are enabled on your cluster.

    Note

    The features listed vary depending upon the OpenShift Container Platform version.

4.11.3. Enabling feature sets using the CLI

You can use the OpenShift CLI (oc) to enable feature sets for all of the nodes in a cluster by editing the FeatureGate custom resource (CR).

Prerequisites

  • You have installed the OpenShift CLI (oc).

Procedure

To enable feature sets:

  1. Edit the FeatureGate CR named cluster:

    $ oc edit featuregate cluster
    Warning

    Enabling the TechPreviewNoUpgrade feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.

    Sample FeatureGate custom resource

    apiVersion: config.openshift.io/v1
    kind: FeatureGate
    metadata:
      name: cluster 1
    # ...
    spec:
      featureSet: TechPreviewNoUpgrade 2

    1
    The name of the FeatureGate CR must be cluster.
    2
    Add the feature set that you want to enable:
    • TechPreviewNoUpgrade enables specific Technology Preview features.

    After you save the changes, new machine configs are created, the machine config pools are updated, and scheduling on each node is disabled while the change is being applied.

Verification

You can verify that the feature gates are enabled by looking at the kubelet.conf file on a node after the nodes return to the ready state.

  1. From the Administrator perspective in the web console, navigate to ComputeNodes.
  2. Select a node.
  3. In the Node details page, click Terminal.
  4. In the terminal window, change your root directory to /host:

    sh-4.2# chroot /host
  5. View the kubelet.conf file:

    sh-4.2# cat /etc/kubernetes/kubelet.conf

    Sample output

    # ...
    featureGates:
      InsightsOperatorPullingSCA: true,
      LegacyNodeRoleBehavior: false
    # ...

    The features that are listed as true are enabled on your cluster.

    Note

    The features listed vary depending upon the OpenShift Container Platform version.

4.12. etcd tasks

Back up etcd, enable or disable etcd encryption, or defragment etcd data.

Note

If you deployed a bare-metal cluster, you can scale the cluster up to 5 nodes as part of your post-installation tasks. For more information, see Node scaling for etcd.

4.12.1. About etcd encryption

By default, etcd data is not encrypted in OpenShift Container Platform. You can enable etcd encryption for your cluster to provide an additional layer of data security. For example, it can help protect the loss of sensitive data if an etcd backup is exposed to the incorrect parties.

When you enable etcd encryption, the following OpenShift API server and Kubernetes API server resources are encrypted:

  • Secrets
  • Config maps
  • Routes
  • OAuth access tokens
  • OAuth authorize tokens

When you enable etcd encryption, encryption keys are created. You must have these keys to restore from an etcd backup.

Note

Etcd encryption only encrypts values, not keys. Resource types, namespaces, and object names are unencrypted.

If etcd encryption is enabled during a backup, the static_kuberesources_<datetimestamp>.tar.gz file contains the encryption keys for the etcd snapshot. For security reasons, store this file separately from the etcd snapshot. However, this file is required to restore a previous state of etcd from the respective etcd snapshot.

4.12.2. Supported encryption types

The following encryption types are supported for encrypting etcd data in OpenShift Container Platform:

AES-CBC
Uses AES-CBC with PKCS#7 padding and a 32 byte key to perform the encryption. The encryption keys are rotated weekly.
AES-GCM
Uses AES-GCM with a random nonce and a 32 byte key to perform the encryption. The encryption keys are rotated weekly.

4.12.3. Enabling etcd encryption

You can enable etcd encryption to encrypt sensitive resources in your cluster.

Warning

Do not back up etcd resources until the initial encryption process is completed. If the encryption process is not completed, the backup might be only partially encrypted.

After you enable etcd encryption, several changes can occur:

  • The etcd encryption might affect the memory consumption of a few resources.
  • You might notice a transient affect on backup performance because the leader must serve the backup.
  • A disk I/O can affect the node that receives the backup state.

You can encrypt the etcd database in either AES-GCM or AES-CBC encryption.

Note

To migrate your etcd database from one encryption type to the other, you can modify the API server’s spec.encryption.type field. Migration of the etcd data to the new encryption type occurs automatically.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Modify the APIServer object:

    $ oc edit apiserver
  2. Set the spec.encryption.type field to aesgcm or aescbc:

    spec:
      encryption:
        type: aesgcm 1
    1
    Set to aesgcm for AES-GCM encryption or aescbc for AES-CBC encryption.
  3. Save the file to apply the changes.

    The encryption process starts. It can take 20 minutes or longer for this process to complete, depending on the size of the etcd database.

  4. Verify that etcd encryption was successful.

    1. Review the Encrypted status condition for the OpenShift API server to verify that its resources were successfully encrypted:

      $ oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'

      The output shows EncryptionCompleted upon successful encryption:

      EncryptionCompleted
      All resources encrypted: routes.route.openshift.io

      If the output shows EncryptionInProgress, encryption is still in progress. Wait a few minutes and try again.

    2. Review the Encrypted status condition for the Kubernetes API server to verify that its resources were successfully encrypted:

      $ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'

      The output shows EncryptionCompleted upon successful encryption:

      EncryptionCompleted
      All resources encrypted: secrets, configmaps

      If the output shows EncryptionInProgress, encryption is still in progress. Wait a few minutes and try again.

    3. Review the Encrypted status condition for the OpenShift OAuth API server to verify that its resources were successfully encrypted:

      $ oc get authentication.operator.openshift.io -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'

      The output shows EncryptionCompleted upon successful encryption:

      EncryptionCompleted
      All resources encrypted: oauthaccesstokens.oauth.openshift.io, oauthauthorizetokens.oauth.openshift.io

      If the output shows EncryptionInProgress, encryption is still in progress. Wait a few minutes and try again.

4.12.4. Disabling etcd encryption

You can disable encryption of etcd data in your cluster.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Modify the APIServer object:

    $ oc edit apiserver
  2. Set the encryption field type to identity:

    spec:
      encryption:
        type: identity 1
    1
    The identity type is the default value and means that no encryption is performed.
  3. Save the file to apply the changes.

    The decryption process starts. It can take 20 minutes or longer for this process to complete, depending on the size of your cluster.

  4. Verify that etcd decryption was successful.

    1. Review the Encrypted status condition for the OpenShift API server to verify that its resources were successfully decrypted:

      $ oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'

      The output shows DecryptionCompleted upon successful decryption:

      DecryptionCompleted
      Encryption mode set to identity and everything is decrypted

      If the output shows DecryptionInProgress, decryption is still in progress. Wait a few minutes and try again.

    2. Review the Encrypted status condition for the Kubernetes API server to verify that its resources were successfully decrypted:

      $ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'

      The output shows DecryptionCompleted upon successful decryption:

      DecryptionCompleted
      Encryption mode set to identity and everything is decrypted

      If the output shows DecryptionInProgress, decryption is still in progress. Wait a few minutes and try again.

    3. Review the Encrypted status condition for the OpenShift OAuth API server to verify that its resources were successfully decrypted:

      $ oc get authentication.operator.openshift.io -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'

      The output shows DecryptionCompleted upon successful decryption:

      DecryptionCompleted
      Encryption mode set to identity and everything is decrypted

      If the output shows DecryptionInProgress, decryption is still in progress. Wait a few minutes and try again.

4.12.5. Backing up etcd data

Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.

Important

Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have checked whether the cluster-wide proxy is enabled.

    Tip

    You can check whether the proxy is enabled by reviewing the output of oc get proxy cluster -o yaml. The proxy is enabled if the httpProxy, httpsProxy, and noProxy fields have values set.

Procedure

  1. Start a debug session as root for a control plane node:

    $ oc debug --as-root node/<node_name>
  2. Change your root directory to /host in the debug shell:

    sh-4.4# chroot /host
  3. If the cluster-wide proxy is enabled, export the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables by running the following commands:

    $ export HTTP_PROXY=http://<your_proxy.example.com>:8080
    $ export HTTPS_PROXY=https://<your_proxy.example.com>:8080
    $ export NO_PROXY=<example.com>
  4. Run the cluster-backup.sh script in the debug shell and pass in the location to save the backup to.

    Tip

    The cluster-backup.sh script is maintained as a component of the etcd Cluster Operator and is a wrapper around the etcdctl snapshot save command.

    sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup

    Example script output

    found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-6
    found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7
    found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
    found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
    ede95fe6b88b87ba86a03c15e669fb4aa5bf0991c180d3c6895ce72eaade54a1
    etcdctl version: 3.4.14
    API version: 3.4
    {"level":"info","ts":1624647639.0188997,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db.part"}
    {"level":"info","ts":"2021-06-25T19:00:39.030Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
    {"level":"info","ts":1624647639.0301006,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.0.5:2379"}
    {"level":"info","ts":"2021-06-25T19:00:40.215Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
    {"level":"info","ts":1624647640.6032252,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.0.5:2379","size":"114 MB","took":1.584090459}
    {"level":"info","ts":1624647640.6047094,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db"}
    Snapshot saved at /home/core/assets/backup/snapshot_2021-06-25_190035.db
    {"hash":3866667823,"revision":31407,"totalKey":12828,"totalSize":114446336}
    snapshot db and kube resources are successfully saved to /home/core/assets/backup

    In this example, two files are created in the /home/core/assets/backup/ directory on the control plane host:

    • snapshot_<datetimestamp>.db: This file is the etcd snapshot. The cluster-backup.sh script confirms its validity.
    • static_kuberesources_<datetimestamp>.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.

      Note

      If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required to restore from the etcd snapshot.

      Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted.

4.12.6. Defragmenting etcd data

For large and dense clusters, etcd can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment etcd to free up space in the data store. Monitor Prometheus for etcd metrics and defragment it when required; otherwise, etcd can raise a cluster-wide alarm that puts the cluster into a maintenance mode that accepts only key reads and deletes.

Monitor these key metrics:

  • etcd_server_quota_backend_bytes, which is the current quota limit
  • etcd_mvcc_db_total_size_in_use_in_bytes, which indicates the actual database usage after a history compaction
  • etcd_mvcc_db_total_size_in_bytes, which shows the database size, including free space waiting for defragmentation

Defragment etcd data to reclaim disk space after events that cause disk fragmentation, such as etcd history compaction.

History compaction is performed automatically every five minutes and leaves gaps in the back-end database. This fragmented space is available for use by etcd, but is not available to the host file system. You must defragment etcd to make this space available to the host file system.

Defragmentation occurs automatically, but you can also trigger it manually.

Note

Automatic defragmentation is good for most cases, because the etcd operator uses cluster information to determine the most efficient operation for the user.

4.12.6.1. Automatic defragmentation

The etcd Operator automatically defragments disks. No manual intervention is needed.

Verify that the defragmentation process is successful by viewing one of these logs:

  • etcd logs
  • cluster-etcd-operator pod
  • operator status error log
Warning

Automatic defragmentation can cause leader election failure in various OpenShift core components, such as the Kubernetes controller manager, which triggers a restart of the failing component. The restart is harmless and either triggers failover to the next running instance or the component resumes work again after the restart.

Example log output for successful defragmentation

etcd member has been defragmented: <member_name>, memberID: <member_id>

Example log output for unsuccessful defragmentation

failed defrag on member: <member_name>, memberID: <member_id>: <error_message>

4.12.6.2. Manual defragmentatio