Postinstallation configuration


OpenShift Container Platform 4.16

Day 2 operations for OpenShift Container Platform

Red Hat OpenShift Documentation Team

Abstract

This document provides instructions and guidance on post installation activities for OpenShift Container Platform.

Chapter 1. Postinstallation configuration overview

After installing OpenShift Container Platform, a cluster administrator can configure and customize the following components:

  • Machine
  • Bare metal
  • Cluster
  • Node
  • Network
  • Storage
  • Users
  • Alerts and notifications

1.1. Post-installation configuration tasks

You can perform the post-installation configuration tasks to configure your environment to meet your need.

The following lists details these configurations:

  • Configure operating system features: The Machine Config Operator (MCO) manages MachineConfig objects. By using the MCO, you can configure nodes and custom resources.
  • Configure bare metal nodes: You can use the Bare Metal Operator (BMO) to manage bare metal hosts. The BMO can complete the following operations:

    • Inspects hardware details of the host and report them to the bare metal host.
    • Inspect firmware and configure BIOS settings.
    • Provision hosts with a desired image.
    • Clean disk contents for the host before or after provisioning the host.
  • Configure cluster features. You can modify the following features of an OpenShift Container Platform cluster:

    • Image registry
    • Networking configuration
    • Image build behavior
    • Identity provider
    • The etcd configuration
    • Machine set creation to handle the workloads
    • Cloud provider credential management
  • Configuring a private cluster: By default, the installation program provisions OpenShift Container Platform by using a publicly accessible DNS and endpoints. To make your cluster accessible only from within an internal network, configure the following components to make them private:

    • DNS
    • Ingress Controller
    • API server
  • Perform node operations: By default, OpenShift Container Platform uses Red Hat Enterprise Linux CoreOS (RHCOS) compute machines. You can perform the following node operations:

    • Add and remove compute machines.
    • Add and remove taints and tolerations.
    • Configure the maximum number of pods per node.
    • Enable Device Manager.
  • Configure users: OAuth access tokens allow users to authenticate themselves to the API. You can configure OAuth to perform the following tasks:
  • Specify an identity provider
  • Use role-based access control to define and supply permissions to users
  • Install an Operator from OperatorHub
  • Configuring alert notifications: By default, firing alerts are displayed on the Alerting UI of the web console. You can also configure OpenShift Container Platform to send alert notifications to external systems.

Chapter 2. Configuring a private cluster

After you install an OpenShift Container Platform version 4.16 cluster, you can set some of its core components to be private.

2.1. About private clusters

By default, OpenShift Container Platform is provisioned using publicly-accessible DNS and endpoints. You can set the DNS, Ingress Controller, and API server to private after you deploy your private cluster.

Important

If the cluster has any public subnets, load balancer services created by administrators might be publicly accessible. To ensure cluster security, verify that these services are explicitly annotated as private.

DNS

If you install OpenShift Container Platform on installer-provisioned infrastructure, the installation program creates records in a pre-existing public zone and, where possible, creates a private zone for the cluster’s own DNS resolution. In both the public zone and the private zone, the installation program or cluster creates DNS entries for *.apps, for the Ingress object, and api, for the API server.

The *.apps records in the public and private zone are identical, so when you delete the public zone, the private zone seamlessly provides all DNS resolution for the cluster.

Ingress Controller

Because the default Ingress object is created as public, the load balancer is internet-facing and in the public subnets.

The Ingress Operator generates a default certificate for an Ingress Controller to serve as a placeholder until you configure a custom default certificate. Do not use Operator-generated default certificates in production clusters. The Ingress Operator does not rotate its own signing certificate or the default certificates that it generates. Operator-generated default certificates are intended as placeholders for custom default certificates that you configure.

API server

By default, the installation program creates appropriate network load balancers for the API server to use for both internal and external traffic.

On Amazon Web Services (AWS), separate public and private load balancers are created. The load balancers are identical except that an additional port is available on the internal one for use within the cluster. Although the installation program automatically creates or destroys the load balancer based on API server requirements, the cluster does not manage or maintain them. As long as you preserve the cluster’s access to the API server, you can manually modify or move the load balancers. For the public load balancer, port 6443 is open and the health check is configured for HTTPS against the /readyz path.

On Google Cloud Platform, a single load balancer is created to manage both internal and external API traffic, so you do not need to modify the load balancer.

On Microsoft Azure, both public and private load balancers are created. However, because of limitations in current implementation, you just retain both load balancers in a private cluster.

2.2. Configuring DNS records to be published in a private zone

For all OpenShift Container Platform clusters, whether public or private, DNS records are published in a public zone by default.

You can remove the public zone from the cluster DNS configuration to avoid exposing DNS records to the public. You might want to avoid exposing sensitive information, such as internal domain names, internal IP addresses, or the number of clusters at an organization, or you might simply have no need to publish records publicly. If all the clients that should be able to connect to services within the cluster use a private DNS service that has the DNS records from the private zone, then there is no need to have a public DNS record for the cluster.

After you deploy a cluster, you can modify its DNS to use only a private zone by modifying the DNS custom resource (CR). Modifying the DNS CR in this way means that any DNS records that are subsequently created are not published to public DNS servers, which keeps knowledge of the DNS records isolated to internal users. This can be done when you configure the cluster to be private, or if you never want DNS records to be publicly resolvable.

Alternatively, even in a private cluster, you might keep the public zone for DNS records because it allows clients to resolve DNS names for applications running on that cluster. For example, an organization can have machines that connect to the public internet and then establish VPN connections for certain private IP ranges in order to connect to private IP addresses. The DNS lookups from these machines use the public DNS to determine the private addresses of those services, and then connect to the private addresses over the VPN.

Procedure

  1. Review the DNS CR for your cluster by running the following command and observing the output:

    $ oc get dnses.config.openshift.io/cluster -o yaml

    Example output

    apiVersion: config.openshift.io/v1
    kind: DNS
    metadata:
      creationTimestamp: "2019-10-25T18:27:09Z"
      generation: 2
      name: cluster
      resourceVersion: "37966"
      selfLink: /apis/config.openshift.io/v1/dnses/cluster
      uid: 0e714746-f755-11f9-9cb1-02ff55d8f976
    spec:
      baseDomain: <base_domain>
      privateZone:
        tags:
          Name: <infrastructure_id>-int
          kubernetes.io/cluster/<infrastructure_id>: owned
      publicZone:
        id: Z2XXXXXXXXXXA4
    status: {}

    Note that the spec section contains both a private and a public zone.

  2. Patch the DNS CR to remove the public zone by running the following command:

    $ oc patch dnses.config.openshift.io/cluster --type=merge --patch='{"spec": {"publicZone": null}}'

    Example output

    dns.config.openshift.io/cluster patched

    The Ingress Operator consults the DNS CR definition when it creates DNS records for IngressController objects. If only private zones are specified, only private records are created.

    Important

    Existing DNS records are not modified when you remove the public zone. You must manually delete previously published public DNS records if you no longer want them to be published publicly.

Verification

  • Review the DNS CR for your cluster and confirm that the public zone was removed, by running the following command and observing the output:

    $ oc get dnses.config.openshift.io/cluster -o yaml

    Example output

    apiVersion: config.openshift.io/v1
    kind: DNS
    metadata:
      creationTimestamp: "2019-10-25T18:27:09Z"
      generation: 2
      name: cluster
      resourceVersion: "37966"
      selfLink: /apis/config.openshift.io/v1/dnses/cluster
      uid: 0e714746-f755-11f9-9cb1-02ff55d8f976
    spec:
      baseDomain: <base_domain>
      privateZone:
        tags:
          Name: <infrastructure_id>-int
          kubernetes.io/cluster/<infrastructure_id>-wfpg4: owned
    status: {}

2.3. Setting the Ingress Controller to private

After you deploy a cluster, you can modify its Ingress Controller to use only a private zone.

Procedure

  1. Modify the default Ingress Controller to use only an internal endpoint:

    $ oc replace --force --wait --filename - <<EOF
    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: default
    spec:
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: Internal
    EOF

    Example output

    ingresscontroller.operator.openshift.io "default" deleted
    ingresscontroller.operator.openshift.io/default replaced

    The public DNS entry is removed, and the private zone entry is updated.

2.4. Restricting the API server to private

After you deploy a cluster to Amazon Web Services (AWS) or Microsoft Azure, you can reconfigure the API server to use only the private zone.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Have access to the web console as a user with admin privileges.

Procedure

  1. In the web portal or console for your cloud provider, take the following actions:

    1. Locate and delete the appropriate load balancer component:

      • For AWS, delete the external load balancer. The API DNS entry in the private zone already points to the internal load balancer, which uses an identical configuration, so you do not need to modify the internal load balancer.
      • For Azure, delete the api-internal-v4 rule for the public load balancer.
    2. For Azure, configure the Ingress Controller endpoint publishing scope to Internal. For more information, see "Configuring the Ingress Controller endpoint publishing scope to Internal".
    3. For the Azure public load balancer, if you configure the Ingress Controller endpoint publishing scope to Internal and there are no existing inbound rules in the public load balancer, you must create an outbound rule explicitly to provide outbound traffic for the backend address pool. For more information, see the Microsoft Azure documentation about adding outbound rules.
    4. Delete the api.$clustername.$yourdomain or api.$clustername DNS entry in the public zone.
  2. AWS clusters: Remove the external load balancers:

    Important

    You can run the following steps only for an installer-provisioned infrastructure (IPI) cluster. For a user-provisioned infrastructure (UPI) cluster, you must manually remove or disable the external load balancers.

    • If your cluster uses a control plane machine set, delete the lines in the control plane machine set custom resource that configure your public or external load balancer:

      # ...
      providerSpec:
        value:
      # ...
          loadBalancers:
          - name: lk4pj-ext 1
            type: network 2
          - name: lk4pj-int
            type: network
      # ...
      1
      Delete the name value for the external load balancer, which ends in -ext.
      2
      Delete the type value for the external load balancer.
    • If your cluster does not use a control plane machine set, you must delete the external load balancers from each control plane machine.

      1. From your terminal, list the cluster machines by running the following command:

        $ oc get machine -n openshift-machine-api

        Example output

        NAME                            STATE     TYPE        REGION      ZONE         AGE
        lk4pj-master-0                  running   m4.xlarge   us-east-1   us-east-1a   17m
        lk4pj-master-1                  running   m4.xlarge   us-east-1   us-east-1b   17m
        lk4pj-master-2                  running   m4.xlarge   us-east-1   us-east-1a   17m
        lk4pj-worker-us-east-1a-5fzfj   running   m4.xlarge   us-east-1   us-east-1a   15m
        lk4pj-worker-us-east-1a-vbghs   running   m4.xlarge   us-east-1   us-east-1a   15m
        lk4pj-worker-us-east-1b-zgpzg   running   m4.xlarge   us-east-1   us-east-1b   15m

        The control plane machines contain master in the name.

      2. Remove the external load balancer from each control plane machine:

        1. Edit a control plane machine object to by running the following command:

          $ oc edit machines -n openshift-machine-api <control_plane_name> 1
          1
          Specify the name of the control plane machine object to modify.
        2. Remove the lines that describe the external load balancer, which are marked in the following example:

          # ...
          providerSpec:
            value:
          # ...
              loadBalancers:
              - name: lk4pj-ext 1
                type: network 2
              - name: lk4pj-int
                type: network
          # ...
          1
          Delete the name value for the external load balancer, which ends in -ext.
          2
          Delete the type value for the external load balancer.
        3. Save your changes and exit the object specification.
        4. Repeat this process for each of the control plane machines.

2.5. Configuring a private storage endpoint on Azure

You can leverage the Image Registry Operator to use private endpoints on Azure, which enables seamless configuration of private storage accounts when OpenShift Container Platform is deployed on private Azure clusters. This allows you to deploy the image registry without exposing public-facing storage endpoints.

Important

Do not configure a private storage endpoint on Microsoft Azure Red Hat OpenShift (ARO), because the endpoint can put your Microsoft Azure Red Hat OpenShift cluster in an unrecoverable state.

You can configure the Image Registry Operator to use private storage endpoints on Azure in one of two ways:

  • By configuring the Image Registry Operator to discover the VNet and subnet names
  • With user-provided Azure Virtual Network (VNet) and subnet names

2.5.1. Limitations for configuring a private storage endpoint on Azure

The following limitations apply when configuring a private storage endpoint on Azure:

  • When configuring the Image Registry Operator to use a private storage endpoint, public network access to the storage account is disabled. Consequently, pulling images from the registry outside of OpenShift Container Platform only works by setting disableRedirect: true in the registry Operator configuration. With redirect enabled, the registry redirects the client to pull images directly from the storage account, which will no longer work due to disabled public network access. For more information, see "Disabling redirect when using a private storage endpoint on Azure".
  • This operation cannot be undone by the Image Registry Operator.

2.5.2. Configuring a private storage endpoint on Azure by enabling the Image Registry Operator to discover VNet and subnet names

The following procedure shows you how to set up a private storage endpoint on Azure by configuring the Image Registry Operator to discover VNet and subnet names.

Prerequisites

  • You have configured the image registry to run on Azure.
  • Your network has been set up using the Installer Provisioned Infrastructure installation method.

    For users with a custom network setup, see "Configuring a private storage endpoint on Azure with user-provided VNet and subnet names".

Procedure

  1. Edit the Image Registry Operator config object and set networkAccess.type to Internal:

    $ oc edit configs.imageregistry/cluster
    # ...
    spec:
      # ...
       storage:
          azure:
            # ...
            networkAccess:
              type: Internal
    # ...
  2. Optional: Enter the following command to confirm that the Operator has completed provisioning. This might take a few minutes.

    $ oc get configs.imageregistry/cluster -o=jsonpath="{.spec.storage.azure.privateEndpointName}" -w
  3. Optional: If the registry is exposed by a route, and you are configuring your storage account to be private, you must disable redirect if you want pulls external to the cluster to continue to work. Enter the following command to disable redirect on the Image Operator configuration:

    $ oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"disableRedirect": true}}'
    Note

    When redirect is enabled, pulling images from outside of the cluster will not work.

Verification

  1. Fetch the registry service name by running the following command:

    $ oc registry info --internal=true

    Example output

    image-registry.openshift-image-registry.svc:5000

  2. Enter debug mode by running the following command:

    $ oc debug node/<node_name>
  3. Run the suggested chroot command. For example:

    $ chroot /host
  4. Enter the following command to log in to your container registry:

    $ podman login --tls-verify=false -u unused -p $(oc whoami -t) image-registry.openshift-image-registry.svc:5000

    Example output

    Login Succeeded!

  5. Enter the following command to verify that you can pull an image from the registry:

    $ podman pull --tls-verify=false image-registry.openshift-image-registry.svc:5000/openshift/tools

    Example output

    Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/tools/openshift/tools...
    Getting image source signatures
    Copying blob 6b245f040973 done
    Copying config 22667f5368 done
    Writing manifest to image destination
    Storing signatures
    22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9

2.5.3. Configuring a private storage endpoint on Azure with user-provided VNet and subnet names

Use the following procedure to configure a storage account that has public network access disabled and is exposed behind a private storage endpoint on Azure.

Prerequisites

  • You have configured the image registry to run on Azure.
  • You must know the VNet and subnet names used for your Azure environment.
  • If your network was configured in a separate resource group in Azure, you must also know its name.

Procedure

  1. Edit the Image Registry Operator config object and configure the private endpoint using your VNet and subnet names:

    $ oc edit configs.imageregistry/cluster
    # ...
    spec:
      # ...
       storage:
          azure:
            # ...
            networkAccess:
              type: Internal
              internal:
                subnetName: <subnet_name>
                vnetName: <vnet_name>
                networkResourceGroupName: <network_resource_group_name>
    # ...
  2. Optional: Enter the following command to confirm that the Operator has completed provisioning. This might take a few minutes.

    $ oc get configs.imageregistry/cluster -o=jsonpath="{.spec.storage.azure.privateEndpointName}" -w
    Note

    When redirect is enabled, pulling images from outside of the cluster will not work.

Verification

  1. Fetch the registry service name by running the following command:

    $ oc registry info --internal=true

    Example output

    image-registry.openshift-image-registry.svc:5000

  2. Enter debug mode by running the following command:

    $ oc debug node/<node_name>
  3. Run the suggested chroot command. For example:

    $ chroot /host
  4. Enter the following command to log in to your container registry:

    $ podman login --tls-verify=false -u unused -p $(oc whoami -t) image-registry.openshift-image-registry.svc:5000

    Example output

    Login Succeeded!

  5. Enter the following command to verify that you can pull an image from the registry:

    $ podman pull --tls-verify=false image-registry.openshift-image-registry.svc:5000/openshift/tools

    Example output

    Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/tools/openshift/tools...
    Getting image source signatures
    Copying blob 6b245f040973 done
    Copying config 22667f5368 done
    Writing manifest to image destination
    Storing signatures
    22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9

2.5.4. Optional: Disabling redirect when using a private storage endpoint on Azure

By default, redirect is enabled when using the image registry. Redirect allows off-loading of traffic from the registry pods into the object storage, which makes pull faster. When redirect is enabled and the storage account is private, users from outside of the cluster are unable to pull images from the registry.

In some cases, users might want to disable redirect so that users from outside of the cluster can pull images from the registry.

Use the following procedure to disable redirect.

Prerequisites

  • You have configured the image registry to run on Azure.
  • You have configured a route.

Procedure

  • Enter the following command to disable redirect on the image registry configuration:

    $ oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"disableRedirect": true}}'

Verification

  1. Fetch the registry service name by running the following command:

    $ oc registry info

    Example output

    default-route-openshift-image-registry.<cluster_dns>

  2. Enter the following command to log in to your container registry:

    $ podman login --tls-verify=false -u unused -p $(oc whoami -t) default-route-openshift-image-registry.<cluster_dns>

    Example output

    Login Succeeded!

  3. Enter the following command to verify that you can pull an image from the registry:

    $ podman pull --tls-verify=false default-route-openshift-image-registry.<cluster_dns>
    /openshift/tools

    Example output

    Trying to pull default-route-openshift-image-registry.<cluster_dns>/openshift/tools...
    Getting image source signatures
    Copying blob 6b245f040973 done
    Copying config 22667f5368 done
    Writing manifest to image destination
    Storing signatures
    22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9

Chapter 3. Bare-metal configuration

When deploying OpenShift Container Platform on bare-metal hosts, there are times when you need to make changes to the host either before or after provisioning. This can include inspecting the host’s hardware, firmware, and firmware details. It can also include formatting disks or changing modifiable firmware settings.

3.1. Services for a user-managed load balancer

You can configure an OpenShift Container Platform cluster to use a user-managed load balancer in place of the default load balancer.

Important

Configuring a user-managed load balancer depends on your vendor’s load balancer.

The information and examples in this section are for guideline purposes only. Consult the vendor documentation for more specific information about the vendor’s load balancer.

Red Hat supports the following services for a user-managed load balancer:

  • Ingress Controller
  • OpenShift API
  • OpenShift MachineConfig API

You can choose whether you want to configure one or all of these services for a user-managed load balancer. Configuring only the Ingress Controller service is a common configuration option. To better understand each service, view the following diagrams:

Figure 3.1. Example network workflow that shows an Ingress Controller operating in an OpenShift Container Platform environment

An image that shows an example network workflow of an Ingress Controller operating in an OpenShift Container Platform environment.

Figure 3.2. Example network workflow that shows an OpenShift API operating in an OpenShift Container Platform environment

An image that shows an example network workflow of an OpenShift API operating in an OpenShift Container Platform environment.

Figure 3.3. Example network workflow that shows an OpenShift MachineConfig API operating in an OpenShift Container Platform environment

An image that shows an example network workflow of an OpenShift MachineConfig API operating in an OpenShift Container Platform environment.

The following configuration options are supported for user-managed load balancers:

  • Use a node selector to map the Ingress Controller to a specific set of nodes. You must assign a static IP address to each node in this set, or configure each node to receive the same IP address from the Dynamic Host Configuration Protocol (DHCP). Infrastructure nodes commonly receive this type of configuration.
  • Target all IP addresses on a subnet. This configuration can reduce maintenance overhead, because you can create and destroy nodes within those networks without reconfiguring the load balancer targets. If you deploy your ingress pods by using a machine set on a smaller network, such as a /27 or /28, you can simplify your load balancer targets.

    Tip

    You can list all IP addresses that exist in a network by checking the machine config pool’s resources.

Before you configure a user-managed load balancer for your OpenShift Container Platform cluster, consider the following information:

  • For a front-end IP address, you can use the same IP address for the front-end IP address, the Ingress Controller’s load balancer, and API load balancer. Check the vendor’s documentation for this capability.
  • For a back-end IP address, ensure that an IP address for an OpenShift Container Platform control plane node does not change during the lifetime of the user-managed load balancer. You can achieve this by completing one of the following actions:

    • Assign a static IP address to each control plane node.
    • Configure each node to receive the same IP address from the DHCP every time the node requests a DHCP lease. Depending on the vendor, the DHCP lease might be in the form of an IP reservation or a static DHCP assignment.
  • Manually define each node that runs the Ingress Controller in the user-managed load balancer for the Ingress Controller back-end service. For example, if the Ingress Controller moves to an undefined node, a connection outage can occur.

3.1.1. Configuring a user-managed load balancer

You can configure an OpenShift Container Platform cluster to use a user-managed load balancer in place of the default load balancer.

Important

Before you configure a user-managed load balancer, ensure that you read the "Services for a user-managed load balancer" section.

Read the following prerequisites that apply to the service that you want to configure for your user-managed load balancer.

Note

MetalLB, which runs on a cluster, functions as a user-managed load balancer.

OpenShift API prerequisites

  • You defined a front-end IP address.
  • TCP ports 6443 and 22623 are exposed on the front-end IP address of your load balancer. Check the following items:

    • Port 6443 provides access to the OpenShift API service.
    • Port 22623 can provide ignition startup configurations to nodes.
  • The front-end IP address and port 6443 are reachable by all users of your system with a location external to your OpenShift Container Platform cluster.
  • The front-end IP address and port 22623 are reachable only by OpenShift Container Platform nodes.
  • The load balancer backend can communicate with OpenShift Container Platform control plane nodes on port 6443 and 22623.

Ingress Controller prerequisites

  • You defined a front-end IP address.
  • TCP ports 443 and 80 are exposed on the front-end IP address of your load balancer.
  • The front-end IP address, port 80 and port 443 are be reachable by all users of your system with a location external to your OpenShift Container Platform cluster.
  • The front-end IP address, port 80 and port 443 are reachable to all nodes that operate in your OpenShift Container Platform cluster.
  • The load balancer backend can communicate with OpenShift Container Platform nodes that run the Ingress Controller on ports 80, 443, and 1936.

Prerequisite for health check URL specifications

You can configure most load balancers by setting health check URLs that determine if a service is available or unavailable. OpenShift Container Platform provides these health checks for the OpenShift API, Machine Configuration API, and Ingress Controller backend services.

The following examples show health check specifications for the previously listed backend services:

Example of a Kubernetes API health check specification

Path: HTTPS:6443/readyz
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 10
Interval: 10

Example of a Machine Config API health check specification

Path: HTTPS:22623/healthz
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 10
Interval: 10

Example of an Ingress Controller health check specification

Path: HTTP:1936/healthz/ready
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 5
Interval: 10

Procedure

  1. Configure the HAProxy Ingress Controller, so that you can enable access to the cluster from your load balancer on ports 6443, 22623, 443, and 80. Depending on your needs, you can specify the IP address of a single subnet or IP addresses from multiple subnets in your HAProxy configuration.

    Example HAProxy configuration with one listed subnet

    # ...
    listen my-cluster-api-6443
        bind 192.168.1.100:6443
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /readyz
      http-check expect status 200
        server my-cluster-master-2 192.168.1.101:6443 check inter 10s rise 2 fall 2
        server my-cluster-master-0 192.168.1.102:6443 check inter 10s rise 2 fall 2
        server my-cluster-master-1 192.168.1.103:6443 check inter 10s rise 2 fall 2
    
    listen my-cluster-machine-config-api-22623
        bind 192.168.1.100:22623
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /healthz
      http-check expect status 200
        server my-cluster-master-2 192.168.1.101:22623 check inter 10s rise 2 fall 2
        server my-cluster-master-0 192.168.1.102:22623 check inter 10s rise 2 fall 2
        server my-cluster-master-1 192.168.1.103:22623 check inter 10s rise 2 fall 2
    
    listen my-cluster-apps-443
        bind 192.168.1.100:443
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /healthz/ready
      http-check expect status 200
        server my-cluster-worker-0 192.168.1.111:443 check port 1936 inter 10s rise 2 fall 2
        server my-cluster-worker-1 192.168.1.112:443 check port 1936 inter 10s rise 2 fall 2
        server my-cluster-worker-2 192.168.1.113:443 check port 1936 inter 10s rise 2 fall 2
    
    listen my-cluster-apps-80
       bind 192.168.1.100:80
       mode tcp
       balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /healthz/ready
      http-check expect status 200
        server my-cluster-worker-0 192.168.1.111:80 check port 1936 inter 10s rise 2 fall 2
        server my-cluster-worker-1 192.168.1.112:80 check port 1936 inter 10s rise 2 fall 2
        server my-cluster-worker-2 192.168.1.113:80 check port 1936 inter 10s rise 2 fall 2
    # ...

    Example HAProxy configuration with multiple listed subnets

    # ...
    listen api-server-6443
        bind *:6443
        mode tcp
          server master-00 192.168.83.89:6443 check inter 1s
          server master-01 192.168.84.90:6443 check inter 1s
          server master-02 192.168.85.99:6443 check inter 1s
          server bootstrap 192.168.80.89:6443 check inter 1s
    
    listen machine-config-server-22623
        bind *:22623
        mode tcp
          server master-00 192.168.83.89:22623 check inter 1s
          server master-01 192.168.84.90:22623 check inter 1s
          server master-02 192.168.85.99:22623 check inter 1s
          server bootstrap 192.168.80.89:22623 check inter 1s
    
    listen ingress-router-80
        bind *:80
        mode tcp
        balance source
          server worker-00 192.168.83.100:80 check inter 1s
          server worker-01 192.168.83.101:80 check inter 1s
    
    listen ingress-router-443
        bind *:443
        mode tcp
        balance source
          server worker-00 192.168.83.100:443 check inter 1s
          server worker-01 192.168.83.101:443 check inter 1s
    
    listen ironic-api-6385
        bind *:6385
        mode tcp
        balance source
          server master-00 192.168.83.89:6385 check inter 1s
          server master-01 192.168.84.90:6385 check inter 1s
          server master-02 192.168.85.99:6385 check inter 1s
          server bootstrap 192.168.80.89:6385 check inter 1s
    
    listen inspector-api-5050
        bind *:5050
        mode tcp
        balance source
          server master-00 192.168.83.89:5050 check inter 1s
          server master-01 192.168.84.90:5050 check inter 1s
          server master-02 192.168.85.99:5050 check inter 1s
          server bootstrap 192.168.80.89:5050 check inter 1s
    # ...

  2. Use the curl CLI command to verify that the user-managed load balancer and its resources are operational:

    1. Verify that the cluster machine configuration API is accessible to the Kubernetes API server resource, by running the following command and observing the response:

      $ curl https://<loadbalancer_ip_address>:6443/version --insecure

      If the configuration is correct, you receive a JSON object in response:

      {
        "major": "1",
        "minor": "11+",
        "gitVersion": "v1.11.0+ad103ed",
        "gitCommit": "ad103ed",
        "gitTreeState": "clean",
        "buildDate": "2019-01-09T06:44:10Z",
        "goVersion": "go1.10.3",
        "compiler": "gc",
        "platform": "linux/amd64"
      }
    2. Verify that the cluster machine configuration API is accessible to the Machine config server resource, by running the following command and observing the output:

      $ curl -v https://<loadbalancer_ip_address>:22623/healthz --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      Content-Length: 0
    3. Verify that the controller is accessible to the Ingress Controller resource on port 80, by running the following command and observing the output:

      $ curl -I -L -H "Host: console-openshift-console.apps.<cluster_name>.<base_domain>" http://<load_balancer_front_end_IP_address>

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 302 Found
      content-length: 0
      location: https://console-openshift-console.apps.ocp4.private.opequon.net/
      cache-control: no-cache
    4. Verify that the controller is accessible to the Ingress Controller resource on port 443, by running the following command and observing the output:

      $ curl -I -L --insecure --resolve console-openshift-console.apps.<cluster_name>.<base_domain>:443:<Load Balancer Front End IP Address> https://console-openshift-console.apps.<cluster_name>.<base_domain>

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=UlYWOyQ62LWjw2h003xtYSKlh1a0Py2hhctw0WmV2YEdhJjFyQwWcGBsja261dGLgaYO0nxzVErhiXt6QepA7g==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 04 Oct 2023 16:29:38 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=1bf5e9573c9a2760c964ed1659cc1673; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private
  3. Configure the DNS records for your cluster to target the front-end IP addresses of the user-managed load balancer. You must update records to your DNS server for the cluster API and applications over the load balancer.

    Examples of modified DNS records

    <load_balancer_ip_address>  A  api.<cluster_name>.<base_domain>
    A record pointing to Load Balancer Front End

    <load_balancer_ip_address>   A apps.<cluster_name>.<base_domain>
    A record pointing to Load Balancer Front End
    Important

    DNS propagation might take some time for each DNS record to become available. Ensure that each DNS record propagates before validating each record.

  4. For your OpenShift Container Platform cluster to use the user-managed load balancer, you must specify the following configuration in your cluster’s install-config.yaml file:

    # ...
    platform:
        loadBalancer:
          type: UserManaged 1
        apiVIPs:
        - <api_ip> 2
        ingressVIPs:
        - <ingress_ip> 3
    # ...
    1
    Set UserManaged for the type parameter to specify a user-managed load balancer for your cluster. The parameter defaults to OpenShiftManagedDefault, which denotes the default internal load balancer. For services defined in an openshift-kni-infra namespace, a user-managed load balancer can deploy the coredns service to pods in your cluster but ignores keepalived and haproxy services.
    2
    Required parameter when you specify a user-managed load balancer. Specify the user-managed load balancer’s public IP address, so that the Kubernetes API can communicate with the user-managed load balancer.
    3
    Required parameter when you specify a user-managed load balancer. Specify the user-managed load balancer’s public IP address, so that the user-managed load balancer can manage ingress traffic for your cluster.

Verification

  1. Use the curl CLI command to verify that the user-managed load balancer and DNS record configuration are operational:

    1. Verify that you can access the cluster API, by running the following command and observing the output:

      $ curl https://api.<cluster_name>.<base_domain>:6443/version --insecure

      If the configuration is correct, you receive a JSON object in response:

      {
        "major": "1",
        "minor": "11+",
        "gitVersion": "v1.11.0+ad103ed",
        "gitCommit": "ad103ed",
        "gitTreeState": "clean",
        "buildDate": "2019-01-09T06:44:10Z",
        "goVersion": "go1.10.3",
        "compiler": "gc",
        "platform": "linux/amd64"
        }
    2. Verify that you can access the cluster machine configuration, by running the following command and observing the output:

      $ curl -v https://api.<cluster_name>.<base_domain>:22623/healthz --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      Content-Length: 0
    3. Verify that you can access each cluster application on port, by running the following command and observing the output:

      $ curl http://console-openshift-console.apps.<cluster_name>.<base_domain> -I -L --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 302 Found
      content-length: 0
      location: https://console-openshift-console.apps.<cluster-name>.<base domain>/
      cache-control: no-cacheHTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=39HoZgztDnzjJkq/JuLJMeoKNXlfiVv2YgZc09c3TBOBU4NI6kDXaJH1LdicNhN1UsQWzon4Dor9GWGfopaTEQ==; Path=/; Secure
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Tue, 17 Nov 2020 08:42:10 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=9b714eb87e93cf34853e87a92d6894be; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private
    4. Verify that you can access each cluster application on port 443, by running the following command and observing the output:

      $ curl https://console-openshift-console.apps.<cluster_name>.<base_domain> -I -L --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=UlYWOyQ62LWjw2h003xtYSKlh1a0Py2hhctw0WmV2YEdhJjFyQwWcGBsja261dGLgaYO0nxzVErhiXt6QepA7g==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 04 Oct 2023 16:29:38 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=1bf5e9573c9a2760c964ed1659cc1673; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private

3.2. About the Bare Metal Operator

Use the Bare Metal Operator (BMO) to provision, manage, and inspect bare-metal hosts in your cluster.

The BMO uses the following resources to complete these tasks:

  • BareMetalHost
  • HostFirmwareSettings
  • FirmwareSchema
  • HostFirmwareComponents

The BMO maintains an inventory of the physical hosts in the cluster by mapping each bare-metal host to an instance of the BareMetalHost custom resource definition. Each BareMetalHost resource features hardware, software, and firmware details. The BMO continually inspects the bare-metal hosts in the cluster to ensure each BareMetalHost resource accurately details the components of the corresponding host.

The BMO also uses the HostFirmwareSettings resource, the FirmwareSchema resource, and the HostFirmwareComponents resource to detail firmware specifications and upgrade or downgrade firmware for the bare-metal host.

The BMO interfaces with bare-metal hosts in the cluster by using the Ironic API service. The Ironic service uses the Baseboard Management Controller (BMC) on the host to interface with the machine.

Some common tasks you can complete by using the BMO include the following:

  • Provision bare-metal hosts to the cluster with a specific image
  • Format a host’s disk contents before provisioning or after deprovisioning
  • Turn on or off a host
  • Change firmware settings
  • View the host’s hardware details
  • Upgrade or downgrade a host’s firmware to a specific version

3.2.1. Bare Metal Operator architecture

The Bare Metal Operator (BMO) uses the following resources to provision, manage, and inspect bare-metal hosts in your cluster. The following diagram illustrates the architecture of these resources:

BMO architecture overview

BareMetalHost

The BareMetalHost resource defines a physical host and its properties. When you provision a bare-metal host to the cluster, you must define a BareMetalHost resource for that host. For ongoing management of the host, you can inspect the information in the BareMetalHost or update this information.

The BareMetalHost resource features provisioning information such as the following:

  • Deployment specifications such as the operating system boot image or the custom RAM disk
  • Provisioning state
  • Baseboard Management Controller (BMC) address
  • Desired power state

The BareMetalHost resource features hardware information such as the following:

  • Number of CPUs
  • MAC address of a NIC
  • Size of the host’s storage device
  • Current power state

HostFirmwareSettings

You can use the HostFirmwareSettings resource to retrieve and manage the firmware settings for a host. When a host moves to the Available state, the Ironic service reads the host’s firmware settings and creates the HostFirmwareSettings resource. There is a one-to-one mapping between the BareMetalHost resource and the HostFirmwareSettings resource.

You can use the HostFirmwareSettings resource to inspect the firmware specifications for a host or to update a host’s firmware specifications.

Note

You must adhere to the schema specific to the vendor firmware when you edit the spec field of the HostFirmwareSettings resource. This schema is defined in the read-only FirmwareSchema resource.

FirmwareSchema

Firmware settings vary among hardware vendors and host models. A FirmwareSchema resource is a read-only resource that contains the types and limits for each firmware setting on each host model. The data comes directly from the BMC by using the Ironic service. The FirmwareSchema resource enables you to identify valid values you can specify in the spec field of the HostFirmwareSettings resource.

A FirmwareSchema resource can apply to many BareMetalHost resources if the schema is the same.

HostFirmwareComponents

Metal3 provides the HostFirmwareComponents resource, which describes BIOS and baseboard management controller (BMC) firmware versions. You can upgrade or downgrade the host’s firmware to a specific version by editing the spec field of the HostFirmwareComponents resource. This is useful when deploying with validated patterns that have been tested against specific firmware versions.

3.2.2. Optional: Creating a manifest object that includes a customized br-ex bridge

As an alternative to using the configure-ovs.sh shell script to set a br-ex bridge on a bare-metal platform, you can create a NodeNetworkConfigurationPolicy custom resource (CR) that includes an NMState configuration file. The NMState configuration file creates a customized br-ex bridge network configuration on each node in your cluster.

Important

Creating a NodeNetworkConfigurationPolicy CR that includes a customized br-ex bridge is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

This feature supports the following tasks:

  • Modifying the maximum transmission unit (MTU) for your cluster.
  • Modifying attributes of a different bond interface, such as MIImon (Media Independent Interface Monitor), bonding mode, or Quality of Service (QoS).
  • Updating DNS values.

Consider the following use cases for creating a manifest object that includes a customized br-ex bridge:

  • You want to make postinstallation changes to the bridge, such as changing the Open vSwitch (OVS) or OVN-Kubernetes br-ex bridge network. The configure-ovs.sh shell script does not support making postinstallation changes to the bridge.
  • You want to deploy the bridge on a different interface than the interface available on a host or server IP address.
  • You want to make advanced configurations to the bridge that are not possible with the configure-ovs.sh shell script. Using the script for these configurations might result in the bridge failing to connect multiple network interfaces and facilitating data forwarding between the interfaces.

Prerequisites

  • You set a customized br-ex by using the alternative method to configure-ovs.
  • You installed the Kubernetes NMState Operator.

Procedure

  • Create a NodeNetworkConfigurationPolicy (NNCP) CR and define a customized br-ex bridge network configuration. Depending on your needs, ensure that you set a masquerade IP for either the ipv4.address.ip, ipv6.address.ip, or both parameters. A masquerade IP address must match an in-use IP address block.

    Important

    As a post-installation task, you can configure most parameters for a customized br-ex bridge that you defined in an existing NNCP CR, except for the IP address.

    Example of an NNCP CR that sets IPv6 and IPv4 masquerade IP addresses

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: worker-0-br-ex 1
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-0
        desiredState:
        interfaces:
        - name: enp2s0 2
          type: ethernet 3
          state: up 4
          ipv4:
            enabled: false 5
          ipv6:
            enabled: false
        - name: br-ex
          type: ovs-bridge
          state: up
          ipv4:
            enabled: false
            dhcp: false
          ipv6:
            enabled: false
            dhcp: false
          bridge:
            port:
            - name: enp2s0 6
            - name: br-ex
        - name: br-ex
          type: ovs-interface
          state: up
          copy-mac-from: enp2s0
          ipv4:
            enabled: true
            dhcp: true
            address:
            - ip: "169.254.169.2"
              prefix-length: 29
          ipv6:
            enabled: false
            dhcp: false
            address:
            - ip: "fd69::2"
            prefix-length: 125

    1
    Name of the policy.
    2
    Name of the interface.
    3
    The type of ethernet.
    4
    The requested state for the interface after creation.
    5
    Disables IPv4 and IPv6 in this example.
    6
    The node NIC to which the bridge is attached.

3.3. About the BareMetalHost resource

Metal3 introduces the concept of the BareMetalHost resource, which defines a physical host and its properties. The BareMetalHost resource contains two sections:

  1. The BareMetalHost spec
  2. The BareMetalHost status

3.3.1. The BareMetalHost spec

The spec section of the BareMetalHost resource defines the desired state of the host.

Table 3.1. BareMetalHost spec
ParametersDescription

automatedCleaningMode

An interface to enable or disable automated cleaning during provisioning and de-provisioning. When set to disabled, it skips automated cleaning. When set to metadata, automated cleaning is enabled. The default setting is metadata.

bmc:
  address:
  credentialsName:
  disableCertificateVerification:

The bmc configuration setting contains the connection information for the baseboard management controller (BMC) on the host. The fields are:

  • address: The URL for communicating with the host’s BMC controller.
  • credentialsName: A reference to a secret containing the username and password for the BMC.
  • disableCertificateVerification: A boolean to skip certificate validation when set to true.

bootMACAddress

The MAC address of the NIC used for provisioning the host.

bootMode

The boot mode of the host. It defaults to UEFI, but it can also be set to legacy for BIOS boot, or UEFISecureBoot.

consumerRef

A reference to another resource that is using the host. It could be empty if another resource is not currently using the host. For example, a Machine resource might use the host when the machine-api is using the host.

description

A human-provided string to help identify the host.

externallyProvisioned

A boolean indicating whether the host provisioning and deprovisioning are managed externally. When set:

  • Power status can still be managed using the online field.
  • Hardware inventory will be monitored, but no provisioning or deprovisioning operations are performed on the host.

firmware

Contains information about the BIOS configuration of bare metal hosts. Currently, firmware is only supported by iRMC, iDRAC, iLO4 and iLO5 BMCs. The sub fields are:

  • simultaneousMultithreadingEnabled: Allows a single physical processor core to appear as several logical processors. Valid settings are true or false.
  • sriovEnabled: SR-IOV support enables a hypervisor to create virtual instances of a PCI-express device, potentially increasing performance. Valid settings are true or false.
  • virtualizationEnabled: Supports the virtualization of platform hardware. Valid settings are true or false.
image:
  url:
  checksum:
  checksumType:
  format:

The image configuration setting holds the details for the image to be deployed on the host. Ironic requires the image fields. However, when the externallyProvisioned configuration setting is set to true and the external management does not require power control, the fields can be empty. The setting supports the following fields:

  • url: The URL of an image to deploy to the host.
  • checksum: The actual checksum or a URL to a file containing the checksum for the image at image.url.
  • checksumType: You can specify checksum algorithms. Currently image.checksumType only supports md5, sha256, and sha512. The default checksum type is md5.
  • format: This is the disk format of the image. It can be one of raw, qcow2, vdi, vmdk, live-iso or be left unset. Setting it to raw enables raw image streaming in the Ironic agent for that image. Setting it to live-iso enables iso images to live boot without deploying to disk, and it ignores the checksum fields.

networkData

A reference to the secret containing the network configuration data and its namespace, so that it can be attached to the host before the host boots to set up the network.

online

A boolean indicating whether the host should be powered on (true) or off (false). Changing this value will trigger a change in the power state of the physical host.

raid:
  hardwareRAIDVolumes:
  softwareRAIDVolumes:

(Optional) Contains the information about the RAID configuration for bare metal hosts. If not specified, it retains the current configuration.

Note

OpenShift Container Platform 4.16 supports hardware RAID on the installation drive for BMCs, including:

  • Fujitsu iRMC with support for RAID levels 0, 1, 5, 6, and 10
  • Dell iDRAC using the Redfish API with firmware version 6.10.30.20 or later and RAID levels 0, 1, and 5

OpenShift Container Platform 4.16 does not support software RAID on the installation drive.

See the following configuration settings:

  • hardwareRAIDVolumes: Contains the list of logical drives for hardware RAID, and defines the desired volume configuration in the hardware RAID. If you do not specify rootDeviceHints, the first volume is the root volume. The sub-fields are:

    • level: The RAID level for the logical drive. The following levels are supported: 0,1,2,5,6,1+0,5+0,6+0.
    • name: The name of the volume as a string. It should be unique within the server. If not specified, the volume name will be auto-generated.
    • numberOfPhysicalDisks: The number of physical drives as an integer to use for the logical drove. Defaults to the minimum number of disk drives required for the particular RAID level.
    • physicalDisks: The list of names of physical disk drives as a string. This is an optional field. If specified, the controller field must be specified too.
    • controller: (Optional) The name of the RAID controller as a string to use in the hardware RAID volume.
    • rotational: If set to true, it will only select rotational disk drives. If set to false, it will only select solid-state and NVMe drives. If not set, it selects any drive types, which is the default behavior.
    • sizeGibibytes: The size of the logical drive as an integer to create in GiB. If unspecified or set to 0, it will use the maximum capacity of physical drive for the logical drive.
  • softwareRAIDVolumes: OpenShift Container Platform 4.16 does not support software RAID on the installation drive. This configuration contains the list of logical disks for software RAID. If you do not specify rootDeviceHints, the first volume is the root volume. If you set HardwareRAIDVolumes, this item will be invalid. Software RAIDs will always be deleted. The number of created software RAID devices must be 1 or 2. If there is only one software RAID device, it must be RAID-1. If there are two RAID devices, the first device must be RAID-1, while the RAID level for the second device can be 0, 1, or 1+0. The first RAID device will be the deployment device, which cannot be a software RAID volume. Enforcing RAID-1 reduces the risk of a non-booting node in case of a device failure. The softwareRAIDVolume field defines the desired configuration of the volume in the software RAID. The sub-fields are:

    • level: The RAID level for the logical drive. The following levels are supported: 0,1,1+0.
    • physicalDisks: A list of device hints. The number of items should be greater than or equal to 2.
    • sizeGibibytes: The size of the logical disk drive as an integer to be created in GiB. If unspecified or set to 0, it will use the maximum capacity of physical drive for logical drive.

You can set the hardwareRAIDVolume as an empty slice to clear the hardware RAID configuration. For example:

spec:
   raid:
     hardwareRAIDVolume: []

If you receive an error message indicating that the driver does not support RAID, set the raid, hardwareRAIDVolumes or softwareRAIDVolumes to nil. You might need to ensure the host has a RAID controller.

rootDeviceHints:
  deviceName:
  hctl:
  model:
  vendor:
  serialNumber:
  minSizeGigabytes:
  wwn:
  wwnWithExtension:
  wwnVendorExtension:
  rotational:

The rootDeviceHints parameter enables provisioning of the RHCOS image to a particular device. It examines the devices in the order it discovers them, and compares the discovered values with the hint values. It uses the first discovered device that matches the hint value. The configuration can combine multiple hints, but a device must match all hints to get selected. The fields are:

  • deviceName: A string containing a Linux device name like /dev/vda. The hint must match the actual value exactly.
  • hctl: A string containing a SCSI bus address like 0:0:0:0. The hint must match the actual value exactly.
  • model: A string containing a vendor-specific device identifier. The hint can be a substring of the actual value.
  • vendor: A string containing the name of the vendor or manufacturer of the device. The hint can be a sub-string of the actual value.
  • serialNumber: A string containing the device serial number. The hint must match the actual value exactly.
  • minSizeGigabytes: An integer representing the minimum size of the device in gigabytes.
  • wwn: A string containing the unique storage identifier. The hint must match the actual value exactly.
  • wwnWithExtension: A string containing the unique storage identifier with the vendor extension appended. The hint must match the actual value exactly.
  • wwnVendorExtension: A string containing the unique vendor storage identifier. The hint must match the actual value exactly.
  • rotational: A boolean indicating whether the device should be a rotating disk (true) or not (false).

3.3.2. The BareMetalHost status

The BareMetalHost status represents the host’s current state, and includes tested credentials, current hardware details, and other information.

Table 3.2. BareMetalHost status
ParametersDescription

goodCredentials

A reference to the secret and its namespace holding the last set of baseboard management controller (BMC) credentials the system was able to validate as working.

errorMessage

Details of the last error reported by the provisioning backend, if any.

errorType

Indicates the class of problem that has caused the host to enter an error state. The error types are:

  • provisioned registration error: Occurs when the controller is unable to re-register an already provisioned host.
  • registration error: Occurs when the controller is unable to connect to the host’s baseboard management controller.
  • inspection error: Occurs when an attempt to obtain hardware details from the host fails.
  • preparation error: Occurs when cleaning fails.
  • provisioning error: Occurs when the controller fails to provision or deprovision the host.
  • power management error: Occurs when the controller is unable to modify the power state of the host.
  • detach error: Occurs when the controller is unable to detatch the host from the provisioner.
hardware:
  cpu
    arch:
    model:
    clockMegahertz:
    flags:
    count:

The hardware.cpu field details of the CPU(s) in the system. The fields include:

  • arch: The architecture of the CPU.
  • model: The CPU model as a string.
  • clockMegahertz: The speed in MHz of the CPU.
  • flags: The list of CPU flags. For example, 'mmx','sse','sse2','vmx' etc.
  • count: The number of CPUs available in the system.
hardware:
  firmware:

Contains BIOS firmware information. For example, the hardware vendor and version.

hardware:
  nics:
  - ip:
    name:
    mac:
    speedGbps:
    vlans:
    vlanId:
    pxe:

The hardware.nics field contains a list of network interfaces for the host. The fields include:

  • ip: The IP address of the NIC, if one was assigned when the discovery agent ran.
  • name: A string identifying the network device. For example, nic-1.
  • mac: The MAC address of the NIC.
  • speedGbps: The speed of the device in Gbps.
  • vlans: A list holding all the VLANs available for this NIC.
  • vlanId: The untagged VLAN ID.
  • pxe: Whether the NIC is able to boot using PXE.
hardware:
  ramMebibytes:

The host’s amount of memory in Mebibytes (MiB).

hardware:
  storage:
  - name:
    rotational:
    sizeBytes:
    serialNumber:

The hardware.storage field contains a list of storage devices available to the host. The fields include:

  • name: A string identifying the storage device. For example, disk 1 (boot).
  • rotational: Indicates whether the disk is rotational, and returns either true or false.
  • sizeBytes: The size of the storage device.
  • serialNumber: The device’s serial number.
hardware:
  systemVendor:
    manufacturer:
    productName:
    serialNumber:

Contains information about the host’s manufacturer, the productName, and the serialNumber.

lastUpdated

The timestamp of the last time the status of the host was updated.

operationalStatus

The status of the server. The status is one of the following:

  • OK: Indicates all the details for the host are known, correctly configured, working, and manageable.
  • discovered: Implies some of the host’s details are either not working correctly or missing. For example, the BMC address is known but the login credentials are not.
  • error: Indicates the system found some sort of irrecoverable error. Refer to the errorMessage field in the status section for more details.
  • delayed: Indicates that provisioning is delayed to limit simultaneous provisioning of multiple hosts.
  • detached: Indicates the host is marked unmanaged.

poweredOn

Boolean indicating whether the host is powered on.

provisioning:
  state:
  id:
  image:
  raid:
  firmware:
  rootDeviceHints:

The provisioning field contains values related to deploying an image to the host. The sub-fields include:

  • state: The current state of any ongoing provisioning operation. The states include:

    • <empty string>: There is no provisioning happening at the moment.
    • unmanaged: There is insufficient information available to register the host.
    • registering: The agent is checking the host’s BMC details.
    • match profile: The agent is comparing the discovered hardware details on the host against known profiles.
    • available: The host is available for provisioning. This state was previously known as ready.
    • preparing: The existing configuration will be removed, and the new configuration will be set on the host.
    • provisioning: The provisioner is writing an image to the host’s storage.
    • provisioned: The provisioner wrote an image to the host’s storage.
    • externally provisioned: Metal3 does not manage the image on the host.
    • deprovisioning: The provisioner is wiping the image from the host’s storage.
    • inspecting: The agent is collecting hardware details for the host.
    • deleting: The agent is deleting the from the cluster.
  • id: The unique identifier for the service in the underlying provisioning tool.
  • image: The image most recently provisioned to the host.
  • raid: The list of hardware or software RAID volumes recently set.
  • firmware: The BIOS configuration for the bare metal server.
  • rootDeviceHints: The root device selection instructions used for the most recent provisioning operation.

triedCredentials

A reference to the secret and its namespace holding the last set of BMC credentials that were sent to the provisioning backend.

3.4. Getting the BareMetalHost resource

The BareMetalHost resource contains the properties of a physical host. You must get the BareMetalHost resource for a physical host to review its properties.

Procedure

  1. Get the list of BareMetalHost resources:

    $ oc get bmh -n openshift-machine-api -o yaml
    Note

    You can use baremetalhost as the long form of bmh with oc get command.

  2. Get the list of hosts:

    $ oc get bmh -n openshift-machine-api
  3. Get the BareMetalHost resource for a specific host:

    $ oc get bmh <host_name> -n openshift-machine-api -o yaml

    Where <host_name> is the name of the host.

    Example output

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      creationTimestamp: "2022-06-16T10:48:33Z"
      finalizers:
      - baremetalhost.metal3.io
      generation: 2
      name: openshift-worker-0
      namespace: openshift-machine-api
      resourceVersion: "30099"
      uid: 1513ae9b-e092-409d-be1b-ad08edeb1271
    spec:
      automatedCleaningMode: metadata
      bmc:
        address: redfish://10.46.61.19:443/redfish/v1/Systems/1
        credentialsName: openshift-worker-0-bmc-secret
        disableCertificateVerification: true
      bootMACAddress: 48:df:37:c7:f7:b0
      bootMode: UEFI
      consumerRef:
        apiVersion: machine.openshift.io/v1beta1
        kind: Machine
        name: ocp-edge-958fk-worker-0-nrfcg
        namespace: openshift-machine-api
      customDeploy:
        method: install_coreos
      online: true
      rootDeviceHints:
        deviceName: /dev/disk/by-id/scsi-<serial_number>
      userData:
        name: worker-user-data-managed
        namespace: openshift-machine-api
    status:
      errorCount: 0
      errorMessage: ""
      goodCredentials:
        credentials:
          name: openshift-worker-0-bmc-secret
          namespace: openshift-machine-api
        credentialsVersion: "16120"
      hardware:
        cpu:
          arch: x86_64
          clockMegahertz: 2300
          count: 64
          flags:
          - 3dnowprefetch
          - abm
          - acpi
          - adx
          - aes
          model: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
        firmware:
          bios:
            date: 10/26/2020
            vendor: HPE
            version: U30
        hostname: openshift-worker-0
        nics:
        - mac: 48:df:37:c7:f7:b3
          model: 0x8086 0x1572
          name: ens1f3
        ramMebibytes: 262144
        storage:
        - hctl: "0:0:0:0"
          model: VK000960GWTTB
          name: /dev/disk/by-id/scsi-<serial_number>
          sizeBytes: 960197124096
          type: SSD
          vendor: ATA
        systemVendor:
          manufacturer: HPE
          productName: ProLiant DL380 Gen10 (868703-B21)
          serialNumber: CZ200606M3
      lastUpdated: "2022-06-16T11:41:42Z"
      operationalStatus: OK
      poweredOn: true
      provisioning:
        ID: 217baa14-cfcf-4196-b764-744e184a3413
        bootMode: UEFI
        customDeploy:
          method: install_coreos
        image:
          url: ""
        raid:
          hardwareRAIDVolumes: null
          softwareRAIDVolumes: []
        rootDeviceHints:
          deviceName: /dev/disk/by-id/scsi-<serial_number>
        state: provisioned
      triedCredentials:
        credentials:
          name: openshift-worker-0-bmc-secret
          namespace: openshift-machine-api
        credentialsVersion: "16120"

3.5. Editing a BareMetalHost resource

After you deploy an OpenShift Container Platform cluster on bare metal, you might need to edit a node’s BareMetalHost resource. Consider the following examples:

  • You deploy a cluster with the Assisted Installer and need to add or edit the baseboard management controller (BMC) host name or IP address.
  • You want to move a node from one cluster to another without deprovisioning it.

Prerequisites

  • Ensure the node is in the Provisioned, ExternallyProvisioned, or Available state.

Procedure

  1. Get the list of nodes:

    $ oc get bmh -n openshift-machine-api
  2. Before editing the node’s BareMetalHost resource, detach the node from Ironic by running the following command:

    $ oc annotate baremetalhost <node_name> -n openshift-machine-api 'baremetalhost.metal3.io/detached=true' 1
    1
    Replace <node_name> with the name of the node.
  3. Edit the BareMetalHost resource by running the following command:

    $ oc edit bmh <node_name> -n openshift-machine-api
  4. Reattach the node to Ironic by running the following command:

    $ oc annotate baremetalhost <node_name> -n openshift-machine-api 'baremetalhost.metal3.io/detached'-

3.6. Troubleshooting latency when deleting a BareMetalHost resource

When the Bare Metal Operator (BMO) deletes a BareMetalHost resource, Ironic deprovisions the bare-metal host. For example, this might happen when scaling down a machine set. Deprovisioning involves a process known as "cleaning", which performs the following steps:

  • Powering off the bare-metal host
  • Booting a service RAM disk on the bare-metal host
  • Removing partitioning metadata from all disks
  • Powering off the bare-metal host again

If cleaning does not succeed, the deletion of the BareMetalHost resource will take a long time and might not finish.

Important

Do not remove the finalizers to force deletion of a BareMetalHost resource. The provisioning back-end has its own database, which maintains a host record. Running actions will continue to run even if you try to force the deletion by removing the finalizers. You might face unexpected issues when attempting to add the bare-metal host later.

Procedure

  1. If the cleaning process can recover, wait for it to finish.
  2. If cleaning cannot recover, disable the cleaning process by modifying the BareMetalHost resource and setting the automatedCleaningMode field to disabled.

See "Editing a BareMetalHost resource" for additional details.

3.7. Attaching a non-bootable ISO to a bare-metal node

You can attach a generic, non-bootable ISO virtual media image to a provisioned node by using the DataImage resource. After you apply the resource, the ISO image becomes accessible to the operating system after it has booted. This is useful for configuring a node after provisioning the operating system and before the node boots for the first time.

Prerequisites

  • The node must use Redfish or drivers derived from it to support this feature.
  • The node must be in the Provisioned or ExternallyProvisioned state.
  • The name must be the same as the name of the node defined in its BareMetalHost resource.
  • You have a valid url to the ISO image.

Procedure

  1. Create a DataImage resource:

    apiVersion: metal3.io/v1alpha1
    kind: DataImage
    metadata:
      name: <node_name> 1
    spec:
      url: "http://dataimage.example.com/non-bootable.iso" 2
    1
    Specify the name of the node as defined in its BareMetalHost resource.
    2
    Specify the URL and path to the ISO image.
  2. Save the DataImage resource to a file by running the following command:

    $ vim <node_name>-dataimage.yaml
  3. Apply the DataImage resource by running the following command:

    $ oc apply -f <node_name>-dataimage.yaml -n <node_namespace> 1
    1
    Replace <node_namespace> so that the namespace matches the namespace for the BareMetalHost resource. For example, openshift-machine-api.
  4. Reboot the node.

    Note

    To reboot the node, attach the reboot.metal3.io annotation, or reset set the online status in the BareMetalHost resource. A forced reboot of the bare-metal node will change the state of the node to NotReady for awhile. For example, 5 minutes or more.

  5. View the DataImage resource by running the following command:

    $ oc get dataimage <node_name> -n openshift-machine-api -o yaml

    Example output

    apiVersion: v1
    items:
    - apiVersion: metal3.io/v1alpha1
      kind: DataImage
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"metal3.io/v1alpha1","kind":"DataImage","metadata":{"annotations":{},"name":"bmh-node-1","namespace":"openshift-machine-api"},"spec":{"url":"http://dataimage.example.com/non-bootable.iso"}}
        creationTimestamp: "2024-06-10T12:00:00Z"
        finalizers:
        - dataimage.metal3.io
        generation: 1
        name: bmh-node-1
        namespace: openshift-machine-api
        ownerReferences:
        - apiVersion: metal3.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: BareMetalHost
          name: bmh-node-1
          uid: 046cdf8e-0e97-485a-8866-e62d20e0f0b3
        resourceVersion: "21695581"
        uid: c5718f50-44b6-4a22-a6b7-71197e4b7b69
      spec:
        url: http://dataimage.example.com/non-bootable.iso
      status:
        attachedImage:
          url: http://dataimage.example.com/non-bootable.iso
        error:
          count: 0
          message: ""
        lastReconciled: "2024-06-10T12:05:00Z"

3.8. About the HostFirmwareSettings resource

You can use the HostFirmwareSettings resource to retrieve and manage the BIOS settings for a host. When a host moves to the Available state, Ironic reads the host’s BIOS settings and creates the HostFirmwareSettings resource. The resource contains the complete BIOS configuration returned from the baseboard management controller (BMC). Whereas, the firmware field in the BareMetalHost resource returns three vendor-independent fields, the HostFirmwareSettings resource typically comprises many BIOS settings of vendor-specific fields per host.

The HostFirmwareSettings resource contains two sections:

  1. The HostFirmwareSettings spec.
  2. The HostFirmwareSettings status.

3.8.1. The HostFirmwareSettings spec

The spec section of the HostFirmwareSettings resource defines the desired state of the host’s BIOS, and it is empty by default. Ironic uses the settings in the spec.settings section to update the baseboard management controller (BMC) when the host is in the Preparing state. Use the FirmwareSchema resource to ensure that you do not send invalid name/value pairs to hosts. See "About the FirmwareSchema resource" for additional details.

Example

spec:
  settings:
    ProcTurboMode: Disabled1

1
In the foregoing example, the spec.settings section contains a name/value pair that will set the ProcTurboMode BIOS setting to Disabled.
Note

Integer parameters listed in the status section appear as strings. For example, "1". When setting integers in the spec.settings section, the values should be set as integers without quotes. For example, 1.

3.8.2. The HostFirmwareSettings status

The status represents the current state of the host’s BIOS.

Table 3.3. HostFirmwareSettings
ParametersDescription
status:
  conditions:
  - lastTransitionTime:
    message:
    observedGeneration:
    reason:
    status:
    type:

The conditions field contains a list of state changes. The sub-fields include:

  • lastTransitionTime: The last time the state changed.
  • message: A description of the state change.
  • observedGeneration: The current generation of the status. If metadata.generation and this field are not the same, the status.conditions might be out of date.
  • reason: The reason for the state change.
  • status: The status of the state change. The status can be True, False or Unknown.
  • type: The type of state change. The types are Valid and ChangeDetected.
status:
  schema:
    name:
    namespace:
    lastUpdated:

The FirmwareSchema for the firmware settings. The fields include:

  • name: The name or unique identifier referencing the schema.
  • namespace: The namespace where the schema is stored.
  • lastUpdated: The last time the resource was updated.
status:
  settings:

The settings field contains a list of name/value pairs of a host’s current BIOS settings.

3.9. Getting the HostFirmwareSettings resource

The HostFirmwareSettings resource contains the vendor-specific BIOS properties of a physical host. You must get the HostFirmwareSettings resource for a physical host to review its BIOS properties.

Procedure

  1. Get the detailed list of HostFirmwareSettings resources:

    $ oc get hfs -n openshift-machine-api -o yaml
    Note

    You can use hostfirmwaresettings as the long form of hfs with the oc get command.

  2. Get the list of HostFirmwareSettings resources:

    $ oc get hfs -n openshift-machine-api
  3. Get the HostFirmwareSettings resource for a particular host

    $ oc get hfs <host_name> -n openshift-machine-api -o yaml

    Where <host_name> is the name of the host.

3.10. Editing the HostFirmwareSettings resource

You can edit the HostFirmwareSettings of provisioned hosts.

Important

You can only edit hosts when they are in the provisioned state, excluding read-only values. You cannot edit hosts in the externally provisioned state.

Procedure

  1. Get the list of HostFirmwareSettings resources:

    $ oc get hfs -n openshift-machine-api
  2. Edit a host’s HostFirmwareSettings resource:

    $ oc edit hfs <host_name> -n openshift-machine-api

    Where <host_name> is the name of a provisioned host. The HostFirmwareSettings resource will open in the default editor for your terminal.

  3. Add name/value pairs to the spec.settings section:

    Example

    spec:
      settings:
        name: value 1

    1
    Use the FirmwareSchema resource to identify the available settings for the host. You cannot set values that are read-only.
  4. Save the changes and exit the editor.
  5. Get the host’s machine name:

     $ oc get bmh <host_name> -n openshift-machine name

    Where <host_name> is the name of the host. The machine name appears under the CONSUMER field.

  6. Annotate the machine to delete it from the machineset:

    $ oc annotate machine <machine_name> machine.openshift.io/delete-machine=true -n openshift-machine-api

    Where <machine_name> is the name of the machine to delete.

  7. Get a list of nodes and count the number of worker nodes:

    $ oc get nodes
  8. Get the machineset:

    $ oc get machinesets -n openshift-machine-api
  9. Scale the machineset:

    $ oc scale machineset <machineset_name> -n openshift-machine-api --replicas=<n-1>

    Where <machineset_name> is the name of the machineset and <n-1> is the decremented number of worker nodes.

  10. When the host enters the Available state, scale up the machineset to make the HostFirmwareSettings resource changes take effect:

    $ oc scale machineset <machineset_name> -n openshift-machine-api --replicas=<n>

    Where <machineset_name> is the name of the machineset and <n> is the number of worker nodes.

3.11. Verifying the HostFirmware Settings resource is valid

When the user edits the spec.settings section to make a change to the HostFirmwareSetting(HFS) resource, the Bare Metal Operator (BMO) validates the change against the FimwareSchema resource, which is a read-only resource. If the setting is invalid, the BMO will set the Type value of the status.Condition setting to False and also generate an event and store it in the HFS resource. Use the following procedure to verify that the resource is valid.

Procedure

  1. Get a list of HostFirmwareSetting resources:

    $ oc get hfs -n openshift-machine-api
  2. Verify that the HostFirmwareSettings resource for a particular host is valid:

    $ oc describe hfs <host_name> -n openshift-machine-api

    Where <host_name> is the name of the host.

    Example output

    Events:
      Type    Reason            Age    From                                    Message
      ----    ------            ----   ----                                    -------
      Normal  ValidationFailed  2m49s  metal3-hostfirmwaresettings-controller  Invalid BIOS setting: Setting ProcTurboMode is invalid, unknown enumeration value - Foo

    Important

    If the response returns ValidationFailed, there is an error in the resource configuration and you must update the values to conform to the FirmwareSchema resource.

3.12. About the FirmwareSchema resource

BIOS settings vary among hardware vendors and host models. A FirmwareSchema resource is a read-only resource that contains the types and limits for each BIOS setting on each host model. The data comes directly from the BMC through Ironic. The FirmwareSchema enables you to identify valid values you can specify in the spec field of the HostFirmwareSettings resource. The FirmwareSchema resource has a unique identifier derived from its settings and limits. Identical host models use the same FirmwareSchema identifier. It is likely that multiple instances of HostFirmwareSettings use the same FirmwareSchema.

Table 3.4. FirmwareSchema specification
ParametersDescription
<BIOS_setting_name>
  attribute_type:
  allowable_values:
  lower_bound:
  upper_bound:
  min_length:
  max_length:
  read_only:
  unique:

The spec is a simple map consisting of the BIOS setting name and the limits of the setting. The fields include:

  • attribute_type: The type of setting. The supported types are:

    • Enumeration
    • Integer
    • String
    • Boolean
  • allowable_values: A list of allowable values when the attribute_type is Enumeration.
  • lower_bound: The lowest allowed value when attribute_type is Integer.
  • upper_bound: The highest allowed value when attribute_type is Integer.
  • min_length: The shortest string length that the value can have when attribute_type is String.
  • max_length: The longest string length that the value can have when attribute_type is String.
  • read_only: The setting is read only and cannot be modified.
  • unique: The setting is specific to this host.

3.13. Getting the FirmwareSchema resource

Each host model from each vendor has different BIOS settings. When editing the HostFirmwareSettings resource’s spec section, the name/value pairs you set must conform to that host’s firmware schema. To ensure you are setting valid name/value pairs, get the FirmwareSchema for the host and review it.

Procedure

  1. To get a list of FirmwareSchema resource instances, execute the following:

    $ oc get firmwareschema -n openshift-machine-api
  2. To get a particular FirmwareSchema instance, execute:

    $ oc get firmwareschema <instance_name> -n openshift-machine-api -o yaml

    Where <instance_name> is the name of the schema instance stated in the HostFirmwareSettings resource (see Table 3).

3.14. About the HostFirmwareComponents resource

Metal3 provides the HostFirmwareComponents resource, which describes BIOS and baseboard management controller (BMC) firmware versions. The HostFirmwareComponents resource contains two sections:

  1. The HostFirmwareComponents spec
  2. The HostFirmwareComponents status

3.14.1. HostFirmwareComponents spec

The spec section of the HostFirmwareComponents resource defines the desired state of the host’s BIOS and BMC versions.

Table 3.5. HostFirmwareComponents spec
ParametersDescription
updates:
  component:
  url:

The updates configuration setting contains the components to update. The fields are:

  • component: The name of the component. The valid settings are bios or bmc.
  • url: The URL to the component’s firmware specification and version.

3.14.2. HostFirmwareComponents status

The status section of the HostFirmwareComponents resource returns the current status of the host’s BIOS and BMC versions.

Table 3.6. HostFirmwareComponents status
ParametersDescription
components:
  component:
  initialVersion:
  currentVersion:
  lastVersionFlashed:
  updatedAt:

The components section contains the status of the components. The fields are:

  • component: The name of the firmware component. It returns bios or bmc.
  • initialVersion: The initial firmware version of the component. Ironic retrieves this information when creating the BareMetalHost resource. You cannot change it.
  • currentVersion: The current firmware version of the component. Initially, the value matches the initialVersion value until Ironic updates the firmware on the bare-metal host.
  • lastVersionFlashed: The last firmware version of the component flashed on the bare-metal host. This field returns null until Ironic updates the firmware.
  • updatedAt: The timestamp when Ironic updated the bare-metal host’s firmware.
updates:
  component:
  url:

The updates configuration setting contains the updated components. The fields are:

  • component: The name of the component.
  • url: The URL to the component’s firmware specification and version.

3.15. Getting the HostFirmwareComponents resource

The HostFirmwareComponents resource contains the specific firmware version of the BIOS and baseboard management controller (BMC) of a physical host. You must get the HostFirmwareComponents resource for a physical host to review the firmware version and status.

Procedure

  1. Get the detailed list of HostFirmwareComponents resources:

    $ oc get hostfirmwarecomponents -n openshift-machine-api -o yaml
  2. Get the list of HostFirmwareComponents resources:

    $ oc get hostfirmwarecomponents -n openshift-machine-api
  3. Get the HostFirmwareComponents resource for a particular host:

    $ oc get hostfirmwarecomponents <host_name> -n openshift-machine-api -o yaml

    Where <host_name> is the name of the host.

    Example output

    ---
    apiVersion: metal3.io/v1alpha1
    kind: HostFirmwareComponents
    metadata:
      creationTimestamp: 2024-04-25T20:32:06Z"
      generation: 1
      name: ostest-master-2
      namespace: openshift-machine-api
      ownerReferences:
      - apiVersion: metal3.io/v1alpha1
        blockOwnerDeletion: true
        controller: true
        kind: BareMetalHost
        name: ostest-master-2
        uid: 16022566-7850-4dc8-9e7d-f216211d4195
      resourceVersion: "2437"
      uid: 2038d63f-afc0-4413-8ffe-2f8e098d1f6c
    spec:
      updates: []
    status:
      components:
      - component: bios
        currentVersion: 1.0.0
        initialVersion: 1.0.0
      - component: bmc
        currentVersion: "1.00"
        initialVersion: "1.00"
      conditions:
      - lastTransitionTime: "2024-04-25T20:32:06Z"
        message: ""
        observedGeneration: 1
        reason: OK
        status: "True"
        type: Valid
      - lastTransitionTime: "2024-04-25T20:32:06Z"
        message: ""
        observedGeneration: 1
        reason: OK
        status: "False"
        type: ChangeDetected
      lastUpdated: "2024-04-25T20:32:06Z"
      updates: []

3.16. Editing the HostFirmwareComponents resource

You can edit the HostFirmwareComponents resource of a node.

Procedure

  1. Get the detailed list of HostFirmwareComponents resources:

    $ oc get hostfirmwarecomponents -n openshift-machine-api -o yaml
  2. Edit a host’s HostFirmwareComponents resource:

    $ oc edit <host_name> hostfirmwarecomponents -n openshift-machine-api 1
    1
    Where <host_name> is the name of the host. The HostFirmwareComponents resource will open in the default editor for your terminal.

    Example output

    ---
    apiVersion: metal3.io/v1alpha1
    kind: HostFirmwareComponents
    metadata:
      creationTimestamp: 2024-04-25T20:32:06Z"
      generation: 1
      name: ostest-master-2
      namespace: openshift-machine-api
      ownerReferences:
      - apiVersion: metal3.io/v1alpha1
        blockOwnerDeletion: true
        controller: true
        kind: BareMetalHost
        name: ostest-master-2
        uid: 16022566-7850-4dc8-9e7d-f216211d4195
      resourceVersion: "2437"
      uid: 2038d63f-afc0-4413-8ffe-2f8e098d1f6c
    spec:
      updates:
        - name: bios 1
          url: https://myurl.with.firmware.for.bios 2
        - name: bmc 3
          url: https://myurl.with.firmware.for.bmc 4
    status:
      components:
      - component: bios
        currentVersion: 1.0.0
        initialVersion: 1.0.0
      - component: bmc
        currentVersion: "1.00"
        initialVersion: "1.00"
      conditions:
      - lastTransitionTime: "2024-04-25T20:32:06Z"
        message: ""
        observedGeneration: 1
        reason: OK
        status: "True"
        type: Valid
      - lastTransitionTime: "2024-04-25T20:32:06Z"
        message: ""
        observedGeneration: 1
        reason: OK
        status: "False"
        type: ChangeDetected
      lastUpdated: "2024-04-25T20:32:06Z"

    1
    To set a BIOS version, set the name attribute to bios.
    2
    To set a BIOS version, set the url attribute to the URL for the firmware version of the BIOS.
    3
    To set a BMC version, set the name attribute to bmc.
    4
    To set a BMC version, set the url attribute to the URL for the firmware verison of the BMC.
  3. Save the changes and exit the editor.
  4. Get the host’s machine name:

    $ oc get bmh <host_name> -n openshift-machine name 1
    1
    Where <host_name> is the name of the host. The machine name appears under the CONSUMER field.
  5. Annotate the machine to delete it from the machine set:

    $ oc annotate machine <machine_name> machine.openshift.io/delete-machine=true -n openshift-machine-api 1
    1
    Where <machine_name> is the name of the machine to delete.
  6. Get a list of nodes and count the number of worker nodes:

    $ oc get nodes
  7. Get the machine set:

    $ oc get machinesets -n openshift-machine-api
  8. Scale the machine set:

    $ oc scale machineset <machineset_name> -n openshift-machine-api --replicas=<n-1> 1
    1
    Where <machineset_name> is the name of the machine set and <n-1> is the decremented number of worker nodes.
  9. When the host enters the Available state, scale up the machine set to make the HostFirmwareComponents resource changes take effect:

    $ oc scale machineset <machineset_name> -n openshift-machine-api --replicas=<n> 1
    1
    Where <machineset_name> is the name of the machine set and <n> is the number of worker nodes.

Chapter 4. Configuring multi-architecture compute machines on an OpenShift cluster

4.1. About clusters with multi-architecture compute machines

An OpenShift Container Platform cluster with multi-architecture compute machines is a cluster that supports compute machines with different architectures.

Note

When there are nodes with multiple architectures in your cluster, the architecture of your image must be consistent with the architecture of the node. You need to ensure that the pod is assigned to the node with the appropriate architecture and that it matches the image architecture. For more information on assigning pods to nodes, see Assigning pods to nodes.

Important

The Cluster Samples Operator is not supported on clusters with multi-architecture compute machines. Your cluster can be created without this capability. For more information, see Cluster capabilities.

For information on migrating your single-architecture cluster to a cluster that supports multi-architecture compute machines, see Migrating to a cluster with multi-architecture compute machines.

4.1.1. Configuring your cluster with multi-architecture compute machines

To create a cluster with multi-architecture compute machines with different installation options and platforms, you can use the documentation in the following table:

Table 4.1. Cluster with multi-architecture compute machine installation options
Documentation sectionPlatformUser-provisioned installationInstaller-provisioned installationControl PlaneCompute node

Creating a cluster with multi-architecture compute machines on Azure

Microsoft Azure

 

aarch64 or x86_64

aarch64, x86_64

Creating a cluster with multi-architecture compute machines on AWS

Amazon Web Services (AWS)

 

aarch64 or x86_64

aarch64, x86_64

Creating a cluster with multi-architecture compute machines on GCP

Google Cloud Platform (GCP)

 

aarch64 or x86_64

aarch64, x86_64

Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z

Bare metal

 

aarch64 or x86_64

aarch64, x86_64

IBM Power

 

x86_64 or ppc64le

x86_64, ppc64le

IBM Z

 

x86_64 or s390x

x86_64, s390x

Creating a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE with z/VM

IBM Z® and IBM® LinuxONE

 

x86_64

x86_64, s390x

Creating a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE with RHEL KVM

IBM Z® and IBM® LinuxONE

 

x86_64

x86_64, s390x

Creating a cluster with multi-architecture compute machines on IBM Power®

IBM Power®

 

x86_64

x86_64, ppc64le

Important

Autoscaling from zero is currently not supported on Google Cloud Platform (GCP).

4.2. Creating a cluster with multi-architecture compute machine on Azure

To deploy an Azure cluster with multi-architecture compute machines, you must first create a single-architecture Azure installer-provisioned cluster that uses the multi-architecture installer binary. For more information on Azure installations, see Installing a cluster on Azure with customizations.

You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.

After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.

4.2.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.2.2. Creating a 64-bit ARM boot image using the Azure image gallery

The following procedure describes how to manually generate a 64-bit ARM boot image.

Prerequisites

  • You installed the Azure CLI (az).
  • You created a single-architecture Azure installer-provisioned cluster with the multi-architecture installer binary.

Procedure

  1. Log in to your Azure account:

    $ az login
  2. Create a storage account and upload the aarch64 virtual hard disk (VHD) to your storage account. The OpenShift Container Platform installation program creates a resource group, however, the boot image can also be uploaded to a custom named resource group:

    $ az storage account create -n ${STORAGE_ACCOUNT_NAME} -g ${RESOURCE_GROUP} -l westus --sku Standard_LRS 1
    1
    The westus object is an example region.
  3. Create a storage container using the storage account you generated:

    $ az storage container create -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME}
  4. You must use the OpenShift Container Platform installation program JSON file to extract the URL and aarch64 VHD name:

    1. Extract the URL field and set it to RHCOS_VHD_ORIGIN_URL as the file name by running the following command:

      $ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".url')
    2. Extract the aarch64 VHD name and set it to BLOB_NAME as the file name by running the following command:

      $ BLOB_NAME=rhcos-$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".release')-azure.aarch64.vhd
  5. Generate a shared access signature (SAS) token. Use this token to upload the RHCOS VHD to your storage container with the following commands:

    $ end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
    $ sas=`az storage container generate-sas -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME} --https-only --permissions dlrw --expiry $end -o tsv`
  6. Copy the RHCOS VHD into the storage container:

    $ az storage blob copy start --account-name ${STORAGE_ACCOUNT_NAME} --sas-token "$sas" \
     --source-uri "${RHCOS_VHD_ORIGIN_URL}" \
     --destination-blob "${BLOB_NAME}" --destination-container ${CONTAINER_NAME}

    You can check the status of the copying process with the following command:

    $ az storage blob show -c ${CONTAINER_NAME} -n ${BLOB_NAME} --account-name ${STORAGE_ACCOUNT_NAME} | jq .properties.copy

    Example output

    {
     "completionTime": null,
     "destinationSnapshot": null,
     "id": "1fd97630-03ca-489a-8c4e-cfe839c9627d",
     "incrementalCopy": null,
     "progress": "17179869696/17179869696",
     "source": "https://rhcos.blob.core.windows.net/imagebucket/rhcos-411.86.202207130959-0-azure.aarch64.vhd",
     "status": "success", 1
     "statusDescription": null
    }

    1
    If the status parameter displays the success object, the copying process is complete.
  7. Create an image gallery using the following command:

    $ az sig create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME}

    Use the image gallery to create an image definition. In the following example command, rhcos-arm64 is the name of the image definition.

    $ az sig image-definition create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --publisher RedHat --offer arm --sku arm64 --os-type linux --architecture Arm64 --hyper-v-generation V2
  8. To get the URL of the VHD and set it to RHCOS_VHD_URL as the file name, run the following command:

    $ RHCOS_VHD_URL=$(az storage blob url --account-name ${STORAGE_ACCOUNT_NAME} -c ${CONTAINER_NAME} -n "${BLOB_NAME}" -o tsv)
  9. Use the RHCOS_VHD_URL file, your storage account, resource group, and image gallery to create an image version. In the following example, 1.0.0 is the image version.

    $ az sig image-version create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --gallery-image-version 1.0.0 --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} --os-vhd-uri ${RHCOS_VHD_URL}
  10. Your arm64 boot image is now generated. You can access the ID of your image with the following command:

    $ az sig image-version show -r $GALLERY_NAME -g $RESOURCE_GROUP -i rhcos-arm64 -e 1.0.0

    The following example image ID is used in the recourseID parameter of the compute machine set:

    Example resourceID

    /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-arm64/versions/1.0.0

4.2.3. Creating a 64-bit x86 boot image using the Azure image gallery

The following procedure describes how to manually generate a 64-bit x86 boot image.

Prerequisites

  • You installed the Azure CLI (az).
  • You created a single-architecture Azure installer-provisioned cluster with the multi-architecture installer binary.

Procedure

  1. Log in to your Azure account by running the following command:

    $ az login
  2. Create a storage account and upload the x86_64 virtual hard disk (VHD) to your storage account by running the following command. The OpenShift Container Platform installation program creates a resource group. However, the boot image can also be uploaded to a custom named resource group:

    $ az storage account create -n ${STORAGE_ACCOUNT_NAME} -g ${RESOURCE_GROUP} -l westus --sku Standard_LRS 1
    1
    The westus object is an example region.
  3. Create a storage container using the storage account you generated by running the following command:

    $ az storage container create -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME}
  4. Use the OpenShift Container Platform installation program JSON file to extract the URL and x86_64 VHD name:

    1. Extract the URL field and set it to RHCOS_VHD_ORIGIN_URL as the file name by running the following command:

      $ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64."rhel-coreos-extensions"."azure-disk".url')
    2. Extract the x86_64 VHD name and set it to BLOB_NAME as the file name by running the following command:

      $ BLOB_NAME=rhcos-$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64."rhel-coreos-extensions"."azure-disk".release')-azure.x86_64.vhd
  5. Generate a shared access signature (SAS) token. Use this token to upload the RHCOS VHD to your storage container by running the following commands:

    $ end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
    $ sas=`az storage container generate-sas -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME} --https-only --permissions dlrw --expiry $end -o tsv`
  6. Copy the RHCOS VHD into the storage container by running the following command:

    $ az storage blob copy start --account-name ${STORAGE_ACCOUNT_NAME} --sas-token "$sas" \
     --source-uri "${RHCOS_VHD_ORIGIN_URL}" \
     --destination-blob "${BLOB_NAME}" --destination-container ${CONTAINER_NAME}

    You can check the status of the copying process by running the following command:

    $ az storage blob show -c ${CONTAINER_NAME} -n ${BLOB_NAME} --account-name ${STORAGE_ACCOUNT_NAME} | jq .properties.copy

    Example output

    {
     "completionTime": null,
     "destinationSnapshot": null,
     "id": "1fd97630-03ca-489a-8c4e-cfe839c9627d",
     "incrementalCopy": null,
     "progress": "17179869696/17179869696",
     "source": "https://rhcos.blob.core.windows.net/imagebucket/rhcos-411.86.202207130959-0-azure.aarch64.vhd",
     "status": "success", 1
     "statusDescription": null
    }

    1
    If the status parameter displays the success object, the copying process is complete.
  7. Create an image gallery by running the following command:

    $ az sig create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME}
  8. Use the image gallery to create an image definition by running the following command:

    $ az sig image-definition create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-x86_64 --publisher RedHat --offer x86_64 --sku x86_64 --os-type linux --architecture x64 --hyper-v-generation V2

    In this example command, rhcos-x86_64 is the name of the image definition.

  9. To get the URL of the VHD and set it to RHCOS_VHD_URL as the file name, run the following command:

    $ RHCOS_VHD_URL=$(az storage blob url --account-name ${STORAGE_ACCOUNT_NAME} -c ${CONTAINER_NAME} -n "${BLOB_NAME}" -o tsv)
  10. Use the RHCOS_VHD_URL file, your storage account, resource group, and image gallery to create an image version by running the following command:

    $ az sig image-version create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --gallery-image-version 1.0.0 --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} --os-vhd-uri ${RHCOS_VHD_URL}

    In this example, 1.0.0 is the image version.

  11. Optional: Access the ID of the generated x86_64 boot image by running the following command:

    $ az sig image-version show -r $GALLERY_NAME -g $RESOURCE_GROUP -i rhcos-x86_64 -e 1.0.0

    The following example image ID is used in the recourseID parameter of the compute machine set:

    Example resourceID

    /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-x86_64/versions/1.0.0

4.2.4. Adding a multi-architecture compute machine set to your Azure cluster

After creating a multi-architecture cluster, you can add nodes with different architectures.

You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:

  • Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
  • Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.

To create a custom compute machine set on Azure, see "Creating a compute machine set on Azure".

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You created a 64-bit ARM or 64-bit x86 boot image.
  • You used the installation program to create a 64-bit ARM or 64-bit x86 single-architecture Azure cluster with the multi-architecture installer binary.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.

    Example MachineSet object for an Azure 64-bit ARM or 64-bit x86 compute node

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
      name: <infrastructure_id>-machine-set-0
      namespace: openshift-machine-api
    spec:
      replicas: 2
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id>
          machine.openshift.io/cluster-api-machineset: <infrastructure_id>-machine-set-0
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machine-role: worker
            machine.openshift.io/cluster-api-machine-type: worker
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-machine-set-0
        spec:
          lifecycleHooks: {}
          metadata: {}
          providerSpec:
            value:
              acceleratedNetworking: true
              apiVersion: machine.openshift.io/v1beta1
              credentialsSecret:
                name: azure-cloud-credentials
                namespace: openshift-machine-api
              image:
                offer: ""
                publisher: ""
                resourceID: /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-arm64/versions/1.0.0 1
                sku: ""
                version: ""
              kind: AzureMachineProviderSpec
              location: <region>
              managedIdentity: <infrastructure_id>-identity
              networkResourceGroup: <infrastructure_id>-rg
              osDisk:
                diskSettings: {}
                diskSizeGB: 128
                managedDisk:
                  storageAccountType: Premium_LRS
                osType: Linux
              publicIP: false
              publicLoadBalancer: <infrastructure_id>
              resourceGroup: <infrastructure_id>-rg
              subnet: <infrastructure_id>-worker-subnet
              userDataSecret:
                name: worker-user-data
              vmSize: Standard_D4ps_v5 2
              vnet: <infrastructure_id>-vnet
              zone: "<zone>"

    1
    Set the resourceID parameter to either arm64 or amd64 boot image.
    2
    Set the vmSize parameter to the instance type used in your installation. Some example instance types are Standard_D4ps_v5 or D8ps.
  3. Create the compute machine set by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: arm64-machine-set-0.yaml, or amd64-machine-set-0.yaml.

Verification

  1. Verify that the new machines are running by running the following command:

    $ oc get machineset -n openshift-machine-api

    The output must include the machine set that you created.

    Example output

    NAME                                                DESIRED  CURRENT  READY  AVAILABLE  AGE
    <infrastructure_id>-machine-set-0                   2        2      2          2  10m

  2. You can check if the nodes are ready and schedulable by running the following command:

    $ oc get nodes

4.3. Creating a cluster with multi-architecture compute machines on AWS

To create an AWS cluster with multi-architecture compute machines, you must first create a single-architecture AWS installer-provisioned cluster with the multi-architecture installer binary. For more information on AWS installations, see Installing a cluster on AWS with customizations.

You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.

After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.

4.3.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.3.2. Adding a multi-architecture compute machine set to your AWS cluster

After creating a multi-architecture cluster, you can add nodes with different architectures.

You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:

  • Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
  • Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You used the installation program to create an 64-bit ARM or 64-bit x86 single-architecture AWS cluster with the multi-architecture installer binary.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.

    Example MachineSet object for an AWS 64-bit ARM or x86 compute node

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
      name: <infrastructure_id>-aws-machine-set-0 2
      namespace: openshift-machine-api
    spec:
      replicas: 1
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id> 3
          machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>-<zone> 4
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machine-role: <role> 5
            machine.openshift.io/cluster-api-machine-type: <role> 6
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>-<zone> 7
        spec:
          metadata:
            labels:
              node-role.kubernetes.io/<role>: ""
          providerSpec:
            value:
              ami:
                id: ami-02a574449d4f4d280 8
              apiVersion: awsproviderconfig.openshift.io/v1beta1
              blockDevices:
                - ebs:
                    iops: 0
                    volumeSize: 120
                    volumeType: gp2
              credentialsSecret:
                name: aws-cloud-credentials
              deviceIndex: 0
              iamInstanceProfile:
                id: <infrastructure_id>-worker-profile 9
              instanceType: m6g.xlarge 10
              kind: AWSMachineProviderConfig
              placement:
                availabilityZone: us-east-1a 11
                region: <region> 12
              securityGroups:
                - filters:
                    - name: tag:Name
                      values:
                        - <infrastructure_id>-worker-sg 13
              subnet:
                filters:
                  - name: tag:Name
                    values:
                      - <infrastructure_id>-private-<zone>
              tags:
                - name: kubernetes.io/cluster/<infrastructure_id> 14
                  value: owned
                - name: <custom_tag_name>
                  value: <custom_tag_value>
              userDataSecret:
                name: worker-user-data

    1 2 3 9 13 14
    Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI (oc) installed, you can obtain the infrastructure ID by running the following command:
    $ oc get -o jsonpath=‘{.status.infrastructureName}{“\n”}’ infrastructure cluster
    4 7
    Specify the infrastructure ID, role node label, and zone.
    5 6
    Specify the role node label to add.
    8
    Specify a Red Hat Enterprise Linux CoreOS (RHCOS) Amazon Machine Image (AMI) for your AWS zone for the nodes. The RHCOS AMI must be compatible with the machine architecture.
    $ oc get configmap/coreos-bootimages \
    	  -n openshift-machine-config-operator \
    	  -o jsonpath='{.data.stream}' | jq \
    	  -r '.architectures.<arch>.images.aws.regions."<region>".image'
    10
    Specify a machine type that aligns with the CPU architecture of the chosen AMI. For more information, see "Tested instance types for AWS 64-bit ARM"
    11
    Specify the zone. For example, us-east-1a. Ensure that the zone you select has machines with the required architecture.
    12
    Specify the region. For example, us-east-1. Ensure that the zone you select has machines with the required architecture.
  3. Create the compute machine set by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: aws-arm64-machine-set-0.yaml, or aws-amd64-machine-set-0.yaml.

Verification

  1. View the list of compute machine sets by running the following command:

    $ oc get machineset -n openshift-machine-api

    The output must include the machine set that you created.

    Example output

    NAME                                                DESIRED  CURRENT  READY  AVAILABLE  AGE
    <infrastructure_id>-aws-machine-set-0                   2        2      2          2  10m

  2. You can check if the nodes are ready and schedulable by running the following command:

    $ oc get nodes

4.4. Creating a cluster with multi-architecture compute machines on GCP

To create a Google Cloud Platform (GCP) cluster with multi-architecture compute machines, you must first create a single-architecture GCP installer-provisioned cluster with the multi-architecture installer binary. For more information on AWS installations, see Installing a cluster on GCP with customizations.

You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.

After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.

Note

Secure booting is currently not supported on 64-bit ARM machines for GCP

4.4.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.4.2. Adding a multi-architecture compute machine set to your GCP cluster

After creating a multi-architecture cluster, you can add nodes with different architectures.

You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:

  • Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
  • Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You used the installation program to create a 64-bit x86 or 64-bit ARM single-architecture GCP cluster with the multi-architecture installer binary.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.

    Example MachineSet object for a GCP 64-bit ARM or 64-bit x86 compute node

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
      name: <infrastructure_id>-w-a
      namespace: openshift-machine-api
    spec:
      replicas: 1
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <infrastructure_id>
          machine.openshift.io/cluster-api-machineset: <infrastructure_id>-w-a
      template:
        metadata:
          creationTimestamp: null
          labels:
            machine.openshift.io/cluster-api-cluster: <infrastructure_id>
            machine.openshift.io/cluster-api-machine-role: <role> 2
            machine.openshift.io/cluster-api-machine-type: <role>
            machine.openshift.io/cluster-api-machineset: <infrastructure_id>-w-a
        spec:
          metadata:
            labels:
              node-role.kubernetes.io/<role>: ""
          providerSpec:
            value:
              apiVersion: gcpprovider.openshift.io/v1beta1
              canIPForward: false
              credentialsSecret:
                name: gcp-cloud-credentials
              deletionProtection: false
              disks:
              - autoDelete: true
                boot: true
                image: <path_to_image> 3
                labels: null
                sizeGb: 128
                type: pd-ssd
              gcpMetadata: 4
              - key: <custom_metadata_key>
                value: <custom_metadata_value>
              kind: GCPMachineProviderSpec
              machineType: n1-standard-4 5
              metadata:
                creationTimestamp: null
              networkInterfaces:
              - network: <infrastructure_id>-network
                subnetwork: <infrastructure_id>-worker-subnet
              projectID: <project_name> 6
              region: us-central1 7
              serviceAccounts:
              - email: <infrastructure_id>-w@<project_name>.iam.gserviceaccount.com
                scopes:
                - https://www.googleapis.com/auth/cloud-platform
              tags:
                - <infrastructure_id>-worker
              userDataSecret:
                name: worker-user-data
              zone: us-central1-a

    1
    Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. You can obtain the infrastructure ID by running the following command:
    $ oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
    2
    Specify the role node label to add.
    3
    Specify the path to the image that is used in current compute machine sets. You need the project and image name for your path to image.

    To access the project and image name, run the following command:

    $ oc get configmap/coreos-bootimages \
      -n openshift-machine-config-operator \
      -o jsonpath='{.data.stream}' | jq \
      -r '.architectures.aarch64.images.gcp'

    Example output

      "gcp": {
        "release": "415.92.202309142014-0",
        "project": "rhcos-cloud",
        "name": "rhcos-415-92-202309142014-0-gcp-aarch64"
      }

    Use the project and name parameters from the output to create the path to image field in your machine set. The path to the image should follow the following format:

    $ projects/<project>/global/images/<image_name>
    4
    Optional: Specify custom metadata in the form of a key:value pair. For example use cases, see the GCP documentation for setting custom metadata.
    5
    Specify a machine type that aligns with the CPU architecture of the chosen OS image. For more information, see "Tested instance types for GCP on 64-bit ARM infrastructures".
    6
    Specify the name of the GCP project that you use for your cluster.
    7
    Specify the region. For example, us-central1. Ensure that the zone you select has machines with the required architecture.
  3. Create the compute machine set by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: gcp-arm64-machine-set-0.yaml, or gcp-amd64-machine-set-0.yaml.

Verification

  1. View the list of compute machine sets by running the following command:

    $ oc get machineset -n openshift-machine-api

    The output must include the machine set that you created.

    Example output

    NAME                                                DESIRED  CURRENT  READY  AVAILABLE  AGE
    <infrastructure_id>-gcp-machine-set-0                   2        2      2          2  10m

  2. You can check if the nodes are ready and schedulable by running the following command:

    $ oc get nodes

4.5. Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z

To create a cluster with multi-architecture compute machines on bare metal (x86_64 or aarch64), IBM Power® (ppc64le), or IBM Z® (s390x) you must have an existing single-architecture cluster on one of these platforms. Follow the installations procedures for your platform:

Important

The bare metal installer-provisioned infrastructure and the Bare Metal Operator do not support adding secondary architecture nodes during the initial cluster setup. You can add secondary architecture nodes manually only after the initial cluster setup.

Before you can add additional compute nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using an ISO image or network PXE booting. This allows you to add additional nodes to your cluster and deploy a cluster with multi-architecture compute machines.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

4.5.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.5.2. Creating RHCOS machines using an ISO image

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using an ISO image to create the machines.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • You must have the OpenShift CLI (oc) installed.

Procedure

  1. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  2. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URLs of these files.
  3. You can validate that the ignition files are available on the URLs. The following example gets the Ignition config files for the compute node:

    $ curl -k http://<HTTP_server>/worker.ign
  4. You can access the ISO image for booting your new machine by running to following command:

    RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.<architecture>.artifacts.metal.formats.iso.disk.location')
  5. Use the ISO file to install RHCOS on more compute machines. Use the same method that you used when you created machines before you installed the cluster:

    • Burn the ISO image to a disk and boot it directly.
    • Use ISO redirection with a LOM interface.
  6. Boot the RHCOS ISO image without specifying any options, or interrupting the live boot sequence. Wait for the installer to boot into a shell prompt in the RHCOS live environment.

    Note

    You can interrupt the RHCOS installation boot process to add kernel arguments. However, for this ISO procedure you must use the coreos-installer command as outlined in the following steps, instead of adding kernel arguments.

  7. Run the coreos-installer command and specify the options that meet your installation requirements. At a minimum, you must specify the URL that points to the Ignition config file for the node type, and the device that you are installing to:

    $ sudo coreos-installer install --ignition-url=http://<HTTP_server>/<node_type>.ign <device> --ignition-hash=sha512-<digest> 12
    1
    You must run the coreos-installer command by using sudo, because the core user does not have the required root privileges to perform the installation.
    2
    The --ignition-hash option is required when the Ignition config file is obtained through an HTTP URL to validate the authenticity of the Ignition config file on the cluster node. <digest> is the Ignition config file SHA512 digest obtained in a preceding step.
    Note

    If you want to provide your Ignition config files through an HTTPS server that uses TLS, you can add the internal certificate authority (CA) to the system trust store before running coreos-installer.

    The following example initializes a bootstrap node installation to the /dev/sda device. The Ignition config file for the bootstrap node is obtained from an HTTP web server with the IP address 192.168.1.2:

    $ sudo coreos-installer install --ignition-url=http://192.168.1.2:80/installation_directory/bootstrap.ign /dev/sda --ignition-hash=sha512-a5a2d43879223273c9b60af66b44202a1d1248fc01cf156c46d4a79f552b6bad47bc8cc78ddf0116e80c59d2ea9e32ba53bc807afbca581aa059311def2c3e3b
  8. Monitor the progress of the RHCOS installation on the console of the machine.

    Important

    Ensure that the installation is successful on each node before commencing with the OpenShift Container Platform installation. Observing the installation process can also help to determine the cause of RHCOS installation issues that might arise.

  9. Continue to create more compute machines for your cluster.

4.5.3. Creating RHCOS machines by PXE or iPXE booting

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using PXE or iPXE booting.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • Obtain the URLs of the RHCOS ISO image, compressed metal BIOS, kernel, and initramfs files that you uploaded to your HTTP server during cluster installation.
  • You have access to the PXE booting infrastructure that you used to create the machines for your OpenShift Container Platform cluster during installation. The machines must boot from their local disks after RHCOS is installed on them.
  • If you use UEFI, you have access to the grub.conf file that you modified during OpenShift Container Platform installation.

Procedure

  1. Confirm that your PXE or iPXE installation for the RHCOS images is correct.

    • For PXE:

      DEFAULT pxeboot
      TIMEOUT 20
      PROMPT 0
      LABEL pxeboot
          KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> 1
          APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img 2
      1
      Specify the location of the live kernel file that you uploaded to your HTTP server.
      2
      Specify locations of the RHCOS files that you uploaded to your HTTP server. The initrd parameter value is the location of the live initramfs file, the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file, and the coreos.live.rootfs_url parameter value is the location of the live rootfs file. The coreos.inst.ignition_url and coreos.live.rootfs_url parameters only support HTTP and HTTPS.
      Note

      This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the APPEND line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?.

    • For iPXE (x86_64 + aarch64):

      kernel http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> initrd=main coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
      initrd --name main http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img 3
      boot
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP server. The kernel parameter value is the location of the kernel file, the initrd=main argument is needed for booting on UEFI systems, the coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your HTTP server.
      Note

      This configuration does not enable serial console access on machines with a graphical console To configure a different console, add one or more console= arguments to the kernel line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux? and "Enabling the serial console for PXE and ISO installation" in the "Advanced RHCOS installation configuration" section.

      Note

      To network boot the CoreOS kernel on aarch64 architecture, you need to use a version of iPXE build with the IMAGE_GZIP option enabled. See IMAGE_GZIP option in iPXE.

    • For PXE (with UEFI and GRUB as second stage) on aarch64:

      menuentry 'Install CoreOS' {
          linux rhcos-<version>-live-kernel-<architecture>  coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
          initrd rhcos-<version>-live-initramfs.<architecture>.img 3
      }
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP/TFTP server. The kernel parameter value is the location of the kernel file on your TFTP server. The coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file on your HTTP Server.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your TFTP server.
  2. Use the PXE or iPXE infrastructure to create the required compute machines for your cluster.

4.5.4. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.29.4
    master-1  Ready     master  63m  v1.29.4
    master-2  Ready     master  64m  v1.29.4

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.29.4
    master-1  Ready     master  73m  v1.29.4
    master-2  Ready     master  74m  v1.29.4
    worker-0  Ready     worker  11m  v1.29.4
    worker-1  Ready     worker  11m  v1.29.4

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

4.6. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with z/VM

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with z/VM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using a z/VM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

4.6.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.6.2. Creating RHCOS machines on IBM Z with z/VM

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines running on IBM Z® with z/VM and attach them to your existing cluster.

Prerequisites

  • You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
  • You have an HTTP or HTTPS server running on your provisioning machine that is accessible to the machines you create.

Procedure

  1. Disable UDP aggregation.

    Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.

    1. Create a YAML file udp-aggregation-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      data:
        disable-udp-aggregation: "true"
      metadata:
        name: udp-aggregation-config
        namespace: openshift-network-operator
    2. Create the ConfigMap resource by running the following command:

      $ oc create -f udp-aggregation-config.yaml
  2. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  3. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
  4. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:

    $ curl -k http://<http_server>/worker.ign
  5. Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
  6. Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server that is accessible from the z/VM guest you want to add.
  7. Create a parameter file for the z/VM guest. The following parameters are specific for the virtual machine:

    • Optional: To specify a static IP address, add an ip= parameter with the following entries, with each separated by a colon:

      1. The IP address for the machine.
      2. An empty string.
      3. The gateway.
      4. The netmask.
      5. The machine host and domain name in the form hostname.domainname. Omit this value to let RHCOS decide.
      6. The network interface name. Omit this value to let RHCOS decide.
      7. The value none.
    • For coreos.inst.ignition_url=, specify the URL to the worker.ign file. Only HTTP and HTTPS protocols are supported.
    • For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
    • For installations on DASD-type disks, complete the following tasks:

      1. For coreos.inst.install_dev=, specify /dev/dasda.
      2. Use rd.dasd= to specify the DASD where RHCOS is to be installed.
      3. You can adjust further parameters if required.

        The following is an example parameter file, additional-worker-dasd.parm:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/dasda \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        rd.dasd=0.0.3490 \
        zfcp.allow_lun_scan=0

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

    • For installations on FCP-type disks, complete the following tasks:

      1. Use rd.zfcp=<adapter>,<wwpn>,<lun> to specify the FCP disk where RHCOS is to be installed. For multipathing, repeat this step for each additional path.

        Note

        When you install with multiple paths, you must enable multipathing directly after the installation, not at a later point in time, as this can cause problems.

      2. Set the install device as: coreos.inst.install_dev=/dev/sda.

        Note

        If additional LUNs are configured with NPIV, FCP requires zfcp.allow_lun_scan=0. If you must enable zfcp.allow_lun_scan=1 because you use a CSI driver, for example, you must configure your NPIV so that each node cannot access the boot partition of another node.

      3. You can adjust further parameters if required.

        Important

        Additional postinstallation steps are required to fully enable multipathing. For more information, see “Enabling multipathing with kernel arguments on RHCOS" in Postinstallation machine configuration tasks.

        The following is an example parameter file, additional-worker-fcp.parm for a worker node with multipathing:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/sda \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        zfcp.allow_lun_scan=0 \
        rd.zfcp=0.0.1987,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.1987,0x50050763071bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763071bc5e3,0x4008400B00000000

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

  8. Transfer the initramfs, kernel, parameter files, and RHCOS images to z/VM, for example, by using FTP. For details about how to transfer the files with FTP and boot from the virtual reader, see Installing under Z/VM.
  9. Punch the files to the virtual reader of the z/VM guest virtual machine.

    See PUNCH in IBM® Documentation.

    Tip

    You can use the CP PUNCH command or, if you use Linux, the vmur command to transfer files between two z/VM guest virtual machines.

  10. Log in to CMS on the bootstrap machine.
  11. IPL the bootstrap machine from the reader by running the following command:

    $ ipl c

    See IPL in IBM® Documentation.

4.6.3. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.29.4
    master-1  Ready     master  63m  v1.29.4
    master-2  Ready     master  64m  v1.29.4

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.29.4
    master-1  Ready     master  73m  v1.29.4
    master-2  Ready     master  74m  v1.29.4
    worker-0  Ready     worker  11m  v1.29.4
    worker-1  Ready     worker  11m  v1.29.4

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

4.7. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE in an LPAR

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) in an LPAR, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using an LPAR instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

Note

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

4.7.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.7.2. Creating RHCOS machines on IBM Z with z/VM

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines running on IBM Z® with z/VM and attach them to your existing cluster.

Prerequisites

  • You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
  • You have an HTTP or HTTPS server running on your provisioning machine that is accessible to the machines you create.

Procedure

  1. Disable UDP aggregation.

    Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.

    1. Create a YAML file udp-aggregation-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      data:
        disable-udp-aggregation: "true"
      metadata:
        name: udp-aggregation-config
        namespace: openshift-network-operator
    2. Create the ConfigMap resource by running the following command:

      $ oc create -f udp-aggregation-config.yaml
  2. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  3. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
  4. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:

    $ curl -k http://<http_server>/worker.ign
  5. Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
  6. Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server that is accessible from the z/VM guest you want to add.
  7. Create a parameter file for the z/VM guest. The following parameters are specific for the virtual machine:

    • Optional: To specify a static IP address, add an ip= parameter with the following entries, with each separated by a colon:

      1. The IP address for the machine.
      2. An empty string.
      3. The gateway.
      4. The netmask.
      5. The machine host and domain name in the form hostname.domainname. Omit this value to let RHCOS decide.
      6. The network interface name. Omit this value to let RHCOS decide.
      7. The value none.
    • For coreos.inst.ignition_url=, specify the URL to the worker.ign file. Only HTTP and HTTPS protocols are supported.
    • For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
    • For installations on DASD-type disks, complete the following tasks:

      1. For coreos.inst.install_dev=, specify /dev/dasda.
      2. Use rd.dasd= to specify the DASD where RHCOS is to be installed.
      3. You can adjust further parameters if required.

        The following is an example parameter file, additional-worker-dasd.parm:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/dasda \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        rd.dasd=0.0.3490 \
        zfcp.allow_lun_scan=0

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

    • For installations on FCP-type disks, complete the following tasks:

      1. Use rd.zfcp=<adapter>,<wwpn>,<lun> to specify the FCP disk where RHCOS is to be installed. For multipathing, repeat this step for each additional path.

        Note

        When you install with multiple paths, you must enable multipathing directly after the installation, not at a later point in time, as this can cause problems.

      2. Set the install device as: coreos.inst.install_dev=/dev/sda.

        Note

        If additional LUNs are configured with NPIV, FCP requires zfcp.allow_lun_scan=0. If you must enable zfcp.allow_lun_scan=1 because you use a CSI driver, for example, you must configure your NPIV so that each node cannot access the boot partition of another node.

      3. You can adjust further parameters if required.

        Important

        Additional postinstallation steps are required to fully enable multipathing. For more information, see “Enabling multipathing with kernel arguments on RHCOS" in Postinstallation machine configuration tasks.

        The following is an example parameter file, additional-worker-fcp.parm for a worker node with multipathing:

        cio_ignore=all,!condev rd.neednet=1 \
        console=ttysclp0 \
        coreos.inst.install_dev=/dev/sda \
        coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
        coreos.inst.ignition_url=http://<http_server>/worker.ign \
        ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
        rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
        zfcp.allow_lun_scan=0 \
        rd.zfcp=0.0.1987,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763070bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.1987,0x50050763071bc5e3,0x4008400B00000000 \
        rd.zfcp=0.0.19C7,0x50050763071bc5e3,0x4008400B00000000

        Write all options in the parameter file as a single line and make sure that you have no newline characters.

  8. Transfer the initramfs, kernel, parameter files, and RHCOS images to z/VM, for example, by using FTP. For details about how to transfer the files with FTP and boot from the virtual reader, see Installing under Z/VM.
  9. Punch the files to the virtual reader of the z/VM guest virtual machine.

    See PUNCH in IBM® Documentation.

    Tip

    You can use the CP PUNCH command or, if you use Linux, the vmur command to transfer files between two z/VM guest virtual machines.

  10. Log in to CMS on the bootstrap machine.
  11. IPL the bootstrap machine from the reader by running the following command:

    $ ipl c

    See IPL in IBM® Documentation.

4.7.3. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.29.4
    master-1  Ready     master  63m  v1.29.4
    master-2  Ready     master  64m  v1.29.4

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.29.4
    master-1  Ready     master  73m  v1.29.4
    master-2  Ready     master  74m  v1.29.4
    worker-0  Ready     worker  11m  v1.29.4
    worker-1  Ready     worker  11m  v1.29.4

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

4.8. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with RHEL KVM

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with RHEL KVM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using a RHEL KVM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

4.8.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.8.2. Creating RHCOS machines using virt-install

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using virt-install.

Prerequisites

  • You have at least one LPAR running on RHEL 8.7 or later with KVM, referred to as RHEL KVM host in this procedure.
  • The KVM/QEMU hypervisor is installed on the RHEL KVM host.
  • You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
  • An HTTP or HTTPS server is set up.

Procedure

  1. Disable UDP aggregation.

    Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.

    1. Create a YAML file udp-aggregation-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      data:
        disable-udp-aggregation: "true"
      metadata:
        name: udp-aggregation-config
        namespace: openshift-network-operator
    2. Create the ConfigMap resource by running the following command:

      $ oc create -f udp-aggregation-config.yaml
  2. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  3. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
  4. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:

    $ curl -k http://<HTTP_server>/worker.ign
  5. Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

     $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
    $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
  6. Move the downloaded RHEL live kernel, initramfs and rootfs files to an HTTP or HTTPS server before you launch virt-install.
  7. Create the new KVM guest nodes using the RHEL kernel, initramfs, and Ignition files; the new disk image; and adjusted parm line arguments.

    $ virt-install \
       --connect qemu:///system \
       --name <vm_name> \
       --autostart \
       --os-variant rhel9.4 \ 1
       --cpu host \
       --vcpus <vcpus> \
       --memory <memory_mb> \
       --disk <vm_name>.qcow2,size=<image_size> \
       --network network=<virt_network_parm> \
       --location <media_location>,kernel=<rhcos_kernel>,initrd=<rhcos_initrd> \ 2
       --extra-args "rd.neednet=1" \
       --extra-args "coreos.inst.install_dev=/dev/vda" \
       --extra-args "coreos.inst.ignition_url=http://<http_server>/worker.ign " \ 3
       --extra-args "coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img" \ 4
       --extra-args "ip=<ip>::<gateway>:<netmask>:<hostname>::none" \ 5
       --extra-args "nameserver=<dns>" \
       --extra-args "console=ttysclp0" \
       --noautoconsole \
       --wait
    1
    For os-variant, specify the RHEL version for the RHCOS compute machine. rhel9.4 is the recommended version. To query the supported RHEL version of your operating system, run the following command:
    $ osinfo-query os -f short-id
    Note

    The os-variant is case sensitive.

    2
    For --location, specify the location of the kernel/initrd on the HTTP or HTTPS server.
    3
    Specify the location of the worker.ign config file. Only HTTP and HTTPS protocols are supported.
    4
    Specify the location of the rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported
    5
    Optional: For hostname, specify the fully qualified hostname of the client machine.
    Note

    If you are using HAProxy as a load balancer, update your HAProxy rules for ingress-router-443 and ingress-router-80 in the /etc/haproxy/haproxy.cfg configuration file.

  8. Continue to create more compute machines for your cluster.

4.8.3. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.29.4
    master-1  Ready     master  63m  v1.29.4
    master-2  Ready     master  64m  v1.29.4

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  73m  v1.29.4
    master-1  Ready     master  73m  v1.29.4
    master-2  Ready     master  74m  v1.29.4
    worker-0  Ready     worker  11m  v1.29.4
    worker-1  Ready     worker  11m  v1.29.4

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

4.9. Creating a cluster with multi-architecture compute machines on IBM Power

To create a cluster with multi-architecture compute machines on IBM Power® (ppc64le), you must have an existing single-architecture (x86_64) cluster. You can then add ppc64le compute machines to your OpenShift Container Platform cluster.

Important

Before you can add ppc64le nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using an ISO image or network PXE booting. This will allow you to add ppc64le nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Power® (ppc64le) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Power®. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Note

Before adding a secondary architecture node to your cluster, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

4.9.1. Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

  • You installed the OpenShift CLI (oc).
Note

When using multiple architectures, hosts for OpenShift Container Platform nodes must share the same storage layer. If they do not have the same storage layer, use a storage provider such as nfs-provisioner.

Note

You should limit the number of network hops between the compute and control plane as much as possible.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. You can check that your cluster uses the architecture payload by running the following command:

    $ oc adm release info -o jsonpath="{ .metadata.metadata}"

Verification

  • If you see the following output, your cluster is using the multi-architecture payload:

    {
     "release.openshift.io/architecture": "multi",
     "url": "https://access.redhat.com/errata/<errata_version>"
    }

    You can then begin adding multi-arch compute nodes to your cluster.

  • If you see the following output, your cluster is not using the multi-architecture payload:

    {
     "url": "https://access.redhat.com/errata/<errata_version>"
    }
    Important

    To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

4.9.2. Creating RHCOS machines using an ISO image

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using an ISO image to create the machines.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • You must have the OpenShift CLI (oc) installed.

Procedure

  1. Extract the Ignition config file from the cluster by running the following command:

    $ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
  2. Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URLs of these files.
  3. You can validate that the ignition files are available on the URLs. The following example gets the Ignition config files for the compute node:

    $ curl -k http://<HTTP_server>/worker.ign
  4. You can access the ISO image for booting your new machine by running to following command:

    RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.<architecture>.artifacts.metal.formats.iso.disk.location')
  5. Use the ISO file to install RHCOS on more compute machines. Use the same method that you used when you created machines before you installed the cluster:

    • Burn the ISO image to a disk and boot it directly.
    • Use ISO redirection with a LOM interface.
  6. Boot the RHCOS ISO image without specifying any options, or interrupting the live boot sequence. Wait for the installer to boot into a shell prompt in the RHCOS live environment.

    Note

    You can interrupt the RHCOS installation boot process to add kernel arguments. However, for this ISO procedure you must use the coreos-installer command as outlined in the following steps, instead of adding kernel arguments.

  7. Run the coreos-installer command and specify the options that meet your installation requirements. At a minimum, you must specify the URL that points to the Ignition config file for the node type, and the device that you are installing to:

    $ sudo coreos-installer install --ignition-url=http://<HTTP_server>/<node_type>.ign <device> --ignition-hash=sha512-<digest> 12
    1
    You must run the coreos-installer command by using sudo, because the core user does not have the required root privileges to perform the installation.
    2
    The --ignition-hash option is required when the Ignition config file is obtained through an HTTP URL to validate the authenticity of the Ignition config file on the cluster node. <digest> is the Ignition config file SHA512 digest obtained in a preceding step.
    Note

    If you want to provide your Ignition config files through an HTTPS server that uses TLS, you can add the internal certificate authority (CA) to the system trust store before running coreos-installer.

    The following example initializes a bootstrap node installation to the /dev/sda device. The Ignition config file for the bootstrap node is obtained from an HTTP web server with the IP address 192.168.1.2:

    $ sudo coreos-installer install --ignition-url=http://192.168.1.2:80/installation_directory/bootstrap.ign /dev/sda --ignition-hash=sha512-a5a2d43879223273c9b60af66b44202a1d1248fc01cf156c46d4a79f552b6bad47bc8cc78ddf0116e80c59d2ea9e32ba53bc807afbca581aa059311def2c3e3b
  8. Monitor the progress of the RHCOS installation on the console of the machine.

    Important

    Ensure that the installation is successful on each node before commencing with the OpenShift Container Platform installation. Observing the installation process can also help to determine the cause of RHCOS installation issues that might arise.

  9. Continue to create more compute machines for your cluster.

4.9.3. Creating RHCOS machines by PXE or iPXE booting

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using PXE or iPXE booting.

Prerequisites

  • Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
  • Obtain the URLs of the RHCOS ISO image, compressed metal BIOS, kernel, and initramfs files that you uploaded to your HTTP server during cluster installation.
  • You have access to the PXE booting infrastructure that you used to create the machines for your OpenShift Container Platform cluster during installation. The machines must boot from their local disks after RHCOS is installed on them.
  • If you use UEFI, you have access to the grub.conf file that you modified during OpenShift Container Platform installation.

Procedure

  1. Confirm that your PXE or iPXE installation for the RHCOS images is correct.

    • For PXE:

      DEFAULT pxeboot
      TIMEOUT 20
      PROMPT 0
      LABEL pxeboot
          KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> 1
          APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img 2
      1
      Specify the location of the live kernel file that you uploaded to your HTTP server.
      2
      Specify locations of the RHCOS files that you uploaded to your HTTP server. The initrd parameter value is the location of the live initramfs file, the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file, and the coreos.live.rootfs_url parameter value is the location of the live rootfs file. The coreos.inst.ignition_url and coreos.live.rootfs_url parameters only support HTTP and HTTPS.
      Note

      This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the APPEND line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?.

    • For iPXE (x86_64 + ppc64le):

      kernel http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> initrd=main coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
      initrd --name main http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img 3
      boot
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP server. The kernel parameter value is the location of the kernel file, the initrd=main argument is needed for booting on UEFI systems, the coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your HTTP server.
      Note

      This configuration does not enable serial console access on machines with a graphical console To configure a different console, add one or more console= arguments to the kernel line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux? and "Enabling the serial console for PXE and ISO installation" in the "Advanced RHCOS installation configuration" section.

      Note

      To network boot the CoreOS kernel on ppc64le architecture, you need to use a version of iPXE build with the IMAGE_GZIP option enabled. See IMAGE_GZIP option in iPXE.

    • For PXE (with UEFI and GRUB as second stage) on ppc64le:

      menuentry 'Install CoreOS' {
          linux rhcos-<version>-live-kernel-<architecture>  coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
          initrd rhcos-<version>-live-initramfs.<architecture>.img 3
      }
      1
      Specify the locations of the RHCOS files that you uploaded to your HTTP/TFTP server. The kernel parameter value is the location of the kernel file on your TFTP server. The coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file on your HTTP Server.
      2
      If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
      3
      Specify the location of the initramfs file that you uploaded to your TFTP server.
  2. Use the PXE or iPXE infrastructure to create the required compute machines for your cluster.

4.9.4. Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

  • You added machines to your cluster.

Procedure

  1. Confirm that the cluster recognizes the machines:

    $ oc get nodes

    Example output

    NAME      STATUS    ROLES   AGE  VERSION
    master-0  Ready     master  63m  v1.29.4
    master-1  Ready     master  63m  v1.29.4
    master-2  Ready     master  64m  v1.29.4

    The output lists all of the machines that you created.

    Note

    The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

  2. Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
    ...

    In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

  3. If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

    Note

    Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

    Note

    For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
      Note

      Some Operators might not become available until some CSRs are approved.

  4. Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

    $ oc get csr

    Example output

    NAME        AGE     REQUESTOR                                                                   CONDITION
    csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
    csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
    ...

  5. If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:

    • To approve them individually, run the following command for each valid CSR:

      $ oc adm certificate approve <csr_name> 1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    • To approve all pending CSRs, run the following command:

      $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  6. After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

    $ oc get nodes -o wide

    Example output

    NAME               STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                  CONTAINER-RUNTIME
    worker-0-ppc64le   Ready    worker                 42d   v1.29.5   192.168.200.21   <none>        Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.ppc64le   cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9
    worker-1-ppc64le   Ready    worker                 42d   v1.29.5   192.168.200.20   <none>        Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.ppc64le   cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9
    master-0-x86       Ready    control-plane,master   75d   v1.29.5   10.248.0.38      10.248.0.38   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9
    master-1-x86       Ready    control-plane,master   75d   v1.29.5   10.248.0.39      10.248.0.39   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9
    master-2-x86       Ready    control-plane,master   75d   v1.29.5   10.248.0.40      10.248.0.40   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9
    worker-0-x86       Ready    worker                 75d   v1.29.5   10.248.0.43      10.248.0.43   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9
    worker-1-x86       Ready    worker                 75d   v1.29.5   10.248.0.44      10.248.0.44   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.29.5-3.rhaos4.15.gitb36169e.el9

    Note

    It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

4.10. Managing a cluster with multi-architecture compute machines

4.10.1. Scheduling workloads on clusters with multi-architecture compute machines

Deploying a workload on a cluster with compute nodes of different architectures requires attention and monitoring of your cluster. There might be further actions you need to take in order to successfully place pods in the nodes of your cluster.

You can use the Multiarch Tuning Operator to enable architecture-aware scheduling of workloads on clusters with multi-architecture compute machines. The Multiarch Tuning Operator implements additional scheduler predicates in the pods specifications based on the architectures that the pods can support at creation time. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

For more information on node affinity, scheduling, taints and tolerations, see the following documentation:

4.10.1.1. Sample multi-architecture node workload deployments

Before you schedule workloads on a cluster with compute nodes of different architectures, consider the following use cases:

Using node affinity to schedule workloads on a node

You can allow a workload to be scheduled on only a set of nodes with architectures supported by its images, you can set the spec.affinity.nodeAffinity field in your pod’s template specification.

Example deployment with the nodeAffinity set to certain architectures

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
   # ...
  template:
     # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: 1
                - amd64
                - arm64

1
Specify the supported architectures. Valid values include amd64,arm64, or both values.
Tainting every node for a specific architecture

You can taint a node to avoid workloads that are not compatible with its architecture to be scheduled on that node. In the case where your cluster is using a MachineSet object, you can add parameters to the .spec.template.spec.taints field to avoid workloads being scheduled on nodes with non-supported architectures.

  • Before you can taint a node, you must scale down the MachineSet object or remove available machines. You can scale down the machine set by using one of following commands:

    $ oc scale --replicas=0 machineset <machineset> -n openshift-machine-api

    Or:

    $ oc edit machineset <machineset> -n openshift-machine-api

    For more information on scaling machine sets, see "Modifying a compute machine set".

Example MachineSet with a taint set

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      taints:
      - effect: NoSchedule
        key: multi-arch.openshift.io/arch
        value: arm64

You can also set a taint on a specific node by running the following command:

$ oc adm taint nodes <node-name> multi-arch.openshift.io/arch=arm64:NoSchedule
Creating a default toleration

You can annotate a namespace so all of the workloads get the same default toleration by running the following command:

$ oc annotate namespace my-namespace \
  'scheduler.alpha.kubernetes.io/defaultTolerations'='[{"operator": "Exists", "effect": "NoSchedule", "key": "multi-arch.openshift.io/arch"}]'
Tolerating architecture taints in workloads

On a node with a defined taint, workloads will not be scheduled on that node. However, you can allow them to be scheduled by setting a toleration in the pod’s specification.

Example deployment with a toleration

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      tolerations:
      - key: "multi-arch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"

This example deployment can also be allowed on nodes with the multi-arch.openshift.io/arch=arm64 taint specified.

Using node affinity with taints and tolerations

When a scheduler computes the set of nodes to schedule a pod, tolerations can broaden the set while node affinity restricts the set. If you set a taint to the nodes of a specific architecture, the following example toleration is required for scheduling pods.

Example deployment with a node affinity and toleration set.

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      tolerations:
      - key: "multi-arch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"

Additional resources

4.10.2. Enabling 64k pages on the Red Hat Enterprise Linux CoreOS (RHCOS) kernel

You can enable the 64k memory page in the Red Hat Enterprise Linux CoreOS (RHCOS) kernel on the 64-bit ARM compute machines in your cluster. The 64k page size kernel specification can be used for large GPU or high memory workloads. This is done using the Machine Config Operator (MCO) which uses a machine config pool to update the kernel. To enable 64k page sizes, you must dedicate a machine config pool for ARM64 to enable on the kernel.

Important

Using 64k pages is exclusive to 64-bit ARM architecture compute nodes or clusters installed on 64-bit ARM machines. If you configure the 64k pages kernel on a machine config pool using 64-bit x86 machines, the machine config pool and MCO will degrade.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You created a cluster with compute nodes of different architecture on one of the supported platforms.

Procedure

  1. Label the nodes where you want to run the 64k page size kernel:

    $ oc label node <node_name> <label>

    Example command

    $ oc label node worker-arm64-01 node-role.kubernetes.io/worker-64k-pages=

  2. Create a machine config pool that contains the worker role that uses the ARM64 architecture and the worker-64k-pages role:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-64k-pages
    spec:
      machineConfigSelector:
        matchExpressions:
          - key: machineconfiguration.openshift.io/role
            operator: In
            values:
            - worker
            - worker-64k-pages
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-64k-pages: ""
          kubernetes.io/arch: arm64
  3. Create a machine config on your compute node to enable 64k-pages with the 64k-pages parameter.

    $ oc create -f <filename>.yaml

    Example MachineConfig

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker-64k-pages" 1
      name: 99-worker-64kpages
    spec:
      kernelType: 64k-pages 2

    1
    Specify the value of the machineconfiguration.openshift.io/role label in the custom machine config pool. The example MachineConfig uses the worker-64k-pages label to enable 64k pages in the worker-64k-pages pool.
    2
    Specify your desired kernel type. Valid values are 64k-pages and default
    Note

    The 64k-pages type is supported on only 64-bit ARM architecture based compute nodes. The realtime type is supported on only 64-bit x86 architecture based compute nodes.

Verification

  • To view your new worker-64k-pages machine config pool, run the following command:

    $ oc get mcp

    Example output

    NAME     CONFIG                                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-9d55ac9a91127c36314e1efe7d77fbf8                      True      False      False      3              3                   3                     0                      361d
    worker   rendered-worker-e7b61751c4a5b7ff995d64b967c421ff                      True      False      False      7              7                   7                     0                      361d
    worker-64k-pages  rendered-worker-64k-pages-e7b61751c4a5b7ff995d64b967c421ff   True      False      False      2              2                   2                     0                      35m

4.10.3. Importing manifest lists in image streams on your multi-architecture compute machines

On an OpenShift Container Platform 4.16 cluster with multi-architecture compute machines, the image streams in the cluster do not import manifest lists automatically. You must manually change the default importMode option to the PreserveOriginal option in order to import the manifest list.

Prerequisites

  • You installed the OpenShift Container Platform CLI (oc).

Procedure

  • The following example command shows how to patch the ImageStream cli-artifacts so that the cli-artifacts:latest image stream tag is imported as a manifest list.

    $ oc patch is/cli-artifacts -n openshift -p '{"spec":{"tags":[{"name":"latest","importPolicy":{"importMode":"PreserveOriginal"}}]}}'

Verification

  • You can check that the manifest lists imported properly by inspecting the image stream tag. The following command will list the individual architecture manifests for a particular tag.

    $ oc get istag cli-artifacts:latest -n openshift -oyaml

    If the dockerImageManifests object is present, then the manifest list import was successful.

    Example output of the dockerImageManifests object

    dockerImageManifests:
      - architecture: amd64
        digest: sha256:16d4c96c52923a9968fbfa69425ec703aff711f1db822e4e9788bf5d2bee5d77
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux
      - architecture: arm64
        digest: sha256:6ec8ad0d897bcdf727531f7d0b716931728999492709d19d8b09f0d90d57f626
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux
      - architecture: ppc64le
        digest: sha256:65949e3a80349cdc42acd8c5b34cde6ebc3241eae8daaeea458498fedb359a6a
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux
      - architecture: s390x
        digest: sha256:75f4fa21224b5d5d511bea8f92dfa8e1c00231e5c81ab95e83c3013d245d1719
        manifestSize: 1252
        mediaType: application/vnd.docker.distribution.manifest.v2+json
        os: linux

4.11. Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator

The Multiarch Tuning Operator optimizes workload management within multi-architecture clusters and in single-architecture clusters transitioning to multi-architecture environments.

Architecture-aware workload scheduling allows the scheduler to place pods onto nodes that match the architecture of the pod images.

By default, the scheduler does not consider the architecture of a pod’s container images when determining the placement of new pods onto nodes.

To enable architecture-aware workload scheduling, you must create the ClusterPodPlacementConfig object. When you create the ClusterPodPlacementConfig object, the Multiarch Tuning Operator deploys the necessary operands to support architecture-aware workload scheduling.

When a pod is created, the operands perform the following actions:

  1. Add the multiarch.openshift.io/scheduling-gate scheduling gate that prevents the scheduling of the pod.
  2. Compute a scheduling predicate that includes the supported architecture values for the kubernetes.io/arch label.
  3. Integrate the scheduling predicate as a nodeAffinity requirement in the pod specification.
  4. Remove the scheduling gate from the pod.
Important

Note the following operand behaviors:

  • If the nodeSelector field is already configured with the kubernetes.io/arch label for a workload, the operand does not update the nodeAffinity field for that workload.
  • If the nodeSelector field is not configured with the kubernetes.io/arch label for a workload, the operand updates the nodeAffinity field for that workload. However, in that nodeAffinity field, the operand updates only the node selector terms that are not configured with the kubernetes.io/arch label.
  • If the nodeName field is already set, the Multiarch Tuning Operator does not process the pod.

4.11.1. Installing the Multiarch Tuning Operator by using the CLI

You can install the Multiarch Tuning Operator by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.

Procedure

  1. Create a new project named openshift-multiarch-tuning-operator by running the following command:

    $ oc create ns openshift-multiarch-tuning-operator
  2. Create an OperatorGroup object:

    1. Create a YAML file with the configuration for creating an OperatorGroup object.

      Example YAML configuration for creating an OperatorGroup object

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: openshift-multiarch-tuning-operator
        namespace: openshift-multiarch-tuning-operator
      spec: {}

    2. Create the OperatorGroup object by running the following command:

      $ oc create -f <file_name> 1
      1
      Replace <file_name> with the name of the YAML file that contains the OperatorGroup object configuration.
  3. Create a Subscription object:

    1. Create a YAML file with the configuration for creating a Subscription object.

      Example YAML configuration for creating a Subscription object

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-multiarch-tuning-operator
        namespace: openshift-multiarch-tuning-operator
      spec:
        channel: stable
        name: multiarch-tuning-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        installPlanApproval: Automatic
        startingCSV: multiarch-tuning-operator.v1.0.0

    2. Create the Subscription object by running the following command:

      $ oc create -f <file_name> 1
      1
      Replace <file_name> with the name of the YAML file that contains the Subscription object configuration.
Note

For more details about configuring the Subscription object and OperatorGroup object, see "Installing from OperatorHub using the CLI".

Verification

  1. To verify that the Multiarch Tuning Operator is installed, run the following command:

    $ oc get csv -n openshift-multiarch-tuning-operator

    Example output

    NAME                               DISPLAY                     VERSION   REPLACES                              PHASE
    multiarch-tuning-operator.v1.0.0   Multiarch Tuning Operator   1.0.0     multiarch-tuning-operator.v0.9.0      Succeeded

    The installation is successful if the Operator is in Succeeded phase.

  2. Optional: To verify that the OperatorGroup object is created, run the following command:

    $ oc get operatorgroup -n openshift-multiarch-tuning-operator

    Example output

    NAME                                        AGE
    openshift-multiarch-tuning-operator-q8zbb   133m

  3. Optional: To verify that the Subscription object is created, run the following command:

    $ oc get subscription -n openshift-multiarch-tuning-operator

    Example output

    NAME                        PACKAGE                     SOURCE                  CHANNEL
    multiarch-tuning-operator   multiarch-tuning-operator   redhat-operators        stable

4.11.2. Installing the Multiarch Tuning Operator by using the web console

You can install the Multiarch Tuning Operator by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → OperatorHub.
  3. Enter Multiarch Tuning Operator in the search field.
  4. Click Multiarch Tuning Operator.
  5. Select the Multiarch Tuning Operator version from the Version list.
  6. Click Install
  7. Set the following options on the Operator Installation page:

    1. Set Update Channel to stable.
    2. Set Installation Mode to All namespaces on the cluster.
    3. Set Installed Namespace to Operator recommended Namespace or Select a Namespace.

      The recommended Operator namespace is openshift-multiarch-tuning-operator. If the openshift-multiarch-tuning-operator namespace does not exist, it is created during the operator installation.

      If you select Select a namespace, you must select a namespace for the Operator from the Select Project list.

    4. Update approval as Automatic or Manual.

      If you select Automatic updates, Operator Lifecycle Manager (OLM) automatically updates the running instance of the Multiarch Tuning Operator without any intervention.

      If you select Manual updates, OLM creates an update request. As a cluster administrator, you must manually approve the update request to update the Multiarch Tuning Operator to a newer version.

  8. Optional: Select the Enable Operator recommended cluster monitoring on this Namespace checkbox.
  9. Click Install.

Verification

  1. Navigate to OperatorsInstalled Operators.
  2. Verify that the Multiarch Tuning Operator is listed with the Status field as Succeeded in the openshift-multiarch-tuning-operator namespace.

4.11.3. Multiarch Tuning Operator pod labels and architecture support overview

After installing the Multiarch Tuning Operator, you can verify the multi-architecture support for workloads in your cluster. You can identify and manage pods based on their architecture compatibility by using the pod labels. These labels are automatically set on the newly created pods to provide insights into their architecture support.

The following table describes the labels that the Multiarch Tuning Operator adds when you create a pod:

Table 4.2. Pod labels that the Multiarch Tuning Operator adds when you create a pod
LabelDescription

multiarch.openshift.io/multi-arch: ""

The pod supports multiple architectures.

multiarch.openshift.io/single-arch: ""

The pod supports only a single architecture.

multiarch.openshift.io/arm64: ""

The pod supports the arm64 architecture.

multiarch.openshift.io/amd64: ""

The pod supports the amd64 architecture.

multiarch.openshift.io/ppc64le: ""

The pod supports the ppc64le architecture.

multiarch.openshift.io/s390x: ""

The pod supports the s390x architecture.

multirach.openshift.io/node-affinity: set

The Operator has set the node affinity requirement for the architecture.

multirach.openshift.io/node-affinity: not-set

The Operator did not set the node affinity requirement. For example, when the pod already has a node affinity for the architecture, the Multiarch Tuning Operator adds this label to the pod.

multiarch.openshift.io/scheduling-gate: gated

The pod is gated.

multiarch.openshift.io/scheduling-gate: removed

The pod gate has been removed.

multiarch.openshift.io/inspection-error: ""

An error has occurred while building the node affinity requirements.

4.11.4. Creating the ClusterPodPlacementConfig object

After installing the Multiarch Tuning Operator, you must create the ClusterPodPlacementConfig object. When you create this object, the Multiarch Tuning Operator deploys an operand that enables architecture-aware workload scheduling.

Note

You can create only one instance of the ClusterPodPlacementConfig object.

Example ClusterPodPlacementConfig object configuration

apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
  name: cluster 1
spec:
  logVerbosityLevel: Normal 2
  namespaceSelector: 3
    matchExpressions:
      - key: multiarch.openshift.io/exclude-pod-placement
        operator: DoesNotExist

1
You must set this field value to cluster.
2
Optional: You can set the field value to Normal, Debug, Trace, or TraceAll. The value is set to Normal by default.
3
Optional: You can configure the namespaceSelector to select the namespaces in which the Multiarch Tuning Operator’s pod placement operand must process the nodeAffinity of the pods. All namespaces are considered by default.

In this example, the operator field value is set to DoesNotExist. Therefore, if the key field value (multiarch.openshift.io/exclude-pod-placement) is set as a label in a namespace, the operand does not process the nodeAffinity of the pods in that namespace. Instead, the operand processes the nodeAffinity of the pods in namespaces that do not contain the label.

If you want the operand to process the nodeAffinity of the pods only in specific namespaces, you can configure the namespaceSelector as follows:

namespaceSelector:
  matchExpressions:
    - key: multiarch.openshift.io/include-pod-placement
      operator: Exists

In this example, the operator field value is set to Exists. Therefore, the operand processes the nodeAffinity of the pods only in namespaces that contain the multiarch.openshift.io/include-pod-placement label.

Important

This Operator excludes pods in namespaces starting with kube-. It also excludes pods that are expected to be scheduled on control plane nodes.

4.11.4.1. Creating the ClusterPodPlacementConfig object by using the CLI

To deploy the pod placement operand that enables architecture-aware workload scheduling, you can create the ClusterPodPlacementConfig object by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.
  • You have installed the Multiarch Tuning Operator.

Procedure

  1. Create a ClusterPodPlacementConfig object YAML file:

    Example ClusterPodPlacementConfig object configuration

    apiVersion: multiarch.openshift.io/v1beta1
    kind: ClusterPodPlacementConfig
    metadata:
      name: cluster
    spec:
      logVerbosityLevel: Normal
      namespaceSelector:
        matchExpressions:
          - key: multiarch.openshift.io/exclude-pod-placement
            operator: DoesNotExist

  2. Create the ClusterPodPlacementConfig object by running the following command:

    $ oc create -f <file_name> 1
    1
    Replace <file_name> with the name of the ClusterPodPlacementConfig object YAML file.

Verification

  • To check that the ClusterPodPlacementConfig object is created, run the following command:

    $ oc get clusterpodplacementconfig

    Example output

    NAME      AGE
    cluster   29s

4.11.4.2. Creating the ClusterPodPlacementConfig object by using the web console

To deploy the pod placement operand that enables architecture-aware workload scheduling, you can create the ClusterPodPlacementConfig object by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the Multiarch Tuning Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to OperatorsInstalled Operators.
  3. On the Installed Operators page, click Multiarch Tuning Operator.
  4. Click the Cluster Pod Placement Config tab.
  5. Select either Form view or YAML view.
  6. Configure the ClusterPodPlacementConfig object parameters.
  7. Click Create.
  8. Optional: If you want to edit the ClusterPodPlacementConfig object, perform the following actions:

    1. Click the Cluster Pod Placement Config tab.
    2. Select Edit ClusterPodPlacementConfig from the options menu.
    3. Click YAML and edit the ClusterPodPlacementConfig object parameters.
    4. Click Save.

Verification

  • On the Cluster Pod Placement Config page, check that the ClusterPodPlacementConfig object is in the Ready state.

4.11.5. Deleting the ClusterPodPlacementConfig object by using the CLI

You can create only one instance of the ClusterPodPlacementConfig object. If you want to re-create this object, you must first delete the existing instance.

You can delete this object by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.

Procedure

  1. Log in to the OpenShift CLI (oc).
  2. Delete the ClusterPodPlacementConfig object by running the following command:

    $ oc delete clusterpodplacementconfig cluster

Verification

  • To check that the ClusterPodPlacementConfig object is deleted, run the following command:

    $ oc get clusterpodplacementconfig

    Example output

    No resources found

4.11.6. Deleting the ClusterPodPlacementConfig object by using the web console

You can create only one instance of the ClusterPodPlacementConfig object. If you want to re-create this object, you must first delete the existing instance.

You can delete this object by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have created the ClusterPodPlacementConfig object.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to OperatorsInstalled Operators.
  3. On the Installed Operators page, click Multiarch Tuning Operator.
  4. Click the Cluster Pod Placement Config tab.
  5. Select Delete ClusterPodPlacementConfig from the options menu.
  6. Click Delete.

Verification

  • On the Cluster Pod Placement Config page, check that the ClusterPodPlacementConfig object has been deleted.

4.11.7. Uninstalling the Multiarch Tuning Operator by using the CLI

You can uninstall the Multiarch Tuning Operator by using the OpenShift CLI (oc).

Prerequisites

  • You have installed oc.
  • You have logged in to oc as a user with cluster-admin privileges.
  • You deleted the ClusterPodPlacementConfig object.

    Important

    You must delete the ClusterPodPlacementConfig object before uninstalling the Multiarch Tuning Operator. Uninstalling the Operator without deleting the ClusterPodPlacementConfig object can lead to unexpected behavior.

Procedure

  1. Get the Subscription object name for the Multiarch Tuning Operator by running the following command:

    $ oc get subscription.operators.coreos.com -n <namespace> 1
    1
    Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    NAME                                  PACKAGE                     SOURCE             CHANNEL
    openshift-multiarch-tuning-operator   multiarch-tuning-operator   redhat-operators   stable

  2. Get the currentCSV value for the Multiarch Tuning Operator by running the following command:

    $ oc get subscription.operators.coreos.com <subscription_name> -n <namespace> -o yaml | grep currentCSV 1
    1
    Replace <subscription_name> with the Subscription object name. For example: openshift-multiarch-tuning-operator. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    currentCSV: multiarch-tuning-operator.v1.0.0

  3. Delete the Subscription object by running the following command:

    $ oc delete subscription.operators.coreos.com <subscription_name> -n <namespace> 1
    1
    Replace <subscription_name> with the Subscription object name. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    subscription.operators.coreos.com "openshift-multiarch-tuning-operator" deleted

  4. Delete the CSV for the Multiarch Tuning Operator in the target namespace using the currentCSV value by running the following command:

    $ oc delete clusterserviceversion <currentCSV_value> -n <namespace> 1
    1
    Replace <currentCSV> with the currentCSV value for the Multiarch Tuning Operator. For example: multiarch-tuning-operator.v1.0.0. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.

    Example output

    clusterserviceversion.operators.coreos.com "multiarch-tuning-operator.v1.0.0" deleted

Verification

  • To verify that the Multiarch Tuning Operator is uninstalled, run the following command:

    $ oc get csv -n <namespace> 1
    1
    Replace <namespace> with the name of the namespace where you have uninstalled the Multiarch Tuning Operator.

    Example output

    No resources found in openshift-multiarch-tuning-operator namespace.

4.11.8. Uninstalling the Multiarch Tuning Operator by using the web console

You can uninstall the Multiarch Tuning Operator by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster with cluster-admin permissions.
  • You deleted the ClusterPodPlacementConfig object.

    Important

    You must delete the ClusterPodPlacementConfig object before uninstalling the Multiarch Tuning Operator. Uninstalling the Operator without deleting the ClusterPodPlacementConfig object can lead to unexpected behavior.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → OperatorHub.
  3. Enter Multiarch Tuning Operator in the search field.
  4. Click Multiarch Tuning Operator.
  5. Click the Details tab.
  6. From the Actions menu, select Uninstall Operator.
  7. When prompted, click Uninstall.

Verification

  1. Navigate to OperatorsInstalled Operators.
  2. On the Installed Operators page, verify that the Multiarch Tuning Operator is not listed.

4.12. Multiarch Tuning Operator release notes

The Multiarch Tuning Operator optimizes workload management within multi-architecture clusters and in single-architecture clusters transitioning to multi-architecture environments.

These release notes track the development of the Multiarch Tuning Operator.

For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

4.12.1. Release notes for the Multiarch Tuning Operator 1.0.0

Issued: 31 October 2024

4.12.1.1. New features and enhancements
  • With this release, the Multiarch Tuning Operator supports custom network scenarios and cluster-wide custom registries configurations.
  • With this release, you can identify pods based on their architecture compatibility by using the pod labels that the Multiarch Tuning Operator adds to newly created pods.
  • With this release, you can monitor the behavior of the Multiarch Tuning Operator by using the metrics and alerts that are registered in the Cluster Monitoring Operator.

Chapter 5. Postinstallation cluster tasks

After installing OpenShift Container Platform, you can further expand and customize your cluster to your requirements.

5.1. Available cluster customizations

You complete most of the cluster configuration and customization after you deploy your OpenShift Container Platform cluster. A number of configuration resources are available.

Note

If you install your cluster on IBM Z®, not all features and functions are available.

You modify the configuration resources to configure the major features of the cluster, such as the image registry, networking configuration, image build behavior, and the identity provider.

For current documentation of the settings that you control by using these resources, use the oc explain command, for example oc explain builds --api-version=config.openshift.io/v1

5.1.1. Cluster configuration resources

All cluster configuration resources are globally scoped (not namespaced) and named cluster.

Resource nameDescription

apiserver.config.openshift.io

Provides API server configuration such as certificates and certificate authorities.

authentication.config.openshift.io

Controls the identity provider and authentication configuration for the cluster.

build.config.openshift.io

Controls default and enforced configuration for all builds on the cluster.

console.config.openshift.io

Configures the behavior of the web console interface, including the logout behavior.

featuregate.config.openshift.io

Enables FeatureGates so that you can use Tech Preview features.

image.config.openshift.io

Configures how specific image registries should be treated (allowed, disallowed, insecure, CA details).

ingress.config.openshift.io

Configuration details related to routing such as the default domain for routes.

oauth.config.openshift.io

Configures identity providers and other behavior related to internal OAuth server flows.

project.config.openshift.io

Configures how projects are created including the project template.

proxy.config.openshift.io

Defines proxies to be used by components needing external network access. Note: not all components currently consume this value.

scheduler.config.openshift.io

Configures scheduler behavior such as profiles and default node selectors.

5.1.2. Operator configuration resources

These configuration resources are cluster-scoped instances, named cluster, which control the behavior of a specific component as owned by a particular Operator.

Resource nameDescription

consoles.operator.openshift.io

Controls console appearance such as branding customizations

config.imageregistry.operator.openshift.io

Configures OpenShift image registry settings such as public routing, log levels, proxy settings, resource constraints, replica counts, and storage type.

config.samples.operator.openshift.io

Configures the Samples Operator to control which example image streams and templates are installed on the cluster.

5.1.3. Additional configuration resources

These configuration resources represent a single instance of a particular component. In some cases, you can request multiple instances by creating multiple instances of the resource. In other cases, the Operator can use only a specific resource instance name in a specific namespace. Reference the component-specific documentation for details on how and when you can create additional resource instances.

Resource nameInstance nameNamespaceDescription

alertmanager.monitoring.coreos.com

main

openshift-monitoring

Controls the Alertmanager deployment parameters.

ingresscontroller.operator.openshift.io

default

openshift-ingress-operator

Configures Ingress Operator behavior such as domain, number of replicas, certificates, and controller placement.

5.1.4. Informational Resources

You use these resources to retrieve information about the cluster. Some configurations might require you to edit these resources directly.

Resource nameInstance nameDescription

clusterversion.config.openshift.io

version

In OpenShift Container Platform 4.16, you must not customize the ClusterVersion resource for production clusters. Instead, follow the process to update a cluster.

dns.config.openshift.io

cluster

You cannot modify the DNS settings for your cluster. You can check the DNS Operator status.

infrastructure.config.openshift.io

cluster

Configuration details allowing the cluster to interact with its cloud provider.

network.config.openshift.io

cluster

You cannot modify your cluster networking after installation. To customize your network, follow the process to customize networking during installation.

5.2. Adding worker nodes

After you deploy your OpenShift Container Platform cluster, you can add worker nodes to scale cluster resources. There are different ways you can add worker nodes depending on the installation method and the environment of your cluster.

5.2.1. Adding worker nodes to installer-provisioned infrastructure clusters

For installer-provisioned infrastructure clusters, you can manually or automatically scale the MachineSet object to match the number of available bare-metal hosts.

To add a bare-metal host, you must configure all network prerequisites, configure an associated baremetalhost object, then provision the worker node to the cluster. You can add a bare-metal host manually or by using the web console.

5.2.2. Adding worker nodes to user-provisioned infrastructure clusters

For user-provisioned infrastructure clusters, you can add worker nodes by using a RHEL or RHCOS ISO image and connecting it to your cluster using cluster Ignition config files. For RHEL worker nodes, the following example uses Ansible playbooks to add worker nodes to the cluster. For RHCOS worker nodes, the following example uses an ISO image and network booting to add worker nodes to the cluster.

5.2.3. Adding worker nodes to clusters managed by the Assisted Installer

For clusters managed by the Assisted Installer, you can add worker nodes by using the Red Hat OpenShift Cluster Manager console, the Assisted Installer REST API or you can manually add worker nodes using an ISO image and cluster Ignition config files.

5.2.4. Adding worker nodes to clusters managed by the multicluster engine for Kubernetes

For clusters managed by the multicluster engine for Kubernetes, you can add worker nodes by using the dedicated multicluster engine console.

5.3. Adjust worker nodes

If you incorrectly sized the worker nodes during deployment, adjust them by creating one or more new compute machine sets, scale them up, then scale the original compute machine set down before removing them.

5.3.1. Understanding the difference between compute machine sets and the machine config pool

MachineSet objects describe OpenShift Container Platform nodes with respect to the cloud or machine provider.

The MachineConfigPool object allows MachineConfigController components to define and provide the status of machines in the context of upgrades.

The MachineConfigPool object allows users to configure how upgrades are rolled out to the OpenShift Container Platform nodes in the machine config pool.

The NodeSelector object can be replaced with a reference to the MachineSet object.

5.3.2. Scaling a compute machine set manually

To add or remove an instance of a machine in a compute machine set, you can manually scale the compute machine set.

This guidance is relevant to fully automated, installer-provisioned infrastructure installations. Customized, user-provisioned infrastructure installations do not have compute machine sets.

Prerequisites

  • Install an OpenShift Container Platform cluster and the oc command line.
  • Log in to oc as a user with cluster-admin permission.

Procedure

  1. View the compute machine sets that are in the cluster by running the following command:

    $ oc get machinesets.machine.openshift.io -n openshift-machine-api

    The compute machine sets are listed in the form of <clusterid>-worker-<aws-region-az>.

  2. View the compute machines that are in the cluster by running the following command:

    $ oc get machines.machine.openshift.io -n openshift-machine-api
  3. Set the annotation on the compute machine that you want to delete by running the following command:

    $ oc annotate machines.machine.openshift.io/<machine_name> -n openshift-machine-api machine.openshift.io/delete-machine="true"
  4. Scale the compute machine set by running one of the following commands:

    $ oc scale --replicas=2 machinesets.machine.openshift.io <machineset> -n openshift-machine-api

    Or:

    $ oc edit machinesets.machine.openshift.io <machineset> -n openshift-machine-api
    Tip

    You can alternatively apply the following YAML to scale the compute machine set:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      name: <machineset>
      namespace: openshift-machine-api
    spec:
      replicas: 2

    You can scale the compute machine set up or down. It takes several minutes for the new machines to be available.

    Important

    By default, the machine controller tries to drain the node that is backed by the machine until it succeeds. In some situations, such as with a misconfigured pod disruption budget, the drain operation might not be able to succeed. If the drain operation fails, the machine controller cannot proceed removing the machine.

    You can skip draining the node by annotating machine.openshift.io/exclude-node-draining in a specific machine.

Verification

  • Verify the deletion of the intended machine by running the following command:

    $ oc get machines.machine.openshift.io

5.3.3. The compute machine set deletion policy

Random, Newest, and Oldest are the three supported deletion options. The default is Random, meaning that random machines are chosen and deleted when scaling compute machine sets down. The deletion policy can be set according to the use case by modifying the particular compute machine set:

spec:
  deletePolicy: <delete_policy>
  replicas: <desired_replica_count>

Specific machines can also be prioritized for deletion by adding the annotation machine.openshift.io/delete-machine=true to the machine of interest, regardless of the deletion policy.

Important

By default, the OpenShift Container Platform router pods are deployed on workers. Because the router is required to access some cluster resources, including the web console, do not scale the worker compute machine set to 0 unless you first relocate the router pods.

Note

Custom compute machine sets can be used for use cases requiring that services run on specific nodes and that those services are ignored by the controller when the worker compute machine sets are scaling down. This prevents service disruption.

5.3.4. Creating default cluster-wide node selectors

You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.

With cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.

You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a compute machine set, or a machine config. Adding the label to the compute machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.

Note

You can add additional key/value pairs to a pod. But you cannot add a different value for a default key.

Procedure

To add a default cluster-wide node selector:

  1. Edit the Scheduler Operator CR to add the default cluster-wide node selectors:

    $ oc edit scheduler cluster

    Example Scheduler Operator CR with a node selector

    apiVersion: config.openshift.io/v1
    kind: Scheduler
    metadata:
      name: cluster
    ...
    spec:
      defaultNodeSelector: type=user-node,region=east 1
      mastersSchedulable: false

    1
    Add a node selector with the appropriate <key>:<value> pairs.

    After making this change, wait for the pods in the openshift-kube-apiserver project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.

  2. Add labels to a node by using a compute machine set or editing the node directly:

    • Use a compute machine set to add labels to nodes managed by the compute machine set when a node is created:

      1. Run the following command to add labels to a MachineSet object:

        $ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>"="<value>","<key>"="<value>"}}]'  -n openshift-machine-api 1
        1
        Add a <key>/<value> pair for each label.

        For example:

        $ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]'  -n openshift-machine-api
        Tip

        You can alternatively apply the following YAML to add labels to a compute machine set:

        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        metadata:
          name: <machineset>
          namespace: openshift-machine-api
        spec:
          template:
            spec:
              metadata:
                labels:
                  region: "east"
                  type: "user-node"
      2. Verify that the labels are added to the MachineSet object by using the oc edit command:

        For example:

        $ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api

        Example MachineSet object

        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
          ...
        spec:
          ...
          template:
            metadata:
          ...
            spec:
              metadata:
                labels:
                  region: east
                  type: user-node
          ...

      3. Redeploy the nodes associated with that compute machine set by scaling down to 0 and scaling up the nodes:

        For example:

        $ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
        $ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
      4. When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:

        $ oc get nodes -l <key>=<value>

        For example:

        $ oc get nodes -l type=user-node

        Example output

        NAME                                       STATUS   ROLES    AGE   VERSION
        ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp   Ready    worker   61s   v1.29.4

    • Add labels directly to a node:

      1. Edit the Node object for the node:

        $ oc label nodes <name> <key>=<value>

        For example, to label a node:

        $ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east
        Tip

        You can alternatively apply the following YAML to add labels to a node:

        kind: Node
        apiVersion: v1
        metadata:
          name: <node_name>
          labels:
            type: "user-node"
            region: "east"
      2. Verify that the labels are added to the node using the oc get command:

        $ oc get nodes -l <key>=<value>,<key>=<value>

        For example:

        $ oc get nodes -l type=user-node,region=east

        Example output

        NAME                                       STATUS   ROLES    AGE   VERSION
        ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49   Ready    worker   17m   v1.29.4

5.4. Improving cluster stability in high latency environments using worker latency profiles

If the cluster administrator has performed latency tests for platform verification, they can discover the need to adjust the operation of the cluster to ensure stability in cases of high latency. The cluster administrator needs to change only one parameter, recorded in a file, which controls four parameters affecting how supervisory processes read status and interpret the health of the cluster. Changing only the one parameter provides cluster tuning in an easy, supportable manner.

The Kubelet process provides the starting point for monitoring cluster health. The Kubelet sets status values for all nodes in the OpenShift Container Platform cluster. The Kubernetes Controller Manager (kube controller) reads the status values every 10 seconds, by default. If the kube controller cannot read a node status value, it loses contact with that node after a configured period. The default behavior is:

  1. The node controller on the control plane updates the node health to Unhealthy and marks the node Ready condition`Unknown`.
  2. In response, the scheduler stops scheduling pods to that node.
  3. The Node Lifecycle Controller adds a node.kubernetes.io/unreachable taint with a NoExecute effect to the node and schedules any pods on the node for eviction after five minutes, by default.

This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager might not receive an update from a healthy node due to network latency. The Kubelet evicts pods from the node even though the node is healthy.

To avoid this problem, you can use worker latency profiles to adjust the frequency that the Kubelet and the Kubernetes Controller Manager wait for status updates before taking action. These adjustments help to ensure that your cluster runs properly if network latency between the control plane and the worker nodes is not optimal.

These worker latency profiles contain three sets of parameters that are predefined with carefully tuned values to control the reaction of the cluster to increased latency. There is no need to experime