Postinstallation configuration
Day 2 operations for OpenShift Container Platform
Abstract
Chapter 1. Postinstallation configuration overview
After installing OpenShift Container Platform, a cluster administrator can configure and customize the following components:
- Machine
- Bare metal
- Cluster
- Node
- Network
- Storage
- Users
- Alerts and notifications
1.1. Post-installation configuration tasks
You can perform the post-installation configuration tasks to configure your environment to meet your needs.
The following list details these configurations:
- Configure operating system features: The Machine Config Operator (MCO) manages MachineConfig objects. By using the MCO, you can configure nodes and custom resources.
- Configure cluster features: You can modify the following features of an OpenShift Container Platform cluster:
- Image registry
- Networking configuration
- Image build behavior
- Identity provider
- The etcd configuration
- Machine set creation to handle the workloads
- Cloud provider credential management
Configuring a private cluster: By default, the installation program provisions OpenShift Container Platform by using a publicly accessible DNS and endpoints. To make your cluster accessible only from within an internal network, configure the following components to make them private:
- DNS
- Ingress Controller
- API server
Perform node operations: By default, OpenShift Container Platform uses Red Hat Enterprise Linux CoreOS (RHCOS) compute machines. You can perform the following node operations:
- Add and remove compute machines.
- Add and remove taints and tolerations.
- Configure the maximum number of pods per node.
- Enable Device Manager.
- Configure users: OAuth access tokens allow users to authenticate themselves to the API. You can configure OAuth to perform the following tasks:
- Specify an identity provider
- Use role-based access control to define and supply permissions to users
- Install an Operator from OperatorHub
- Configuring alert notifications: By default, firing alerts are displayed on the Alerting UI of the web console. You can also configure OpenShift Container Platform to send alert notifications to external systems.
Chapter 2. Configuring a private cluster
After you install an OpenShift Container Platform version 4.17 cluster, you can set some of its core components to be private.
2.1. About private clusters
By default, OpenShift Container Platform is provisioned using publicly accessible DNS and endpoints. You can set the DNS, Ingress Controller, and API server to private after you deploy your cluster.
If the cluster has any public subnets, load balancer services created by administrators might be publicly accessible. To ensure cluster security, verify that these services are explicitly annotated as private.
DNS
If you install OpenShift Container Platform on installer-provisioned infrastructure, the installation program creates records in a pre-existing public zone and, where possible, creates a private zone for the cluster’s own DNS resolution. In both the public zone and the private zone, the installation program or cluster creates DNS entries for *.apps, for the Ingress object, and api, for the API server.
The *.apps records in the public and private zone are identical, so when you delete the public zone, the private zone seamlessly provides all DNS resolution for the cluster.
Ingress Controller
Because the default Ingress object is created as public, the load balancer is internet-facing and in the public subnets.
The Ingress Operator generates a default certificate for an Ingress Controller to serve as a placeholder until you configure a custom default certificate. Do not use Operator-generated default certificates in production clusters. The Ingress Operator does not rotate its own signing certificate or the default certificates that it generates.
API server
By default, the installation program creates appropriate network load balancers for the API server to use for both internal and external traffic.
On Amazon Web Services (AWS), separate public and private load balancers are created. The load balancers are identical except that an additional port is available on the internal one for use within the cluster. Although the installation program automatically creates or destroys the load balancers based on API server requirements, the cluster does not manage or maintain them. As long as you preserve the cluster’s access to the API server, you can manually modify or move the load balancers. For the public load balancer, port 6443 is open and the health check is configured for HTTPS against the /readyz path.
On Google Cloud Platform, a single load balancer is created to manage both internal and external API traffic, so you do not need to modify the load balancer.
On Microsoft Azure, both public and private load balancers are created. However, because of limitations in the current implementation, you must retain both load balancers in a private cluster.
2.2. Configuring DNS records to be published in a private zone
For all OpenShift Container Platform clusters, whether public or private, DNS records are published in a public zone by default.
You can remove the public zone from the cluster DNS configuration to avoid exposing DNS records to the public. You might want to avoid exposing sensitive information, such as internal domain names, internal IP addresses, or the number of clusters at an organization, or you might simply have no need to publish records publicly. If all the clients that should be able to connect to services within the cluster use a private DNS service that has the DNS records from the private zone, then there is no need to have a public DNS record for the cluster.
After you deploy a cluster, you can modify its DNS to use only a private zone by modifying the DNS custom resource (CR). Modifying the DNS CR in this way means that any DNS records that are subsequently created are not published to public DNS servers, which keeps knowledge of the DNS records isolated to internal users. This can be done when you configure the cluster to be private, or if you never want DNS records to be publicly resolvable.
Alternatively, even in a private cluster, you might keep the public zone for DNS records because it allows clients to resolve DNS names for applications running on that cluster. For example, an organization can have machines that connect to the public internet and then establish VPN connections for certain private IP ranges in order to connect to private IP addresses. The DNS lookups from these machines use the public DNS to determine the private addresses of those services, and then connect to the private addresses over the VPN.
Procedure
Review the DNS CR for your cluster by running the following command and observing the output:
$ oc get dnses.config.openshift.io/cluster -o yaml
Example output
apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2019-10-25T18:27:09Z"
  generation: 2
  name: cluster
  resourceVersion: "37966"
  selfLink: /apis/config.openshift.io/v1/dnses/cluster
  uid: 0e714746-f755-11f9-9cb1-02ff55d8f976
spec:
  baseDomain: <base_domain>
  privateZone:
    tags:
      Name: <infrastructure_id>-int
      kubernetes.io/cluster/<infrastructure_id>: owned
  publicZone:
    id: Z2XXXXXXXXXXA4
status: {}
Note that the spec section contains both a private and a public zone.
Patch the DNS CR to remove the public zone by running the following command:
$ oc patch dnses.config.openshift.io/cluster --type=merge --patch='{"spec": {"publicZone": null}}'
Example output
dns.config.openshift.io/cluster patched
The Ingress Operator consults the DNS CR definition when it creates DNS records for IngressController objects. If only private zones are specified, only private records are created.
Important: Existing DNS records are not modified when you remove the public zone. You must manually delete previously published public DNS records if you no longer want them to be published publicly.
Verification
Review the DNS CR for your cluster and confirm that the public zone was removed, by running the following command and observing the output:
$ oc get dnses.config.openshift.io/cluster -o yaml
Example output
apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2019-10-25T18:27:09Z"
  generation: 2
  name: cluster
  resourceVersion: "37966"
  selfLink: /apis/config.openshift.io/v1/dnses/cluster
  uid: 0e714746-f755-11f9-9cb1-02ff55d8f976
spec:
  baseDomain: <base_domain>
  privateZone:
    tags:
      Name: <infrastructure_id>-int
      kubernetes.io/cluster/<infrastructure_id>-wfpg4: owned
status: {}
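The verification step can also be scripted. The following is a minimal sketch, assuming that the oc client is already logged in to the cluster and that a JSONPath query for a missing field prints an empty string. The function names dns_public_zone_id and dns_is_private_only are hypothetical helpers, not part of OpenShift tooling.

```shell
# Hypothetical helper: print the public zone ID from the cluster DNS CR.
# Assumption: the output is empty when spec.publicZone is not set.
dns_public_zone_id() {
  oc get dnses.config.openshift.io/cluster \
    -o jsonpath='{.spec.publicZone.id}'
}

# Hypothetical helper: succeed only when no public zone is configured.
dns_is_private_only() {
  zone_id="$(dns_public_zone_id)"
  if [ -z "$zone_id" ]; then
    echo "publicZone is not set"
  else
    echo "publicZone is still set: $zone_id"
    return 1
  fi
}
```

Calling dns_is_private_only after the patch returns a non-zero exit status if the public zone is still present, so the check can gate later automation.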
2.3. Setting the Ingress Controller to private
After you deploy a cluster, you can modify its Ingress Controller to use only a private zone.
Procedure
Modify the default Ingress Controller to use only an internal endpoint:
$ oc replace --force --wait --filename - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  namespace: openshift-ingress-operator
  name: default
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal
EOF
Example output
ingresscontroller.operator.openshift.io "default" deleted
ingresscontroller.operator.openshift.io/default replaced
The public DNS entry is removed, and the private zone entry is updated.
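To confirm the change from a script, you can read back the scope that the procedure sets. This is a minimal sketch, assuming a logged-in oc client; the function name ingress_is_internal is a hypothetical helper.

```shell
# Hypothetical helper: succeed when the default Ingress Controller is
# configured with an Internal load balancer scope.
ingress_is_internal() {
  scope="$(oc get ingresscontroller default -n openshift-ingress-operator \
    -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.scope}')"
  [ "$scope" = "Internal" ]
}
```

For example, `ingress_is_internal && echo "Ingress Controller is internal"` prints the message only when the scope reads back as Internal.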
2.4. Restricting the API server to private
After you deploy a cluster to Amazon Web Services (AWS) or Microsoft Azure, you can reconfigure the API server to use only the private zone.
Prerequisites
- Install the OpenShift CLI (oc).
- Have access to the web console as a user with admin privileges.
Procedure
In the web portal or console for your cloud provider, take the following actions:
Locate and delete the appropriate load balancer component:
- For AWS, delete the external load balancer. The API DNS entry in the private zone already points to the internal load balancer, which uses an identical configuration, so you do not need to modify the internal load balancer.
- For Azure, delete the api-internal-v4 rule for the public load balancer.
- For Azure, configure the Ingress Controller endpoint publishing scope to Internal. For more information, see "Configuring the Ingress Controller endpoint publishing scope to Internal".
- For the Azure public load balancer, if you configure the Ingress Controller endpoint publishing scope to Internal and there are no existing inbound rules in the public load balancer, you must create an outbound rule explicitly to provide outbound traffic for the backend address pool. For more information, see the Microsoft Azure documentation about adding outbound rules.
Delete the api.$clustername.$yourdomain or api.$clustername DNS entry in the public zone.
AWS clusters: Remove the external load balancers:
Important: You can run the following steps only for an installer-provisioned infrastructure (IPI) cluster. For a user-provisioned infrastructure (UPI) cluster, you must manually remove or disable the external load balancers.
If your cluster uses a control plane machine set, delete the lines in the control plane machine set custom resource that configure your public or external load balancer:
# ...
providerSpec:
  value:
# ...
    loadBalancers:
    - name: lk4pj-ext 1
      type: network 2
    - name: lk4pj-int
      type: network
# ...
If your cluster does not use a control plane machine set, you must delete the external load balancers from each control plane machine.
From your terminal, list the cluster machines by running the following command:
$ oc get machine -n openshift-machine-api
Example output
NAME                            STATE     TYPE        REGION      ZONE         AGE
lk4pj-master-0                  running   m4.xlarge   us-east-1   us-east-1a   17m
lk4pj-master-1                  running   m4.xlarge   us-east-1   us-east-1b   17m
lk4pj-master-2                  running   m4.xlarge   us-east-1   us-east-1a   17m
lk4pj-worker-us-east-1a-5fzfj   running   m4.xlarge   us-east-1   us-east-1a   15m
lk4pj-worker-us-east-1a-vbghs   running   m4.xlarge   us-east-1   us-east-1a   15m
lk4pj-worker-us-east-1b-zgpzg   running   m4.xlarge   us-east-1   us-east-1b   15m
The control plane machines contain master in the name.
Remove the external load balancer from each control plane machine:
Edit a control plane machine object by running the following command:
$ oc edit machines -n openshift-machine-api <control_plane_name> 1
- 1: Specify the name of the control plane machine object to modify.
Remove the lines that describe the external load balancer, which are marked in the following example:
# ...
providerSpec:
  value:
# ...
    loadBalancers:
    - name: lk4pj-ext 1
      type: network 2
    - name: lk4pj-int
      type: network
# ...
- Save your changes and exit the object specification.
- Repeat this process for each of the control plane machines.
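Because the edit must be repeated for every control plane machine, it can help to list those machine names first. The following sketch filters the table output of the machine listing shown earlier; it assumes that control plane machine names contain master, as stated in this procedure, and the function name control_plane_machines is a hypothetical helper.

```shell
# Hypothetical helper: read `oc get machine` table output on stdin and
# print the names (first column) that contain "master", skipping the
# header row.
control_plane_machines() {
  awk 'NR > 1 && $1 ~ /master/ { print $1 }'
}

# Usage (requires a logged-in oc client):
#   oc get machine -n openshift-machine-api | control_plane_machines
```

Each printed name can then be passed to the oc edit command from the procedure.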
2.5. Configuring a private storage endpoint on Azure
You can leverage the Image Registry Operator to use private endpoints on Azure, which enables seamless configuration of private storage accounts when OpenShift Container Platform is deployed on private Azure clusters. This allows you to deploy the image registry without exposing public-facing storage endpoints.
Do not configure a private storage endpoint on Microsoft Azure Red Hat OpenShift (ARO), because the endpoint can put your Microsoft Azure Red Hat OpenShift cluster in an unrecoverable state.
You can configure the Image Registry Operator to use private storage endpoints on Azure in one of two ways:
- By configuring the Image Registry Operator to discover the VNet and subnet names
- With user-provided Azure Virtual Network (VNet) and subnet names
2.5.1. Limitations for configuring a private storage endpoint on Azure
The following limitations apply when configuring a private storage endpoint on Azure:
- When configuring the Image Registry Operator to use a private storage endpoint, public network access to the storage account is disabled. Consequently, pulling images from the registry outside of OpenShift Container Platform only works by setting disableRedirect: true in the registry Operator configuration. With redirect enabled, the registry redirects the client to pull images directly from the storage account, which will no longer work due to disabled public network access. For more information, see "Disabling redirect when using a private storage endpoint on Azure".
- This operation cannot be undone by the Image Registry Operator.
2.5.2. Configuring a private storage endpoint on Azure by enabling the Image Registry Operator to discover VNet and subnet names
The following procedure shows you how to set up a private storage endpoint on Azure by configuring the Image Registry Operator to discover VNet and subnet names.
Prerequisites
- You have configured the image registry to run on Azure.
- Your network has been set up using the installer-provisioned infrastructure installation method.
For users with a custom network setup, see "Configuring a private storage endpoint on Azure with user-provided VNet and subnet names".
Procedure
Edit the Image Registry Operator config object and set networkAccess.type to Internal:
$ oc edit configs.imageregistry/cluster
# ...
spec:
  # ...
  storage:
    azure:
      # ...
      networkAccess:
        type: Internal
# ...
Optional: Enter the following command to confirm that the Operator has completed provisioning. This might take a few minutes.
$ oc get configs.imageregistry/cluster -o=jsonpath="{.spec.storage.azure.privateEndpointName}" -w
Optional: If the registry is exposed by a route, and you are configuring your storage account to be private, you must disable redirect if you want pulls external to the cluster to continue to work. Enter the following command to disable redirect on the Image Registry Operator configuration:
$ oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"disableRedirect": true}}'
Note: When redirect is enabled, pulling images from outside of the cluster will not work.
Verification
Fetch the registry service name by running the following command:
$ oc get imagestream -n openshift
Example output
NAME   IMAGE REPOSITORY                                                 TAGS     UPDATED
cli    image-registry.openshift-image-registry.svc:5000/openshift/cli   latest   8 hours ago
...
Enter debug mode by running the following command:
$ oc debug node/<node_name>
Run the suggested chroot command. For example:
$ chroot /host
Enter the following command to log in to your container registry:
$ podman login --tls-verify=false -u unused -p $(oc whoami -t) image-registry.openshift-image-registry.svc:5000
Example output
Login Succeeded!
Enter the following command to verify that you can pull an image from the registry:
$ podman pull --tls-verify=false image-registry.openshift-image-registry.svc:5000/openshift/tools
Example output
Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/tools/openshift/tools...
Getting image source signatures
Copying blob 6b245f040973 done
Copying config 22667f5368 done
Writing manifest to image destination
Storing signatures
22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9
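The optional provisioning check in this procedure watches the privateEndpointName field interactively. In a script, a bounded polling loop can serve the same purpose. The following is a minimal sketch, assuming a logged-in oc client; the function name wait_for_private_endpoint and its default attempt count and delay are illustrative assumptions, not documented values.

```shell
# Hypothetical helper: poll until the Image Registry Operator reports a
# private endpoint name, or give up after a number of attempts.
#   $1: maximum attempts (assumed default: 30)
#   $2: seconds between attempts (assumed default: 10)
wait_for_private_endpoint() {
  attempts="${1:-30}"
  delay="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    name="$(oc get configs.imageregistry/cluster \
      -o jsonpath='{.spec.storage.azure.privateEndpointName}')"
    if [ -n "$name" ]; then
      echo "$name"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for privateEndpointName" >&2
  return 1
}
```

The function prints the endpoint name on success and returns a non-zero status on timeout, so it can be used before running the verification steps.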
2.5.3. Configuring a private storage endpoint on Azure with user-provided VNet and subnet names
Use the following procedure to configure a storage account that has public network access disabled and is exposed behind a private storage endpoint on Azure.
Prerequisites
- You have configured the image registry to run on Azure.
- You must know the VNet and subnet names used for your Azure environment.
- If your network was configured in a separate resource group in Azure, you must also know its name.
Procedure
Edit the Image Registry Operator config object and configure the private endpoint using your VNet and subnet names:
$ oc edit configs.imageregistry/cluster
# ...
spec:
  # ...
  storage:
    azure:
      # ...
      networkAccess:
        type: Internal
        internal:
          subnetName: <subnet_name>
          vnetName: <vnet_name>
          networkResourceGroupName: <network_resource_group_name>
# ...
Optional: Enter the following command to confirm that the Operator has completed provisioning. This might take a few minutes.
$ oc get configs.imageregistry/cluster -o=jsonpath="{.spec.storage.azure.privateEndpointName}" -w
Note: When redirect is enabled, pulling images from outside of the cluster will not work.
Verification
Fetch the registry service name by running the following command:
$ oc get imagestream -n openshift
Example output
NAME   IMAGE REPOSITORY                                                 TAGS     UPDATED
cli    image-registry.openshift-image-registry.svc:5000/openshift/cli   latest   8 hours ago
...
Enter debug mode by running the following command:
$ oc debug node/<node_name>
Run the suggested chroot command. For example:
$ chroot /host
Enter the following command to log in to your container registry:
$ podman login --tls-verify=false -u unused -p $(oc whoami -t) image-registry.openshift-image-registry.svc:5000
Example output
Login Succeeded!
Enter the following command to verify that you can pull an image from the registry:
$ podman pull --tls-verify=false image-registry.openshift-image-registry.svc:5000/openshift/tools
Example output
Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/tools/openshift/tools...
Getting image source signatures
Copying blob 6b245f040973 done
Copying config 22667f5368 done
Writing manifest to image destination
Storing signatures
22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9
2.5.4. Optional: Disabling redirect when using a private storage endpoint on Azure
By default, redirect is enabled when using the image registry. Redirect allows off-loading of traffic from the registry pods into the object storage, which makes pulls faster. When redirect is enabled and the storage account is private, users from outside of the cluster are unable to pull images from the registry.
In some cases, users might want to disable redirect so that users from outside of the cluster can pull images from the registry.
Use the following procedure to disable redirect.
Prerequisites
- You have configured the image registry to run on Azure.
- You have configured a route.
Procedure
Enter the following command to disable redirect on the image registry configuration:
$ oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"disableRedirect": true}}'
Verification
Fetch the registry service name by running the following command:
$ oc get imagestream -n openshift
Example output
NAME   IMAGE REPOSITORY                                           TAGS     UPDATED
cli    default-route-openshift-image-registry.<cluster_dns>/cli   latest   8 hours ago
...
Enter the following command to log in to your container registry:
$ podman login --tls-verify=false -u unused -p $(oc whoami -t) default-route-openshift-image-registry.<cluster_dns>
Example output
Login Succeeded!
Enter the following command to verify that you can pull an image from the registry:
$ podman pull --tls-verify=false default-route-openshift-image-registry.<cluster_dns>/openshift/tools
Example output
Trying to pull default-route-openshift-image-registry.<cluster_dns>/openshift/tools...
Getting image source signatures
Copying blob 6b245f040973 done
Copying config 22667f5368 done
Writing manifest to image destination
Storing signatures
22667f53682a2920948d19c7133ab1c9c3f745805c14125859d20cede07f11f9
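Before testing pulls from outside the cluster, you can confirm that the patch took effect by reading the field back. This is a minimal sketch, assuming a logged-in oc client; the function name redirect_disabled is a hypothetical helper.

```shell
# Hypothetical helper: succeed when spec.disableRedirect is set to true
# in the image registry configuration.
redirect_disabled() {
  value="$(oc get configs.imageregistry/cluster \
    -o jsonpath='{.spec.disableRedirect}')"
  [ "$value" = "true" ]
}
```

For example, `redirect_disabled || echo "redirect is still enabled"` reports when the patch has not been applied.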
Chapter 3. Configuring multi-architecture compute machines on an OpenShift cluster
3.1. About clusters with multi-architecture compute machines
An OpenShift Container Platform cluster with multi-architecture compute machines is a cluster that supports compute machines with different architectures.
When there are nodes with multiple architectures in your cluster, the architecture of your image must be consistent with the architecture of the node. You need to ensure that the pod is assigned to the node with the appropriate architecture and that it matches the image architecture. For more information on assigning pods to nodes, see Assigning pods to nodes.
The Cluster Samples Operator is not supported on clusters with multi-architecture compute machines. Your cluster can be created without this capability. For more information, see Cluster capabilities.
For information on migrating your single-architecture cluster to a cluster that supports multi-architecture compute machines, see Migrating to a cluster with multi-architecture compute machines.
3.1.1. Configuring your cluster with multi-architecture compute machines
To create a cluster with multi-architecture compute machines with different installation options and platforms, you can use the documentation in the following table:
Documentation section | Platform | User-provisioned installation | Installer-provisioned installation | Control Plane | Compute node |
---|---|---|---|---|---|
Creating a cluster with multi-architecture compute machines on Azure | Microsoft Azure | | ✓ | | |
Creating a cluster with multi-architecture compute machines on AWS | Amazon Web Services (AWS) | | ✓ | | |
Creating a cluster with multi-architecture compute machines on GCP | Google Cloud Platform (GCP) | | ✓ | | |
Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z | Bare metal | ✓ | | | |
| IBM Power | ✓ | | | |
| IBM Z | ✓ | | | |
Creating a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE with z/VM | IBM Z® and IBM® LinuxONE | ✓ | | | |
| IBM Z® and IBM® LinuxONE | ✓ | | | |
Creating a cluster with multi-architecture compute machines on IBM Power® | IBM Power® | ✓ | | | |
Autoscaling from zero is currently not supported on Google Cloud Platform (GCP).
3.2. Creating a cluster with multi-architecture compute machines on Azure
To deploy an Azure cluster with multi-architecture compute machines, you must first create a single-architecture Azure installer-provisioned cluster that uses the multi-architecture installer binary. For more information on Azure installations, see Installing a cluster on Azure with customizations.
You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.
After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.
3.2.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
Log in to the OpenShift CLI (oc).
You can check that your cluster uses the architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{ "url": "https://access.redhat.com/errata/<errata_version>" }
Important: To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
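The compatibility check can be reduced to a single exit status for use in scripts. The following sketch looks for the release.openshift.io/architecture key with the value multi in the metadata returned by the command from this procedure. It assumes a logged-in oc client and strips whitespace before matching, because the exact spacing of the JSONPath output is an assumption. The function name is_multi_arch_payload is a hypothetical helper.

```shell
# Hypothetical helper: succeed when the release metadata marks the
# payload as multi-architecture.
is_multi_arch_payload() {
  oc adm release info -o jsonpath="{ .metadata.metadata}" \
    | tr -d '[:space:]' \
    | grep -qF '"release.openshift.io/architecture":"multi"'
}
```

For example, `is_multi_arch_payload && echo "multi-architecture payload"` prints the message only when the key is present with the value multi.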
3.2.2. Creating a 64-bit ARM boot image using the Azure image gallery
The following procedure describes how to manually generate a 64-bit ARM boot image.
Prerequisites
- You installed the Azure CLI (az).
- You created a single-architecture Azure installer-provisioned cluster with the multi-architecture installer binary.
Procedure
Log in to your Azure account:
$ az login
Create a storage account and upload the aarch64 virtual hard disk (VHD) to your storage account. The OpenShift Container Platform installation program creates a resource group. However, the boot image can also be uploaded to a custom named resource group:
$ az storage account create -n ${STORAGE_ACCOUNT_NAME} -g ${RESOURCE_GROUP} -l westus --sku Standard_LRS 1
- 1: The westus value is an example region.
Create a storage container using the storage account you generated:
$ az storage container create -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME}
You must use the OpenShift Container Platform installation program JSON file to extract the URL and aarch64 VHD name:
Extract the URL field and set it to the RHCOS_VHD_ORIGIN_URL variable by running the following command:
$ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".url')
Extract the aarch64 VHD name and set it to the BLOB_NAME variable as the file name by running the following command:
$ BLOB_NAME=rhcos-$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".release')-azure.aarch64.vhd
Generate a shared access signature (SAS) token. Use this token to upload the RHCOS VHD to your storage container with the following commands:
$ end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
$ sas=`az storage container generate-sas -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME} --https-only --permissions dlrw --expiry $end -o tsv`
Copy the RHCOS VHD into the storage container:
$ az storage blob copy start --account-name ${STORAGE_ACCOUNT_NAME} --sas-token "$sas" \
    --source-uri "${RHCOS_VHD_ORIGIN_URL}" \
    --destination-blob "${BLOB_NAME}" --destination-container ${CONTAINER_NAME}
You can check the status of the copying process with the following command:
$ az storage blob show -c ${CONTAINER_NAME} -n ${BLOB_NAME} --account-name ${STORAGE_ACCOUNT_NAME} | jq .properties.copy
Example output
{
  "completionTime": null,
  "destinationSnapshot": null,
  "id": "1fd97630-03ca-489a-8c4e-cfe839c9627d",
  "incrementalCopy": null,
  "progress": "17179869696/17179869696",
  "source": "https://rhcos.blob.core.windows.net/imagebucket/rhcos-411.86.202207130959-0-azure.aarch64.vhd",
  "status": "success", 1
  "statusDescription": null
}
- 1: If the status parameter displays success, the copying process is complete.
Create an image gallery using the following command:
$ az sig create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME}
Use the image gallery to create an image definition. In the following example command, rhcos-arm64 is the name of the image definition.
$ az sig image-definition create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --publisher RedHat --offer arm --sku arm64 --os-type linux --architecture Arm64 --hyper-v-generation V2
To get the URL of the VHD and set it to the RHCOS_VHD_URL variable, run the following command:
$ RHCOS_VHD_URL=$(az storage blob url --account-name ${STORAGE_ACCOUNT_NAME} -c ${CONTAINER_NAME} -n "${BLOB_NAME}" -o tsv)
Use the RHCOS_VHD_URL variable, your storage account, resource group, and image gallery to create an image version. In the following example, 1.0.0 is the image version.
$ az sig image-version create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-arm64 --gallery-image-version 1.0.0 --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} --os-vhd-uri ${RHCOS_VHD_URL}
Your arm64 boot image is now generated. You can access the ID of your image with the following command:
$ az sig image-version show -r $GALLERY_NAME -g $RESOURCE_GROUP -i rhcos-arm64 -e 1.0.0
The following example image ID is used in the resourceID parameter of the compute machine set:
Example resourceID
/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-arm64/versions/1.0.0
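The example resourceID follows a fixed pattern, so it can be assembled from the same values used in the procedure. This sketch only formats a string in the shape of the example above and does not query Azure; the function name gallery_image_resource_id is a hypothetical helper.

```shell
# Hypothetical helper: build a resourceID string in the same shape as
# the example above.
#   $1: resource group   $2: gallery name
#   $3: image definition $4: image version
gallery_image_resource_id() {
  printf '/resourceGroups/%s/providers/Microsoft.Compute/galleries/%s/images/%s/versions/%s\n' \
    "$1" "$2" "$3" "$4"
}

# Usage:
#   gallery_image_resource_id "${RESOURCE_GROUP}" "${GALLERY_NAME}" rhcos-arm64 1.0.0
```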
3.2.3. Creating a 64-bit x86 boot image using the Azure image gallery
The following procedure describes how to manually generate a 64-bit x86 boot image.
Prerequisites
- You installed the Azure CLI (az).
- You created a single-architecture Azure installer-provisioned cluster with the multi-architecture installer binary.
Procedure
Log in to your Azure account by running the following command:
$ az login
Create a storage account and upload the x86_64 virtual hard disk (VHD) to your storage account by running the following command. The OpenShift Container Platform installation program creates a resource group. However, the boot image can also be uploaded to a custom named resource group:
$ az storage account create -n ${STORAGE_ACCOUNT_NAME} -g ${RESOURCE_GROUP} -l westus --sku Standard_LRS 1
- 1: The westus value is an example region.
Create a storage container using the storage account you generated by running the following command:
$ az storage container create -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME}
Use the OpenShift Container Platform installation program JSON file to extract the URL and x86_64 VHD name:
Extract the URL field and set it to the RHCOS_VHD_ORIGIN_URL variable by running the following command:
$ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64."rhel-coreos-extensions"."azure-disk".url')
Extract the x86_64 VHD name and set it to the BLOB_NAME variable as the file name by running the following command:
$ BLOB_NAME=rhcos-$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64."rhel-coreos-extensions"."azure-disk".release')-azure.x86_64.vhd
Generate a shared access signature (SAS) token. Use this token to upload the RHCOS VHD to your storage container by running the following commands:
$ end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
$ sas=`az storage container generate-sas -n ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT_NAME} --https-only --permissions dlrw --expiry $end -o tsv`
Copy the RHCOS VHD into the storage container by running the following command:
$ az storage blob copy start --account-name ${STORAGE_ACCOUNT_NAME} --sas-token "$sas" \
    --source-uri "${RHCOS_VHD_ORIGIN_URL}" \
    --destination-blob "${BLOB_NAME}" --destination-container ${CONTAINER_NAME}
You can check the status of the copying process by running the following command:
$ az storage blob show -c ${CONTAINER_NAME} -n ${BLOB_NAME} --account-name ${STORAGE_ACCOUNT_NAME} | jq .properties.copy
Example output
{
  "completionTime": null,
  "destinationSnapshot": null,
  "id": "1fd97630-03ca-489a-8c4e-cfe839c9627d",
  "incrementalCopy": null,
  "progress": "17179869696/17179869696",
  "source": "https://rhcos.blob.core.windows.net/imagebucket/rhcos-411.86.202207130959-0-azure.aarch64.vhd",
  "status": "success", 1
  "statusDescription": null
}
- 1
- If the status parameter displays the success value, the copying process is complete.
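Because the copy can take several minutes, it can help to poll the status instead of rerunning the command by hand. The following is a minimal sketch, not part of the documented procedure: the copy_status and wait_for_copy function names are hypothetical, and the sketch assumes the same environment variables as the preceding commands. It reads the status with the Azure CLI global --query and -o tsv arguments rather than jq.

```shell
# Hypothetical helper: print only the copy status of the blob.
# Assumes CONTAINER_NAME, BLOB_NAME, and STORAGE_ACCOUNT_NAME are already set.
copy_status() {
  az storage blob show -c "${CONTAINER_NAME}" -n "${BLOB_NAME}" \
    --account-name "${STORAGE_ACCOUNT_NAME}" \
    --query properties.copy.status -o tsv
}

# Poll copy_status until it reports "success".
# $1: seconds between polls (default 30); $2: maximum number of polls (default 60).
wait_for_copy() {
  interval="${1:-30}"
  attempts="${2:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status="$(copy_status)"
    case "$status" in
      success) echo "copy complete"; return 0 ;;
      failed|aborted) echo "copy ended with status: $status" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for copy" >&2
  return 1
}

# Usage (requires a logged-in Azure CLI):
#   wait_for_copy 30 60
```

The function stops early on the failed and aborted states so that a broken copy is not mistaken for a slow one.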
Create an image gallery by running the following command:
$ az sig create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME}
Use the image gallery to create an image definition by running the following command:
$ az sig image-definition create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-x86_64 --publisher RedHat --offer x86_64 --sku x86_64 --os-type linux --architecture x64 --hyper-v-generation V2
In this example command, rhcos-x86_64 is the name of the image definition.
To get the URL of the VHD and store it in the RHCOS_VHD_URL variable, run the following command:
$ RHCOS_VHD_URL=$(az storage blob url --account-name ${STORAGE_ACCOUNT_NAME} -c ${CONTAINER_NAME} -n "${BLOB_NAME}" -o tsv)
Use the RHCOS_VHD_URL variable, your storage account, resource group, and image gallery to create an image version by running the following command:
$ az sig image-version create --resource-group ${RESOURCE_GROUP} --gallery-name ${GALLERY_NAME} --gallery-image-definition rhcos-x86_64 --gallery-image-version 1.0.0 --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} --os-vhd-uri ${RHCOS_VHD_URL}
In this example, 1.0.0 is the image version.
Optional: Access the ID of the generated x86_64 boot image by running the following command:
$ az sig image-version show -r $GALLERY_NAME -g $RESOURCE_GROUP -i rhcos-x86_64 -e 1.0.0
The following example image ID is used in the resourceID parameter of the compute machine set:
Example resourceID
/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-x86_64/versions/1.0.0
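The Azure CLI normally reports a full resource ID that begins with a /subscriptions/<subscription_id> segment, whereas the example resourceID starts at /resourceGroups. The following sketch shows one way to trim the ID into that shape. The to_machineset_resource_id function name is hypothetical, and the sketch assumes the usual /subscriptions/<subscription_id>/resourceGroups/... layout of the input.

```shell
# Convert a full Azure resource ID into the form that starts at /resourceGroups.
# Assumption: the input contains a /resourceGroups/ segment, as Azure resource IDs normally do.
to_machineset_resource_id() {
  case "$1" in
    */resourceGroups/*)
      # Drop everything up to and including the first /resourceGroups/, then add it back.
      printf '/resourceGroups/%s\n' "${1#*/resourceGroups/}"
      ;;
    *)
      echo "unexpected resource ID: $1" >&2
      return 1
      ;;
  esac
}

# Usage (requires a logged-in Azure CLI):
#   full_id=$(az sig image-version show -r "$GALLERY_NAME" -g "$RESOURCE_GROUP" \
#     -i rhcos-x86_64 -e 1.0.0 --query id -o tsv)
#   to_machineset_resource_id "$full_id"
```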
3.2.4. Adding a multi-architecture compute machine set to your Azure cluster
After creating a multi-architecture cluster, you can add nodes with different architectures.
You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:
- Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
- Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
To create a custom compute machine set on Azure, see "Creating a compute machine set on Azure".
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".
Prerequisites
- You installed the OpenShift CLI (oc).
- You created a 64-bit ARM or 64-bit x86 boot image.
- You used the installation program to create a 64-bit ARM or 64-bit x86 single-architecture Azure cluster with the multi-architecture installer binary.
Procedure
- Log in to the OpenShift CLI (oc).
Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.
Example MachineSet object for an Azure 64-bit ARM or 64-bit x86 compute node
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_id>
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: <infrastructure_id>-machine-set-0
  namespace: openshift-machine-api
spec:
  replicas: 2
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_id>
      machine.openshift.io/cluster-api-machineset: <infrastructure_id>-machine-set-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: <infrastructure_id>-machine-set-0
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          acceleratedNetworking: true
          apiVersion: machine.openshift.io/v1beta1
          credentialsSecret:
            name: azure-cloud-credentials
            namespace: openshift-machine-api
          image:
            offer: ""
            publisher: ""
            resourceID: /resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Compute/galleries/${GALLERY_NAME}/images/rhcos-arm64/versions/1.0.0 1
            sku: ""
            version: ""
          kind: AzureMachineProviderSpec
          location: <region>
          managedIdentity: <infrastructure_id>-identity
          networkResourceGroup: <infrastructure_id>-rg
          osDisk:
            diskSettings: {}
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS
            osType: Linux
          publicIP: false
          publicLoadBalancer: <infrastructure_id>
          resourceGroup: <infrastructure_id>-rg
          subnet: <infrastructure_id>-worker-subnet
          userDataSecret:
            name: worker-user-data
          vmSize: Standard_D4ps_v5 2
          vnet: <infrastructure_id>-vnet
          zone: "<zone>"
Create the compute machine set by running the following command:
$ oc create -f <file_name> 1
- 1
- Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: arm64-machine-set-0.yaml, or amd64-machine-set-0.yaml.
Verification
Verify that the new machines are running by running the following command:
$ oc get machineset -n openshift-machine-api
The output must include the machine set that you created.
Example output
NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
<infrastructure_id>-machine-set-0   2         2         2       2           10m
You can check if the nodes are ready and schedulable by running the following command:
$ oc get nodes
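To confirm that nodes of the secondary architecture actually joined, it can be useful to group nodes by their kubernetes.io/arch label. The following is a small sketch under that assumption. The count_by_arch function name is hypothetical, and it only summarizes architecture names that are passed to it on standard input.

```shell
# Summarize architecture names (one per line on stdin) as "<arch>=<count>" lines.
count_by_arch() {
  sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Usage (requires cluster access):
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.labels.kubernetes\.io/arch}{"\n"}{end}' | count_by_arch
```

A cluster with multi-architecture compute machines is expected to report more than one architecture, for example both amd64 and arm64.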
3.3. Creating a cluster with multi-architecture compute machines on AWS
To create an AWS cluster with multi-architecture compute machines, you must first create a single-architecture AWS installer-provisioned cluster with the multi-architecture installer binary. For more information on AWS installations, see Installing a cluster on AWS with customizations.
You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.
After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.
3.3.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Log in to the OpenShift CLI (oc).
You can check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{ "url": "https://access.redhat.com/errata/<errata_version>" }
Important
To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
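The check in this verification can also be scripted. The following sketch assumes that the release metadata is piped in as JSON text, as printed by the preceding command. The is_multi_payload function name is hypothetical.

```shell
# Succeed only if the release metadata on stdin marks the payload as multi-architecture.
is_multi_payload() {
  grep -Eq '"release\.openshift\.io/architecture"[[:space:]]*:[[:space:]]*"multi"'
}

# Usage (requires cluster access):
#   if oc adm release info -o jsonpath="{ .metadata.metadata}" | is_multi_payload; then
#     echo "multi-architecture payload"
#   else
#     echo "single-architecture payload: migrate before adding nodes"
#   fi
```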
3.3.2. Adding a multi-architecture compute machine set to your AWS cluster
After creating a multi-architecture cluster, you can add nodes with different architectures.
You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:
- Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
- Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".
Prerequisites
- You installed the OpenShift CLI (oc).
- You used the installation program to create a 64-bit ARM or 64-bit x86 single-architecture AWS cluster with the multi-architecture installer binary.
Procedure
- Log in to the OpenShift CLI (oc).
Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.
Example MachineSet object for an AWS 64-bit ARM or 64-bit x86 compute node
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
  name: <infrastructure_id>-aws-machine-set-0 2
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_id> 3
      machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>-<zone> 4
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: <role> 5
        machine.openshift.io/cluster-api-machine-type: <role> 6
        machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>-<zone> 7
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/<role>: ""
      providerSpec:
        value:
          ami:
            id: ami-02a574449d4f4d280 8
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
            - ebs:
                iops: 0
                volumeSize: 120
                volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: <infrastructure_id>-worker-profile 9
          instanceType: m6g.xlarge 10
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: us-east-1a 11
            region: <region> 12
          securityGroups:
            - filters:
                - name: tag:Name
                  values:
                    - <infrastructure_id>-worker-sg 13
          subnet:
            filters:
              - name: tag:Name
                values:
                  - <infrastructure_id>-private-<zone>
          tags:
            - name: kubernetes.io/cluster/<infrastructure_id> 14
              value: owned
            - name: <custom_tag_name>
              value: <custom_tag_value>
          userDataSecret:
            name: worker-user-data
- 1 2 3 9 13 14
- Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI (oc) installed, you can obtain the infrastructure ID by running the following command:
$ oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
- 4 7
- Specify the infrastructure ID, role node label, and zone.
- 5 6
- Specify the role node label to add.
- 8
- Specify a Red Hat Enterprise Linux CoreOS (RHCOS) Amazon Machine Image (AMI) for your AWS zone for the nodes. The RHCOS AMI must be compatible with the machine architecture. You can obtain the AMI ID by running the following command:
$ oc get configmap/coreos-bootimages \
    -n openshift-machine-config-operator \
    -o jsonpath='{.data.stream}' | jq \
    -r '.architectures.<arch>.images.aws.regions."<region>".image'
- 10
- Specify a machine type that aligns with the CPU architecture of the chosen AMI. For more information, see "Tested instance types for AWS 64-bit ARM".
- 11
- Specify the zone. For example, us-east-1a. Ensure that the zone you select has machines with the required architecture.
- 12
- Specify the region. For example, us-east-1. Ensure that the region you select has machines with the required architecture.
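The <arch> placeholder in the AMI lookup command for callout 8 uses the boot image stream names (x86_64, aarch64), which differ from the amd64 and arm64 names used elsewhere, for example in the kubernetes.io/arch node label. The following sketch maps between the two and builds the jq filter. The stream_arch and ami_jq_filter function names are hypothetical, and only the two architectures covered by this procedure are handled.

```shell
# Map a node architecture name to the key used in the boot image stream.
stream_arch() {
  case "$1" in
    amd64|x86_64) echo x86_64 ;;
    arm64|aarch64) echo aarch64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the jq filter that selects the AMI ID for an architecture ($1) and region ($2).
ami_jq_filter() {
  arch="$(stream_arch "$1")" || return 1
  printf '.architectures.%s.images.aws.regions."%s".image\n' "$arch" "$2"
}

# Usage (requires cluster access and jq):
#   oc get configmap/coreos-bootimages -n openshift-machine-config-operator \
#     -o jsonpath='{.data.stream}' | jq -r "$(ami_jq_filter arm64 us-east-1)"
```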
Create the compute machine set by running the following command:
$ oc create -f <file_name> 1
- 1
- Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: aws-arm64-machine-set-0.yaml, or aws-amd64-machine-set-0.yaml.
Verification
View the list of compute machine sets by running the following command:
$ oc get machineset -n openshift-machine-api
The output must include the machine set that you created.
Example output
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
<infrastructure_id>-aws-machine-set-0   2         2         2       2           10m
You can check if the nodes are ready and schedulable by running the following command:
$ oc get nodes
3.4. Creating a cluster with multi-architecture compute machines on GCP
To create a Google Cloud Platform (GCP) cluster with multi-architecture compute machines, you must first create a single-architecture GCP installer-provisioned cluster with the multi-architecture installer binary. For more information on GCP installations, see Installing a cluster on GCP with customizations.
You can also migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines. For more information, see Migrating to a cluster with multi-architecture compute machines.
After creating a multi-architecture cluster, you can add nodes with different architectures to the cluster.
Note
Secure booting is currently not supported on 64-bit ARM machines for GCP.
3.4.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Log in to the OpenShift CLI (oc).
You can check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{ "url": "https://access.redhat.com/errata/<errata_version>" }
Important
To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
3.4.2. Adding a multi-architecture compute machine set to your GCP cluster
After creating a multi-architecture cluster, you can add nodes with different architectures.
You can add multi-architecture compute machines to a multi-architecture cluster in the following ways:
- Adding 64-bit x86 compute machines to a cluster that uses 64-bit ARM control plane machines and already includes 64-bit ARM compute machines. In this case, 64-bit x86 is considered the secondary architecture.
- Adding 64-bit ARM compute machines to a cluster that uses 64-bit x86 control plane machines and already includes 64-bit x86 compute machines. In this case, 64-bit ARM is considered the secondary architecture.
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig custom resource. For more information, see "Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator".
Prerequisites
- You installed the OpenShift CLI (oc).
- You used the installation program to create a 64-bit x86 or 64-bit ARM single-architecture GCP cluster with the multi-architecture installer binary.
Procedure
- Log in to the OpenShift CLI (oc).
Create a YAML file, and add the configuration to create a compute machine set to control the 64-bit ARM or 64-bit x86 compute nodes in your cluster.
Example MachineSet object for a GCP 64-bit ARM or 64-bit x86 compute node
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_id> 1
  name: <infrastructure_id>-w-a
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_id>
      machine.openshift.io/cluster-api-machineset: <infrastructure_id>-w-a
  template:
    metadata:
      creationTimestamp: null
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: <role> 2
        machine.openshift.io/cluster-api-machine-type: <role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_id>-w-a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/<role>: ""
      providerSpec:
        value:
          apiVersion: gcpprovider.openshift.io/v1beta1
          canIPForward: false
          credentialsSecret:
            name: gcp-cloud-credentials
          deletionProtection: false
          disks:
            - autoDelete: true
              boot: true
              image: <path_to_image> 3
              labels: null
              sizeGb: 128
              type: pd-ssd
          gcpMetadata: 4
            - key: <custom_metadata_key>
              value: <custom_metadata_value>
          kind: GCPMachineProviderSpec
          machineType: n1-standard-4 5
          metadata:
            creationTimestamp: null
          networkInterfaces:
            - network: <infrastructure_id>-network
              subnetwork: <infrastructure_id>-worker-subnet
          projectID: <project_name> 6
          region: us-central1 7
          serviceAccounts:
            - email: <infrastructure_id>-w@<project_name>.iam.gserviceaccount.com
              scopes:
                - https://www.googleapis.com/auth/cloud-platform
          tags:
            - <infrastructure_id>-worker
          userDataSecret:
            name: worker-user-data
          zone: us-central1-a
- 1
- Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. You can obtain the infrastructure ID by running the following command:
$ oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
- 2
- Specify the role node label to add.
- 3
- Specify the path to the image that is used in current compute machine sets. You need the project and image name for your path to image.
To access the project and image name, run the following command:
$ oc get configmap/coreos-bootimages \
    -n openshift-machine-config-operator \
    -o jsonpath='{.data.stream}' | jq \
    -r '.architectures.aarch64.images.gcp'
Example output
"gcp": {
  "release": "415.92.202309142014-0",
  "project": "rhcos-cloud",
  "name": "rhcos-415-92-202309142014-0-gcp-aarch64"
}
Use the project and name parameters from the output to create the path to image field in your machine set. The path to the image must use the following format:
projects/<project>/global/images/<image_name>
- 4
- Optional: Specify custom metadata in the form of a key:value pair. For example use cases, see the GCP documentation for setting custom metadata.
- 5
- Specify a machine type that aligns with the CPU architecture of the chosen OS image. For more information, see "Tested instance types for GCP on 64-bit ARM infrastructures".
- 6
- Specify the name of the GCP project that you use for your cluster.
- 7
- Specify the region. For example, us-central1. Ensure that the region you select has machines with the required architecture.
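The image path described for callout 3 is a simple combination of the project and name values from the boot image output. As a sketch, with a hypothetical gcp_image_path function name:

```shell
# Build the machine set image path from a project ($1) and an image name ($2).
gcp_image_path() {
  printf 'projects/%s/global/images/%s\n' "$1" "$2"
}

# Usage with the values from the example output:
#   gcp_image_path rhcos-cloud rhcos-415-92-202309142014-0-gcp-aarch64
# The same string can be produced directly by jq with string interpolation:
#   ... | jq -r '.architectures.aarch64.images.gcp | "projects/\(.project)/global/images/\(.name)"'
```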
Create the compute machine set by running the following command:
$ oc create -f <file_name> 1
- 1
- Replace <file_name> with the name of the YAML file with compute machine set configuration. For example: gcp-arm64-machine-set-0.yaml, or gcp-amd64-machine-set-0.yaml.
Verification
View the list of compute machine sets by running the following command:
$ oc get machineset -n openshift-machine-api
The output must include the machine set that you created.
Example output
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
<infrastructure_id>-gcp-machine-set-0   2         2         2       2           10m
You can check if the nodes are ready and schedulable by running the following command:
$ oc get nodes
3.5. Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z
To create a cluster with multi-architecture compute machines on bare metal (x86_64 or aarch64), IBM Power® (ppc64le), or IBM Z® (s390x), you must have an existing single-architecture cluster on one of these platforms. Follow the installation procedures for your platform:
- Installing a user-provisioned cluster on bare metal. You can then add 64-bit ARM compute machines to your OpenShift Container Platform cluster on bare metal.
- Installing a cluster on IBM Power®. You can then add x86_64 compute machines to your OpenShift Container Platform cluster on IBM Power®.
- Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines to your OpenShift Container Platform cluster on IBM Z® and IBM® LinuxONE.
The bare metal installer-provisioned infrastructure and the Bare Metal Operator do not support adding secondary architecture nodes during the initial cluster setup. You can add secondary architecture nodes manually only after the initial cluster setup.
Before you can add additional compute nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.
The following procedures explain how to create a RHCOS compute machine using an ISO image or network PXE booting. This allows you to add additional nodes to your cluster and deploy a cluster with multi-architecture compute machines.
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
3.5.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Log in to the OpenShift CLI (oc).
You can check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{ "url": "https://access.redhat.com/errata/<errata_version>" }
Important
To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
3.5.2. Creating RHCOS machines using an ISO image
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using an ISO image to create the machines.
Prerequisites
- Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
- You must have the OpenShift CLI (oc) installed.
Procedure
Extract the Ignition config file from the cluster by running the following command:
$ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
- Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
You can validate that the Ignition config file is available at the URL. The following example gets the Ignition config file for the compute node:
$ curl -k http://<HTTP_server>/worker.ign
You can access the ISO image for booting your new machine by running the following command:
$ RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.<architecture>.artifacts.metal.formats.iso.disk.location')
Use the ISO file to install RHCOS on more compute machines. Use the same method that you used when you created machines before you installed the cluster:
- Burn the ISO image to a disk and boot it directly.
- Use ISO redirection with a LOM interface.
Boot the RHCOS ISO image without specifying any options, or interrupting the live boot sequence. Wait for the installer to boot into a shell prompt in the RHCOS live environment.
Note
You can interrupt the RHCOS installation boot process to add kernel arguments. However, for this ISO procedure you must use the coreos-installer command as outlined in the following steps, instead of adding kernel arguments.
Run the coreos-installer command and specify the options that meet your installation requirements. At a minimum, you must specify the URL that points to the Ignition config file for the node type, and the device that you are installing to:
$ sudo coreos-installer install --ignition-url=http://<HTTP_server>/<node_type>.ign <device> --ignition-hash=sha512-<digest> 1 2
- 1
- You must run the coreos-installer command by using sudo, because the core user does not have the required root privileges to perform the installation.
- 2
- The --ignition-hash option is required when the Ignition config file is obtained through an HTTP URL to validate the authenticity of the Ignition config file on the cluster node. <digest> is the SHA512 digest of the Ignition config file.
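The <digest> value is the SHA512 digest of the Ignition config file that is served over HTTP. The following sketch computes it for a local copy of the file and prints the complete option. The ignition_hash_arg function name is hypothetical, and the sketch assumes that the sha512sum utility from GNU coreutils is available.

```shell
# Print the --ignition-hash option for a local Ignition config file ($1).
# Assumption: sha512sum (GNU coreutils) is installed.
ignition_hash_arg() {
  [ -r "$1" ] || { echo "cannot read file: $1" >&2; return 1; }
  # sha512sum prints "<digest>  <file>"; keep only the digest.
  digest="$(sha512sum "$1" | awk '{ print $1 }')"
  printf '%s%s\n' '--ignition-hash=sha512-' "$digest"
}

# Usage:
#   ignition_hash_arg worker.ign
```

Compute the digest from the same worker.ign file that you uploaded to the HTTP server. If the served file differs from the local copy, the digest does not match.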
Note
If you want to provide your Ignition config files through an HTTPS server that uses TLS, you can add the internal certificate authority (CA) to the system trust store before running coreos-installer.
The following example initializes a bootstrap node installation to the /dev/sda device. The Ignition config file for the bootstrap node is obtained from an HTTP web server with the IP address 192.168.1.2:
$ sudo coreos-installer install --ignition-url=http://192.168.1.2:80/installation_directory/bootstrap.ign /dev/sda --ignition-hash=sha512-a5a2d43879223273c9b60af66b44202a1d1248fc01cf156c46d4a79f552b6bad47bc8cc78ddf0116e80c59d2ea9e32ba53bc807afbca581aa059311def2c3e3b
Monitor the progress of the RHCOS installation on the console of the machine.
Important
Ensure that the installation is successful on each node before commencing with the OpenShift Container Platform installation. Observing the installation process can also help to determine the cause of RHCOS installation issues that might arise.
- Continue to create more compute machines for your cluster.
3.5.3. Creating RHCOS machines by PXE or iPXE booting
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using PXE or iPXE booting.
Prerequisites
- Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
- Obtain the URLs of the RHCOS ISO image, compressed metal BIOS, kernel, and initramfs files that you uploaded to your HTTP server during cluster installation.
- You have access to the PXE booting infrastructure that you used to create the machines for your OpenShift Container Platform cluster during installation. The machines must boot from their local disks after RHCOS is installed on them.
- If you use UEFI, you have access to the grub.conf file that you modified during OpenShift Container Platform installation.
Procedure
Confirm that your PXE or iPXE installation for the RHCOS images is correct.
For PXE:
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> 1
    APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img 2
- 1
- Specify the location of the live kernel file that you uploaded to your HTTP server.
- 2
- Specify locations of the RHCOS files that you uploaded to your HTTP server. The initrd parameter value is the location of the live initramfs file, the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file, and the coreos.live.rootfs_url parameter value is the location of the live rootfs file. The coreos.inst.ignition_url and coreos.live.rootfs_url parameters only support HTTP and HTTPS.
Note
This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the APPEND line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?.
For iPXE (x86_64 + aarch64):
kernel http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> initrd=main coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
initrd --name main http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img 3
boot
- 1
- Specify the locations of the RHCOS files that you uploaded to your HTTP server. The kernel parameter value is the location of the kernel file, the initrd=main argument is needed for booting on UEFI systems, the coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file.
- 2
- If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
- 3
- Specify the location of the initramfs file that you uploaded to your HTTP server.
Note
This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the kernel line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux? and "Enabling the serial console for PXE and ISO installation" in the "Advanced RHCOS installation configuration" section.
Note
To network boot the CoreOS kernel on aarch64 architecture, you need to use a version of iPXE built with the IMAGE_GZIP option enabled. See IMAGE_GZIP option in iPXE.
For PXE (with UEFI and GRUB as second stage) on aarch64:
menuentry 'Install CoreOS' {
    linux rhcos-<version>-live-kernel-<architecture> coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign 1 2
    initrd rhcos-<version>-live-initramfs.<architecture>.img 3
}
- 1
- Specify the locations of the RHCOS files that you uploaded to your HTTP/TFTP server. The kernel parameter value is the location of the kernel file on your TFTP server. The coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file on your HTTP Server.
- 2
- If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
- 3
- Specify the location of the initramfs file that you uploaded to your TFTP server.
- Use the PXE or iPXE infrastructure to create the required compute machines for your cluster.
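The PXE configuration shown earlier in this procedure differs between architectures only in the <architecture> part of the file names. When you serve more than one architecture, generating the entry from a few parameters can reduce copy-and-paste errors. The following is a minimal sketch: the pxe_entry function name and its argument order are hypothetical, and the file naming simply follows the pattern from the PXE example.

```shell
# Print a PXE configuration for an HTTP server ($1), RHCOS version ($2),
# architecture ($3), and optional install device ($4, default /dev/sda).
pxe_entry() {
  server="$1"
  version="$2"
  arch="$3"
  disk="${4:-/dev/sda}"
  base="http://${server}/rhcos-${version}-live"
  cat <<EOF
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL ${base}-kernel-${arch}
    APPEND initrd=${base}-initramfs.${arch}.img coreos.inst.install_dev=${disk} coreos.inst.ignition_url=http://${server}/worker.ign coreos.live.rootfs_url=${base}-rootfs.${arch}.img
EOF
}

# Usage: write one configuration per architecture, for example:
#   pxe_entry <HTTP_server> <version> aarch64 > pxelinux.cfg/aarch64
```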
3.5.4. Approving the certificate signing requests for your machines
When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.
Prerequisites
- You added machines to your cluster.
Procedure
Confirm that the cluster recognizes the machines:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   63m   v1.30.3
master-1   Ready    master   63m   v1.30.3
master-2   Ready    master   64m   v1.30.3
The output lists all of the machines that you created.
Note: The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.
Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-8b2br   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8vnps   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
In this example, two machines are joining the cluster. You might see more approved CSRs in the list.
If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:
Note: Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.
Note: For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
<csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: Some Operators might not become available until some CSRs are approved.
Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE     REQUESTOR                                               CONDITION
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal   Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal   Pending
...
If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
<csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   73m   v1.30.3
master-1   Ready    master   73m   v1.30.3
master-2   Ready    master   74m   v1.30.3
worker-0   Ready    worker   11m   v1.30.3
worker-1   Ready    worker   11m   v1.30.3
Note: It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.
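The go-template filter in the bulk-approval commands selects every CSR whose status is still empty. As a minimal sketch of that same selection, assuming you saved the output of `oc get csr -o json` and parsed it into a dictionary, the logic could look like the following. The `pending_csr_names` helper and the sample data are hypothetical illustrations and are not part of OpenShift Container Platform.

```python
def pending_csr_names(csr_list):
    """Return the names of CSRs that have an empty status (not yet approved or denied)."""
    names = []
    for item in csr_list.get("items", []):
        # Mirrors the go-template test `{{if not .status}}`.
        if not item.get("status"):
            names.append(item["metadata"]["name"])
    return names


# Hypothetical, trimmed stand-in for parsed `oc get csr -o json` output.
SAMPLE_CSRS = {
    "items": [
        {"metadata": {"name": "csr-8b2br"}, "status": {}},
        {"metadata": {"name": "csr-8vnps"}, "status": {}},
        {"metadata": {"name": "csr-old"},
         "status": {"conditions": [{"type": "Approved"}]}},
    ]
}

print(pending_csr_names(SAMPLE_CSRS))
```

A helper like this only lists candidates. Each name still needs to be checked against the machines that you added before you approve it.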
3.6. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with z/VM
To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with z/VM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.
Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.
The following procedures explain how to create a RHCOS compute machine using a z/VM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.
To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
3.6.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Log in to the OpenShift CLI (oc).
You can check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{
 "release.openshift.io/architecture": "multi",
 "url": "https://access.redhat.com/errata/<errata_version>"
}
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{
 "url": "https://access.redhat.com/errata/<errata_version>"
}
Important: To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
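The two outputs in the verification step differ only in whether the release.openshift.io/architecture key is present and set to multi. If you want to script that check, a minimal sketch could look like the following. It assumes you captured the output of the oc adm release info command as a string; the `uses_multi_payload` function name is hypothetical.

```python
import json


def uses_multi_payload(metadata_json):
    """Return True if the release metadata marks the payload as multi-architecture."""
    metadata = json.loads(metadata_json)
    return metadata.get("release.openshift.io/architecture") == "multi"


# The two example outputs from the verification step, with the
# placeholder errata version left as-is.
MULTI = '{"release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>"}'
SINGLE = '{"url": "https://access.redhat.com/errata/<errata_version>"}'

print(uses_multi_payload(MULTI), uses_multi_payload(SINGLE))
```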
3.6.2. Creating RHCOS machines on IBM Z with z/VM
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines running on IBM Z® with z/VM and attach them to your existing cluster.
Prerequisites
- You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
- You have an HTTP or HTTPS server running on your provisioning machine that is accessible to the machines you create.
Procedure
Disable UDP aggregation.
Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the additional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.
Create a YAML file udp-aggregation-config.yaml with the following content:
apiVersion: v1
kind: ConfigMap
data:
  disable-udp-aggregation: "true"
metadata:
  name: udp-aggregation-config
  namespace: openshift-network-operator
Create the ConfigMap resource by running the following command:
$ oc create -f udp-aggregation-config.yaml
Extract the Ignition config file from the cluster by running the following command:
$ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
- Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:
$ curl -k http://<http_server>/worker.ign
Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
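The three download commands walk the same path in the stream metadata and differ only in the final artifact name. The following sketch shows that lookup in one place, assuming you saved the `stream` value of the coreos-bootimages config map as a JSON string. The `pxe_locations` function and the sample URLs are hypothetical placeholders, not real download locations.

```python
import json


def pxe_locations(stream_json, arch="s390x"):
    """Return the kernel, initramfs, and rootfs locations for one architecture."""
    stream = json.loads(stream_json)
    # Same path as the jq filters: .architectures.<arch>.artifacts.metal.formats.pxe
    pxe = stream["architectures"][arch]["artifacts"]["metal"]["formats"]["pxe"]
    return {name: pxe[name]["location"] for name in ("kernel", "initramfs", "rootfs")}


# Hypothetical, heavily trimmed stand-in for the stream metadata.
SAMPLE_PXE = {
    "kernel": {"location": "https://mirror.example.com/rhcos-live-kernel-s390x"},
    "initramfs": {"location": "https://mirror.example.com/rhcos-live-initramfs.s390x.img"},
    "rootfs": {"location": "https://mirror.example.com/rhcos-live-rootfs.s390x.img"},
}
SAMPLE_STREAM = json.dumps(
    {"architectures": {"s390x": {"artifacts": {"metal": {"formats": {"pxe": SAMPLE_PXE}}}}}}
)

for name, url in pxe_locations(SAMPLE_STREAM).items():
    print(name, url)
```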
- Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server that is accessible from the z/VM guest you want to add.
Create a parameter file for the z/VM guest. The following parameters are specific for the virtual machine:
- Optional: To specify a static IP address, add an ip= parameter with the following entries, with each separated by a colon:
  - The IP address for the machine.
  - An empty string.
  - The gateway.
  - The netmask.
  - The machine host and domain name in the form hostname.domainname. Omit this value to let RHCOS decide.
  - The network interface name. Omit this value to let RHCOS decide.
  - The value none.
- For coreos.inst.ignition_url=, specify the URL to the worker.ign file. Only HTTP and HTTPS protocols are supported.
- For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
- For installations on DASD-type disks, complete the following tasks:
  - For coreos.inst.install_dev=, specify /dev/dasda.
  - Use rd.dasd= to specify the DASD where RHCOS is to be installed.
  - You can adjust further parameters if required.
The following is an example parameter file, additional-worker-dasd.parm:
cio_ignore=all,!condev rd.neednet=1 \
console=ttysclp0 \
coreos.inst.install_dev=/dev/dasda \
coreos.inst.ignition_url=http://<http_server>/worker.ign \
coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
rd.dasd=0.0.3490 \
zfcp.allow_lun_scan=0
Write all options in the parameter file as a single line and make sure that you have no newline characters.
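The order of the colon-separated entries in the ip= parameter matters, and the empty second field is easy to drop by accident. The following sketch assembles the value in the documented order. It assumes an IPv4 address, and the `build_ip_param` function and the sample addresses are hypothetical.

```python
def build_ip_param(ip, gateway, netmask, hostname="", interface=""):
    """Build a static ip= kernel argument in the documented field order."""
    fields = [
        ip,         # The IP address for the machine
        "",         # An empty string
        gateway,    # The gateway
        netmask,    # The netmask
        hostname,   # hostname.domainname, or empty to let RHCOS decide
        interface,  # Interface name, or empty to let RHCOS decide
        "none",     # The value none
    ]
    return "ip=" + ":".join(fields)


print(build_ip_param("10.0.0.5", "10.0.0.1", "255.255.255.0", "worker-2.example.com"))
# prints ip=10.0.0.5::10.0.0.1:255.255.255.0:worker-2.example.com::none
```

The result has the same shape as the ip=<ip>::<gateway>:<netmask>:<hostname>::none entry in the example parameter file.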
- For installations on FCP-type disks, complete the following tasks:
  - Use rd.zfcp=<adapter>,<wwpn>,<lun> to specify the FCP disk where RHCOS is to be installed. For multipathing, repeat this step for each additional path.
    Note: When you install with multiple paths, you must enable multipathing directly after the installation, because enabling it at a later point in time can cause problems.
  - Set the install device as: coreos.inst.install_dev=/dev/sda.
    Note: If additional LUNs are configured with NPIV, FCP requires zfcp.allow_lun_scan=0. If you must enable zfcp.allow_lun_scan=1 because you use a CSI driver, for example, you must configure your NPIV so that each node cannot access the boot partition of another node.
  - You can adjust further parameters if required.
    Important: Additional postinstallation steps are required to fully enable multipathing. For more information, see "Enabling multipathing with kernel arguments on RHCOS" in Postinstallation machine configuration tasks.
The following is an example parameter file, additional-worker-fcp.parm, for a worker node with multipathing:
cio_ignore=all,!condev rd.neednet=1 \
console=ttysclp0 \
coreos.inst.install_dev=/dev/sda \
coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
coreos.inst.ignition_url=http://<http_server>/worker.ign \
ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
zfcp.allow_lun_scan=0 \
rd.zfcp=0.0.1987,0x50050763070bc5e3,0x4008400B00000000 \
rd.zfcp=0.0.19C7,0x50050763070bc5e3,0x4008400B00000000 \
rd.zfcp=0.0.1987,0x50050763071bc5e3,0x4008400B00000000 \
rd.zfcp=0.0.19C7,0x50050763071bc5e3,0x4008400B00000000
Write all options in the parameter file as a single line and make sure that you have no newline characters.
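The example parameter files are shown with backslash continuations, while the file that you transfer must be a single line. If you keep a wrapped copy for editing, a small helper can join it before transfer. The following is a sketch under that assumption and is not a supported tool; the `flatten_parm` name is hypothetical.

```python
def flatten_parm(text):
    """Join a backslash-wrapped parameter file into one line with no newlines."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        # Drop the trailing continuation backslash, if present.
        if line.endswith("\\"):
            line = line[:-1].rstrip()
        if line:
            parts.append(line)
    return " ".join(parts)


WRAPPED = "cio_ignore=all,!condev rd.neednet=1 \\\nconsole=ttysclp0 \\\nrd.dasd=0.0.3490\n"
print(flatten_parm(WRAPPED))
# prints cio_ignore=all,!condev rd.neednet=1 console=ttysclp0 rd.dasd=0.0.3490
```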
- Transfer the initramfs, kernel, parameter files, and RHCOS images to z/VM, for example, by using FTP. For details about how to transfer the files with FTP and boot from the virtual reader, see Installing under Z/VM.
Punch the files to the virtual reader of the z/VM guest virtual machine.
See PUNCH in IBM® Documentation.
Tip: You can use the CP PUNCH command or, if you use Linux, the vmur command to transfer files between two z/VM guest virtual machines.
- Log in to CMS on the bootstrap machine.
IPL the bootstrap machine from the reader by running the following command:
$ ipl c
See IPL in IBM® Documentation.
3.6.3. Approving the certificate signing requests for your machines
When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.
Prerequisites
- You added machines to your cluster.
Procedure
Confirm that the cluster recognizes the machines:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   63m   v1.30.3
master-1   Ready    master   63m   v1.30.3
master-2   Ready    master   64m   v1.30.3
The output lists all of the machines that you created.
Note: The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.
Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-8b2br   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8vnps   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
In this example, two machines are joining the cluster. You might see more approved CSRs in the list.
If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:
Note: Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.
Note: For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
<csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: Some Operators might not become available until some CSRs are approved.
Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE     REQUESTOR                                               CONDITION
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal   Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal   Pending
...
If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
<csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   73m   v1.30.3
master-1   Ready    master   73m   v1.30.3
master-2   Ready    master   74m   v1.30.3
worker-0   Ready    worker   11m   v1.30.3
worker-1   Ready    worker   11m   v1.30.3
Note: It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.
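The client and server requests in the example outputs can be told apart by their requestor: client requests come from the node-bootstrapper service account, and server requests come from a system:node:<node_name> identity. The following sketch classifies parsed CSR objects on that basis so that client requests can be handled first. It reads the requestor from the spec.username field of each CSR, the `classify_csr` helper is hypothetical, and the check is deliberately simpler than the full validation that an automatic approver must perform.

```python
BOOTSTRAPPER = "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"


def classify_csr(csr):
    """Classify a CSR as 'client', 'server', or 'other' based on its requestor."""
    username = csr.get("spec", {}).get("username", "")
    if username == BOOTSTRAPPER:
        return "client"
    if username.startswith("system:node:"):
        return "server"
    return "other"


# Hypothetical, trimmed CSR objects modeled on the example outputs.
SAMPLE = [
    {"metadata": {"name": "csr-bfd72"},
     "spec": {"username": "system:node:ip-10-0-50-126.us-east-2.compute.internal"}},
    {"metadata": {"name": "csr-8b2br"}, "spec": {"username": BOOTSTRAPPER}},
]

# Sort so that client requests come before server requests.
order = {"client": 0, "server": 1, "other": 2}
for csr in sorted(SAMPLE, key=lambda c: order[classify_csr(c)]):
    print(csr["metadata"]["name"], classify_csr(csr))
```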
3.7. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE in an LPAR
To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) in an LPAR, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.
Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.
The following procedures explain how to create a RHCOS compute machine using an LPAR instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.
To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.
3.7.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Log in to the OpenShift CLI (oc).
You can check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{
 "release.openshift.io/architecture": "multi",
 "url": "https://access.redhat.com/errata/<errata_version>"
}
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{
 "url": "https://access.redhat.com/errata/<errata_version>"
}
Important: To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
3.7.2. Creating RHCOS machines on IBM Z with z/VM
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines running on IBM Z® with z/VM and attach them to your existing cluster.
Prerequisites
- You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
- You have an HTTP or HTTPS server running on your provisioning machine that is accessible to the machines you create.
Procedure
Disable UDP aggregation.
Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the additional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.
Create a YAML file udp-aggregation-config.yaml with the following content:
apiVersion: v1
kind: ConfigMap
data:
  disable-udp-aggregation: "true"
metadata:
  name: udp-aggregation-config
  namespace: openshift-network-operator
Create the ConfigMap resource by running the following command:
$ oc create -f udp-aggregation-config.yaml
Extract the Ignition config file from the cluster by running the following command:
$ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
- Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:
$ curl -k http://<http_server>/worker.ign
Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
    | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
- Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server that is accessible from the z/VM guest you want to add.
Create a parameter file for the z/VM guest. The following parameters are specific for the virtual machine:
- Optional: To specify a static IP address, add an ip= parameter with the following entries, with each separated by a colon:
  - The IP address for the machine.
  - An empty string.
  - The gateway.
  - The netmask.
  - The machine host and domain name in the form hostname.domainname. Omit this value to let RHCOS decide.
  - The network interface name. Omit this value to let RHCOS decide.
  - The value none.
- For coreos.inst.ignition_url=, specify the URL to the worker.ign file. Only HTTP and HTTPS protocols are supported.
- For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
- For installations on DASD-type disks, complete the following tasks:
  - For coreos.inst.install_dev=, specify /dev/dasda.
  - Use rd.dasd= to specify the DASD where RHCOS is to be installed.
  - You can adjust further parameters if required.
The following is an example parameter file, additional-worker-dasd.parm:
cio_ignore=all,!condev rd.neednet=1 \
console=ttysclp0 \
coreos.inst.install_dev=/dev/dasda \
coreos.inst.ignition_url=http://<http_server>/worker.ign \
coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
rd.dasd=0.0.3490 \
zfcp.allow_lun_scan=0
Write all options in the parameter file as a single line and make sure that you have no newline characters.
- For installations on FCP-type disks, complete the following tasks:
  - Use rd.zfcp=<adapter>,<wwpn>,<lun> to specify the FCP disk where RHCOS is to be installed. For multipathing, repeat this step for each additional path.
    Note: When you install with multiple paths, you must enable multipathing directly after the installation, because enabling it at a later point in time can cause problems.
  - Set the install device as: coreos.inst.install_dev=/dev/sda.
    Note: If additional LUNs are configured with NPIV, FCP requires zfcp.allow_lun_scan=0. If you must enable zfcp.allow_lun_scan=1 because you use a CSI driver, for example, you must configure your NPIV so that each node cannot access the boot partition of another node.
  - You can adjust further parameters if required.
    Important: Additional postinstallation steps are required to fully enable multipathing. For more information, see "Enabling multipathing with kernel arguments on RHCOS" in Postinstallation machine configuration tasks.
The following is an example parameter file, additional-worker-fcp.parm, for a worker node with multipathing:
cio_ignore=all,!condev rd.neednet=1 \
console=ttysclp0 \
coreos.inst.install_dev=/dev/sda \
coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img \
coreos.inst.ignition_url=http://<http_server>/worker.ign \
ip=<ip>::<gateway>:<netmask>:<hostname>::none nameserver=<dns> \
rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 \
zfcp.allow_lun_scan=0 \
rd.zfcp=0.0.1987,0x50050763070bc5e3,0x4008400B00000000 \
rd.zfcp=0.0.19C7,0x50050763070bc5e3,0x4008400B00000000 \
rd.zfcp=0.0.1987,0x50050763071bc5e3,0x4008400B00000000 \
rd.zfcp=0.0.19C7,0x50050763071bc5e3,0x4008400B00000000
Write all options in the parameter file as a single line and make sure that you have no newline characters.
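In the multipathing example, the four rd.zfcp= entries are every combination of two adapters and two WWPNs for the same LUN. The following sketch generates such entries, assuming that every adapter can reach every WWPN; that full-mesh assumption might not hold for your SAN zoning, in which case you must list the valid paths explicitly. The `zfcp_params` function name is hypothetical.

```python
from itertools import product


def zfcp_params(adapters, wwpns, lun):
    """Return one rd.zfcp=<adapter>,<wwpn>,<lun> entry per adapter and WWPN pair."""
    # Iterate WWPNs in the outer position to match the order of the example file.
    return [f"rd.zfcp={adapter},{wwpn},{lun}" for wwpn, adapter in product(wwpns, adapters)]


# Values taken from the example parameter file.
entries = zfcp_params(
    adapters=["0.0.1987", "0.0.19C7"],
    wwpns=["0x50050763070bc5e3", "0x50050763071bc5e3"],
    lun="0x4008400B00000000",
)
print(" ".join(entries))
```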
- Transfer the initramfs, kernel, parameter files, and RHCOS images to z/VM, for example, by using FTP. For details about how to transfer the files with FTP and boot from the virtual reader, see Installing under Z/VM.
Punch the files to the virtual reader of the z/VM guest virtual machine.
See PUNCH in IBM® Documentation.
Tip: You can use the CP PUNCH command or, if you use Linux, the vmur command to transfer files between two z/VM guest virtual machines.
- Log in to CMS on the bootstrap machine.
IPL the bootstrap machine from the reader by running the following command:
$ ipl c
See IPL in IBM® Documentation.
3.7.3. Approving the certificate signing requests for your machines
When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.
Prerequisites
- You added machines to your cluster.
Procedure
Confirm that the cluster recognizes the machines:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   63m   v1.30.3
master-1   Ready    master   63m   v1.30.3
master-2   Ready    master   64m   v1.30.3
The output lists all of the machines that you created.
Note: The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.
Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-8b2br   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8vnps   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
In this example, two machines are joining the cluster. You might see more approved CSRs in the list.
If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:
Note: Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.
Note: For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
<csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: Some Operators might not become available until some CSRs are approved.
Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE     REQUESTOR                                               CONDITION
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal   Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal   Pending
...
If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
<csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   73m   v1.30.3
master-1   Ready    master   73m   v1.30.3
master-2   Ready    master   74m   v1.30.3
worker-0   Ready    worker   11m   v1.30.3
worker-1   Ready    worker   11m   v1.30.3
Note: It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.
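Because nodes can take a few minutes to become Ready, it can help to list only the nodes that are still waiting. The following sketch does that for parsed `oc get nodes -o json` output by looking for a condition of type Ready with status True on each node. The `not_ready_nodes` helper and the sample data are hypothetical.

```python
def not_ready_nodes(node_list):
    """Return the names of nodes that do not report a Ready=True condition."""
    waiting = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not ready:
            waiting.append(node["metadata"]["name"])
    return waiting


# Hypothetical, trimmed stand-in for parsed `oc get nodes -o json` output.
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "master-0"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "worker-0"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
        {"metadata": {"name": "worker-1"}, "status": {}},
    ]
}

print(not_ready_nodes(SAMPLE_NODES))
```

An empty result means that every listed node reports the Ready status.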
3.8. Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with RHEL KVM
To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with RHEL KVM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.
Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.
The following procedures explain how to create a RHCOS compute machine using a RHEL KVM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.
To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
3.8.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Log in to the OpenShift CLI (oc).
- Check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{ "url": "https://access.redhat.com/errata/<errata_version>" }
Important: To migrate your cluster so that it supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
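The check described in the verification step can also be scripted. The following is a minimal sketch, not part of the official procedure: the function name is_multi_arch_payload is hypothetical, and it assumes the release metadata is printed as JSON in the form shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper: reads release metadata JSON on stdin and succeeds
# (exit status 0) only if the architecture annotation is set to "multi".
is_multi_arch_payload() {
    # Strip whitespace so the match does not depend on JSON formatting.
    tr -d ' \n\t' | grep -q '"release.openshift.io/architecture":"multi"'
}

# Demonstration with the sample output from this section. On a live cluster,
# pipe in the output of:
#   oc adm release info -o jsonpath="{ .metadata.metadata}"
sample='{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }'
if printf '%s' "$sample" | is_multi_arch_payload; then
    echo "multi-architecture payload"
else
    echo "single-architecture payload"
fi
```

A script can use the exit status to stop before adding nodes of a second architecture to a cluster that has not been migrated.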
3.8.2. Creating RHCOS machines using virt-install
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using virt-install.
Prerequisites
- You have at least one LPAR running on RHEL 8.7 or later with KVM, referred to as the RHEL KVM host in this procedure.
- The KVM/QEMU hypervisor is installed on the RHEL KVM host.
- You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
- An HTTP or HTTPS server is set up.
Procedure
Disable UDP aggregation.
Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the additional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.
Create a YAML file udp-aggregation-config.yaml with the following content:
apiVersion: v1
kind: ConfigMap
data:
  disable-udp-aggregation: "true"
metadata:
  name: udp-aggregation-config
  namespace: openshift-network-operator
Create the ConfigMap resource by running the following command:
$ oc create -f udp-aggregation-config.yaml
Extract the Ignition config file from the cluster by running the following command:
$ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
- Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file. You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:
$ curl -k http://<HTTP_server>/worker.ign
Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
  | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
  | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')
$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
  | jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')
- Move the downloaded RHEL live kernel, initramfs, and rootfs files to an HTTP or HTTPS server before you launch virt-install.
Create the new KVM guest nodes using the RHEL kernel, initramfs, and Ignition files; the new disk image; and adjusted parm line arguments.
$ virt-install \
   --connect qemu:///system \
   --name <vm_name> \
   --autostart \
   --os-variant rhel9.4 \
   --cpu host \
   --vcpus <vcpus> \
   --memory <memory_mb> \
   --disk <vm_name>.qcow2,size=<image_size> \
   --network network=<virt_network_parm> \
   --location <media_location>,kernel=<rhcos_kernel>,initrd=<rhcos_initrd> \
   --extra-args "rd.neednet=1" \
   --extra-args "coreos.inst.install_dev=/dev/vda" \
   --extra-args "coreos.inst.ignition_url=http://<http_server>/worker.ign" \
   --extra-args "coreos.live.rootfs_url=http://<http_server>/rhcos-<version>-live-rootfs.<architecture>.img" \
   --extra-args "ip=<ip>::<gateway>:<netmask>:<hostname>::none" \
   --extra-args "nameserver=<dns>" \
   --extra-args "console=ttysclp0" \
   --noautoconsole \
   --wait
where:
- --os-variant: Specify the RHEL version for the RHCOS compute machine. rhel9.4 is the recommended version. To query the supported RHEL version of your operating system, run the following command:
$ osinfo-query os -f short-id
Note: The os-variant is case sensitive.
- --location: Specify the location of the kernel/initrd on the HTTP or HTTPS server.
- coreos.inst.ignition_url: Specify the location of the worker.ign config file. Only HTTP and HTTPS protocols are supported.
- coreos.live.rootfs_url: Specify the location of the rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.
- ip: Optional: For hostname, specify the fully qualified hostname of the client machine.
Note: If you are using HAProxy as a load balancer, update your HAProxy rules for ingress-router-443 and ingress-router-80 in the /etc/haproxy/haproxy.cfg configuration file.
- Continue to create more compute machines for your cluster.
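The ip= value passed through --extra-args in the virt-install command is easy to mistype because of its empty fields. The following sketch assembles it from its parts; the function name build_ip_arg and the sample addresses are hypothetical, and the field layout is copied from the command in this procedure.

```shell
#!/bin/sh
# Hypothetical helper: assemble the static "ip=" kernel argument in the
# layout used in this procedure: <ip>::<gateway>:<netmask>:<hostname>::none
# Arguments: <ip> <gateway> <netmask> [hostname]
build_ip_arg() {
    ip=$1 gateway=$2 netmask=$3 hostname=${4:-}
    printf 'ip=%s::%s:%s:%s::none\n' "$ip" "$gateway" "$netmask" "$hostname"
}

# Sample values from documentation address ranges, not from a real cluster.
build_ip_arg 192.0.2.10 192.0.2.1 255.255.255.0 worker-0.example.com
```

The hostname argument is optional, which matches the description of the ip argument; omitting it leaves that field empty.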
3.8.3. Approving the certificate signing requests for your machines
When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.
Prerequisites
- You added machines to your cluster.
Procedure
Confirm that the cluster recognizes the machines:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   63m   v1.30.3
master-1   Ready    master   63m   v1.30.3
master-2   Ready    master   64m   v1.30.3
The output lists all of the machines that you created.
Note: The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.
Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-8b2br   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8vnps   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
In this example, two machines are joining the cluster. You might see more approved CSRs in the list.
If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:
Note: Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the kubelet requests a new certificate with identical parameters.
Note: For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
where <csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: Some Operators might not become available until some CSRs are approved.
Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE     REQUESTOR                                               CONDITION
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal   Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal   Pending
...
If the remaining CSRs are not approved and are in the Pending status, approve the CSRs for your cluster machines:
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
where <csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   73m   v1.30.3
master-1   Ready    master   73m   v1.30.3
master-2   Ready    master   74m   v1.30.3
worker-0   Ready    worker   11m   v1.30.3
worker-1   Ready    worker   11m   v1.30.3
Note: It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.
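Because the client and server requests are approved in two passes, it can help to list only the pending CSRs from one kind of requestor before approving them. The following is a minimal sketch under stated assumptions: the function name pending_csrs is hypothetical, and it assumes the tabular oc get csr layout shown in this section, with the CSR name in the first column and the condition in the last column.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get csr` output on stdin and print the names
# of Pending CSRs whose row contains the given requestor text.
pending_csrs() {
    # NR > 1 skips the header row; $NF is the CONDITION column.
    awk -v who="$1" 'NR > 1 && $NF == "Pending" && index($0, who) { print $1 }'
}

# Demonstration with rows taken from the example outputs in this section.
sample='NAME        AGE     REQUESTOR                                                                   CONDITION
csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending'
printf '%s\n' "$sample" | pending_csrs "system:node:"

# On a live cluster the names could be passed to the approve command:
#   oc get csr | pending_csrs "system:node:" | xargs --no-run-if-empty oc adm certificate approve
```

This filter matches on text only. It does not confirm the identity of the node, so that check is still required before approving a request.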
Additional information
3.9. Creating a cluster with multi-architecture compute machines on IBM Power
To create a cluster with multi-architecture compute machines on IBM Power® (ppc64le), you must have an existing single-architecture (x86_64) cluster. You can then add ppc64le compute machines to your OpenShift Container Platform cluster.
Before you can add ppc64le nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.
The following procedures explain how to create a RHCOS compute machine using an ISO image or network PXE booting. This allows you to add ppc64le nodes to your cluster and deploy a cluster with multi-architecture compute machines.
To create an IBM Power® (ppc64le) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Power®. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.
Before adding a secondary architecture node to your cluster, it is recommended that you install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig object. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
3.9.1. Verifying cluster compatibility
Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.
Prerequisites
- You installed the OpenShift CLI (oc).
When using multiple architectures, hosts for OpenShift Container Platform nodes must share the same storage layer. If they do not have the same storage layer, use a storage provider such as nfs-provisioner.
You should limit the number of network hops between the compute and control plane as much as possible.
Procedure
- Log in to the OpenShift CLI (oc).
- Check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
Verification
If you see the following output, your cluster is using the multi-architecture payload:
{ "release.openshift.io/architecture": "multi", "url": "https://access.redhat.com/errata/<errata_version>" }
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, your cluster is not using the multi-architecture payload:
{ "url": "https://access.redhat.com/errata/<errata_version>" }
Important: To migrate your cluster so that it supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.
3.9.2. Creating RHCOS machines using an ISO image
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using an ISO image to create the machines.
Prerequisites
- Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
- You must have the OpenShift CLI (oc) installed.
Procedure
Extract the Ignition config file from the cluster by running the following command:
$ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign
- Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file. You can validate that the Ignition config file is available at the URL. The following example gets the Ignition config file for the compute node:
$ curl -k http://<HTTP_server>/worker.ign
You can access the ISO image for booting your new machine by running the following command:
RHCOS_VHD_ORIGIN_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' | jq -r '.architectures.<architecture>.artifacts.metal.formats.iso.disk.location')
Use the ISO file to install RHCOS on more compute machines. Use the same method that you used when you created machines before you installed the cluster:
- Burn the ISO image to a disk and boot it directly.
- Use ISO redirection with a LOM interface.
Boot the RHCOS ISO image without specifying any options, or interrupting the live boot sequence. Wait for the installer to boot into a shell prompt in the RHCOS live environment.
Note: You can interrupt the RHCOS installation boot process to add kernel arguments. However, for this ISO procedure you must use the coreos-installer command as outlined in the following steps, instead of adding kernel arguments.
Run the coreos-installer command and specify the options that meet your installation requirements. At a minimum, you must specify the URL that points to the Ignition config file for the node type, and the device that you are installing to:
$ sudo coreos-installer install --ignition-url=http://<HTTP_server>/<node_type>.ign <device> --ignition-hash=sha512-<digest>
where:
- sudo: You must run the coreos-installer command by using sudo, because the core user does not have the required root privileges to perform the installation.
- --ignition-hash: This option is required when the Ignition config file is obtained through an HTTP URL to validate the authenticity of the Ignition config file on the cluster node. <digest> is the Ignition config file SHA512 digest obtained in a preceding step.
Note: If you want to provide your Ignition config files through an HTTPS server that uses TLS, you can add the internal certificate authority (CA) to the system trust store before running coreos-installer.
The following example initializes a bootstrap node installation to the /dev/sda device. The Ignition config file for the bootstrap node is obtained from an HTTP web server with the IP address 192.168.1.2:
$ sudo coreos-installer install --ignition-url=http://192.168.1.2:80/installation_directory/bootstrap.ign /dev/sda --ignition-hash=sha512-a5a2d43879223273c9b60af66b44202a1d1248fc01cf156c46d4a79f552b6bad47bc8cc78ddf0116e80c59d2ea9e32ba53bc807afbca581aa059311def2c3e3b
Monitor the progress of the RHCOS installation on the console of the machine.
Important: Ensure that the installation is successful on each node before commencing with the OpenShift Container Platform installation. Observing the installation process can also help to determine the cause of RHCOS installation issues that might arise.
- Continue to create more compute machines for your cluster.
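The --ignition-hash option in this procedure expects the SHA512 digest of the Ignition config file. One way to produce that value is sketched below. The function name ignition_hash is hypothetical, and the sketch assumes the sha512sum utility from GNU coreutils is available on the machine that holds the file.

```shell
#!/bin/sh
# Hypothetical helper: print the --ignition-hash value for a local file.
ignition_hash() {
    # sha512sum prints "<hex digest>  <file name>"; keep only the digest.
    printf 'sha512-%s\n' "$(sha512sum "$1" | awk '{ print $1 }')"
}

# Demonstration on a throwaway file; in practice, pass the worker.ign file.
tmp=$(mktemp)
printf 'example content' > "$tmp"
ignition_hash "$tmp"
rm -f "$tmp"
```

Compute the digest from the same file that you uploaded to the HTTP server, because any difference in content changes the digest.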
3.9.3. Creating RHCOS machines by PXE or iPXE booting
You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your bare metal cluster by using PXE or iPXE booting.
Prerequisites
- Obtain the URL of the Ignition config file for the compute machines for your cluster. You uploaded this file to your HTTP server during installation.
- Obtain the URLs of the RHCOS ISO image, compressed metal BIOS, kernel, and initramfs files that you uploaded to your HTTP server during cluster installation.
- You have access to the PXE booting infrastructure that you used to create the machines for your OpenShift Container Platform cluster during installation. The machines must boot from their local disks after RHCOS is installed on them.
- If you use UEFI, you have access to the grub.conf file that you modified during OpenShift Container Platform installation.
Procedure
Confirm that your PXE or iPXE installation for the RHCOS images is correct.
For PXE:
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture>
    APPEND initrd=http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img
where:
- KERNEL: Specify the location of the live kernel file that you uploaded to your HTTP server.
- APPEND: Specify locations of the RHCOS files that you uploaded to your HTTP server. The initrd parameter value is the location of the live initramfs file, the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file, and the coreos.live.rootfs_url parameter value is the location of the live rootfs file. The coreos.inst.ignition_url and coreos.live.rootfs_url parameters only support HTTP and HTTPS.
Note: This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the APPEND line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?.
For iPXE (x86_64 + ppc64le):
kernel http://<HTTP_server>/rhcos-<version>-live-kernel-<architecture> initrd=main coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign
initrd --name main http://<HTTP_server>/rhcos-<version>-live-initramfs.<architecture>.img
boot
where:
- kernel line: Specify the locations of the RHCOS files that you uploaded to your HTTP server. The kernel parameter value is the location of the kernel file, the initrd=main argument is needed for booting on UEFI systems, the coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file.
- kernel line, multiple NICs: If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
- initrd line: Specify the location of the initramfs file that you uploaded to your HTTP server.
Note: This configuration does not enable serial console access on machines with a graphical console. To configure a different console, add one or more console= arguments to the kernel line. For example, add console=tty0 console=ttyS0 to set the first PC serial port as the primary console and the graphical console as a secondary console. For more information, see How does one set up a serial terminal and/or console in Red Hat Enterprise Linux? and "Enabling the serial console for PXE and ISO installation" in the "Advanced RHCOS installation configuration" section.
Note: To network boot the CoreOS kernel on ppc64le architecture, you need to use a version of iPXE built with the IMAGE_GZIP option enabled. See IMAGE_GZIP option in iPXE.
For PXE (with UEFI and GRUB as second stage) on ppc64le:
menuentry 'Install CoreOS' {
    linux rhcos-<version>-live-kernel-<architecture> coreos.live.rootfs_url=http://<HTTP_server>/rhcos-<version>-live-rootfs.<architecture>.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<HTTP_server>/worker.ign
    initrd rhcos-<version>-live-initramfs.<architecture>.img
}
where:
- linux line: Specify the locations of the RHCOS files that you uploaded to your HTTP/TFTP server. The kernel parameter value is the location of the kernel file on your TFTP server. The coreos.live.rootfs_url parameter value is the location of the rootfs file, and the coreos.inst.ignition_url parameter value is the location of the worker Ignition config file on your HTTP server.
- linux line, multiple NICs: If you use multiple NICs, specify a single interface in the ip option. For example, to use DHCP on a NIC that is named eno1, set ip=eno1:dhcp.
- initrd line: Specify the location of the initramfs file that you uploaded to your TFTP server.
- Use the PXE or iPXE infrastructure to create the required compute machines for your cluster.
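When several machines or architectures share one boot server, the PXE entry from this procedure can be generated rather than edited by hand. The following sketch prints that entry from three arguments. The function name pxe_entry and the sample argument values are hypothetical, and the file names assume that the uploaded files follow the rhcos-<version>-live-* naming pattern used in this section.

```shell
#!/bin/sh
# Hypothetical helper: print a PXELINUX entry for a compute machine.
# Arguments: <http_server> <rhcos_version> <architecture>
pxe_entry() {
    base="http://$1/rhcos-$2-live"
    cat <<EOF
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL ${base}-kernel-$3
    APPEND initrd=${base}-initramfs.$3.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://$1/worker.ign coreos.live.rootfs_url=${base}-rootfs.$3.img
EOF
}

# Placeholder values for illustration; substitute your own server and files.
pxe_entry pxe.example.com 4.17.0 ppc64le
```

Review the generated entry against the files that are actually present on the HTTP server before using it, because the install device and file names are fixed by the template.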
3.9.4. Approving the certificate signing requests for your machines
When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.
Prerequisites
- You added machines to your cluster.
Procedure
Confirm that the cluster recognizes the machines:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   63m   v1.30.3
master-1   Ready    master   63m   v1.30.3
master-2   Ready    master   64m   v1.30.3
The output lists all of the machines that you created.
Note: The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.
Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-8b2br   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8vnps   15m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
In this example, two machines are joining the cluster. You might see more approved CSRs in the list.
If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:
Note: Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the kubelet requests a new certificate with identical parameters.
Note: For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
where <csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: Some Operators might not become available until some CSRs are approved.
Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:
$ oc get csr
Example output
NAME        AGE     REQUESTOR                                               CONDITION
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal   Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal   Pending
...
If the remaining CSRs are not approved and are in the Pending status, approve the CSRs for your cluster machines:
To approve them individually, run the following command for each valid CSR:
$ oc adm certificate approve <csr_name>
where <csr_name> is the name of a CSR from the list of current CSRs.
To approve all pending CSRs, run the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:
$ oc get nodes -o wide
Example output
NAME               STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                  CONTAINER-RUNTIME
worker-0-ppc64le   Ready    worker                 42d   v1.30.3   192.168.200.21   <none>        Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.ppc64le   cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
worker-1-ppc64le   Ready    worker                 42d   v1.30.3   192.168.200.20   <none>        Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.ppc64le   cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
master-0-x86       Ready    control-plane,master   75d   v1.30.3   10.248.0.38      10.248.0.38   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
master-1-x86       Ready    control-plane,master   75d   v1.30.3   10.248.0.39      10.248.0.39   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
master-2-x86       Ready    control-plane,master   75d   v1.30.3   10.248.0.40      10.248.0.40   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
worker-0-x86       Ready    worker                 75d   v1.30.3   10.248.0.43      10.248.0.43   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
worker-1-x86       Ready    worker                 75d   v1.30.3   10.248.0.44      10.248.0.44   Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)   5.14.0-284.34.1.el9_2.x86_64    cri-o://1.30.3-3.rhaos4.15.gitb36169e.el9
Note: It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.
Additional information
3.10. Managing a cluster with multi-architecture compute machines
Managing a cluster that has nodes with multiple architectures requires you to consider node architecture as you monitor the cluster and manage your workloads. This requires you to take additional considerations into account when you configure cluster resource requirements and behavior, or schedule workloads in a multi-architecture cluster.
3.10.1. Scheduling workloads on clusters with multi-architecture compute machines
When you deploy workloads on a cluster with compute nodes that use different architectures, you must align pod architecture with the architecture of the underlying node. Your workload may also require additional configuration to particular resources depending on the underlying node architecture.
You can use the Multiarch Tuning Operator to enable architecture-aware scheduling of workloads on clusters with multi-architecture compute machines. The Multiarch Tuning Operator implements additional scheduler predicates in the pod specifications based on the architectures that the pods can support at creation time.
3.10.1.1. Sample multi-architecture node workload deployments
Scheduling a workload to an appropriate node based on architecture works in the same way as scheduling based on any other node characteristic. Consider the following options when determining how to schedule your workloads.
- Using nodeAffinity to schedule workloads on nodes with specific architectures
To allow a workload to be scheduled only on nodes with architectures supported by its images, set the spec.affinity.nodeAffinity field in your pod's template specification.
apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
For values, specify the supported architectures. Valid values include amd64, arm64, or both values.
- Tainting each node for a specific architecture
You can taint a node to prevent workloads that are incompatible with its architecture from being scheduled on it. When your cluster uses a MachineSet object, you can add parameters to the .spec.template.spec.taints field to avoid workloads being scheduled on nodes with non-supported architectures.
Before you add a taint to a node, you must scale down the MachineSet object or remove existing available machines. For more information, see Modifying a compute machine set.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      taints:
      - effect: NoSchedule
        key: multiarch.openshift.io/arch
        value: arm64
You can also set a taint on a specific node by running the following command:
$ oc adm taint nodes <node-name> multiarch.openshift.io/arch=arm64:NoSchedule
- Creating a default toleration in a namespace
When a node or machine set has a taint, only workloads that tolerate that taint can be scheduled. You can annotate a namespace so all of the workloads get the same default toleration by running the following command:
$ oc annotate namespace my-namespace \
  'scheduler.alpha.kubernetes.io/defaultTolerations'='[{"operator": "Exists", "effect": "NoSchedule", "key": "multiarch.openshift.io/arch"}]'
- Tolerating architecture taints in workloads
When a node or machine set has a taint, only workloads that tolerate that taint can be scheduled. You can configure your workload with a toleration so that it is scheduled on nodes with specific architecture taints.
apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      tolerations:
      - key: "multiarch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"
This example deployment can be scheduled on nodes and machine sets that have the multiarch.openshift.io/arch=arm64 taint specified.
- Using node affinity with taints and tolerations
When a scheduler computes the set of nodes to schedule a pod, tolerations can broaden the set while node affinity restricts the set. If you set a taint on nodes that have a specific architecture, you must also add a toleration to workloads that you want to be scheduled there.
apiVersion: apps/v1
kind: Deployment
metadata:
  # ...
spec:
  # ...
  template:
    # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      tolerations:
      - key: "multiarch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"
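The way tolerations broaden the candidate node set while node affinity restricts it can be sketched in a few lines of Python. This is an illustrative model only, not scheduler code: the dictionary shapes, the allowed_archs key (which stands in for the kubernetes.io/arch values of the node affinity rule), and the function names are assumptions made for this example.

```python
ARCH_TAINT_KEY = "multiarch.openshift.io/arch"


def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    # A toleration without an effect matches taints of every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key combined with Exists tolerates every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal (the default operator): key and value must both match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(pod, node):
    """Hypothetical helper: hard taints must be tolerated AND the node
    architecture must be allowed by the pod's required node affinity."""
    for taint in node.get("taints", []):
        if taint["effect"] in ("NoSchedule", "NoExecute") and not any(
            tolerates(t, taint) for t in pod.get("tolerations", [])
        ):
            return False
    allowed = pod.get("allowed_archs")  # None means no affinity restriction
    return allowed is None or node["arch"] in allowed


# Example data mirroring the deployment above (node layout is assumed).
arm_node = {"arch": "arm64", "taints": [
    {"key": ARCH_TAINT_KEY, "value": "arm64", "effect": "NoSchedule"}]}
amd_node = {"arch": "amd64", "taints": []}
s390x_node = {"arch": "s390x", "taints": []}

multiarch_pod = {
    "allowed_archs": ["amd64", "arm64"],
    "tolerations": [{"key": ARCH_TAINT_KEY, "value": "arm64",
                     "operator": "Equal", "effect": "NoSchedule"}],
}
plain_pod = {}  # no toleration and no node affinity
```

In this model the example deployment fits both the tainted arm64 node and the untainted amd64 node, is kept off s390x by the affinity rule, and a pod without the toleration is kept off the tainted arm64 node.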
3.10.2. Enabling 64k pages on the Red Hat Enterprise Linux CoreOS (RHCOS) kernel
You can enable the 64k memory page size in the Red Hat Enterprise Linux CoreOS (RHCOS) kernel on the 64-bit ARM compute machines in your cluster. The 64k page size kernel can be used for large GPU or high-memory workloads. The Machine Config Operator (MCO) applies this change by using a machine config pool to update the kernel, so you must dedicate a machine config pool to the ARM64 nodes that run the 64k page size kernel.
Using 64k pages is exclusive to 64-bit ARM architecture compute nodes or clusters installed on 64-bit ARM machines. If you configure the 64k pages kernel on a machine config pool that uses 64-bit x86 machines, the machine config pool and the MCO become degraded.
Prerequisites
- You installed the OpenShift CLI (oc).
- You created a cluster with compute nodes of different architecture on one of the supported platforms.
Procedure
Label the nodes where you want to run the 64k page size kernel:
$ oc label node <node_name> <label>
Example command
$ oc label node worker-arm64-01 node-role.kubernetes.io/worker-64k-pages=
Create a machine config pool that contains the worker role that uses the ARM64 architecture and the worker-64k-pages role:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-64k-pages
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - worker-64k-pages
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-64k-pages: ""
      kubernetes.io/arch: arm64
Create a machine config that enables the 64k page size kernel on your compute nodes by setting the kernelType parameter to 64k-pages:
$ oc create -f <filename>.yaml
Example MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker-64k-pages"
  name: 99-worker-64kpages
spec:
  kernelType: 64k-pages
Note
The 64k-pages type is supported on only 64-bit ARM architecture based compute nodes. The realtime type is supported on only 64-bit x86 architecture based compute nodes.
Verification
To view your new worker-64k-pages machine config pool, run the following command:
$ oc get mcp
Example output
NAME               CONFIG                                                       UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master             rendered-master-9d55ac9a91127c36314e1efe7d77fbf8             True      False      False      3              3                   3                     0                      361d
worker             rendered-worker-e7b61751c4a5b7ff995d64b967c421ff             True      False      False      7              7                   7                     0                      361d
worker-64k-pages   rendered-worker-64k-pages-e7b61751c4a5b7ff995d64b967c421ff   True      False      False      2              2                   2                     0                      35m
3.10.3. Importing manifest lists in image streams on your multi-architecture compute machines
On an OpenShift Container Platform 4.17 cluster with multi-architecture compute machines, the image streams in the cluster do not import manifest lists automatically. You must manually change the default importMode option to the PreserveOriginal option in order to import the manifest list.
Prerequisites
- You installed the OpenShift Container Platform CLI (oc).
Procedure
The following example command shows how to patch the ImageStream cli-artifacts so that the cli-artifacts:latest image stream tag is imported as a manifest list.
$ oc patch is/cli-artifacts -n openshift -p '{"spec":{"tags":[{"name":"latest","importPolicy":{"importMode":"PreserveOriginal"}}]}}'
Verification
You can check that the manifest lists imported properly by inspecting the image stream tag. The following command will list the individual architecture manifests for a particular tag.
$ oc get istag cli-artifacts:latest -n openshift -oyaml
If the dockerImageManifests object is present, then the manifest list import was successful.
Example output of the dockerImageManifests object
dockerImageManifests:
- architecture: amd64
  digest: sha256:16d4c96c52923a9968fbfa69425ec703aff711f1db822e4e9788bf5d2bee5d77
  manifestSize: 1252
  mediaType: application/vnd.docker.distribution.manifest.v2+json
  os: linux
- architecture: arm64
  digest: sha256:6ec8ad0d897bcdf727531f7d0b716931728999492709d19d8b09f0d90d57f626
  manifestSize: 1252
  mediaType: application/vnd.docker.distribution.manifest.v2+json
  os: linux
- architecture: ppc64le
  digest: sha256:65949e3a80349cdc42acd8c5b34cde6ebc3241eae8daaeea458498fedb359a6a
  manifestSize: 1252
  mediaType: application/vnd.docker.distribution.manifest.v2+json
  os: linux
- architecture: s390x
  digest: sha256:75f4fa21224b5d5d511bea8f92dfa8e1c00231e5c81ab95e83c3013d245d1719
  manifestSize: 1252
  mediaType: application/vnd.docker.distribution.manifest.v2+json
  os: linux
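If you post-process the command output in a script, the check described above can be sketched as follows. This is a minimal illustration that assumes you have already parsed the YAML into a Python dictionary that holds the dockerImageManifests list shown in the example output; the function name and the sample data are hypothetical.

```python
def imported_architectures(parsed):
    """Return the sorted architectures found in a parsed object that
    contains a dockerImageManifests list, or an empty list if the key
    is missing (which suggests the manifest list was not imported)."""
    manifests = parsed.get("dockerImageManifests") or []
    return sorted(entry["architecture"] for entry in manifests)


# Sample data shaped like the example output above (digests omitted).
sample = {
    "dockerImageManifests": [
        {"architecture": "amd64", "os": "linux"},
        {"architecture": "arm64", "os": "linux"},
        {"architecture": "ppc64le", "os": "linux"},
        {"architecture": "s390x", "os": "linux"},
    ]
}

archs = imported_architectures(sample)
print(archs if archs else "manifest list not imported")
```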
3.11. Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator
The Multiarch Tuning Operator optimizes workload management within multi-architecture clusters and in single-architecture clusters transitioning to multi-architecture environments.
Architecture-aware workload scheduling allows the scheduler to place pods onto nodes that match the architecture of the pod images.
By default, the scheduler does not consider the architecture of a pod’s container images when determining the placement of new pods onto nodes.
To enable architecture-aware workload scheduling, you must create the ClusterPodPlacementConfig
object. When you create the ClusterPodPlacementConfig
object, the Multiarch Tuning Operator deploys the necessary operands to support architecture-aware workload scheduling.
When a pod is created, the operands perform the following actions:
- Add the multiarch.openshift.io/scheduling-gate scheduling gate that prevents the scheduling of the pod.
- Compute a scheduling predicate that includes the supported architecture values for the kubernetes.io/arch label.
- Integrate the scheduling predicate as a nodeAffinity requirement in the pod specification.
- Remove the scheduling gate from the pod.
Note the following operand behaviors:
- If the nodeSelector field is already configured with the kubernetes.io/arch label for a workload, the operand does not update the nodeAffinity field for that workload.
- If the nodeSelector field is not configured with the kubernetes.io/arch label for a workload, the operand updates the nodeAffinity field for that workload. However, in that nodeAffinity field, the operand updates only the node selector terms that are not configured with the kubernetes.io/arch label.
- If the nodeName field is already set, the Multiarch Tuning Operator does not process the pod.
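The operand behaviors listed above can be modeled with a short Python sketch. It is not the Operator's implementation: the function names are hypothetical, the scheduling gate handling is left out, and the sketch assumes that the supported architectures of a pod are the intersection of the architectures supported by each of its container images.

```python
ARCH_LABEL = "kubernetes.io/arch"
REQUIRED = "requiredDuringSchedulingIgnoredDuringExecution"


def common_architectures(per_image_archs):
    """Assumption: a pod can run only where every image can run."""
    sets = [set(archs) for archs in per_image_archs]
    return sorted(set.intersection(*sets)) if sets else []


def apply_arch_affinity(pod_spec, per_image_archs):
    """Mutate pod_spec in place; return True if the pod was processed."""
    if pod_spec.get("nodeName"):
        return False  # pod is already bound to a node
    if ARCH_LABEL in (pod_spec.get("nodeSelector") or {}):
        return False  # the workload already pins the architecture
    archs = common_architectures(per_image_archs)
    required = (pod_spec.setdefault("affinity", {})
                .setdefault("nodeAffinity", {})
                .setdefault(REQUIRED, {}))
    terms = required.setdefault("nodeSelectorTerms", [])
    if not terms:
        terms.append({"matchExpressions": []})
    for term in terms:
        expressions = term.setdefault("matchExpressions", [])
        # Leave terms that already constrain the architecture untouched.
        if not any(e.get("key") == ARCH_LABEL for e in expressions):
            expressions.append(
                {"key": ARCH_LABEL, "operator": "In", "values": list(archs)})
    return True


# Two container images: one built for three architectures, one for two.
spec = {}
apply_arch_affinity(spec, [["amd64", "arm64", "s390x"], ["arm64", "amd64"]])
print(spec["affinity"]["nodeAffinity"][REQUIRED]["nodeSelectorTerms"])
```

In this run the resulting requirement lists amd64 and arm64, because those are the only architectures that both hypothetical images support.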
3.11.1. Installing the Multiarch Tuning Operator by using the CLI
You can install the Multiarch Tuning Operator by using the OpenShift CLI (oc).
Prerequisites
- You have installed oc.
- You have logged in to oc as a user with cluster-admin privileges.
Procedure
Create a new project named openshift-multiarch-tuning-operator by running the following command:
$ oc create ns openshift-multiarch-tuning-operator
Create an OperatorGroup object:
Create a YAML file with the configuration for creating an OperatorGroup object.
Example YAML configuration for creating an OperatorGroup object
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-multiarch-tuning-operator
  namespace: openshift-multiarch-tuning-operator
spec: {}
Create the OperatorGroup object by running the following command:
$ oc create -f <file_name> 1
- 1: Replace <file_name> with the name of the YAML file that contains the OperatorGroup object configuration.
Create a Subscription object:
Create a YAML file with the configuration for creating a Subscription object.
Example YAML configuration for creating a Subscription object
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-multiarch-tuning-operator
  namespace: openshift-multiarch-tuning-operator
spec:
  channel: stable
  name: multiarch-tuning-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: multiarch-tuning-operator.v1.0.0
Create the Subscription object by running the following command:
$ oc create -f <file_name> 1
- 1: Replace <file_name> with the name of the YAML file that contains the Subscription object configuration.
For more details about configuring the Subscription object and OperatorGroup object, see "Installing from OperatorHub using the CLI".
Verification
To verify that the Multiarch Tuning Operator is installed, run the following command:
$ oc get csv -n openshift-multiarch-tuning-operator
Example output
NAME                               DISPLAY                     VERSION   REPLACES                           PHASE
multiarch-tuning-operator.v1.0.0   Multiarch Tuning Operator   1.0.0     multiarch-tuning-operator.v0.9.0   Succeeded
The installation is successful if the Operator is in Succeeded phase.
Optional: To verify that the OperatorGroup object is created, run the following command:
$ oc get operatorgroup -n openshift-multiarch-tuning-operator
Example output
NAME                                        AGE
openshift-multiarch-tuning-operator-q8zbb   133m
Optional: To verify that the Subscription object is created, run the following command:
$ oc get subscription -n openshift-multiarch-tuning-operator
Example output
NAME                        PACKAGE                     SOURCE             CHANNEL
multiarch-tuning-operator   multiarch-tuning-operator   redhat-operators   stable
3.11.2. Installing the Multiarch Tuning Operator by using the web console
You can install the Multiarch Tuning Operator by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Operators → OperatorHub.
- Enter Multiarch Tuning Operator in the search field.
- Click Multiarch Tuning Operator.
- Select the Multiarch Tuning Operator version from the Version list.
- Click Install.
- Set the following options on the Operator Installation page:
  - Set Update Channel to stable.
  - Set Installation Mode to All namespaces on the cluster.
  - Set Installed Namespace to Operator recommended Namespace or Select a Namespace.
    The recommended Operator namespace is openshift-multiarch-tuning-operator. If the openshift-multiarch-tuning-operator namespace does not exist, it is created during the Operator installation.
    If you select Select a Namespace, you must select a namespace for the Operator from the Select Project list.
  - Set Update approval to Automatic or Manual.
    If you select Automatic updates, Operator Lifecycle Manager (OLM) automatically updates the running instance of the Multiarch Tuning Operator without any intervention.
    If you select Manual updates, OLM creates an update request. As a cluster administrator, you must manually approve the update request to update the Multiarch Tuning Operator to a newer version.
- Optional: Select the Enable Operator recommended cluster monitoring on this Namespace checkbox.
- Click Install.
Verification
- Navigate to Operators → Installed Operators.
- Verify that the Multiarch Tuning Operator is listed with the Status field as Succeeded in the openshift-multiarch-tuning-operator namespace.
3.11.3. Multiarch Tuning Operator pod labels and architecture support overview
After installing the Multiarch Tuning Operator, you can verify the multi-architecture support for workloads in your cluster. You can identify and manage pods based on their architecture compatibility by using the pod labels. These labels are automatically set on the newly created pods to provide insights into their architecture support.
The following table describes the labels that the Multiarch Tuning Operator adds when you create a pod:
Label | Description |
---|---|
| The pod supports multiple architectures. |
| The pod supports only a single architecture. |
|
The pod supports the |
|
The pod supports the |
|
The pod supports the |
|
The pod supports the |
| The Operator has set the node affinity requirement for the architecture. |
| The Operator did not set the node affinity requirement. For example, when the pod already has a node affinity for the architecture, the Multiarch Tuning Operator adds this label to the pod. |
| The pod is gated. |
| The pod gate has been removed. |
| An error has occurred while building the node affinity requirements. |
3.11.4. Creating the ClusterPodPlacementConfig object
After installing the Multiarch Tuning Operator, you must create the ClusterPodPlacementConfig
object. When you create this object, the Multiarch Tuning Operator deploys an operand that enables architecture-aware workload scheduling.
You can create only one instance of the ClusterPodPlacementConfig
object.
Example ClusterPodPlacementConfig object configuration
apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
  name: cluster 1
spec:
  logVerbosityLevel: Normal 2
  namespaceSelector: 3
    matchExpressions:
      - key: multiarch.openshift.io/exclude-pod-placement
        operator: DoesNotExist
- 1: You must set this field value to cluster.
- 2: Optional: You can set the field value to Normal, Debug, Trace, or TraceAll. The value is set to Normal by default.
- 3: Optional: You can configure the namespaceSelector to select the namespaces in which the Multiarch Tuning Operator’s pod placement operand must process the nodeAffinity of the pods. All namespaces are considered by default.
In this example, the operator
field value is set to DoesNotExist
. Therefore, if the key
field value (multiarch.openshift.io/exclude-pod-placement
) is set as a label in a namespace, the operand does not process the nodeAffinity
of the pods in that namespace. Instead, the operand processes the nodeAffinity
of the pods in namespaces that do not contain the label.
If you want the operand to process the nodeAffinity
of the pods only in specific namespaces, you can configure the namespaceSelector
as follows:
namespaceSelector:
  matchExpressions:
    - key: multiarch.openshift.io/include-pod-placement
      operator: Exists
In this example, the operator
field value is set to Exists
. Therefore, the operand processes the nodeAffinity
of the pods only in namespaces that contain the multiarch.openshift.io/include-pod-placement
label.
This Operator excludes pods in namespaces starting with kube-
. It also excludes pods that are expected to be scheduled on control plane nodes.
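The effect of the two namespaceSelector examples can be illustrated with a small Python model of label selector evaluation. The function names are hypothetical and the model is a simplified sketch: it covers the matchExpressions operators and the kube- prefix exclusion described above, but not the exclusion of pods that are expected to run on control plane nodes.

```python
def matches_expression(labels, expression):
    """Evaluate one matchExpressions entry against namespace labels."""
    key = expression["key"]
    operator = expression["operator"]
    values = expression.get("values", [])
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        return key not in labels or labels[key] not in values
    raise ValueError(f"unsupported operator: {operator}")


def namespace_is_processed(name, labels, selector):
    """Return True if pods in the namespace would be processed."""
    if name.startswith("kube-"):
        return False  # namespaces starting with kube- are always excluded
    expressions = selector.get("matchExpressions", [])
    # An empty selector matches every namespace (the documented default).
    return all(matches_expression(labels, e) for e in expressions)


EXCLUDE = {"matchExpressions": [
    {"key": "multiarch.openshift.io/exclude-pod-placement",
     "operator": "DoesNotExist"}]}
INCLUDE = {"matchExpressions": [
    {"key": "multiarch.openshift.io/include-pod-placement",
     "operator": "Exists"}]}

# With the opt-out selector, unlabeled namespaces are processed.
print(namespace_is_processed("team-a", {}, EXCLUDE))
# With the opt-in selector, unlabeled namespaces are skipped.
print(namespace_is_processed("team-a", {}, INCLUDE))
```

The first selector is an opt-out model, where labeling a namespace removes it from processing. The second is an opt-in model, where only labeled namespaces are processed.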
3.11.4.1. Creating the ClusterPodPlacementConfig object by using the CLI
To deploy the pod placement operand that enables architecture-aware workload scheduling, you can create the ClusterPodPlacementConfig object by using the OpenShift CLI (oc).
Prerequisites
- You have installed oc.
- You have logged in to oc as a user with cluster-admin privileges.
- You have installed the Multiarch Tuning Operator.
Procedure
Create a ClusterPodPlacementConfig object YAML file:
Example ClusterPodPlacementConfig object configuration
apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
  name: cluster
spec:
  logVerbosityLevel: Normal
  namespaceSelector:
    matchExpressions:
      - key: multiarch.openshift.io/exclude-pod-placement
        operator: DoesNotExist
Create the ClusterPodPlacementConfig object by running the following command:
$ oc create -f <file_name> 1
- 1: Replace <file_name> with the name of the ClusterPodPlacementConfig object YAML file.
Verification
To check that the ClusterPodPlacementConfig object is created, run the following command:
$ oc get clusterpodplacementconfig
Example output
NAME      AGE
cluster   29s
3.11.4.2. Creating the ClusterPodPlacementConfig object by using the web console
To deploy the pod placement operand that enables architecture-aware workload scheduling, you can create the ClusterPodPlacementConfig
object by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- You have installed the Multiarch Tuning Operator.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Operators → Installed Operators.
- On the Installed Operators page, click Multiarch Tuning Operator.
- Click the Cluster Pod Placement Config tab.
- Select either Form view or YAML view.
- Configure the ClusterPodPlacementConfig object parameters.
- Click Create.
- Optional: If you want to edit the ClusterPodPlacementConfig object, perform the following actions:
  - Click the Cluster Pod Placement Config tab.
  - Select Edit ClusterPodPlacementConfig from the options menu.
  - Click YAML and edit the ClusterPodPlacementConfig object parameters.
  - Click Save.
Verification
- On the Cluster Pod Placement Config page, check that the ClusterPodPlacementConfig object is in the Ready state.
3.11.5. Deleting the ClusterPodPlacementConfig object by using the CLI
You can create only one instance of the ClusterPodPlacementConfig object. If you want to re-create this object, you must first delete the existing instance.
You can delete this object by using the OpenShift CLI (oc).
Prerequisites
- You have installed oc.
- You have logged in to oc as a user with cluster-admin privileges.
Procedure
- Log in to the OpenShift CLI (oc).
- Delete the ClusterPodPlacementConfig object by running the following command:
  $ oc delete clusterpodplacementconfig cluster
Verification
To check that the ClusterPodPlacementConfig object is deleted, run the following command:
$ oc get clusterpodplacementconfig
Example output
No resources found
3.11.6. Deleting the ClusterPodPlacementConfig object by using the web console
You can create only one instance of the ClusterPodPlacementConfig
object. If you want to re-create this object, you must first delete the existing instance.
You can delete this object by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- You have created the ClusterPodPlacementConfig object.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Operators → Installed Operators.
- On the Installed Operators page, click Multiarch Tuning Operator.
- Click the Cluster Pod Placement Config tab.
- Select Delete ClusterPodPlacementConfig from the options menu.
- Click Delete.
Verification
- On the Cluster Pod Placement Config page, check that the ClusterPodPlacementConfig object has been deleted.
3.11.7. Uninstalling the Multiarch Tuning Operator by using the CLI
You can uninstall the Multiarch Tuning Operator by using the OpenShift CLI (oc).
Prerequisites
- You have installed oc.
- You have logged in to oc as a user with cluster-admin privileges.
- You deleted the ClusterPodPlacementConfig object.
  Important
  You must delete the ClusterPodPlacementConfig object before uninstalling the Multiarch Tuning Operator. Uninstalling the Operator without deleting the ClusterPodPlacementConfig object can lead to unexpected behavior.
Procedure
Get the Subscription object name for the Multiarch Tuning Operator by running the following command:
$ oc get subscription.operators.coreos.com -n <namespace> 1
- 1: Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.
Example output
NAME                                  PACKAGE                     SOURCE             CHANNEL
openshift-multiarch-tuning-operator   multiarch-tuning-operator   redhat-operators   stable
Get the currentCSV value for the Multiarch Tuning Operator by running the following command:
$ oc get subscription.operators.coreos.com <subscription_name> -n <namespace> -o yaml | grep currentCSV 1
- 1: Replace <subscription_name> with the Subscription object name. For example: openshift-multiarch-tuning-operator. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.
Example output
currentCSV: multiarch-tuning-operator.v1.0.0
Delete the Subscription object by running the following command:
$ oc delete subscription.operators.coreos.com <subscription_name> -n <namespace> 1
- 1: Replace <subscription_name> with the Subscription object name. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.
Example output
subscription.operators.coreos.com "openshift-multiarch-tuning-operator" deleted
Delete the CSV for the Multiarch Tuning Operator in the target namespace using the currentCSV value by running the following command:
$ oc delete clusterserviceversion <currentCSV_value> -n <namespace> 1
- 1: Replace <currentCSV_value> with the currentCSV value for the Multiarch Tuning Operator. For example: multiarch-tuning-operator.v1.0.0. Replace <namespace> with the name of the namespace where you want to uninstall the Multiarch Tuning Operator.
Example output
clusterserviceversion.operators.coreos.com "multiarch-tuning-operator.v1.0.0" deleted
Verification
To verify that the Multiarch Tuning Operator is uninstalled, run the following command:
$ oc get csv -n <namespace> 1
- 1: Replace <namespace> with the name of the namespace where you have uninstalled the Multiarch Tuning Operator.
Example output
No resources found in openshift-multiarch-tuning-operator namespace.
3.11.8. Uninstalling the Multiarch Tuning Operator by using the web console
You can uninstall the Multiarch Tuning Operator by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster with cluster-admin permissions.
- You deleted the ClusterPodPlacementConfig object.
  Important
  You must delete the ClusterPodPlacementConfig object before uninstalling the Multiarch Tuning Operator. Uninstalling the Operator without deleting the ClusterPodPlacementConfig object can lead to unexpected behavior.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Operators → OperatorHub.
- Enter Multiarch Tuning Operator in the search field.
- Click Multiarch Tuning Operator.
- Click the Details tab.
- From the Actions menu, select Uninstall Operator.
- When prompted, click Uninstall.
Verification
- Navigate to Operators → Installed Operators.
- On the Installed Operators page, verify that the Multiarch Tuning Operator is not listed.
3.12. Multiarch Tuning Operator release notes
The Multiarch Tuning Operator optimizes workload management within multi-architecture clusters and in single-architecture clusters transitioning to multi-architecture environments.
These release notes track the development of the Multiarch Tuning Operator.
For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
3.12.1. Release notes for the Multiarch Tuning Operator 1.0.0
Issued: 31 October 2024
3.12.1.1. New features and enhancements
- With this release, the Multiarch Tuning Operator supports custom network scenarios and cluster-wide custom registries configurations.
- With this release, you can identify pods based on their architecture compatibility by using the pod labels that the Multiarch Tuning Operator adds to newly created pods.
- With this release, you can monitor the behavior of the Multiarch Tuning Operator by using the metrics and alerts that are registered in the Cluster Monitoring Operator.
Chapter 4. Postinstallation cluster tasks
After installing OpenShift Container Platform, you can further expand and customize your cluster to your requirements.
4.1. Available cluster customizations
You complete most of the cluster configuration and customization after you deploy your OpenShift Container Platform cluster. A number of configuration resources are available.
If you install your cluster on IBM Z®, not all features and functions are available.
You modify the configuration resources to configure the major features of the cluster, such as the image registry, networking configuration, image build behavior, and the identity provider.
For current documentation of the settings that you control by using these resources, use the oc explain command, for example oc explain builds --api-version=config.openshift.io/v1.
4.1.1. Cluster configuration resources
All cluster configuration resources are globally scoped (not namespaced) and named cluster.
Resource name | Description |
---|---|
| Provides API server configuration such as certificates and certificate authorities. |
| Controls the identity provider and authentication configuration for the cluster. |
| Controls default and enforced configuration for all builds on the cluster. |
| Configures the behavior of the web console interface, including the logout behavior. |
| Enables FeatureGates so that you can use Tech Preview features. |
| Configures how specific image registries should be treated (allowed, disallowed, insecure, CA details). |
| Configuration details related to routing such as the default domain for routes. |
| Configures identity providers and other behavior related to internal OAuth server flows. |
| Configures how projects are created including the project template. |
| Defines proxies to be used by components needing external network access. Note: not all components currently consume this value. |
| Configures scheduler behavior such as profiles and default node selectors. |
4.1.2. Operator configuration resources
These configuration resources are cluster-scoped instances, named cluster
, which control the behavior of a specific component as owned by a particular Operator.
Resource name | Description |
---|---|
| Controls console appearance such as branding customizations |
| Configures OpenShift image registry settings such as public routing, log levels, proxy settings, resource constraints, replica counts, and storage type. |
| Configures the Samples Operator to control which example image streams and templates are installed on the cluster. |
4.1.3. Additional configuration resources
These configuration resources represent a single instance of a particular component. In some cases, you can request multiple instances by creating multiple instances of the resource. In other cases, the Operator can use only a specific resource instance name in a specific namespace. Reference the component-specific documentation for details on how and when you can create additional resource instances.
Resource name | Instance name | Namespace | Description |
---|---|---|---|
|
|
| Controls the Alertmanager deployment parameters. |
|
|
| Configures Ingress Operator behavior such as domain, number of replicas, certificates, and controller placement. |
4.1.4. Informational Resources
You use these resources to retrieve information about the cluster. Some configurations might require you to edit these resources directly.
Resource name | Instance name | Description |
---|---|---|
|
|
In OpenShift Container Platform 4.17, you must not customize the |
|
| You cannot modify the DNS settings for your cluster. You can check the DNS Operator status. |
|
| Configuration details allowing the cluster to interact with its cloud provider. |
|
| You cannot modify your cluster networking after installation. To customize your network, follow the process to customize networking during installation. |
4.2. Adding worker nodes
After you deploy your OpenShift Container Platform cluster, you can add worker nodes to scale cluster resources. There are different ways you can add worker nodes depending on the installation method and the environment of your cluster.
4.2.1. Adding worker nodes to an on-premise cluster
For on-premise clusters, you can add worker nodes by using the OpenShift Container Platform CLI (oc
) to generate an ISO image, which can then be used to boot one or more nodes in your target cluster. This process can be used regardless of how you installed your cluster.
You can add one or more nodes at a time while customizing each node with more complex configurations, such as static network configuration, or you can specify only the MAC address of each node. Any configurations that are not specified during ISO generation are retrieved from the target cluster and applied to the new nodes.
Preflight validation checks are also performed when booting the ISO image to inform you of failure-causing issues before you attempt to boot each node.
4.2.2. Adding worker nodes to installer-provisioned infrastructure clusters
For installer-provisioned infrastructure clusters, you can manually or automatically scale the MachineSet
object to match the number of available bare-metal hosts.
To add a bare-metal host, you must configure all network prerequisites, configure an associated baremetalhost
object, then provision the worker node to the cluster. You can add a bare-metal host manually or by using the web console.
4.2.3. Adding worker nodes to user-provisioned infrastructure clusters
For user-provisioned infrastructure clusters, you can add worker nodes by using a RHEL or RHCOS ISO image and connecting it to your cluster using cluster Ignition config files. For RHEL worker nodes, the following example uses Ansible playbooks to add worker nodes to the cluster. For RHCOS worker nodes, the following example uses an ISO image and network booting to add worker nodes to the cluster.
4.2.4. Adding worker nodes to clusters managed by the Assisted Installer
For clusters managed by the Assisted Installer, you can add worker nodes by using the Red Hat OpenShift Cluster Manager console or the Assisted Installer REST API, or you can manually add worker nodes by using an ISO image and cluster Ignition config files.
4.2.5. Adding worker nodes to clusters managed by the multicluster engine for Kubernetes
For clusters managed by the multicluster engine for Kubernetes, you can add worker nodes by using the dedicated multicluster engine console.
4.3. Adjust worker nodes
If you incorrectly sized the worker nodes during deployment, adjust them by creating one or more new compute machine sets, scaling them up, and then scaling the original compute machine set down before removing it.
4.3.1. Understanding the difference between compute machine sets and the machine config pool
MachineSet objects describe OpenShift Container Platform nodes with respect to the cloud or machine provider.
The MachineConfigPool object allows MachineConfigController components to define and provide the status of machines in the context of upgrades.
The MachineConfigPool object allows users to configure how upgrades are rolled out to the OpenShift Container Platform nodes in the machine config pool.
The NodeSelector object can be replaced with a reference to the MachineSet object.
4.3.2. Scaling a compute machine set manually
To add or remove an instance of a machine in a compute machine set, you can manually scale the compute machine set.
This guidance is relevant to fully automated, installer-provisioned infrastructure installations. Customized, user-provisioned infrastructure installations do not have compute machine sets.
Prerequisites
- Install an OpenShift Container Platform cluster and the oc command line.
- Log in to oc as a user with cluster-admin permission.
Procedure
View the compute machine sets that are in the cluster by running the following command:
$ oc get machinesets.machine.openshift.io -n openshift-machine-api
The compute machine sets are listed in the form of <clusterid>-worker-<aws-region-az>.
View the compute machines that are in the cluster by running the following command:
$ oc get machines.machine.openshift.io -n openshift-machine-api
Set the annotation on the compute machine that you want to delete by running the following command:
$ oc annotate machines.machine.openshift.io/<machine_name> -n openshift-machine-api machine.openshift.io/delete-machine="true"
Scale the compute machine set by running one of the following commands:
$ oc scale --replicas=2 machinesets.machine.openshift.io <machineset> -n openshift-machine-api
Or:
$ oc edit machinesets.machine.openshift.io <machineset> -n openshift-machine-api
Tip: You can alternatively apply the following YAML to scale the compute machine set:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  replicas: 2
You can scale the compute machine set up or down. It takes several minutes for the new machines to be available.
Important: By default, the machine controller tries to drain the node that is backed by the machine until it succeeds. In some situations, such as with a misconfigured pod disruption budget, the drain operation might not be able to succeed. If the drain operation fails, the machine controller cannot proceed with removing the machine.
You can skip draining the node by annotating machine.openshift.io/exclude-node-draining in a specific machine.
Verification
Verify the deletion of the intended machine by running the following command:
$ oc get machines.machine.openshift.io -n openshift-machine-api
4.3.3. The compute machine set deletion policy
Random, Newest, and Oldest are the three supported deletion options. The default is Random, meaning that random machines are chosen and deleted when scaling compute machine sets down. The deletion policy can be set according to the use case by modifying the particular compute machine set:
spec:
  deletePolicy: <delete_policy>
  replicas: <desired_replica_count>
Specific machines can also be prioritized for deletion by adding the annotation machine.openshift.io/delete-machine=true to the machine of interest, regardless of the deletion policy.
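The selection rules described above can be sketched in a few lines of Python. This is a simplified model, not the machine controller's implementation: the Machine class and the pick_for_deletion function are hypothetical names, and treating an annotation value of "true" as the trigger is an assumption made for illustration.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime

# Annotation named in the text above; the matching rule below is an assumption.
DELETE_ANNOTATION = "machine.openshift.io/delete-machine"


@dataclass
class Machine:
    """Hypothetical stand-in for a Machine object; not the real API type."""
    name: str
    created: datetime
    annotations: dict = field(default_factory=dict)


def pick_for_deletion(machines, delete_policy="Random", rng=random):
    """Return the machine this simplified model removes first on scale-down."""
    if not machines:
        raise ValueError("no machines to choose from")
    # Annotated machines are preferred regardless of the deletion policy.
    annotated = [m for m in machines
                 if m.annotations.get(DELETE_ANNOTATION) == "true"]
    if annotated:
        return annotated[0]
    if delete_policy == "Newest":
        return max(machines, key=lambda m: m.created)
    if delete_policy == "Oldest":
        return min(machines, key=lambda m: m.created)
    if delete_policy == "Random":
        return rng.choice(machines)
    raise ValueError(f"unsupported deletePolicy: {delete_policy!r}")


machines = [
    Machine("worker-a", datetime(2024, 1, 1)),
    Machine("worker-b", datetime(2024, 2, 1)),
    Machine("worker-c", datetime(2024, 3, 1), {DELETE_ANNOTATION: "true"}),
]
# The annotated machine is chosen even though it is not the oldest.
print(pick_for_deletion(machines, "Oldest").name)
```

The real controller might weigh additional factors, such as machine health, so treat the sketch only as a way to reason about which machine a given policy is likely to remove.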
By default, the OpenShift Container Platform router pods are deployed on workers. Because the router is required to access some cluster resources, including the web console, do not scale the worker compute machine set to 0 unless you first relocate the router pods.
Use custom compute machine sets when services must run on specific nodes and the controller must ignore those services when the worker compute machine sets scale down. This prevents service disruption.
4.3.4. Creating default cluster-wide node selectors
You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.
With cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.
You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a compute machine set, or a machine config. Adding the label to the compute machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
You can add additional key/value pairs to a pod, but you cannot add a different value for a default key.
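The following Python sketch models how a default selector combines with a pod's own node selector under the rule stated above. It is an illustration only: the function names are hypothetical, and raising an error on a conflicting value is an assumption about how the restriction surfaces, not a description of the scheduler's code.

```python
def parse_selector(selector):
    """Parse a 'key=value,key=value' selector string into a dict."""
    result = {}
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        # Strip optional quotes so that key="" yields an empty value.
        result[key.strip()] = value.strip().strip('"')
    return result


def effective_node_selector(default_selector, pod_selector):
    """Combine the cluster-wide default with a pod's own nodeSelector.

    Extra keys on the pod are kept. A different value for a default key
    is rejected, following the rule stated in the text.
    """
    merged = parse_selector(default_selector)
    for key, value in pod_selector.items():
        if key in merged and merged[key] != value:
            raise ValueError(f"pod sets {key}={value}, default is {merged[key]}")
        merged[key] = value
    return merged


def node_matches(node_labels, selector):
    """A node qualifies only if it carries every selector key with the same value."""
    return all(node_labels.get(k) == v for k, v in selector.items())


selector = effective_node_selector("type=user-node,region=east", {"disk": "ssd"})
print(selector)
print(node_matches({"type": "user-node", "region": "east", "disk": "ssd"}, selector))
```

The pod in this example ends up constrained by three labels: the two defaults plus its own disk key (a hypothetical label used only for the demonstration).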
Procedure
To add a default cluster-wide node selector:
Edit the Scheduler Operator CR to add the default cluster-wide node selectors:
$ oc edit scheduler cluster
Example Scheduler Operator CR with a node selector
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
...
spec:
  defaultNodeSelector: type=user-node,region=east # 1
  mastersSchedulable: false
- 1
- Add a node selector with the appropriate <key>=<value> pairs.
After making this change, wait for the pods in the openshift-kube-apiserver project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.
Add labels to a node by using a compute machine set or editing the node directly:
Use a compute machine set to add labels to nodes managed by the compute machine set when a node is created:
Run the following command to add labels to a MachineSet object:
$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api # 1
- 1
- Add a <key>/<value> pair for each label.
For example:
$ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
Tip: You can alternatively apply the following YAML to add labels to a compute machine set:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <machineset>
  namespace: openshift-machine-api
spec:
  template:
    spec:
      metadata:
        labels:
          region: "east"
          type: "user-node"
Verify that the labels are added to the MachineSet object by using the oc edit command. For example:
$ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api
Example MachineSet object
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
...
spec:
  ...
  template:
    metadata:
      ...
    spec:
      metadata:
        labels:
          region: east
          type: user-node
  ...
Redeploy the nodes associated with that compute machine set by scaling down to 0 and scaling up the nodes. For example:
$ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
$ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:
$ oc get nodes -l <key>=<value>
For example:
$ oc get nodes -l type=user-node
Example output
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp   Ready    worker   61s   v1.30.3
Add labels directly to a node:
Edit the Node object for the node:
$ oc label nodes <name> <key>=<value>
For example, to label a node:
$ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east
Tip: You can alternatively apply the following YAML to add labels to a node:
kind: Node
apiVersion: v1
metadata:
  name: <node_name>
  labels:
    type: "user-node"
    region: "east"
Verify that the labels are added to the node by using the oc get command:
$ oc get nodes -l <key>=<value>,<key>=<value>
For example:
$ oc get nodes -l type=user-node,region=east
Example output
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49   Ready    worker   17m   v1.30.3
4.4. Improving cluster stability in high latency environments using worker latency profiles
A cluster administrator who has performed latency tests for platform verification might discover that the operation of the cluster needs adjusting to ensure stability under high latency. The cluster administrator needs to change only one parameter, recorded in a file, which controls four parameters affecting how supervisory processes read status and interpret the health of the cluster. Changing only the one parameter provides cluster tuning in an easy, supportable manner.
The Kubelet process provides the starting point for monitoring cluster health. The Kubelet sets status values for all nodes in the OpenShift Container Platform cluster. The Kubernetes Controller Manager (kube controller) reads the status values every 10 seconds, by default. If the kube controller cannot read a node status value, it loses contact with that node after a configured period. The default behavior is:
- The node controller on the control plane updates the node health to Unhealthy and marks the node Ready condition Unknown.
- In response, the scheduler stops scheduling pods to that node.
- The Node Lifecycle Controller adds a node.kubernetes.io/unreachable taint with a NoExecute effect to the node and schedules any pods on the node for eviction after five minutes, by default.
This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager might not receive an update from a healthy node due to network latency. The Kubelet evicts pods from the node even though the node is healthy.
To avoid this problem, you can use worker latency profiles to adjust the frequency that the Kubelet and the Kubernetes Controller Manager wait for status updates before taking action. These adjustments help to ensure that your cluster runs properly if network latency between the control plane and the worker nodes is not optimal.
These worker latency profiles contain three sets of parameters that are predefined with carefully tuned values to control the reaction of the cluster to increased latency. There is no need to experimentally find the best values manually.
You can configure worker latency profiles when installing a cluster or at any time you notice increased latency in your cluster network.
4.4.1. Understanding worker latency profiles
Worker latency profiles are sets of carefully tuned values for four parameters: node-status-update-frequency, node-monitor-grace-period, default-not-ready-toleration-seconds, and default-unreachable-toleration-seconds. These values let you control the reaction of the cluster to latency issues without needing to determine the best values manually.
Setting these parameters manually is not supported. Incorrect parameter settings adversely affect cluster stability.
All worker latency profiles configure the following parameters:
- node-status-update-frequency
- Specifies how often the kubelet posts node status to the API server.
- node-monitor-grace-period
- Specifies the amount of time in seconds that the Kubernetes Controller Manager waits for an update from a kubelet before marking the node unhealthy and adding the node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taint to the node.
- default-not-ready-toleration-seconds
- Specifies the amount of time in seconds after marking a node unhealthy that the Kube API Server Operator waits before evicting pods from that node.
- default-unreachable-toleration-seconds
- Specifies the amount of time in seconds after marking a node unreachable that the Kube API Server Operator waits before evicting pods from that node.
The following Operators monitor the changes to the worker latency profiles and respond accordingly:
- The Machine Config Operator (MCO) updates the node-status-update-frequency parameter on the worker nodes.
- The Kubernetes Controller Manager updates the node-monitor-grace-period parameter on the control plane nodes.
- The Kubernetes API Server Operator updates the default-not-ready-toleration-seconds and default-unreachable-toleration-seconds parameters on the control plane nodes.
Although the default configuration works in most cases, OpenShift Container Platform offers two other worker latency profiles for situations where the network is experiencing higher latency than usual. The three worker latency profiles are described in the following sections:
- Default worker latency profile
With the Default profile, each Kubelet updates its status every 10 seconds (node-status-update-frequency). The Kube Controller Manager checks the statuses of Kubelet every 5 seconds.
The Kubernetes Controller Manager waits 40 seconds (node-monitor-grace-period) for a status update from Kubelet before considering the Kubelet unhealthy. If no status is made available to the Kubernetes Controller Manager, it then marks the node with the node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taint and evicts the pods on that node.
If a pod is on a node that has the NoExecute taint, the pod runs according to tolerationSeconds. If the node has no taint, it will be evicted in 300 seconds (default-not-ready-toleration-seconds and default-unreachable-toleration-seconds settings of the Kube API Server).
Profile   Component                        Parameter                                Value
Default   kubelet                          node-status-update-frequency             10s
          Kubernetes Controller Manager    node-monitor-grace-period                40s
          Kubernetes API Server Operator   default-not-ready-toleration-seconds     300s
          Kubernetes API Server Operator   default-unreachable-toleration-seconds   300s
- Medium worker latency profile
Use the MediumUpdateAverageReaction profile if the network latency is slightly higher than usual.
The MediumUpdateAverageReaction profile reduces the frequency of kubelet updates to 20 seconds and changes the period that the Kubernetes Controller Manager waits for those updates to 2 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the tolerationSeconds parameter, the eviction waits for the period specified by that parameter.
The Kubernetes Controller Manager waits for 2 minutes to consider a node unhealthy. In another minute, the eviction process starts.
Profile                       Component                        Parameter                                Value
MediumUpdateAverageReaction   kubelet                          node-status-update-frequency             20s
                              Kubernetes Controller Manager    node-monitor-grace-period                2m
                              Kubernetes API Server Operator   default-not-ready-toleration-seconds     60s
                              Kubernetes API Server Operator   default-unreachable-toleration-seconds   60s
- Low worker latency profile
Use the LowUpdateSlowReaction profile if the network latency is extremely high.
The LowUpdateSlowReaction profile reduces the frequency of kubelet updates to 1 minute and changes the period that the Kubernetes Controller Manager waits for those updates to 5 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the tolerationSeconds parameter, the eviction waits for the period specified by that parameter.
The Kubernetes Controller Manager waits for 5 minutes to consider a node unhealthy. In another minute, the eviction process starts.
Profile                 Component                        Parameter                                Value
LowUpdateSlowReaction   kubelet                          node-status-update-frequency             1m
                        Kubernetes Controller Manager    node-monitor-grace-period                5m
                        Kubernetes API Server Operator   default-not-ready-toleration-seconds     60s
                        Kubernetes API Server Operator   default-unreachable-toleration-seconds   60s
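To compare the three profiles side by side, the documented values can be placed in a small Python table and combined arithmetically. This is a rough sketch: the function names are hypothetical, and adding the grace period to the toleration period is a simplifying assumption that ignores controller polling intervals and any tolerationSeconds value set on individual pods.

```python
# Values in seconds, copied from the profile tables in this section.
PROFILES = {
    "Default": {
        "node-status-update-frequency": 10,
        "node-monitor-grace-period": 40,
        "default-not-ready-toleration-seconds": 300,
        "default-unreachable-toleration-seconds": 300,
    },
    "MediumUpdateAverageReaction": {
        "node-status-update-frequency": 20,
        "node-monitor-grace-period": 120,
        "default-not-ready-toleration-seconds": 60,
        "default-unreachable-toleration-seconds": 60,
    },
    "LowUpdateSlowReaction": {
        "node-status-update-frequency": 60,
        "node-monitor-grace-period": 300,
        "default-not-ready-toleration-seconds": 60,
        "default-unreachable-toleration-seconds": 60,
    },
}


def seconds_until_eviction_starts(profile):
    """Approximate delay between the last received status update and eviction."""
    p = PROFILES[profile]
    return (p["node-monitor-grace-period"]
            + p["default-unreachable-toleration-seconds"])


def updates_per_grace_period(profile):
    """How many status updates fit into one grace period."""
    p = PROFILES[profile]
    return p["node-monitor-grace-period"] // p["node-status-update-frequency"]


for name in PROFILES:
    print(name, seconds_until_eviction_starts(name), updates_per_grace_period(name))
```

Under these assumptions, the medium profile tolerates more missed updates than the default profile but starts eviction sooner once a node is marked unhealthy, because its toleration period is 60 seconds instead of 300.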
4.4.2. Using and changing worker latency profiles
To change a worker latency profile to deal with network latency, edit the node.config object to add the name of the profile. You can change the profile at any time as latency increases or decreases.
You must move one worker latency profile at a time. For example, you cannot move directly from the Default profile to the LowUpdateSlowReaction worker latency profile. You must move from the Default worker latency profile to the MediumUpdateAverageReaction profile first, then to LowUpdateSlowReaction. Similarly, when returning to the Default profile, you must move from the low profile to the medium profile first, then to Default.
You can also configure worker latency profiles upon installing an OpenShift Container Platform cluster.
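The one-profile-at-a-time rule amounts to walking an ordered list. The following Python sketch, with the hypothetical helper transition_steps, returns the sequence of values to set in spec.workerLatencyProfile, one edit per entry. It only plans the order; applying each step and waiting for the nodes to become ready is left to the procedure that follows.

```python
# Profiles ordered from fastest reaction to slowest, as described in the text.
PROFILE_ORDER = [
    "Default",
    "MediumUpdateAverageReaction",
    "LowUpdateSlowReaction",
]


def transition_steps(current, target):
    """List the profiles to apply, in order, to move from current to target."""
    start = PROFILE_ORDER.index(current)  # raises ValueError for unknown names
    end = PROFILE_ORDER.index(target)
    step = 1 if end >= start else -1
    # Visit every intermediate profile; never skip over a neighbor.
    return [PROFILE_ORDER[i] for i in range(start + step, end + step, step)]


print(transition_steps("Default", "LowUpdateSlowReaction"))
print(transition_steps("LowUpdateSlowReaction", "Default"))
```

Moving from Default to LowUpdateSlowReaction therefore takes two edits, and an empty list means that the cluster already uses the requested profile.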
Procedure
To move from the default worker latency profile:
Move to the medium worker latency profile:
Edit the node.config object:
$ oc edit nodes.config/cluster
Add spec.workerLatencyProfile: MediumUpdateAverageReaction:
Example node.config object
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-07-08T16:02:51Z"
  generation: 1
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 36282574-bf9f-409e-a6cd-3032939293eb
  resourceVersion: "1865"
  uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
spec:
  workerLatencyProfile: MediumUpdateAverageReaction # 1
# ...
- 1
- Specifies the medium worker latency profile.
Scheduling on each worker node is disabled as the change is being applied.
Optional: Move to the low worker latency profile:
Edit the node.config object:
$ oc edit nodes.config/cluster
Change the spec.workerLatencyProfile value to LowUpdateSlowReaction:
Example node.config object
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-07-08T16:02:51Z"
  generation: 1
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 36282574-bf9f-409e-a6cd-3032939293eb
  resourceVersion: "1865"
  uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
spec:
  workerLatencyProfile: LowUpdateSlowReaction # 1
# ...
- 1
- Specifies use of the low worker latency profile.
Scheduling on each worker node is disabled as the change is being applied.
Verification
When all nodes return to the Ready condition, you can use the following command to look in the Kubernetes Controller Manager to ensure it was applied:
$ oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5
Example output
# ...
    - lastTransitionTime: "2022-07-11T19:47:10Z"
      reason: ProfileUpdated
      status: "False"
      type: WorkerLatencyProfileProgressing
    - lastTransitionTime: "2022-07-11T19:47:10Z" # 1
      message: all static pod revision(s) have updated latency profile
      reason: ProfileUpdated
      status: "True"
      type: WorkerLatencyProfileComplete
    - lastTransitionTime: "2022-07-11T19:20:11Z"
      reason: AsExpected
      status: "False"
      type: WorkerLatencyProfileDegraded
    - lastTransitionTime: "2022-07-11T19:20:36Z"
      status: "False"
# ...
- 1
- Specifies that the profile is applied and active.
To change the medium profile to default or change the default to medium, edit the node.config
object and set the spec.workerLatencyProfile
parameter to the appropriate value.
4.5. Managing control plane machines
Control plane machine sets provide management capabilities for control plane machines that are similar to what compute machine sets provide for compute machines. The availability and initial status of control plane machine sets on your cluster depend on your cloud provider and the version of OpenShift Container Platform that you installed. For more information, see Getting started with control plane machine sets.
4.6. Creating infrastructure machine sets for production environments
You can create a compute machine set to create machines that host only infrastructure components, such as the default router, the integrated container image registry, and components for cluster metrics and monitoring. These infrastructure machines are not counted toward the total number of subscriptions that are required to run the environment.
In a production deployment, it is recommended that you deploy at least three compute machine sets to hold infrastructure components. Both OpenShift Logging and Red Hat OpenShift Service Mesh deploy Elasticsearch, which requires three instances to be installed on different nodes. Each of these nodes can be deployed to different availability zones for high availability. A configuration like this requires three different compute machine sets, one for each availability zone. In global Azure regions that do not have multiple availability zones, you can use availability sets to ensure high availability.
For information on infrastructure nodes and which components can run on infrastructure nodes, see Creating infrastructure machine sets.
To create an infrastructure node, you can use a machine set, assign a label to the nodes, or use a machine config pool.
For sample machine sets that you can use with these procedures, see Creating machine sets for different clouds.
Applying a specific node selector to all infrastructure components causes OpenShift Container Platform to schedule those workloads on nodes with that label.
4.6.1. Creating a compute machine set
In addition to the compute machine sets created by the installation program, you can create your own to dynamically manage the machine compute resources for specific workloads of your choice.
Prerequisites
- Deploy an OpenShift Container Platform cluster.
- Install the OpenShift CLI (oc).
- Log in to oc as a user with cluster-admin permission.
Procedure
Create a new YAML file that contains the compute machine set custom resource (CR) sample and is named <file_name>.yaml.
Ensure that you set the <clusterID> and <role> parameter values.
Optional: If you are not sure which value to set for a specific field, you can check an existing compute machine set from your cluster.
To list the compute machine sets in your cluster, run the following command:
$ oc get machinesets -n openshift-machine-api
Example output
NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
agl030519-vplxk-worker-us-east-1a   1         1         1       1           55m
agl030519-vplxk-worker-us-east-1b   1         1         1       1           55m
agl030519-vplxk-worker-us-east-1c   1         1         1       1           55m
agl030519-vplxk-worker-us-east-1d   0         0                             55m
agl030519-vplxk-worker-us-east-1e   0         0                             55m
agl030519-vplxk-worker-us-east-1f   0         0                             55m
To view values of a specific compute machine set custom resource (CR), run the following command:
$ oc get machineset <machineset_name> -n openshift-machine-api -o yaml
Example output
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_id> # 1
  name: <infrastructure_id>-<role> # 2
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_id>
      machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: <role>
        machine.openshift.io/cluster-api-machine-type: <role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role>
    spec:
      providerSpec: # 3
        ...
- 1
- The cluster infrastructure ID.
- 2
- A default node label.
Note: For clusters that have user-provisioned infrastructure, a compute machine set can only create worker and infra type machines.
- 3
- The values in the <providerSpec> section of the compute machine set CR are platform-specific. For more information about <providerSpec> parameters in the CR, see the sample compute machine set CR configuration for your provider.
Create a MachineSet CR by running the following command:
$ oc create -f <file_name>.yaml
Verification
View the list of compute machine sets by running the following command:
$ oc get machineset -n openshift-machine-api
Example output
NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
agl030519-vplxk-infra-us-east-1a    1         1         1       1           11m
agl030519-vplxk-worker-us-east-1a   1         1         1       1           55m
agl030519-vplxk-worker-us-east-1b   1         1         1       1           55m
agl030519-vplxk-worker-us-east-1c   1         1         1       1           55m
agl030519-vplxk-worker-us-east-1d   0         0                             55m
agl030519-vplxk-worker-us-east-1e   0         0                             55m
agl030519-vplxk-worker-us-east-1f   0         0                             55m
When the new compute machine set is available, the DESIRED and CURRENT values match. If the compute machine set is not available, wait a few minutes and run the command again.
4.6.2. Creating an infrastructure node
See Creating infrastructure machine sets for installer-provisioned infrastructure environments or for any cluster where the control plane nodes are managed by the machine API.
If the requirements of the cluster dictate that infrastructure nodes, also called infra nodes, be provisioned, note that the installation program provisions only control plane and worker nodes. You can designate worker nodes as infrastructure nodes or as application nodes, also called app nodes, through labeling.
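Designating a role through labeling relies on label keys that begin with node-role.kubernetes.io/. The Python sketch below shows how a role list can be derived from such labels. The helper names are hypothetical, and the sorted, comma-separated formatting with a <none> fallback is an assumption about how a ROLES column is typically rendered, not a statement about the oc source code.

```python
ROLE_PREFIX = "node-role.kubernetes.io/"


def roles_from_labels(labels):
    """Return the sorted role names encoded in a node's label keys."""
    return sorted(
        key[len(ROLE_PREFIX):]
        for key in labels
        if key.startswith(ROLE_PREFIX) and len(key) > len(ROLE_PREFIX)
    )


def roles_column(labels):
    """Format the roles for display (assumed formatting)."""
    roles = roles_from_labels(labels)
    return ",".join(roles) if roles else "<none>"


node_labels = {
    "kubernetes.io/hostname": "node1",
    "node-role.kubernetes.io/worker": "",
    "node-role.kubernetes.io/infra": "",
}
print(roles_column(node_labels))
```

Because only the label key carries the role, the label value can stay empty, which is why the labeling commands in the following procedure set the value to "".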
Procedure
Add a label to the worker nodes that you want to act as application nodes:
$ oc label node <node-name> node-role.kubernetes.io/app=""
Add a label to the worker nodes that you want to act as infrastructure nodes:
$ oc label node <node-name> node-role.kubernetes.io/infra=""
Check to see if applicable nodes now have the infra and app roles:
$ oc get nodes
Create a default cluster-wide node selector. The default node selector is applied to pods created in all namespaces. This creates an intersection with any existing node selectors on a pod, which additionally constrains the pod’s selector.
Important: If the default node selector key conflicts with the key of a pod’s label, then the default node selector is not applied.
However, do not set a default node selector that might cause a pod to become unschedulable. For example, setting the default node selector to a specific node role, such as node-role.kubernetes.io/infra="", when a pod’s label is set to a different node role, such as node-role.kubernetes.io/master="", can cause the pod to become unschedulable. For this reason, use caution when setting the default node selector to specific node roles.
You can alternatively use a project node selector to avoid cluster-wide node selector key conflicts.
Edit the Scheduler object:
$ oc edit scheduler cluster
Add the defaultNodeSelector field with the appropriate node selector:
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  defaultNodeSelector: node-role.kubernetes.io/infra="" # 1
# ...
- 1
- This example node selector deploys pods on infrastructure nodes by default.
- Save the file to apply the changes.
You can now move infrastructure resources to the newly labeled infra
nodes.
Additional resources
- For information on how to configure project node selectors to avoid cluster-wide node selector key conflicts, see Project node selectors.
4.6.3. Creating a machine config pool for infrastructure machines
If you need infrastructure machines to have dedicated configurations, you must create an infra pool.
Creating a custom machine configuration pool overrides default worker pool configurations if they refer to the same file or unit.
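The override behavior can be pictured with a small Python model. It is a sketch under stated assumptions, not the Machine Config Operator's logic: it assumes that a pool selects machine configs by their role label and merges them in name order so that a later name wins for the same file path. The function names and the /etc/example.conf path are hypothetical.

```python
ROLE_LABEL = "machineconfiguration.openshift.io/role"


def select_machine_configs(machine_configs, roles):
    """Mimic a matchExpressions 'In' selector on the role label."""
    return [mc for mc in machine_configs
            if mc["labels"].get(ROLE_LABEL) in roles]


def render(machine_configs):
    """Merge files in name order so later names override earlier ones (assumed)."""
    files = {}
    for mc in sorted(machine_configs, key=lambda mc: mc["name"]):
        files.update(mc["files"])
    return files


machine_configs = [
    {"name": "00-master", "labels": {ROLE_LABEL: "master"},
     "files": {"/etc/master.conf": "master"}},
    {"name": "00-worker", "labels": {ROLE_LABEL: "worker"},
     "files": {"/etc/example.conf": "worker default"}},
    {"name": "51-infra", "labels": {ROLE_LABEL: "infra"},
     "files": {"/etc/example.conf": "infra override", "/etc/infratest": "infra"}},
]

# An infra pool that selects both the worker and infra roles.
infra_rendered = render(select_machine_configs(machine_configs, ["worker", "infra"]))
# The default worker pool selects only the worker role.
worker_rendered = render(select_machine_configs(machine_configs, ["worker"]))
print(infra_rendered)
print(worker_rendered)
```

In this model the infra pool inherits every worker machine config but ends up with the infra version of the shared file, while the worker pool is unaffected by the infra-only machine config.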
Procedure
Add a label to the node that you want to assign as the infra node:
$ oc label node <node_name> <label>
For example:
$ oc label node ci-ln-n8mqwr2-f76d1-xscn2-worker-c-6fmtx node-role.kubernetes.io/infra=
Create a machine config pool that contains both the worker role and your custom role as machine config selector:
$ cat infra.mcp.yaml
Example output
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]} # 1
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: "" # 2
Note: Custom machine config pools inherit machine configs from the worker pool. Custom pools use any machine config targeted for the worker pool, but add the ability to also deploy changes that are targeted at only the custom pool. Because a custom pool inherits resources from the worker pool, any change to the worker pool also affects the custom pool.
After you have the YAML file, you can create the machine config pool:
$ oc create -f infra.mcp.yaml
Check the machine configs to ensure that the infrastructure configuration rendered successfully:
$ oc get machineconfig
Example output
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
00-worker                                                   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
01-master-container-runtime                                 365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
01-master-kubelet                                           365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
01-worker-container-runtime                                 365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
01-worker-kubelet                                           365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
99-master-1ae2a1e0-a115-11e9-8f14-005056899d54-registries   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
99-master-ssh                                                                                          3.2.0             31d
99-worker-1ae64748-a115-11e9-8f14-005056899d54-registries   365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             31d
99-worker-ssh                                                                                          3.2.0             31d
rendered-infra-4e48906dca84ee702959c71a53ee80e7             365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             23m
rendered-master-072d4b2da7f88162636902b074e9e28e            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
rendered-master-3e88ec72aed3886dec061df60d16d1af            02c07496ba0417b3e12b78fb32baf6293d314f79   3.2.0             31d
rendered-master-419bee7de96134963a15fdf9dd473b25            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             17d
rendered-master-53f5c91c7661708adce18739cc0f40fb            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             13d
rendered-master-a6a357ec18e5bce7f5ac426fc7c5ffcd            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             7d3h
rendered-master-dc7f874ec77fc4b969674204332da037            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
rendered-worker-1a75960c52ad18ff5dfa6674eb7e533d            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
rendered-worker-2640531be11ba43c61d72e82dc634ce6            5b6fb8349a29735e48446d435962dec4547d3090   3.2.0             31d
rendered-worker-4e48906dca84ee702959c71a53ee80e7            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             7d3h
rendered-worker-4f110718fe88e5f349987854a1147755            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             17d
rendered-worker-afc758e194d6188677eb837842d3b379            02c07496ba0417b3e12b78fb32baf6293d314f79   3.2.0             31d
rendered-worker-daa08cc1e8f5fcdeba24de60cd955cc3            365c1cfd14de5b0e3b85e0fc815b0060f36ab955   3.2.0             13d
You should see a new machine config, with the rendered-infra-* prefix.
Optional: To deploy changes to a custom pool, create a machine config that uses the custom pool name as the label, such as infra. Note that this is not required and only shown for instructional purposes. In this manner, you can apply any custom configurations specific to only your infra nodes.
Note: After you create the new machine config pool, the MCO generates a new rendered config for that pool, and associated nodes of that pool reboot to apply the new configuration.
Create a machine config:
$ cat infra.mc.yaml
Example output
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 51-infra
  labels:
    machineconfiguration.openshift.io/role: infra # 1
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/infratest
        mode: 0644
        contents:
          source: data:,infra
- 1
- Add the label you added to the node as a
nodeSelector
.
Apply the machine config to the infra-labeled nodes:
$ oc create -f infra.mc.yaml
Confirm that your new machine config pool is available:
$ oc get mcp
Example output
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-60e35c2e99f42d976e084fa94da4d0fc    True      False      False      1              1                   1                     0                      4m20s
master   rendered-master-9360fdb895d4c131c7c4bebbae099c90   True      False      False      3              3                   3                     0                      91m
worker   rendered-worker-60e35c2e99f42d976e084fa94da4d0fc   True      False      False      2              2                   2                     0                      91m
In this example, a worker node was changed to an infra node.
Additional resources
- See Node configuration management with machine config pools for more information on grouping infra machines in a custom pool.
4.7. Assigning machine set resources to infrastructure nodes
After creating an infrastructure machine set, the worker and infra roles are applied to new infra nodes. Nodes with the infra role are not counted toward the total number of subscriptions that are required to run the environment, even when the worker role is also applied.
However, when an infra node is assigned the worker role, there is a chance that user workloads can get assigned inadvertently to the infra node. To avoid this, you can apply a taint to the infra node and tolerations for the pods that you want to control.
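The interaction between a taint and a toleration can be sketched as a small matching function in Python. This is a simplified model of the standard Kubernetes matching rules (an Exists operator ignores the value, an empty toleration effect matches every effect), written for illustration; the function names are hypothetical, and real scheduling considers many more factors.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator") or "Equal"
    if operator == "Exists":
        # With Exists, only the key is compared; an empty key matches all keys.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    if operator == "Equal":
        return (toleration.get("key") == taint["key"]
                and toleration.get("value", "") == taint.get("value", ""))
    return False


def can_schedule(pod_tolerations, node_taints):
    """A pod fits only if every NoSchedule or NoExecute taint is tolerated."""
    blocking = [t for t in node_taints
                if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in blocking)


infra_taint = {"key": "node-role.kubernetes.io/infra",
               "value": "reserved", "effect": "NoSchedule"}
router_toleration = {"key": "node-role.kubernetes.io/infra",
                     "value": "reserved", "effect": "NoSchedule"}

print(can_schedule([], [infra_taint]))                   # user workload
print(can_schedule([router_toleration], [infra_taint]))  # infra workload
```

A user workload without tolerations is kept off the tainted node, while a workload that carries the matching toleration can still be scheduled there.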
4.7.1. Binding infrastructure node workloads using taints and tolerations
If you have an infra node that has the infra and worker roles assigned, you must configure the node so that user workloads are not assigned to it.
It is recommended that you preserve the dual infra,worker label that is created for infra nodes and use taints and tolerations to manage nodes that user workloads are scheduled on. If you remove the worker label from the node, you must create a custom pool to manage it. A node with a label other than master or worker is not recognized by the MCO without a custom pool. Maintaining the worker label allows the node to be managed by the default worker machine config pool, if no custom pools that select the custom label exist. The infra label communicates to the cluster that it does not count toward the total number of subscriptions.
Prerequisites
- Configure additional MachineSet objects in your OpenShift Container Platform cluster.
Procedure
Add a taint to the infra node to prevent scheduling user workloads on it:
Determine if the node has the taint:
$ oc describe nodes <node_name>
Sample output
oc describe node ci-ln-iyhx092-f76d1-nvdfm-worker-b-wln2l Name: ci-ln-iyhx092-f76d1-nvdfm-worker-b-wln2l Roles: worker ... Taints: node-role.kubernetes.io/infra:NoSchedule ...
This example shows that the node has a taint. You can proceed with adding a toleration to your pod in the next step.
If the node does not have a taint that prevents scheduling user workloads on it, add one:
$ oc adm taint nodes <node_name> <key>=<value>:<effect>
For example:
$ oc adm taint nodes node1 node-role.kubernetes.io/infra=reserved:NoSchedule
Tip: You can alternatively apply the following YAML to add the taint:
kind: Node apiVersion: v1 metadata: name: <node_name> labels: ... spec: taints: - key: node-role.kubernetes.io/infra effect: NoSchedule value: reserved ...
This example places a taint on node1 that has the key node-role.kubernetes.io/infra and the taint effect NoSchedule. Nodes with the NoSchedule effect schedule only pods that tolerate the taint, but allow existing pods to remain scheduled on the node.
Note: If a descheduler is used, pods violating node taints could be evicted from the cluster.
Add the taint with NoExecute Effect along with the above taint with NoSchedule Effect:
$ oc adm taint nodes <node_name> <key>=<value>:<effect>
For example:
$ oc adm taint nodes node1 node-role.kubernetes.io/infra=reserved:NoExecute
Tip: You can alternatively apply the following YAML to add the taint:
kind: Node apiVersion: v1 metadata: name: <node_name> labels: ... spec: taints: - key: node-role.kubernetes.io/infra effect: NoExecute value: reserved ...
This example places a taint on node1 that has the key node-role.kubernetes.io/infra and the taint effect NoExecute. Nodes with the NoExecute effect schedule only pods that tolerate the taint. The effect removes any existing pods from the node that do not have a matching toleration.
Add tolerations for the pod configurations you want to schedule on the infra node, like router, registry, and monitoring workloads. Add the following code to the
Pod
object specification:
tolerations:
  - effect: NoSchedule 1
    key: node-role.kubernetes.io/infra 2
    value: reserved 3
  - effect: NoExecute 4
    key: node-role.kubernetes.io/infra 5
    operator: Equal 6
    value: reserved 7
- 1
- Specify the effect that you added to the node.
- 2
- Specify the key that you added to the node.
- 3
- Specify the value of the key-value pair taint that you added to the node.
- 4
- Specify the effect that you added to the node.
- 5
- Specify the key that you added to the node.
- 6
- Specify the
Equal
Operator to require a taint with the key node-role.kubernetes.io/infra and the value reserved to be present on the node. The Exists Operator matches on the key alone and cannot be combined with a value.
- 7
- Specify the value of the key-value pair taint that you added to the node.
This toleration matches the taint created by the
oc adm taint
command. A pod with this toleration can be scheduled onto the infra node.
Note: Moving pods for an Operator installed via OLM to an infra node is not always possible. The capability to move Operator pods depends on the configuration of each Operator.
- Schedule the pod to the infra node using a scheduler. See the documentation for Controlling pod placement onto nodes for details.
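The interaction between the taints and tolerations in this procedure can be illustrated with a small Python model. This is a minimal sketch of the matching rules as Kubernetes documents them, not the scheduler's code; the class and function names (`Taint`, `Toleration`, `tolerates`, `schedulable`) are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

INFRA_KEY = "node-role.kubernetes.io/infra"


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # a missing operator is treated as Equal
    value: str = ""
    effect: str = ""         # an empty effect tolerates every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True  # Exists compares the key only
    return toleration.value == taint.value  # Equal also compares the value


def schedulable(tolerations, taints) -> bool:
    """A pod fits a node only if every NoSchedule/NoExecute taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# The two taints and the two tolerations used in this procedure.
INFRA_TAINTS = [
    Taint(INFRA_KEY, "NoSchedule", "reserved"),
    Taint(INFRA_KEY, "NoExecute", "reserved"),
]
INFRA_TOLERATIONS = [
    Toleration(key=INFRA_KEY, value="reserved", effect="NoSchedule"),
    Toleration(key=INFRA_KEY, operator="Equal", value="reserved", effect="NoExecute"),
]
```

In this model, a pod that carries both tolerations fits the tainted infra node, while a pod with no tolerations, or with only the NoSchedule toleration, does not.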
Additional resources
- See Controlling pod placement using the scheduler for general information on scheduling a pod to a node.
4.8. Moving resources to infrastructure machine sets
Some of the infrastructure resources are deployed in your cluster by default. You can move them to the infrastructure machine sets that you created.
4.8.1. Moving the router
You can deploy the router pod to a different compute machine set. By default, the pod is deployed to a worker node.
Prerequisites
- Configure additional compute machine sets in your OpenShift Container Platform cluster.
Procedure
View the
IngressController
custom resource for the router Operator:$ oc get ingresscontroller default -n openshift-ingress-operator -o yaml
The command output resembles the following text:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: creationTimestamp: 2019-04-18T12:35:39Z finalizers: - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller generation: 1 name: default namespace: openshift-ingress-operator resourceVersion: "11341" selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/default uid: 79509e05-61d6-11e9-bc55-02ce4781844a spec: {} status: availableReplicas: 2 conditions: - lastTransitionTime: 2019-04-18T12:36:15Z status: "True" type: Available domain: apps.<cluster>.example.com endpointPublishingStrategy: type: LoadBalancerService selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
Edit the
ingresscontroller
resource and change the nodeSelector
to use the infra
label:
$ oc edit ingresscontroller default -n openshift-ingress-operator
spec: nodePlacement: nodeSelector: 1 matchLabels: node-role.kubernetes.io/infra: "" tolerations: - effect: NoSchedule key: node-role.kubernetes.io/infra value: reserved - effect: NoExecute key: node-role.kubernetes.io/infra value: reserved
- 1
- Add a
nodeSelector
parameter with the appropriate value to the component you want to move. You can use a nodeSelector
in the format shown or use <key>: <value>
pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
Confirm that the router pod is running on the
infra
node.
View the list of router pods and note the node name of the running pod:
$ oc get pod -n openshift-ingress -o wide
Example output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES router-default-86798b4b5d-bdlvd 1/1 Running 0 28s 10.130.2.4 ip-10-0-217-226.ec2.internal <none> <none> router-default-955d875f4-255g8 0/1 Terminating 0 19h 10.129.2.4 ip-10-0-148-172.ec2.internal <none> <none>
In this example, the running pod is on the
ip-10-0-217-226.ec2.internal
node.
View the node status of the running pod:
$ oc get node <node_name> 1
- 1
- Specify the
<node_name>
that you obtained from the pod list.
Example output
NAME STATUS ROLES AGE VERSION ip-10-0-217-226.ec2.internal Ready infra,worker 17h v1.30.3
Because the role list includes
infra
, the pod is running on the correct node.
4.8.2. Moving the default registry
You configure the registry Operator to deploy its pods to different nodes.
Prerequisites
- Configure additional compute machine sets in your OpenShift Container Platform cluster.
Procedure
View the
config/instance
object:$ oc get configs.imageregistry.operator.openshift.io/cluster -o yaml
Example output
apiVersion: imageregistry.operator.openshift.io/v1 kind: Config metadata: creationTimestamp: 2019-02-05T13:52:05Z finalizers: - imageregistry.operator.openshift.io/finalizer generation: 1 name: cluster resourceVersion: "56174" selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/cluster uid: 36fd3724-294d-11e9-a524-12ffeee2931b spec: httpSecret: d9a012ccd117b1e6616ceccb2c3bb66a5fed1b5e481623 logging: 2 managementState: Managed proxy: {} replicas: 1 requests: read: {} write: {} storage: s3: bucket: image-registry-us-east-1-c92e88cad85b48ec8b312344dff03c82-392c region: us-east-1 status: ...
Edit the
config/instance
object:$ oc edit configs.imageregistry.operator.openshift.io/cluster
spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: namespaces: - openshift-image-registry topologyKey: kubernetes.io/hostname weight: 100 logLevel: Normal managementState: Managed nodeSelector: 1 node-role.kubernetes.io/infra: "" tolerations: - effect: NoSchedule key: node-role.kubernetes.io/infra value: reserved - effect: NoExecute key: node-role.kubernetes.io/infra value: reserved
- 1
- Add a
nodeSelector
parameter with the appropriate value to the component you want to move. You can use a nodeSelector
in the format shown or use <key>: <value>
pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
Verify the registry pod has been moved to the infrastructure node.
Run the following command to identify the node where the registry pod is located:
$ oc get pods -o wide -n openshift-image-registry
Confirm the node has the label you specified:
$ oc describe node <node_name>
Review the command output and confirm that
node-role.kubernetes.io/infra
is in the LABELS
list.
4.8.3. Moving the monitoring solution
The monitoring stack includes multiple components, including Prometheus, Thanos Querier, and Alertmanager. The Cluster Monitoring Operator manages this stack. To redeploy the monitoring stack to infrastructure nodes, you can create and apply a custom config map.
Procedure
Edit the
cluster-monitoring-config
config map and change the nodeSelector
to use the infra
label:
$ oc edit configmap cluster-monitoring-config -n openshift-monitoring
apiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring data: config.yaml: |+ alertmanagerMain: nodeSelector: 1 node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute prometheusK8s: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute prometheusOperator: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute metricsServer: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute kubeStateMetrics: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute telemeterClient: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute openshiftStateMetrics: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute thanosQuerier: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute monitoringPlugin: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: 
node-role.kubernetes.io/infra value: reserved effect: NoSchedule - key: node-role.kubernetes.io/infra value: reserved effect: NoExecute
- 1
- Add a
nodeSelector
parameter with the appropriate value to the component you want to move. You can use a nodeSelector
in the format shown or use <key>: <value>
pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
Watch the monitoring pods move to the new machines:
$ watch 'oc get pod -n openshift-monitoring -o wide'
If a component has not moved to the
infra
node, delete the pod with this component:
$ oc delete pod -n openshift-monitoring <pod>
The component from the deleted pod is re-created on the
infra
node.
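Because the config map repeats the same node selector and tolerations for every monitoring component, it can be generated rather than typed by hand. The following Python sketch renders equivalent YAML text using only the standard library. The function names (`component_block`, `render_config_map`) are hypothetical, the component list simply mirrors the example in this section, and the block scalar uses `|` where the example uses `|+`.

```python
INFRA_KEY = "node-role.kubernetes.io/infra"

# Components taken from the example config map in this section.
COMPONENTS = [
    "alertmanagerMain", "prometheusK8s", "prometheusOperator",
    "metricsServer", "kubeStateMetrics", "telemeterClient",
    "openshiftStateMetrics", "thanosQuerier", "monitoringPlugin",
]


def component_block(name, taint_value="reserved"):
    """Render the nodeSelector and tolerations stanza for one component."""
    lines = [
        f"{name}:",
        "  nodeSelector:",
        f'    {INFRA_KEY}: ""',
        "  tolerations:",
    ]
    for effect in ("NoSchedule", "NoExecute"):
        lines += [
            f"  - key: {INFRA_KEY}",
            f"    value: {taint_value}",
            f"    effect: {effect}",
        ]
    return "\n".join(lines)


def render_config_map(components=COMPONENTS):
    """Render the whole cluster-monitoring-config ConfigMap as YAML text."""
    body = "\n".join(component_block(name) for name in components)
    # config.yaml is a block scalar, so its content is indented below the key.
    indented = "\n".join("    " + line for line in body.splitlines())
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        "  name: cluster-monitoring-config\n"
        "  namespace: openshift-monitoring\n"
        "data:\n"
        "  config.yaml: |\n"
        f"{indented}\n"
    )
```

Calling `print(render_config_map())` writes the manifest to standard output, where it can be reviewed before it is applied to the cluster.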
4.9. Applying autoscaling to your cluster
Applying autoscaling to an OpenShift Container Platform cluster involves deploying a cluster autoscaler and then deploying machine autoscalers for each machine type in your cluster.
For more information, see Applying autoscaling to an OpenShift Container Platform cluster.
4.10. Configuring Linux cgroup
As of OpenShift Container Platform 4.14, OpenShift Container Platform uses Linux control group version 2 (cgroup v2) in your cluster. If you are using cgroup v1 on OpenShift Container Platform 4.13 or earlier, migrating to OpenShift Container Platform 4.14 or later will not automatically update your cgroup configuration to version 2. A fresh installation of OpenShift Container Platform 4.14 or later will use cgroup v2 by default. However, you can enable Linux control group version 1 (cgroup v1) upon installation.
cgroup v2 is the current version of the Linux cgroup API. cgroup v2 offers several improvements over cgroup v1, including a unified hierarchy, safer sub-tree delegation, new features such as Pressure Stall Information, and enhanced resource management and isolation. However, cgroup v2 has different CPU, memory, and I/O management characteristics than cgroup v1. Therefore, some workloads might experience slight differences in memory or CPU usage on clusters that run cgroup v2.
You can change between cgroup v1 and cgroup v2, as needed. Enabling cgroup v1 in OpenShift Container Platform disables all cgroup v2 controllers and hierarchies in your cluster.
cgroup v1 is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.
For the most recent list of major functionality that has been deprecated or removed within OpenShift Container Platform, refer to the Deprecated and removed features section of the OpenShift Container Platform release notes.
Prerequisites
- You have a running OpenShift Container Platform cluster that uses version 4.12 or later.
- You are logged in to the cluster as a user with administrative privileges.
Procedure
Enable cgroup v1 on nodes:
Edit the
node.config
object:$ oc edit nodes.config/cluster
Add
spec.cgroupMode: "v1"
:Example
node.config
object
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-07-08T16:02:51Z"
  generation: 1
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 36282574-bf9f-409e-a6cd-3032939293eb
  resourceVersion: "1865"
  uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
spec:
  cgroupMode: "v1" 1
...
- 1
- Enables cgroup v1.
Verification
Check the machine configs to see that the new machine configs were added:
$ oc get mc
Example output
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE 00-master 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 00-worker 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 01-master-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 01-master-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 01-worker-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 01-worker-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 97-master-generated-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 99-worker-generated-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 99-master-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 99-master-ssh 3.2.0 40m 99-worker-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m 99-worker-ssh 3.2.0 40m rendered-master-23d4317815a5f854bd3553d689cfe2e9 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 10s 1 rendered-master-23e785de7587df95a4b517e0647e5ab7 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m rendered-worker-5d596d9293ca3ea80c896a1191735bb1 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m rendered-worker-dcc7f1b92892d34db74d6832bcc9ccd4 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 10s
- 1
- New machine configs are created, as expected.
Check that the new
kernelArguments
were added to the new machine configs:$ oc describe mc <name>
Example output for cgroup v1
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-kernelarg-selinuxpermissive
spec:
  kernelArguments:
    systemd.unified_cgroup_hierarchy=0 1
    systemd.legacy_systemd_cgroup_controller=1 2
Check the nodes to see that scheduling on the nodes is disabled. This indicates that the change is being applied:
$ oc get nodes
Example output
NAME STATUS ROLES AGE VERSION ci-ln-fm1qnwt-72292-99kt6-master-0 Ready,SchedulingDisabled master 58m v1.30.3 ci-ln-fm1qnwt-72292-99kt6-master-1 Ready master 58m v1.30.3 ci-ln-fm1qnwt-72292-99kt6-master-2 Ready master 58m v1.30.3 ci-ln-fm1qnwt-72292-99kt6-worker-a-h5gt4 Ready,SchedulingDisabled worker 48m v1.30.3 ci-ln-fm1qnwt-72292-99kt6-worker-b-7vtmd Ready worker 48m v1.30.3 ci-ln-fm1qnwt-72292-99kt6-worker-c-rhzkv Ready worker 48m v1.30.3
After a node returns to the
Ready
state, start a debug session for that node:$ oc debug node/<node_name>
Set
/host
as the root directory within the debug shell:sh-4.4# chroot /host
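Check the file system type that is mounted at
/sys/fs/cgroup
on your nodes. With cgroup v1 enabled, the command reports tmpfs. An output of cgroup2fs indicates that the node is still using cgroup v2:
$ stat -c %T -f /sys/fs/cgroup
Example output for cgroup v1
tmpfs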
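The file system type check can be scripted when many nodes need to be verified. The following Python sketch maps the output of `stat -c %T -f /sys/fs/cgroup` to a cgroup version: `tmpfs` corresponds to cgroup v1 and `cgroup2fs` to cgroup v2. The function names are hypothetical, and `detect_cgroup_version` assumes that it runs on the node itself, for example inside the debug shell.

```python
import subprocess

# File system type reported for /sys/fs/cgroup, mapped to the cgroup version.
FS_TYPE_TO_CGROUP = {"tmpfs": "v1", "cgroup2fs": "v2"}


def cgroup_version(fs_type: str) -> str:
    """Translate the output of `stat -c %T -f /sys/fs/cgroup` to v1 or v2."""
    key = fs_type.strip()
    if key not in FS_TYPE_TO_CGROUP:
        raise ValueError(f"unexpected file system type: {key!r}")
    return FS_TYPE_TO_CGROUP[key]


def detect_cgroup_version(path: str = "/sys/fs/cgroup") -> str:
    """Run stat on the local host and report the cgroup version in use."""
    result = subprocess.run(
        ["stat", "-c", "%T", "-f", path],
        check=True, capture_output=True, text=True,
    )
    return cgroup_version(result.stdout)
```

After enabling cgroup v1 as described in this procedure, `detect_cgroup_version()` is expected to return `"v1"` on an updated node.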
4.11. Enabling Technology Preview features using FeatureGates
You can turn on a subset of the current Technology Preview features for all nodes in the cluster by editing the FeatureGate
custom resource (CR).
4.11.1. Understanding feature gates
You can use the FeatureGate
custom resource (CR) to enable specific feature sets in your cluster. A feature set is a collection of OpenShift Container Platform features that are not enabled by default.
You can activate the following feature set by using the FeatureGate
CR:
TechPreviewNoUpgrade
. This feature set is a subset of the current Technology Preview features. This feature set allows you to enable these Technology Preview features on test clusters, where you can fully test them, while leaving the features disabled on production clusters.
Warning: Enabling the
TechPreviewNoUpgrade
feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.
The following Technology Preview features are enabled by this feature set:
-
External cloud providers. Enables support for external cloud providers for clusters on vSphere, AWS, Azure, and GCP. Support for OpenStack is GA. This is an internal feature that most users do not need to interact with. (
ExternalCloudProvider
) -
Swap memory on nodes. Enables swap memory use for OpenShift Container Platform workloads on a per-node basis. (
NodeSwap
) -
OpenStack Machine API Provider. This gate has no effect and is planned to be removed from this feature set in a future release. (
MachineAPIProviderOpenStack
) -
Insights Operator. Enables the
InsightsDataGather
CRD, which allows users to configure some Insights data gathering options. The feature set also enables theDataGather
CRD, which allows users to run Insights data gathering on-demand. (InsightsConfigAPI
) -
Dynamic Resource Allocation API. Enables a new API for requesting and sharing resources between pods and containers. This is an internal feature that most users do not need to interact with. (
DynamicResourceAllocation
) -
Pod security admission enforcement. Enables the restricted enforcement mode for pod security admission. Instead of only logging a warning, pods are rejected if they violate pod security standards. (
OpenShiftPodSecurityAdmission
) -
StatefulSet pod availability upgrading limits. Enables users to define the maximum number of statefulset pods unavailable during updates which reduces application downtime. (
MaxUnavailableStatefulSet
) -
gcpLabelsTags
-
vSphereStaticIPs
-
routeExternalCertificate
-
automatedEtcdBackup
-
gcpClusterHostedDNS
-
vSphereControlPlaneMachineset
-
dnsNameResolver
-
machineConfigNodes
-
metricsServer
-
installAlternateInfrastructureAWS
-
mixedCPUsAllocation
-
managedBootImages
-
onClusterBuild
-
signatureStores
-
SigstoreImageVerification
-
DisableKubeletCloudCredentialProviders
-
BareMetalLoadBalancer
-
ClusterAPIInstallAWS
-
ClusterAPIInstallAzure
-
ClusterAPIInstallNutanix
-
ClusterAPIInstallOpenStack
-
ClusterAPIInstallVSphere
-
HardwareSpeed
-
KMSv1
-
NetworkDiagnosticsConfig
-
VSphereDriverConfiguration
-
ExternalOIDC
-
ChunkSizeMiB
-
ClusterAPIInstallGCP
-
ClusterAPIInstallPowerVS
-
EtcdBackendQuota
-
InsightsConfig
-
InsightsOnDemandDataGather
-
MetricsCollectionProfiles
-
NewOLM
-
NodeDisruptionPolicy
-
PinnedImages
-
PlatformOperators
-
ServiceAccountTokenNodeBinding
-
TranslateStreamCloseWebsocketRequests
-
UpgradeStatus
-
VSphereMultiVCenters
-
VolumeGroupSnapshot
-
AdditionalRoutingCapabilities
-
BootcNodeManagement
-
ClusterMonitoringConfig
-
DNSNameResolver
-
ManagedBootImagesAWS
-
NetworkSegmentation
-
OVNObservability
-
PersistentIPsForVirtualization
-
ProcMountType
-
RouteAdvertisements
-
UserNamespacesSupport
-
AWSEFSDriverVolumeMetrics
-
AlibabaPlatform
-
AzureWorkloadIdentity
-
BuildCSIVolumes
-
CloudDualStackNodeIPs
-
ExternalCloudProviderAzure
-
ExternalCloudProviderExternal
-
ExternalCloudProviderGCP
-
IngressControllerLBSubnetsAWS
-
MultiArchInstallAWS
-
MultiArchInstallGCP
-
NetworkLiveMigration
-
PrivateHostedZoneAWS
-
SetEIPForNLBIngressController
-
ValidatingAdmissionPolicy
4.11.2. Enabling feature sets using the web console
You can use the OpenShift Container Platform web console to enable feature sets for all of the nodes in a cluster by editing the FeatureGate
custom resource (CR).
Procedure
To enable feature sets:
- In the OpenShift Container Platform web console, switch to the Administration → Custom Resource Definitions page.
- On the Custom Resource Definitions page, click FeatureGate.
- On the Custom Resource Definition Details page, click the Instances tab.
- Click the cluster feature gate, then click the YAML tab.
Edit the cluster instance to add specific feature sets:
Warning: Enabling the
TechPreviewNoUpgrade
feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.
Sample FeatureGate custom resource
apiVersion: config.openshift.io/v1 kind: FeatureGate metadata: name: cluster 1 # ... spec: featureSet: TechPreviewNoUpgrade 2
After you save the changes, new machine configs are created, the machine config pools are updated, and scheduling on each node is disabled while the change is being applied.
Verification
You can verify that the feature gates are enabled by looking at the kubelet.conf
file on a node after the nodes return to the ready state.
- From the Administrator perspective in the web console, navigate to Compute → Nodes.
- Select a node.
- In the Node details page, click Terminal.
In the terminal window, change your root directory to
/host
:sh-4.2# chroot /host
View the
kubelet.conf
file:sh-4.2# cat /etc/kubernetes/kubelet.conf
Sample output
# ... featureGates: InsightsOperatorPullingSCA: true, LegacyNodeRoleBehavior: false # ...
The features that are listed as
true
are enabled on your cluster.
Note: The features listed vary depending upon the OpenShift Container Platform version.
4.11.3. Enabling feature sets using the CLI
You can use the OpenShift CLI (oc
) to enable feature sets for all of the nodes in a cluster by editing the FeatureGate
custom resource (CR).
Prerequisites
-
You have installed the OpenShift CLI (
oc
).
Procedure
To enable feature sets:
Edit the
FeatureGate
CR namedcluster
:$ oc edit featuregate cluster
Warning: Enabling the
TechPreviewNoUpgrade
feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.
Sample FeatureGate custom resource
apiVersion: config.openshift.io/v1 kind: FeatureGate metadata: name: cluster 1 # ... spec: featureSet: TechPreviewNoUpgrade 2
After you save the changes, new machine configs are created, the machine config pools are updated, and scheduling on each node is disabled while the change is being applied.
Verification
You can verify that the feature gates are enabled by looking at the kubelet.conf
file on a node after the nodes return to the ready state.
- From the Administrator perspective in the web console, navigate to Compute → Nodes.
- Select a node.
- In the Node details page, click Terminal.
In the terminal window, change your root directory to
/host
:sh-4.2# chroot /host
View the
kubelet.conf
file:sh-4.2# cat /etc/kubernetes/kubelet.conf
Sample output
# ... featureGates: InsightsOperatorPullingSCA: true, LegacyNodeRoleBehavior: false # ...
The features that are listed as
true
are enabled on your cluster.
Note: The features listed vary depending upon the OpenShift Container Platform version.
4.12. etcd tasks
Back up etcd, enable or disable etcd encryption, or defragment etcd data.
If you deployed a bare-metal cluster, you can scale the cluster up to 5 nodes as part of your post-installation tasks. For more information, see Node scaling for etcd.
4.12.1. About etcd encryption
By default, etcd data is not encrypted in OpenShift Container Platform. You can enable etcd encryption for your cluster to provide an additional layer of data security. For example, it can help protect against the loss of sensitive data if an etcd backup is exposed to the incorrect parties.
When you enable etcd encryption, the following OpenShift API server and Kubernetes API server resources are encrypted:
- Secrets
- Config maps
- Routes
- OAuth access tokens
- OAuth authorize tokens
When you enable etcd encryption, encryption keys are created. You must have these keys to restore from an etcd backup.
Etcd encryption only encrypts values, not keys. Resource types, namespaces, and object names are unencrypted.
If etcd encryption is enabled during a backup, the static_kuberesources_<datetimestamp>.tar.gz
file contains the encryption keys for the etcd snapshot. For security reasons, store this file separately from the etcd snapshot. However, this file is required to restore a previous state of etcd from the respective etcd snapshot.
4.12.2. Supported encryption types
The following encryption types are supported for encrypting etcd data in OpenShift Container Platform:
- AES-CBC
- Uses AES-CBC with PKCS#7 padding and a 32 byte key to perform the encryption. The encryption keys are rotated weekly.
- AES-GCM
- Uses AES-GCM with a random nonce and a 32 byte key to perform the encryption. The encryption keys are rotated weekly.
4.12.3. Enabling etcd encryption
You can enable etcd encryption to encrypt sensitive resources in your cluster.
Do not back up etcd resources until the initial encryption process is completed. If the encryption process is not completed, the backup might be only partially encrypted.
After you enable etcd encryption, several changes can occur:
- The etcd encryption might affect the memory consumption of a few resources.
- You might notice a transient effect on backup performance because the leader must serve the backup.
- Disk I/O can affect the node that receives the backup state.
You can encrypt the etcd database in either AES-GCM or AES-CBC encryption.
To migrate your etcd database from one encryption type to the other, you can modify the API server’s spec.encryption.type
field. Migration of the etcd data to the new encryption type occurs automatically.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role.
Procedure
Modify the
APIServer
object:$ oc edit apiserver
Set the
spec.encryption.type
field toaesgcm
oraescbc
:spec: encryption: type: aesgcm 1
- 1
- Set to
aesgcm
for AES-GCM encryption oraescbc
for AES-CBC encryption.
Save the file to apply the changes.
The encryption process starts. It can take 20 minutes or longer for this process to complete, depending on the size of the etcd database.
Verify that etcd encryption was successful.
Review the
Encrypted
status condition for the OpenShift API server to verify that its resources were successfully encrypted:$ oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows
EncryptionCompleted
upon successful encryption:EncryptionCompleted All resources encrypted: routes.route.openshift.io
If the output shows
EncryptionInProgress
, encryption is still in progress. Wait a few minutes and try again.
Review the
Encrypted
status condition for the Kubernetes API server to verify that its resources were successfully encrypted:$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows
EncryptionCompleted
upon successful encryption:EncryptionCompleted All resources encrypted: secrets, configmaps
If the output shows
EncryptionInProgress
, encryption is still in progress. Wait a few minutes and try again.
Review the
Encrypted
status condition for the OpenShift OAuth API server to verify that its resources were successfully encrypted:$ oc get authentication.operator.openshift.io -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows
EncryptionCompleted
upon successful encryption:EncryptionCompleted All resources encrypted: oauthaccesstokens.oauth.openshift.io, oauthauthorizetokens.oauth.openshift.io
If the output shows
EncryptionInProgress
, encryption is still in progress. Wait a few minutes and try again.
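The three status checks above can be combined into one script. The following Python sketch reads the same `Encrypted` condition that the `jsonpath` expressions select, but from `oc get <resource> -o json` output. The function names are hypothetical, and `check_cluster` assumes that `oc` is on the `PATH` and logged in to the cluster.

```python
import json
import subprocess

# The three resources whose Encrypted condition this section reviews.
RESOURCES = (
    "openshiftapiserver",
    "kubeapiserver",
    "authentication.operator.openshift.io",
)


def encrypted_condition(resource_list: dict):
    """Return (reason, message) of the Encrypted condition, or None if absent."""
    items = resource_list.get("items") or []
    if not items:
        return None
    for condition in items[0].get("status", {}).get("conditions", []):
        if condition.get("type") == "Encrypted":
            return condition.get("reason", ""), condition.get("message", "")
    return None


def encryption_finished(resource_list: dict,
                        expected_reason: str = "EncryptionCompleted") -> bool:
    """True once the Encrypted condition reports the expected reason."""
    condition = encrypted_condition(resource_list)
    return condition is not None and condition[0] == expected_reason


def check_cluster(expected_reason: str = "EncryptionCompleted") -> dict:
    """Query each resource with oc and report whether it has finished."""
    status = {}
    for resource in RESOURCES:
        output = subprocess.run(
            ["oc", "get", resource, "-o", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        status[resource] = encryption_finished(json.loads(output), expected_reason)
    return status
```

Passing `expected_reason="DecryptionCompleted"` to `check_cluster` covers the equivalent check after encryption is disabled.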
4.12.4. Disabling etcd encryption
You can disable encryption of etcd data in your cluster.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role.
Procedure
Modify the
APIServer
object:$ oc edit apiserver
Set the
encryption
field type toidentity
:spec: encryption: type: identity 1
- 1
- The
identity
type is the default value and means that no encryption is performed.
Save the file to apply the changes.
The decryption process starts. It can take 20 minutes or longer for this process to complete, depending on the size of your cluster.
Verify that etcd decryption was successful.
Review the
Encrypted
status condition for the OpenShift API server to verify that its resources were successfully decrypted:$ oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows
DecryptionCompleted
upon successful decryption:DecryptionCompleted Encryption mode set to identity and everything is decrypted
If the output shows
DecryptionInProgress
, decryption is still in progress. Wait a few minutes and try again.
Review the
Encrypted
status condition for the Kubernetes API server to verify that its resources were successfully decrypted:$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows
DecryptionCompleted
upon successful decryption:DecryptionCompleted Encryption mode set to identity and everything is decrypted
If the output shows
DecryptionInProgress
, decryption is still in progress. Wait a few minutes and try again.
Review the
Encrypted
status condition for the OpenShift OAuth API server to verify that its resources were successfully decrypted:$ oc get authentication.operator.openshift.io -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows
DecryptionCompleted
upon successful decryption:DecryptionCompleted Encryption mode set to identity and everything is decrypted
If the output shows
DecryptionInProgress
, decryption is still in progress. Wait a few minutes and try again.
4.12.5. Backing up etcd data
Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.
Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have checked whether the cluster-wide proxy is enabled.
Tip: You can check whether the proxy is enabled by reviewing the output of oc get proxy cluster -o yaml. The proxy is enabled if the httpProxy, httpsProxy, and noProxy fields have values set.
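The proxy check in the tip can also be scripted. This is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes that the three fields are read from the spec section of the Proxy object (adjust the path if you prefer to inspect status).

```shell
#!/bin/sh
# Succeeds (exit status 0) only when all three proxy values are non-empty,
# mirroring the rule in the tip above.
proxy_enabled() {
  [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]
}

# Read one field of the cluster-wide Proxy object.
# Assumption: the field is taken from .spec of the object named "cluster".
proxy_field() {
  oc get proxy cluster -o jsonpath="{.spec.$1}"
}

# Print whether the cluster-wide proxy counts as enabled.
report_proxy() {
  if proxy_enabled "$(proxy_field httpProxy)" "$(proxy_field httpsProxy)" "$(proxy_field noProxy)"; then
    echo "cluster-wide proxy: enabled"
  else
    echo "cluster-wide proxy: not enabled"
  fi
}
```

Run report_proxy from a workstation where oc is logged in to the cluster. If it reports that the proxy is enabled, plan to export the proxy variables in the debug shell during the backup procedure.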
Procedure
Start a debug session as root for a control plane node:
$ oc debug --as-root node/<node_name>
Change your root directory to /host in the debug shell:
sh-4.4# chroot /host
If the cluster-wide proxy is enabled, export the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables by running the following commands:
$ export HTTP_PROXY=http://<your_proxy.example.com>:8080
$ export HTTPS_PROXY=https://<your_proxy.example.com>:8080
$ export NO_PROXY=<example.com>
Run the cluster-backup.sh script in the debug shell and pass in the location to save the backup to.
Tip: The cluster-backup.sh script is maintained as a component of the etcd Cluster Operator and is a wrapper around the etcdctl snapshot save command.
sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
Example script output
found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-6
found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7
found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
ede95fe6b88b87ba86a03c15e669fb4aa5bf0991c180d3c6895ce72eaade54a1
etcdctl version: 3.4.14
API version: 3.4
{"level":"info","ts":1624647639.0188997,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db.part"}
{"level":"info","ts":"2021-06-25T19:00:39.030Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1624647639.0301006,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.0.5:2379"}
{"level":"info","ts":"2021-06-25T19:00:40.215Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1624647640.6032252,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.0.5:2379","size":"114 MB","took":1.584090459}
{"level":"info","ts":1624647640.6047094,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db"}
Snapshot saved at /home/core/assets/backup/snapshot_2021-06-25_190035.db
{"hash":3866667823,"revision":31407,"totalKey":12828,"totalSize":114446336}
snapshot db and kube resources are successfully saved to /home/core/assets/backup
In this example, two files are created in the /home/core/assets/backup/ directory on the control plane host:
- snapshot_<datetimestamp>.db: This file is the etcd snapshot. The cluster-backup.sh script confirms its validity.
- static_kuberesources_<datetimestamp>.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.
Note: If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required to restore from the etcd snapshot.
Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted.
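After the script finishes, you might want to confirm that both files described above are present before copying the backup elsewhere. The following is a minimal sketch with a hypothetical function name, backup_complete. It only checks that a non-empty file matching each name pattern exists. It does not validate the snapshot contents, which the cluster-backup.sh script already does.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the directory holds at least one non-empty
# snapshot_*.db file and one non-empty static_kuberesources_*.tar.gz file.
backup_complete() {
  dir=$1
  for pattern in 'snapshot_*.db' 'static_kuberesources_*.tar.gz'; do
    found=no
    # $pattern is left unquoted on purpose so that the shell expands the glob.
    for file in "$dir"/$pattern; do
      # An unmatched glob stays literal; -s is false for it, so it is skipped.
      if [ -s "$file" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "missing: $pattern" >&2
      return 1
    fi
  done
  echo "backup looks complete: $dir"
}
```

For example, backup_complete /home/core/assets/backup can be run in the debug shell after the backup script. A nonzero exit status means that at least one of the two files is missing or empty.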
4.12.6. Defragmenting etcd data
For large and dense clusters, etcd can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment etcd to free up space in the data store. Monitor Prometheus for etcd metrics and defragment it when required; otherwise, etcd can raise a cluster-wide alarm that puts the cluster into a maintenance mode that accepts only key reads and deletes.
Monitor these key metrics:
- etcd_server_quota_backend_bytes, which is the current quota limit
- etcd_mvcc_db_total_size_in_use_in_bytes, which indicates the actual database usage after a history compaction
- etcd_mvcc_db_total_size_in_bytes, which shows the database size, including free space waiting for defragmentation
Defragment etcd data to reclaim disk space after events that cause disk fragmentation, such as etcd history compaction.
History compaction is performed automatically every five minutes and leaves gaps in the back-end database. This fragmented space is available for use by etcd, but is not available to the host file system. You must defragment etcd to make this space available to the host file system.
Defragmentation occurs automatically, but you can also trigger it manually.
Automatic defragmentation is suitable for most cases, because the etcd Operator uses cluster information to determine the most efficient operation for the user.
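To judge when defragmentation is worthwhile, the three metrics listed earlier can be combined into two percentages. This is a minimal sketch rather than an official formula: the function names are hypothetical, the values are assumed to be whole numbers of bytes already read from Prometheus, and the reclaimable space is taken to be the difference between the total size and the in-use size, as the metric descriptions suggest.

```shell
#!/bin/sh
# Share of the database file that is free space awaiting defragmentation.
# $1 = etcd_mvcc_db_total_size_in_bytes
# $2 = etcd_mvcc_db_total_size_in_use_in_bytes
fragmentation_percent() {
  total=$1
  in_use=$2
  if [ "$total" -le 0 ]; then
    echo 0
    return
  fi
  # Integer arithmetic: the result is rounded down.
  echo $(( (total - in_use) * 100 / total ))
}

# Share of the quota that the database file currently occupies.
# $1 = etcd_mvcc_db_total_size_in_bytes
# $2 = etcd_server_quota_backend_bytes
quota_used_percent() {
  size=$1
  quota=$2
  if [ "$quota" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( size * 100 / quota ))
}
```

For example, fragmentation_percent 1000 600 prints 40, which means that 40 percent of the file could be returned to the host file system by defragmentation. A high fragmentation percentage combined with a high quota percentage is a reasonable signal to investigate. Any threshold you choose is a local policy decision and is not prescribed here.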
4.12.6.1. Automatic defragmentation
The etcd Operator automatically defragments disks. No manual intervention is needed.
Verify that the defragmentation process is successful by viewing one of these logs:
- etcd logs
- cluster-etcd-operator pod
- operator status error log
Automatic defragmentation can cause leader election failure in various OpenShift core components, such as the Kubernetes controller manager, which triggers a restart of the failing component. The restart is harmless and either triggers failover to the next running instance or the component resumes work again after the restart.
Example log output for successful defragmentation
etcd member has been defragmented: <member_name>, memberID: <member_id>
Example log output for unsuccessful defragmentation
failed defrag on member: <member_name>, memberID: <member_id>: <error_message>
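When reviewing a long log, it can help to count the two message types shown above. The following is a minimal sketch with a hypothetical function name, defrag_summary. It reads log lines on standard input and matches only the fixed prefixes of the two example messages.

```shell
#!/bin/sh
# Count successful and failed defragmentation messages read from stdin.
# The patterns are the fixed prefixes of the two example log lines above.
defrag_summary() {
  awk '
    /etcd member has been defragmented:/ { ok++ }
    /failed defrag on member:/           { failed++ }
    END { printf "defragmented=%d failed=%d\n", ok, failed }
  '
}
```

A possible invocation is oc logs -n openshift-etcd-operator deployment/etcd-operator | defrag_summary. The namespace and deployment name in that command are assumptions and might differ in your cluster. Any nonzero failed count warrants a closer look at the corresponding error messages.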