Chapter 3. Performing a cluster update

3.1. Updating a cluster using the CLI

You can perform minor version and patch updates on an OpenShift Container Platform cluster by using the OpenShift CLI (oc).

3.1.1. About updating single node OpenShift Container Platform

You can update a single-node OpenShift Container Platform cluster by using either the web console or the CLI.

However, note the following limitations:

You do not need to pause the MachineHealthCheck resources, because there is no other node to perform the health check.
Restoring a single-node OpenShift Container Platform cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your update fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.
Updating a single-node OpenShift Container Platform cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:
- If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.
- If the update contains machine configuration changes that do not require a reboot, the downtime is shorter and the impact on cluster management and user workloads is reduced. In this case, the node draining step is skipped with single-node OpenShift Container Platform because there is no other node in the cluster to reschedule the workloads to.
- If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.

Important

There are conditions, such as bugs in an updated package, that can prevent the single node from restarting after a reboot. In this case, the update does not roll back automatically.

3.1.2. Prerequisites for a cluster update

You must satisfy the following prerequisites before updating a cluster using the CLI.

Have access to the cluster as a user with admin privileges. See "Using RBAC to define and apply permissions" for more information.
Have a recent etcd backup in case your update fails and you must restore your cluster to a previous state.
Have a recent Container Storage Interface (CSI) volume snapshot in case you need to restore persistent volumes due to a pod failure.
You have replaced your RHEL7 workers with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default software catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" for more information on how to check compatibility and, if necessary, update the installed Operators.
Ensure that all machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
If your cluster uses manually maintained credentials, update the cloud provider resources for the new release. For more information, including how to determine if this is a requirement for your cluster, see "Preparing to update a cluster with manually maintained credentials".
Ensure that you address all Upgradeable=False conditions so the cluster allows an update to the next minor version. An alert displays at the top of the Cluster Settings page when you have one or more cluster Operators that cannot be updated. You can still update to the next available patch update for the minor release you are currently on.
If you run an Operator or have configured any application with a pod disruption budget, you might experience an interruption during the update process. Nodes are drained to apply pending machine configs, and if minAvailable is set to 1 in a PodDisruptionBudget, the budget might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget can prevent that node from being drained.
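
You can spot-check the machine config pool and Upgradeable=False prerequisites from the CLI. The following is a minimal sketch. It assumes a POSIX shell and an oc session with sufficient privileges. It also assumes that the jsonpath expressions below match your cluster: they read spec.paused on MachineConfigPool objects and the Upgradeable condition on ClusterOperator objects. Both function names are hypothetical helpers, not oc subcommands.

```shell
#!/bin/sh
# Hypothetical pre-update helpers; both read live cluster state through oc.

# Print the name of every machine config pool whose spec.paused field is true.
list_paused_pools() {
  oc get mcp \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.paused}{"\n"}{end}' |
    awk -F '\t' '$2 == "true" { print $1 }'
}

# Print the name of every cluster Operator that reports Upgradeable=False.
list_not_upgradeable_operators() {
  oc get clusteroperators \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Upgradeable")].status}{"\n"}{end}' |
    awk -F '\t' '$2 == "False" { print $1 }'
}

# Usage: list_paused_pools; list_not_upgradeable_operators
```

Empty output from both functions means that neither condition applies. A paused pool can still be intentional if you are performing a canary rollout update.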

Important

When an update is failing to complete, the Cluster Version Operator (CVO) reports the status of any blocking components while attempting to reconcile the update. Rolling your cluster back to a previous version is not supported. If your update is failing to complete, contact Red Hat support.
Using the unsupportedConfigOverrides section to modify the configuration of an Operator is unsupported and might block cluster updates. You must remove this setting before you can update your cluster.

3.1.3. Pausing a MachineHealthCheck resource

During the update process, nodes in the cluster might become temporarily unavailable. For worker nodes, the MachineHealthCheck resources might identify such nodes as unhealthy and reboot them. To avoid rebooting worker nodes, you must pause all the MachineHealthCheck resources before updating the cluster.

Note

Some MachineHealthCheck resources might not need to be paused. If a MachineHealthCheck (MHC) resource relies on unrecoverable conditions, pausing that MHC is unnecessary.

Prerequisites

You installed the OpenShift CLI (oc).

Procedure

List all of the available MachineHealthCheck resources that you want to pause by running the following command:
```
$ oc get machinehealthcheck -n openshift-machine-api
```

For each MachineHealthCheck resource, pause the machine health check by running the following command:

```
$ oc -n openshift-machine-api annotate mhc <mhc_name> cluster.x-k8s.io/paused=""
```

The annotated MachineHealthCheck resource resembles the following YAML file:

```
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
  annotations:
    cluster.x-k8s.io/paused: ""
spec:
  selector:
    matchLabels:
      role: worker
  unhealthyConditions:
  - type:    "Ready"
    status:  "Unknown"
    timeout: "300s"
  - type:    "Ready"
    status:  "False"
    timeout: "300s"
  maxUnhealthy: "40%"
status:
  currentHealthy: 5
  expectedMachines: 5
```
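
If you have many MachineHealthCheck resources, you can annotate them in a loop instead of one at a time. This is a minimal sketch that assumes a POSIX shell and an oc session with cluster-admin privileges. The name pause_all_mhcs is a hypothetical helper. It pauses every resource in the namespace, including any that rely only on unrecoverable conditions.

```shell
#!/bin/sh
# Hypothetical helper: add the cluster.x-k8s.io/paused annotation to every
# MachineHealthCheck resource in the openshift-machine-api namespace.
pause_all_mhcs() {
  local ns=openshift-machine-api mhc
  # "-o name" prints one "<kind>.<group>/<name>" entry per line,
  # which oc annotate accepts directly as a resource reference.
  for mhc in $(oc get machinehealthcheck -n "$ns" -o name); do
    # --overwrite keeps the command idempotent if the annotation already exists.
    oc -n "$ns" annotate "$mhc" cluster.x-k8s.io/paused="" --overwrite
  done
}

# Usage: pause_all_mhcs
```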

Important

Resume the machine health checks after updating the cluster. To resume the check, remove the pause annotation from the MachineHealthCheck resource by running the following command:

```
$ oc -n openshift-machine-api annotate mhc <mhc_name> cluster.x-k8s.io/paused-
```
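
To resume every machine health check in one pass after the update, you can apply the same removal in a loop. This is a sketch under the same assumptions (a POSIX shell and a cluster-admin oc session). The name resume_all_mhcs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove the pause annotation from every
# MachineHealthCheck resource in openshift-machine-api after the update.
resume_all_mhcs() {
  local ns=openshift-machine-api mhc
  for mhc in $(oc get machinehealthcheck -n "$ns" -o name); do
    # A trailing "-" after the annotation key removes that annotation.
    oc -n "$ns" annotate "$mhc" cluster.x-k8s.io/paused-
  done
}

# Usage: resume_all_mhcs
```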

3.1.4. Updating a cluster by using the CLI

You can use the OpenShift CLI (oc) to review and request cluster updates.

You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.

Prerequisites

You installed the version of the OpenShift CLI (oc) that matches the version that you are updating to.
You are logged in to the cluster as a user with cluster-admin privileges.
You have paused all MachineHealthCheck resources.

Procedure

View the available updates and note the version number of the update that you want to apply by running the following command:

```
$ oc adm upgrade
```

Example output

```
Cluster version is 4.13.10
Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13 (available channels: candidate-4.13, candidate-4.14, fast-4.13, stable-4.13)
Recommended updates:
  VERSION     IMAGE
  4.13.14     quay.io/openshift-release-dev/ocp-release@sha256:406fcc160c097f61080412afcfa7fd65284ac8741ac7ad5b480e304aba73674b
  4.13.13     quay.io/openshift-release-dev/ocp-release@sha256:d62495768e335c79a215ba56771ff5ae97e3cbb2bf49ed8fb3f6cefabcdc0f17
  4.13.12     quay.io/openshift-release-dev/ocp-release@sha256:73946971c03b43a0dc6f7b0946b26a177c2f3c9d37105441315b4e3359373a55
  4.13.11     quay.io/openshift-release-dev/ocp-release@sha256:e1c2377fdae1d063aaddc753b99acf25972b6997ab9a0b7e80cfef627b9ef3dd
```

Note

If there are no recommended updates, updates that have known issues might still be available. See "Updating along a conditional update path" for more information.
For information about how to perform a Control Plane Only update, see "Performing a Control Plane Only update".
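
To use the list of recommended versions in a script, you can extract the VERSION column from the Recommended updates table. This sketch assumes the output layout shown in the example output. That layout is intended for people to read and might change between oc releases. The name recommended_versions is a hypothetical helper that reads oc adm upgrade output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print one recommended version per line.
# Reads the output of "oc adm upgrade" on standard input.
recommended_versions() {
  awk '
    /^Recommended updates:/ { in_table = 1; next }  # table starts after this line
    in_table && $1 == "VERSION" { next }            # skip the column header
    in_table && NF == 2 { print $1 }                # VERSION IMAGE rows
    in_table && NF == 0 { in_table = 0 }            # a blank line ends the table
  '
}

# Usage: oc adm upgrade | recommended_versions
```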

Based on your organization requirements, set the appropriate update channel by running the following command. For example, you can set your channel to stable-4.13 or fast-4.13. For more information about channels, see "Understanding update channels and releases".
```
$ oc adm upgrade channel <channel>
```
Example command
```
$ oc adm upgrade channel stable-4.17
```
Important
For production clusters, you must subscribe to a stable-*, eus-*, or fast-* channel.
Note
When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner you declare the update channel, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time.
If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path.
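
The channel names in these examples follow a <type>-<major>.<minor> pattern, such as stable-4.13 or fast-4.13. If that pattern holds for the channel that you want, a small helper can derive the channel name from a target version so that the two stay in sync. The name channel_for is hypothetical. Some channel types, such as eus-*, exist only for some minor versions, so confirm that the result appears in the list of available channels before you set it.

```shell
#!/bin/sh
# Hypothetical helper: build a channel name such as "stable-4.17"
# from a channel type and a version such as "4.17.3".
channel_for() {
  local kind="$1" version="$2" major_minor
  # Keep only the first two dot-separated fields (major.minor).
  major_minor=$(printf '%s\n' "$version" | cut -d . -f 1,2)
  printf '%s-%s\n' "$kind" "$major_minor"
}

# Usage: oc adm upgrade channel "$(channel_for stable 4.17.3)"
```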
Apply an update:
- To update to the latest version, run the following command:
  $ oc adm upgrade --to-latest=true
- To update to a specific version, run the following command:
  $ oc adm upgrade --to=<version>
  Replace <version> with the update version that you obtained from the output of the oc adm upgrade command.
  Important
  The oc adm upgrade --help command lists a --force flag. Using this flag is strongly discouraged, because it bypasses cluster-side guards, including release verification and precondition checks, and it does not guarantee a successful update. Bypassing these guards puts the cluster at risk.
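
As an alternative to --force, a wrapper can refuse to request an update unless the target version appears under Recommended updates. This sketch assumes the table layout shown in the example output for oc adm upgrade. The name safe_upgrade_to is a hypothetical helper, not an oc subcommand.

```shell
#!/bin/sh
# Hypothetical helper: request an update only if the target version is
# listed in the "Recommended updates" table of "oc adm upgrade".
safe_upgrade_to() {
  local target="$1"
  if oc adm upgrade | awk -v t="$target" '
       /^Recommended updates:/ { in_table = 1; next }
       in_table && $1 == t     { found = 1 }
       END                     { exit found ? 0 : 1 }'; then
    oc adm upgrade --to="$target"
  else
    echo "refusing: $target is not a recommended update for this cluster" >&2
    return 1
  fi
}

# Usage: safe_upgrade_to 4.13.14
```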
Review the status of the Cluster Version Operator:
```
$ oc adm upgrade
```
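
Instead of rerunning oc adm upgrade by hand, you can poll the ClusterVersion resource until the newest entry in status.history reports Completed. This sketch assumes two things. First, status.history[0] is the most recent history entry. Second, its state field changes from Partial to Completed when the update finishes. The name wait_for_update and its default interval and attempt count are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll until the newest ClusterVersion history entry
# reports Completed, or give up after a number of checks.
# Usage: wait_for_update [interval_seconds] [max_checks]
wait_for_update() {
  local interval="${1:-60}" attempts="${2:-120}" i=0 state=""
  while :; do
    state=$(oc get clusterversion version -o jsonpath='{.status.history[0].state}')
    if [ "$state" = "Completed" ]; then
      echo "Update completed"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      break
    fi
    sleep "$interval"
  done
  echo "Update not completed after $attempts checks (last state: ${state:-unknown})" >&2
  return 1
}
```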

After the update completes, confirm that the cluster version has updated to the new version by running the following command:

```
$ oc adm upgrade
```

Example output

```
Cluster version is <version>

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-<version> (available channels: candidate-<version>, eus-<version>, fast-<version>, stable-<version>)

No updates available. You may force an update to a specific release image, but doing so might not be supported and might result in downtime or data loss.
```

If you are updating your cluster to the next minor version, such as version X.y to X.(y+1), confirm that your nodes are updated before deploying workloads that rely on a new feature. Run the following command:

```
$ oc get nodes
```

Example output

```
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    master   82m   v1.30.3
ip-10-0-170-223.ec2.internal   Ready    master   82m   v1.30.3
ip-10-0-179-95.ec2.internal    Ready    worker   70m   v1.30.3
ip-10-0-182-134.ec2.internal   Ready    worker   70m   v1.30.3
ip-10-0-211-16.ec2.internal    Ready    master   82m   v1.30.3
ip-10-0-250-100.ec2.internal   Ready    worker   69m   v1.30.3
```
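
To check the node list in a script, you can confirm that every node reports the same kubelet version in the VERSION column. This sketch assumes the default column layout of oc get nodes shown in the example output, with VERSION as the last column. The name nodes_on_single_version is a hypothetical helper that reads that output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every node in "oc get nodes"
# output (read from standard input) reports the same VERSION.
nodes_on_single_version() {
  awk '
    NR == 1 { next }                 # skip the header row
    { versions[$NF] = 1 }            # VERSION is the last column
    END {
      n = 0
      for (v in versions) n++
      if (n != 1) {
        print "nodes report " n " different versions" > "/dev/stderr"
        exit 1
      }
    }'
}

# Usage: oc get nodes | nodes_on_single_version && echo "all nodes match"
```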

3.1.5. Gathering cluster update status using oc adm upgrade status (Technology Preview)

When updating your cluster, it is useful to understand how your update is progressing. While the oc adm upgrade command returns limited information about the status of your update, this release introduces the oc adm upgrade status command as a Technology Preview feature. This command decouples status information from the oc adm upgrade command and provides specific information regarding a cluster update, including the status of the control plane and worker node updates.

The oc adm upgrade status command is read-only and does not alter any state in your cluster.

Important

The oc adm upgrade status command is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The oc adm upgrade status command can be used for clusters from version 4.12 up to the latest supported release.

Important

Your cluster does not need to have Technology Preview features enabled, but you must set the OC_ENABLE_CMD_UPGRADE_STATUS Technology Preview environment variable. Otherwise, the OpenShift CLI (oc) does not recognize the command and you cannot use the feature.

Procedure

Set the OC_ENABLE_CMD_UPGRADE_STATUS environment variable to true by running the following command:
```
$ export OC_ENABLE_CMD_UPGRADE_STATUS=true
```

Run the oc adm upgrade status command:

```
$ oc adm upgrade status
```

Example 3.1. Example output for an update progressing successfully

```
= Control Plane =
Assessment:      Progressing
Target Version:  4.14.1 (from 4.14.0)
Completion:      97%
Duration:        54m
Operator Status: 32 Healthy, 1 Unavailable

Control Plane Nodes
NAME                                        ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-53-40.us-east-2.compute.internal    Progressing   Draining   4.14.0    +10m
ip-10-0-30-217.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?
ip-10-0-92-180.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?

= Worker Upgrade =

= Worker Pool =
Worker Pool:     worker
Assessment:      Progressing
Completion:      0%
Worker Status:   3 Total, 2 Available, 1 Progressing, 3 Outdated, 1 Draining, 0 Excluded, 0 Degraded

Worker Pool Nodes
NAME                                        ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-4-159.us-east-2.compute.internal    Progressing   Draining   4.14.0    +10m
ip-10-0-20-162.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?
ip-10-0-99-40.us-east-2.compute.internal    Outdated      Pending    4.14.0    ?

= Worker Pool =
Worker Pool:     infra
Assessment:      Progressing
Completion:      0%
Worker Status:   1 Total, 0 Available, 1 Progressing, 1 Outdated, 1 Draining, 0 Excluded, 0 Degraded

Worker Pool Node
NAME                                             ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-4-159-infra.us-east-2.compute.internal   Progressing   Draining   4.14.0    +10m

= Update Health =
SINCE   LEVEL   IMPACT   MESSAGE
14m4s   Info    None     Update is proceeding well
```

With this information, you can make informed decisions on how to proceed with your update.
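
To use the status in a script, you can extract a single field, such as the control plane Completion value. You can also set the environment variable for one invocation only instead of exporting it. This sketch assumes the Technology Preview output layout shown in the example output, which might change in later releases. The name control_plane_completion is a hypothetical helper that reads the status output on standard input. For the example output, it prints 97%.

```shell
#!/bin/sh
# Hypothetical helper: print the Completion value (for example "97%")
# from the "= Control Plane =" section of "oc adm upgrade status" output.
control_plane_completion() {
  awk '
    /^= Control Plane =/ { in_cp = 1; next }   # section start
    /^= /                { in_cp = 0 }         # any other section header ends it
    in_cp && $1 == "Completion:" { print $2; exit }'
}

# Usage (variable scoped to this one oc invocation):
#   OC_ENABLE_CMD_UPGRADE_STATUS=true oc adm upgrade status | control_plane_completion
```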

3.1.6. Updating along a conditional update path

You can update along a recommended conditional update path using the web console or the OpenShift CLI (oc). When a conditional update is not recommended for your cluster, you can update along a conditional update path using the OpenShift CLI (oc) 4.10 or later.

Procedure

To view the description of the update when it is not recommended because a risk might apply, run the following command:
```
$ oc adm upgrade --include-not-recommended
```
If the cluster administrator evaluates the potential known risks and decides that they are acceptable for the current cluster, the administrator can waive the safety guards and proceed with the update by running the following command:
```
$ oc adm upgrade --allow-not-recommended --to <version>
```
Replace <version> with the update version that you obtained from the output of the previous command. This version is supported but has known issues or risks.
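
Because this path waives safety guards, you might want a wrapper that shows the risks first and requires the administrator to retype the version before it proceeds. This is a sketch. The name update_not_recommended is a hypothetical helper, and the confirmation prompt is an illustrative design choice, not an oc feature.

```shell
#!/bin/sh
# Hypothetical helper: show not-recommended updates and their risks, then
# request the update only if the administrator retypes the exact version.
update_not_recommended() {
  local version="$1" answer
  oc adm upgrade --include-not-recommended
  printf 'Type %s to accept the known risks and start the update: ' "$version" >&2
  read -r answer || answer=""
  if [ "$answer" != "$version" ]; then
    echo "Aborted." >&2
    return 1
  fi
  oc adm upgrade --allow-not-recommended --to "$version"
}

# Usage: update_not_recommended <version>
```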

3.1.7. Changing the update server by using the CLI

You can change the update server your cluster uses to retrieve information about update paths.

Changing the update server is optional. If you have an OpenShift Update Service (OSUS) installed and configured locally, you must set the URL for the server as the upstream to use the local server during updates. The default value for upstream is https://api.openshift.com/api/upgrades_info/v1/graph.

Procedure

Change the upstream parameter value in the cluster version by running the following command:
```
$ oc patch clusterversion/version --patch '{"spec":{"upstream":"<update_server_url>"}}' --type=merge
```
Replace <update_server_url> with the URL for the update server.
Example output
```
clusterversion.config.openshift.io/version patched
```
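
You can wrap both reading and changing the update server in small helpers. This sketch assumes that the upstream value is stored in spec.upstream of the ClusterVersion resource named version, as the patch above implies. It also assumes that empty output means the upstream is unset and the default update server is used. Both helper names are hypothetical, and rejecting values that are not http(s) URLs is an illustrative safeguard.

```shell
#!/bin/sh
# Hypothetical helpers for the spec.upstream field of ClusterVersion "version".

# Print the configured update server; empty output means upstream is unset.
show_update_server() {
  oc get clusterversion version -o jsonpath='{.spec.upstream}'
}

# Point the cluster at a different update server, such as a local OSUS instance.
set_update_server() {
  local url="$1"
  case "$url" in
    http://*|https://*) ;;
    *) echo "expected an http(s) URL, got: '$url'" >&2; return 1 ;;
  esac
  oc patch clusterversion/version --type=merge \
    --patch "$(printf '{"spec":{"upstream":"%s"}}' "$url")"
}

# Usage: set_update_server <update_server_url>
```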

3.2. Updating a cluster using the web console

You can perform minor version and patch updates on an OpenShift Container Platform cluster by using the web console.

Note

Use the web console or oc adm upgrade channel <channel> to change the update channel. You can follow the steps in Updating a cluster using the CLI to complete the update after you change to a 4.17 channel.

3.2.1. Before updating the OpenShift Container Platform cluster

Before updating your cluster, consider the following information to improve the chances of a successful update:

Whether you have recently backed up etcd.
Nodes are drained to apply pending machine configs. If minAvailable is set to 1 in a PodDisruptionBudget, the budget might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget can prevent that node from being drained.
You might need to update the cloud provider resources for the new release if your cluster uses manually maintained credentials.
You must review administrator acknowledgement requests, take any recommended actions, and provide the acknowledgement when you are ready.
You can perform a partial update by updating the worker or custom pool nodes to accommodate the time it takes to update. You can pause and resume within the progress bar of each pool.

Important

When an update is failing to complete, the Cluster Version Operator (CVO) reports the status of any blocking components while attempting to reconcile the update. Rolling your cluster back to a previous version is not supported. If your update is failing to complete, contact Red Hat support.
Using the unsupportedConfigOverrides section to modify the configuration of an Operator is unsupported and might block cluster updates. You must remove this setting before you can update your cluster.

3.2.2. Changing the update server by using the web console

You can change the update server your cluster uses to retrieve information about update paths.

Changing the update server is optional. If you have an OpenShift Update Service (OSUS) installed and configured locally, you must set the URL for the server as the upstream to use the local server during updates.

Prerequisites

You have access to the cluster with cluster-admin privileges.
You have access to the OpenShift Container Platform web console.

Procedure

On the web console, navigate to Administration → Cluster Settings and click version.
Click the YAML tab and then edit the upstream parameter value:
Example YAML snippet
```
  ...
  spec:
    clusterID: db93436d-7b05-42cc-b856-43e11ad2d31a
    upstream: '<update_server_url>'
  ...
```
Replace <update_server_url> with the URL for the update server.
The default upstream value is https://api.openshift.com/api/upgrades_info/v1/graph.
Click Save.

3.2.3. Pausing a MachineHealthCheck resource by using the web console

During the update process, nodes in the cluster might become temporarily unavailable. For worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck resources before updating the cluster.

Prerequisites

You have access to the cluster with cluster-admin privileges.
You have access to the OpenShift Container Platform web console.

Procedure

On the web console, navigate to Compute → MachineHealthChecks.
For each MachineHealthCheck resource, pause the machine health checks by adding the cluster.x-k8s.io/paused="" annotation to the resource. For example, to add the annotation to the machine-api-termination-handler resource, complete the following steps:
1. Click the Options menu next to the machine-api-termination-handler and click Edit annotations.
2. In the Edit annotations dialog, click Add more.
3. In the Key and Value fields, add cluster.x-k8s.io/paused and "" values, respectively, and click Save.

3.2.4. Updating a cluster by using the web console

If updates are available, you can update your cluster from the web console.

You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.

Prerequisites

Have access to the web console as a user with cluster-admin privileges.
You have access to the OpenShift Container Platform web console.
Pause all MachineHealthCheck resources.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
You have replaced your RHEL7 workers with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.

Procedure

From the web console, click Administration → Cluster Settings and review the contents of the Details tab.
For production clusters, ensure that the Channel is set to the correct channel for the version that you want to update to, such as stable-4.17.
Important
For production clusters, you must subscribe to a stable-*, eus-* or fast-* channel.
Note
When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner you declare the update channel, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time.
If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path.
If the Update status is not Updates available, you cannot update your cluster.
Select channel indicates the cluster version that your cluster is running or is updating to.
Select a version to update to, and click Save.
The Input channel Update status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.
Note
If you are updating your cluster to the next minor version, for example from version 4.10 to 4.11, confirm that your nodes are updated before deploying workloads that rely on a new feature. Any pools with worker nodes that are not yet updated are displayed on the Cluster Settings page.
After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.
- If updates are available, continue to perform updates in the current channel until you can no longer update.
- If no updates are available, change the Channel to the stable-*, eus-* or fast-* channel for the next minor version, and update to the version that you want in that channel.
You might need to perform several intermediate updates until you reach the version that you want.

3.2.5. Viewing conditional updates in the web console

You can view and assess the risks associated with particular updates with conditional updates.

Prerequisites

You have access to the cluster with cluster-admin privileges.
You have access to the OpenShift Container Platform web console.
Pause all MachineHealthCheck resources.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing an advanced update strategy, such as a canary rollout, an EUS update, or a control-plane update.

Procedure

From the web console, click Administration → Cluster Settings and review the contents of the Details tab.
You can enable the Include versions with known issues feature in the Select new version dropdown of the Update cluster modal to populate the dropdown list with conditional updates.
Note
If you select a version with known issues, the console provides more information about the potential risks associated with that version.
Review the notification detailing the potential risks to updating.

3.2.6. Performing a canary rollout update

In some specific use cases, you might want a more controlled update process where you do not want specific nodes updated concurrently with the rest of the cluster.

These use cases include, but are not limited to, the following situations:

You have mission-critical applications that you do not want unavailable during the update. You can slowly test the applications on your nodes in small batches after the update.
You have a small maintenance window that does not allow the time for all nodes to be updated, or you have multiple maintenance windows.

The rolling update process is not a typical update workflow. With larger clusters, it can be a time-consuming process that requires you to run multiple commands. This complexity can result in errors that can affect the entire cluster. Carefully consider whether your organization wants to use a rolling update, and plan the implementation of the process before you start.

The rolling update process described in this topic involves:

Creating one or more custom machine config pools (MCPs).
Labeling each node that you do not want to update immediately to move those nodes to the custom MCPs.
Pausing those custom MCPs, which prevents updates to those nodes.
Performing the cluster update.
Unpausing one custom MCP, which triggers the update on those nodes.
Testing the applications on those nodes to make sure that the applications work as expected on the newly updated nodes.
Optionally removing the custom labels from the remaining nodes in small batches and testing the applications on those nodes.
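
The first three steps and the unpause step can be scripted. The following sketch assumes that the caller chooses the custom pool name, for example workerpool-canary. Its MachineConfigPool definition selects both worker and the custom role in machineConfigSelector, so the canary nodes keep the worker configuration. It pauses and unpauses the pool with a merge patch on spec.paused. The helper names are hypothetical. Review the full canary rollout procedure before you run anything like this on a production cluster.

```shell
#!/bin/sh
# Hypothetical helpers for a canary rollout. Requires cluster-admin privileges.

# Create a custom MCP, move the given nodes into it, and pause it.
# Usage: create_paused_canary_pool <pool_name> <node_name>...
create_paused_canary_pool() {
  local pool="$1" node
  shift
  # The pool selects worker machine configs plus configs for its own role.
  oc apply -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${pool}
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, ${pool}]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/${pool}: ""
EOF
  # Labeling a node moves it from the worker pool into the custom pool.
  for node in "$@"; do
    oc label node "$node" "node-role.kubernetes.io/${pool}="
  done
  # A paused pool does not receive updates until it is unpaused.
  oc patch "mcp/${pool}" --type=merge --patch '{"spec":{"paused":true}}'
}

# Unpause a custom MCP so that its nodes receive the pending update.
unpause_pool() {
  oc patch "mcp/$1" --type=merge --patch '{"spec":{"paused":false}}'
}
```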

Note

Pause an MCP only after careful consideration, and only for short periods of time.

If you want to use the canary rollout update process, see "Performing a canary rollout update".

3.2.7. About updating single node OpenShift Container Platform

You can update a single-node OpenShift Container Platform cluster by using either the web console or the CLI.

However, note the following limitations:

You do not need to pause the MachineHealthCheck resources, because there is no other node to perform the health check.
Restoring a single-node OpenShift Container Platform cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your update fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.
Updating a single-node OpenShift Container Platform cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:
- If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.
- If the update contains machine configuration changes that do not require a reboot, the downtime is shorter and the impact on cluster management and user workloads is reduced. In this case, the node draining step is skipped with single-node OpenShift Container Platform because there is no other node in the cluster to reschedule the workloads to.
- If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.

Important

There are conditions, such as bugs in an updated package, that can prevent the single node from restarting after a reboot. In this case, the update does not roll back automatically.

3.3. Performing a Control Plane Only update

To reduce the number of times that non-control plane hosts reboot during cluster updates, you can perform a Control Plane Only update for your cluster.

Due to fundamental Kubernetes design, all OpenShift Container Platform updates between minor versions must be serialized. You must update from OpenShift Container Platform <4.y> to <4.y+1>, and then to <4.y+2>. You cannot update from OpenShift Container Platform <4.y> to <4.y+2> directly. However, administrators who want to update between two even-numbered minor versions can do so while incurring only a single reboot of non-control plane hosts.

Important

This update was previously known as an EUS-to-EUS update and is now referred to as a Control Plane Only update. These updates are only viable between even-numbered minor versions of OpenShift Container Platform.
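
Minor-version updates are serialized, and Control Plane Only updates start from an even-numbered minor version. So the intermediate and target versions follow directly from the starting version. This hypothetical helper computes them and rejects an odd-numbered starting point. It assumes version strings of the form 4.<minor>, optionally followed by a patch number. For example, a cluster on 4.14 would pass through 4.15 to reach 4.16.

```shell
#!/bin/sh
# Hypothetical helper: given the current 4.y version, print "4.(y+1) 4.(y+2)",
# the intermediate and target minor versions of a Control Plane Only update.
control_plane_only_path() {
  local major minor
  major=${1%%.*}                 # "4" from "4.14"
  minor=${1#*.}                  # "14" from "4.14", "16.3" from "4.16.3"
  minor=${minor%%.*}             # drop a patch component such as ".3"
  case "$minor" in
    ''|*[!0-9]*) echo "unrecognized version: $1" >&2; return 1 ;;
  esac
  if [ $((minor % 2)) -ne 0 ]; then
    echo "Control Plane Only updates start from an even minor version, got $1" >&2
    return 1
  fi
  echo "$major.$((minor + 1)) $major.$((minor + 2))"
}

# Usage: control_plane_only_path 4.14
```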

There are several caveats to consider when attempting a Control Plane Only update.

Control Plane Only updates are only offered after updates between all versions involved have been made available in stable channels.
If you encounter issues during or after updating to the odd-numbered minor version but before updating to the next even-numbered version, then remediation of those issues may require that non-control plane hosts complete the update to the odd-numbered version before moving forward.
You can do a partial update by updating the worker or custom pool nodes to accommodate the time it takes for maintenance.
Until the machine config pools are unpaused and the update is complete, some features and bug fixes in <4.y+1> and <4.y+2> of OpenShift Container Platform are not available.
All clusters can update by using EUS channels for a conventional update without pausing pools, but only clusters with non-control plane MachineConfigPool objects can perform Control Plane Only updates with pools paused.

3.3.1. Performing a Control Plane Only update

You can perform a Control Plane Only update by pausing all non-master machine config pools, performing updates from OpenShift Container Platform <4.y> to <4.y+1> to <4.y+2>, then unpausing the machine config pools.

Following this procedure reduces the total update duration and the number of times worker nodes are restarted.
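
You can apply the pause and unpause steps around a Control Plane Only update to every non-master machine config pool at once. This sketch assumes that the control plane pool is named master, as in a default installation. It also assumes that a merge patch on spec.paused is an acceptable way to pause pools on your cluster. The name set_worker_pools_paused is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: pause (true) or unpause (false) every machine config
# pool except the master pool.
# Usage: set_worker_pools_paused true|false
set_worker_pools_paused() {
  local paused="$1" pool
  case "$paused" in
    true|false) ;;
    *) echo "usage: set_worker_pools_paused true|false" >&2; return 1 ;;
  esac
  # "-o name" prints entries such as machineconfigpool.machineconfiguration.openshift.io/worker
  for pool in $(oc get mcp -o name); do
    case "$pool" in
      */master) continue ;;   # never pause the control plane pool
    esac
    oc patch "$pool" --type=merge --patch "{\"spec\":{\"paused\":$paused}}"
  done
}
```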

Prerequisites

You reviewed the release notes for OpenShift Container Platform <4.y+1> and <4.y+2>.
You reviewed the release notes and product lifecycles for any layered products and Operator Lifecycle Manager (OLM) Operators. Some products and OLM Operators might require updates either before or during a Control Plane Only update.
You are familiar with version-specific prerequisites, such as the removal of deprecated APIs, that are required before updating from OpenShift Container Platform <4.y+1> to <4.y+2>.
If your cluster uses in-tree vSphere volumes, you updated vSphere to version 7.0u3L+ or 8.0u2+.
Important
If you do not update vSphere to 7.0u3L+ or 8.0u2+ before initiating an OpenShift Container Platform update, known issues might occur with your cluster after the update. For more information, see Known Issues with OpenShift 4.12 to 4.13 or 4.13 to 4.14 vSphere CSI Storage Migration.

3.3.1.1. Control Plane Only update using the web console

You can perform a Control Plane Only update by using the web console.

Prerequisites

You verified that machine config pools are unpaused.
You have access to the web console as a user with admin privileges.

Procedure

Using the Administrator perspective on the web console, update any Operator Lifecycle Manager (OLM) Operators to the versions that are compatible with your intended updated version. For more information, see "Updating installed Operators".
Verify that all machine config pools display a status of Up to date and that no machine config pool displays a status of Updating.
To view the status of all machine config pools, click Compute → MachineConfigPools and review the contents of the Update status column.
Note
If your machine config pools have an Updating status, wait for this status to change to Up to date. This process could take several minutes.
Set your channel to eus-<4.y+2>.
To set your channel, click Administration → Cluster Settings → Channel. You can edit your channel by clicking the current hyperlinked channel.
Pause all machine config pools except for the master pool. You can perform this action on the MachineConfigPools tab under the Compute page. Click the vertical ellipsis next to the machine config pool that you want to pause and click Pause updates.
Update to version <4.y+1> and complete up to the Save step. For more information, see "Updating a cluster by using the web console".
Ensure that the <4.y+1> updates are complete by viewing the Last completed version of your cluster. You can find this information on the Cluster Settings page under the Details tab.
If necessary, update your OLM Operators by using the Administrator perspective on the web console. For more information, see "Updating installed Operators".
Update to version <4.y+2> and complete up to the Save step. For more information, see "Updating a cluster by using the web console".
Ensure that the <4.y+2> update is complete by viewing the Last completed version of your cluster. You can find this information on the Cluster Settings page under the Details tab.
Unpause all previously paused machine config pools. You can perform this action on the MachineConfigPools tab under the Compute page. Click the vertical ellipsis next to the machine config pool that you want to unpause and click Unpause updates.
Important
If pools remain paused, the cluster cannot update to any future minor versions, and some maintenance tasks are inhibited. This puts the cluster at risk for future degradation.
Verify that your previously paused pools are updated and that your cluster has completed the update to version <4.y+2>.
You can verify that your pools have updated on the MachineConfigPools tab under the Compute page by confirming that the Update status has a value of Up to date.
Important
When you update a cluster that contains Red Hat Enterprise Linux (RHEL) compute machines, those machines temporarily become unavailable during the update process. You must run the upgrade playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating. For more information, see "Updating a cluster that includes RHEL compute machines".
You can verify that your cluster has completed the update by viewing the Last completed version of your cluster. You can find this information on the Cluster Settings page under the Details tab.
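If you also have CLI access, you can read the value behind Last completed version from the ClusterVersion resource. The following is a minimal sketch, assuming bash, a ClusterVersion object named `version`, and that `status.history` lists entries newest first with a `state` of `Completed` for finished updates. The function name `last_completed_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent version that the cluster finished updating to.
# Assumes the ClusterVersion object is named "version" and that status.history
# lists entries newest first, with state "Completed" for finished updates.
last_completed_version() {
  # The jsonpath filter prints every Completed version, separated by spaces
  oc get clusterversion version \
    -o jsonpath='{.status.history[?(@.state=="Completed")].version}' |
    awk '{ print $1 }'
}
```

Compare the printed version with the <4.y+1> or <4.y+2> release that you expect before you continue to the next step.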

3.3.1.2. Control Plane Only update using the CLI

You can perform a Control Plane Only update by using the OpenShift CLI (oc).

Prerequisites

You verified that machine config pools are unpaused.
You updated the OpenShift CLI (oc) to the target version before each update.
Important
Do not skip this prerequisite. If you do not update the OpenShift CLI (oc) to the target version before your update, unexpected issues might occur.

Procedure

Using the Administrator perspective on the web console, update any Operator Lifecycle Manager (OLM) Operators to the versions that are compatible with your intended updated version. For more information, see "Updating installed Operators" in "Additional resources".

Verify that all machine config pools show True in the UPDATED column and False in the UPDATING column. To view the status of all machine config pools, run the following command:

```
$ oc get mcp
```

Example output

```
NAME     CONFIG                                             UPDATED   UPDATING
master   rendered-master-ecbb9582781c1091e1c9f19d50cf836c   True      False
worker   rendered-worker-00a3f0c68ae94e747193156b491553d5   True      False
```

Your current version is <4.y>, and your intended version to update is <4.y+2>. Change to the eus-<4.y+2> channel by running the following command:
```
$ oc adm upgrade channel eus-<4.y+2>
```
Note
If you receive an error message stating that eus-<4.y+2> is not one of the available channels, Red Hat is still rolling out EUS version updates. This rollout process generally takes 45-90 days starting at the GA date.
Pause all machine config pools except for the master pool. For example, pause the worker pool by running the following command, and repeat it for each custom machine config pool:
```
$ oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'
```
Note
You cannot pause the master pool.
Update to the latest available <4.y+1> version by running the following command:
```
$ oc adm upgrade --to-latest
```
Example output
```
Updating to latest version <4.y+1.z>
```
Review the cluster version to ensure that the <4.y+1> update is complete by running the following command:
```
$ oc adm upgrade
```
Example output
```
Cluster version is <4.y+1.z>
...
```
Update to version <4.y+2> by running the following command:
```
$ oc adm upgrade --to-latest
```
Retrieve the cluster version to ensure that the <4.y+2> updates are complete by running the following command:
```
$ oc adm upgrade
```
Example output
```
Cluster version is <4.y+2.z>
...
```
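To update your worker nodes to <4.y+2>, unpause all previously paused machine config pools. For example, unpause the worker pool by running the following command, and repeat it for each custom machine config pool that you paused: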
```
$ oc patch mcp/worker --type merge --patch '{"spec":{"paused":false}}'
```
Important
If pools are not unpaused, the cluster is not permitted to update to any future minor versions, and some maintenance tasks are inhibited. This puts the cluster at risk for future degradation.
Verify that your previously paused pools are updated and that the update to version <4.y+2> is complete by running the following command:
```
$ oc get mcp
```
Example output
```
NAME     CONFIG                                             UPDATED   UPDATING
master   rendered-master-52da4d2760807cb2b96a3402179a9a4c   True      False
worker   rendered-worker-4756f60eccae96fb9dcb4c392c69d497   True      False
```
Important
When you update a cluster that contains Red Hat Enterprise Linux (RHEL) compute machines, those machines temporarily become unavailable during the update process. You must run the upgrade playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating. For more information, see "Updating a cluster that includes RHEL compute machines" in the additional resources section.
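The pool checks in the previous procedure can be scripted so that you do not have to read the table by eye. The following minimal sketch assumes bash, awk, and the default `oc get mcp` column order shown in the example output (NAME, CONFIG, UPDATED, UPDATING, and so on). The function name `all_pools_updated` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every pool reports UPDATED=True and UPDATING=False.
# Relies on the default "oc get mcp" column order: NAME CONFIG UPDATED UPDATING ...
all_pools_updated() {
  oc get mcp --no-headers | awk '
    # Report each pool that is not fully updated
    $3 != "True" || $4 != "False" { print "not updated: " $1; bad = 1 }
    END { exit bad }'
}
```

For example, `all_pools_updated && echo ready` prints `ready` only when no pool is still updating. Otherwise, the function lists each pool that is not yet updated and returns a nonzero status.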

3.3.1.3. Control Plane Only updates for layered products and Operators installed through Operator Lifecycle Manager

There are additional steps to consider when performing Control Plane Only updates for clusters with either layered products or Operators installed through Operator Lifecycle Manager (OLM).

Layered products refer to products that are made of multiple underlying products that are intended to be used together and cannot be broken into individual subscriptions. For examples of layered OpenShift Container Platform products, see Layered Offering On OpenShift.

When you perform a Control Plane Only update on clusters with layered products or with Operators installed through OLM, you must complete the following actions:

Update all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures that they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" for more information on how to check compatibility and, if necessary, update the installed Operators.
Confirm the cluster version compatibility between the current and intended Operator versions. You can verify which versions your OLM Operators are compatible with by using the Red Hat OpenShift Container Platform Operator Update Information Checker.

For example, the following high-level steps describe how to perform a Control Plane Only update from <4.y> to <4.y+2> for OpenShift Data Foundation (ODF). You can perform these steps through the CLI or the web console. For information about how to update clusters through your desired interface, see "Control Plane Only update using the web console" and "Control Plane Only update using the CLI".

Pause the worker machine pools.
Update OpenShift Container Platform from <4.y> to <4.y+1>.
Update ODF from <4.y> to <4.y+1>.
Update OpenShift Container Platform from <4.y+1> to <4.y+2>.
Update ODF to <4.y+2>.
Unpause the worker machine pools.

Note

The update to ODF <4.y+2> can happen before or after worker machine pools have been unpaused.

3.4. Performing a canary rollout update

For a more controlled rollout of worker node updates, you can use a canary update. A canary update is an update strategy where worker node updates are performed in discrete, sequential stages instead of updating all worker nodes at the same time.

This strategy can be useful in the following scenarios:

You want a more controlled rollout of worker node updates to ensure that mission-critical applications stay available during the entire update, even if the update process causes your applications to fail.
You want to update a small subset of worker nodes, evaluate cluster and workload health over a period of time, and then update the remaining nodes.
You want to fit worker node updates, which often require a host reboot, into smaller defined maintenance windows when it is not possible to take a large maintenance window to update the entire cluster at one time.

In these scenarios, you can create multiple custom machine config pools (MCPs) to prevent certain worker nodes from updating when you update the cluster. After the rest of the cluster is updated, you can update those worker nodes in batches at appropriate times.

3.4.1. Example Canary update strategy

To better understand how the canary rollout strategy works, it is useful to consider an example of an update using the strategy.

The following example describes a canary update strategy where you have a cluster with 100 nodes with 10% excess capacity, you have maintenance windows that must not exceed 4 hours, and you know that it takes no longer than 8 minutes to drain and reboot a worker node.

Note

The previous values are an example only. The time it takes to drain a node might vary depending on factors such as workloads.

3.4.1.1. Definition of custom machine config pools

To organize the worker node updates into separate stages, you can begin by defining the following machine config pools:

workerpool-canary with 10 nodes
workerpool-A with 30 nodes
workerpool-B with 30 nodes
workerpool-C with 30 nodes
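You can check these pool sizes against the maintenance window with simple arithmetic. The following sketch assumes bash, that the MCO updates nodes in sequential batches of maxUnavailable nodes, and that each batch takes about as long as draining and rebooting one node. It ignores time spent on control plane and Operator updates, so treat the result as a rough estimate. The function name `pool_update_minutes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rough estimate of the minutes needed to update one pool, assuming the
# MCO updates nodes in batches of maxUnavailable and each batch takes as long as
# draining and rebooting one node.
# Usage: pool_update_minutes <node_count> <minutes_per_node> <max_unavailable>
pool_update_minutes() {
  local nodes="$1" minutes="$2" max_unavailable="$3" batches
  # Number of sequential batches, rounded up
  batches=$(( (nodes + max_unavailable - 1) / max_unavailable ))
  echo $(( batches * minutes ))
}

# Example values: 8 minutes per node and the default maxUnavailable of 1
for entry in workerpool-canary:10 workerpool-A:30 workerpool-B:30 workerpool-C:30; do
  echo "${entry%%:*}: $(pool_update_minutes "${entry##*:}" 8 1) minutes"
done
```

With the example values, the loop reports 80 minutes for workerpool-canary and 240 minutes for each 30-node pool, so each of workerpool-A, workerpool-B, and workerpool-C fills, but does not exceed, one 4-hour window. The 10-node canary pool also matches the 10 nodes of excess capacity (10% of 100), so all of its nodes can be drained if the canary update causes problems.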

3.4.1.2. Update of the canary worker pool

During your first maintenance window, you pause the machine config pools (MCPs) for workerpool-A, workerpool-B, and workerpool-C, and then initiate the cluster update. This updates components that run on top of OpenShift Container Platform and the 10 nodes that are part of the unpaused workerpool-canary MCP. The other three MCPs are not updated because they were paused.

3.4.1.3. Whether or not to proceed with the remaining worker pool updates

If for some reason you determine that your cluster or workload health was negatively affected by the workerpool-canary update, you then cordon and drain all nodes in that pool while still maintaining sufficient capacity until you have diagnosed and resolved the problem. When everything is working as expected, you evaluate the cluster and workload health before deciding to unpause, and thus update, workerpool-A, workerpool-B, and workerpool-C in succession during each additional maintenance window.
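If you need to cordon and drain the canary pool, you can target its nodes by the custom role label. The following is a minimal sketch, assuming bash and the `node-role.kubernetes.io/<custom_label>` label convention used in this chapter. The function name `drain_pool` is hypothetical. `oc adm drain` also cordons a node on its own, so the explicit `oc adm cordon` call is shown only to mirror the cordon-then-drain order described here.

```shell
#!/usr/bin/env bash
# Sketch: cordon and drain every node that carries a custom pool's role label.
# Usage: drain_pool workerpool-canary
drain_pool() {
  local label="node-role.kubernetes.io/$1" node
  # Select nodes that have the custom role label, whatever its value
  for node in $(oc get nodes -l "$label" -o jsonpath='{.items[*].metadata.name}'); do
    oc adm cordon "$node" || return 1
    # DaemonSet pods are left in place; pods using emptyDir volumes lose that data
    oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  done
}
```

Pods protected by a PodDisruptionBudget can block the drain. After you resolve the problem, you can return a node to service with `oc adm uncordon <node_name>`.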

Managing worker node updates by using custom machine config pools (MCPs) provides flexibility; however, it can be a time-consuming process that requires you to execute multiple commands. This complexity can result in errors that might affect the entire cluster. Carefully consider your organizational needs and plan the implementation of the process before you start.

Important

Pausing a machine config pool prevents the Machine Config Operator from applying any configuration changes on the associated nodes. Pausing an MCP also prevents any automatically rotated certificates from being pushed to the associated nodes, including the automatic CA rotation of the kube-apiserver-to-kubelet-signer CA certificate.

If the MCP is paused when the kube-apiserver-to-kubelet-signer CA certificate expires and the MCO attempts to automatically renew the certificate, the MCO cannot push the newly rotated certificates to those nodes. This causes failure in multiple oc commands, including oc debug, oc logs, oc exec, and oc attach. You receive alerts in the Alerting UI of the OpenShift Container Platform web console if an MCP is paused when the certificates are rotated.

Pause an MCP only for short periods of time, and only after carefully considering the kube-apiserver-to-kubelet-signer CA certificate expiration.
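Before you pause an MCP, you can check when the kube-apiserver-to-kubelet-signer CA certificate expires. The following is a minimal sketch, assuming bash, GNU `date`, and that the signer secret `kube-apiserver-to-kubelet-signer` in the `openshift-kube-apiserver-operator` namespace records its expiry in the `auth.openshift.io/certificate-not-after` annotation. Confirm the annotation on your cluster before you rely on it. The function names `signer_not_after` and `days_until` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check how long the kube-apiserver-to-kubelet-signer CA certificate stays valid
# before you pause an MCP. Assumes GNU date and the certificate-not-after annotation.
signer_not_after() {
  oc -n openshift-kube-apiserver-operator get secret kube-apiserver-to-kubelet-signer \
    -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'
}

# Usage: days_until <timestamp, for example 2024-09-10T00:00:00Z> [now_in_epoch_seconds]
days_until() {
  local expiry now
  expiry=$(date -u -d "$1" +%s) || return 1
  now="${2:-$(date -u +%s)}"
  # Whole days remaining (shell integer division)
  echo $(( (expiry - now) / 86400 ))
}
```

For example, `days_until "$(signer_not_after)"` prints the number of whole days left. If that number is close to the time you expect the pool to stay paused, plan a shorter pause.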

Note

It is not recommended to update the MCPs to different OpenShift Container Platform versions. For example, do not update one MCP from 4.y.10 to 4.y.11 and another to 4.y.12. This scenario has not been tested and might result in an undefined cluster state.

3.4.2. About the canary rollout update process and MCPs

In OpenShift Container Platform, nodes are not considered individually. Instead, they are grouped into machine config pools (MCPs). By default, nodes in an OpenShift Container Platform cluster are grouped into two MCPs: one for the control plane nodes and one for the worker nodes.

An OpenShift Container Platform update affects all MCPs concurrently.

During the update, the Machine Config Operator (MCO) cordons and drains the nodes within an MCP, up to the number of nodes specified by maxUnavailable at a time. By default, maxUnavailable is set to 1. Cordoning and draining a node marks the node as unschedulable and deschedules all pods on the node.

After the node is drained, the Machine Config Daemon applies a new machine configuration, which can include updating the operating system (OS). Updating the OS requires the host to reboot.
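Because maxUnavailable is set per pool, you can change how many nodes in a worker or custom pool update at the same time. The following is a minimal sketch, assuming bash. The function name `set_max_unavailable` is hypothetical. It patches `spec.maxUnavailable`, which accepts either a node count or a percentage string, and refuses to change the master pool, where keeping the default of 1 is recommended.

```shell
#!/usr/bin/env bash
# Sketch: set maxUnavailable on a worker or custom pool, refusing the master pool.
# Usage: set_max_unavailable <pool_name> <count_or_percent>, for example 2 or "20%"
set_max_unavailable() {
  local pool="$1" value="$2" json_value
  if [ "$pool" = "master" ]; then
    echo "error: leave maxUnavailable at the default of 1 for the master pool" >&2
    return 1
  fi
  # maxUnavailable accepts an integer or a percentage string
  case "$value" in
    *%) json_value="\"$value\"" ;;
    *)  json_value="$value" ;;
  esac
  oc patch "mcp/$pool" --type merge --patch "{\"spec\":{\"maxUnavailable\":$json_value}}"
}
```

For example, `set_max_unavailable workerpool-A 2` lets two nodes in workerpool-A update at once, which roughly halves the time the pool needs but also doubles the capacity that is unavailable during its update.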

3.4.2.1. Using custom machine config pools

To prevent specific nodes from being updated, you can create custom MCPs. Because the MCO does not update nodes within paused MCPs, you can pause the MCPs containing nodes that you do not want to update before initiating a cluster update.

Using one or more custom MCPs can give you more control over the sequence in which you update your worker nodes. For example, after you update the nodes in the first MCP, you can verify the application compatibility and then update the rest of the nodes gradually to the new version.

Warning

The default setting for maxUnavailable is 1 for all the machine config pools in OpenShift Container Platform. Do not change this value for the control plane pool, so that control plane nodes are updated one at a time. In particular, do not change this value to 3 for the control plane pool.

Note

To ensure the stability of the control plane, creating a custom MCP from the control plane nodes is not supported. The Machine Config Operator (MCO) ignores any custom MCP created for the control plane nodes.

3.4.2.2. Considerations when using custom machine config pools

Give careful consideration to the number of MCPs that you create and the number of nodes in each MCP, based on your workload deployment topology. For example, if you must fit updates into specific maintenance windows, you must know how many nodes OpenShift Container Platform can update within a given window. This number is dependent on your unique cluster and workload characteristics.

You must also consider how much extra capacity is available in your cluster to determine the number of custom MCPs and the number of nodes within each MCP. If your applications fail to work as expected on newly updated nodes, you can cordon and drain those nodes in the pool, which moves the application pods to other nodes. However, you must determine whether the available nodes in the remaining MCPs can provide sufficient quality-of-service (QoS) for your applications.

Note

You can use this update process with all documented OpenShift Container Platform update processes. However, the process does not work with Red Hat Enterprise Linux (RHEL) machines, which are updated using Ansible playbooks.

3.4.3. About performing a canary rollout update

The following steps outline the high-level workflow of a canary update:

Create custom machine config pools (MCPs) based on the worker pool.
Note
You can change the maxUnavailable setting in an MCP to specify the percentage or the number of machines that can be updating at any given time. The default is 1.
Warning
The default setting for maxUnavailable is 1 for all the machine config pools in OpenShift Container Platform. Do not change this value for the control plane pool, so that control plane nodes are updated one at a time. In particular, do not change this value to 3 for the control plane pool.
Add a node selector to the custom MCPs. For each node that you do not want to update simultaneously with the rest of the cluster, add a matching label to the node. This label associates the node with the MCP.
Important
Do not remove the default worker label from the nodes. The nodes must have a role label to function properly in the cluster.
Pause the MCPs you do not want to update as part of the update process.
Perform the cluster update. The update process updates the MCPs that are not paused, including the control plane nodes.
Test your applications on the updated nodes to ensure they are working as expected.
Unpause one of the remaining MCPs, wait for the nodes in that pool to finish updating, and test the applications on those nodes. Repeat this process until all worker nodes are updated.
Optional: Remove the custom label from updated nodes and delete the custom MCPs.

3.4.4. Creating machine config pools to perform a canary rollout update

To perform a canary rollout update, you must first create one or more custom machine config pools (MCP).

Procedure

List the worker nodes in your cluster by running the following command:

```
$ oc get -l 'node-role.kubernetes.io/master!=' -o 'jsonpath={range .items[*]}{.metadata.name}{"\n"}{end}' nodes
```

Example output

```
ci-ln-pwnll6b-f76d1-s8t9n-worker-a-s75z4
ci-ln-pwnll6b-f76d1-s8t9n-worker-b-dglj2
ci-ln-pwnll6b-f76d1-s8t9n-worker-c-lldbm
```

For each node that you want to delay, add a custom label to the node by running the following command:

```
$ oc label node <node_name> node-role.kubernetes.io/<custom_label>=
```

For example:

```
$ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary=
```

Example output

```
node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled
```

Create the new MCP:

Create an MCP YAML file:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workerpool-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,workerpool-canary]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/workerpool-canary: ""
```

where:

metadata.name: Specifies a name for the MCP.
spec.machineConfigSelector.matchExpressions.values: Specifies the worker and custom MCP name.
spec.nodeSelector.matchLabels.node-role.kubernetes.io/workerpool-canary: Specifies the custom label that you added to the nodes that you want in this pool.

Create the MachineConfigPool object by running the following command:

```
$ oc create -f <file_name>
```

Example output

```
machineconfigpool.machineconfiguration.openshift.io/workerpool-canary created
```

View the list of MCPs in the cluster and their current state by running the following command:

```
$ oc get machineconfigpool
```

Example output

```
NAME              CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master            rendered-master-b0bb90c4921860f2a5d8a2f8137c1867              True      False      False      3              3                   3                     0                      97m
workerpool-canary rendered-workerpool-canary-87ba3dec1ad78cb6aecebf7fbb476a36   True      False      False      1              1                   1                     0                      2m42s
worker            rendered-worker-87ba3dec1ad78cb6aecebf7fbb476a36              True      False      False      2              2                   2                     0                      97m
```

The new machine config pool, workerpool-canary, is created and the number of nodes to which you added the custom label are shown in the machine counts. The worker MCP machine counts are reduced by the same number. It can take several minutes to update the machine counts. In this example, one node was moved from the worker MCP to the workerpool-canary MCP.
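If you create several custom pools, such as workerpool-A, workerpool-B, and workerpool-C from the earlier example, you can generate the manifest instead of editing a file for each pool. The following is a minimal sketch, assuming bash. The function name `canary_pool_yaml` is hypothetical. It prints the same MachineConfigPool definition as the YAML file in this procedure, written in block style, with the pool name substituted.

```shell
#!/usr/bin/env bash
# Sketch: print a MachineConfigPool manifest for a custom pool that inherits the
# worker machine configs and selects nodes by the custom role label.
# Usage: canary_pool_yaml workerpool-canary | oc create -f -
canary_pool_yaml() {
  local name="$1"
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${name}
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, ${name}]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/${name}: ""
EOF
}
```

For example, `canary_pool_yaml workerpool-A | oc create -f -` creates workerpool-A from standard input. Label the nodes with `node-role.kubernetes.io/workerpool-A=` so that the pool selects them.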

3.4.5. Managing machine configuration inheritance for a worker pool canary

You can configure a machine config pool (MCP) canary to inherit any MachineConfig assigned to an existing MCP. This configuration is useful when you want to use an MCP canary to test as you update nodes one at a time for an existing MCP.

Prerequisites

You have created one or more MCPs.

Procedure

Create a secondary MCP as described in the following two steps:

Save the following configuration file as machineConfigPool.yaml.

Example machineConfigPool YAML

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,worker-perf]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf: ""
# ...
```

Create the new machine config pool by running the following command:

```
$ oc create -f machineConfigPool.yaml
```

Example output

```
machineconfigpool.machineconfiguration.openshift.io/worker-perf created
```

Add some machines to the secondary MCP. The following example labels the worker nodes worker-a, worker-b, and worker-c to the MCP worker-perf:

```
$ oc label node worker-a node-role.kubernetes.io/worker-perf=''
$ oc label node worker-b node-role.kubernetes.io/worker-perf=''
$ oc label node worker-c node-role.kubernetes.io/worker-perf=''
```

Create a new MachineConfig for the MCP worker-perf as described in the following two steps:

Save the following MachineConfig example as a file called new-machineconfig.yaml:

Example MachineConfig YAML

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-perf
  name: 06-kdump-enable-worker-perf
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - enabled: true
        name: kdump.service
  kernelArguments:
    - crashkernel=512M
# ...
```

Apply the MachineConfig by running the following command:
```
$ oc create -f new-machineconfig.yaml
```

Create the new canary MCP and add machines from the MCP that you created in the previous steps. The following example creates an MCP called worker-perf-canary and adds machines from the worker-perf MCP that you previously created.
1. Label the canary worker node worker-a by running the following command:
  ```
  $ oc label node worker-a node-role.kubernetes.io/worker-perf-canary=''
  ```
2. Remove the canary worker node worker-a from the original MCP by running the following command:
  ```
  $ oc label node worker-a node-role.kubernetes.io/worker-perf-
  ```
3. Save the following file as machineConfigPool-Canary.yaml.
  Example machineConfigPool-Canary.yaml file
  ```
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfigPool
  metadata:
    name: worker-perf-canary
  spec:
    machineConfigSelector:
      matchExpressions:
        - {
           key: machineconfiguration.openshift.io/role,
           operator: In,
           values: [worker,worker-perf,worker-perf-canary]
          }
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker-perf-canary: ""
  ```
  where:
  spec.machineConfigSelector.matchExpressions.values
  Specifies a value you can use to configure members of an additional MachineConfig. This example includes worker-perf-canary as an additional value. This is an optional value.
4. Create the new worker-perf-canary MCP by running the following command:
  ```
  $ oc create -f machineConfigPool-Canary.yaml
  ```
  Example output
  ```
  machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary created
  ```

Check whether the MachineConfig is inherited by worker-perf-canary.

Verify that no MCP is degraded by running the following command:

```
$ oc get mcp
```

Example output

```
NAME                  CONFIG                                                          UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                rendered-master-2bf1379b39e22bae858ea1a3ff54b2ac                True      False      False      3              3                   3                     0                      5d16h
worker                rendered-worker-b9576d51e030413cfab12eb5b9841f34                True      False      False      0              0                   0                     0                      5d16h
worker-perf           rendered-worker-perf-b98a1f62485fa702c4329d17d9364f6a           True      False      False      2              2                   2                     0                      56m
worker-perf-canary    rendered-worker-perf-canary-b98a1f62485fa702c4329d17d9364f6a    True      False      False      1              1                   1                     0                      44m
```

Verify that the machines are inherited from worker-perf into worker-perf-canary by running the following command:

```
$ oc get nodes
```

Example output

```
NAME       STATUS   ROLES                       AGE     VERSION
...
worker-a   Ready    worker,worker-perf-canary   5d15h   v1.27.13+e709aa5
worker-b   Ready    worker,worker-perf          5d15h   v1.27.13+e709aa5
worker-c   Ready    worker,worker-perf          5d15h   v1.27.13+e709aa5
```

Verify that the kdump service is enabled on worker-a by running the following command on the node:

```
$ systemctl status kdump.service
```

Example output

```
kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: disabled)
     Active: active (exited) since Tue 2024-09-03 12:44:43 UTC; 10s ago
    Process: 4151139 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
   Main PID: 4151139 (code=exited, status=0/SUCCESS)
```

Verify that the MCP has updated the crashkernel value by running the following command on the node:
```
$ cat /proc/cmdline
```
The output includes the updated crashkernel value.
Example output
```
crashkernel=512M
```
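The kdump and kernel argument checks run on the node itself. One way to run them from a workstation is through `oc debug`, which starts a debug pod on the node with the host file system mounted at `/host`. The following is a minimal sketch, assuming bash and permission to run `oc debug node/<node_name>`. The function names `has_kernel_arg` and `check_kdump_on_node` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run the kdump checks on a node from a workstation through a debug pod.
# Usage: check_kdump_on_node worker-a crashkernel=512M
has_kernel_arg() {
  # $1: contents of /proc/cmdline, $2: argument to look for, such as crashkernel=512M
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

check_kdump_on_node() {
  local node="$1" arg="$2" cmdline
  # chroot /host runs each command against the node's root file system
  oc debug "node/$node" -- chroot /host systemctl is-enabled kdump.service || return 1
  cmdline=$(oc debug "node/$node" -- chroot /host cat /proc/cmdline) || return 1
  has_kernel_arg "$cmdline" "$arg"
}
```

For example, `check_kdump_on_node worker-a crashkernel=512M` returns a zero status only if kdump.service is enabled and the kernel command line contains `crashkernel=512M`.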

Optional: If you are satisfied with the update, you can return worker-a to worker-perf.
1. Return worker-a to worker-perf by running the following command:
  ```
  $ oc label node worker-a node-role.kubernetes.io/worker-perf=''
  ```
2. Remove worker-a from the canary MCP by running the following command:
  ```
  $ oc label node worker-a node-role.kubernetes.io/worker-perf-canary-
  ```

3.4.6. Pausing the machine config pools

After you create your custom machine config pools (MCPs), you then pause those MCPs. Pausing an MCP prevents the Machine Config Operator (MCO) from updating the nodes associated with that MCP.

Procedure

Patch the MCP that you want to pause by running the following command:

```
$ oc patch mcp/<mcp_name> --patch '{"spec":{"paused":true}}' --type=merge
```

For example:

```
$ oc patch mcp/workerpool-canary --patch '{"spec":{"paused":true}}' --type=merge
```

Example output

```
machineconfigpool.machineconfiguration.openshift.io/workerpool-canary patched
```
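In the canary example, you pause several pools and then confirm that each one is paused before you start the cluster update. The following is a minimal sketch, assuming bash. The function names `pause_pools` and `pool_is_paused` are hypothetical. It applies the same merge patch as the command above and then reads `spec.paused` back.

```shell
#!/usr/bin/env bash
# Sketch: pause each named pool, then confirm that spec.paused reads back as true.
# Usage: pause_pools workerpool-A workerpool-B workerpool-C
pool_is_paused() {
  [ "$(oc get "mcp/$1" -o jsonpath='{.spec.paused}')" = "true" ]
}

pause_pools() {
  local pool
  for pool in "$@"; do
    oc patch "mcp/$pool" --patch '{"spec":{"paused":true}}' --type=merge || return 1
    if ! pool_is_paused "$pool"; then
      echo "error: $pool is not paused" >&2
      return 1
    fi
  done
}
```

For example, `pause_pools workerpool-A workerpool-B workerpool-C` pauses the three pools and leaves workerpool-canary unpaused so that it updates with the cluster.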

3.4.7. Performing the cluster update

After the machine config pools (MCPs) enter a ready state, you can perform the cluster update.

Procedure

See one of the following update methods, as appropriate for your cluster:
- "Updating a cluster using the web console"
- "Updating a cluster using the CLI"

3.4.8. Unpausing the machine config pools

After the OpenShift Container Platform update is complete, unpause your custom machine config pools (MCP) one at a time. Unpausing an MCP allows the Machine Config Operator (MCO) to update the nodes associated with that MCP.

Procedure

Patch the MCP that you want to unpause by running the following command:

```
$ oc patch mcp/<mcp_name> --patch '{"spec":{"paused":false}}' --type=merge
```

For example:

```
$ oc patch mcp/workerpool-canary --patch '{"spec":{"paused":false}}' --type=merge
```

Example output

```
machineconfigpool.machineconfiguration.openshift.io/workerpool-canary patched
```

Optional: Check the progress of the update by using one of the following options:
1. Check the progress from the web console by clicking Administration → Cluster Settings.
2. Check the progress by running the following command:
  ```
  $ oc get machineconfigpools
  ```
Test your applications on the updated nodes to ensure that they are working as expected.
Repeat this process for any other paused MCPs, one at a time.
Note
In case of a failure, such as your applications not working on the updated nodes, you can cordon and drain the nodes in the pool, which moves the application pods to other nodes to help maintain the quality-of-service for the applications. This first MCP should be no larger than the excess capacity.
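Because you unpause the pools one at a time, it can help to block until each pool finishes before you test your applications. The following is a minimal sketch, assuming bash. The function name `unpause_and_wait` and its default timeout of 60m are hypothetical choices. It uses `oc wait` on the pool's Updated condition. It also assumes that the pool already reports UPDATED as False because the cluster update left a newer configuration pending. If the pool is already up to date, the wait returns immediately.

```shell
#!/usr/bin/env bash
# Sketch: unpause one pool, then wait until the pool reports the Updated condition.
# Assumes the pool already shows UPDATED=False because a newer configuration is pending;
# otherwise the wait returns immediately.
# Usage: unpause_and_wait workerpool-A 90m
unpause_and_wait() {
  local pool="$1" timeout="${2:-60m}"
  oc patch "mcp/$pool" --patch '{"spec":{"paused":false}}' --type=merge || return 1
  oc wait "mcp/$pool" --for=condition=Updated --timeout="$timeout"
}
```

For example, `unpause_and_wait workerpool-A 90m` unpauses workerpool-A and waits up to 90 minutes. If the wait times out, check `oc get machineconfigpools` for a degraded pool before you continue.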

3.4.9. Moving a node to the original machine config pool

After you update and verify applications on nodes in a custom machine config pool (MCP), move the nodes back to their original MCP by removing the custom label that you added to the nodes.

Important

A node must have a role to function properly in the cluster.

Procedure

For each node in a custom MCP, remove the custom label from the node by running the following command:
```
$ oc label node <node_name> node-role.kubernetes.io/<custom_label>-
```
For example:
```
$ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary-
```
Example output
```
node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled
```
The Machine Config Operator moves the nodes back to the original MCP and reconciles the nodes to the MCP configuration.

To confirm that the node has been removed from the custom MCP, view the list of MCPs in the cluster and their current state by running the following command:

$ oc get mcp

Example output

NAME                CONFIG                                                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master              rendered-master-1203f157d053fd987c7cbd91e3fbc0ed         True      False      False      3              3                   3                     0                      61m
workerpool-canary   rendered-mcp-noupdate-5ad4791166c468f3a35cd16e734c9028   True      False      False      0              0                   0                     0                      21m
worker              rendered-worker-5ad4791166c468f3a35cd16e734c9028         True      False      False      3              3                   3                     0                      61m

When a node is removed from the custom MCP and moved back to the original MCP, it can take several minutes for the machine counts to update. In this example, one node was moved from the workerpool-canary MCP to the worker MCP.

Optional: Delete the custom MCP by running the following command:
```
$ oc delete mcp <mcp_name>
```
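The label removal, the machine count check, and the optional pool deletion can be combined. This is a minimal sketch, assuming bash, an authenticated oc session, and a custom role label with the same name as the MCP (as with workerpool-canary above). The function name release_custom_pool and the POLL_INTERVAL and POLL_ATTEMPTS variables are hypothetical. The script reads the pool's status.machineCount and deletes the pool only after it reports zero machines.

```shell
#!/usr/bin/env bash
# Sketch: move nodes out of a custom MCP, then delete the empty pool.
# Assumptions: bash, an authenticated `oc` session, and a custom node role label
# named like the MCP. Function and variable names are hypothetical.

release_custom_pool() {
  local pool="$1"
  local node count i
  # List the nodes that still carry the custom role label.
  for node in $(oc get nodes -l "node-role.kubernetes.io/${pool}" \
                  -o jsonpath='{.items[*].metadata.name}'); do
    # The trailing "-" removes the label from the node.
    oc label node "${node}" "node-role.kubernetes.io/${pool}-" || return 1
  done

  # Machine counts can take several minutes to update, so poll the pool status.
  for ((i = 0; i < ${POLL_ATTEMPTS:-60}; i++)); do
    count="$(oc get "mcp/${pool}" -o jsonpath='{.status.machineCount}')"
    if [ "${count}" = "0" ]; then
      oc delete mcp "${pool}"
      return
    fi
    sleep "${POLL_INTERVAL:-10}"
  done
  echo "mcp/${pool} still reports ${count} machines; not deleting it" >&2
  return 1
}
```

For example, release_custom_pool workerpool-canary removes the workerpool-canary role from its nodes and deletes the pool once it is empty. Skip the function, or remove the delete step, if you want to keep the pool for a later canary rollout.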

3.5. Updating a cluster that includes RHEL compute machines

You can perform minor version and patch updates on an OpenShift Container Platform cluster. If your cluster contains Red Hat Enterprise Linux (RHEL) machines, you must take additional steps to update those machines.

Important

The use of RHEL compute machines on OpenShift Container Platform clusters has been deprecated and will be removed in a future release.

3.5.1. Prerequisites

Have access to the cluster as a user with admin privileges. See Using RBAC to define and apply permissions.
Have a recent etcd backup in case your update fails and you must restore your cluster to a previous state.
Your RHEL7 workers are replaced with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.
If your cluster uses manually maintained credentials, update the cloud provider resources for the new release. For more information, including how to determine if this is a requirement for your cluster, see Preparing to update a cluster with manually maintained credentials.
If you run an Operator or have configured any application with a pod disruption budget (PDB), you might experience an interruption during the update process. Nodes are drained to apply pending machine configs, and if minAvailable is set to 1 in a PodDisruptionBudget, the PDB might block the eviction process. If several nodes are rebooted, all the pods might end up running on only one node, and the PodDisruptionBudget can then prevent that node from being drained.
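Before you start, it can help to find PDBs that currently allow no disruptions, because these are the most likely to block a drain. The following is a minimal sketch, assuming bash and an authenticated oc session; the function names are hypothetical. A PDB with zero allowed disruptions does not always block the update, but it is worth reviewing.

```shell
#!/usr/bin/env bash
# Sketch: find PodDisruptionBudgets that currently allow zero disruptions.
# Assumptions: bash and an authenticated `oc` session; function names are hypothetical.

# Reads "<namespace>/<name> <disruptionsAllowed>" lines on stdin and prints the
# PDBs whose allowed-disruption count is 0.
filter_blocking_pdbs() {
  awk '$2 == 0 { print $1 }'
}

list_blocking_pdbs() {
  oc get poddisruptionbudgets --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' \
    | filter_blocking_pdbs
}
```

Running list_blocking_pdbs prints one namespace/name entry for each PDB that currently allows no evictions.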

3.5.2. Updating a cluster by using the web console

If updates are available, you can update your cluster from the web console.

You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.

Prerequisites

You have access to the OpenShift Container Platform web console as a user with cluster-admin privileges.
Pause all MachineHealthCheck resources.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
Your RHEL7 workers are replaced with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.

Procedure

From the web console, click Administration → Cluster Settings and review the contents of the Details tab.
For production clusters, ensure that the Channel is set to the correct channel for the version that you want to update to, such as stable-4.17.
Important
For production clusters, you must subscribe to a stable-*, eus-* or fast-* channel.
Note
When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner you declare the update channel, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time.
If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path.
If the Update status is not Updates available, you cannot update your cluster.
Select channel indicates the cluster version that your cluster is running or is updating to.
Select a version to update to, and click Save.
The Update status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.
Note
If you are updating your cluster to the next minor version, for example from version 4.10 to 4.11, confirm that your nodes are updated before deploying workloads that rely on a new feature. Any pools with worker nodes that are not yet updated are displayed on the Cluster Settings page.
After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.
- If updates are available, continue to perform updates in the current channel until you can no longer update.
- If no updates are available, change the Channel to the stable-*, eus-* or fast-* channel for the next minor version, and update to the version that you want in that channel.
You might need to perform several intermediate updates until you reach the version that you want.
Important
When you update a cluster that contains Red Hat Enterprise Linux (RHEL) worker machines, those workers temporarily become unavailable during the update process. You must run the update playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating.
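To see which RHEL workers have entered the NotReady state and need the playbook, you can filter the node list. This is a minimal sketch, assuming bash, an authenticated oc session, and that RHEL nodes carry the node.openshift.io/os_id=rhel label; verify that label on your own nodes before relying on it. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list RHEL compute nodes that are NotReady.
# Assumptions: bash, an authenticated `oc` session, and the
# node.openshift.io/os_id=rhel label on RHEL nodes. Function names are hypothetical.

# Reads `oc get nodes --no-headers` output on stdin and prints the names of nodes
# whose STATUS column starts with NotReady (for example, NotReady,SchedulingDisabled).
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

rhel_nodes_needing_playbook() {
  oc get nodes -l node.openshift.io/os_id=rhel --no-headers | not_ready_nodes
}
```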

3.5.3. Optional: Adding hooks to perform Ansible tasks on RHEL machines

You can use hooks to run Ansible tasks on the RHEL compute machines during the OpenShift Container Platform update.

3.5.3.1. About Ansible hooks for updates

When you update OpenShift Container Platform, you can run custom tasks on your Red Hat Enterprise Linux (RHEL) nodes during specific operations by using hooks. Hooks allow you to provide files that define tasks to run before or after specific update tasks. You can use hooks to validate or modify custom infrastructure when you update the RHEL compute nodes in your OpenShift Container Platform cluster.

Because a failing hook causes the operation to fail, you must design hooks that are idempotent, meaning that they can run multiple times and produce the same result.

Hooks have the following important limitations:
- Hooks do not have a defined or versioned interface. They can use internal openshift-ansible variables, but it is possible that the variables will be modified or removed in future OpenShift Container Platform releases.
- Hooks do not have error handling, so an error in a hook halts the update process. If you get an error, you must address the problem and then start the update again.

3.5.3.2. Configuring the Ansible inventory file to use hooks

You define the hooks to use when you update the Red Hat Enterprise Linux (RHEL) compute machines, which are also known as worker machines, in the hosts inventory file under the all:vars section.

Prerequisites

You have access to the machine that you used to add the RHEL compute machines to your cluster. You must have access to the hosts Ansible inventory file that defines your RHEL machines.

Procedure

After you design the hook, create a YAML file that defines the Ansible tasks for it. This file must be a set of tasks and cannot be a playbook, as shown in the following example:

---
# Trivial example forcing an operator to acknowledge the start of an upgrade
# file=/home/user/openshift-ansible/hooks/pre_compute.yml

- name: note the start of a compute machine update
  debug:
      msg: "Compute machine upgrade of {{ inventory_hostname }} is about to start"

- name: require the user agree to start an upgrade
  pause:
      prompt: "Press Enter to start the compute machine update"

Modify the hosts Ansible inventory file to specify the hook files. The hook files are specified as parameter values in the [all:vars] section, as shown:
Example hook definitions in an inventory file
```
[all:vars]
openshift_node_pre_upgrade_hook=/home/user/openshift-ansible/hooks/pre_node.yml
openshift_node_post_upgrade_hook=/home/user/openshift-ansible/hooks/post_node.yml
```
To avoid ambiguity in the paths to the hooks, use absolute paths instead of relative paths in their definitions.
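Because an error in a hook halts the update, it can be worth checking the hook definitions before you run the playbook. The following is a minimal sketch, assuming bash and hook entries written as key=value with no spaces around the equals sign, as in the example above. The function name check_hook_paths is hypothetical. It scans the whole inventory file, reports any openshift_node_*_hook value that is not an absolute path or does not point to an existing file, and returns non-zero if it finds a problem.

```shell
#!/usr/bin/env bash
# Sketch: verify that every openshift_node_*_hook entry in an Ansible inventory
# points to an absolute path of a file that exists.
# Assumptions: bash, and key=value entries with no spaces around "=".
# The function name is hypothetical.

check_hook_paths() {
  local inventory="$1" status=0 key path
  # The "|| [ -n ... ]" clause also processes a last line without a newline.
  while IFS='=' read -r key path || [ -n "${key}" ]; do
    case "${key}" in
      openshift_node_*_hook) ;;   # check hook definitions only
      *) continue ;;
    esac
    if [ "${path#/}" = "${path}" ]; then
      echo "${key}: not an absolute path: ${path}"
      status=1
    elif [ ! -f "${path}" ]; then
      echo "${key}: file not found: ${path}"
      status=1
    fi
  done < "${inventory}"
  return "${status}"
}
```

For example, check_hook_paths /<path>/inventory/hosts prints nothing and returns 0 when every hook path is absolute and exists.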

3.5.3.3. Available hooks for RHEL compute machines

You can use the following hooks when you update the Red Hat Enterprise Linux (RHEL) compute machines in your OpenShift Container Platform cluster.


Hook name	Description
`openshift_node_pre_cordon_hook`	Runs before each node is cordoned. This hook runs against each node in serial. If a task must run against a different host, the task must use `delegate_to` or `local_action`.
`openshift_node_pre_upgrade_hook`	Runs after each node is cordoned but before it is updated. This hook runs against each node in serial. If a task must run against a different host, the task must use `delegate_to` or `local_action`.
`openshift_node_pre_uncordon_hook`	Runs after each node is updated but before it is uncordoned. This hook runs against each node in serial. If a task must run against a different host, the task must use `delegate_to` or `local_action`.
`openshift_node_post_upgrade_hook`	Runs after each node is uncordoned. It is the last node update action. This hook runs against each node in serial. If a task must run against a different host, the task must use `delegate_to` or `local_action`.

3.5.4. Updating RHEL compute machines in your cluster

After you update your cluster, you must update the Red Hat Enterprise Linux (RHEL) compute machines in your cluster.

Important

Red Hat Enterprise Linux (RHEL) versions 8.6 and later are supported for RHEL compute machines.

You can also update your compute machines to another minor version of OpenShift Container Platform if you are using RHEL as the operating system. You do not need to exclude any RPM packages from RHEL when performing a minor version update.

Important

You cannot update RHEL 7 compute machines to RHEL 8. You must deploy new RHEL 8 hosts and remove the old RHEL 7 hosts.

Prerequisites

You updated your cluster.
Important
Because the RHEL machines require assets that are generated by the cluster to complete the update process, you must update the cluster before you update the RHEL worker machines in it.
You have access to the local machine that you used to add the RHEL compute machines to your cluster. You must have access to the hosts Ansible inventory file that defines your RHEL machines and the upgrade playbook.
For updates to a minor version, the RPM repository is using the same version of OpenShift Container Platform that is running on your cluster.

Procedure

Stop and disable firewalld on the host:
```
# systemctl disable --now firewalld.service
```
Note
By default, a RHEL base OS installed with the "Minimal" installation option enables the firewalld service. Having the firewalld service enabled on your host prevents you from accessing OpenShift Container Platform logs on the worker. Do not enable firewalld later if you want to continue accessing OpenShift Container Platform logs on the worker.
Enable the repositories that are required for OpenShift Container Platform 4.17:
1. On the machine that you run the Ansible playbooks, update the required repositories:
  # subscription-manager repos --disable=rhocp-4.16-for-rhel-8-x86_64-rpms \
      --enable=rhocp-4.17-for-rhel-8-x86_64-rpms
  Important
  As of OpenShift Container Platform 4.11, the Ansible playbooks are provided only for RHEL 8. If a RHEL 7 system was used as a host for the OpenShift Container Platform 4.10 Ansible playbooks, you must either update the Ansible host to RHEL 8, or create a new Ansible host on a RHEL 8 system and copy over the inventories from the old Ansible host.
2. On the machine that you run the Ansible playbooks, update the Ansible package:
  # yum swap ansible ansible-core
3. On the machine that you run the Ansible playbooks, update the required packages, including openshift-ansible:
  # yum update openshift-ansible openshift-clients
4. On each RHEL compute node, update the required repositories:
  # subscription-manager repos --disable=rhocp-4.16-for-rhel-8-x86_64-rpms \
      --enable=rhocp-4.17-for-rhel-8-x86_64-rpms
Update a RHEL worker machine:
1. Review your Ansible inventory file at /<path>/inventory/hosts and update its contents so that the RHEL 8 machines are listed in the [workers] section, as shown in the following example:
  [all:vars]
  ansible_user=root
  #ansible_become=True
  openshift_kubeconfig_path="~/.kube/config"
  [workers]
  mycluster-rhel8-0.example.com
  mycluster-rhel8-1.example.com
  mycluster-rhel8-2.example.com
  mycluster-rhel8-3.example.com
2. Change to the openshift-ansible directory:
  $ cd /usr/share/ansible/openshift-ansible
3. Run the upgrade playbook:
  $ ansible-playbook -i /<path>/inventory/hosts playbooks/upgrade.yml
  For <path>, specify the path to the Ansible inventory file that you created.
  Note
  The upgrade playbook only updates the OpenShift Container Platform packages. It does not update the operating system packages.

After you update all of the workers, confirm that all of your cluster nodes have updated to the new version:

# oc get node

Example output

NAME                        STATUS                        ROLES    AGE    VERSION
mycluster-control-plane-0   Ready                         master   145m   v1.30.3
mycluster-control-plane-1   Ready                         master   145m   v1.30.3
mycluster-control-plane-2   Ready                         master   145m   v1.30.3
mycluster-rhel8-0           Ready                         worker   98m    v1.30.3
mycluster-rhel8-1           Ready                         worker   98m    v1.30.3
mycluster-rhel8-2           Ready                         worker   98m    v1.30.3
mycluster-rhel8-3           Ready                         worker   98m    v1.30.3

Optional: Update the operating system packages that were not updated by the upgrade playbook. To update packages that are not on 4.17, use the following command:
```
# yum update
```
Note
You do not need to exclude RPM packages if you are using the same RPM repository that you used when you installed 4.17.
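The node check in the verification step above can be automated. This is a minimal sketch, assuming bash and that the input is oc get node output with its header line, where VERSION is the last column. The function name check_nodes_updated is hypothetical. It prints any node that is not exactly Ready (a cordoned node shows Ready,SchedulingDisabled and is also reported) and fails if the nodes report more than one version.

```shell
#!/usr/bin/env bash
# Sketch: confirm that every node is Ready and reports the same kubelet version.
# Assumption: input is `oc get node` output with a header line and VERSION as
# the last column. The function name is hypothetical.

check_nodes_updated() {
  awk 'NR > 1 {
         versions[$NF] = 1
         if ($2 != "Ready") { print "not ready: " $1; bad = 1 }
       }
       END {
         n = 0
         for (v in versions) n++
         if (n != 1) { print "distinct versions: " n; bad = 1 }
         exit bad
       }'
}
```

For example, oc get node | check_nodes_updated returns 0 for the example output shown above, where all seven nodes are Ready and report v1.30.3.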

3.6. Updating a cluster in a disconnected environment

You can update a cluster in an environment without access to the internet by taking additional steps to prepare your environment.

For information about updating a cluster in a disconnected environment, see About cluster updates in a disconnected environment.

3.7. Updating hardware on nodes running on vSphere

You must ensure that your nodes running in vSphere are running on the hardware version supported by OpenShift Container Platform. Currently, hardware version 15 or later is supported for vSphere virtual machines in a cluster. You can update your virtual hardware immediately or schedule an update in vCenter.

Important

Version 4.17 of OpenShift Container Platform requires VMware virtual hardware version 15 or later.
Before upgrading OpenShift 4.12 to OpenShift 4.13, you must update vSphere to v8.0 Update 1 or later; otherwise, the OpenShift 4.12 cluster is marked un-upgradeable.

Warning

Updating custom API certificates triggers the Machine Config Operator (MCO) to initiate a rolling reboot of the control plane nodes. These nodes must be updated serially. Ensure each node returns to a Ready state and the etcd static pods are healthy before the next node in the sequence begins its update. Failure to do so might result in a loss of etcd quorum and cluster-wide downtime.

3.7.1. Updating the virtual hardware for control plane nodes on vSphere

You can update the virtual hardware for control plane nodes on vSphere.

To reduce the risk of downtime, it is recommended that control plane nodes be updated serially. This ensures that the Kubernetes API remains available and etcd retains quorum.

Prerequisites

You have the cluster administrator permissions required to perform these operations in the vCenter instance hosting your OpenShift Container Platform cluster.
Your vSphere ESXi hosts are version 8.0 Update 1 or later.

Procedure

List the control plane nodes in your cluster by running the following command:

$ oc get nodes -l node-role.kubernetes.io/master

Example output

NAME                    STATUS   ROLES    AGE   VERSION
control-plane-node-0    Ready    master   75m   v1.30.3
control-plane-node-1    Ready    master   75m   v1.30.3
control-plane-node-2    Ready    master   75m   v1.30.3

Note the names of your control plane nodes.

Mark the control plane node as unschedulable by running the following command:
```
$ oc adm cordon <control_plane_node>
```
Shut down the virtual machine (VM) associated with the control plane node. Do this in the vSphere client by right-clicking the VM and selecting Power → Shut Down Guest OS. Do not shut down the VM by using Power Off because it might not shut down safely.
Update the VM in the vSphere client. Follow Upgrade the Compatibility of a Virtual Machine Manually (VMware vSphere documentation).
Power on the VM associated with the control plane node. Do this in the vSphere client by right-clicking the VM and selecting Power On.
Run the following command and wait for the node to report as Ready:
```
$ oc wait --for=condition=Ready node/<control_plane_node>
```
Mark the control plane node as schedulable again by running the following command:
```
$ oc adm uncordon <control_plane_node>
```
Repeat this procedure for each control plane node in your cluster.
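The repeated oc steps above can be wrapped in a loop that keeps the control plane nodes strictly serial. This is a minimal sketch, assuming bash and an authenticated oc session; the function name and the 30-minute timeouts are hypothetical. The vSphere actions stay manual, and the script waits for you to confirm them. As an extra guard before moving to the next node, it waits for the etcd cluster Operator to report Degraded=False; treat that as a coarse health signal, not a full quorum check.

```shell
#!/usr/bin/env bash
# Sketch: step through the control plane nodes one at a time. The vSphere
# actions (Shut Down Guest OS, compatibility upgrade, Power On) stay manual.
# Assumptions: bash and an authenticated `oc` session. The function name and
# the timeouts are hypothetical.

update_control_plane_serially() {
  local node
  for node in $(oc get nodes -l node-role.kubernetes.io/master \
                  -o jsonpath='{.items[*].metadata.name}'); do
    oc adm cordon "${node}" || return 1
    read -r -p "Update the VM for ${node} in vSphere, power it on, then press Enter: " || return 1
    oc wait --for=condition=Ready "node/${node}" --timeout=30m || return 1
    oc adm uncordon "${node}" || return 1
    # Do not start on the next node until the etcd cluster Operator reports
    # that it is not degraded.
    oc wait clusteroperator/etcd --for=condition=Degraded=False --timeout=30m || return 1
  done
}
```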

3.7.2. Updating the virtual hardware for compute nodes on vSphere

You can update the virtual hardware for compute nodes on vSphere.

To reduce the risk of downtime, it is recommended that compute nodes be updated serially.

Note

Multiple compute nodes can be updated in parallel if your workloads can tolerate multiple nodes being in a NotReady state. The administrator is responsible for ensuring that the required compute nodes remain available.

Prerequisites

You have the cluster administrator permissions required to perform these operations in the vCenter instance hosting your OpenShift Container Platform cluster.
Your vSphere ESXi hosts are version 8.0 Update 1 or later.

Procedure

List the compute nodes in your cluster by running the following command:

$ oc get nodes -l node-role.kubernetes.io/worker

Example output

NAME              STATUS   ROLES    AGE   VERSION
compute-node-0    Ready    worker   30m   v1.30.3
compute-node-1    Ready    worker   30m   v1.30.3
compute-node-2    Ready    worker   30m   v1.30.3

Note the names of your compute nodes.

Mark the compute node as unschedulable by running the following command:
```
$ oc adm cordon <compute_node>
```
Evacuate the pods from the compute node. There are several ways to do this. For example, you can evacuate all or selected pods on a node by running the following command:
```
$ oc adm drain <compute_node> [--pod-selector=<pod_selector>]
```
See "Evacuating pods on nodes" for other options to evacuate pods from a node.
Shut down the virtual machine (VM) associated with the compute node. Do this in the vSphere client by right-clicking the VM and selecting Power → Shut Down Guest OS. Do not shut down the VM by using Power Off because it might not shut down safely.
Update the VM in the vSphere client. Follow Upgrade the Compatibility of a Virtual Machine Manually (VMware vSphere documentation).
Power on the VM associated with the compute node. Do this in the vSphere client by right-clicking the VM and selecting Power On.
Run the following command and wait for the node to report as Ready:
```
$ oc wait --for=condition=Ready node/<compute_node>
```
Mark the compute node as schedulable again by running the following command:
```
$ oc adm uncordon <compute_node>
```
Repeat this procedure for each compute node in your cluster.
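The compute node steps can be looped the same way, one node at a time, with the node names passed as arguments. This is a minimal sketch, assuming bash and an authenticated oc session. The function name, the timeouts, and the drain flags are illustrative choices, not the only way to evacuate pods. In particular, --delete-emptydir-data discards emptyDir data, so confirm that your workloads can tolerate that before you use it.

```shell
#!/usr/bin/env bash
# Sketch: cordon, drain, manual vSphere update, wait for Ready, uncordon.
# Assumptions: bash, an authenticated `oc` session, and node names passed as
# arguments. The function name, drain flags, and timeouts are illustrative.

update_compute_nodes() {
  local node
  for node in "$@"; do
    oc adm cordon "${node}" || return 1
    # DaemonSet pods cannot be evicted, so they are ignored; emptyDir data on
    # evicted pods is discarded.
    oc adm drain "${node}" --ignore-daemonsets --delete-emptydir-data --timeout=15m || return 1
    read -r -p "Update the VM for ${node} in vSphere, power it on, then press Enter: " || return 1
    oc wait --for=condition=Ready "node/${node}" --timeout=30m || return 1
    oc adm uncordon "${node}" || return 1
  done
}
```

For example, update_compute_nodes compute-node-0 compute-node-1 processes those two nodes in order.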

3.7.3. Updating the virtual hardware for templates on vSphere

You can update the virtual hardware for templates on vSphere.

Prerequisites

You have the cluster administrator permissions required to perform these operations in the vCenter instance hosting your OpenShift Container Platform cluster.
Your vSphere ESXi hosts are version 8.0 Update 1 or later.

Procedure

If the RHCOS template is configured as a vSphere template, follow Convert a Template to a Virtual Machine (VMware vSphere documentation).
Note
After you convert the template to a virtual machine, do not power on the virtual machine.
Update the virtual machine (VM) in the VMware vSphere client. Complete the steps outlined in Upgrade the Compatibility of a Virtual Machine Manually (VMware vSphere documentation).
Important
If you modified the VM settings, those changes might reset after you move to a newer virtual hardware version. Verify that all of your configured settings are still in place after the upgrade before you proceed to the next step.
Convert the VM in the vSphere client to a template by right-clicking the VM and then selecting Template → Convert to Template.
Important
The steps for converting a VM to a template might change in future vSphere documentation versions.

3.7.4. Scheduled updates for virtual hardware on vSphere

Virtual hardware updates can be scheduled to occur when a virtual machine is powered on or rebooted. You can schedule your virtual hardware updates exclusively in vCenter by following Schedule a Compatibility Upgrade for a Virtual Machine (VMware vSphere documentation).

If you schedule a virtual hardware update before you update OpenShift Container Platform, the virtual hardware update occurs when the nodes reboot during the OpenShift Container Platform update.

3.8. Migrating to a cluster with multi-architecture compute machines

You can migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines by updating to a multi-architecture, manifest-listed payload. This allows you to add mixed architecture compute nodes to your cluster.

For information about configuring your multi-architecture compute machines, see "Configuring multi-architecture compute machines on an OpenShift Container Platform cluster".

Before migrating your single-architecture cluster to a cluster with multi-architecture compute machines, it is recommended to install the Multiarch Tuning Operator, and deploy a ClusterPodPlacementConfig custom resource. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.

Important

Migration from a multi-architecture payload to a single-architecture payload is not supported. Once a cluster has transitioned to using a multi-architecture payload, it can no longer accept a single-architecture update payload.

3.8.1. Migrating to a cluster with multi-architecture compute machines using the CLI

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
Your cluster is running OpenShift Container Platform version 4.13.0 or later.
For more information on how to update your cluster version, see Updating a cluster using the web console or Updating a cluster using the CLI.
You have installed the OpenShift CLI (oc) that matches the version for your current cluster.
Your oc client is updated to at least version 4.13.0.
Your OpenShift Container Platform cluster is installed on AWS, Azure, Google Cloud, bare metal or IBM P/Z platforms.
For more information on selecting a supported platform for your cluster installation, see Selecting a cluster installation type.

Procedure

Verify that the RetrievedUpdates condition is True in the Cluster Version Operator (CVO) by running the following command:
```
$ oc get clusterversion/version -o=jsonpath="{.status.conditions[?(@.type=='RetrievedUpdates')].status}"
```
If the RetrievedUpdates condition is False, you can find supplemental information about the failure by running the following command:
```
$ oc adm upgrade
```
For more information about cluster version condition types, see Understanding cluster version condition types.
If the condition RetrievedUpdates is False, change the channel to stable-<4.y> or fast-<4.y> with the following command:
```
$ oc adm upgrade channel <channel>
```
After setting the channel, verify that RetrievedUpdates is True.
For more information about channels, see Understanding update channels and releases.
Migrate to the multi-architecture payload with the following command:
```
$ oc adm upgrade --to-multi-arch
```
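The RetrievedUpdates check and the migration command can be chained so that the migration starts only when the Cluster Version Operator reports that it retrieved updates. This is a minimal sketch, assuming bash and an authenticated oc session with the cluster-admin role; the function name migrate_to_multi_arch is hypothetical. When the condition is not True, it prints the oc adm upgrade output to standard error instead of starting the migration, so that you can fix the channel first.

```shell
#!/usr/bin/env bash
# Sketch: start the multi-architecture migration only when the CVO reports
# RetrievedUpdates=True.
# Assumptions: bash and an authenticated `oc` session with cluster-admin.
# The function name is hypothetical.

migrate_to_multi_arch() {
  local retrieved
  retrieved="$(oc get clusterversion/version \
    -o=jsonpath='{.status.conditions[?(@.type=="RetrievedUpdates")].status}')"
  if [ "${retrieved}" != "True" ]; then
    echo "RetrievedUpdates is '${retrieved}'; check 'oc adm upgrade' and the update channel" >&2
    oc adm upgrade >&2
    return 1
  fi
  oc adm upgrade --to-multi-arch
}
```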

Verification

You can monitor the migration by running the following command:
```
$ oc adm upgrade
```
Example output
```
working towards ${VERSION}: 106 of 841 done (12% complete), waiting on machine-config
```
Important
Machine launches might fail as the cluster settles into the new state. To detect and recover from failed machine launches, we recommend deploying machine health checks. For more information about machine health checks and how to deploy them, see About machine health checks.
1. Optional: To retrieve more detailed information about the status of your update, monitor the migration by running the following command:
  $ oc adm upgrade status
  For more information about how to use the oc adm upgrade status command, see Gathering cluster update status using oc adm upgrade status (Technology Preview).

The migration must be complete and all the cluster Operators must be stable before you can add compute machine sets with different architectures to your cluster.
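To check whether the cluster Operators are stable before you add compute machine sets, you can list any Operator that is not Available=True, Progressing=False, and Degraded=False. This is a minimal sketch, assuming bash and an authenticated oc session; the function names are hypothetical. Empty output means every cluster Operator met all three conditions at the time of the query. A failing oc call also produces empty output, so check the exit status as well.

```shell
#!/usr/bin/env bash
# Sketch: list cluster Operators that are not yet stable
# (Available=True, Progressing=False, Degraded=False).
# Assumptions: bash and an authenticated `oc` session; function names are hypothetical.

# Reads "<name> <Available> <Progressing> <Degraded>" lines on stdin and prints
# the names of Operators that are not stable. A missing condition also counts
# as not stable.
filter_unstable_operators() {
  awk '!($2 == "True" && $3 == "False" && $4 == "False") { print $1 }'
}

list_unstable_operators() {
  oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Available")].status}{" "}{.status.conditions[?(@.type=="Progressing")].status}{" "}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' \
    | filter_unstable_operators
}
```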

3.9. Updating the boot loader on RHCOS nodes using bootupd

To update the boot loader on RHCOS nodes using bootupd, you must either run the bootupctl update command on RHCOS machines manually or provide a machine config with a systemd unit.

Unlike grubby or other boot loader tools, bootupd does not manage kernel space configuration such as passing kernel arguments. To configure kernel arguments, see Adding kernel arguments to nodes.

Note

You can use bootupd to update the boot loader to protect against the BootHole vulnerability.

3.9.1. Updating the boot loader manually

You can manually inspect the status of the system and update the boot loader by using the bootupctl command-line tool.

Inspect the system status:

# bootupctl status

Example output for x86_64

Component EFI
  Installed: grub2-efi-x64-1:2.04-31.el8_4.1.x86_64,shim-x64-15-8.el8_1.x86_64
  Update: At latest version

Example output for aarch64

Component EFI
  Installed: grub2-efi-aa64-1:2.02-99.el8_4.1.aarch64,shim-aa64-15.4-2.el8_1.aarch64
  Update: At latest version

OpenShift Container Platform clusters initially installed on version 4.4 and older require an explicit adoption phase.
If the system status is Adoptable, perform the adoption:
```
# bootupctl adopt-and-update
```
Example output
```
Updated: grub2-efi-x64-1:2.04-31.el8_4.1.x86_64,shim-x64-15-8.el8_1.x86_64
```
If an update is available, apply the update so that the changes take effect on the next reboot:
```
# bootupctl update
```
Example output
```
Updated: grub2-efi-x64-1:2.04-31.el8_4.1.x86_64,shim-x64-15-8.el8_1.x86_64
```
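The choice among adopt-and-update, update, and no action can be derived from the status text. This is a minimal sketch, assuming bash and that bootupctl status prints "Adoptable" when adoption is required and "Update: At latest version" for each component with nothing pending, as in the example output above. Any other Update: wording is treated as an update being available. The function name bootupd_action is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the bootupctl subcommand from `bootupctl status` output on stdin.
# Assumptions: "Adoptable" appears when adoption is required, and each up-to-date
# component shows "Update: At latest version". The function name is hypothetical.

bootupd_action() {
  awk '
    /Adoptable/ { adopt = 1 }
    /^[ \t]*Update:/ && !/At latest version/ { pending = 1 }
    END {
      if (adopt) print "adopt-and-update"
      else if (pending) print "update"
      else print "none"
    }'
}
```

As root on a node, you might capture the result of bootupctl status | bootupd_action and pass it to bootupctl when it is not none. Review the status output yourself the first time, because the wording that the function matches on is an assumption.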

3.9.2. Updating the bootloader automatically via a machine config

Another way to automatically update the boot loader with bootupd is to create a systemd service unit that will update the boot loader as needed on every boot. This unit will run the bootupctl update command during the boot process and will be installed on the nodes via a machine config.

Note

This configuration is not enabled by default because unexpected interruptions of the update operation might lead to unbootable nodes. If you enable this configuration, avoid interrupting nodes during the boot process while the boot loader update is in progress. The boot loader update operation generally completes quickly, so the risk is low.

Create a Butane config file, 99-worker-bootupctl-update.bu, including the contents of the bootupctl-update.service systemd unit.

Note

The Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in 0. For example, 4.17.0. See "Creating machine configs with Butane" for information about Butane.

Example Butane config

variant: openshift
version: 4.17.0
metadata:
  name: 99-worker-bootupctl-update
  labels:
    machineconfiguration.openshift.io/role: worker
systemd:
  units:
  - name: bootupctl-update.service
    enabled: true
    contents: |
      [Unit]
      Description=Bootupd automatic update

      [Service]
      ExecStart=/usr/bin/bootupctl update
      RemainAfterExit=yes

      [Install]
      WantedBy=multi-user.target

On control plane nodes, substitute master for worker in both the metadata name and the machineconfiguration.openshift.io/role label.

Use Butane to generate a MachineConfig object file, 99-worker-bootupctl-update.yaml, containing the configuration to be delivered to the nodes:
```
$ butane 99-worker-bootupctl-update.bu -o 99-worker-bootupctl-update.yaml
```
Apply the configurations in one of two ways:
- If the cluster is not running yet, after you generate manifest files, add the MachineConfig object file to the <installation_directory>/openshift directory, and then continue to create the cluster.
- If the cluster is already running, apply the file:
  $ oc apply -f ./99-worker-bootupctl-update.yaml
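Because the worker and control plane variants differ only in the role name, you can generate both Butane files from one template. This is a minimal sketch, assuming bash; the function name write_bootupctl_butane is hypothetical, and the file content mirrors the bootupctl-update.service example in this section, with the metadata name following the 99-<role>-bootupctl-update file name. It writes the file into the current directory and rejects roles other than worker and master.

```shell
#!/usr/bin/env bash
# Sketch: write the bootupctl-update Butane config for one role.
# Assumptions: bash; content mirrors the example in this section.
# The function name is hypothetical.

write_bootupctl_butane() {
  local role="$1"
  case "${role}" in
    worker|master) ;;
    *) echo "role must be worker or master" >&2; return 1 ;;
  esac
  cat > "99-${role}-bootupctl-update.bu" <<EOF
variant: openshift
version: 4.17.0
metadata:
  name: 99-${role}-bootupctl-update
  labels:
    machineconfiguration.openshift.io/role: ${role}
systemd:
  units:
  - name: bootupctl-update.service
    enabled: true
    contents: |
      [Unit]
      Description=Bootupd automatic update

      [Service]
      ExecStart=/usr/bin/bootupctl update
      RemainAfterExit=yes

      [Install]
      WantedBy=multi-user.target
EOF
}
```

For example, running write_bootupctl_butane for worker and then for master, followed by butane on each generated .bu file, produces one MachineConfig file per role.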
