Chapter 3. Performing a cluster update
3.1. Updating a cluster using the CLI
You can perform minor version and patch updates on an OpenShift Container Platform cluster by using the OpenShift CLI (oc).
3.1.1. Prerequisites
- Have access to the cluster as a user with admin privileges. See Using RBAC to define and apply permissions.
- Have a recent etcd backup in case your update fails and you must restore your cluster to a previous state.
- Have a recent Container Storage Interface (CSI) volume snapshot in case you need to restore persistent volumes due to a pod failure.
- Your RHEL7 workers are replaced with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.
- You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See Updating installed Operators for more information on how to check compatibility and, if necessary, update the installed Operators.
- Ensure that all machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
- If your cluster uses manually maintained credentials, update the cloud provider resources for the new release. For more information, including how to determine if this is a requirement for your cluster, see Preparing to update a cluster with manually maintained credentials.
- Ensure that you address all Upgradeable=False conditions so the cluster allows an update to the next minor version. An alert displays at the top of the Cluster Settings page when you have one or more cluster Operators that cannot be updated. You can still update to the next available patch update for the minor release you are currently on.
- Review the list of APIs that were removed in Kubernetes 1.29, migrate any affected components to use the new API version, and provide the administrator acknowledgment. For more information, see Preparing to update to OpenShift Container Platform 4.16.
- If you run an Operator or you have configured any application with the pod disruption budget, you might experience an interruption during the update process. If minAvailable is set to 1 in PodDisruptionBudget, the nodes are drained to apply pending machine configs which might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget field can prevent the node drain.
- When an update is failing to complete, the Cluster Version Operator (CVO) reports the status of any blocking components while attempting to reconcile the update. Rolling your cluster back to a previous version is not supported. If your update is failing to complete, contact Red Hat support.
- Using the unsupportedConfigOverrides section to modify the configuration of an Operator is unsupported and might block cluster updates. You must remove this setting before you can update your cluster.
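Two of the prerequisites above (no paused machine config pools, no Upgradeable=False condition) can be checked from the CLI. The following is a minimal sketch, not a supported tool. The function names are hypothetical, and it assumes that the pause state is exposed at .spec.paused on each MachineConfigPool and that an Upgradeable condition, when present, appears in the ClusterVersion status.

```shell
#!/usr/bin/env bash
# Sketch only: the function names below are hypothetical helpers, not oc subcommands.

# Print "name<TAB>paused" for each machine config pool.
list_pool_pause_state() {
  oc get machineconfigpool \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.paused}{"\n"}{end}'
}

# Read "name<TAB>paused" lines on stdin and print the names of paused pools.
paused_pools() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

# Print the status of the ClusterVersion Upgradeable condition.
# Assumption: empty output means the condition is not set.
upgradeable_status() {
  oc get clusterversion version \
    -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].status}'
}
```

Running `list_pool_pause_state | paused_pools` should print nothing when no pool is paused. A result of `False` from `upgradeable_status` suggests that a minor version update is currently blocked.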
Additional resources
3.1.2. Pausing a MachineHealthCheck resource
During the update process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck resources before updating the cluster.
Prerequisites
- Install the OpenShift CLI (oc).
Procedure
To list all the available MachineHealthCheck resources that you want to pause, run the following command:
$ oc get machinehealthcheck -n openshift-machine-api
To pause the machine health checks, add the cluster.x-k8s.io/paused="" annotation to the MachineHealthCheck resource. Run the following command:
$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused=""
The annotated MachineHealthCheck resource resembles the following YAML file:
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
  annotations:
    cluster.x-k8s.io/paused: ""
spec:
  selector:
    matchLabels:
      role: worker
  unhealthyConditions:
  - type: "Ready"
    status: "Unknown"
    timeout: "300s"
  - type: "Ready"
    status: "False"
    timeout: "300s"
  maxUnhealthy: "40%"
status:
  currentHealthy: 5
  expectedMachines: 5
Important
Resume the machine health checks after updating the cluster. To resume the check, remove the pause annotation from the MachineHealthCheck resource by running the following command:
$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused-
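When a cluster has several MachineHealthCheck resources, the annotate commands above can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes that `oc get ... -o name` returns resource references that `oc annotate` accepts unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: pause_all_mhc and resume_all_mhc are hypothetical helper names.

# Add the pause annotation to every MachineHealthCheck in openshift-machine-api.
pause_all_mhc() {
  for mhc in $(oc get machinehealthcheck -n openshift-machine-api -o name); do
    # --overwrite keeps the loop from failing if the annotation already exists.
    oc -n openshift-machine-api annotate --overwrite "$mhc" "cluster.x-k8s.io/paused="
  done
}

# Remove the pause annotation again after the update has finished.
resume_all_mhc() {
  for mhc in $(oc get machinehealthcheck -n openshift-machine-api -o name); do
    oc -n openshift-machine-api annotate "$mhc" "cluster.x-k8s.io/paused-"
  done
}
```

Call `pause_all_mhc` before starting the update and `resume_all_mhc` once the update has completed.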
3.1.3. About updating single node OpenShift Container Platform
You can update, or upgrade, a single-node OpenShift Container Platform cluster by using either the console or CLI.
However, note the following limitations:
- The prerequisite to pause the MachineHealthCheck resources is not required because there is no other node to perform the health check.
- Restoring a single-node OpenShift Container Platform cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your update fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.
Updating a single-node OpenShift Container Platform cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:
- If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.
- If the update contains machine configuration changes that do not require a reboot, the downtime is less, and the impact on the cluster management and user workloads is lessened. In this case, the node draining step is skipped with single-node OpenShift Container Platform because there is no other node in the cluster to reschedule the workloads to.
- If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.
There are conditions, such as bugs in an updated package, that can cause the single node to not restart after a reboot. In this case, the update does not roll back automatically.
Additional resources
- For information on which machine configuration changes require a reboot, see the note in About the Machine Config Operator.
3.1.4. Updating a cluster by using the CLI
You can use the OpenShift CLI (oc) to review and request cluster updates.
You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.
Prerequisites
- Install the version of the OpenShift CLI (oc) that matches the version you are updating to.
- Log in to the cluster as a user with cluster-admin privileges.
- Pause all MachineHealthCheck resources.
Procedure
View the available updates and note the version number of the update that you want to apply:
$ oc adm upgrade
Example output
Cluster version is 4.13.10

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13 (available channels: candidate-4.13, candidate-4.14, fast-4.13, stable-4.13)

Recommended updates:

  VERSION     IMAGE
  4.13.14     quay.io/openshift-release-dev/ocp-release@sha256:406fcc160c097f61080412afcfa7fd65284ac8741ac7ad5b480e304aba73674b
  4.13.13     quay.io/openshift-release-dev/ocp-release@sha256:d62495768e335c79a215ba56771ff5ae97e3cbb2bf49ed8fb3f6cefabcdc0f17
  4.13.12     quay.io/openshift-release-dev/ocp-release@sha256:73946971c03b43a0dc6f7b0946b26a177c2f3c9d37105441315b4e3359373a55
  4.13.11     quay.io/openshift-release-dev/ocp-release@sha256:e1c2377fdae1d063aaddc753b99acf25972b6997ab9a0b7e80cfef627b9ef3dd
Note
- If there are no recommended updates, updates that have known issues might still be available. See Updating along a conditional update path for more information.
- For details and information on how to perform a Control Plane Only update, refer to the Preparing to perform a Control Plane Only update page, listed in the Additional resources section.
Based on your organization requirements, set the appropriate update channel. For example, you can set your channel to stable-4.13 or fast-4.13. For more information about channels, refer to Understanding update channels and releases listed in the Additional resources section.
$ oc adm upgrade channel <channel>
For example, to set the channel to stable-4.16:
$ oc adm upgrade channel stable-4.16
Important
For production clusters, you must subscribe to a stable-*, eus-*, or fast-* channel.
Note
When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner the update channel is declared, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time.
If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path.
Apply an update:
To update to the latest version:
$ oc adm upgrade --to-latest=true
To update to a specific version:
$ oc adm upgrade --to=<version>
Important
When using oc adm upgrade --help, there is a listed option for --force. This is heavily discouraged, as using the --force option bypasses cluster-side guards, including release verification and precondition checks. Using --force does not guarantee a successful update. Bypassing guards puts the cluster at risk.
Review the status of the Cluster Version Operator:
$ oc adm upgrade
After the update completes, you can confirm that the cluster version has updated to the new version:
$ oc adm upgrade
Example output
Cluster version is <version>

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-<version> (available channels: candidate-<version>, eus-<version>, fast-<version>, stable-<version>)

No updates available. You may force an update to a specific release image, but doing so might not be supported and might result in downtime or data loss.
If you are updating your cluster to the next minor version, such as version X.y to X.(y+1), it is recommended to confirm that your nodes are updated before deploying workloads that rely on a new feature:
$ oc get nodes
Example output
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    master   82m   v1.29.4
ip-10-0-170-223.ec2.internal   Ready    master   82m   v1.29.4
ip-10-0-179-95.ec2.internal    Ready    worker   70m   v1.29.4
ip-10-0-182-134.ec2.internal   Ready    worker   70m   v1.29.4
ip-10-0-211-16.ec2.internal    Ready    master   82m   v1.29.4
ip-10-0-250-100.ec2.internal   Ready    worker   69m   v1.29.4
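On larger clusters, reading the node list by eye is error-prone. The following sketch summarizes the same output. The function name is hypothetical, and it assumes the default column order shown above (NAME, STATUS, ROLES, AGE, VERSION).

```shell
#!/usr/bin/env bash
# Sketch only: summarize_nodes is a hypothetical helper name.

# Read `oc get nodes --no-headers` output on stdin.
# Print one line per node whose STATUS is not exactly "Ready"
# (this also flags nodes that are still cordoned, such as
# "Ready,SchedulingDisabled"), then each distinct kubelet version.
summarize_nodes() {
  awk '
    $2 != "Ready" { print "NOT READY: " $1 }
    { versions[$5] = 1 }
    END { for (v in versions) print "VERSION: " v }
  '
}
```

Run it as `oc get nodes --no-headers | summarize_nodes`. A single VERSION line and no NOT READY lines suggests that every node has picked up the new release.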
3.1.5. Retrieving information about a cluster update using oc adm upgrade status (Technology Preview)
When updating your cluster, it is useful to understand how your update is progressing. While the oc adm upgrade command returns limited information about the status of your update, this release introduces the oc adm upgrade status command as a Technology Preview feature. This command decouples status information from the oc adm upgrade command and provides specific information regarding a cluster update, including the status of the control plane and worker node updates.
The oc adm upgrade status command is read-only and will never alter any state in your cluster.
The oc adm upgrade status command is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The oc adm upgrade status command can be used for clusters from version 4.12 up to the latest supported release.
While your cluster does not need to be a Technology Preview-enabled cluster, you must enable the OC_ENABLE_CMD_UPGRADE_STATUS Technology Preview environment variable; otherwise, the OpenShift CLI (oc) will not recognize the command and you will not be able to use the feature.
Procedure
Set the OC_ENABLE_CMD_UPGRADE_STATUS environment variable to true by running the following command:
$ export OC_ENABLE_CMD_UPGRADE_STATUS=true
Run the oc adm upgrade status command:
$ oc adm upgrade status
Example 3.1. Example output for an update progressing successfully
= Control Plane =
Assessment:      Progressing
Target Version:  4.14.1 (from 4.14.0)
Completion:      97%
Duration:        54m
Operator Status: 32 Healthy, 1 Unavailable

Control Plane Nodes
NAME                                        ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-53-40.us-east-2.compute.internal    Progressing   Draining   4.14.0    +10m
ip-10-0-30-217.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?
ip-10-0-92-180.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?

= Worker Upgrade =

= Worker Pool =
Worker Pool:     worker
Assessment:      Progressing
Completion:      0%
Worker Status:   3 Total, 2 Available, 1 Progressing, 3 Outdated, 1 Draining, 0 Excluded, 0 Degraded

Worker Pool Nodes
NAME                                        ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-4-159.us-east-2.compute.internal    Progressing   Draining   4.14.0    +10m
ip-10-0-20-162.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?
ip-10-0-99-40.us-east-2.compute.internal    Outdated      Pending    4.14.0    ?

= Worker Pool =
Worker Pool:     infra
Assessment:      Progressing
Completion:      0%
Worker Status:   1 Total, 0 Available, 1 Progressing, 1 Outdated, 1 Draining, 0 Excluded, 0 Degraded

Worker Pool Node
NAME                                             ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-4-159-infra.us-east-2.compute.internal   Progressing   Draining   4.14.0    +10m

= Update Health =
SINCE   LEVEL   IMPACT   MESSAGE
14m4s   Info    None     Update is proceeding well
With this information, you can make informed decisions on how to proceed with your update.
3.1.6. Updating along a conditional update path
You can update along a recommended conditional update path using the web console or the OpenShift CLI (oc). When a conditional update is not recommended for your cluster, you can update along a conditional update path using the OpenShift CLI (oc) 4.10 or later.
Procedure
To view the description of the update when it is not recommended because a risk might apply, run the following command:
$ oc adm upgrade --include-not-recommended
If the cluster administrator evaluates the potential known risks and decides they are acceptable for the current cluster, then the administrator can waive the safety guards and proceed with the update by running the following command:
$ oc adm upgrade --allow-not-recommended --to <version>
<version> is the update version that you obtained from the output of the previous command, which is supported but also has known issues or risks.
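Before waiving the guards, it can help to confirm that the version you intend to pass to --to really is one of the conditional updates the cluster knows about. The sketch below is illustrative only. The function names are hypothetical, and it assumes that conditional updates are listed under .status.conditionalUpdates in the ClusterVersion resource.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers.

# Print the versions listed as conditional updates in the ClusterVersion status.
# Assumption: the list lives at .status.conditionalUpdates[].release.version.
list_conditional_update_versions() {
  oc get clusterversion version \
    -o jsonpath='{range .status.conditionalUpdates[*]}{.release.version}{"\n"}{end}'
}

# Succeed only if the given version matches a whole line read from stdin.
version_in_list() {
  grep -Fxq -- "$1"
}
```

For example, `list_conditional_update_versions | version_in_list <version>` exits with status 0 only when <version> is in the list.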
Additional resources
3.1.7. Changing the update server by using the CLI
Changing the update server is optional. If you have an OpenShift Update Service (OSUS) installed and configured locally, you must set the URL for the server as the upstream to use the local server during updates. The default value for upstream is https://api.openshift.com/api/upgrades_info/v1/graph.
Procedure
Change the upstream parameter value in the cluster version:
$ oc patch clusterversion/version --patch '{"spec":{"upstream":"<update-server-url>"}}' --type=merge
The <update-server-url> variable specifies the URL for the update server.
Example output
clusterversion.config.openshift.io/version patched
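After patching, you might want to read the value back to confirm that the cluster version now points at the local server. The following is a minimal sketch with hypothetical function names. It builds the same merge patch as the command above and then prints .spec.upstream.

```shell
#!/usr/bin/env bash
# Sketch only: upstream_patch and set_upstream are hypothetical helper names.

# Build the merge patch for a given update server URL.
# The URL is inserted as-is, so it must not contain double quotes.
upstream_patch() {
  printf '{"spec":{"upstream":"%s"}}\n' "$1"
}

# Apply the patch, then print the upstream value stored in the cluster version.
set_upstream() {
  oc patch clusterversion/version --type=merge --patch "$(upstream_patch "$1")" &&
    oc get clusterversion version -o jsonpath='{.spec.upstream}{"\n"}'
}
```

Calling `set_upstream <update-server-url>` should print the patched confirmation followed by the URL that you passed in.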
3.2. Updating a cluster using the web console
You can perform minor version and patch updates on an OpenShift Container Platform cluster by using the web console.
Use the web console or oc adm upgrade channel <channel> to change the update channel. You can follow the steps in Updating a cluster using the CLI to complete the update after you change to a 4.16 channel.
3.2.1. Before updating the OpenShift Container Platform cluster
Before updating, consider the following:
- You have recently backed up etcd.
- In PodDisruptionBudget, if minAvailable is set to 1, the nodes are drained to apply pending machine configs that might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget field can prevent the node drain.
- You might need to update the cloud provider resources for the new release if your cluster uses manually maintained credentials.
- You must review administrator acknowledgement requests, take any recommended actions, and provide the acknowledgement when you are ready.
- You can perform a partial update by updating the worker or custom pool nodes to accommodate the time it takes to update. You can pause and resume within the progress bar of each pool.
- When an update is failing to complete, the Cluster Version Operator (CVO) reports the status of any blocking components while attempting to reconcile the update. Rolling your cluster back to a previous version is not supported. If your update is failing to complete, contact Red Hat support.
-
Using the
unsupportedConfigOverrides
section to modify the configuration of an Operator is unsupported and might block cluster updates. You must remove this setting before you can update your cluster.
3.2.2. Changing the update server by using the web console
Changing the update server is optional. If you have an OpenShift Update Service (OSUS) installed and configured locally, you must set the URL for the server as the upstream to use the local server during updates.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
Procedure
- Navigate to Administration → Cluster Settings, click version.
- Click the YAML tab and then edit the upstream parameter value:
Example output
...
spec:
  clusterID: db93436d-7b05-42cc-b856-43e11ad2d31a
  upstream: '<update-server-url>'
...
The <update-server-url> variable specifies the URL for the update server.
The default upstream is https://api.openshift.com/api/upgrades_info/v1/graph.
- Click Save.
Additional resources
3.2.3. Pausing a MachineHealthCheck resource by using the web console
During the update process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck resources before updating the cluster.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Compute → MachineHealthChecks.
- To pause the machine health checks, add the cluster.x-k8s.io/paused="" annotation to each MachineHealthCheck resource. For example, to add the annotation to the machine-api-termination-handler resource, complete the following steps:
  - Click the Options menu next to the machine-api-termination-handler and click Edit annotations.
  - In the Edit annotations dialog, click Add more.
  - In the Key and Value fields, add cluster.x-k8s.io/paused and "" values, respectively, and click Save.
3.2.4. Updating a cluster by using the web console
If updates are available, you can update your cluster from the web console.
You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.
Prerequisites
- Have access to the web console as a user with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- Pause all MachineHealthCheck resources.
- You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
- Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
- Your RHEL7 workers are replaced with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.
Procedure
- From the web console, click Administration → Cluster Settings and review the contents of the Details tab.
- For production clusters, ensure that the Channel is set to the correct channel for the version that you want to update to, such as stable-4.16.
Important
For production clusters, you must subscribe to a stable-*, eus-*, or fast-* channel.
Note
When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner the update channel is declared, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time.
If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path.
- If the Update status is not Updates available, you cannot update your cluster.
- Select channel indicates the cluster version that your cluster is running or is updating to.
Select a version to update to, and click Save.
The Input channel Update status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.
Note
If you are updating your cluster to the next minor version, for example from version 4.10 to 4.11, confirm that your nodes are updated before deploying workloads that rely on a new feature. Any pools with worker nodes that are not yet updated are displayed on the Cluster Settings page.
After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.
- If updates are available, continue to perform updates in the current channel until you can no longer update.
- If no updates are available, change the Channel to the stable-*, eus-*, or fast-* channel for the next minor version, and update to the version that you want in that channel.
You might need to perform several intermediate updates until you reach the version that you want.
Additional resources
3.2.5. Viewing conditional updates in the web console
With conditional updates, you can view and assess the risks associated with particular updates.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- Pause all MachineHealthCheck resources.
- You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
- Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing an advanced update strategy, such as a canary rollout, an EUS update, or a control-plane update.
Procedure
- From the web console, click Administration → Cluster Settings and review the contents of the Details tab.
- You can enable the Include versions with known issues feature in the Select new version dropdown of the Update cluster modal to populate the dropdown list with conditional updates.
Note
If a version with known issues is selected, more information is provided with potential risks that are associated with the version.
- Review the notification detailing the potential risks to updating.
Additional resources
3.2.6. Performing a canary rollout update
In some specific use cases, you might want a more controlled update process where you do not want specific nodes updated concurrently with the rest of the cluster. These use cases include, but are not limited to:
- You have mission-critical applications that you do not want unavailable during the update. You can slowly test the applications on your nodes in small batches after the update.
- You have a small maintenance window that does not allow the time for all nodes to be updated, or you have multiple maintenance windows.
The rolling update process is not a typical update workflow. With larger clusters, it can be a time-consuming process that requires you to execute multiple commands. This complexity can result in errors that can affect the entire cluster. It is recommended that you carefully consider whether your organization wants to use a rolling update and carefully plan the implementation of the process before you start.
The rolling update process described in this topic involves:
- Creating one or more custom machine config pools (MCPs).
- Labeling each node that you do not want to update immediately to move those nodes to the custom MCPs.
- Pausing those custom MCPs, which prevents updates to those nodes.
- Performing the cluster update.
- Unpausing one custom MCP, which triggers the update on those nodes.
- Testing the applications on those nodes to make sure the applications work as expected on those newly-updated nodes.
- Optionally removing the custom labels from the remaining nodes in small batches and testing the applications on those nodes.
Pausing an MCP should be done with careful consideration and for short periods of time only.
If you want to use the canary rollout update process, see Performing a canary rollout update.
3.2.7. About updating single node OpenShift Container Platform
You can update, or upgrade, a single-node OpenShift Container Platform cluster by using either the console or CLI.
However, note the following limitations:
- The prerequisite to pause the MachineHealthCheck resources is not required because there is no other node to perform the health check.
- Restoring a single-node OpenShift Container Platform cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your update fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.
Updating a single-node OpenShift Container Platform cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:
- If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.
- If the update contains machine configuration changes that do not require a reboot, the downtime is less, and the impact on the cluster management and user workloads is lessened. In this case, the node draining step is skipped with single-node OpenShift Container Platform because there is no other node in the cluster to reschedule the workloads to.
- If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.
There are conditions, such as bugs in an updated package, that can cause the single node to not restart after a reboot. In this case, the update does not roll back automatically.
Additional resources
3.3. Performing a Control Plane Only update
Due to fundamental Kubernetes design, all OpenShift Container Platform updates between minor versions must be serialized. You must update from OpenShift Container Platform <4.y> to <4.y+1>, and then to <4.y+2>. You cannot update from OpenShift Container Platform <4.y> to <4.y+2> directly. However, administrators who want to update between two Extended Update Support (EUS) versions can do so incurring only a single reboot of non-control plane hosts.
This update was previously known as an EUS-to-EUS update and is now referred to as a Control Plane Only update. These updates are only viable between even-numbered minor versions of OpenShift Container Platform.
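The version arithmetic can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and it computes the name of the EUS channel two minor versions ahead of an even-numbered 4.y release without checking that the channel or an update path actually exists.

```shell
#!/usr/bin/env bash
# Sketch only: control_plane_only_channel is a hypothetical helper name.

# Print "eus-4.(y+2)" for an even-numbered 4.y or 4.y.z version.
# Return non-zero for odd minor versions or input that is not a 4.x version.
control_plane_only_channel() {
  case "$1" in
    4.*) minor="${1#4.}" ;;
    *) return 1 ;;
  esac
  minor="${minor%%.*}"   # drop any patch component, for example 14.7 becomes 14
  case "$minor" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ $((minor % 2)) -eq 0 ] || return 1
  echo "eus-4.$((minor + 2))"
}
```

For example, `control_plane_only_channel 4.14.7` prints `eus-4.16`, while `control_plane_only_channel 4.15.2` fails because 4.15 is an odd-numbered minor version.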
There are several caveats to consider when attempting a Control Plane Only update.
- Control Plane Only updates are only offered after updates between all versions involved have been made available in stable channels.
- If you encounter issues during or after updating to the odd-numbered minor version but before updating to the next even-numbered version, then remediation of those issues may require that non-control plane hosts complete the update to the odd-numbered version before moving forward.
- You can do a partial update by updating the worker or custom pool nodes to accommodate the time it takes for maintenance.
- Until the machine config pools are unpaused and the update is complete, some features and bug fixes in <4.y+1> and <4.y+2> of OpenShift Container Platform are not available.
- All the clusters might update using EUS channels for a conventional update without pools paused, but only clusters with non control-plane MachineConfigPools objects can do Control Plane Only updates with pools paused.
3.3.1. Performing a Control Plane Only update
The following procedure pauses all non-master machine config pools and performs updates from OpenShift Container Platform <4.y> to <4.y+1> to <4.y+2>, then unpauses the previously paused machine config pools. Following this procedure reduces the total update duration and the number of times worker nodes are restarted.
Prerequisites
- Review the release notes for OpenShift Container Platform <4.y+1> and <4.y+2>.
- Review the release notes and product lifecycles for any layered products and Operator Lifecycle Manager (OLM) Operators. Some may require updates either before or during a Control Plane Only update.
- Ensure that you are familiar with version-specific prerequisites, such as the removal of deprecated APIs, that are required prior to updating from OpenShift Container Platform <4.y+1> to <4.y+2>.
3.3.1.1. Performing a Control Plane Only update using the web console
Prerequisites
- Verify that machine config pools are unpaused.
- Have access to the web console as a user with admin privileges.
Procedure
- Using the Administrator perspective on the web console, update any Operator Lifecycle Manager (OLM) Operators to the versions that are compatible with your intended updated version. You can find more information on how to perform this action in "Updating installed Operators"; see "Additional resources".
- Verify that all machine config pools display a status of Up to date and that no machine config pool displays a status of Updating.
To view the status of all machine config pools, click Compute → MachineConfigPools and review the contents of the Update status column.
Note
If your machine config pools have an Updating status, please wait for this status to change to Up to date. This process could take several minutes.
- Set your channel to eus-<4.y+2>.
To set your channel, click Administration → Cluster Settings → Channel. You can edit your channel by clicking on the current hyperlinked channel.
- Pause all machine config pools except for the master pool. You can perform this action on the MachineConfigPools tab under the Compute page. Select the vertical ellipses next to the machine config pool you’d like to pause and click Pause updates.
- Update to version <4.y+1> and complete up to the Save step. You can find more information on how to perform these actions in "Updating a cluster by using the web console"; see "Additional resources".
- Ensure that the <4.y+1> updates are complete by viewing the Last completed version of your cluster. You can find this information on the Cluster Settings page under the Details tab.
- If necessary, update your OLM Operators by using the Administrator perspective on the web console. You can find more information on how to perform these actions in "Updating installed Operators"; see "Additional resources".
- Update to version <4.y+2> and complete up to the Save step. You can find more information on how to perform these actions in "Updating a cluster by using the web console"; see "Additional resources".
- Ensure that the <4.y+2> update is complete by viewing the Last completed version of your cluster. You can find this information on the Cluster Settings page under the Details tab.
- Unpause all previously paused machine config pools. You can perform this action on the MachineConfigPools tab under the Compute page. Select the vertical ellipses next to the machine config pool you’d like to unpause and click Unpause updates.
  Important: If pools are paused, the cluster is not permitted to upgrade to any future minor versions, and some maintenance tasks are inhibited. This puts the cluster at risk for future degradation.
- Verify that your previously paused pools are updated and that your cluster has completed the update to version <4.y+2>.
  You can verify that your pools have updated on the MachineConfigPools tab under the Compute page by confirming that the Update status has a value of Up to date.
  Important: When you update a cluster that contains Red Hat Enterprise Linux (RHEL) compute machines, those machines temporarily become unavailable during the update process. You must run the upgrade playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating. For more information, see "Updating a cluster that includes RHEL compute machines" in the additional resources section.
  You can verify that your cluster has completed the update by viewing the Last completed version of your cluster. You can find this information on the Cluster Settings page under the Details tab.
3.3.1.2. Performing a Control Plane Only update using the CLI
Prerequisites
- Verify that machine config pools are unpaused.
- Update the OpenShift CLI (oc) to the target version before each update.
Skipping this prerequisite is strongly discouraged. If the OpenShift CLI (oc) is not updated to the target version before your update, unexpected issues might occur.
Procedure
- Using the Administrator perspective on the web console, update any Operator Lifecycle Manager (OLM) Operators to the versions that are compatible with your intended updated version. You can find more information on how to perform this action in "Updating installed Operators"; see "Additional resources".
- Verify that all machine config pools display a status of UPDATED and that no machine config pool displays a status of UPDATING. To view the status of all machine config pools, run the following command:
  $ oc get mcp
  Example output
  NAME     CONFIG                                             UPDATED   UPDATING
  master   rendered-master-ecbb9582781c1091e1c9f19d50cf836c   True      False
  worker   rendered-worker-00a3f0c68ae94e747193156b491553d5   True      False
- Your current version is <4.y>, and your intended version to update is <4.y+2>. Change to the eus-<4.y+2> channel by running the following command:
  $ oc adm upgrade channel eus-<4.y+2>
  Note: If you receive an error message indicating that eus-<4.y+2> is not one of the available channels, this indicates that Red Hat is still rolling out EUS version updates. This rollout process generally takes 45-90 days starting at the GA date.
- Pause all worker machine pools except for the master pool by running the following command:
  $ oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'
  Note: You cannot pause the master pool.
- Update to the latest version by running the following command:
  $ oc adm upgrade --to-latest
  Example output
  Updating to latest version <4.y+1.z>
- Review the cluster version to ensure that the updates are complete by running the following command:
  $ oc adm upgrade
  Example output
  Cluster version is <4.y+1.z> ...
- Update to version <4.y+2> by running the following command:
  $ oc adm upgrade --to-latest
- Retrieve the cluster version to ensure that the <4.y+2> updates are complete by running the following command:
  $ oc adm upgrade
  Example output
  Cluster version is <4.y+2.z> ...
- To update your worker nodes to <4.y+2>, unpause all previously paused machine config pools by running the following command:
  $ oc patch mcp/worker --type merge --patch '{"spec":{"paused":false}}'
  Important: If pools are not unpaused, the cluster is not permitted to update to any future minor versions, and some maintenance tasks are inhibited. This puts the cluster at risk for future degradation.
- Verify that your previously paused pools are updated and that the update to version <4.y+2> is complete by running the following command:
  $ oc get mcp
  Example output
  NAME     CONFIG                                             UPDATED   UPDATING
  master   rendered-master-52da4d2760807cb2b96a3402179a9a4c   True      False
  worker   rendered-worker-4756f60eccae96fb9dcb4c392c69d497   True      False
  Important: When you update a cluster that contains Red Hat Enterprise Linux (RHEL) compute machines, those machines temporarily become unavailable during the update process. You must run the upgrade playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating. For more information, see "Updating a cluster that includes RHEL compute machines" in the additional resources section.
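The pool status check used in this procedure can be scripted. The following is a minimal sketch and not part of the documented procedure: pools_not_ready is a hypothetical helper that reads oc get mcp output from standard input and prints each pool whose UPDATED column is not True or whose UPDATING column is not False. It assumes the column order shown in the example output.

```shell
# Hypothetical helper: reads `oc get mcp` output on stdin and prints the
# names of pools that are not fully updated.
# Assumes the columns NAME CONFIG UPDATED UPDATING come first, as in the
# example output.
pools_not_ready() {
  awk 'NR > 1 && ($3 != "True" || $4 != "False") { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get mcp | pools_not_ready
```

Empty output means every listed pool reports UPDATED as True and UPDATING as False.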
Additional resources
3.3.1.3. Performing a Control Plane Only update for layered products and Operators installed through Operator Lifecycle Manager
In addition to the Control Plane Only update steps mentioned for the web console and CLI, there are additional steps to consider when performing Control Plane Only updates for clusters with the following:
- Layered products
- Operators installed through Operator Lifecycle Manager (OLM)
What is a layered product?
Layered products refer to products that are made of multiple underlying products that are intended to be used together and cannot be broken into individual subscriptions. For examples of layered OpenShift Container Platform products, see Layered Offering On OpenShift.
As you perform a Control Plane Only update for the clusters of layered products and those of Operators that have been installed through OLM, you must complete the following:
- Update all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
- Confirm the cluster version compatibility between the current and intended Operator versions. You can verify which versions your OLM Operators are compatible with by using the Red Hat OpenShift Container Platform Operator Update Information Checker.
As an example, here are the steps to perform a Control Plane Only update from <4.y> to <4.y+2> for OpenShift Data Foundation (ODF). This can be done through the CLI or web console. For information about how to update clusters through your desired interface, see Performing a Control Plane Only update using the web console and Performing a Control Plane Only update using the CLI in "Additional resources".
Example workflow
- Pause the worker machine pools.
- Update OpenShift <4.y> → OpenShift <4.y+1>.
- Update ODF <4.y> → ODF <4.y+1>.
- Update OpenShift <4.y+1> → OpenShift <4.y+2>.
- Update to ODF <4.y+2>.
- Unpause the worker machine pools.
The update to ODF <4.y+2> can happen before or after worker machine pools have been unpaused.
3.4. Performing a canary rollout update
A canary update is an update strategy where worker node updates are performed in discrete, sequential stages instead of updating all worker nodes at the same time. This strategy can be useful in the following scenarios:
- You want a more controlled rollout of worker node updates to ensure that mission-critical applications stay available during the whole update, even if the update process causes your applications to fail.
- You want to update a small subset of worker nodes, evaluate cluster and workload health over a period of time, and then update the remaining nodes.
- You want to fit worker node updates, which often require a host reboot, into smaller defined maintenance windows when it is not possible to take a large maintenance window to update the entire cluster at one time.
In these scenarios, you can create multiple custom machine config pools (MCPs) to prevent certain worker nodes from updating when you update the cluster. After the rest of the cluster is updated, you can update those worker nodes in batches at appropriate times.
3.4.1. Example Canary update strategy
The following example describes a canary update strategy where you have a cluster with 100 nodes with 10% excess capacity, you have maintenance windows that must not exceed 4 hours, and you know that it takes no longer than 8 minutes to drain and reboot a worker node.
The previous values are an example only. The time it takes to drain a node might vary depending on factors such as workloads.
Defining custom machine config pools
In order to organize the worker node updates into separate stages, you can begin by defining the following MCPs:
- workerpool-canary with 10 nodes
- workerpool-A with 30 nodes
- workerpool-B with 30 nodes
- workerpool-C with 30 nodes
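One way to arrive at pool sizes like these, assuming nodes in a pool update one at a time (a maxUnavailable of 1) and using the example figures above, is the following shell arithmetic. The function names are illustrative only.

```shell
# Illustrative helpers; the names are hypothetical and the values come
# from the example scenario above.

# Largest number of nodes that fit in one maintenance window when nodes
# update one at a time: window_minutes / minutes_per_node.
nodes_per_window() {
  echo $(( $1 / $2 ))
}

# Number of nodes covered by the excess capacity: total_nodes * percent / 100.
excess_capacity_nodes() {
  echo $(( $1 * $2 / 100 ))
}

nodes_per_window 240 8         # 4-hour window, 8 minutes per node
excess_capacity_nodes 100 10   # 100 nodes, 10% excess capacity
```

With these figures a window fits at most 30 nodes and the excess capacity covers 10 nodes, which is consistent with a 10-node canary pool and three 30-node pools.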
Updating the canary worker pool
During your first maintenance window, you pause the MCPs for workerpool-A, workerpool-B, and workerpool-C, and then initiate the cluster update. This updates components that run on top of OpenShift Container Platform and the 10 nodes that are part of the unpaused workerpool-canary MCP. The other three MCPs are not updated because they were paused.
Determining whether to proceed with the remaining worker pool updates
If for some reason you determine that your cluster or workload health was negatively affected by the workerpool-canary update, you then cordon and drain all nodes in that pool while still maintaining sufficient capacity until you have diagnosed and resolved the problem. When everything is working as expected, you evaluate the cluster and workload health before deciding to unpause, and thus update, workerpool-A, workerpool-B, and workerpool-C in succession during each additional maintenance window.
Managing worker node updates using custom MCPs provides flexibility; however, it can be a time-consuming process that requires you to execute multiple commands. This complexity can result in errors that might affect the entire cluster. It is recommended that you carefully consider your organizational needs and plan the implementation of the process before you start.
Pausing a machine config pool prevents the Machine Config Operator from applying any configuration changes on the associated nodes. Pausing an MCP also prevents any automatically rotated certificates from being pushed to the associated nodes, including the automatic CA rotation of the kube-apiserver-to-kubelet-signer CA certificate.
If the MCP is paused when the kube-apiserver-to-kubelet-signer CA certificate expires and the MCO attempts to automatically renew the certificate, the MCO cannot push the newly rotated certificates to those nodes. This causes failure in multiple oc commands, including oc debug, oc logs, oc exec, and oc attach. You receive alerts in the Alerting UI of the OpenShift Container Platform web console if an MCP is paused when the certificates are rotated.
Pausing an MCP should be done with careful consideration about the kube-apiserver-to-kubelet-signer CA certificate expiration and for short periods of time only.
It is not recommended to update the MCPs to different OpenShift Container Platform versions. For example, do not update one MCP from 4.y.10 to 4.y.11 and another to 4.y.12. This scenario has not been tested and might result in an undefined cluster state.
3.4.2. About the canary rollout update process and MCPs
In OpenShift Container Platform, nodes are not considered individually. Instead, they are grouped into machine config pools (MCPs). By default, nodes in an OpenShift Container Platform cluster are grouped into two MCPs: one for the control plane nodes and one for the worker nodes. An OpenShift Container Platform update affects all MCPs concurrently.
During the update, the Machine Config Operator (MCO) drains and cordons all nodes within an MCP up to the specified maxUnavailable number of nodes, if a max number is specified. By default, maxUnavailable is set to 1. Draining and cordoning a node deschedules all pods on the node and marks the node as unschedulable.
After the node is drained, the Machine Config Daemon applies a new machine configuration, which can include updating the operating system (OS). Updating the OS requires the host to reboot.
Using custom machine config pools
To prevent specific nodes from being updated, you can create custom MCPs. Because the MCO does not update nodes within paused MCPs, you can pause the MCPs containing nodes that you do not want to update before initiating a cluster update.
Using one or more custom MCPs can give you more control over the sequence in which you update your worker nodes. For example, after you update the nodes in the first MCP, you can verify the application compatibility and then update the rest of the nodes gradually to the new version.
The default setting for maxUnavailable is 1 for all the machine config pools in OpenShift Container Platform. It is recommended to not change this value and update one control plane node at a time. Do not change this value to 3 for the control plane pool.
To ensure the stability of the control plane, creating a custom MCP from the control plane nodes is not supported. The Machine Config Operator (MCO) ignores any custom MCP created for the control plane nodes.
Considerations when using custom machine config pools
Give careful consideration to the number of MCPs that you create and the number of nodes in each MCP, based on your workload deployment topology. For example, if you must fit updates into specific maintenance windows, you must know how many nodes OpenShift Container Platform can update within a given window. This number is dependent on your unique cluster and workload characteristics.
You must also consider how much extra capacity is available in your cluster to determine the number of custom MCPs and the number of nodes within each MCP. In a case where your applications fail to work as expected on newly updated nodes, you can cordon and drain those nodes in the pool, which moves the application pods to other nodes. However, you must determine whether the available nodes in the remaining MCPs can provide sufficient quality-of-service (QoS) for your applications.
You can use this update process with all documented OpenShift Container Platform update processes. However, the process does not work with Red Hat Enterprise Linux (RHEL) machines, which are updated using Ansible playbooks.
3.4.3. About performing a canary rollout update
The following steps outline the high-level workflow of the canary rollout update process:
- Create custom machine config pools (MCP) based on the worker pool.
  Note: You can change the maxUnavailable setting in an MCP to specify the percentage or the number of machines that can be updating at any given time. The default is 1.
  Warning: The default setting for maxUnavailable is 1 for all the machine config pools in OpenShift Container Platform. It is recommended to not change this value and update one control plane node at a time. Do not change this value to 3 for the control plane pool.
- Add a node selector to the custom MCPs. For each node that you do not want to update simultaneously with the rest of the cluster, add a matching label to the nodes. This label associates the node to the MCP.
  Important: Do not remove the default worker label from the nodes. The nodes must have a role label to function properly in the cluster.
- Pause the MCPs you do not want to update as part of the update process.
- Perform the cluster update. The update process updates the MCPs that are not paused, including the control plane nodes.
- Test your applications on the updated nodes to ensure they are working as expected.
- Unpause one of the remaining MCPs, wait for the nodes in that pool to finish updating, and test the applications on those nodes. Repeat this process until all worker nodes are updated.
- Optional: Remove the custom label from updated nodes and delete the custom MCPs.
3.4.4. Creating machine config pools to perform a canary rollout update
To perform a canary rollout update, you must first create one or more custom machine config pools (MCP).
Procedure
- List the worker nodes in your cluster by running the following command:
  $ oc get -l 'node-role.kubernetes.io/master!=' -o 'jsonpath={range .items[*]}{.metadata.name}{"\n"}{end}' nodes
  Example output
  ci-ln-pwnll6b-f76d1-s8t9n-worker-a-s75z4
  ci-ln-pwnll6b-f76d1-s8t9n-worker-b-dglj2
  ci-ln-pwnll6b-f76d1-s8t9n-worker-c-lldbm
- For each node that you want to delay, add a custom label to the node by running the following command:
  $ oc label node <node_name> node-role.kubernetes.io/<custom_label>=
  For example:
  $ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary=
  Example output
  node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled
- Create the new MCP:
  - Create an MCP YAML file:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: workerpool-canary
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,workerpool-canary]}
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/workerpool-canary: ""
  - Create the MachineConfigPool object by running the following command:
    $ oc create -f <file_name>
    Example output
    machineconfigpool.machineconfiguration.openshift.io/workerpool-canary created
- View the list of MCPs in the cluster and their current state by running the following command:
  $ oc get machineconfigpool
  Example output
  NAME                CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  master              rendered-master-b0bb90c4921860f2a5d8a2f8137c1867              True      False      False      3              3                   3                     0                      97m
  workerpool-canary   rendered-workerpool-canary-87ba3dec1ad78cb6aecebf7fbb476a36   True      False      False      1              1                   1                     0                      2m42s
  worker              rendered-worker-87ba3dec1ad78cb6aecebf7fbb476a36              True      False      False      2              2                   2                     0                      97m
  The new machine config pool, workerpool-canary, is created and the number of nodes to which you added the custom label are shown in the machine counts. The worker MCP machine counts are reduced by the same number. It can take several minutes to update the machine counts. In this example, one node was moved from the worker MCP to the workerpool-canary MCP.
3.4.5. Managing machine configuration inheritance for a worker pool canary
You can configure a machine config pool (MCP) canary to inherit any MachineConfig assigned to an existing MCP. This configuration is useful when you want to use an MCP canary to test as you update nodes one at a time for an existing MCP.
Prerequisites
- You have created one or more MCPs.
Procedure
- Create a secondary MCP as described in the following two steps:
  - Save the following configuration file as machineConfigPool.yaml.
    Example machineConfigPool YAML
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-perf
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-perf]}
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-perf: ""
    # ...
  - Create the new machine config pool by running the following command:
    $ oc create -f machineConfigPool.yaml
    Example output
    machineconfigpool.machineconfiguration.openshift.io/worker-perf created
- Add some machines to the secondary MCP. The following example labels the worker nodes worker-a, worker-b, and worker-c to the MCP worker-perf:
  $ oc label node worker-a node-role.kubernetes.io/worker-perf=''
  $ oc label node worker-b node-role.kubernetes.io/worker-perf=''
  $ oc label node worker-c node-role.kubernetes.io/worker-perf=''
- Create a new MachineConfig for the MCP worker-perf as described in the following two steps:
  - Save the following MachineConfig example as a file called new-machineconfig.yaml:
    Example MachineConfig YAML
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker-perf
      name: 06-kdump-enable-worker-perf
    spec:
      config:
        ignition:
          version: 3.2.0
        systemd:
          units:
          - enabled: true
            name: kdump.service
      kernelArguments:
        - crashkernel=512M
    # ...
  - Apply the MachineConfig by running the following command:
    $ oc create -f new-machineconfig.yaml
- Create the new canary MCP and add machines from the MCP you created in the previous steps. The following example creates an MCP called worker-perf-canary, and adds machines from the worker-perf MCP that you previously created.
  - Label the canary worker node worker-a by running the following command:
    $ oc label node worker-a node-role.kubernetes.io/worker-perf-canary=''
  - Remove the canary worker node worker-a from the original MCP by running the following command:
    $ oc label node worker-a node-role.kubernetes.io/worker-perf-
  - Save the following file as machineConfigPool-Canary.yaml.
    Example machineConfigPool-Canary.yaml file
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-perf-canary
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-perf,worker-perf-canary]} # (1)
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-perf-canary: ""
    (1) Optional value. This example includes worker-perf-canary as an additional value. You can use a value in this way to configure members of an additional MachineConfig.
  - Create the new worker-perf-canary by running the following command:
    $ oc create -f machineConfigPool-Canary.yaml
    Example output
    machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary created
- Check if the MachineConfig is inherited in worker-perf-canary.
  - Verify that no MCP is degraded by running the following command:
    $ oc get mcp
    Example output
    NAME                 CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master               rendered-master-2bf1379b39e22bae858ea1a3ff54b2ac               True      False      False      3              3                   3                     0                      5d16h
    worker               rendered-worker-b9576d51e030413cfab12eb5b9841f34               True      False      False      0              0                   0                     0                      5d16h
    worker-perf          rendered-worker-perf-b98a1f62485fa702c4329d17d9364f6a          True      False      False      2              2                   2                     0                      56m
    worker-perf-canary   rendered-worker-perf-canary-b98a1f62485fa702c4329d17d9364f6a   True      False      False      1              1                   1                     0                      44m
  - Verify that the machines are inherited from worker-perf into worker-perf-canary.
    $ oc get nodes
    Example output
    NAME       STATUS   ROLES                       AGE     VERSION
    ...
    worker-a   Ready    worker,worker-perf-canary   5d15h   v1.27.13+e709aa5
    worker-b   Ready    worker,worker-perf          5d15h   v1.27.13+e709aa5
    worker-c   Ready    worker,worker-perf          5d15h   v1.27.13+e709aa5
  - Verify that the kdump service is enabled on worker-a by running the following command:
    $ systemctl status kdump.service
    Example output
    kdump.service - Crash recovery kernel arming
         Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: disabled)
         Active: active (exited) since Tue 2024-09-03 12:44:43 UTC; 10s ago
        Process: 4151139 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
       Main PID: 4151139 (code=exited, status=0/SUCCESS)
  - Verify that the MCP has updated the crashkernel by running the following command:
    $ cat /proc/cmdline
    The output should include the updated crashkernel value, for example:
    Example output
    crashkernel=512M
- Optional: If you are satisfied with the upgrade, you can return worker-a to worker-perf.
  - Return worker-a to worker-perf by running the following command:
    $ oc label node worker-a node-role.kubernetes.io/worker-perf=''
  - Remove worker-a from the canary MCP by running the following command:
    $ oc label node worker-a node-role.kubernetes.io/worker-perf-canary-
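For the kernel argument check earlier in this procedure, the following sketch filters the kernel command line down to the crashkernel argument so the result is easier to read or script against. crashkernel_arg is a hypothetical helper name, not part of the documented procedure.

```shell
# Hypothetical helper: reads a kernel command line on stdin and prints the
# crashkernel argument if present; returns non-zero when it is absent.
crashkernel_arg() {
  tr ' ' '\n' | grep '^crashkernel='
}

# Usage on the node:
#   crashkernel_arg < /proc/cmdline
```

A non-zero exit status indicates that no crashkernel argument was found on the command line.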
3.4.6. Pausing the machine config pools
After you create your custom machine config pools (MCPs), you then pause those MCPs. Pausing an MCP prevents the Machine Config Operator (MCO) from updating the nodes associated with that MCP.
Procedure
Patch the MCP that you want paused by running the following command:
$ oc patch mcp/<mcp_name> --patch '{"spec":{"paused":true}}' --type=merge
For example:
$ oc patch mcp/workerpool-canary --patch '{"spec":{"paused":true}}' --type=merge
Example output
machineconfigpool.machineconfiguration.openshift.io/workerpool-canary patched
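When several pools must be paused, the same patch can be applied in a loop. The following is a minimal sketch under stated assumptions: set_mcp_paused and pause_pools are hypothetical helper names, and the OC variable is an illustrative convention that lets you substitute echo for a dry run. The patch command itself is the one shown above.

```shell
# Sketch: pause several custom pools before a cluster update.
# OC defaults to the real client; set OC=echo to print the commands instead.
OC="${OC:-oc}"

# Hypothetical helper: set spec.paused on one pool.
#   $1 = pool name, $2 = true or false
set_mcp_paused() {
  "$OC" patch "mcp/$1" --type=merge --patch "{\"spec\":{\"paused\":$2}}"
}

# Hypothetical helper: pause every pool named on the command line.
pause_pools() {
  for pool in "$@"; do
    set_mcp_paused "$pool" true
  done
}

# Usage with pool names from the canary example:
#   pause_pools workerpool-A workerpool-B workerpool-C
```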
3.4.7. Performing the cluster update
After the machine config pools (MCP) enter a ready state, you can perform the cluster update. See one of the following update methods, as appropriate for your cluster:
After the cluster update is complete, you can begin to unpause the MCPs one at a time.
3.4.8. Unpausing the machine config pools
After the OpenShift Container Platform update is complete, unpause your custom machine config pools (MCP) one at a time. Unpausing an MCP allows the Machine Config Operator (MCO) to update the nodes associated with that MCP.
Procedure
- Patch the MCP that you want to unpause:
  $ oc patch mcp/<mcp_name> --patch '{"spec":{"paused":false}}' --type=merge
  For example:
  $ oc patch mcp/workerpool-canary --patch '{"spec":{"paused":false}}' --type=merge
  Example output
  machineconfigpool.machineconfiguration.openshift.io/workerpool-canary patched
- Optional: Check the progress of the update by using one of the following options:
  - Check the progress from the web console by clicking Administration → Cluster settings.
  - Check the progress by running the following command:
    $ oc get machineconfigpools
- Test your applications on the updated nodes to ensure that they are working as expected.
- Repeat this process for any other paused MCPs, one at a time.
In case of a failure, such as your applications not working on the updated nodes, you can cordon and drain the nodes in the pool, which moves the application pods to other nodes to help maintain the quality-of-service for the applications. This first MCP should be no larger than the excess capacity.
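Unpausing one pool at a time can be combined with a blocking wait. The following is a sketch only: unpause_and_wait is a hypothetical helper, the 60m default timeout is an arbitrary illustration, and using oc wait on the pool's Updated condition is an assumption about how you choose to poll. A pool can still report Updated for a short time right after it is unpaused, so confirm the result with oc get machineconfigpools.

```shell
# OC defaults to the real client; set OC=echo to print the commands instead.
OC="${OC:-oc}"

# Hypothetical helper: unpause one pool, then block until it reports the
# Updated condition or the timeout expires.
#   $1 = pool name, $2 = optional timeout (assumed default: 60m)
unpause_and_wait() {
  "$OC" patch "mcp/$1" --type=merge --patch '{"spec":{"paused":false}}' &&
  "$OC" wait "mcp/$1" --for=condition=Updated --timeout="${2:-60m}"
}

# Usage:
#   unpause_and_wait workerpool-A
```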
3.4.9. Moving a node to the original machine config pool
After you update and verify applications on nodes in a custom machine config pool (MCP), move the nodes back to their original MCP by removing the custom label that you added to the nodes.
A node must have a role to function properly in the cluster.
Procedure
- For each node in a custom MCP, remove the custom label from the node by running the following command:
  $ oc label node <node_name> node-role.kubernetes.io/<custom_label>-
  For example:
  $ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary-
  Example output
  node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled
  The Machine Config Operator moves the nodes back to the original MCP and reconciles the node to the MCP configuration.
- To ensure that the node has been removed from the custom MCP, view the list of MCPs in the cluster and their current state by running the following command:
  $ oc get mcp
  Example output
  NAME                CONFIG                                                  UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  master              rendered-master-1203f157d053fd987c7cbd91e3fbc0ed        True      False      False      3              3                   3                     0                      61m
  workerpool-canary   rendered-mcp-noupdate-5ad4791166c468f3a35cd16e734c9028   True      False      False      0              0                   0                     0                      21m
  worker              rendered-worker-5ad4791166c468f3a35cd16e734c9028        True      False      False      3              3                   3                     0                      61m
  When the node is removed from the custom MCP and moved back to the original MCP, it can take several minutes to update the machine counts. In this example, one node was moved from the removed workerpool-canary MCP to the worker MCP.
- Optional: Delete the custom MCP by running the following command:
  $ oc delete mcp <mcp_name>
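Before deleting a custom MCP, you might want to confirm that its machine count has dropped to 0. The following sketch extracts that value from oc get mcp output. pool_machine_count is a hypothetical helper, and it assumes MACHINECOUNT is the sixth column, as in the example output above.

```shell
# Hypothetical helper: reads `oc get mcp` output on stdin and prints the
# MACHINECOUNT value (assumed to be the 6th column) for the pool named in $1.
pool_machine_count() {
  awk -v pool="$1" 'NR > 1 && $1 == pool { print $6 }'
}

# Usage against a live cluster:
#   oc get mcp | pool_machine_count workerpool-canary
```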
3.5. Updating a cluster that includes RHEL compute machines
You can perform minor version and patch updates on an OpenShift Container Platform cluster. If your cluster contains Red Hat Enterprise Linux (RHEL) machines, you must take additional steps to update those machines.
The use of RHEL compute machines on OpenShift Container Platform clusters has been deprecated and will be removed in a future release.
3.5.1. Prerequisites
- Have access to the cluster as a user with admin privileges. See Using RBAC to define and apply permissions.
- Have a recent etcd backup in case your update fails and you must restore your cluster to a previous state.
- Your RHEL7 workers are replaced with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.
- If your cluster uses manually maintained credentials, update the cloud provider resources for the new release. For more information, including how to determine if this is a requirement for your cluster, see Preparing to update a cluster with manually maintained credentials.
- If you run an Operator or you have configured any application with the pod disruption budget, you might experience an interruption during the update process. If minAvailable is set to 1 in PodDisruptionBudget, the nodes are drained to apply pending machine configs which might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget field can prevent the node drain.
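To see why a pod disruption budget can block a drain, consider how many evictions the budget allows. The following sketch assumes minAvailable is an absolute number rather than a percentage; allowed_disruptions is a hypothetical helper used only for illustration.

```shell
# Illustrative arithmetic: evictions a PodDisruptionBudget currently allows
# when minAvailable is an absolute number. The result is the number of
# healthy pods minus minAvailable, floored at 0.
#   $1 = healthy pods, $2 = minAvailable
allowed_disruptions() {
  healthy=$1
  min_available=$2
  if [ "$healthy" -gt "$min_available" ]; then
    echo $(( healthy - min_available ))
  else
    echo 0
  fi
}

allowed_disruptions 1 1   # single replica, minAvailable 1
allowed_disruptions 3 1   # three replicas, minAvailable 1
```

A result of 0 means no pod covered by the budget can be evicted, so draining the node that hosts that pod waits until the budget allows a disruption.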
Additional resources
3.5.2. Updating a cluster by using the web console
If updates are available, you can update your cluster from the web console.
You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.
Prerequisites
- Have access to the web console as a user with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- Pause all MachineHealthCheck resources.
- You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
- Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
- Your RHEL7 workers are replaced with RHEL8 or RHCOS workers. Red Hat does not support in-place RHEL7 to RHEL8 updates for RHEL workers; those hosts must be replaced with a clean operating system install.
Procedure
- From the web console, click Administration → Cluster Settings and review the contents of the Details tab. For production clusters, ensure that the Channel is set to the correct channel for the version that you want to update to, such as stable-4.16.
Important: For production clusters, you must subscribe to a stable-*, eus-*, or fast-* channel.
Note: When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner the update channel is declared, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time.
If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path.
- If the Update status is not Updates available, you cannot update your cluster.
- Select channel indicates the cluster version that your cluster is running or is updating to.
Select a version to update to, and click Save.
The Input channel Update status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.
Note: If you are updating your cluster to the next minor version, for example from version 4.10 to 4.11, confirm that your nodes are updated before deploying workloads that rely on a new feature. Any pools with worker nodes that are not yet updated are displayed on the Cluster Settings page.
After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.
- If updates are available, continue to perform updates in the current channel until you can no longer update.
- If no updates are available, change the Channel to the stable-*, eus-*, or fast-* channel for the next minor version, and update to the version that you want in that channel.
You might need to perform several intermediate updates until you reach the version that you want.
Important: When you update a cluster that contains Red Hat Enterprise Linux (RHEL) worker machines, those workers temporarily become unavailable during the update process. You must run the update playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating.
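The channel rule from the procedure above (production clusters subscribe to a stable-*, eus-*, or fast-* channel) can be checked before you change the channel. The following is a minimal sketch only: is_production_channel is a hypothetical helper name, not part of oc or the web console, and it only pattern-matches the channel string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an oc subcommand): succeeds when the channel
# name belongs to one of the production channel families named above.
is_production_channel() {
  case "$1" in
    stable-*|eus-*|fast-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_production_channel stable-4.16 && echo "ok for production"` prints the message, while a channel name outside those three families makes the function return a non-zero status.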
3.5.3. Optional: Adding hooks to perform Ansible tasks on RHEL machines
You can use hooks to run Ansible tasks on the RHEL compute machines during the OpenShift Container Platform update.
3.5.3.1. About Ansible hooks for updates
When you update OpenShift Container Platform, you can run custom tasks on your Red Hat Enterprise Linux (RHEL) nodes during specific operations by using hooks. Hooks allow you to provide files that define tasks to run before or after specific update tasks. You can use hooks to validate or modify custom infrastructure when you update the RHEL compute nodes in your OpenShift Container Platform cluster.
Because the operation fails when a hook fails, you must design hooks that are idempotent, that is, hooks that can run multiple times and provide the same results.
Hooks have the following important limitations:
- Hooks do not have a defined or versioned interface. They can use internal openshift-ansible variables, but it is possible that the variables will be modified or removed in future OpenShift Container Platform releases.
- Hooks do not have error handling, so an error in a hook halts the update process. If you get an error, you must address the problem and then start the update again.
3.5.3.2. Configuring the Ansible inventory file to use hooks
You define the hooks to use when you update the Red Hat Enterprise Linux (RHEL) compute machines, which are also known as worker machines, in the hosts inventory file under the all:vars section.
Prerequisites
- You have access to the machine that you used to add the RHEL compute machines to the cluster. You must have access to the hosts Ansible inventory file that defines your RHEL machines.
Procedure
After you design the hook, create a YAML file that defines the Ansible tasks for it. This file must be a set of tasks and cannot be a playbook, as shown in the following example:
    ---
    # Trivial example forcing an operator to acknowledge the start of an upgrade
    # file=/home/user/openshift-ansible/hooks/pre_compute.yml

    - name: note the start of a compute machine update
      debug:
        msg: "Compute machine upgrade of {{ inventory_hostname }} is about to start"

    - name: require the user agree to start an upgrade
      pause:
        prompt: "Press Enter to start the compute machine update"
Modify the hosts Ansible inventory file to specify the hook files. The hook files are specified as parameter values in the [all:vars] section, as shown:
Example hook definitions in an inventory file
    [all:vars]
    openshift_node_pre_upgrade_hook=/home/user/openshift-ansible/hooks/pre_node.yml
    openshift_node_post_upgrade_hook=/home/user/openshift-ansible/hooks/post_node.yml
To avoid ambiguity in the paths to the hooks, use absolute paths instead of relative paths in their definitions.
3.5.3.3. Available hooks for RHEL compute machines
You can use the following hooks when you update the Red Hat Enterprise Linux (RHEL) compute machines in your OpenShift Container Platform cluster.
Hook name | Description |
---|---|
|
|
|
|
|
|
|
|
3.5.4. Updating RHEL compute machines in your cluster
After you update your cluster, you must update the Red Hat Enterprise Linux (RHEL) compute machines in your cluster.
Red Hat Enterprise Linux (RHEL) versions 8.6 and later are supported for RHEL compute machines.
You can also update your compute machines to another minor version of OpenShift Container Platform if you are using RHEL as the operating system. You do not need to exclude any RPM packages from RHEL when performing a minor version update.
You cannot update RHEL 7 compute machines to RHEL 8. You must deploy new RHEL 8 hosts, and the old RHEL 7 hosts should be removed.
Prerequisites
- You updated your cluster.
Important: Because the RHEL machines require assets that are generated by the cluster to complete the update process, you must update the cluster before you update the RHEL worker machines in it.
- You have access to the local machine that you used to add the RHEL compute machines to your cluster. You must have access to the hosts Ansible inventory file that defines your RHEL machines and the upgrade playbook.
- For updates to a minor version, the RPM repository is using the same version of OpenShift Container Platform that is running on your cluster.
Procedure
Stop and disable firewalld on the host:
# systemctl disable --now firewalld.service
Note: By default, a base RHEL installation with the "Minimal" installation option enables the firewalld service. Having the firewalld service enabled on your host prevents you from accessing OpenShift Container Platform logs on the worker. Do not enable firewalld later if you wish to continue accessing OpenShift Container Platform logs on the worker.
Enable the repositories that are required for OpenShift Container Platform 4.16:
On the machine that you run the Ansible playbooks, update the required repositories:
# subscription-manager repos --disable=rhocp-4.15-for-rhel-8-x86_64-rpms --enable=rhocp-4.16-for-rhel-8-x86_64-rpms
Important: As of OpenShift Container Platform 4.11, the Ansible playbooks are provided only for RHEL 8. If a RHEL 7 system was used as a host for the OpenShift Container Platform 4.10 Ansible playbooks, you must either update the Ansible host to RHEL 8, or create a new Ansible host on a RHEL 8 system and copy over the inventories from the old Ansible host.
On the machine that you run the Ansible playbooks, update the Ansible package:
# yum swap ansible ansible-core
On the machine that you run the Ansible playbooks, update the required packages, including openshift-ansible:
# yum update openshift-ansible openshift-clients
On each RHEL compute node, update the required repositories:
# subscription-manager repos --disable=rhocp-4.15-for-rhel-8-x86_64-rpms --enable=rhocp-4.16-for-rhel-8-x86_64-rpms
Update a RHEL worker machine:
Review your Ansible inventory file at /<path>/inventory/hosts and update its contents so that the RHEL 8 machines are listed in the [workers] section, as shown in the following example:
    [all:vars]
    ansible_user=root
    #ansible_become=True

    openshift_kubeconfig_path="~/.kube/config"

    [workers]
    mycluster-rhel8-0.example.com
    mycluster-rhel8-1.example.com
    mycluster-rhel8-2.example.com
    mycluster-rhel8-3.example.com
Change to the openshift-ansible directory:
$ cd /usr/share/ansible/openshift-ansible
Run the upgrade playbook:
$ ansible-playbook -i /<path>/inventory/hosts playbooks/upgrade.yml
For <path>, specify the path to the Ansible inventory file that you created.
Note: The upgrade playbook only updates the OpenShift Container Platform packages. It does not update the operating system packages.
After you update all of the workers, confirm that all of your cluster nodes have updated to the new version:
# oc get node
Example output
    NAME                        STATUS   ROLES    AGE    VERSION
    mycluster-control-plane-0   Ready    master   145m   v1.29.4
    mycluster-control-plane-1   Ready    master   145m   v1.29.4
    mycluster-control-plane-2   Ready    master   145m   v1.29.4
    mycluster-rhel8-0           Ready    worker   98m    v1.29.4
    mycluster-rhel8-1           Ready    worker   98m    v1.29.4
    mycluster-rhel8-2           Ready    worker   98m    v1.29.4
    mycluster-rhel8-3           Ready    worker   98m    v1.29.4
Optional: Update the operating system packages that were not updated by the upgrade playbook. To update packages that are not on 4.16, use the following command:
# yum update
Note: You do not need to exclude RPM packages if you are using the same RPM repository that you used when you installed 4.16.
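The node check in the procedure above (confirming that every node reports the new version) can be scripted rather than read by eye. The sketch below assumes the default five-column oc get node table shown in the example output; nodes_at_version is a hypothetical helper name, and the expected version string is whatever your target release reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get node` table output on stdin and
# succeeds only if every node is Ready and reports the expected version.
# A node in any other state (for example Ready,SchedulingDisabled) is
# treated as not yet updated.
nodes_at_version() {
  local expected="$1"
  awk -v want="$expected" '
    NR == 1 { next }   # skip the header row
    NF == 0 { next }   # skip blank lines
    $2 != "Ready" || $NF != want { print "not updated: " $1; bad = 1 }
    END { exit bad }
  '
}
```

A typical call would be `oc get node | nodes_at_version v1.29.4`. The function prints each node that is not Ready or is on a different version, and returns a non-zero status if it finds any.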
3.6. Updating a cluster in a disconnected environment
3.6.1. About cluster updates in a disconnected environment
A disconnected environment is one in which your cluster nodes cannot access the internet. For this reason, you must populate a registry with the installation images. If your registry host cannot access both the internet and the cluster, you can mirror the images to a file system that is disconnected from that environment and then bring that host or removable media across that gap. If the local container registry and the cluster are connected to the mirror registry’s host, you can directly push the release images to the local registry.
A single container image registry is sufficient to host mirrored images for several clusters in the disconnected network.
3.6.1.1. Mirroring OpenShift Container Platform images
To update your cluster in a disconnected environment, your cluster environment must have access to a mirror registry that has the necessary images and resources for your targeted update. The following page has instructions for mirroring images onto a repository in your disconnected cluster:
3.6.1.2. Performing a cluster update in a disconnected environment
You can use one of the following procedures to update a disconnected OpenShift Container Platform cluster:
3.6.1.3. Uninstalling the OpenShift Update Service from a cluster
You can use the following procedure to uninstall a local copy of the OpenShift Update Service (OSUS) from your cluster:
3.6.2. Mirroring OpenShift Container Platform images
You must mirror container images onto a mirror registry before you can update a cluster in a disconnected environment. You can also use this procedure in connected environments to ensure your clusters run only approved container images that have satisfied your organizational controls for external content.
Your mirror registry must be running at all times while the cluster is running.
The following steps outline the high-level workflow on how to mirror images to a mirror registry:
- Install the OpenShift CLI (oc) on all devices being used to retrieve and push release images.
- Download the registry pull secret and add it to your cluster.
- If you use the oc-mirror OpenShift CLI (oc) plugin:
  - Install the oc-mirror plugin on all devices being used to retrieve and push release images.
  - Create an image set configuration file for the plugin to use when determining which release images to mirror. You can edit this configuration file later to change which release images the plugin mirrors.
  - Mirror your targeted release images directly to a mirror registry, or to removable media and then to a mirror registry.
  - Configure your cluster to use the resources generated by the oc-mirror plugin.
  - Repeat these steps as needed to update your mirror registry.
- If you use the oc adm release mirror command:
  - Set environment variables that correspond to your environment and the release images you want to mirror.
  - Mirror your targeted release images directly to a mirror registry, or to removable media and then to a mirror registry.
  - Repeat these steps as needed to update your mirror registry.
Compared to using the oc adm release mirror command, the oc-mirror plugin has the following advantages:
- It can mirror content other than container images.
- After mirroring images for the first time, it is easier to update images in the registry.
- The oc-mirror plugin provides an automated way to mirror the release payload from Quay, and also builds the latest graph data image for the OpenShift Update Service running in the disconnected environment.
3.6.2.1. Mirroring resources using the oc-mirror plugin
You can use the oc-mirror OpenShift CLI (oc) plugin to mirror images to a mirror registry in your fully or partially disconnected environments. You must run oc-mirror from a system with internet connectivity to download the required images from the official Red Hat registries.
See Mirroring images for a disconnected installation using the oc-mirror plugin for additional details.
3.6.2.2. Mirroring images using the oc adm release mirror command
You can use the oc adm release mirror command to mirror images to your mirror registry.
3.6.2.2.1. Prerequisites
You must have a container image registry that supports Docker v2-2 in the location that will host the OpenShift Container Platform cluster, such as Red Hat Quay.
Note: If you use Red Hat Quay, you must use version 3.6 or later with the oc-mirror plugin. If you have an entitlement to Red Hat Quay, see the documentation on deploying Red Hat Quay for proof-of-concept purposes or by using the Quay Operator. If you need additional assistance selecting and installing a registry, contact your sales representative or Red Hat Support.
If you do not have an existing solution for a container image registry, the mirror registry for Red Hat OpenShift is included in OpenShift Container Platform subscriptions. The mirror registry for Red Hat OpenShift is a small-scale container registry that you can use to mirror OpenShift Container Platform container images in disconnected installations and updates.
3.6.2.2.2. Preparing your mirror host
Before you perform the mirror procedure, you must prepare the host to retrieve content and push it to the remote location.
3.6.2.2.2.1. Installing the OpenShift CLI
You can install the OpenShift CLI (oc) to interact with OpenShift Container Platform from a command-line interface. You can install oc on Linux, Windows, or macOS.
If you installed an earlier version of oc, you cannot use it to complete all of the commands in OpenShift Container Platform 4.16. Download and install the new version of oc. If you are updating a cluster in a disconnected environment, install the oc version that you plan to update to.
You can install the OpenShift CLI (oc) binary on Linux by using the following procedure.
Procedure
- Navigate to the OpenShift Container Platform downloads page on the Red Hat Customer Portal.
- Select the architecture from the Product Variant drop-down list.
- Select the appropriate version from the Version drop-down list.
- Click Download Now next to the OpenShift v4.16 Linux Client entry and save the file.
Unpack the archive:
$ tar xvf <file>
Place the oc binary in a directory that is on your PATH.
To check your PATH, execute the following command:
$ echo $PATH
Verification
After you install the OpenShift CLI, it is available using the oc command:
$ oc <command>
You can install the OpenShift CLI (oc) binary on Windows by using the following procedure.
Procedure
- Navigate to the OpenShift Container Platform downloads page on the Red Hat Customer Portal.
- Select the appropriate version from the Version drop-down list.
- Click Download Now next to the OpenShift v4.16 Windows Client entry and save the file.
- Unzip the archive with a ZIP program.
Move the oc binary to a directory that is on your PATH.
To check your PATH, open the command prompt and execute the following command:
C:\> path
Verification
After you install the OpenShift CLI, it is available using the oc command:
C:\> oc <command>
You can install the OpenShift CLI (oc) binary on macOS by using the following procedure.
Procedure
- Navigate to the OpenShift Container Platform downloads page on the Red Hat Customer Portal.
- Select the appropriate version from the Version drop-down list.
- Click Download Now next to the OpenShift v4.16 macOS Client entry and save the file.
Note: For macOS arm64, choose the OpenShift v4.16 macOS arm64 Client entry.
- Unpack and unzip the archive.
Move the oc binary to a directory on your PATH.
To check your PATH, open a terminal and execute the following command:
$ echo $PATH
Verification
Verify your installation by using an oc command:
$ oc <command>
3.6.2.2.2.2. Configuring credentials that allow images to be mirrored
Create a container image registry credentials file that enables you to mirror images from Red Hat to your mirror.
Do not use this image registry credentials file as the pull secret when you install a cluster. If you provide this file when you install a cluster, all of the machines in the cluster will have write access to your mirror registry.
This process requires that you have write access to a container image registry on the mirror registry and adds the credentials to a registry pull secret.
Prerequisites
- You configured a mirror registry to use in your disconnected environment.
- You identified an image repository location on your mirror registry to mirror images into.
- You provisioned a mirror registry account that allows images to be uploaded to that image repository.
Procedure
Complete the following steps on the installation host:
- Download your registry.redhat.io pull secret from Red Hat OpenShift Cluster Manager.
Make a copy of your pull secret in JSON format:
$ cat ./pull-secret | jq . > <path>/<pull_secret_file_in_json>
For <path>/<pull_secret_file_in_json>, specify the path to the folder to store the pull secret in and a name for the JSON file that you create.
The contents of the file resemble the following example:
{ "auths": { "cloud.openshift.com": { "auth": "b3BlbnNo...", "email": "you@example.com" }, "quay.io": { "auth": "b3BlbnNo...", "email": "you@example.com" }, "registry.connect.redhat.com": { "auth": "NTE3Njg5Nj...", "email": "you@example.com" }, "registry.redhat.io": { "auth": "NTE3Njg5Nj...", "email": "you@example.com" } } }
Optional: If using the oc-mirror plugin, save the file as either ~/.docker/config.json or $XDG_RUNTIME_DIR/containers/auth.json:
If the .docker or $XDG_RUNTIME_DIR/containers directories do not exist, create one by entering the following command:
$ mkdir -p <directory_name>
Where <directory_name> is either ~/.docker or $XDG_RUNTIME_DIR/containers.
Copy the pull secret to the appropriate directory by entering the following command:
$ cp <path>/<pull_secret_file_in_json> <directory_name>/<auth_file>
Where <directory_name> is either ~/.docker or $XDG_RUNTIME_DIR/containers, and <auth_file> is either config.json or auth.json.
Generate the base64-encoded user name and password or token for your mirror registry:
$ echo -n '<user_name>:<password>' | base64 -w0
Example output
BGVtbYk3ZHAtqXs=
For <user_name> and <password>, specify the user name and password that you configured for your registry.
Edit the JSON file and add a section that describes your registry to it:
"auths": { "<mirror_registry>": { 1 "auth": "<credentials>", 2 "email": "you@example.com" } },
The file resembles the following example:
{ "auths": { "registry.example.com": { "auth": "BGVtbYk3ZHAtqXs=", "email": "you@example.com" }, "cloud.openshift.com": { "auth": "b3BlbnNo...", "email": "you@example.com" }, "quay.io": { "auth": "b3BlbnNo...", "email": "you@example.com" }, "registry.connect.redhat.com": { "auth": "NTE3Njg5Nj...", "email": "you@example.com" }, "registry.redhat.io": { "auth": "NTE3Njg5Nj...", "email": "you@example.com" } } }
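The two manual steps above (encoding the credentials and adding a registry entry to the JSON file) can be combined in a short script. This is a sketch under stated assumptions rather than a supported tool: encode_registry_auth and add_registry_auth are hypothetical helper names, jq is assumed to be installed (it is already used earlier in this procedure), and the email value is the placeholder from the example file.

```shell
#!/usr/bin/env bash
# Encode "<user_name>:<password>" for the "auth" field.
# tr removes any line wrapping added by base64, so the function does not
# depend on the GNU-specific -w0 flag.
encode_registry_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Hypothetical helper: print the pull secret with one more registry entry.
# Assumes jq is installed. The input file itself is not modified; redirect
# the output to a new file if you want to keep the result.
add_registry_auth() {
  local registry="$1" auth="$2" pull_secret_file="$3"
  jq --arg reg "$registry" --arg auth "$auth" \
    '.auths[$reg] = {auth: $auth, email: "you@example.com"}' \
    "$pull_secret_file"
}
```

A possible invocation is `add_registry_auth registry.example.com "$(encode_registry_auth <user_name> <password>)" <pull_secret_file_in_json> > <new_file>`. Writing to a new file keeps the original pull secret intact in case the result needs to be inspected first.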
3.6.2.2.3. Mirroring images to a mirror registry
To avoid excessive memory usage by the OpenShift Update Service application, you must mirror release images to a separate repository as described in the following procedure.
Prerequisites
- You configured a mirror registry to use in your disconnected environment and can access the certificate and credentials that you configured.
- You downloaded the pull secret from Red Hat OpenShift Cluster Manager and modified it to include authentication to your mirror repository.
- If you use self-signed certificates, you have specified a Subject Alternative Name in the certificates.
Procedure
- Use the Red Hat OpenShift Container Platform Update Graph visualizer and update planner to plan an update from one version to another. The OpenShift Update Graph provides channel graphs and a way to confirm that there is an update path between your current and intended cluster versions.
Set the required environment variables:
Export the release version:
$ export OCP_RELEASE=<release_version>
For <release_version>, specify the tag that corresponds to the version of OpenShift Container Platform to which you want to update, such as 4.5.4.
Export the local registry name and host port:
$ LOCAL_REGISTRY='<local_registry_host_name>:<local_registry_host_port>'
For <local_registry_host_name>, specify the registry domain name for your mirror repository, and for <local_registry_host_port>, specify the port that it serves content on.
Export the local repository name:
$ LOCAL_REPOSITORY='<local_repository_name>'
For <local_repository_name>, specify the name of the repository to create in your registry, such as ocp4/openshift4.
If you are using the OpenShift Update Service, export an additional local repository name to contain the release images:
$ LOCAL_RELEASE_IMAGES_REPOSITORY='<local_release_images_repository_name>'
For <local_release_images_repository_name>, specify the name of the repository to create in your registry, such as ocp4/openshift4-release-images.
Export the name of the repository to mirror:
$ PRODUCT_REPO='openshift-release-dev'
For a production release, you must specify openshift-release-dev.
Export the path to your registry pull secret:
$ LOCAL_SECRET_JSON='<path_to_pull_secret>'
For <path_to_pull_secret>, specify the absolute path to and file name of the pull secret for your mirror registry that you created.
Note: If your cluster uses an ImageContentSourcePolicy object to configure repository mirroring, you can use only global pull secrets for mirrored registries. You cannot add a pull secret to a project.
Export the release mirror:
$ RELEASE_NAME="ocp-release"
For a production release, you must specify ocp-release.
Export the type of architecture for your cluster:
$ ARCHITECTURE=<cluster_architecture>
For <cluster_architecture>, specify the architecture of the cluster, such as x86_64, aarch64, s390x, or ppc64le.
Export the path to the directory to host the mirrored images:
$ REMOVABLE_MEDIA_PATH=<path>
For <path>, specify the full path, including the initial forward slash (/) character.
Review the images and configuration manifests to mirror:
$ oc adm release mirror -a ${LOCAL_SECRET_JSON} --to-dir=${REMOVABLE_MEDIA_PATH}/mirror quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-${ARCHITECTURE} --dry-run
Mirror the version images to the mirror registry.
If your mirror host does not have internet access, take the following actions:
- Connect the removable media to a system that is connected to the internet.
Mirror the images and configuration manifests to a directory on the removable media:
$ oc adm release mirror -a ${LOCAL_SECRET_JSON} --to-dir=${REMOVABLE_MEDIA_PATH}/mirror quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-${ARCHITECTURE}
NoteThis command also generates and saves the mirrored release image signature config map onto the removable media.
Take the media to the disconnected environment and upload the images to the local container registry.
$ oc image mirror -a ${LOCAL_SECRET_JSON} --from-dir=${REMOVABLE_MEDIA_PATH}/mirror "file://openshift/release:${OCP_RELEASE}*" ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}
For REMOVABLE_MEDIA_PATH, you must use the same path that you specified when you mirrored the images.
- Use the oc command-line interface (CLI) to log in to the cluster that you are updating.
Apply the mirrored release image signature config map to the connected cluster:
$ oc apply -f ${REMOVABLE_MEDIA_PATH}/mirror/config/<image_signature_file>
For <image_signature_file>, specify the path and name of the file, for example, signature-sha256-81154f5c03294534.yaml.
If you are using the OpenShift Update Service, mirror the release image to a separate repository:
$ oc image mirror -a ${LOCAL_SECRET_JSON} ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE} ${LOCAL_REGISTRY}/${LOCAL_RELEASE_IMAGES_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE}
If the local container registry and the cluster are connected to the mirror host, take the following actions:
Directly push the release images to the local registry and apply the config map to the cluster by using the following command:
$ oc adm release mirror -a ${LOCAL_SECRET_JSON} --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-${ARCHITECTURE} --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} --apply-release-image-signature
Note: If you include the --apply-release-image-signature option, do not create the config map for image signature verification.
If you are using the OpenShift Update Service, mirror the release image to a separate repository:
$ oc image mirror -a ${LOCAL_SECRET_JSON} ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE} ${LOCAL_REGISTRY}/${LOCAL_RELEASE_IMAGES_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE}
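Every mirror command in this procedure builds the source release image reference from the same four variables, so an unset variable silently produces a malformed reference such as quay.io//ocp-release:-x86_64. The following sketch checks the variables before composing the reference; release_image_ref is a hypothetical helper name and it requires bash for the indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the source release image reference used by
# the mirror commands above, failing early if a variable is unset or empty.
release_image_ref() {
  local name
  for name in PRODUCT_REPO RELEASE_NAME OCP_RELEASE ARCHITECTURE; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "error: ${name} is not set" >&2
      return 1
    fi
  done
  printf 'quay.io/%s/%s:%s-%s\n' \
    "$PRODUCT_REPO" "$RELEASE_NAME" "$OCP_RELEASE" "$ARCHITECTURE"
}
```

Running `release_image_ref` in the shell where you set the variables prints the reference that the oc adm release mirror commands expand to, which makes it easy to compare against the tag you planned to mirror.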
3.6.3. Updating a cluster in a disconnected environment using the OpenShift Update Service
To get an update experience similar to connected clusters, you can use the following procedures to install and configure the OpenShift Update Service (OSUS) in a disconnected environment.
The following steps outline the high-level workflow on how to update a cluster in a disconnected environment using OSUS:
- Configure access to a secured registry.
- Update the global cluster pull secret to access your mirror registry.
- Install the OSUS Operator.
- Create a graph data container image for the OpenShift Update Service.
- Install the OSUS application and configure your clusters to use the OpenShift Update Service in your environment.
- Perform a supported update procedure from the documentation as you would with a connected cluster.
3.6.3.1. Using the OpenShift Update Service in a disconnected environment
The OpenShift Update Service (OSUS) provides update recommendations to OpenShift Container Platform clusters. Red Hat publicly hosts the OpenShift Update Service, and clusters in a connected environment can connect to the service through public APIs to retrieve update recommendations.
However, clusters in a disconnected environment cannot access these public APIs to retrieve update information. To have a similar update experience in a disconnected environment, you can install and configure the OpenShift Update Service so that it is available within the disconnected environment.
A single OSUS instance is capable of serving recommendations to thousands of clusters. OSUS can be scaled horizontally to cater to more clusters by changing the replica value. So for most disconnected use cases, one OSUS instance is enough. For example, Red Hat hosts just one OSUS instance for the entire fleet of connected clusters.
If you want to keep update recommendations separate in different environments, you can run one OSUS instance for each environment. For example, in a case where you have separate test and stage environments, you might not want a cluster in a stage environment to receive update recommendations to version A if that version has not been tested in the test environment yet.
The following sections describe how to install an OSUS instance and configure it to provide update recommendations to a cluster.
3.6.3.2. Prerequisites
- You must have the oc command-line interface (CLI) tool installed.
- You must provision a container image registry in your environment with the container images for your update, as described in Mirroring OpenShift Container Platform images.
3.6.3.3. Configuring access to a secured registry for the OpenShift Update Service
If the release images are contained in a registry whose HTTPS X.509 certificate is signed by a custom certificate authority, complete the steps in Configuring additional trust stores for image registry access along with the following changes for the update service.
The OpenShift Update Service Operator needs the config map key name updateservice-registry in the registry CA cert.
Image registry CA config map example for the update service
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: my-registry-ca
    data:
      updateservice-registry: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      registry-with-port.example.com..5000: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
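The second key in the example above, registry-with-port.example.com..5000, encodes a registry host and port with two periods in place of the colon, because a colon is not valid in a config map key. A small sketch of that conversion follows; registry_ca_key is a hypothetical helper name, and it assumes the input is a plain host or host:port string with at most one colon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "host:port" into the "host..port" form used as
# a config map key in the example above. A host without a port is
# returned unchanged.
registry_ca_key() {
  printf '%s\n' "${1/:/..}"
}
```

For example, `registry_ca_key registry-with-port.example.com:5000` prints registry-with-port.example.com..5000, which matches the key shown in the config map.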
3.6.3.4. Updating the global cluster pull secret
You can update the global pull secret for your cluster by either replacing the current pull secret or appending a new pull secret.
This procedure is required when you store images in a registry other than the one that was used during installation.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Optional: To append a new pull secret to the existing pull secret, complete the following steps:
Enter the following command to download the pull secret:
$ oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > <pull_secret_location>
For <pull_secret_location>, provide the path to the pull secret file.
Enter the following command to add the new pull secret:
$ oc registry login --registry="<registry>" --auth-basic="<username>:<password>" --to=<pull_secret_location>
Alternatively, you can perform a manual update to the pull secret file.
Enter the following command to update the global pull secret for your cluster:
$ oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull_secret_location>
For <pull_secret_location>, provide the path to the new pull secret file.
This update is rolled out to all nodes, which can take some time depending on the size of your cluster.
Note: As of OpenShift Container Platform 4.7.4, changes to the global pull secret no longer trigger a node drain or reboot.
3.6.3.5. Installing the OpenShift Update Service Operator
To install the OpenShift Update Service, you must first install the OpenShift Update Service Operator by using the OpenShift Container Platform web console or CLI.
For clusters that are installed in disconnected environments, also known as disconnected clusters, Operator Lifecycle Manager by default cannot access the Red Hat-provided OperatorHub sources hosted on remote registries because those remote sources require full internet connectivity. For more information, see Using Operator Lifecycle Manager on restricted networks.
3.6.3.5.1. Installing the OpenShift Update Service Operator by using the web console
You can use the web console to install the OpenShift Update Service Operator.
Procedure
In the web console, click Operators → OperatorHub.
Note: Enter Update Service into the Filter by keyword… field to find the Operator faster.
Choose OpenShift Update Service from the list of available Operators, and click Install.
- Select an Update channel.
- Select a Version.
- Select A specific namespace on the cluster under Installation Mode.
- Select a namespace for Installed Namespace or accept the recommended namespace openshift-update-service.
- Select an Update approval strategy:
- The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
- The Manual strategy requires a cluster administrator to approve the Operator update.
- Click Install.
- Go to Operators → Installed Operators and verify that the OpenShift Update Service Operator is installed.
- Ensure that OpenShift Update Service is listed in the correct namespace with a Status of Succeeded.
3.6.3.5.2. Installing the OpenShift Update Service Operator by using the CLI
You can use the OpenShift CLI (oc) to install the OpenShift Update Service Operator.
Procedure
Create a namespace for the OpenShift Update Service Operator:
Create a Namespace object YAML file, for example, update-service-namespace.yaml, for the OpenShift Update Service Operator:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-update-service
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true" 1
- 1
- Set the openshift.io/cluster-monitoring label to enable Operator-recommended cluster monitoring on this namespace.
Create the namespace:
$ oc create -f <filename>.yaml
For example:
$ oc create -f update-service-namespace.yaml
Install the OpenShift Update Service Operator by creating the following objects:
Create an OperatorGroup object YAML file, for example, update-service-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: update-service-operator-group
  namespace: openshift-update-service
spec:
  targetNamespaces:
  - openshift-update-service
Create an OperatorGroup object:
$ oc -n openshift-update-service create -f <filename>.yaml
For example:
$ oc -n openshift-update-service create -f update-service-operator-group.yaml
Create a Subscription object YAML file, for example, update-service-subscription.yaml:
Example Subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: update-service-subscription
  namespace: openshift-update-service
spec:
  channel: v1
  installPlanApproval: "Automatic"
  source: "redhat-operators" 1
  sourceNamespace: "openshift-marketplace"
  name: "cincinnati-operator"
- 1
- Specify the name of the catalog source that provides the Operator. For clusters that do not use a custom Operator Lifecycle Manager (OLM), specify redhat-operators. If your OpenShift Container Platform cluster is installed in a disconnected environment, specify the name of the CatalogSource object created when you configured Operator Lifecycle Manager (OLM).
Create the Subscription object:
$ oc create -f <filename>.yaml
For example:
$ oc -n openshift-update-service create -f update-service-subscription.yaml
The OpenShift Update Service Operator is installed to the openshift-update-service namespace and targets the openshift-update-service namespace.
Verify the Operator installation:
$ oc -n openshift-update-service get clusterserviceversions
Example output
NAME                             DISPLAY                    VERSION   REPLACES   PHASE
update-service-operator.v4.6.0   OpenShift Update Service   4.6.0                Succeeded
...
If the OpenShift Update Service Operator is listed, the installation was successful. The version number might be different than shown.
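If you prefer to check the phase programmatically, the following sketch filters the JSON form of the same listing (for example, the output of the verification command with -o json added). It assumes the usual list shape, with items[].metadata.name and items[].status.phase. The function name succeeded_csvs is hypothetical.

```python
import json
from typing import List


def succeeded_csvs(csv_list_json: str) -> List[str]:
    """Return the names of ClusterServiceVersions whose status.phase is Succeeded."""
    doc = json.loads(csv_list_json)
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        # Entries without a status are treated as not yet succeeded.
        if item.get("status", {}).get("phase") == "Succeeded"
    ]
```

An empty result means that no ClusterServiceVersion in the namespace has reached the Succeeded phase yet.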
3.6.3.6. Creating the OpenShift Update Service graph data container image
The OpenShift Update Service requires a graph data container image, from which the OpenShift Update Service retrieves information about channel membership and blocked update edges. Graph data is typically fetched directly from the update graph data repository. In environments where an internet connection is unavailable, loading this information from an init container is another way to make the graph data available to the OpenShift Update Service. The role of the init container is to provide a local copy of the graph data, and during pod initialization, the init container copies the data to a volume that is accessible by the service.
The oc-mirror OpenShift CLI (oc) plugin creates this graph data container image in addition to mirroring release images. If you used the oc-mirror plugin to mirror your release images, you can skip this procedure.
Procedure
Create a Dockerfile, for example, ./Dockerfile, containing the following:
FROM registry.access.redhat.com/ubi9/ubi:latest
RUN curl -L -o cincinnati-graph-data.tar.gz https://api.openshift.com/api/upgrades_info/graph-data
RUN mkdir -p /var/lib/cincinnati-graph-data && tar xvzf cincinnati-graph-data.tar.gz -C /var/lib/cincinnati-graph-data/ --no-overwrite-dir --no-same-owner
CMD ["/bin/bash", "-c", "exec cp -rp /var/lib/cincinnati-graph-data/* /var/lib/cincinnati/graph-data"]
Use the Dockerfile created in the previous step to build a graph data container image, for example, registry.example.com/openshift/graph-data:latest:
$ podman build -f ./Dockerfile -t registry.example.com/openshift/graph-data:latest
Push the graph data container image created in the previous step to a repository that is accessible to the OpenShift Update Service, for example, registry.example.com/openshift/graph-data:latest:
$ podman push registry.example.com/openshift/graph-data:latest
Note: To push a graph data image to a registry in a disconnected environment, copy the graph data container image created in the previous step to a repository that is accessible to the OpenShift Update Service. Run oc image mirror --help for available options.
3.6.3.7. Creating an OpenShift Update Service application
You can create an OpenShift Update Service application by using the OpenShift Container Platform web console or CLI.
3.6.3.7.1. Creating an OpenShift Update Service application by using the web console
You can use the OpenShift Container Platform web console to create an OpenShift Update Service application by using the OpenShift Update Service Operator.
Prerequisites
- The OpenShift Update Service Operator has been installed.
- The OpenShift Update Service graph data container image has been created and pushed to a repository that is accessible to the OpenShift Update Service.
- The current release and update target releases have been mirrored to a registry in the disconnected environment.
Procedure
- In the web console, click Operators → Installed Operators.
- Choose OpenShift Update Service from the list of installed Operators.
- Click the Update Service tab.
- Click Create UpdateService.
- Enter a name in the Name field, for example, service.
- In the Graph Data Image field, enter the local pullspec of the graph data container image created in "Creating the OpenShift Update Service graph data container image", for example, registry.example.com/openshift/graph-data:latest.
- In the Releases field, enter the registry and repository created to contain the release images in "Mirroring the OpenShift Container Platform image repository", for example, registry.example.com/ocp4/openshift4-release-images.
- Enter 2 in the Replicas field.
- Click Create to create the OpenShift Update Service application.
Verify the OpenShift Update Service application:
- From the UpdateServices list in the Update Service tab, click the Update Service application just created.
- Click the Resources tab.
- Verify each application resource has a status of Created.
3.6.3.7.2. Creating an OpenShift Update Service application by using the CLI
You can use the OpenShift CLI (oc) to create an OpenShift Update Service application.
Prerequisites
- The OpenShift Update Service Operator has been installed.
- The OpenShift Update Service graph data container image has been created and pushed to a repository that is accessible to the OpenShift Update Service.
- The current release and update target releases have been mirrored to a registry in the disconnected environment.
Procedure
Configure the OpenShift Update Service target namespace, for example, openshift-update-service:
$ NAMESPACE=openshift-update-service
The namespace must match the targetNamespaces value from the operator group.
Configure the name of the OpenShift Update Service application, for example, service:
$ NAME=service
Configure the registry and repository for the release images as configured in "Mirroring the OpenShift Container Platform image repository", for example, registry.example.com/ocp4/openshift4-release-images:
$ RELEASE_IMAGES=registry.example.com/ocp4/openshift4-release-images
Set the local pullspec for the graph data image to the graph data container image created in "Creating the OpenShift Update Service graph data container image", for example, registry.example.com/openshift/graph-data:latest:
$ GRAPH_DATA_IMAGE=registry.example.com/openshift/graph-data:latest
Create an OpenShift Update Service application object:
$ oc -n "${NAMESPACE}" create -f - <<EOF
apiVersion: updateservice.operator.openshift.io/v1
kind: UpdateService
metadata:
  name: ${NAME}
spec:
  replicas: 2
  releases: ${RELEASE_IMAGES}
  graphDataImage: ${GRAPH_DATA_IMAGE}
EOF
Verify the OpenShift Update Service application:
Use the following command to obtain a policy engine route:
$ while sleep 1; do POLICY_ENGINE_GRAPH_URI="$(oc -n "${NAMESPACE}" get -o jsonpath='{.status.policyEngineURI}/api/upgrades_info/v1/graph{"\n"}' updateservice "${NAME}")"; SCHEME="${POLICY_ENGINE_GRAPH_URI%%:*}"; if test "${SCHEME}" = http -o "${SCHEME}" = https; then break; fi; done
You might need to poll until the command succeeds.
Retrieve a graph from the policy engine. Be sure to specify a valid version for channel. For example, if running in OpenShift Container Platform 4.16, use stable-4.16:
$ while sleep 10; do HTTP_CODE="$(curl --header Accept:application/json --output /dev/stderr --write-out "%{http_code}" "${POLICY_ENGINE_GRAPH_URI}?channel=stable-4.16")"; if test "${HTTP_CODE}" -eq 200; then break; fi; echo "${HTTP_CODE}"; done
This polls until the graph request succeeds; however, the resulting graph might be empty depending on which release images you have mirrored.
The policy engine route name must not be more than 63 characters, based on RFC 1123. If you see a ReconcileCompleted status of false with the reason CreateRouteFailed caused by host must conform to DNS 1123 naming convention and must be no more than 63 characters, try creating the Update Service with a shorter name.
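As a rough pre-check, you can test a candidate name against the DNS 1123 label rules before creating the application. The following sketch only validates a single label. How the route name is derived from the application name and namespace is not described here, so treat the string you pass in as your own estimate of the final label. The function name is_dns1123_label is hypothetical.

```python
import re

# A DNS 1123 label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character, at most 63 characters long.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns1123_label(value: str) -> bool:
    """Return True if value is a valid DNS 1123 label of at most 63 characters."""
    return len(value) <= 63 and _DNS1123_LABEL.fullmatch(value) is not None
```

A name that fails this check is a candidate for the CreateRouteFailed condition described above.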
3.6.3.8. Configuring the Cluster Version Operator (CVO)
After the OpenShift Update Service Operator has been installed and the OpenShift Update Service application has been created, the Cluster Version Operator (CVO) can be updated to pull graph data from the OpenShift Update Service installed in your environment.
Prerequisites
- The OpenShift Update Service Operator has been installed.
- The OpenShift Update Service graph data container image has been created and pushed to a repository that is accessible to the OpenShift Update Service.
- The current release and update target releases have been mirrored to a registry in the disconnected environment.
- The OpenShift Update Service application has been created.
Procedure
Set the OpenShift Update Service target namespace, for example, openshift-update-service:
$ NAMESPACE=openshift-update-service
Set the name of the OpenShift Update Service application, for example, service:
$ NAME=service
Obtain the policy engine route:
$ POLICY_ENGINE_GRAPH_URI="$(oc -n "${NAMESPACE}" get -o jsonpath='{.status.policyEngineURI}/api/upgrades_info/v1/graph{"\n"}' updateservice "${NAME}")"
Set the patch for the pull graph data:
$ PATCH="{\"spec\":{\"upstream\":\"${POLICY_ENGINE_GRAPH_URI}\"}}"
Patch the CVO to use the OpenShift Update Service in your environment:
$ oc patch clusterversion version -p "${PATCH}" --type merge
See Configuring the cluster-wide proxy to configure the CA to trust the update server.
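The merge patch in this procedure is a small JSON document with a single spec.upstream field. The following sketch builds the same document with a JSON serializer instead of hand-escaped quotes, and rejects a value that is not an http or https URI (the same check that the polling loop in "Creating an OpenShift Update Service application by using the CLI" performs on the scheme). The function name build_upstream_patch is hypothetical.

```python
import json
from urllib.parse import urlsplit


def build_upstream_patch(policy_engine_graph_uri: str) -> str:
    """Return the ClusterVersion merge patch that sets spec.upstream."""
    scheme = urlsplit(policy_engine_graph_uri).scheme
    if scheme not in ("http", "https"):
        # An empty or malformed URI usually means the route is not ready yet.
        raise ValueError(f"not an http(s) URI: {policy_engine_graph_uri!r}")
    return json.dumps({"spec": {"upstream": policy_engine_graph_uri}})
```

You can pass the returned string as the -p argument of the oc patch command.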
3.6.3.9. Next steps
Before updating your cluster, confirm that the following conditions are met:
- The Cluster Version Operator (CVO) is configured to use your installed OpenShift Update Service application.
- The release image signature config map for the new release is applied to your cluster.
Note: The Cluster Version Operator (CVO) uses release image signatures to ensure that release images have not been modified, by verifying that the release image signatures match the expected result.
- The current release and update target release images are mirrored to a registry in the disconnected environment.
- A recent graph data container image has been mirrored to your registry.
- A recent version of the OpenShift Update Service Operator is installed.
Note: If you have not recently installed or updated the OpenShift Update Service Operator, there might be a more recent version available. See Using Operator Lifecycle Manager on restricted networks for more information about how to update your OLM catalog in a disconnected environment.
After you configure your cluster to use the installed OpenShift Update Service and local mirror registry, you can use any of the following update methods:
3.6.4. Updating a cluster in a disconnected environment without the OpenShift Update Service
Use the following procedures to update a cluster in a disconnected environment without access to the OpenShift Update Service.
3.6.4.1. Prerequisites
- You must have the oc command-line interface (CLI) tool installed.
- You must provision a local container image registry with the container images for your update, as described in Mirroring OpenShift Container Platform images.
- You must have access to the cluster as a user with admin privileges. See Using RBAC to define and apply permissions.
- You must have a recent etcd backup in case your update fails and you must restore your cluster to a previous state.
- You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See Updating installed Operators for more information on how to check compatibility and, if necessary, update the installed Operators.
- You must ensure that all machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
- If your cluster uses manually maintained credentials, update the cloud provider resources for the new release. For more information, including how to determine if this is a requirement for your cluster, see Preparing to update a cluster with manually maintained credentials.
- If you run an Operator or you have configured any application with a pod disruption budget, you might experience an interruption during the update process. If minAvailable is set to 1 in PodDisruptionBudget, the nodes are drained to apply pending machine configs, which might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget field can prevent the node drain.
3.6.4.2. Pausing a MachineHealthCheck resource
During the update process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck resources before updating the cluster.
Prerequisites
- Install the OpenShift CLI (oc).
Procedure
To list all the available MachineHealthCheck resources that you want to pause, run the following command:
$ oc get machinehealthcheck -n openshift-machine-api
To pause the machine health checks, add the cluster.x-k8s.io/paused="" annotation to the MachineHealthCheck resource. Run the following command:
$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused=""
The annotated MachineHealthCheck resource resembles the following YAML file:
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
  annotations:
    cluster.x-k8s.io/paused: ""
spec:
  selector:
    matchLabels:
      role: worker
  unhealthyConditions:
  - type: "Ready"
    status: "Unknown"
    timeout: "300s"
  - type: "Ready"
    status: "False"
    timeout: "300s"
  maxUnhealthy: "40%"
status:
  currentHealthy: 5
  expectedMachines: 5
Important: Resume the machine health checks after updating the cluster. To resume the check, remove the pause annotation from the MachineHealthCheck resource by running the following command:
$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused-
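When a cluster has several MachineHealthCheck resources, each one needs the same pause command before the update and the same resume command afterward. The following sketch generates those command lines from a list of resource names. It only builds strings and does not contact the cluster. The function name annotate_commands is hypothetical, and the resource names are the ones you obtained from the listing command in this procedure.

```python
import shlex
from typing import Iterable, List

NAMESPACE = "openshift-machine-api"
PAUSE_ANNOTATION = "cluster.x-k8s.io/paused"


def annotate_commands(mhc_names: Iterable[str], pause: bool) -> List[str]:
    """Build one 'oc annotate' command line per MachineHealthCheck name.

    pause=True adds the pause annotation with an empty value;
    pause=False removes it by using the trailing '-' syntax.
    """
    suffix = '=""' if pause else "-"
    return [
        f"oc -n {NAMESPACE} annotate mhc {shlex.quote(name)} {PAUSE_ANNOTATION}{suffix}"
        for name in mhc_names
    ]
```

Generating both lists from the same set of names before the update makes it harder to forget a resource when you resume the checks.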
3.6.4.3. Retrieving a release image digest
To update a cluster in a disconnected environment by using the oc adm upgrade command with the --to-image option, you must reference the sha256 digest that corresponds to your targeted release image.
Procedure
Run the following command on a device that is connected to the internet:
$ oc adm release info -o 'jsonpath={.digest}{"\n"}' quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_VERSION}-${ARCHITECTURE}
For {OCP_RELEASE_VERSION}, specify the version of OpenShift Container Platform to which you want to update, such as 4.10.16.
For {ARCHITECTURE}, specify the architecture of the cluster, such as x86_64, aarch64, s390x, or ppc64le.
Example output
sha256:a8bfba3b6dddd1a2fbbead7dac65fe4fb8335089e4e7cae327f3bad334add31d
- Copy the sha256 digest for use when updating your cluster.
3.6.4.4. Updating the disconnected cluster
Update the disconnected cluster to the OpenShift Container Platform version that you downloaded the release images for.
If you have a local OpenShift Update Service, you can update by using the connected web console or CLI instructions instead of this procedure.
Prerequisites
- You mirrored the images for the new release to your registry.
- You applied the release image signature ConfigMap for the new release to your cluster.
Note: The release image signature config map allows the Cluster Version Operator (CVO) to ensure the integrity of release images by verifying that the actual image signatures match the expected signatures.
- You obtained the sha256 digest for your targeted release image.
- You installed the OpenShift CLI (oc).
- You paused all MachineHealthCheck resources.
Procedure
Update the cluster:
$ oc adm upgrade --allow-explicit-upgrade --to-image <defined_registry>/<defined_repository>@<digest>
Where:
<defined_registry>
- Specifies the name of the mirror registry you mirrored your images to.
<defined_repository>
- Specifies the name of the image repository you want to use on the mirror registry.
<digest>
- Specifies the sha256 digest for the targeted release image, for example, sha256:81154f5c03294534e1eaf0319bef7a601134f891689ccede5d705ef659aa8c92.
Note:
- See "Mirroring OpenShift Container Platform images" to review how your mirror registry and repository names are defined.
- If you used an ImageContentSourcePolicy or ImageDigestMirrorSet, you can use the canonical registry and repository names instead of the names you defined. The canonical registry name is quay.io and the canonical repository name is openshift-release-dev/ocp-release.
- You can only configure global pull secrets for clusters that have an ImageContentSourcePolicy object. You cannot add a pull secret to a project.
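The --to-image value is the registry, repository, and digest joined as <defined_registry>/<defined_repository>@<digest>. The following sketch assembles that pullspec and rejects a digest that is not sha256: followed by 64 lowercase hexadecimal characters, which helps catch copy-and-paste truncation. The function names release_pullspec and upgrade_command are hypothetical, and the registry and repository in the usage lines are placeholders.

```python
import re

_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def release_pullspec(registry: str, repository: str, digest: str) -> str:
    """Join registry, repository, and digest into a by-digest pullspec."""
    if _SHA256_DIGEST.fullmatch(digest) is None:
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{registry}/{repository}@{digest}"


def upgrade_command(registry: str, repository: str, digest: str) -> str:
    """Build the update command line shown in this procedure."""
    pullspec = release_pullspec(registry, repository, digest)
    return f"oc adm upgrade --allow-explicit-upgrade --to-image {pullspec}"


if __name__ == "__main__":
    # Placeholder registry and repository names, for illustration only.
    print(upgrade_command(
        "registry.example.com",
        "ocp4/openshift4-release-images",
        "sha256:81154f5c03294534e1eaf0319bef7a601134f891689ccede5d705ef659aa8c92",
    ))
```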
3.6.4.5. Understanding image registry repository mirroring
Setting up container registry repository mirroring enables you to perform the following tasks:
- Configure your OpenShift Container Platform cluster to redirect requests to pull images from a repository on a source image registry and have it resolved by a repository on a mirrored image registry.
- Identify multiple mirrored repositories for each target repository, to make sure that if one mirror is down, another can be used.
Repository mirroring in OpenShift Container Platform includes the following attributes:
- Image pulls are resilient to registry downtimes.
- Clusters in disconnected environments can pull images from critical locations, such as quay.io, and have registries behind a company firewall provide the requested images.
- A particular order of registries is tried when an image pull request is made, with the permanent registry typically being the last one tried.
- The mirror information you enter is added to the /etc/containers/registries.conf file on every node in the OpenShift Container Platform cluster.
- When a node makes a request for an image from the source repository, it tries each mirrored repository in turn until it finds the requested content. If all mirrors fail, the cluster tries the source repository. If successful, the image is pulled to the node.
Setting up repository mirroring can be done in the following ways:
At OpenShift Container Platform installation:
By pulling container images needed by OpenShift Container Platform and then bringing those images behind your company’s firewall, you can install OpenShift Container Platform into a data center that is in a disconnected environment.
After OpenShift Container Platform installation:
If you did not configure mirroring during OpenShift Container Platform installation, you can do so postinstallation by using any of the following custom resource (CR) objects:
- ImageDigestMirrorSet (IDMS). This object allows you to pull images from a mirrored registry by using digest specifications. The IDMS CR enables you to set a fallback policy that allows or stops continued attempts to pull from the source registry if the image pull fails.
- ImageTagMirrorSet (ITMS). This object allows you to pull images from a mirrored registry by using image tags. The ITMS CR enables you to set a fallback policy that allows or stops continued attempts to pull from the source registry if the image pull fails.
- ImageContentSourcePolicy (ICSP). This object allows you to pull images from a mirrored registry by using digest specifications. The ICSP CR always falls back to the source registry if the mirrors do not work.
Important: Using an ImageContentSourcePolicy (ICSP) object to configure repository mirroring is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments. If you have existing YAML files that you used to create ImageContentSourcePolicy objects, you can use the oc adm migrate icsp command to convert those files to an ImageDigestMirrorSet YAML file. For more information, see "Converting ImageContentSourcePolicy (ICSP) files for image registry repository mirroring" in the following section.
Each of these custom resource objects identifies the following information:
- The source of the container image repository you want to mirror.
- A separate entry for each mirror repository you want to offer the content requested from the source repository.
For new clusters, you can use IDMS, ITMS, and ICSP CR objects as desired. However, using IDMS and ITMS is recommended.
If you upgraded a cluster, any existing ICSP objects remain stable, and both IDMS and ICSP objects are supported. Workloads using ICSP objects continue to function as expected. However, if you want to take advantage of the fallback policies introduced in the IDMS CRs, you can migrate current workloads to IDMS objects by using the oc adm migrate icsp command as shown in the Converting ImageContentSourcePolicy (ICSP) files for image registry repository mirroring section that follows. Migrating to IDMS objects does not require a cluster reboot.
If your cluster uses an ImageDigestMirrorSet, ImageTagMirrorSet, or ImageContentSourcePolicy object to configure repository mirroring, you can use only global pull secrets for mirrored registries. You cannot add a pull secret to a project.
3.6.4.5.1. Configuring image registry repository mirroring
You can create postinstallation mirror configuration custom resources (CR) to redirect image pull requests from a source image registry to a mirrored image registry.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
Procedure
Configure mirrored repositories, by either:
- Setting up a mirrored repository with Red Hat Quay, as described in Red Hat Quay Repository Mirroring. Using Red Hat Quay allows you to copy images from one repository to another and also automatically sync those repositories repeatedly over time.
- Using a tool such as skopeo to copy images manually from the source repository to the mirrored repository.
For example, after installing the skopeo RPM package on a Red Hat Enterprise Linux (RHEL) 7 or RHEL 8 system, use the skopeo command as shown in this example:
$ skopeo copy --all \
    docker://registry.access.redhat.com/ubi9/ubi-minimal:latest@sha256:5cf... \
    docker://example.io/example/ubi-minimal
In this example, you have a container image registry that is named example.io with an image repository named example to which you want to copy the ubi9/ubi-minimal image from registry.access.redhat.com. After you create the mirrored registry, you can configure your OpenShift Container Platform cluster to redirect requests made of the source repository to the mirrored repository.
Create a postinstallation mirror configuration CR, by using one of the following examples:
Create an ImageDigestMirrorSet or ImageTagMirrorSet CR, as needed, replacing the source and mirrors with your own registry and repository pairs and images:
apiVersion: config.openshift.io/v1 1
kind: ImageDigestMirrorSet 2
metadata:
  name: ubi9repo
spec:
  imageDigestMirrors: 3
  - mirrors:
    - example.io/example/ubi-minimal 4
    - example.com/example/ubi-minimal 5
    source: registry.access.redhat.com/ubi9/ubi-minimal 6
    mirrorSourcePolicy: AllowContactingSource 7
  - mirrors:
    - mirror.example.com/redhat
    source: registry.example.com/redhat 8
    mirrorSourcePolicy: AllowContactingSource
  - mirrors:
    - mirror.example.com
    source: registry.example.com 9
    mirrorSourcePolicy: AllowContactingSource
  - mirrors:
    - mirror.example.net/image
    source: registry.example.com/example/myimage 10
    mirrorSourcePolicy: AllowContactingSource
  - mirrors:
    - mirror.example.net
    source: registry.example.com/example 11
    mirrorSourcePolicy: AllowContactingSource
  - mirrors:
    - mirror.example.net/registry-example-com
    source: registry.example.com 12
    mirrorSourcePolicy: AllowContactingSource
- 1
- Indicates the API to use with this CR. This must be config.openshift.io/v1.
- 2
- Indicates the kind of object according to the pull type:
  - ImageDigestMirrorSet: Pulls a digest reference image.
  - ImageTagMirrorSet: Pulls a tag reference image.
- 3
- Indicates the type of image pull method, either:
  - imageDigestMirrors: Use for an ImageDigestMirrorSet CR.
  - imageTagMirrors: Use for an ImageTagMirrorSet CR.
- 4
- Indicates the name of the mirrored image registry and repository.
- 5
- Optional: Indicates a secondary mirror repository for each target repository. If one mirror is down, the target repository can use the secondary mirror.
- 6
- Indicates the registry and repository source, which is the repository that is referred to in an image pull specification.
- 7
- Optional: Indicates the fallback policy if the image pull fails:
  - AllowContactingSource: Allows continued attempts to pull the image from the source repository. This is the default.
  - NeverContactSource: Prevents continued attempts to pull the image from the source repository.
- 8
- Optional: Indicates a namespace inside a registry, which allows you to use any image in that namespace. If you use a registry domain as a source, the object is applied to all repositories from the registry.
- 9
- Optional: Indicates a registry, which allows you to use any image in that registry. If you specify a registry name, the object is applied to all repositories from a source registry to a mirror registry.
- 10
- Pulls the image registry.example.com/example/myimage@sha256:… from the mirror mirror.example.net/image@sha256:….
- 11
- Pulls the image registry.example.com/example/image@sha256:… in the source registry namespace from the mirror mirror.example.net/image@sha256:….
- 12
- Pulls the image registry.example.com/myimage@sha256:… from the mirror registry mirror.example.net/registry-example-com/myimage@sha256:….
Create an ImageContentSourcePolicy custom resource, replacing the source and mirrors with your own registry and repository pairs and images:
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: mirror-ocp
spec:
  repositoryDigestMirrors:
  - mirrors:
    - mirror.registry.com:443/ocp/release
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - mirror.registry.com:443/ocp/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
Create the new object:
$ oc create -f registryrepomirror.yaml
After the object is created, the Machine Config Operator (MCO) drains the nodes for ImageTagMirrorSet objects only. The MCO does not drain the nodes for ImageDigestMirrorSet and ImageContentSourcePolicy objects.
To check that the mirrored configuration settings are applied, do the following on one of the nodes.
List your nodes:
$ oc get node
Example output
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-137-44.ec2.internal    Ready    worker   7m    v1.29.4
ip-10-0-138-148.ec2.internal   Ready    master   11m   v1.29.4
ip-10-0-139-122.ec2.internal   Ready    master   11m   v1.29.4
ip-10-0-147-35.ec2.internal    Ready    worker   7m    v1.29.4
ip-10-0-153-12.ec2.internal    Ready    worker   7m    v1.29.4
ip-10-0-154-10.ec2.internal    Ready    master   11m   v1.29.4
Start the debugging process to access the node:
$ oc debug node/ip-10-0-147-35.ec2.internal
Example output
Starting pod/ip-10-0-147-35ec2internal-debug ...
To use host binaries, run `chroot /host`
Change your root directory to /host:
sh-4.2# chroot /host
Check the /etc/containers/registries.conf file to make sure the changes were made:
sh-4.2# cat /etc/containers/registries.conf
The following output represents a registries.conf file where postinstallation mirror configuration CRs were applied. The final two entries are marked digest-only and tag-only respectively.
Example output
unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
short-name-mode = ""

[[registry]]
  prefix = ""
  location = "registry.access.redhat.com/ubi9/ubi-minimal" 1

  [[registry.mirror]]
    location = "example.io/example/ubi-minimal" 2
    pull-from-mirror = "digest-only" 3

  [[registry.mirror]]
    location = "example.com/example/ubi-minimal"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.example.com"

  [[registry.mirror]]
    location = "mirror.example.net/registry-example-com"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.example.com/example"

  [[registry.mirror]]
    location = "mirror.example.net"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.example.com/example/myimage"

  [[registry.mirror]]
    location = "mirror.example.net/image"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.example.com"

  [[registry.mirror]]
    location = "mirror.example.com"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.example.com/redhat"

  [[registry.mirror]]
    location = "mirror.example.com/redhat"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.access.redhat.com/ubi9/ubi-minimal"
  blocked = true 4

  [[registry.mirror]]
    location = "example.io/example/ubi-minimal-tag"
    pull-from-mirror = "tag-only" 5
- 1: Indicates the repository that is referred to in a pull spec.
- 2: Indicates the mirror for that repository.
- 3: Indicates that the image pull from the mirror is a digest reference image.
- 4: Indicates that the NeverContactSource parameter is set for this repository.
- 5: Indicates that the image pull from the mirror is a tag reference image.
Pull an image to the node from the source and check if it is resolved by the mirror.
sh-4.2# podman pull --log-level=debug registry.access.redhat.com/ubi9/ubi-minimal@sha256:5cf...
Troubleshooting repository mirroring
If the repository mirroring procedure does not work as described, use the following information about how repository mirroring works to help troubleshoot the problem.
- The first working mirror is used to supply the pulled image.
- The main registry is only used if no other mirror works.
- From the system context, the Insecure flags are used as fallback.
- The format of the /etc/containers/registries.conf file has changed recently. It is now version 2 and in TOML format.
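When troubleshooting, it can help to see every source-to-mirror mapping in registries.conf at a glance. The following is a minimal sketch, not part of the product tooling: list_mirrors is a hypothetical helper name, and the awk program assumes the simple key = "value" layout shown in the example output above. Mirrors that carry no pull-from-mirror key are not printed.

```shell
# list_mirrors FILE
# Hypothetical helper: prints "<source> -> <mirror> (<pull-from-mirror>)"
# for each [[registry.mirror]] entry that sets pull-from-mirror.
list_mirrors() {
  awk '
    # A new [[registry]] table starts: reset the current source.
    /^[ \t]*\[\[registry\]\]/         { in_mirror = 0; src = ""; next }
    # A mirror table belongs to the most recent [[registry]] table.
    /^[ \t]*\[\[registry\.mirror\]\]/ { in_mirror = 1; next }
    /^[ \t]*location[ \t]*=/ {
      val = $0
      sub(/^[^"]*"/, "", val)   # drop everything up to the opening quote
      sub(/".*$/, "", val)      # drop the closing quote and the rest
      if (in_mirror) mirror = val; else src = val
      next
    }
    /^[ \t]*pull-from-mirror[ \t]*=/ && in_mirror {
      val = $0
      sub(/^[^"]*"/, "", val)
      sub(/".*$/, "", val)
      printf "%s -> %s (%s)\n", src, mirror, val
    }
  ' "$1"
}

# Usage on the node, after running chroot /host:
#   list_mirrors /etc/containers/registries.conf
```

Because the first working mirror supplies the image, the order of the printed lines for a given source reflects the order in which the mirrors appear in the file.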
3.6.4.5.2. Converting ImageContentSourcePolicy (ICSP) files for image registry repository mirroring
Using an ImageContentSourcePolicy (ICSP) object to configure repository mirroring is a deprecated feature. This functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.
ICSP objects are being replaced by ImageDigestMirrorSet and ImageTagMirrorSet objects to configure repository mirroring. If you have existing YAML files that you used to create ImageContentSourcePolicy objects, you can use the oc adm migrate icsp command to convert those files to an ImageDigestMirrorSet YAML file. The command updates the API to the current version, changes the kind value to ImageDigestMirrorSet, and changes spec.repositoryDigestMirrors to spec.imageDigestMirrors. The rest of the file is not changed.
Because the migration does not change the registries.conf file, the cluster does not need to reboot.
For more information about ImageDigestMirrorSet or ImageTagMirrorSet objects, see "Configuring image registry repository mirroring" in the previous section.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
- Ensure that you have ImageContentSourcePolicy objects on your cluster.
Procedure
Use the following command to convert one or more ImageContentSourcePolicy YAML files to an ImageDigestMirrorSet YAML file:
$ oc adm migrate icsp <file_name>.yaml <file_name>.yaml <file_name>.yaml --dest-dir <path_to_the_directory>
where:
- <file_name>: Specifies the name of the source ImageContentSourcePolicy YAML. You can list multiple file names.
- --dest-dir: Optional. Specifies a directory for the output ImageDigestMirrorSet YAML. If unset, the file is written to the current directory.
For example, the following command converts the icsp.yaml and icsp-2.yaml files and saves the new YAML files to the idms-files directory:
$ oc adm migrate icsp icsp.yaml icsp-2.yaml --dest-dir idms-files
Example output
wrote ImageDigestMirrorSet to idms-files/imagedigestmirrorset_ubi8repo.5911620242173376087.yaml
wrote ImageDigestMirrorSet to idms-files/imagedigestmirrorset_ubi9repo.6456931852378115011.yaml
Create the CR object by running the following command:
$ oc create -f <path_to_the_directory>/<file_name>.yaml
where:
- <path_to_the_directory>: Specifies the path to the directory, if you used the --dest-dir flag.
- <file_name>: Specifies the name of the ImageDigestMirrorSet YAML.
- Remove the ICSP objects after the IDMS objects are rolled out.
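To make the field-level changes concrete, the following sketch previews the three edits described above (API version, kind, and the spec field name) on a single ICSP file. It is for illustration only and is not a substitute for oc adm migrate icsp. The function name icsp_to_idms_preview is hypothetical, and the sketch assumes the source file uses the operator.openshift.io/v1alpha1 API version and that ImageDigestMirrorSet objects use config.openshift.io/v1.

```shell
# icsp_to_idms_preview FILE
# Hypothetical helper: prints FILE with the three changes that the
# conversion is described as making. All other lines pass through unchanged.
icsp_to_idms_preview() {
  sed \
    -e 's|^\( *\)apiVersion: operator\.openshift\.io/v1alpha1 *$|\1apiVersion: config.openshift.io/v1|' \
    -e 's|^\( *\)kind: ImageContentSourcePolicy *$|\1kind: ImageDigestMirrorSet|' \
    -e 's|^\( *\)repositoryDigestMirrors:|\1imageDigestMirrors:|' \
    "$1"
}

# Usage: compare the preview with the file that oc adm migrate icsp wrote.
#   icsp_to_idms_preview icsp.yaml | diff - idms-files/<file_name>.yaml
```

Comparing the preview with the generated file is one way to confirm that nothing else in the manifest changed before you create the new object.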
3.6.4.6. Widening the scope of the mirror image catalog to reduce the frequency of cluster node reboots
You can scope the mirrored image catalog at the repository level or the wider registry level. A widely scoped ImageContentSourcePolicy resource reduces the number of times the nodes need to reboot in response to changes to the resource.
To widen the scope of the mirror image catalog in the ImageContentSourcePolicy resource, perform the following procedure.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
- Configure a mirrored image catalog for use in your disconnected cluster.
Procedure
Run the following command, specifying values for <local_registry>, <pull_spec>, and <pull_secret_file>:
$ oc adm catalog mirror <local_registry>/<pull_spec> <local_registry> -a <pull_secret_file> --icsp-scope=registry
where:
- <local_registry>: The local registry you have configured for your disconnected cluster, for example, local.registry:5000.
- <pull_spec>: The pull specification as configured in your disconnected registry, for example, redhat/redhat-operator-index:v4.16.
- <pull_secret_file>: The registry.redhat.io pull secret in .json file format. You can download the pull secret from Red Hat OpenShift Cluster Manager.
The oc adm catalog mirror command creates a /redhat-operator-index-manifests directory and generates imageContentSourcePolicy.yaml, catalogSource.yaml, and mapping.txt files.
Apply the new ImageContentSourcePolicy resource to the cluster:
$ oc apply -f imageContentSourcePolicy.yaml
Verification
Verify that oc apply successfully applied the change to ImageContentSourcePolicy:
$ oc get ImageContentSourcePolicy -o yaml
Example output
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"operator.openshift.io/v1alpha1","kind":"ImageContentSourcePolicy","metadata":{"annotations":{},"name":"redhat-operator-index"},"spec":{"repositoryDigestMirrors":[{"mirrors":["local.registry:5000"],"source":"registry.redhat.io"}]}}
...
After you update the ImageContentSourcePolicy
resource, OpenShift Container Platform deploys the new settings to each node and the cluster starts using the mirrored repository for requests to the source repository.
3.6.4.7. Additional resources
3.6.5. Uninstalling the OpenShift Update Service from a cluster
To remove a local copy of the OpenShift Update Service (OSUS) from your cluster, you must first delete the OSUS application and then uninstall the OSUS Operator.
3.6.5.1. Deleting an OpenShift Update Service application
You can delete an OpenShift Update Service application by using the OpenShift Container Platform web console or CLI.
3.6.5.1.1. Deleting an OpenShift Update Service application by using the web console
You can use the OpenShift Container Platform web console to delete an OpenShift Update Service application by using the OpenShift Update Service Operator.
Prerequisites
- The OpenShift Update Service Operator has been installed.
Procedure
- In the web console, click Operators → Installed Operators.
- Choose OpenShift Update Service from the list of installed Operators.
- Click the Update Service tab.
- From the list of installed OpenShift Update Service applications, select the application to be deleted and then click Delete UpdateService.
- From the Delete UpdateService? confirmation dialog, click Delete to confirm the deletion.
3.6.5.1.2. Deleting an OpenShift Update Service application by using the CLI
You can use the OpenShift CLI (oc) to delete an OpenShift Update Service application.
Procedure
Get the OpenShift Update Service application name using the namespace the OpenShift Update Service application was created in, for example, openshift-update-service:
$ oc get updateservice -n openshift-update-service
Example output
NAME      AGE
service   6s
Delete the OpenShift Update Service application using the NAME value from the previous step and the namespace the OpenShift Update Service application was created in, for example, openshift-update-service:
$ oc delete updateservice service -n openshift-update-service
Example output
updateservice.updateservice.operator.openshift.io "service" deleted
3.6.5.2. Uninstalling the OpenShift Update Service Operator
You can uninstall the OpenShift Update Service Operator by using the OpenShift Container Platform web console or CLI.
3.6.5.2.1. Uninstalling the OpenShift Update Service Operator by using the web console
You can use the OpenShift Container Platform web console to uninstall the OpenShift Update Service Operator.
Prerequisites
- All OpenShift Update Service applications have been deleted.
Procedure
- In the web console, click Operators → Installed Operators.
- Select OpenShift Update Service from the list of installed Operators and click Uninstall Operator.
- From the Uninstall Operator? confirmation dialog, click Uninstall to confirm the uninstallation.
3.6.5.2.2. Uninstalling the OpenShift Update Service Operator by using the CLI
You can use the OpenShift CLI (oc) to uninstall the OpenShift Update Service Operator.
Prerequisites
- All OpenShift Update Service applications have been deleted.
Procedure
Change to the project containing the OpenShift Update Service Operator, for example, openshift-update-service:
$ oc project openshift-update-service
Example output
Now using project "openshift-update-service" on server "https://example.com:6443".
Get the name of the OpenShift Update Service Operator operator group:
$ oc get operatorgroup
Example output
NAME                             AGE
openshift-update-service-fprx2   4m41s
Delete the operator group, for example, openshift-update-service-fprx2:
$ oc delete operatorgroup openshift-update-service-fprx2
Example output
operatorgroup.operators.coreos.com "openshift-update-service-fprx2" deleted
Get the name of the OpenShift Update Service Operator subscription:
$ oc get subscription
Example output
NAME                      PACKAGE                   SOURCE                        CHANNEL
update-service-operator   update-service-operator   updateservice-index-catalog   v1
Using the NAME value from the previous step, check the current version of the subscribed OpenShift Update Service Operator in the currentCSV field:
$ oc get subscription update-service-operator -o yaml | grep " currentCSV"
Example output
currentCSV: update-service-operator.v0.0.1
Delete the subscription, for example, update-service-operator:
$ oc delete subscription update-service-operator
Example output
subscription.operators.coreos.com "update-service-operator" deleted
Delete the CSV for the OpenShift Update Service Operator using the currentCSV value from the previous step:
$ oc delete clusterserviceversion update-service-operator.v0.0.1
Example output
clusterserviceversion.operators.coreos.com "update-service-operator.v0.0.1" deleted
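The CLI steps above depend on reading the currentCSV value before the subscription is deleted. The following sketch chains the steps in that order. It is an illustration under stated assumptions rather than a supported script: uninstall_osus_operator is a hypothetical function name, the names passed in are the example values from this procedure, and the jsonpath expression assumes that currentCSV is reported under the subscription's status.

```shell
# uninstall_osus_operator NAMESPACE OPERATOR_GROUP SUBSCRIPTION
# Hypothetical helper that mirrors the manual procedure.
uninstall_osus_operator() {
  ns=$1
  group=$2
  sub=$3

  # Read the CSV name first; it is no longer available once the
  # subscription is gone.
  csv=$(oc get subscription.operators.coreos.com "$sub" -n "$ns" \
    -o jsonpath='{.status.currentCSV}') || return 1
  if [ -z "$csv" ]; then
    echo "no currentCSV found for subscription $sub" >&2
    return 1
  fi

  oc delete operatorgroup "$group" -n "$ns" || return 1
  oc delete subscription.operators.coreos.com "$sub" -n "$ns" || return 1
  oc delete clusterserviceversion "$csv" -n "$ns"
}

# Usage with the example values from this procedure:
#   uninstall_osus_operator openshift-update-service \
#     openshift-update-service-fprx2 update-service-operator
```

Passing the namespace explicitly avoids relying on the current project, so the oc project step is not needed when using this form.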
3.7. Updating hardware on nodes running on vSphere
You must ensure that your nodes running in vSphere are running on the hardware version supported by OpenShift Container Platform. Currently, hardware version 15 or later is supported for vSphere virtual machines in a cluster.
You can update your virtual hardware immediately or schedule an update in vCenter.
- Version 4.16 of OpenShift Container Platform requires VMware virtual hardware version 15 or later.
- Before upgrading OpenShift 4.12 to OpenShift 4.13, you must update vSphere to v7.0.2 or later; otherwise, the OpenShift 4.12 cluster is marked un-upgradeable.
3.7.1. Updating virtual hardware on vSphere
To update the hardware of your virtual machines (VMs) on VMware vSphere, update your virtual machines separately to reduce the risk of downtime for your cluster.
As of OpenShift Container Platform 4.13, VMware virtual hardware version 13 is no longer supported. You need to update to VMware version 15 or later for supporting functionality.
3.7.1.1. Updating the virtual hardware for control plane nodes on vSphere
To reduce the risk of downtime, it is recommended that control plane nodes be updated serially. This ensures that the Kubernetes API remains available and etcd retains quorum.
Prerequisites
- You have cluster administrator permissions to perform the required operations in the vCenter instance hosting your OpenShift Container Platform cluster.
- Your vSphere ESXi hosts are version 7.0U2 or later.
Procedure
List the control plane nodes in your cluster.
$ oc get nodes -l node-role.kubernetes.io/master
Example output
NAME                   STATUS   ROLES    AGE   VERSION
control-plane-node-0   Ready    master   75m   v1.29.4
control-plane-node-1   Ready    master   75m   v1.29.4
control-plane-node-2   Ready    master   75m   v1.29.4
Note the names of your control plane nodes.
Mark the control plane node as unschedulable.
$ oc adm cordon <control_plane_node>
- Shut down the virtual machine (VM) associated with the control plane node. Do this in the vSphere client by right-clicking the VM and selecting Power → Shut Down Guest OS. Do not shut down the VM using Power Off because it might not shut down safely.
- Update the VM in the vSphere client. Follow Upgrade the Compatibility of a Virtual Machine Manually in the VMware documentation for more information.
- Power on the VM associated with the control plane node. Do this in the vSphere client by right-clicking the VM and selecting Power On.
Wait for the node to report as Ready:
$ oc wait --for=condition=Ready node/<control_plane_node>
Mark the control plane node as schedulable again:
$ oc adm uncordon <control_plane_node>
- Repeat this procedure for each control plane node in your cluster.
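The serial procedure above can be sketched as a loop that handles one control plane node at a time and pauses while you perform the vSphere steps by hand. This is a minimal sketch, not a supported automation: update_control_plane_hw is a hypothetical function name, and the 15 minute --timeout value is an assumption that you should adjust for your environment.

```shell
# update_control_plane_hw NODE...
# Hypothetical helper: processes the given nodes strictly one after another.
update_control_plane_hw() {
  for node in "$@"; do
    oc adm cordon "$node" || return 1

    # The shutdown, hardware upgrade, and power-on happen in the vSphere
    # client; the loop only waits for confirmation.
    printf 'Shut down, upgrade, and power on the VM for %s, then press Enter: ' "$node"
    read -r reply || return 1

    # Stop here if the node does not come back, so that the next control
    # plane node is never touched while this one is unavailable.
    oc wait --for=condition=Ready "node/$node" --timeout=15m || return 1
    oc adm uncordon "$node" || return 1
  done
}

# Usage:
#   update_control_plane_hw control-plane-node-0 control-plane-node-1 control-plane-node-2
```

Returning on the first failure is deliberate: continuing to the next node while one control plane node is down would put etcd quorum at risk.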
3.7.1.2. Updating the virtual hardware for compute nodes on vSphere
To reduce the risk of downtime, it is recommended that compute nodes be updated serially.
Multiple compute nodes can be updated in parallel, provided that workloads tolerate having multiple nodes in a NotReady state. It is the responsibility of the administrator to ensure that the required compute nodes are available.
Prerequisites
- You have cluster administrator permissions to perform the required operations in the vCenter instance hosting your OpenShift Container Platform cluster.
- Your vSphere ESXi hosts are version 7.0U2 or later.
Procedure
List the compute nodes in your cluster.
$ oc get nodes -l node-role.kubernetes.io/worker
Example output
NAME             STATUS   ROLES    AGE   VERSION
compute-node-0   Ready    worker   30m   v1.29.4
compute-node-1   Ready    worker   30m   v1.29.4
compute-node-2   Ready    worker   30m   v1.29.4
Note the names of your compute nodes.
Mark the compute node as unschedulable:
$ oc adm cordon <compute_node>
Evacuate the pods from the compute node. There are several ways to do this. For example, you can evacuate all or selected pods on a node:
$ oc adm drain <compute_node> [--pod-selector=<pod_selector>]
See the "Understanding how to evacuate pods on nodes" section for other options to evacuate pods from a node.
- Shut down the virtual machine (VM) associated with the compute node. Do this in the vSphere client by right-clicking the VM and selecting Power → Shut Down Guest OS. Do not shut down the VM using Power Off because it might not shut down safely.
- Update the VM in the vSphere client. Follow Upgrade the Compatibility of a Virtual Machine Manually in the VMware documentation for more information.
- Power on the VM associated with the compute node. Do this in the vSphere client by right-clicking the VM and selecting Power On.
Wait for the node to report as Ready:
$ oc wait --for=condition=Ready node/<compute_node>
Mark the compute node as schedulable again:
$ oc adm uncordon <compute_node>
- Repeat this procedure for each compute node in your cluster.
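For compute nodes, the work splits naturally into a step before the vSphere maintenance and a step after it. The following sketch shows that split. Both function names are hypothetical, and the --ignore-daemonsets, --delete-emptydir-data, and --timeout=15m options are assumptions that are not part of the procedure above; review them against your workloads before use.

```shell
# prepare_compute_node NODE
# Hypothetical helper: run before shutting down the VM.
prepare_compute_node() {
  node=$1
  oc adm cordon "$node" || return 1
  # Assumed drain options: skip DaemonSet-managed pods and allow eviction
  # of pods that use emptyDir volumes (their emptyDir data is lost).
  oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# restore_compute_node NODE
# Hypothetical helper: run after powering the VM back on.
restore_compute_node() {
  node=$1
  oc wait --for=condition=Ready "node/$node" --timeout=15m || return 1
  oc adm uncordon "$node"
}

# Usage:
#   prepare_compute_node compute-node-0
#   ... shut down, upgrade, and power on the VM in the vSphere client ...
#   restore_compute_node compute-node-0
```

Keeping the two phases separate makes it possible to prepare several nodes at once when your workloads tolerate it, while still restoring each node individually.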
3.7.1.3. Updating the virtual hardware for template on vSphere
Prerequisites
- You have cluster administrator permissions to perform the required operations in the vCenter instance hosting your OpenShift Container Platform cluster.
- Your vSphere ESXi hosts are version 7.0U2 or later.
Procedure
If the RHCOS template is configured as a vSphere template, follow Convert a Template to a Virtual Machine in the VMware documentation prior to the next step.
Note: Once converted from a template, do not power on the virtual machine.
- Update the virtual machine (VM) in the VMware vSphere client. Complete the steps outlined in Upgrade the Compatibility of a Virtual Machine Manually (VMware vSphere documentation).
Convert the VM in the vSphere client to a template by right-clicking on the VM and then selecting Template → Convert to Template.
Important: The steps for converting a VM to a template might change in future vSphere documentation versions.
3.7.2. Scheduling an update for virtual hardware on vSphere
Virtual hardware updates can be scheduled to occur when a virtual machine is powered on or rebooted. You can schedule your virtual hardware updates exclusively in vCenter by following Schedule a Compatibility Upgrade for a Virtual Machine in the VMware documentation.
When scheduling an update prior to performing an update of OpenShift Container Platform, the virtual hardware update occurs when the nodes are rebooted during the course of the OpenShift Container Platform update.
3.8. Migrating to a cluster with multi-architecture compute machines
You can migrate your current cluster with single-architecture compute machines to a cluster with multi-architecture compute machines by updating to a multi-architecture, manifest-listed payload. This allows you to add mixed architecture compute nodes to your cluster.
For information about configuring your multi-architecture compute machines, see "Configuring multi-architecture compute machines on an OpenShift Container Platform cluster".
Before migrating your single-architecture cluster to a cluster with multi-architecture compute machines, it is recommended to install the Multiarch Tuning Operator and deploy a ClusterPodPlacementConfig custom resource. For more information, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
Migration from a multi-architecture payload to a single-architecture payload is not supported. Once a cluster has transitioned to using a multi-architecture payload, it can no longer accept a single-architecture update payload.
3.8.1. Migrating to a cluster with multi-architecture compute machines using the CLI
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- Your OpenShift Container Platform version is up to date to at least version 4.13.0.
  For more information on how to update your cluster version, see Updating a cluster using the web console or Updating a cluster using the CLI.
- You have installed the OpenShift CLI (oc) that matches the version for your current cluster.
- Your oc client is updated to at least version 4.13.0.
- Your OpenShift Container Platform cluster is installed on AWS, Azure, GCP, bare metal, or IBM P/Z platforms.
  For more information on selecting a supported platform for your cluster installation, see Selecting a cluster installation type.
Procedure
Verify that the RetrievedUpdates condition is True in the Cluster Version Operator (CVO) by running the following command:
$ oc get clusterversion/version -o=jsonpath="{.status.conditions[?(.type=='RetrievedUpdates')].status}"
If the RetrievedUpdates condition is False, you can find supplemental information regarding the failure by using the following command:
$ oc adm upgrade
For more information about cluster version condition types, see Understanding cluster version condition types.
If the condition RetrievedUpdates is False, change the channel to stable-<4.y> or fast-<4.y> with the following command:
$ oc adm upgrade channel <channel>
After setting the channel, verify that RetrievedUpdates is True.
For more information about channels, see Understanding update channels and releases.
Migrate to the multi-architecture payload with the following command:
$ oc adm upgrade --to-multi-arch
Verification
You can monitor the migration by running the following command:
$ oc adm upgrade
Important: Machine launches may fail as the cluster settles into the new state. To notice and recover when machines fail to launch, we recommend deploying machine health checks. For more information about machine health checks and how to deploy them, see About machine health checks.
The migrations must be complete and all the cluster operators must be stable before you can add compute machine sets with different architectures to your cluster.
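The check-then-migrate flow in this procedure can be sketched as a small guard function. This is a hedged illustration only: migrate_to_multi_arch is a hypothetical name, and the jsonpath filter is written with the @ current-object prefix, which is an equivalent form assumed here rather than the exact expression from the procedure.

```shell
# migrate_to_multi_arch
# Hypothetical helper: starts the migration only when the
# RetrievedUpdates condition is True.
migrate_to_multi_arch() {
  retrieved=$(oc get clusterversion/version \
    -o jsonpath='{.status.conditions[?(@.type=="RetrievedUpdates")].status}') || return 1

  if [ "$retrieved" != "True" ]; then
    echo "RetrievedUpdates is '${retrieved}'; run 'oc adm upgrade' for details and check the update channel" >&2
    return 1
  fi

  oc adm upgrade --to-multi-arch
}

# Usage:
#   migrate_to_multi_arch && oc adm upgrade
```

The guard does not change the channel on its own; if it reports a failure, set the channel as described in the procedure and run it again.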
Additional resources
- Configuring multi-architecture compute machines on an OpenShift Container Platform cluster
- Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
- Updating a cluster using the web console
- Updating a cluster using the CLI
- Understanding cluster version condition types
- Understanding update channels and releases
- Selecting a cluster installation type
- About machine health checks
3.9. Updating hosted control planes
On hosted control planes for OpenShift Container Platform, updates are decoupled between the control plane and the nodes. Your service cluster provider, which is the user that hosts the cluster control planes, can manage the updates as needed. The hosted cluster handles control plane updates, and node pools handle node updates.
3.9.1. Updates for the hosted cluster
The spec.release value dictates the version of the control plane. The HostedCluster object transmits the intended spec.release value to the HostedControlPlane.spec.release value and runs the appropriate Control Plane Operator version.
The hosted control plane manages the rollout of the new version of the control plane components along with any OpenShift Container Platform components through the new version of the Cluster Version Operator (CVO).
3.9.2. Updates for node pools
With node pools, you can configure the software that is running in the nodes by exposing the spec.release and spec.config values. You can start a rolling node pool update in the following ways:
- Changing the spec.release or spec.config values.
- Changing any platform-specific field, such as the AWS instance type. The result is a set of new instances with the new type.
- Changing the cluster configuration, if the change propagates to the node.
Node pools support replace updates and in-place updates. The nodepool.spec.release value dictates the version of any particular node pool. A NodePool object completes a replace or an in-place rolling update according to the .spec.management.upgradeType value.
After you create a node pool, you cannot change the update type. If you want to change the update type, you must create a node pool and delete the other one.
3.9.2.1. Replace updates for node pools
A replace update creates instances in the new version while it removes old instances from the previous version. This update type is effective in cloud environments where this level of immutability is cost effective.
Replace updates do not preserve any manual changes because the node is entirely re-provisioned.
3.9.2.2. In-place updates for node pools
An in-place update directly updates the operating systems of the instances. This type is suitable for environments where the infrastructure constraints are higher, such as bare metal.
In-place updates can preserve manual changes, but will report errors if you make manual changes to any file system or operating system configuration that the cluster directly manages, such as kubelet certificates.
3.9.3. Configuring node pools for hosted control planes
On hosted control planes, you can configure node pools by creating a MachineConfig object inside of a config map in the management cluster.
Procedure
To create a MachineConfig object inside of a config map in the management cluster, enter the following information:
apiVersion: v1
kind: ConfigMap
metadata:
  name: <configmap-name>
  namespace: clusters
data:
  config: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: <machineconfig-name>
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:...
            mode: 420
            overwrite: true
            path: ${PATH} # 1
- 1: Sets the path on the node where the MachineConfig object is stored.
After you add the object to the config map, you can apply the config map to the node pool as follows:
$ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>
apiVersion: hypershift.openshift.io/v1alpha1
kind: NodePool
metadata:
# ...
  name: nodepool-1
  namespace: clusters
# ...
spec:
  config:
  - name: ${configmap-name}
# ...
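If the MachineConfig object already exists as a standalone YAML file, one way to wrap it in a config map with the expected config key is to let oc generate the manifest. The following is a sketch under stated assumptions: machineconfig_configmap is a hypothetical function name, and the clusters namespace default is taken from the example above. The --dry-run=client option only prints the manifest and does not create anything on the cluster.

```shell
# machineconfig_configmap NAME MACHINECONFIG_FILE [NAMESPACE]
# Hypothetical helper: prints a ConfigMap manifest whose data.config key
# holds the contents of MACHINECONFIG_FILE.
machineconfig_configmap() {
  oc create configmap "$1" \
    --from-file=config="$2" \
    --namespace "${3:-clusters}" \
    --dry-run=client -o yaml
}

# Usage: review the generated manifest, then apply it.
#   machineconfig_configmap <configmap-name> machineconfig.yaml > configmap.yaml
#   oc apply -f configmap.yaml
```

After the config map exists, reference its name under spec.config in the node pool as shown above.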
3.10. Updating the boot loader on RHCOS nodes using bootupd
To update the boot loader on RHCOS nodes using bootupd, you must either run the bootupctl update command on RHCOS machines manually or provide a machine config with a systemd unit.
Unlike grubby or other boot loader tools, bootupd does not manage kernel space configuration such as passing kernel arguments. To configure kernel arguments, see Adding kernel arguments to nodes.
You can use bootupd to update the boot loader to protect against the BootHole vulnerability.
3.10.1. Updating the boot loader manually
You can manually inspect the status of the system and update the boot loader by using the bootupctl command-line tool.
Inspect the system status:
# bootupctl status
Example output for x86_64
Component EFI
  Installed: grub2-efi-x64-1:2.04-31.el8_4.1.x86_64,shim-x64-15-8.el8_1.x86_64
  Update: At latest version
Example output for aarch64
Component EFI
  Installed: grub2-efi-aa64-1:2.02-99.el8_4.1.aarch64,shim-aa64-15.4-2.el8_1.aarch64
  Update: At latest version
OpenShift Container Platform clusters initially installed on version 4.4 and older require an explicit adoption phase.
If the system status is Adoptable, perform the adoption:
# bootupctl adopt-and-update
Example output
Updated: grub2-efi-x64-1:2.04-31.el8_4.1.x86_64,shim-x64-15-8.el8_1.x86_64
If an update is available, apply the update so that the changes take effect on the next reboot:
# bootupctl update
Example output
Updated: grub2-efi-x64-1:2.04-31.el8_4.1.x86_64,shim-x64-15-8.el8_1.x86_64
3.10.2. Updating the boot loader automatically via a machine config
Another way to automatically update the boot loader with bootupd is to create a systemd service unit that will update the boot loader as needed on every boot. This unit will run the bootupctl update command during the boot process and will be installed on the nodes via a machine config.
This configuration is not enabled by default because unexpected interruptions of the update operation may lead to unbootable nodes. If you enable this configuration, make sure to avoid interrupting nodes during the boot process while the boot loader update is in progress. The boot loader update operation generally completes quickly, so the risk is low.
Create a Butane config file, 99-worker-bootupctl-update.bu, including the contents of the bootupctl-update.service systemd unit.
Note: See "Creating machine configs with Butane" for information about Butane.
Example Butane config
variant: openshift
version: 4.16.0
metadata:
  name: 99-worker-bootupctl-update
  labels:
    machineconfiguration.openshift.io/role: worker
systemd:
  units:
  - name: bootupctl-update.service
    enabled: true
    contents: |
      [Unit]
      Description=Bootupd automatic update

      [Service]
      ExecStart=/usr/bin/bootupctl update
      RemainAfterExit=yes

      [Install]
      WantedBy=multi-user.target
Use Butane to generate a MachineConfig object file, 99-worker-bootupctl-update.yaml, containing the configuration to be delivered to the nodes:
$ butane 99-worker-bootupctl-update.bu -o 99-worker-bootupctl-update.yaml
Apply the configurations in one of two ways:
-
If the cluster is not running yet, after you generate manifest files, add the
MachineConfig
object file to the<installation_directory>/openshift
directory, and then continue to create the cluster. If the cluster is already running, apply the file:
$ oc apply -f ./99-worker-bootupctl-update.yaml
-
If the cluster is not running yet, after you generate manifest files, add the