Chapter 15. Image-based upgrade for single-node OpenShift clusters
15.1. Understanding the image-based upgrade for single-node OpenShift clusters
From OpenShift Container Platform 4.14.13, the Lifecycle Agent provides you with an alternative way to upgrade the platform version of a single-node OpenShift cluster. The image-based upgrade is faster than the standard upgrade method and allows you to directly upgrade from OpenShift Container Platform <4.y> to <4.y+2>, and <4.y.z> to <4.y.z+n>.
This upgrade method uses a generated OCI image from a dedicated seed cluster that is installed on the target single-node OpenShift cluster as a new ostree stateroot. A seed cluster is a single-node OpenShift cluster deployed with the target OpenShift Container Platform version, Day 2 Operators, and configurations that are common to all target clusters.
You can use the seed image, which is generated from the seed cluster, to upgrade the platform version on any single-node OpenShift cluster that has the same combination of hardware, Day 2 Operators, and cluster configuration as the seed cluster.
The image-based upgrade uses custom images that are specific to the hardware platform that the clusters are running on. Each different hardware platform requires a separate seed image.
The Lifecycle Agent uses two custom resources (CRs) on the participating clusters to orchestrate the upgrade:
- On the seed cluster, the SeedGenerator CR allows for the seed image generation. This CR specifies the repository to push the seed image to.
- On the target cluster, the ImageBasedUpgrade CR specifies the seed image for the upgrade of the target cluster and the backup configurations for your workloads.
Example SeedGenerator CR
apiVersion: lca.openshift.io/v1
kind: SeedGenerator
metadata:
  name: seedimage
spec:
  seedImage: <seed_image>
Example ImageBasedUpgrade CR
apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle 1
  seedImageRef: 2
    version: <target_version>
    image: <seed_container_image>
    pullSecretRef:
      name: <seed_pull_secret>
  autoRollbackOnFailure: {}
  # initMonitorTimeoutSeconds: 1800 3
  extraManifests: 4
  - name: example-extra-manifests
    namespace: openshift-lifecycle-agent
  oadpContent: 5
  - name: oadp-cm-example
    namespace: openshift-adp
- 1
- Defines the desired stage for the ImageBasedUpgrade CR. The value can be Idle, Prep, Upgrade, or Rollback.
- 2
- Defines the target platform version, the seed image to be used, and the secret required to access the image.
- 3
- (Optional) Specify the time frame in seconds after the first reboot within which the upgrade must complete. If the upgrade does not complete within this time frame, an automatic rollback is initiated. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
- 4
- (Optional) Specify the list of ConfigMap resources that contain your custom catalog sources to retain after the upgrade, and your extra manifests to apply to the target cluster that are not part of the seed image.
- 5
- Specify the list of ConfigMap resources that contain the OADP Backup and Restore CRs.
15.1.1. Stages of the image-based upgrade
After generating the seed image on the seed cluster, you can move through the stages on the target cluster by setting the spec.stage field in the ImageBasedUpgrade CR to one of the following values:
- Idle
- Prep
- Upgrade
- Rollback (Optional)
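For example, with the default CR name upgrade that the Lifecycle Agent creates, you can move to the Prep stage by patching the stage field. This is a minimal sketch that follows the patch pattern used elsewhere in this chapter:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade \
  -p='{"spec": {"stage": "Prep"}}' \
  --type=merge -n openshift-lifecycle-agent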
Figure 15.1. Stages of the image-based upgrade
15.1.1.1. Idle stage
The Lifecycle Agent creates an ImageBasedUpgrade CR set to stage: Idle when the Operator is first deployed. This is the default stage. There is no ongoing upgrade and the cluster is ready to move to the Prep stage.
Figure 15.2. Transition from Idle stage
You also move to the Idle stage to do one of the following steps:
- Finalize a successful upgrade
- Finalize a rollback
- Cancel an ongoing upgrade until the pre-pivot phase in the Upgrade stage
Moving to the Idle stage ensures that the Lifecycle Agent cleans up resources, so that the cluster is ready for upgrades again.
Figure 15.3. Transitions to Idle stage
If you are using RHACM and you cancel an upgrade, you must remove the import.open-cluster-management.io/disable-auto-import annotation from the target managed cluster to re-enable the automatic import of the cluster.
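A minimal sketch of removing that annotation, run against the hub cluster and assuming a managed cluster named sno-worker-example; the trailing hyphen in the oc annotate command removes the annotation:

$ oc annotate managedcluster sno-worker-example \
    import.open-cluster-management.io/disable-auto-import-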
15.1.1.2. Prep stage
You can complete this stage before a scheduled maintenance window.
For the Prep stage, you specify the following upgrade details in the ImageBasedUpgrade CR:
- seed image to use
- resources to back up
- extra manifests to apply and custom catalog sources to retain after the upgrade, if any
Then, based on what you specify, the Lifecycle Agent prepares for the upgrade without impacting the current running version. During this stage, the Lifecycle Agent ensures that the target cluster is ready to proceed to the Upgrade stage by checking if it meets certain conditions. The Operator pulls the seed image to the target cluster with additional container images specified in the seed image. The Lifecycle Agent checks if there is enough space on the container storage disk and, if necessary, the Operator deletes unpinned images until the disk usage is below the specified threshold. For more information about how to configure or disable the cleaning up of the container storage disk, see "Configuring the automatic image cleanup of the container storage disk".
You also prepare backup resources with the OADP Operator's Backup and Restore CRs. These CRs are used in the Upgrade stage to reconfigure the cluster, register the cluster with RHACM, and restore application artifacts.
In addition to the OADP Operator, the Lifecycle Agent uses the ostree versioning system to create a backup, which allows complete cluster reconfiguration after both upgrade and rollback.
After the Prep stage finishes, you can cancel the upgrade process by moving to the Idle stage, or you can start the upgrade by moving to the Upgrade stage in the ImageBasedUpgrade CR. If you cancel the upgrade, the Operator performs cleanup operations.
Figure 15.4. Transition from Prep stage
15.1.1.3. Upgrade stage
The Upgrade stage consists of two phases:
- pre-pivot
- Just before pivoting to the new stateroot, the Lifecycle Agent collects the required cluster-specific artifacts and stores them in the new stateroot. The backups of your cluster resources specified in the Prep stage are created on a compatible object storage solution. The Lifecycle Agent exports CRs specified in the extraManifests field in the ImageBasedUpgrade CR or the CRs described in the ZTP policies that are bound to the target cluster. After the pre-pivot phase has completed, the Lifecycle Agent sets the new stateroot deployment as the default boot entry and reboots the node.
- post-pivot
- After booting from the new stateroot, the Lifecycle Agent regenerates the seed image's cluster cryptography. This ensures that each single-node OpenShift cluster upgraded with the same seed image has unique and valid cryptographic objects. The Operator then reconfigures the cluster by applying cluster-specific artifacts that were collected in the pre-pivot phase. The Operator applies all saved CRs, and restores the backups.
After the upgrade has completed and you are satisfied with the changes, you can finalize the upgrade by moving to the Idle stage.
When you finalize the upgrade, you cannot roll back to the original release.
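For example, assuming the default CR name upgrade, finalizing the upgrade is a matter of patching the stage back to Idle:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade \
  -p='{"spec": {"stage": "Idle"}}' \
  --type=merge -n openshift-lifecycle-agent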
Figure 15.5. Transitions from Upgrade stage
If you want to cancel the upgrade, you can do so until the pre-pivot phase of the Upgrade stage. If you encounter issues after the upgrade, you can move to the Rollback stage for a manual rollback.
15.1.1.4. Rollback stage
The Rollback stage can be initiated manually or automatically upon failure. During the Rollback stage, the Lifecycle Agent sets the original ostree stateroot deployment as the default. Then, the node reboots with the previous release of OpenShift Container Platform and application configurations.
If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.
The Lifecycle Agent initiates an automatic rollback if the upgrade does not complete within a specified time limit. For more information about the automatic rollback, see the "Moving to the Rollback stage with Lifecycle Agent" or "Moving to the Rollback stage with Lifecycle Agent and GitOps ZTP" sections.
Figure 15.6. Transition from Rollback stage
Additional resources
- Configuring the automatic image cleanup of the container storage disk
- Performing an image-based upgrade for single-node OpenShift clusters with Lifecycle Agent
- Performing an image-based upgrade for single-node OpenShift clusters using GitOps ZTP
- Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent
- Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent and GitOps ZTP
15.1.2. Guidelines for the image-based upgrade
For a successful image-based upgrade, your deployments must meet certain requirements.
There are different deployment methods in which you can perform the image-based upgrade:
- GitOps ZTP
- You use the GitOps Zero Touch Provisioning (ZTP) to deploy and configure your clusters.
- Non-GitOps
- You manually deploy and configure your clusters.
You can perform an image-based upgrade in disconnected environments. For more information about how to mirror images for a disconnected environment, see "Mirroring images for a disconnected installation".
Additional resources
15.1.2.1. Minimum software version of components
Depending on your deployment method, the image-based upgrade requires the following minimum software versions.
Component | Software version | Required |
---|---|---|
Lifecycle Agent | 4.16 | Yes |
OADP Operator | 1.4.1 | Yes |
Managed cluster version | 4.14.13 | Yes |
Hub cluster version | 4.16 | No |
RHACM | 2.10.2 | No |
GitOps ZTP plugin | 4.16 | Only for GitOps ZTP deployment method |
Red Hat OpenShift GitOps | 1.12 | Only for GitOps ZTP deployment method |
Topology Aware Lifecycle Manager (TALM) | 4.16 | Only for GitOps ZTP deployment method |
Local Storage Operator [1] | 4.14 | Yes |
Logical Volume Manager (LVM) Storage [1] | 4.14.2 | Yes |
- The persistent storage must be provided by either the LVM Storage or the Local Storage Operator, not both.
15.1.2.2. Hub cluster guidelines
If you are using Red Hat Advanced Cluster Management (RHACM), your hub cluster needs to meet the following conditions:
- To avoid including any RHACM resources in your seed image, you need to disable all optional RHACM add-ons before generating the seed image.
- Your hub cluster must be upgraded to at least the target version before performing an image-based upgrade on a target single-node OpenShift cluster.
15.1.2.3. Seed image guidelines
The seed image targets a set of single-node OpenShift clusters with the same hardware and similar configuration. This means that the seed cluster must match the configuration of the target clusters for the following items:
- CPU topology
  - Number of CPU cores
  - Tuned performance configuration, such as number of reserved CPUs
- MachineConfig resources for the target cluster
- IP version
  Note: Dual-stack networking is not supported in this release.
- Set of Day 2 Operators, including the Lifecycle Agent and the OADP Operator
- Disconnected registry
- FIPS configuration
The following configurations only have to partially match on the participating clusters:
- If the target cluster has a proxy configuration, the seed cluster must have a proxy configuration too but the configuration does not have to be the same.
- A dedicated partition on the primary disk for container storage is required on all participating clusters. However, the size and start of the partition do not have to be the same. Only the spec.config.storage.disks.partitions.label: varlibcontainers label in the MachineConfig CR must match on both the seed and target clusters. For more information about how to create the disk partition, see "Configuring a shared container partition between ostree stateroots" or "Configuring a shared container partition between ostree stateroots when using GitOps ZTP".
For more information about what to include in the seed image, see "Seed image configuration" and "Seed image configuration using the RAN DU profile".
15.1.2.4. OADP backup and restore guidelines
With the OADP Operator, you can back up and restore your applications on your target clusters by using Backup and Restore CRs wrapped in ConfigMap objects. The applications must work on both the current and the target OpenShift Container Platform versions so that they can be restored after the upgrade. The backups must include resources that were initially created.
The following resources must be excluded from the backup:
- pods
- endpoints
- controllerrevision
- podmetrics
- packagemanifest
- replicaset
- localvolume, if using Local Storage Operator (LSO)
There are two local storage implementations for single-node OpenShift:
- Local Storage Operator (LSO)
- The Lifecycle Agent automatically backs up and restores the required artifacts, including localvolume resources and their associated StorageClass resources. You must exclude the persistentvolumes resource in the application Backup CR.
- LVM Storage
- You must create the Backup and Restore CRs for LVM Storage artifacts. You must include the persistentVolumes resource in the application Backup CR.
For the image-based upgrade, only one of these storage Operators is supported on a given target cluster.
For both Operators, you must not apply the Operator CRs as extra manifests through the ImageBasedUpgrade CR.
The persistent volume contents are preserved and used after the pivot. When you are configuring the DataProtectionApplication CR, you must ensure that the .spec.configuration.restic.enable field is set to false for an image-based upgrade. This disables Container Storage Interface integration.
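A minimal fragment of a DataProtectionApplication CR that illustrates this setting; the full CR, including backup locations and plugins, is shown later in this chapter:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dataprotectionapplication
  namespace: openshift-adp
spec:
  configuration:
    restic:
      enable: false # required for the image-based upgrade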
15.1.2.4.1. lca.openshift.io/apply-wave guidelines
The lca.openshift.io/apply-wave annotation determines the apply order of Backup or Restore CRs. The value of the annotation must be a string number. If you define the lca.openshift.io/apply-wave annotation in the Backup or Restore CRs, they are applied in increasing order based on the annotation value. If you do not define the annotation, they are applied together.
The lca.openshift.io/apply-wave annotation value must be numerically lower in your platform Restore CRs, for example RHACM and LVM Storage artifacts, than in your application Restore CRs. This way, the platform artifacts are restored before your applications.
If your application includes cluster-scoped resources, you must create separate Backup and Restore CRs to scope the backup to the specific cluster-scoped resources created by the application. The Restore CR for the cluster-scoped resources must be restored before the remaining application Restore CR(s).
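A sketch of the resulting ordering, using the wave values that appear in the examples later in this chapter: platform artifacts at waves 1 and 2, cluster-scoped application resources at wave 3, and namespace-scoped application resources at wave 4:

# Restored first: platform artifacts, for example RHACM
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: acm-klusterlet
  namespace: openshift-adp
  annotations:
    lca.openshift.io/apply-wave: "1"
spec:
  backupName: acm-klusterlet
---
# Restored last: namespace-scoped application artifacts
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app
  namespace: openshift-adp
  annotations:
    lca.openshift.io/apply-wave: "4"
spec:
  backupName: backup-app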
15.1.2.4.2. lca.openshift.io/apply-label guidelines
You can back up specific resources exclusively with the lca.openshift.io/apply-label annotation. Based on which resources you define in the annotation, the Lifecycle Agent applies the lca.openshift.io/backup: <backup_name> label and adds the labelSelector.matchLabels.lca.openshift.io/backup: <backup_name> label selector to the specified resources when creating the Backup CRs.
To use the lca.openshift.io/apply-label annotation for backing up specific resources, the resources listed in the annotation must also be included in the spec section. If the lca.openshift.io/apply-label annotation is used in the Backup CR, only the resources listed in the annotation are backed up, regardless of whether other resource types are specified in the spec section.
Example CR
apiVersion: velero.io/v1
kind: Backup
metadata:
name: acm-klusterlet
namespace: openshift-adp
annotations:
lca.openshift.io/apply-label: rbac.authorization.k8s.io/v1/clusterroles/klusterlet,apps/v1/deployments/open-cluster-management-agent/klusterlet 1
labels:
velero.io/storage-location: default
spec:
includedNamespaces:
- open-cluster-management-agent
includedClusterScopedResources:
- clusterroles
includedNamespaceScopedResources:
- deployments
- 1
- The value must be a list of comma-separated objects in group/version/resource/name format for cluster-scoped resources or group/version/resource/namespace/name format for namespace-scoped resources, and it must be attached to the related Backup CR.
15.1.2.5. Extra manifest guidelines
The Lifecycle Agent uses extra manifests to restore your target clusters after rebooting with the new stateroot deployment and before restoring application artifacts.
Different deployment methods require a different way to apply the extra manifests:
- GitOps ZTP
- You use the lca.openshift.io/target-ocp-version: <target_ocp_version> label to mark the extra manifests that the Lifecycle Agent must extract and apply after the pivot. You can specify the number of manifests labeled with lca.openshift.io/target-ocp-version by using the lca.openshift.io/target-ocp-version-manifest-count annotation in the ImageBasedUpgrade CR. If specified, the Lifecycle Agent verifies that the number of manifests extracted from policies matches the number provided in the annotation during the prep and upgrade stages.
Example for the lca.openshift.io/target-ocp-version-manifest-count annotation
apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  annotations:
    lca.openshift.io/target-ocp-version-manifest-count: "5"
  name: upgrade
- Non-GitOps
- You mark your extra manifests with the lca.openshift.io/apply-wave annotation to determine the apply order. The labeled extra manifests are wrapped in ConfigMap objects and referenced in the ImageBasedUpgrade CR that the Lifecycle Agent uses after the pivot.
If the target cluster uses custom catalog sources, you must include them as extra manifests that point to the correct release version.
You cannot apply the following items as extra manifests:
- MachineConfig objects
- OLM Operator subscriptions
Additional resources
- Performing an image-based upgrade for single-node OpenShift clusters with Lifecycle Agent
- Performing an image-based upgrade for single-node OpenShift clusters using GitOps ZTP
- Preparing the hub cluster for ZTP
- Creating ConfigMap objects for the image-based upgrade with Lifecycle Agent
- Creating ConfigMap objects for the image-based upgrade with GitOps ZTP
- About installing OADP
15.2. Preparing for an image-based upgrade for single-node OpenShift clusters
15.2.2. Installing Operators for the image-based upgrade
Prepare your clusters for the upgrade by installing the Lifecycle Agent and the OADP Operator.
To install the OADP Operator with the non-GitOps method, see "Installing the OADP Operator".
Additional resources
15.2.2.1. Installing the Lifecycle Agent by using the CLI
You can use the OpenShift CLI (oc) to install the Lifecycle Agent.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a Namespace object YAML file for the Lifecycle Agent, for example lcao-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-lifecycle-agent
  annotations:
    workload.openshift.io/allowed: management
Create the Namespace CR by running the following command:
$ oc create -f lcao-namespace.yaml
Create an OperatorGroup object YAML file for the Lifecycle Agent, for example lcao-operatorgroup.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-lifecycle-agent
  namespace: openshift-lifecycle-agent
spec:
  targetNamespaces:
  - openshift-lifecycle-agent
Create the OperatorGroup CR by running the following command:
$ oc create -f lcao-operatorgroup.yaml
Create a Subscription CR, for example, lcao-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-lifecycle-agent-subscription
  namespace: openshift-lifecycle-agent
spec:
  channel: "stable"
  name: lifecycle-agent
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Create the Subscription CR by running the following command:
$ oc create -f lcao-subscription.yaml
Verification
To verify that the installation succeeded, inspect the CSV resource by running the following command:
$ oc get csv -n openshift-lifecycle-agent
Example output
NAME                      DISPLAY                     VERSION   REPLACES   PHASE
lifecycle-agent.v4.16.0   Openshift Lifecycle Agent   4.16.0               Succeeded
Verify that the Lifecycle Agent is up and running by running the following command:
$ oc get deploy -n openshift-lifecycle-agent
Example output
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
lifecycle-agent-controller-manager   1/1     1            1           14s
15.2.2.2. Installing the Lifecycle Agent by using the web console
You can use the OpenShift Container Platform web console to install the Lifecycle Agent.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
- Search for the Lifecycle Agent from the list of available Operators, and then click Install.
- On the Install Operator page, under A specific namespace on the cluster, select openshift-lifecycle-agent.
- Click Install.
Verification
To confirm that the installation is successful:
- Click Operators → Installed Operators.
  Ensure that the Lifecycle Agent is listed in the openshift-lifecycle-agent project with a Status of InstallSucceeded.
  Note: During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator is not installed successfully:
- Click Operators → Installed Operators, and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Click Workloads → Pods, and check the logs for pods in the openshift-lifecycle-agent project.
15.2.2.3. Installing the Lifecycle Agent with GitOps ZTP
Install the Lifecycle Agent with GitOps Zero Touch Provisioning (ZTP) to do an image-based upgrade.
Procedure
Extract the following CRs from the ztp-site-generate container image and push them to the source-crs directory:
Example LcaSubscriptionNS.yaml file
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-lifecycle-agent
  annotations:
    workload.openshift.io/allowed: management
    ran.openshift.io/ztp-deploy-wave: "2"
  labels:
    kubernetes.io/metadata.name: openshift-lifecycle-agent
Example LcaSubscriptionOperGroup.yaml file
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: lifecycle-agent-operatorgroup
  namespace: openshift-lifecycle-agent
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"
spec:
  targetNamespaces:
  - openshift-lifecycle-agent
Example LcaSubscription.yaml file
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: lifecycle-agent
  namespace: openshift-lifecycle-agent
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"
spec:
  channel: "stable"
  name: lifecycle-agent
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown
Example directory structure
├── kustomization.yaml
├── sno
│   ├── example-cnf.yaml
│   ├── common-ranGen.yaml
│   ├── group-du-sno-ranGen.yaml
│   ├── group-du-sno-validator-ranGen.yaml
│   └── ns.yaml
├── source-crs
│   ├── LcaSubscriptionNS.yaml
│   ├── LcaSubscriptionOperGroup.yaml
│   ├── LcaSubscription.yaml
Add the CRs to your common PolicyGenTemplate:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "example-common-latest"
  namespace: "ztp-common"
spec:
  bindingRules:
    common: "true"
    du-profile: "latest"
  sourceFiles:
    - fileName: LcaSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: LcaSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: LcaSubscription.yaml
      policyName: "subscriptions-policy"
[...]
15.2.2.4. Installing and configuring the OADP Operator with GitOps ZTP
Install and configure the OADP Operator with GitOps ZTP before starting the upgrade.
Procedure
Extract the following CRs from the ztp-site-generate container image and push them to the source-crs directory:
Example OadpSubscriptionNS.yaml file
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"
  labels:
    kubernetes.io/metadata.name: openshift-adp
Example OadpSubscriptionOperGroup.yaml file
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: redhat-oadp-operator
  namespace: openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"
spec:
  targetNamespaces:
  - openshift-adp
Example OadpSubscription.yaml file
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: redhat-oadp-operator
  namespace: openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"
spec:
  channel: stable-1.4
  name: redhat-oadp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown
Example OadpOperatorStatus.yaml file
apiVersion: operators.coreos.com/v1
kind: Operator
metadata:
  name: redhat-oadp-operator.openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"
status:
  components:
    refs:
    - kind: Subscription
      namespace: openshift-adp
      conditions:
      - type: CatalogSourcesUnhealthy
        status: "False"
    - kind: InstallPlan
      namespace: openshift-adp
      conditions:
      - type: Installed
        status: "True"
    - kind: ClusterServiceVersion
      namespace: openshift-adp
      conditions:
      - type: Succeeded
        status: "True"
        reason: InstallSucceeded
Example directory structure
├── kustomization.yaml
├── sno
│   ├── example-cnf.yaml
│   ├── common-ranGen.yaml
│   ├── group-du-sno-ranGen.yaml
│   ├── group-du-sno-validator-ranGen.yaml
│   └── ns.yaml
├── source-crs
│   ├── OadpSubscriptionNS.yaml
│   ├── OadpSubscriptionOperGroup.yaml
│   ├── OadpSubscription.yaml
│   ├── OadpOperatorStatus.yaml
Add the CRs to your common PolicyGenTemplate:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "example-common-latest"
  namespace: "ztp-common"
spec:
  bindingRules:
    common: "true"
    du-profile: "latest"
  sourceFiles:
    - fileName: OadpSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: OadpSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: OadpSubscription.yaml
      policyName: "subscriptions-policy"
    - fileName: OadpOperatorStatus.yaml
      policyName: "subscriptions-policy"
[...]
Create the DataProtectionApplication CR and the S3 secret only for the target cluster:
Extract the following CRs from the ztp-site-generate container image and push them to the source-crs directory:
Example DataProtectionApplication.yaml file
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dataprotectionapplication
  namespace: openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "100"
spec:
  configuration:
    restic:
      enable: false 1
    velero:
      defaultPlugins:
        - aws
        - openshift
      resourceTimeout: 10m
  backupLocations:
    - velero:
        config:
          profile: "default"
          region: minio
          s3Url: $url
          insecureSkipTLSVerify: "true"
          s3ForcePathStyle: "true"
        provider: aws
        default: true
        credential:
          key: cloud
          name: cloud-credentials
        objectStorage:
          bucket: $bucketName 2
          prefix: $prefixName 3
status:
  conditions:
  - reason: Complete
    status: "True"
    type: Reconciled
- 1
- The spec.configuration.restic.enable field must be set to false for an image-based upgrade because persistent volume contents are retained and reused after the upgrade.
- 2 3
- The bucket defines the bucket name that is created in the S3 backend. The prefix defines the name of the subdirectory that is automatically created in the bucket. The combination of bucket and prefix must be unique for each target cluster to avoid interference between them. To ensure a unique storage directory for each target cluster, you can use the RHACM hub template function, for example, prefix: {{hub .ManagedClusterName hub}}.
Example OadpSecret.yaml file
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "100"
type: Opaque
Example OadpBackupStorageLocationStatus.yaml file
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  namespace: openshift-adp
  annotations:
    ran.openshift.io/ztp-deploy-wave: "100"
status:
  phase: Available
The OadpBackupStorageLocationStatus.yaml CR verifies the availability of backup storage locations created by OADP.
Add the CRs to your site PolicyGenTemplate with overrides:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "example-cnf"
  namespace: "ztp-site"
spec:
  bindingRules:
    sites: "example-cnf"
    du-profile: "latest"
  mcp: "master"
  sourceFiles:
    ...
    - fileName: OadpSecret.yaml
      policyName: "config-policy"
      data:
        cloud: <your_credentials> 1
    - fileName: DataProtectionApplication.yaml
      policyName: "config-policy"
      spec:
        backupLocations:
          - velero:
              config:
                region: minio
                s3Url: <your_S3_URL> 2
                profile: "default"
                insecureSkipTLSVerify: "true"
                s3ForcePathStyle: "true"
              provider: aws
              default: true
              credential:
                key: cloud
                name: cloud-credentials
              objectStorage:
                bucket: <your_bucket_name> 3
                prefix: <cluster_name> 4
    - fileName: OadpBackupStorageLocationStatus.yaml
      policyName: "config-policy"
- 1
- Specify your credentials for your S3 storage backend.
- 2
- Specify the URL for your S3-compatible bucket.
- 3 4
- The bucket defines the bucket name that is created in the S3 backend. The prefix defines the name of the subdirectory that is automatically created in the bucket. The combination of bucket and prefix must be unique for each target cluster to avoid interference between them. To ensure a unique storage directory for each target cluster, you can use the RHACM hub template function, for example, prefix: {{hub .ManagedClusterName hub}}.
15.2.3. Generating a seed image for the image-based upgrade with the Lifecycle Agent
Use the Lifecycle Agent to generate the seed image with the SeedGenerator custom resource (CR).
15.2.3.1. Seed image configuration
The seed image targets a set of single-node OpenShift clusters with the same hardware and similar configuration. This means that the seed image must have all of the components and configuration that the seed cluster shares with the target clusters. Therefore, the seed image generated from the seed cluster cannot contain any cluster-specific configuration.
The following table lists the components, resources, and configurations that you must and must not include in your seed image:
Cluster configuration | Include in seed image |
---|---|
Performance profile | Yes |
MachineConfig resources for the target cluster | Yes |
IP version [1] | Yes |
Set of Day 2 Operators, including the Lifecycle Agent and the OADP Operator | Yes |
Disconnected registry configuration [2] | Yes |
Valid proxy configuration [3] | Yes |
FIPS configuration | Yes |
Dedicated partition on the primary disk for container storage that matches the size of the target clusters | Yes |
Local volumes | No |
OADP | No |
- Dual-stack networking is not supported in this release.
- If the seed cluster is installed in a disconnected environment, the target clusters must also be installed in a disconnected environment.
- The proxy configuration does not have to be the same.
15.2.3.1.1. Seed image configuration using the RAN DU profile
The following table lists the components, resources, and configurations that you must and must not include in the seed image when using the RAN DU profile:
Resource | Include in seed image |
---|---|
All extra manifests that are applied as part of Day 0 installation | Yes |
All Day 2 Operator subscriptions | Yes |
| Yes |
| Yes |
| Yes |
| Yes |
| Yes |
| Yes |
|
No, if it is used in |
| No |
| No |
Resource | Apply as extra manifest |
---|---|
| Yes |
| Yes |
| Yes |
| Yes |
| Yes |
| If the interfaces of the target cluster are common with the seed cluster, you can include them in the seed image. Otherwise, apply it as extra manifests. |
| If the configuration, including namespaces, is exactly the same on both the seed and target cluster, you can include them in the seed image. Otherwise, apply them as extra manifests. |
15.2.3.2. Generating a seed image with the Lifecycle Agent
Use the Lifecycle Agent to generate the seed image with the SeedGenerator CR. The Operator checks for required system configurations, performs any necessary system cleanup before generating the seed image, and launches the image generation. The seed image generation includes the following tasks:
- Stopping cluster Operators
- Preparing the seed image configuration
- Generating and pushing the seed image to the image repository specified in the SeedGenerator CR
- Restoring cluster Operators
- Expiring seed cluster certificates
- Generating new certificates for the seed cluster
- Restoring and updating the SeedGenerator CR on the seed cluster
Prerequisites
- You have configured a shared container directory on the seed cluster.
- You have installed the minimum version of the OADP Operator and the Lifecycle Agent on the seed cluster.
- Ensure that persistent volumes are not configured on the seed cluster.
- Ensure that the LocalVolume CR does not exist on the seed cluster if the Local Storage Operator is used.
- Ensure that the LVMCluster CR does not exist on the seed cluster if LVM Storage is used.
- Ensure that the DataProtectionApplication CR does not exist on the seed cluster if OADP is used.
Procedure
Detach the cluster from the hub to delete any RHACM-specific resources from the seed cluster that must not be in the seed image:
Manually detach the seed cluster by running the following command:
$ oc delete managedcluster sno-worker-example
- Wait until the ManagedCluster CR is removed. After the CR is removed, create the proper SeedGenerator CR. The Lifecycle Agent cleans up the RHACM artifacts.
If you are using GitOps ZTP, detach your cluster by removing the seed cluster's SiteConfig CR from the kustomization.yaml file.
If you have a kustomization.yaml file that references multiple SiteConfig CRs, remove your seed cluster's SiteConfig CR from the kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
#- example-seed-sno1.yaml
- example-target-sno2.yaml
- example-target-sno3.yaml
If you have a kustomization.yaml file that references one SiteConfig CR, remove your seed cluster's SiteConfig CR from the kustomization.yaml and add the generators: {} line:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators: {}
Commit the kustomization.yaml changes in your Git repository and push the changes to your repository.
The ArgoCD pipeline detects the changes and removes the managed cluster.
Create the Secret object so that you can push the seed image to your registry.
Create the authentication file by running the following commands:
$ MY_USER=myuserid
$ AUTHFILE=/tmp/my-auth.json
$ podman login --authfile ${AUTHFILE} -u ${MY_USER} quay.io/${MY_USER}
$ base64 -w 0 ${AUTHFILE} ; echo
Copy the output into the seedAuth field in the Secret YAML file named seedgen in the openshift-lifecycle-agent namespace:
apiVersion: v1
kind: Secret
metadata:
  name: seedgen 1
  namespace: openshift-lifecycle-agent
type: Opaque
data:
  seedAuth: <encoded_AUTHFILE> 2
- 1
- The Secret resource must use this name and namespace so that the Lifecycle Agent can find it.
- 2
- Specify the base64-encoded authentication file that you generated in the previous step.
Apply the Secret by running the following command:
$ oc apply -f secretseedgenerator.yaml
Create the SeedGenerator CR:
apiVersion: lca.openshift.io/v1
kind: SeedGenerator
metadata:
  name: seedimage 1
spec:
  seedImage: <seed_container_image> 2
- 1
- The SeedGenerator CR must be named seedimage.
- 2
- Specify the registry repository and tag to push the seed container image to.
Generate the seed image by running the following command:
$ oc apply -f seedgenerator.yaml
Important: The cluster reboots and loses API capabilities while the Lifecycle Agent generates the seed image. Applying the SeedGenerator CR stops the kubelet and the CRI-O operations, then it starts the image generation.
If you want to generate more seed images, you must provision a new seed cluster with the version that you want to generate a seed image from.
Verification
After the cluster recovers and it is available, you can check the status of the SeedGenerator CR by running the following command:
$ oc get seedgenerator -o yaml
Example output
status:
  conditions:
  - lastTransitionTime: "2024-02-13T21:24:26Z"
    message: Seed Generation completed
    observedGeneration: 1
    reason: Completed
    status: "False"
    type: SeedGenInProgress
  - lastTransitionTime: "2024-02-13T21:24:26Z"
    message: Seed Generation completed
    observedGeneration: 1
    reason: Completed
    status: "True"
    type: SeedGenCompleted 1
  observedGeneration: 1
- 1
- The seed image generation is complete.
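As a quick check once the API is available again, you can query the SeedGenCompleted condition directly; this jsonpath expression is a convenience sketch, assuming the CR is named seedimage:

$ oc get seedgenerator seedimage \
    -o jsonpath='{.status.conditions[?(@.type=="SeedGenCompleted")].status}'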
15.2.4. Creating ConfigMap objects for the image-based upgrade with the Lifecycle Agent
The Lifecycle Agent needs all your OADP resources, extra manifests, and custom catalog sources wrapped in a ConfigMap object to process them for the image-based upgrade.
15.2.4.1. Creating OADP ConfigMap objects for the image-based upgrade with Lifecycle Agent
Create your OADP resources that are used to back up and restore your resources during the upgrade.
Prerequisites
- Generate a seed image from a compatible seed cluster.
- Create OADP backup and restore resources.
- Create a separate partition on the target cluster for the container images that is shared between stateroots. For more information, see "Configuring a shared container partition for the image-based upgrade".
- Deploy a version of Lifecycle Agent that is compatible with the version used with the seed image.
- Install the OADP Operator, the DataProtectionApplication CR, and its secret on the target cluster.
- Create an S3-compatible storage solution and a ready-to-use bucket with proper credentials configured. For more information, see "About installing OADP".
Procedure
Create the OADP Backup and Restore CRs for platform artifacts in the same namespace where the OADP Operator is installed, which is openshift-adp.
If the target cluster is managed by RHACM, add the following YAML file for backing up and restoring RHACM artifacts:
PlatformBackupRestore.yaml for RHACM
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: acm-klusterlet
  annotations:
    lca.openshift.io/apply-label: "apps/v1/deployments/open-cluster-management-agent/klusterlet,v1/secrets/open-cluster-management-agent/bootstrap-hub-kubeconfig,rbac.authorization.k8s.io/v1/clusterroles/klusterlet,v1/serviceaccounts/open-cluster-management-agent/klusterlet,scheduling.k8s.io/v1/priorityclasses/klusterlet-critical,rbac.authorization.k8s.io/v1/clusterroles/open-cluster-management:klusterlet-admin-aggregate-clusterrole,rbac.authorization.k8s.io/v1/clusterrolebindings/klusterlet,operator.open-cluster-management.io/v1/klusterlets/klusterlet,apiextensions.k8s.io/v1/customresourcedefinitions/klusterlets.operator.open-cluster-management.io,v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials" 1
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  includedNamespaces:
  - open-cluster-management-agent
  includedClusterScopedResources:
  - klusterlets.operator.open-cluster-management.io
  - clusterroles.rbac.authorization.k8s.io
  - clusterrolebindings.rbac.authorization.k8s.io
  - priorityclasses.scheduling.k8s.io
  includedNamespaceScopedResources:
  - deployments
  - serviceaccounts
  - secrets
  excludedNamespaceScopedResources: []
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: acm-klusterlet
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "1"
spec:
  backupName: acm-klusterlet
- 1
- If your multiclusterHub CR does not have .spec.imagePullSecret defined and the secret does not exist in the open-cluster-management-agent namespace on your hub cluster, remove v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials.
If you created persistent volumes on your cluster through LVM Storage, add the following YAML file for LVM Storage artifacts:
PlatformBackupRestoreLvms.yaml for LVM Storage
apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: lvmcluster
  namespace: openshift-adp
spec:
  includedNamespaces:
  - openshift-storage
  includedNamespaceScopedResources:
  - lvmclusters
  - lvmvolumegroups
  - lvmvolumegroupnodestatuses
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: lvmcluster
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "2" 1
spec:
  backupName: lvmcluster
- 1
- The lca.openshift.io/apply-wave value must be lower than the values specified in the application Restore CRs.
If you need to restore applications after the upgrade, create the OADP Backup and Restore CRs for your application in the openshift-adp namespace.
Create the OADP CRs for cluster-scoped application artifacts in the openshift-adp namespace.
Example OADP CRs for cluster-scoped application artifacts for LSO and LVM Storage
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    lca.openshift.io/apply-label: "apiextensions.k8s.io/v1/customresourcedefinitions/test.example.com,security.openshift.io/v1/securitycontextconstraints/test,rbac.authorization.k8s.io/v1/clusterroles/test-role,rbac.authorization.k8s.io/v1/clusterrolebindings/system:openshift:scc:test" 1
  name: backup-app-cluster-resources
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  includedClusterScopedResources:
  - customresourcedefinitions
  - securitycontextconstraints
  - clusterrolebindings
  - clusterroles
  excludedClusterScopedResources:
  - Namespace
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app-cluster-resources
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "3" 2
spec:
  backupName: backup-app-cluster-resources
Create the OADP CRs for your namespace-scoped application artifacts.
Example OADP CRs for namespace-scoped application artifacts when LSO is used
apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: backup-app
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  includedNamespaceScopedResources:
  - secrets
  - persistentvolumeclaims
  - deployments
  - statefulsets
  - configmaps
  - cronjobs
  - services
  - job
  - poddisruptionbudgets
  - <application_custom_resources> 1
  excludedClusterScopedResources:
  - persistentVolumes
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "4"
spec:
  backupName: backup-app
- 1
- Define custom resources for your application.
Example OADP CRs for namespace-scoped application artifacts when LVM Storage is used
apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: backup-app
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  includedNamespaceScopedResources:
  - secrets
  - persistentvolumeclaims
  - deployments
  - statefulsets
  - configmaps
  - cronjobs
  - services
  - job
  - poddisruptionbudgets
  - <application_custom_resources> 1
  includedClusterScopedResources:
  - persistentVolumes 2
  - logicalvolumes.topolvm.io 3
  - volumesnapshotcontents 4
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "4"
spec:
  backupName: backup-app
  restorePVs: true
  restoreStatus:
    includedResources:
    - logicalvolumes 5
Important: The same version of the applications must function on both the current and the target release of OpenShift Container Platform.
Create the ConfigMap object for your OADP CRs by running the following command:
$ oc create configmap oadp-cm-example --from-file=example-oadp-resources.yaml=<path_to_oadp_crs> -n openshift-adp
Patch the ImageBasedUpgrade CR by running the following command:
$ oc patch imagebasedupgrades.lca.openshift.io upgrade \
  -p='{"spec": {"oadpContent": [{"name": "oadp-cm-example", "namespace": "openshift-adp"}]}}' \
  --type=merge -n openshift-lifecycle-agent
Additional resources
15.2.4.2. Creating ConfigMap objects of extra manifests for the image-based upgrade with Lifecycle Agent
Create additional manifests that you want to apply to the target cluster.
Procedure
Create a YAML file that contains your extra manifests, such as SR-IOV.
Example SR-IOV resources
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: "example-sriov-node-policy"
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    pfNames: [ens1f0]
  nodeSelector:
    node-role.kubernetes.io/master: ""
  mtu: 1500
  numVfs: 8
  priority: 99
  resourceName: example-sriov-node-policy
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: "example-sriov-network"
  namespace: openshift-sriov-network-operator
spec:
  ipam: |-
    {
    }
  linkState: auto
  networkNamespace: sriov-namespace
  resourceName: example-sriov-node-policy
  spoofChk: "on"
  trust: "off"
Create the ConfigMap object by running the following command:
$ oc create configmap example-extra-manifests-cm --from-file=example-extra-manifests.yaml=<path_to_extramanifest> -n openshift-lifecycle-agent
Patch the ImageBasedUpgrade CR by running the following command:
$ oc patch imagebasedupgrades.lca.openshift.io upgrade \
  -p='{"spec": {"extraManifests": [{"name": "example-extra-manifests-cm", "namespace": "openshift-lifecycle-agent"}]}}' \
  --type=merge -n openshift-lifecycle-agent
15.2.4.3. Creating ConfigMap objects of custom catalog sources for the image-based upgrade with Lifecycle Agent
You can keep your custom catalog sources after the upgrade by generating a ConfigMap object for your catalog sources and adding them to the spec.extraManifests field in the ImageBasedUpgrade CR. For more information about catalog sources, see "Catalog source".
Procedure
Create a YAML file that contains the CatalogSource CR:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: example-catalogsources
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  displayName: disconnected-redhat-operators
  image: quay.io/example-org/example-catalog:v1
Create the ConfigMap object by running the following command:
$ oc create configmap example-catalogsources-cm --from-file=example-catalogsources.yaml=<path_to_catalogsource_cr> -n openshift-lifecycle-agent
Patch the ImageBasedUpgrade CR by running the following command:
$ oc patch imagebasedupgrades.lca.openshift.io upgrade \
  -p='{"spec": {"extraManifests": [{"name": "example-catalogsources-cm", "namespace": "openshift-lifecycle-agent"}]}}' \
  --type=merge -n openshift-lifecycle-agent
15.2.5. Creating ConfigMap objects for the image-based upgrade with the Lifecycle Agent using GitOps ZTP
Create your OADP resources, extra manifests, and custom catalog sources wrapped in a ConfigMap object to prepare for the image-based upgrade.
15.2.5.1. Creating OADP resources for the image-based upgrade with GitOps ZTP
Prepare your OADP resources to restore your application after an upgrade.
Prerequisites
- Provision one or more managed clusters with GitOps ZTP.
- Log in as a user with cluster-admin privileges.
- Generate a seed image from a compatible seed cluster.
- Create a separate partition on the target cluster for the container images that is shared between stateroots. For more information, see "Configuring a shared container partition between ostree stateroots when using GitOps ZTP".
- Deploy a version of Lifecycle Agent that is compatible with the version used with the seed image.
- Install the OADP Operator, the DataProtectionApplication CR, and its secret on the target cluster.
- Create an S3-compatible storage solution and a ready-to-use bucket with proper credentials configured. For more information, see "Installing and configuring the OADP Operator with GitOps ZTP".
Procedure
Ensure that the Git repository that you use with the ArgoCD policies application contains the following directory structure:
├── source-crs/
│   ├── ibu/
│   │   ├── ImageBasedUpgrade.yaml
│   │   ├── PlatformBackupRestore.yaml
│   │   ├── PlatformBackupRestoreLvms.yaml
├── ...
├── ibu-upgrade-ranGen.yaml
├── kustomization.yaml
Important: The kustomization.yaml file must be located in the same directory structure as previously shown to reference the ibu-upgrade-ranGen.yaml manifest.
The source-crs/ibu/PlatformBackupRestore.yaml file is provided in the ZTP container image.
PlatformBackupRestore.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: acm-klusterlet
  annotations:
    lca.openshift.io/apply-label: "apps/v1/deployments/open-cluster-management-agent/klusterlet,v1/secrets/open-cluster-management-agent/bootstrap-hub-kubeconfig,rbac.authorization.k8s.io/v1/clusterroles/klusterlet,v1/serviceaccounts/open-cluster-management-agent/klusterlet,scheduling.k8s.io/v1/priorityclasses/klusterlet-critical,rbac.authorization.k8s.io/v1/clusterroles/open-cluster-management:klusterlet-admin-aggregate-clusterrole,rbac.authorization.k8s.io/v1/clusterrolebindings/klusterlet,operator.open-cluster-management.io/v1/klusterlets/klusterlet,apiextensions.k8s.io/v1/customresourcedefinitions/klusterlets.operator.open-cluster-management.io,v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials" 1
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  includedNamespaces:
  - open-cluster-management-agent
  includedClusterScopedResources:
  - klusterlets.operator.open-cluster-management.io
  - clusterroles.rbac.authorization.k8s.io
  - clusterrolebindings.rbac.authorization.k8s.io
  - priorityclasses.scheduling.k8s.io
  includedNamespaceScopedResources:
  - deployments
  - serviceaccounts
  - secrets
  excludedNamespaceScopedResources: []
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: acm-klusterlet
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "1"
spec:
  backupName: acm-klusterlet
- 1
- If your multiclusterHub CR does not have .spec.imagePullSecret defined and the secret does not exist in the open-cluster-management-agent namespace on your hub cluster, remove v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials.
If you use LVM Storage to create persistent volumes, you can use the source-crs/ibu/PlatformBackupRestoreLvms.yaml file provided in the ZTP container image to back up your LVM Storage resources.
PlatformBackupRestoreLvms.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: lvmcluster
  namespace: openshift-adp
spec:
  includedNamespaces:
  - openshift-storage
  includedNamespaceScopedResources:
  - lvmclusters
  - lvmvolumegroups
  - lvmvolumegroupnodestatuses
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: lvmcluster
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "2" 1
spec:
  backupName: lvmcluster
- 1
- The lca.openshift.io/apply-wave value must be lower than the values specified in the application Restore CRs.
If you need to restore applications after the upgrade, create the OADP Backup and Restore CRs for your application in the openshift-adp namespace:
Create the OADP CRs for cluster-scoped application artifacts in the openshift-adp namespace:
Example OADP CRs for cluster-scoped application artifacts for LSO and LVM Storage
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    lca.openshift.io/apply-label: "apiextensions.k8s.io/v1/customresourcedefinitions/test.example.com,security.openshift.io/v1/securitycontextconstraints/test,rbac.authorization.k8s.io/v1/clusterroles/test-role,rbac.authorization.k8s.io/v1/clusterrolebindings/system:openshift:scc:test" 1
  name: backup-app-cluster-resources
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  includedClusterScopedResources:
  - customresourcedefinitions
  - securitycontextconstraints
  - clusterrolebindings
  - clusterroles
  excludedClusterScopedResources:
  - Namespace
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app-cluster-resources
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "3" 2
spec:
  backupName: backup-app-cluster-resources
Create the OADP CRs for your namespace-scoped application artifacts in the source-crs/custom-crs directory:
Example OADP CRs for namespace-scoped application artifacts when LSO is used
apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: backup-app
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  includedNamespaceScopedResources:
  - secrets
  - persistentvolumeclaims
  - deployments
  - statefulsets
  - configmaps
  - cronjobs
  - services
  - job
  - poddisruptionbudgets
  - <application_custom_resources> 1
  excludedClusterScopedResources:
  - persistentVolumes
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "4"
spec:
  backupName: backup-app
- 1
- Define custom resources for your application.
Example OADP CRs for namespace-scoped application artifacts when LVM Storage is used
apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: backup-app
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  includedNamespaceScopedResources:
  - secrets
  - persistentvolumeclaims
  - deployments
  - statefulsets
  - configmaps
  - cronjobs
  - services
  - job
  - poddisruptionbudgets
  - <application_custom_resources> 1
  includedClusterScopedResources:
  - persistentVolumes 2
  - logicalvolumes.topolvm.io 3
  - volumesnapshotcontents 4
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-app
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "4"
spec:
  backupName: backup-app
  restorePVs: true
  restoreStatus:
    includedResources:
    - logicalvolumes 5
Important: The same version of the applications must function on both the current and the target release of OpenShift Container Platform.
Create the oadp-cm ConfigMap object through the oadp-cm-policy in a new PolicyGenTemplate called ibu-upgrade-ranGen.yaml:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: example-group-ibu
  namespace: "ztp-group"
spec:
  bindingRules:
    group-du-sno: ""
  mcp: "master"
  evaluationInterval:
    compliant: 10s
    noncompliant: 10s
  sourceFiles:
    - fileName: ConfigMapGeneric.yaml
      complianceType: mustonlyhave
      policyName: "oadp-cm-policy"
      metadata:
        name: oadp-cm
        namespace: openshift-adp
Create a kustomization.yaml file with the following content:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators: 1
- ibu-upgrade-ranGen.yaml
configMapGenerator: 2
- files:
  - source-crs/ibu/PlatformBackupRestore.yaml
  #- source-crs/custom-crs/ApplicationClusterScopedBackupRestore.yaml
  #- source-crs/custom-crs/ApplicationApplicationBackupRestoreLso.yaml
  name: oadp-cm
  namespace: ztp-group
generatorOptions:
  disableNameSuffixHash: true
patches: 3
- target:
    group: policy.open-cluster-management.io
    version: v1
    kind: Policy
    name: example-group-ibu-oadp-cm-policy
  patch: |-
    - op: replace
      path: /spec/policy-templates/0/objectDefinition/spec/object-templates/0/objectDefinition/data
      value: '{{hub copyConfigMapData "ztp-group" "oadp-cm" hub}}'
- Push the changes to your Git repository.
15.2.5.2. Labeling extra manifests for the image-based upgrade with GitOps ZTP
Label your extra manifests so that the Lifecycle Agent can extract resources that are labeled with the lca.openshift.io/target-ocp-version: <target_version> label.
Prerequisites
- Provision one or more managed clusters with GitOps ZTP.
- Log in as a user with cluster-admin privileges.
- Generate a seed image from a compatible seed cluster.
- Create a separate partition on the target cluster for the container images that is shared between stateroots. For more information, see "Configuring a shared container partition between ostree stateroots when using GitOps ZTP".
- Deploy a version of Lifecycle Agent that is compatible with the version used with the seed image.
Procedure
Label your required extra manifests with the lca.openshift.io/target-ocp-version: <target_version> label in your existing PolicyGenTemplate CR:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: example-sno
spec:
  bindingRules:
    sites: "example-sno"
    du-profile: "4.15"
  mcp: "master"
  sourceFiles:
    - fileName: SriovNetwork.yaml
      policyName: "config-policy"
      metadata:
        name: "sriov-nw-du-fh"
        labels:
          lca.openshift.io/target-ocp-version: "4.15" 1
      spec:
        resourceName: du_fh
        vlan: 140
    - fileName: SriovNetworkNodePolicy.yaml
      policyName: "config-policy"
      metadata:
        name: "sriov-nnp-du-fh"
        labels:
          lca.openshift.io/target-ocp-version: "4.15"
      spec:
        deviceType: netdevice
        isRdma: false
        nicSelector:
          pfNames: ["ens5f0"]
        numVfs: 8
        priority: 10
        resourceName: du_fh
    - fileName: SriovNetwork.yaml
      policyName: "config-policy"
      metadata:
        name: "sriov-nw-du-mh"
        labels:
          lca.openshift.io/target-ocp-version: "4.15"
      spec:
        resourceName: du_mh
        vlan: 150
    - fileName: SriovNetworkNodePolicy.yaml
      policyName: "config-policy"
      metadata:
        name: "sriov-nnp-du-mh"
        labels:
          lca.openshift.io/target-ocp-version: "4.15"
      spec:
        deviceType: vfio-pci
        isRdma: false
        nicSelector:
          pfNames: ["ens7f0"]
        numVfs: 8
        priority: 10
        resourceName: du_mh
    - fileName: DefaultCatsrc.yaml 2
      policyName: "config-policy"
      metadata:
        name: default-cat-source
        namespace: openshift-marketplace
        labels:
          lca.openshift.io/target-ocp-version: "4.15"
      spec:
        displayName: default-cat-source
        image: quay.io/example-org/example-catalog:v1
- 1
- Ensure that the lca.openshift.io/target-ocp-version label matches either the y-stream or the z-stream of the target OpenShift Container Platform version that is specified in the spec.seedImageRef.version field of the ImageBasedUpgrade CR. The Lifecycle Agent only applies the CRs that match the specified version.
- 2
- If you do not want to use custom catalog sources, remove this entry.
- Push the changes to your Git repository.
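After the policies are synchronized and enforced, you can spot-check that the labeled CRs exist on the target cluster. A hedged example for the SR-IOV resources shown above; the namespace and the label value are assumptions based on this configuration:

$ oc get sriovnetwork,sriovnetworknodepolicy -n openshift-sriov-network-operator -l lca.openshift.io/target-ocp-version=4.15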
15.2.6. Configuring the automatic image cleanup of the container storage disk
Configure when the Lifecycle Agent cleans up unpinned images in the Prep stage by setting a minimum threshold for available storage space through annotations. The default container storage disk usage threshold is 50%.

The Lifecycle Agent does not delete images that are pinned in CRI-O or are currently in use. The Operator selects the images for deletion by starting with dangling images and then sorting the remaining images from oldest to newest, as determined by the image Created timestamp.
15.2.6.1. Configuring the automatic image cleanup of the container storage disk
Configure the minimum threshold for available storage space through annotations.
Prerequisites
- Create an ImageBasedUpgrade CR.
Procedure
Increase the threshold to 65% by running the following command:
$ oc -n openshift-lifecycle-agent annotate ibu upgrade image-cleanup.lca.openshift.io/disk-usage-threshold-percent='65'
(Optional) Remove the threshold override by running the following command:
$ oc -n openshift-lifecycle-agent annotate ibu upgrade image-cleanup.lca.openshift.io/disk-usage-threshold-percent-
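To confirm which threshold is currently in effect, you can read the annotation back from the CR. A minimal sketch; note that the dots in the annotation key must be escaped in the JSONPath expression:

$ oc -n openshift-lifecycle-agent get ibu upgrade -o jsonpath='{.metadata.annotations.image-cleanup\.lca\.openshift\.io/disk-usage-threshold-percent}'

An empty result means that no override is set and the default threshold of 50% applies.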
15.2.6.2. Disabling the automatic image cleanup of the container storage disk
Disable the automatic image cleanup threshold.
Procedure
Disable the automatic image cleanup by running the following command:
$ oc -n openshift-lifecycle-agent annotate ibu upgrade image-cleanup.lca.openshift.io/on-prep='Disabled'
(Optional) Enable automatic image cleanup again by running the following command:
$ oc -n openshift-lifecycle-agent annotate ibu upgrade image-cleanup.lca.openshift.io/on-prep-
15.3. Performing an image-based upgrade for single-node OpenShift clusters with the Lifecycle Agent
You can use the Lifecycle Agent to perform a manual image-based upgrade of a single-node OpenShift cluster.
When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade CR is automatically created. You update this CR to specify the image repository of the seed image and to move through the different stages.
15.3.1. Moving to the Prep stage of the image-based upgrade with Lifecycle Agent
When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade custom resource (CR) is automatically created.

After you have created all the resources that you need during the upgrade, you can move on to the Prep stage. For more information, see the "Creating ConfigMap objects for the image-based upgrade with Lifecycle Agent" section.
Prerequisites
- You have created resources to back up and restore your clusters.
Procedure
Check that you have patched your ImageBasedUpgrade CR:

apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle
  seedImageRef:
    version: 4.15.2 1
    image: <seed_container_image> 2
    pullSecretRef:
      name: <seed_pull_secret> 3
  autoRollbackOnFailure: {}
  # initMonitorTimeoutSeconds: 1800 4
  extraManifests: 5
  - name: example-extra-manifests-cm
    namespace: openshift-lifecycle-agent
  - name: example-catalogsources-cm
    namespace: openshift-lifecycle-agent
  oadpContent: 6
  - name: oadp-cm-example
    namespace: openshift-adp
- 1
- Specify the target platform version. The value must match the version of the seed image.
- 2
- Specify the repository where the target cluster can pull the seed image from.
- 3
- Specify the reference to a secret with credentials to pull container images if the images are in a private registry.
- 4
- (Optional) Specify the time frame in seconds to roll back if the upgrade does not complete within that time frame after the first reboot. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
- 5
- (Optional) Specify the list of ConfigMap resources that contain your custom catalog sources to retain after the upgrade and your extra manifests to apply to the target cluster that are not part of the seed image.
- 6
- Add the oadpContent section with the OADP ConfigMap information.
To start the Prep stage, change the value of the stage field to Prep in the ImageBasedUpgrade CR by running the following command:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Prep"}}' --type=merge -n openshift-lifecycle-agent
If you provide ConfigMap objects for OADP resources and extra manifests, the Lifecycle Agent validates the specified ConfigMap objects during the Prep stage. You might encounter the following issues:

- Validation warnings or errors if the Lifecycle Agent detects any issues with the extraManifests parameters.
- Validation errors if the Lifecycle Agent detects any issues with the oadpContent parameters.

Validation warnings do not block the Upgrade stage, but you must decide whether it is safe to proceed with the upgrade. These warnings, for example missing CRDs, namespaces, or dry run failures, update the status.conditions for the Prep stage and the annotation fields in the ImageBasedUpgrade CR with details about the warning.

Example validation warning

[...]
metadata:
  annotations:
    extra-manifest.lca.openshift.io/validation-warning: '...'
[...]

However, validation errors, such as adding MachineConfig or Operator manifests to extra manifests, cause the Prep stage to fail and block the Upgrade stage.

When the validations pass, the cluster creates a new ostree stateroot, which involves pulling and unpacking the seed image, and running host-level commands. Finally, all the required images are precached on the target cluster.
Verification
Check the status of the ImageBasedUpgrade CR by running the following command:

$ oc get ibu -o yaml

Example output

conditions:
- lastTransitionTime: "2024-01-01T09:00:00Z"
  message: In progress
  observedGeneration: 13
  reason: InProgress
  status: "False"
  type: Idle
- lastTransitionTime: "2024-01-01T09:00:00Z"
  message: Prep completed
  observedGeneration: 13
  reason: Completed
  status: "False"
  type: PrepInProgress
- lastTransitionTime: "2024-01-01T09:00:00Z"
  message: Prep stage completed successfully
  observedGeneration: 13
  reason: Completed
  status: "True"
  type: PrepCompleted
observedGeneration: 13
validNextStages:
- Idle
- Upgrade
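Instead of polling oc get ibu manually, you can block until the Prep stage finishes. A hedged sketch using the generic oc wait mechanism against the PrepCompleted condition; the 30-minute timeout is an assumption, tune it to your environment:

$ oc wait ibu upgrade --for=condition=PrepCompleted --timeout=30m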
15.3.2. Moving to the Upgrade stage of the image-based upgrade with Lifecycle Agent
After you generate the seed image and complete the Prep stage, you can upgrade the target cluster. During the upgrade process, the OADP Operator creates a backup of the artifacts specified in the OADP custom resources (CRs), and then the Lifecycle Agent upgrades the cluster.
If the upgrade fails or stops, an automatic rollback is initiated. If you have an issue after the upgrade, you can initiate a manual rollback. For more information about manual rollback, see "Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent".
Prerequisites
- Complete the Prep stage.
Procedure
To move to the Upgrade stage, change the value of the stage field to Upgrade in the ImageBasedUpgrade CR by running the following command:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Upgrade"}}' --type=merge
Check the status of the ImageBasedUpgrade CR by running the following command:

$ oc get ibu -o yaml

Example output

status:
  conditions:
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: In progress
    observedGeneration: 5
    reason: InProgress
    status: "False"
    type: Idle
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed
    observedGeneration: 5
    reason: Completed
    status: "False"
    type: PrepInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed successfully
    observedGeneration: 5
    reason: Completed
    status: "True"
    type: PrepCompleted
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: |-
      Waiting for system to stabilize: one or more health checks failed
        - one or more ClusterOperators not yet ready: authentication
        - one or more MachineConfigPools not yet ready: master
        - one or more ClusterServiceVersions not yet ready: sriov-fec.v2.8.0
    observedGeneration: 1
    reason: InProgress
    status: "True"
    type: UpgradeInProgress
  observedGeneration: 1
  rollbackAvailabilityExpiration: "2024-05-19T14:01:52Z"
  validNextStages:
  - Rollback
The OADP Operator creates a backup of the data specified in the OADP Backup and Restore CRs, and the target cluster reboots.

Monitor the status of the CR by running the following command:
$ oc get ibu -o yaml
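For a more focused view than the full YAML output, you can extract a single condition. A sketch that prints the message of the UpgradeCompleted condition, assuming the default CR name upgrade:

$ oc get ibu upgrade -o jsonpath='{.status.conditions[?(@.type=="UpgradeCompleted")].message}'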
If you are satisfied with the upgrade, finalize the changes by patching the value of the stage field to Idle in the ImageBasedUpgrade CR by running the following command:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Idle"}}' --type=merge
Important
You cannot roll back the changes once you move to the Idle stage after an upgrade.

The Lifecycle Agent deletes all resources created during the upgrade process.
- You can remove the OADP Operator and its configuration files after a successful upgrade. For more information, see "Deleting Operators from a cluster".
Verification
Check the status of the ImageBasedUpgrade CR by running the following command:

$ oc get ibu -o yaml

Example output

status:
  conditions:
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: In progress
    observedGeneration: 5
    reason: InProgress
    status: "False"
    type: Idle
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed
    observedGeneration: 5
    reason: Completed
    status: "False"
    type: PrepInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed successfully
    observedGeneration: 5
    reason: Completed
    status: "True"
    type: PrepCompleted
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Upgrade completed
    observedGeneration: 1
    reason: Completed
    status: "False"
    type: UpgradeInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Upgrade completed
    observedGeneration: 1
    reason: Completed
    status: "True"
    type: UpgradeCompleted
  observedGeneration: 1
  rollbackAvailabilityExpiration: "2024-01-01T09:00:00Z"
  validNextStages:
  - Idle
  - Rollback
Check the status of the cluster restoration by running the following command:
$ oc get restores -n openshift-adp -o custom-columns=NAME:.metadata.name,Status:.status.phase,Reason:.status.failureReason
Example output
NAME             Status      Reason
acm-klusterlet   Completed   <none> 1
apache-app       Completed   <none>
localvolume      Completed   <none>
- 1
- The acm-klusterlet is specific to RHACM environments only.
15.3.3. Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent
An automatic rollback is initiated if the upgrade does not complete within the time frame specified in the initMonitorTimeoutSeconds field after the first reboot.
Example ImageBasedUpgrade CR
apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle
  seedImageRef:
    version: 4.15.2
    image: <seed_container_image>
  autoRollbackOnFailure: {}
  # initMonitorTimeoutSeconds: 1800 1
[...]
- 1
- (Optional) Specify the time frame in seconds to roll back if the upgrade does not complete within that time frame after the first reboot. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
You can manually roll back the changes if you encounter unresolvable issues after an upgrade.
Prerequisites
- Log in to the hub cluster as a user with cluster-admin privileges.
- Ensure that the control plane certificates on the original stateroot are valid. If the certificates expired, see "Recovering from expired control plane certificates".
Procedure
To move to the Rollback stage, patch the value of the stage field to Rollback in the ImageBasedUpgrade CR by running the following command:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Rollback"}}' --type=merge
The Lifecycle Agent reboots the cluster with the previously installed version of OpenShift Container Platform and restores the applications.
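You can wait for the rollback to finish instead of polling the CR. A hedged sketch against the RollbackCompleted condition shown in the rollback examples later in this chapter; the timeout is an assumption:

$ oc wait ibu upgrade --for=condition=RollbackCompleted --timeout=30m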
If you are satisfied with the changes, finalize the rollback by patching the value of the stage field to Idle in the ImageBasedUpgrade CR by running the following command:

$ oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Idle"}}' --type=merge -n openshift-lifecycle-agent
Warning
If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.
15.3.4. Troubleshooting image-based upgrades with Lifecycle Agent
You can encounter issues during the image-based upgrade.
15.3.4.1. Collecting logs
You can use the oc adm must-gather CLI to collect information for debugging and troubleshooting.
Procedure
Collect data about the Operators by running the following command:
$ oc adm must-gather \
  --dest-dir=must-gather/tmp \
  --image=$(oc -n openshift-lifecycle-agent get deployment.apps/lifecycle-agent-controller-manager -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \
  --image=quay.io/konveyor/oadp-must-gather:latest \ 1
  --image=quay.io/openshift/origin-must-gather:latest 2

- 1
- (Optional) Use this option if you need to gather more information from the OADP Operator.
- 2
- (Optional) Use this option if you need to gather more information from the platform.
15.3.4.2. AbortFailed or FinalizeFailed error
- Issue
- During the finalize stage or when you stop the process at the Prep stage, the Lifecycle Agent cleans up the following resources:

  - Stateroot that is no longer required
  - Precaching resources
  - OADP CRs
  - ImageBasedUpgrade CR

  If the Lifecycle Agent fails to perform the above steps, it transitions to the AbortFailed or FinalizeFailed states. The condition message and log show which steps failed.

  Example error message
message: failed to delete all the backup CRs. Perform cleanup manually then add
  'lca.openshift.io/manual-cleanup-done' annotation to ibu CR to transition back to Idle
observedGeneration: 5
reason: AbortFailed
status: "False"
type: Idle
- Resolution
- Inspect the logs to determine why the failure occurred.

  To prompt the Lifecycle Agent to retry the cleanup, add the lca.openshift.io/manual-cleanup-done annotation to the ImageBasedUpgrade CR.

  After observing this annotation, the Lifecycle Agent retries the cleanup and, if it is successful, the ImageBasedUpgrade stage transitions to Idle.

  If the cleanup fails again, you can manually clean up the resources.
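A minimal sketch of adding the annotation; only the presence of the annotation is significant, so the empty value used here is an assumption:

$ oc annotate ibu upgrade lca.openshift.io/manual-cleanup-done=""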
15.3.4.2.1. Cleaning up stateroot manually
- Issue
- When you stop the process at the Prep stage, the Lifecycle Agent cleans up the new stateroot. When finalizing after a successful upgrade or a rollback, the Lifecycle Agent cleans up the old stateroot. If this step fails, it is recommended that you inspect the logs to determine why the failure occurred.
- Resolution
Check if there are any existing deployments in the stateroot by running the following command:
$ ostree admin status
If there are any, clean up the existing deployment by running the following command:
$ ostree admin undeploy <index_of_deployment>
After cleaning up all the deployments of the stateroot, wipe the stateroot directory by running the following commands:
Warning
Ensure that the booted deployment is not in this stateroot.
$ stateroot="<stateroot_to_delete>"
$ unshare -m /bin/sh -c "mount -o remount,rw /sysroot && rm -rf /sysroot/ostree/deploy/${stateroot}"
15.3.4.2.2. Cleaning up OADP resources manually
- Issue
- Automatic cleanup of OADP resources can fail due to connection issues between the Lifecycle Agent and the S3 backend. By restoring the connection and adding the lca.openshift.io/manual-cleanup-done annotation, the Lifecycle Agent can successfully clean up the backup resources.
- Resolution
Check the backend connectivity by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example output
NAME                          PHASE       LAST VALIDATED   AGE   DEFAULT
dataprotectionapplication-1   Available   33s              8d    true
- Remove all backup resources and then add the lca.openshift.io/manual-cleanup-done annotation to the ImageBasedUpgrade CR.
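A hedged sketch of the manual cleanup; deleting all Velero backups in the openshift-adp namespace is an assumption that applies when the backups were created only for this upgrade workflow:

$ oc delete backups.velero.io -n openshift-adp --all
$ oc annotate ibu upgrade lca.openshift.io/manual-cleanup-done=""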
15.3.4.3. LVM Storage volume contents not restored
When LVM Storage is used to provide dynamic persistent volume storage, LVM Storage might not restore the persistent volume contents if it is configured incorrectly.
15.3.4.3.1. Missing LVM Storage-related fields in Backup CR
- Issue
- Your Backup CRs might be missing fields that are needed to restore your persistent volumes. You can check for events in your application pod to determine whether you have this issue by running the following command:

  $ oc describe pod <your_app_name>

  Example output showing missing LVM Storage-related fields in Backup CR

  Events:
    Type     Reason            Age                From               Message
    ----     ------            ----               ----               -------
    Warning  FailedScheduling  58s (x2 over 66s)  default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
    Normal   Scheduled         56s                default-scheduler  Successfully assigned default/db-1234 to sno1.example.lab
    Warning  FailedMount       24s (x7 over 55s)  kubelet            MountVolume.SetUp failed for volume "pvc-1234" : rpc error: code = Unknown desc = VolumeID is not found
- Resolution
- You must include logicalvolumes.topolvm.io in the application Backup CR. Without this resource, the application restores its persistent volume claims and persistent volume manifests correctly; however, the logicalvolume associated with this persistent volume is not restored properly after the pivot.

  Example Backup CR

  apiVersion: velero.io/v1
  kind: Backup
  metadata:
    labels:
      velero.io/storage-location: default
    name: small-app
    namespace: openshift-adp
  spec:
    includedNamespaces:
    - test
    includedNamespaceScopedResources:
    - secrets
    - persistentvolumeclaims
    - deployments
    - statefulsets
    includedClusterScopedResources: 1
    - persistentVolumes
    - volumesnapshotcontents
    - logicalvolumes.topolvm.io
- 1
- To restore the persistent volumes for your application, you must configure this section as shown.
15.3.4.3.2. Missing LVM Storage-related fields in Restore CR
- Issue
- The expected resources for the applications are restored but the persistent volume contents are not preserved after upgrading.

  List the persistent volumes for your applications by running the following command before the pivot:

  $ oc get pv,pvc,logicalvolumes.topolvm.io -A

  Example output before pivot

  NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
  persistentvolume/pvc-1234   1Gi        RWO            Retain           Bound    default/pvc-db   lvms-vg1                4h45m

  NAMESPACE   NAME                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  default     persistentvolumeclaim/pvc-db   Bound    pvc-1234   1Gi        RWO            lvms-vg1       4h45m

  NAMESPACE   NAME                                AGE
              logicalvolume.topolvm.io/pvc-1234   4h45m

  List the persistent volumes for your applications by running the following command after the pivot:

  $ oc get pv,pvc,logicalvolumes.topolvm.io -A

  Example output after pivot

  NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
  persistentvolume/pvc-1234   1Gi        RWO            Delete           Bound    default/pvc-db   lvms-vg1                19s

  NAMESPACE   NAME                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  default     persistentvolumeclaim/pvc-db   Bound    pvc-1234   1Gi        RWO            lvms-vg1       19s

  NAMESPACE   NAME                                AGE
              logicalvolume.topolvm.io/pvc-1234   18s
- Resolution
- The reason for this issue is that the logicalvolume status is not preserved in the Restore CR. This status is important because it is required for Velero to reference the volumes that must be preserved after pivoting. You must include the following fields in the application Restore CR:

  Example Restore CR

  apiVersion: velero.io/v1
  kind: Restore
  metadata:
    name: sample-vote-app
    namespace: openshift-adp
    labels:
      velero.io/storage-location: default
    annotations:
      lca.openshift.io/apply-wave: "3"
  spec:
    backupName: sample-vote-app
    restorePVs: true 1
    restoreStatus: 2
      includedResources:
      - logicalvolumes

  - 1
  - Restores the persistent volumes for the application.
  - 2
  - Restores the status of the logicalvolume resources, which Velero requires to reference the preserved volumes after the pivot.
15.3.4.4. Debugging failed Backup and Restore CRs
- Issue
- The backup or restoration of artifacts failed.
- Resolution
- You can debug Backup and Restore CRs and retrieve logs with the Velero CLI tool. The Velero CLI tool provides more detailed information than the OpenShift CLI tool.

  Describe the Backup CR that contains errors by running the following command:

  $ oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero describe backup -n openshift-adp backup-acm-klusterlet --details

  Describe the Restore CR that contains errors by running the following command:

  $ oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero describe restore -n openshift-adp restore-acm-klusterlet --details

  Download the backed up resources to a local directory by running the following command:

  $ oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero backup download -n openshift-adp backup-acm-klusterlet -o ~/backup-acm-klusterlet.tar.gz
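The downloaded archive is a standard gzipped tarball, so you can unpack it with tar to inspect the individual backed-up resource manifests. A minimal sketch; the destination directory name is illustrative:

$ mkdir -p ./backup-contents
$ tar -xzf ~/backup-acm-klusterlet.tar.gz -C ./backup-contents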
15.4. Performing an image-based upgrade for single-node OpenShift clusters using GitOps ZTP
You can upgrade your managed single-node OpenShift cluster with the image-based upgrade through GitOps Zero Touch Provisioning (ZTP).
When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade CR is automatically created. You update this CR to specify the image repository of the seed image and to move through the different stages.
15.4.1. Moving to the Prep stage of the image-based upgrade with Lifecycle Agent and GitOps ZTP
When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade CR is automatically created. You update this CR to specify the image repository of the seed image and to move through the different stages.

ImageBasedUpgrade CR in the ztp-site-generate container

apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle
Prerequisites
- Create policies and ConfigMap objects for resources used in the image-based upgrade. For more information, see "Creating ConfigMap objects for the image-based upgrade with GitOps ZTP".
Procedure
Add policies for the Prep, Upgrade, and Idle stages to your existing group PolicyGenTemplate called ibu-upgrade-ranGen.yaml:

apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: example-group-ibu
  namespace: "ztp-group"
spec:
  bindingRules:
    group-du-sno: ""
  mcp: "master"
  evaluationInterval: 1
    compliant: 10s
    noncompliant: 10s
  sourceFiles:
    - fileName: ConfigMapGeneric.yaml
      complianceType: mustonlyhave
      policyName: "oadp-cm-policy"
      metadata:
        name: oadp-cm
        namespace: openshift-adp
    - fileName: ibu/ImageBasedUpgrade.yaml
      policyName: "prep-stage-policy"
      spec:
        stage: Prep
        seedImageRef: 2
          version: "4.15.0"
          image: "quay.io/user/lca-seed:4.15.0"
          pullSecretRef:
            name: "<seed_pull_secret>"
        oadpContent: 3
        - name: "oadp-cm"
          namespace: "openshift-adp"
      status:
        conditions:
        - reason: Completed
          status: "True"
          type: PrepCompleted
          message: "Prep stage completed successfully"
    - fileName: ibu/ImageBasedUpgrade.yaml
      policyName: "upgrade-stage-policy"
      spec:
        stage: Upgrade
      status:
        conditions:
        - reason: Completed
          status: "True"
          type: UpgradeCompleted
    - fileName: ibu/ImageBasedUpgrade.yaml
      policyName: "finalize-stage-policy"
      complianceType: mustonlyhave
      spec:
        stage: Idle
    - fileName: ibu/ImageBasedUpgrade.yaml
      policyName: "finalize-stage-policy"
      status:
        conditions:
        - reason: Idle
          status: "True"
          type: Idle
- 1
- The policy evaluation interval for compliant and non-compliant policies. Set them to 10s to ensure that the policy status accurately reflects the current upgrade status.
- 2
- Define the seed image, OpenShift Container Platform version, and pull secret for the upgrade in the Prep stage.
- 3
- Define the OADP ConfigMap resources required for backup and restore.
Verify that the policies required for an image-based upgrade are created by running the following command:
$ oc get policies -n spoke1 | grep -E "example-group-ibu"
Example output
ztp-group.example-group-ibu-oadp-cm-policy          inform   NonCompliant   31h
ztp-group.example-group-ibu-prep-stage-policy       inform   NonCompliant   31h
ztp-group.example-group-ibu-upgrade-stage-policy    inform   NonCompliant   31h
ztp-group.example-group-ibu-finalize-stage-policy   inform   NonCompliant   31h
ztp-group.example-group-ibu-rollback-stage-policy   inform   NonCompliant   31h
Update the du-profile cluster label to the target platform version or the corresponding policy-binding label in the SiteConfig CR:

apiVersion: ran.openshift.io/v1
kind: SiteConfig
[...]
spec:
  [...]
  clusterLabels:
    du-profile: "4.15.0"
Important
Updating the labels to the target platform version unbinds the existing set of policies.
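Before moving on, you can confirm that the label change reached the hub. A hedged check against the ManagedCluster object, assuming the cluster name spoke1:

$ oc get managedcluster spoke1 -o jsonpath='{.metadata.labels.du-profile}'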
- Commit and push the updated SiteConfig CR to the Git repository.

When you are ready to move to the Prep stage, create the ClusterGroupUpgrade CR on the target hub cluster with the Prep and OADP ConfigMap policies:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-ibu-prep
  namespace: default
spec:
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - example-group-ibu-oadp-cm-policy
  - example-group-ibu-prep-stage-policy
  remediationStrategy:
    canaries:
      - spoke1
    maxConcurrency: 1
    timeout: 240
Apply the Prep policy by running the following command:

$ oc apply -f cgu-ibu-prep.yml
If you provide ConfigMap objects for OADP resources and extra manifests, the Lifecycle Agent validates the specified ConfigMap objects during the Prep stage. You might encounter the following issues:

- Validation warnings or errors if the Lifecycle Agent detects any issues with the extraManifests parameters.
- Validation errors if the Lifecycle Agent detects any issues with the oadpContent parameters.

Validation warnings do not block the Upgrade stage, but you must decide whether it is safe to proceed with the upgrade. These warnings, for example missing CRDs, namespaces, or dry run failures, update the status.conditions for the Prep stage and the annotation fields in the ImageBasedUpgrade CR with details about the warning.

Example validation warning

[...]
metadata:
  annotations:
    extra-manifest.lca.openshift.io/validation-warning: '...'
[...]

However, validation errors, such as adding MachineConfig or Operator manifests to extra manifests, cause the Prep stage to fail and block the Upgrade stage.

When the validations pass, the cluster creates a new ostree stateroot, which involves pulling and unpacking the seed image, and running host-level commands. Finally, all the required images are precached on the target cluster.
Monitor the status and wait for the cgu-ibu-prep ClusterGroupUpgrade to report Completed by running the following command:

$ oc get cgu -n default

Example output

NAME           AGE   STATE       DETAILS
cgu-ibu-prep   31h   Completed   All clusters are compliant with all the managed policies
Additional resources
- Preparing the GitOps ZTP site configuration repository for version independence
- Creating ConfigMap objects for the image-based upgrade with Lifecycle Agent using GitOps ZTP
- Configuring a shared container partition between ostree stateroots when using GitOps ZTP
- About backup and snapshot locations and their secrets
- Creating a Backup CR
- Creating a Restore CR
15.4.2. Moving to the Upgrade stage of the image-based upgrade with Lifecycle Agent and GitOps ZTP
After you complete the Prep stage, you can upgrade the target cluster. During the upgrade process, the OADP Operator creates a backup of the artifacts specified in the OADP CRs, and then the Lifecycle Agent upgrades the cluster.

If the upgrade fails or stops, an automatic rollback is initiated. If you have an issue after the upgrade, you can initiate a manual rollback. For more information about manual rollback, see "Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent and GitOps ZTP".
Prerequisites
- Complete the Prep stage.
Procedure
When you are ready to move to the Upgrade stage, create the ClusterGroupUpgrade CR on the target hub cluster that references the Upgrade policy:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-ibu-upgrade
  namespace: default
spec:
  actions:
    beforeEnable:
      addClusterAnnotations:
        import.open-cluster-management.io/disable-auto-import: "true" 1
    afterCompletion:
      removeClusterAnnotations:
      - import.open-cluster-management.io/disable-auto-import 2
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - example-group-ibu-upgrade-stage-policy
  remediationStrategy:
    canaries:
      - spoke1
    maxConcurrency: 1
    timeout: 240
- 1
- Applies the disable-auto-import annotation to the managed cluster before starting the upgrade. This annotation ensures that the automatic importing of the managed cluster is disabled during the upgrade stage until the cluster is ready.
- 2
- Removes the disable-auto-import annotation after the upgrade is complete.
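To confirm that TALM applied the annotation before the upgrade starts, you can read it back from the ManagedCluster object. A hedged sketch, assuming the cluster name spoke1; the dots in the annotation key must be escaped in the JSONPath expression:

$ oc get managedcluster spoke1 -o jsonpath='{.metadata.annotations.import\.open-cluster-management\.io/disable-auto-import}'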
Apply the Upgrade policy by running the following command:

$ oc apply -f cgu-ibu-upgrade.yml
Monitor the status by running the following command and wait for the cgu-ibu-upgrade ClusterGroupUpgrade to report Completed:

$ oc get cgu -n default

Example output

NAME              AGE   STATE       DETAILS
cgu-ibu-prep      31h   Completed   All clusters are compliant with all the managed policies
cgu-ibu-upgrade   31h   Completed   All clusters are compliant with all the managed policies
When you are satisfied with the changes and ready to finalize the upgrade, create a ClusterGroupUpgrade CR on the target hub cluster that references the policy that finalizes the upgrade:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-ibu-finalize
  namespace: default
spec:
  actions:
    beforeEnable:
      removeClusterAnnotations:
      - import.open-cluster-management.io/disable-auto-import
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - example-group-ibu-finalize-stage-policy
  remediationStrategy:
    canaries:
      - spoke1
    maxConcurrency: 1
    timeout: 240
Important
Ensure that no other ClusterGroupUpgrade CRs are in progress because this causes TALM to continuously reconcile them. Delete all "In-Progress" ClusterGroupUpgrade CRs before applying the cgu-ibu-finalize.yaml file.

Apply the policy by running the following command:
$ oc apply -f cgu-ibu-finalize.yaml
Monitor the status and wait for the cgu-ibu-finalize ClusterGroupUpgrade to report Completed by running the following command:

$ oc get cgu -n default

Example output

NAME               AGE   STATE       DETAILS
cgu-ibu-finalize   30h   Completed   All clusters are compliant with all the managed policies
cgu-ibu-prep       31h   Completed   All clusters are compliant with all the managed policies
cgu-ibu-upgrade    31h   Completed   All clusters are compliant with all the managed policies
You can remove the OADP Operator and its configuration files after a successful upgrade.
Change the complianceType to mustnothave for the OADP Operator namespace, Operator group, and subscription in the common-ranGen.yaml file:

[...]
- fileName: OadpSubscriptionNS.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave
- fileName: OadpSubscriptionOperGroup.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave
- fileName: OadpSubscription.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave
- fileName: OadpOperatorStatus.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave
[...]
Change the complianceType to mustnothave for the OADP secret, backup storage location status, and DataProtectionApplication CRs in the site PolicyGenTemplate file:

- fileName: OadpSecret.yaml
  policyName: "config-policy"
  complianceType: mustnothave
- fileName: OadpBackupStorageLocationStatus.yaml
  policyName: "config-policy"
  complianceType: mustnothave
- fileName: DataProtectionApplication.yaml
  policyName: "config-policy"
  complianceType: mustnothave
- Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The status of the common-subscriptions-policy and the example-cnf-config-policy policies changes to Non-Compliant.
- Apply the change to your target clusters by using the Topology Aware Lifecycle Manager. For more information about rolling out configuration changes, see "Update policies on managed clusters".
Monitor the process. When the status of the common-subscriptions-policy and the example-cnf-config-policy policies for a target cluster is Compliant, the OADP Operator has been removed from the cluster. Get the status of the policies by running the following commands:

$ oc get policy -n ztp-common common-subscriptions-policy
$ oc get policy -n ztp-site example-cnf-config-policy
- Delete the OADP Operator namespace, Operator group and subscription, and configuration CRs from spec.sourceFiles in the common-ranGen.yaml and the site PolicyGenTemplate files.
- Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The policies remain compliant.
15.4.3. Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent and GitOps ZTP
If you encounter an issue after upgrade, you can start a manual rollback.
Prerequisites
- Ensure that the control plane certificates on the original stateroot are valid. If the certificates expired, see "Recovering from expired control plane certificates".
Procedure
Revert the du-profile or the corresponding policy-binding label to the original platform version in the SiteConfig CR:

apiVersion: ran.openshift.io/v1
kind: SiteConfig
[...]
spec:
  [...]
  clusterLabels:
    du-profile: "4.14.x"
When you are ready to initiate the rollback, add the Rollback policy to your existing group PolicyGenTemplate CR:

[...]
- fileName: ibu/ImageBasedUpgrade.yaml
  policyName: "rollback-stage-policy"
  spec:
    stage: Rollback
  status:
    conditions:
    - message: Rollback completed
      reason: Completed
      status: "True"
      type: RollbackCompleted
Create a ClusterGroupUpgrade CR on the target hub cluster that references the Rollback policy:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-ibu-rollback
  namespace: default
spec:
  actions:
    beforeEnable:
      removeClusterAnnotations:
      - import.open-cluster-management.io/disable-auto-import
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - example-group-ibu-rollback-stage-policy
  remediationStrategy:
    canaries:
      - spoke1
    maxConcurrency: 1
    timeout: 240
Apply the Rollback policy by running the following command:

$ oc apply -f cgu-ibu-rollback.yml
When you are satisfied with the changes and you are ready to finalize the rollback, create a ClusterGroupUpgrade CR on the target hub cluster that references the policy that finalizes the rollback:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-ibu-finalize
  namespace: default
spec:
  actions:
    beforeEnable:
      removeClusterAnnotations:
      - import.open-cluster-management.io/disable-auto-import
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - example-group-ibu-finalize-stage-policy
  remediationStrategy:
    canaries:
      - spoke1
    maxConcurrency: 1
    timeout: 240
Apply the policy by running the following command:
$ oc apply -f cgu-ibu-finalize.yml
15.4.4. Troubleshooting image-based upgrades with Lifecycle Agent
You can encounter issues during the image-based upgrade.
15.4.4.1. Collecting logs
You can use the oc adm must-gather CLI to collect information for debugging and troubleshooting.
Procedure
Collect data about the Operators by running the following command:
$ oc adm must-gather \
  --dest-dir=must-gather/tmp \
  --image=$(oc -n openshift-lifecycle-agent get deployment.apps/lifecycle-agent-controller-manager -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \
  --image=quay.io/konveyor/oadp-must-gather:latest \ 1
  --image=quay.io/openshift/origin-must-gather:latest 2

- 1
- (Optional) Use this option if you need to gather more information from the OADP Operator.
- 2
- (Optional) Use this option if you need to gather more information from the platform.
15.4.4.2. AbortFailed or FinalizeFailed error
- Issue
- During the finalize stage or when you stop the process at the Prep stage, the Lifecycle Agent cleans up the following resources:

  - Stateroot that is no longer required
  - Precaching resources
  - OADP CRs
  - ImageBasedUpgrade CR

  If the Lifecycle Agent fails to perform the above steps, it transitions to the AbortFailed or FinalizeFailed states. The condition message and log show which steps failed.

  Example error message
message: failed to delete all the backup CRs. Perform cleanup manually then add
  'lca.openshift.io/manual-cleanup-done' annotation to ibu CR to transition back to Idle
observedGeneration: 5
reason: AbortFailed
status: "False"
type: Idle
- Resolution
- Inspect the logs to determine why the failure occurred.

  To prompt the Lifecycle Agent to retry the cleanup, add the lca.openshift.io/manual-cleanup-done annotation to the ImageBasedUpgrade CR.

  After observing this annotation, the Lifecycle Agent retries the cleanup and, if it is successful, the ImageBasedUpgrade stage transitions to Idle.

  If the cleanup fails again, you can manually clean up the resources.
15.4.4.2.1. Cleaning up stateroot manually
- Issue
- When you stop the process at the Prep stage, the Lifecycle Agent cleans up the new stateroot. When finalizing after a successful upgrade or a rollback, the Lifecycle Agent cleans up the old stateroot. If this step fails, it is recommended that you inspect the logs to determine why the failure occurred.
- Resolution
Check if there are any existing deployments in the stateroot by running the following command:
$ ostree admin status
If there are any, clean up the existing deployment by running the following command:
$ ostree admin undeploy <index_of_deployment>
After cleaning up all the deployments of the stateroot, wipe the stateroot directory by running the following commands:
Warning
Ensure that the booted deployment is not in this stateroot.
$ stateroot="<stateroot_to_delete>"
$ unshare -m /bin/sh -c "mount -o remount,rw /sysroot && rm -rf /sysroot/ostree/deploy/${stateroot}"
15.4.4.2.2. Cleaning up OADP resources manually
- Issue
- Automatic cleanup of OADP resources can fail due to connection issues between the Lifecycle Agent and the S3 backend. By restoring the connection and adding the lca.openshift.io/manual-cleanup-done annotation, the Lifecycle Agent can successfully clean up the backup resources.
- Resolution
Check the backend connectivity by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example output
NAME                          PHASE       LAST VALIDATED   AGE   DEFAULT
dataprotectionapplication-1   Available   33s              8d    true
- Remove all backup resources and then add the lca.openshift.io/manual-cleanup-done annotation to the ImageBasedUpgrade CR.
15.4.4.3. LVM Storage volume contents not restored
When LVM Storage is used to provide dynamic persistent volume storage, LVM Storage might not restore the persistent volume contents if it is configured incorrectly.
15.4.4.3.1. Missing LVM Storage-related fields in Backup CR
- Issue
- Your Backup CRs might be missing fields that are needed to restore your persistent volumes. You can check for events in your application pod to determine whether you have this issue by running the following command:

  $ oc describe pod <your_app_name>

  Example output showing missing LVM Storage-related fields in Backup CR

  Events:
    Type     Reason            Age                From               Message
    ----     ------            ----               ----               -------
    Warning  FailedScheduling  58s (x2 over 66s)  default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
    Normal   Scheduled         56s                default-scheduler  Successfully assigned default/db-1234 to sno1.example.lab
    Warning  FailedMount       24s (x7 over 55s)  kubelet            MountVolume.SetUp failed for volume "pvc-1234" : rpc error: code = Unknown desc = VolumeID is not found
- Resolution
- You must include logicalvolumes.topolvm.io in the application Backup CR. Without this resource, the application restores its persistent volume claims and persistent volume manifests correctly; however, the logicalvolume associated with this persistent volume is not restored properly after the pivot.

  Example Backup CR

  apiVersion: velero.io/v1
  kind: Backup
  metadata:
    labels:
      velero.io/storage-location: default
    name: small-app
    namespace: openshift-adp
  spec:
    includedNamespaces:
    - test
    includedNamespaceScopedResources:
    - secrets
    - persistentvolumeclaims
    - deployments
    - statefulsets
    includedClusterScopedResources: 1
    - persistentVolumes
    - volumesnapshotcontents
    - logicalvolumes.topolvm.io
- 1
- To restore the persistent volumes for your application, you must configure this section as shown.
15.4.4.3.2. Missing LVM Storage-related fields in Restore CR
- Issue
- The expected resources for the applications are restored but the persistent volume contents are not preserved after upgrading.

  List the persistent volumes for your applications by running the following command before the pivot:

  $ oc get pv,pvc,logicalvolumes.topolvm.io -A

  Example output before pivot

  NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
  persistentvolume/pvc-1234   1Gi        RWO            Retain           Bound    default/pvc-db   lvms-vg1                4h45m

  NAMESPACE   NAME                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  default     persistentvolumeclaim/pvc-db   Bound    pvc-1234   1Gi        RWO            lvms-vg1       4h45m

  NAMESPACE   NAME                                AGE
              logicalvolume.topolvm.io/pvc-1234   4h45m

  List the persistent volumes for your applications by running the following command after the pivot:

  $ oc get pv,pvc,logicalvolumes.topolvm.io -A

  Example output after pivot

  NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
  persistentvolume/pvc-1234   1Gi        RWO            Delete           Bound    default/pvc-db   lvms-vg1                19s

  NAMESPACE   NAME                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  default     persistentvolumeclaim/pvc-db   Bound    pvc-1234   1Gi        RWO            lvms-vg1       19s

  NAMESPACE   NAME                                AGE
              logicalvolume.topolvm.io/pvc-1234   18s
- Resolution
- The reason for this issue is that the logicalvolume status is not preserved in the Restore CR. This status is important because it is required for Velero to reference the volumes that must be preserved after pivoting. You must include the following fields in the application Restore CR:

  Example Restore CR

  apiVersion: velero.io/v1
  kind: Restore
  metadata:
    name: sample-vote-app
    namespace: openshift-adp
    labels:
      velero.io/storage-location: default
    annotations:
      lca.openshift.io/apply-wave: "3"
  spec:
    backupName: sample-vote-app
    restorePVs: true 1
    restoreStatus: 2
      includedResources:
      - logicalvolumes

  - 1
  - Restores the persistent volumes for the application.
  - 2
  - Restores the status of the logicalvolume resources, which Velero requires to reference the preserved volumes after the pivot.
15.4.4.4. Debugging failed Backup and Restore CRs
- Issue
- The backup or restoration of artifacts failed.
- Resolution
- You can debug Backup and Restore CRs and retrieve logs with the Velero CLI tool. The Velero CLI tool provides more detailed information than the OpenShift CLI tool.

  Describe the Backup CR that contains errors by running the following command:

  $ oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero describe backup -n openshift-adp backup-acm-klusterlet --details

  Describe the Restore CR that contains errors by running the following command:

  $ oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero describe restore -n openshift-adp restore-acm-klusterlet --details

  Download the backed up resources to a local directory by running the following command:

  $ oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero backup download -n openshift-adp backup-acm-klusterlet -o ~/backup-acm-klusterlet.tar.gz