Backup and restore


Red Hat OpenShift Service on AWS 4

Backing up and restoring your Red Hat OpenShift Service on AWS cluster

Red Hat OpenShift Documentation Team

Abstract

This document provides instructions for backing up your cluster's data and for recovering from various disaster scenarios.

Chapter 1. OADP Application backup and restore

1.1. Introduction to OpenShift API for Data Protection

The OpenShift API for Data Protection (OADP) product safeguards customer applications on Red Hat OpenShift Service on AWS. It offers comprehensive disaster recovery protection, covering Red Hat OpenShift Service on AWS applications, application-related cluster resources, persistent volumes, and internal images. OADP is also capable of backing up both containerized applications and virtual machines (VMs).

OADP support applies to customer workload namespaces and cluster-scoped resources.

Full cluster backup and restore are not supported.

1.1.1. OpenShift API for Data Protection APIs

OpenShift API for Data Protection (OADP) provides APIs that enable multiple approaches to customizing backups and preventing the inclusion of unnecessary or inappropriate resources.

OADP provides the following APIs:

1.1.1.1. Support for OpenShift API for Data Protection

Table 1.1. Supported versions of OADP

| Version | OCP version | General availability | Full support ends | Maintenance ends | Extended Update Support (EUS) | Extended Update Support Term 2 (EUS Term 2) |
|---|---|---|---|---|---|---|
| 1.4 | 4.14, 4.15, 4.16, 4.17 | 10 Jul 2024 | Release of 1.5 | Release of 1.6 | 27 Jun 2026; EUS must be on OCP 4.16 | 27 Jun 2027; EUS Term 2 must be on OCP 4.16 |
| 1.3 | 4.12, 4.13, 4.14, 4.15 | 29 Nov 2023 | 10 Jul 2024 | Release of 1.5 | 31 Oct 2025; EUS must be on OCP 4.14 | 31 Oct 2026; EUS Term 2 must be on OCP 4.14 |

1.1.1.1.1. Unsupported versions of the OADP Operator

Table 1.2. Previous versions of the OADP Operator which are no longer supported

| Version | General availability | Full support ended | Maintenance ended |
|---|---|---|---|
| 1.2 | 14 Jun 2023 | 29 Nov 2023 | 10 Jul 2024 |
| 1.1 | 01 Sep 2022 | 14 Jun 2023 | 29 Nov 2023 |
| 1.0 | 09 Feb 2022 | 01 Sep 2022 | 14 Jun 2023 |

For more details about EUS, see Extended Update Support.

For more details about EUS Term 2, see Extended Update Support Term 2.

1.2. OADP release notes

1.2.1. OADP 1.4 release notes

The release notes for OpenShift API for Data Protection (OADP) describe new features and enhancements, deprecated features, product recommendations, known issues, and resolved issues.

Note

For additional information about OADP, see OpenShift API for Data Protection (OADP) FAQs.

1.2.1.1. OADP 1.4.3 release notes

The OpenShift API for Data Protection (OADP) 1.4.3 release notes list the following new feature.

1.2.1.1.1. New features

Notable changes in the kubevirt velero plugin in version 0.7.1

With this release, the kubevirt velero plugin has been updated to version 0.7.1. Notable improvements include the following bug fix and new features:

  • Virtual machine instances (VMIs) are no longer skipped during backup when the owner VM is excluded.
  • Object graphs now include all extra objects during backup and restore operations.
  • Optionally generated labels are now added to new firmware Universally Unique Identifiers (UUIDs) during restore operations.
  • Switching VM run strategies during restore operations is now possible.
  • Clearing a MAC address by label is now supported.
  • The restore-specific checks during the backup operation are now skipped.
  • The VirtualMachineClusterInstancetype and VirtualMachineClusterPreference custom resource definitions (CRDs) are now supported.

1.2.1.2. OADP 1.4.2 release notes

The OpenShift API for Data Protection (OADP) 1.4.2 release notes list new features, resolved issues and bugs, and known issues.

1.2.1.2.1. New features

Backing up different volumes in the same namespace by using the VolumePolicy feature is now possible

With this release, Velero provides resource policies to back up different volumes in the same namespace by using the VolumePolicy feature. The supported VolumePolicy feature to back up different volumes includes skip, snapshot, and fs-backup actions. OADP-1071

File system backup and data mover can now use short-term credentials

File system backup and data mover can now use short-term credentials such as AWS Security Token Service (STS) and GCP WIF. With this support, backup is successfully completed without any PartiallyFailed status. OADP-5095

1.2.1.2.2. Resolved issues

DPA now reports errors if VSL contains an incorrect provider value

Previously, if the provider of a Volume Snapshot Location (VSL) spec was incorrect, the Data Protection Application (DPA) reconciled successfully. With this update, the DPA reports an error and requests a valid provider value. OADP-5044

Data Mover restore is successful irrespective of using different OADP namespaces for backup and restore

Previously, when a backup was created by using OADP installed in one namespace and then restored by using OADP installed in a different namespace, the Data Mover restore failed. With this update, the Data Mover restore succeeds. OADP-5460

SSE-C backup works with the calculated MD5 of the secret key

Previously, backup failed with the following error:

Requests specifying Server Side Encryption with Customer provided keys must provide the client calculated MD5 of the secret key.

With this update, the missing Server-Side Encryption with Customer-Provided Keys (SSE-C) base64 encoding and MD5 hash are fixed. As a result, SSE-C backup works with the calculated MD5 of the secret key. In addition, incorrect error handling for the customerKey size is also fixed. OADP-5388

For a complete list of all issues resolved in this release, see the list of OADP 1.4.2 resolved issues in Jira.

1.2.1.2.3. Known issues

The nodeSelector spec is not supported for the Data Mover restore action

When a Data Protection Application (DPA) is created with the nodeSelector field set in the nodeAgent parameter, Data Mover restore partially fails instead of completing the restore operation. OADP-5260

The S3 storage does not use proxy environment when TLS skip verify is specified

In the image registry backup, the S3 storage does not use the proxy environment when the insecureSkipTLSVerify parameter is set to true. OADP-3143

Kopia does not delete artifacts after backup expiration

Even after you delete a backup, Kopia does not delete the volume artifacts from the ${bucket_name}/kopia/$openshift-adp path in the S3 location after the backup expires. For more information, see "About Kopia repository maintenance". OADP-5131

1.2.1.3. OADP 1.4.1 release notes

The OpenShift API for Data Protection (OADP) 1.4.1 release notes list new features, resolved issues and bugs, and known issues.

1.2.1.3.1. New features

New DPA fields to update client qps and burst

You can now change Velero Server Kubernetes API queries per second and burst values by using the new Data Protection Application (DPA) fields. The new DPA fields are spec.configuration.velero.client-qps and spec.configuration.velero.client-burst, which both default to 100. OADP-4076
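
As an illustration of where these two fields sit in the DPA, the following sketch prints a YAML fragment with the nesting implied by the field paths above. The function name dpa_client_limits_fragment is hypothetical, and the fragment is meant to be merged into an existing DPA rather than applied on its own.

```shell
# Print a DPA fragment that sets the Velero client QPS and burst values.
# $1 is the QPS and $2 is the burst; both default to 100, the documented default.
dpa_client_limits_fragment() {
  qps="${1:-100}"
  burst="${2:-100}"
  cat <<EOF
spec:
  configuration:
    velero:
      client-qps: ${qps}
      client-burst: ${burst}
EOF
}

# Example: raise both limits above the default.
dpa_client_limits_fragment 150 200
```

Calling the function without arguments prints the fragment with the default value of 100 for both fields.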

Enabling non-default algorithms with Kopia

With this update, you can now configure the hash, encryption, and splitter algorithms in Kopia to select non-default options to optimize performance for different backup workloads.

To configure these algorithms, set the env variable of a velero pod in the podConfig section of the DataProtectionApplication (DPA) configuration. If this variable is not set, or an unsupported algorithm is chosen, Kopia will default to its standard algorithms. OADP-4640

1.2.1.3.2. Resolved issues

Restoring a backup without pods is now successful

Previously, restoring a backup that contained no pods, with the StorageClass VolumeBindingMode set to WaitForFirstConsumer, resulted in the PartiallyFailed status with the error: fail to patch dynamic PV, err: context deadline exceeded. With this update, patching the dynamic PV is skipped, and the restore succeeds without a PartiallyFailed status. OADP-4231

PodVolumeBackup CR now displays correct message

Previously, the PodVolumeBackup custom resource (CR) generated an incorrect message, which was: get a podvolumebackup with status "InProgress" during the server starting, mark it as "Failed". With this update, the message produced is now:

found a podvolumebackup with status "InProgress" during the server starting,
mark it as "Failed".

OADP-4224

Overriding imagePullPolicy is now possible with DPA

Previously, OADP set the imagePullPolicy parameter to Always for all images. With this update, OADP checks whether each image reference contains a sha256 or sha512 digest. If it does, OADP sets imagePullPolicy to IfNotPresent; otherwise, it sets imagePullPolicy to Always. You can now override this policy by using the new spec.containerImagePullPolicy DPA field. OADP-4172
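
The default rule can be sketched as a small shell function. This is an illustration of the rule as described in this release note, not the Operator's code; the function name and the exact pattern used to detect a digest are assumptions.

```shell
# Return the default pull policy for an image reference, following the rule above:
# a reference that carries a sha256 or sha512 digest gets IfNotPresent, anything else gets Always.
# Matching on "sha256:" and "sha512:" is an assumption about how a digest is detected.
default_pull_policy() {
  case "$1" in
    *sha256:*|*sha512:*) echo "IfNotPresent" ;;
    *) echo "Always" ;;
  esac
}

# A digest-pinned reference (placeholder digest) and a tag-based reference.
default_pull_policy "quay.io/example-repo/custom-velero-plugin@sha256:0123abcd"
default_pull_policy "quay.io/example-repo/custom-velero-plugin:latest"
```

A digest-pinned image never changes content, which is why pulling it only when it is absent is safe, whereas a tag can move and is pulled every time.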

OADP Velero can now retry updating the restore status if initial update fails

Previously, OADP Velero failed to update the restore CR status, which left the status at InProgress indefinitely. Components that relied on the backup and restore CR status to determine completion would fail. With this update, the restore CR status correctly proceeds to the Completed or Failed status. OADP-3227

Restoring BuildConfig Build from a different cluster is successful without any errors

Previously, when performing a restore of the BuildConfig Build resource from a different cluster, the application generated an error on TLS verification to the internal image registry. The resulting error was: failed to verify certificate: x509: certificate signed by unknown authority. With this update, the restore of the BuildConfig Build resources to a different cluster proceeds without generating the failed to verify certificate error. OADP-4692

Restoring an empty PVC is successful

Previously, downloading data failed while restoring an empty persistent volume claim (PVC). It failed with the following error:

data path restore failed: Failed to run kopia restore: Unable to load
    snapshot : snapshot not found

With this update, the data download completes correctly when restoring an empty PVC, and the error message is not generated. OADP-3106

There is no Velero memory leak in CSI and DataMover plugins

Previously, a Velero memory leak was caused by using the CSI and DataMover plugins. When the backup ended, the Velero plugin instance was not deleted and the memory leak consumed memory until an Out of Memory (OOM) condition was generated in the Velero pod. With this update, there is no resulting Velero memory leak when using the CSI and DataMover plugins. OADP-4448

Post-hook operation does not start before the related PVs are released

Previously, due to the asynchronous nature of the Data Mover operation, a post-hook might be attempted before the Data Mover persistent volume claim (PVC) releases the persistent volumes (PVs) of the related pods. This problem would cause the backup to fail with a PartiallyFailed status. With this update, the post-hook operation is not started until the related PVs are released by the Data Mover PVC, eliminating the PartiallyFailed backup status. OADP-3140

Deploying a DPA works as expected in namespaces with more than 37 characters

Previously, when you installed the OADP Operator in a namespace with more than 37 characters in its name and created a new DPA, labeling the "cloud-credentials" Secret failed and the DPA reported the following error:

The generated label name is too long.

With this update, creating a DPA does not fail in namespaces with more than 37 characters in the name. OADP-3960

Restore is successfully completed by overriding the timeout error

Previously, in a large-scale environment, the restore operation would result in a PartiallyFailed status with the error: fail to patch dynamic PV, err: context deadline exceeded. With this update, the resourceTimeout Velero server argument is used to override this timeout, resulting in a successful restore. OADP-4344

For a complete list of all issues resolved in this release, see the list of OADP 1.4.1 resolved issues in Jira.

1.2.1.3.3. Known issues

Cassandra application pods enter the CrashLoopBackOff status after an OADP restore

After an OADP restore, the Cassandra application pods might enter the CrashLoopBackOff status. To work around this problem, delete the StatefulSet pods that are in the CrashLoopBackOff state. The StatefulSet controller then recreates these pods, and they run normally. OADP-4407

Deployment referencing ImageStream is not restored properly leading to corrupted pod and volume contents

During a File System Backup (FSB) restore operation, a Deployment resource referencing an ImageStream is not restored properly. The restored pod that runs the FSB and the post-hook is terminated prematurely.

During the restore operation, the OpenShift Container Platform controller updates the spec.template.spec.containers[0].image field in the Deployment resource with an updated ImageStreamTag hash. The update triggers the rollout of a new pod, terminating the pod on which Velero runs the FSB along with the post-hook. For more information about image stream triggers, see Triggering updates on image stream changes.

The workaround for this behavior is a two-step restore process:

  1. Perform a restore excluding the Deployment resources, for example:

    $ velero restore create <RESTORE_NAME> \
      --from-backup <BACKUP_NAME> \
      --exclude-resources=deployment.apps
  2. Once the first restore is successful, perform a second restore by including these resources, for example:

    $ velero restore create <RESTORE_NAME> \
      --from-backup <BACKUP_NAME> \
      --include-resources=deployment.apps

    OADP-3954

1.2.1.4. OADP 1.4.0 release notes

The OpenShift API for Data Protection (OADP) 1.4.0 release notes list resolved issues and known issues.

1.2.1.4.1. Resolved issues

Restore works correctly in Red Hat OpenShift Service on AWS 4.16

Previously, while restoring the deleted application namespace, the restore operation partially failed with the resource name may not be empty error in Red Hat OpenShift Service on AWS 4.16. With this update, restore works as expected in Red Hat OpenShift Service on AWS 4.16. OADP-4075

Data Mover backups work properly in the Red Hat OpenShift Service on AWS 4.16 cluster

Previously, Velero was using the earlier version of SDK where the Spec.SourceVolumeMode field did not exist. As a consequence, Data Mover backups failed in the Red Hat OpenShift Service on AWS 4.16 cluster on the external snapshotter with version 4.2. With this update, external snapshotter is upgraded to version 7.0 and later. As a result, backups do not fail in the Red Hat OpenShift Service on AWS 4.16 cluster. OADP-3922

For a complete list of all issues resolved in this release, see the list of OADP 1.4.0 resolved issues in Jira.

1.2.1.4.2. Known issues

Backup fails when checksumAlgorithm is not set for MCG

While performing a backup of any application with Noobaa as the backup location, if the checksumAlgorithm configuration parameter is not set, backup fails. To fix this problem, if you do not provide a value for checksumAlgorithm in the Backup Storage Location (BSL) configuration, an empty value is added. The empty value is only added for BSLs that are created using Data Protection Application (DPA) custom resource (CR), and this value is not added if BSLs are created using any other method. OADP-4274

For a complete list of all known issues in this release, see the list of OADP 1.4.0 known issues in Jira.

1.2.1.4.3. Upgrade notes

Note

Always upgrade to the next minor version. Do not skip versions. To update to a later version, upgrade only one channel at a time. For example, to upgrade from OpenShift API for Data Protection (OADP) 1.1 to 1.3, upgrade first to 1.2, and then to 1.3.
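
The one-channel-at-a-time rule can be sketched as a helper that lists the channels to step through between two versions. The function name is hypothetical, and the sketch assumes versions of the form 1.<minor> and channels named stable-1.<minor>, as in the stable-1.3 and stable-1.4 channels used in this chapter.

```shell
# Print the subscription channels to switch through, in order, one minor version at a time.
# $1 is the installed OADP version and $2 is the target version, both in the form 1.<minor>.
oadp_upgrade_path() {
  from_minor="${1#1.}"
  to_minor="${2#1.}"
  path=""
  minor=$((from_minor + 1))
  while [ "$minor" -le "$to_minor" ]; do
    # Append the next channel, separated by a space once the list is non-empty.
    path="${path:+$path }stable-1.${minor}"
    minor=$((minor + 1))
  done
  printf '%s\n' "$path"
}

# Upgrading from 1.1 to 1.3 passes through the 1.2 channel first.
oadp_upgrade_path 1.1 1.3
```

For the example call, the helper prints stable-1.2 followed by stable-1.3, matching the upgrade order described in the note.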

1.2.1.4.3.1. Changes from OADP 1.3 to 1.4

The Velero server has been updated from version 1.12 to 1.14. Note that there are no changes in the Data Protection Application (DPA).

This changes the following:

  • The velero-plugin-for-csi code is now available in the Velero code, which means an init container is no longer required for the plugin.
  • Velero changed client Burst and QPS defaults from 30 and 20 to 100 and 100, respectively.
  • The velero-plugin-for-aws plugin updated default value of the spec.config.checksumAlgorithm field in BackupStorageLocation objects (BSLs) from "" (no checksum calculation) to the CRC32 algorithm. The checksum algorithm types are known to work only with AWS. Several S3 providers require the md5sum to be disabled by setting the checksum algorithm to "". Confirm md5sum algorithm support and configuration with your storage provider.

    In OADP 1.4, the default value for BSLs created within DPA for this configuration is "". This default value means that the md5sum is not checked, which is consistent with OADP 1.3. For BSLs created within DPA, update it by using the spec.backupLocations[].velero.config.checksumAlgorithm field in the DPA. If your BSLs are created outside DPA, you can update this configuration by using spec.config.checksumAlgorithm in the BSLs.
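
To make the field path concrete, the following sketch prints a DPA fragment that sets checksumAlgorithm for one backup location. The function name is hypothetical, and the fragment omits the other fields that a real backup location needs, so treat it as something to merge into an existing DPA.

```shell
# Print a DPA fragment that sets checksumAlgorithm for a single backup location.
# $1 is the algorithm; when omitted, the value is "" (no checksum calculation).
dpa_checksum_fragment() {
  algorithm="${1-}"
  cat <<EOF
spec:
  backupLocations:
  - velero:
      config:
        checksumAlgorithm: "${algorithm}"
EOF
}

# Keep the OADP 1.4 default for BSLs created within the DPA.
dpa_checksum_fragment
# Opt in to the CRC32 algorithm mentioned above.
dpa_checksum_fragment CRC32
```

Confirm with your storage provider which value is supported before changing it.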

1.2.1.4.3.2. Backing up the DPA configuration

You must back up your current DataProtectionApplication (DPA) configuration.

Procedure

  • Save your current DPA configuration by running the following command:

    Example command

    $ oc get dpa -n openshift-adp -o yaml > dpa.orig.backup

1.2.1.4.3.3. Upgrading the OADP Operator

Use the following procedure when upgrading the OpenShift API for Data Protection (OADP) Operator.

Procedure

  1. Change your subscription channel for the OADP Operator from stable-1.3 to stable-1.4.
  2. Wait for the Operator and containers to update and restart.
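
If you prefer the CLI to the web console for step 1, the channel change can be expressed as a merge patch on the Subscription object. The following sketch only prints the command for review. The subscription name oadp-operator and the openshift-adp namespace are assumptions based on the examples in this chapter, so check the real names with oc get subscription -n openshift-adp first.

```shell
# Assumed defaults; override them by exporting SUBSCRIPTION or NAMESPACE.
SUBSCRIPTION="${SUBSCRIPTION:-oadp-operator}"
NAMESPACE="${NAMESPACE:-openshift-adp}"

# Build the JSON merge patch that sets the subscription channel to $1.
channel_patch() {
  printf '{"spec":{"channel":"%s"}}' "$1"
}

# Print the command instead of running it; copy the printed line to apply the change.
echo oc patch subscription "$SUBSCRIPTION" -n "$NAMESPACE" --type merge -p "'$(channel_patch stable-1.4)'"
```

Printing the command first makes it easy to confirm the target channel before the Operator starts updating.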

1.2.1.4.4. Converting DPA to the new version

To upgrade from OADP 1.3 to 1.4, no Data Protection Application (DPA) changes are required.

1.2.1.4.5. Verifying the upgrade

Use the following procedure to verify the upgrade.

Procedure

  1. Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:

    $ oc get all -n openshift-adp

    Example output

    NAME                                                     READY   STATUS    RESTARTS   AGE
    pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
    pod/restic-9cq4q                                         1/1     Running   0          94s
    pod/restic-m4lts                                         1/1     Running   0          94s
    pod/restic-pv4kr                                         1/1     Running   0          95s
    pod/velero-588db7f655-n842v                              1/1     Running   0          95s
    
    NAME                                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140    <none>        8443/TCP   2m8s
    
    NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/restic   3         3         3       3            3           <none>          96s
    
    NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
    deployment.apps/velero                              1/1     1            1           96s
    
    NAME                                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
    replicaset.apps/velero-588db7f655                              1         1         1       96s

  2. Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:

    $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'

    Example output

    {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}

  3. Verify the type is set to Reconciled.
  4. Verify the backup storage location and confirm that the PHASE is Available by running the following command:

    $ oc get backupstoragelocations.velero.io -n openshift-adp

    Example output

    NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
    dpa-sample-1   Available   1s               3d16h   true
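
The check in steps 2 and 3 can be scripted. The following is a minimal sketch that uses plain text matching on the compact status JSON; the function name is hypothetical, and the match relies on the key order shown in the example output, so a JSON-aware tool such as jq is a more robust choice for production scripts.

```shell
# Succeed when the DPA status JSON in $1 contains a Reconciled condition whose status is "True".
# Text matching is a simplification that depends on the compact key order in the example output.
dpa_is_reconciled() {
  case "$1" in
    *'"status":"True","type":"Reconciled"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Status copied from the example output in step 2.
status='{"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}'

if dpa_is_reconciled "$status"; then
  echo "DPA reconciled"
else
  echo "DPA not reconciled"
fi

# Against a cluster, pass the live status instead:
# dpa_is_reconciled "$(oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}')"
```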

1.3. OADP performance

1.4. OADP features and plugins

OpenShift API for Data Protection (OADP) features provide options for backing up and restoring applications.

The default plugins enable Velero to integrate with certain cloud providers and to back up and restore Red Hat OpenShift Service on AWS resources.

1.4.1. OADP features

OpenShift API for Data Protection (OADP) supports the following features:

Backup

You can use OADP to back up all applications on the OpenShift Platform, or you can filter the resources by type, namespace, or label.

OADP backs up Kubernetes objects and internal images by saving them as an archive file on object storage. OADP backs up persistent volumes (PVs) by creating snapshots with the native cloud snapshot API or with the Container Storage Interface (CSI). For cloud providers that do not support snapshots, OADP backs up resources and PV data with Restic.

Note

You must exclude Operators from the backup of an application for backup and restore to succeed.

Restore

You can restore resources and PVs from a backup. You can restore all objects in a backup or filter the objects by namespace, PV, or label.

Note

You must exclude Operators from the backup of an application for backup and restore to succeed.

Schedule

You can schedule backups at specified intervals.

Hooks

You can use hooks to run commands in a container on a pod, for example, fsfreeze to freeze a file system. You can configure a hook to run before or after a backup or restore. Restore hooks can run in an init container or in the application container.

1.4.2. OADP plugins

The OpenShift API for Data Protection (OADP) provides default Velero plugins that are integrated with storage providers to support backup and snapshot operations. You can create custom plugins based on the Velero plugins.

OADP also provides plugins for Red Hat OpenShift Service on AWS resource backups, OpenShift Virtualization resource backups, and Container Storage Interface (CSI) snapshots.

Table 1.3. OADP plugins

| OADP plugin | Function | Storage location |
|---|---|---|
| aws | Backs up and restores Kubernetes objects. | AWS S3 |
| aws | Backs up and restores volumes with snapshots. | AWS EBS |
| openshift | Backs up and restores Red Hat OpenShift Service on AWS resources. [1] | Object store |
| kubevirt | Backs up and restores OpenShift Virtualization resources. [2] | Object store |
| csi | Backs up and restores volumes with CSI snapshots. [3] | Cloud storage that supports CSI snapshots |
| vsm | VolumeSnapshotMover relocates snapshots from the cluster into an object store to be used during a restore process to recover stateful applications, in situations such as cluster deletion. [4] | Object store |

  1. Mandatory.
  2. Virtual machine disks are backed up with CSI snapshots or Restic.
  3. The csi plugin uses the Kubernetes CSI snapshot API.

    • OADP 1.1 or later uses snapshot.storage.k8s.io/v1
    • OADP 1.0 uses snapshot.storage.k8s.io/v1beta1
  4. OADP 1.2 only.

1.4.3. About OADP Velero plugins

You can configure two types of plugins when you install Velero:

  • Default cloud provider plugins
  • Custom plugins

Both types of plugin are optional, but most users configure at least one cloud provider plugin.

1.4.3.1. Default Velero cloud provider plugins

You can install any of the following default Velero cloud provider plugins when you configure the oadp_v1alpha1_dpa.yaml file during deployment:

  • aws (Amazon Web Services)
  • openshift (OpenShift Velero plugin)
  • csi (Container Storage Interface)
  • kubevirt (KubeVirt)

You specify the desired default plugins in the oadp_v1alpha1_dpa.yaml file during deployment.

Example file

The following .yaml file installs the openshift, aws, azure, and gcp plugins:

 apiVersion: oadp.openshift.io/v1alpha1
 kind: DataProtectionApplication
 metadata:
   name: dpa-sample
 spec:
   configuration:
     velero:
       defaultPlugins:
       - openshift
       - aws
       - azure
       - gcp

1.4.3.2. Custom Velero plugins

You can install a custom Velero plugin by specifying the plugin image and name when you configure the oadp_v1alpha1_dpa.yaml file during deployment.

You specify the desired custom plugins in the oadp_v1alpha1_dpa.yaml file during deployment.

Example file

The following .yaml file installs the default openshift, azure, and gcp plugins and a custom plugin that has the name custom-plugin-example and the image quay.io/example-repo/custom-velero-plugin:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
spec:
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - azure
      - gcp
      customPlugins:
      - name: custom-plugin-example
        image: quay.io/example-repo/custom-velero-plugin

1.4.3.3. Velero plugins returning "received EOF, stopping recv loop" message

Note

Velero plugins are started as separate processes. After the Velero operation has completed, either successfully or not, they exit. Receiving a received EOF, stopping recv loop message in the debug logs indicates that a plugin operation has completed. It does not mean that an error has occurred.

1.4.4. OADP plugins known issues

The following section describes known issues in OpenShift API for Data Protection (OADP) plugins:

1.4.4.1. Velero plugin panics during imagestream backups due to a missing secret

When the backup and the Backup Storage Location (BSL) are managed outside the scope of the Data Protection Application (DPA), the OADP controller (that is, the DPA reconciliation) does not create the relevant oadp-<bsl_name>-<bsl_provider>-registry-secret.

When the backup is run, the OpenShift Velero plugin panics on the imagestream backup, with the following panic error:

2024-02-27T10:46:50.028951744Z time="2024-02-27T10:46:50Z" level=error msg="Error backing up item"
backup=openshift-adp/<backup name> error="error executing custom action (groupResource=imagestreams.image.openshift.io,
namespace=<BSL Name>, name=postgres): rpc error: code = Aborted desc = plugin panicked:
runtime error: index out of range with length 1, stack trace: goroutine 94…

1.4.4.1.1. Workaround to avoid the panic error

To avoid the Velero plugin panic error, perform the following steps:

  1. Label the custom BSL with the relevant label:

    $ oc label backupstoragelocations.velero.io <bsl_name> app.kubernetes.io/component=bsl
  2. After the BSL is labeled, wait until the DPA reconciles.

    Note

    You can force the reconciliation by making any minor change to the DPA itself.

  3. When the DPA reconciles, confirm that the relevant oadp-<bsl_name>-<bsl_provider>-registry-secret has been created and that the correct registry data has been populated into it:

    $ oc -n openshift-adp get secret/oadp-<bsl_name>-<bsl_provider>-registry-secret -o json | jq -r '.data'

1.4.4.2. OpenShift ADP Controller segmentation fault

If you configure a DPA with both cloudstorage and restic enabled, the openshift-adp-controller-manager pod crashes and restarts indefinitely until the pod fails with a crash loop segmentation fault.

You can have either velero or cloudstorage defined, because they are mutually exclusive fields.

  • If you have both velero and cloudstorage defined, the openshift-adp-controller-manager fails.
  • If you have neither velero nor cloudstorage defined, the openshift-adp-controller-manager fails.

For more information about this issue, see OADP-1054.

1.4.4.2.1. OpenShift ADP Controller segmentation fault workaround

You must define either velero or cloudstorage when you configure a DPA. If you define both APIs in your DPA, the openshift-adp-controller-manager pod fails with a crash loop segmentation fault.

1.5. OADP use cases

1.5.1. Backing up workloads on OADP with ROSA STS

1.5.1.1. Performing a backup with OADP and ROSA STS

The following example hello-world application has no persistent volumes (PVs) attached. Perform a backup with OpenShift API for Data Protection (OADP) with Red Hat OpenShift Service on AWS (ROSA) STS.

Either Data Protection Application (DPA) configuration will work.

  1. Create a workload to back up by running the following commands:

    $ oc create namespace hello-world
    $ oc new-app -n hello-world --image=docker.io/openshift/hello-openshift
  2. Expose the route by running the following command:

    $ oc expose service/hello-openshift -n hello-world
  3. Check that the application is working by running the following command:

    $ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`

    Example output

    Hello OpenShift!

  4. Back up the workload by running the following command:

    $ cat << EOF | oc create -f -
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: hello-world
        namespace: openshift-adp
      spec:
        includedNamespaces:
        - hello-world
        storageLocation: ${CLUSTER_NAME}-dpa-1
        ttl: 720h0m0s
    EOF
  5. Wait until the backup is completed by running the following command:

    $ watch "oc -n openshift-adp get backup hello-world -o json | jq .status"

    Example output

    {
      "completionTimestamp": "2022-09-07T22:20:44Z",
      "expiration": "2022-10-07T22:20:22Z",
      "formatVersion": "1.1.0",
      "phase": "Completed",
      "progress": {
        "itemsBackedUp": 58,
        "totalItems": 58
      },
      "startTimestamp": "2022-09-07T22:20:22Z",
      "version": 1
    }

  6. Delete the demo workload by running the following command:

    $ oc delete ns hello-world
  7. Restore the workload from the backup by running the following command:

    $ cat << EOF | oc create -f -
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: hello-world
        namespace: openshift-adp
      spec:
        backupName: hello-world
    EOF
  8. Wait for the Restore to finish by running the following command:

    $ watch "oc -n openshift-adp get restore hello-world -o json | jq .status"

    Example output

    {
      "completionTimestamp": "2022-09-07T22:25:47Z",
      "phase": "Completed",
      "progress": {
        "itemsRestored": 38,
        "totalItems": 38
      },
      "startTimestamp": "2022-09-07T22:25:28Z",
      "warnings": 9
    }

  9. Check that the workload is restored by running the following command:

    $ oc -n hello-world get pods

    Example output

    NAME                              READY   STATUS    RESTARTS   AGE
    hello-openshift-9f885f7c6-kdjpj   1/1     Running   0          90s

  10. Check that the restored application responds by running the following command:

    $ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`

    Example output

    Hello OpenShift!
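
The watch commands in steps 5 and 8 require you to stop them manually. As an alternative for scripts, the following sketch polls a phase until it reaches a terminal value. The function name wait_for_phase is hypothetical, and the list of terminal phases (Completed, Failed, PartiallyFailed, FailedValidation) is an assumption based on the Velero phases mentioned in this chapter.

```shell
# wait_for_phase <interval_seconds> <max_attempts> <command...>
# Runs the command repeatedly until it prints a terminal phase.
# Returns 0 on Completed, 1 on a failure phase, and 2 when the attempts run out.
wait_for_phase() {
  interval="$1"
  attempts="$2"
  shift 2
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    # A failing command (for example, the resource does not exist yet) counts as "no phase yet".
    phase="$("$@" 2>/dev/null || true)"
    case "$phase" in
      Completed) echo "$phase"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation) echo "$phase"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out (last phase: ${phase:-unknown})"
  return 2
}

# Usage against a cluster, checking every 5 seconds for up to 60 attempts:
# wait_for_phase 5 60 oc -n openshift-adp get backup hello-world -o jsonpath='{.status.phase}'
# wait_for_phase 5 60 oc -n openshift-adp get restore hello-world -o jsonpath='{.status.phase}'
```

Passing the status command as arguments keeps the loop independent of the resource type, so the same helper covers both the Backup and the Restore.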

Note

For troubleshooting tips, see the OADP team’s troubleshooting documentation.

1.5.1.2. Cleaning up a cluster after a backup with OADP and ROSA STS

If you need to uninstall the OpenShift API for Data Protection (OADP) Operator together with the backups and the S3 bucket from this example, follow these instructions.

Procedure

  1. Delete the workload by running the following command:

    $ oc delete ns hello-world
  2. Delete the Data Protection Application (DPA) by running the following command:

    $ oc -n openshift-adp delete dpa ${CLUSTER_NAME}-dpa
  3. Delete the cloud storage by running the following command:

    $ oc -n openshift-adp delete cloudstorage ${CLUSTER_NAME}-oadp
    Warning

    If this command hangs, you might need to delete the finalizer by running the following command:

    $ oc -n openshift-adp patch cloudstorage ${CLUSTER_NAME}-oadp -p '{"metadata":{"finalizers":null}}' --type=merge
  4. If the Operator is no longer required, remove it by running the following command:

    $ oc -n openshift-adp delete subscription oadp-operator
  5. Remove the Operator namespace by running the following command:

    $ oc delete ns openshift-adp
  6. If the backup and restore resources are no longer required, remove them from the cluster by running the following command:

    $ oc delete backups.velero.io hello-world
  7. To delete the backup, the restore, and the remote objects in AWS S3, run the following command:

    $ velero backup delete hello-world
  8. If you no longer need the Custom Resource Definitions (CRD), remove them from the cluster by running the following command:

    $ for CRD in `oc get crds | grep velero | awk '{print $1}'`; do oc delete crd $CRD; done
  9. Delete the AWS S3 bucket by running the following commands:

    $ aws s3 rm s3://${CLUSTER_NAME}-oadp --recursive
    $ aws s3api delete-bucket --bucket ${CLUSTER_NAME}-oadp
  10. Detach the policy from the role by running the following command:

    $ aws iam detach-role-policy --role-name "${ROLE_NAME}"  --policy-arn "${POLICY_ARN}"
  11. Delete the role by running the following command:

    $ aws iam delete-role --role-name "${ROLE_NAME}"
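
Because the AWS steps in this procedure are destructive, it can help to print them before running them. The following is a minimal sketch under stated assumptions: the run and cleanup_aws functions and the DRY_RUN variable are illustrative names and are not part of the aws CLI or OADP. The commands themselves are the ones from steps 9 to 11, and CLUSTER_NAME, ROLE_NAME, and POLICY_ARN are expected to be set as in the installation procedure.

```shell
# Sketch only: run and DRY_RUN are illustrative, not part of any CLI.
# With DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The AWS cleanup commands from the procedure, in the same order.
cleanup_aws() {
  run aws s3 rm "s3://${CLUSTER_NAME}-oadp" --recursive
  run aws s3api delete-bucket --bucket "${CLUSTER_NAME}-oadp"
  run aws iam detach-role-policy --role-name "${ROLE_NAME}" --policy-arn "${POLICY_ARN}"
  run aws iam delete-role --role-name "${ROLE_NAME}"
}

# Example call: DRY_RUN=1 cleanup_aws
```

Review the printed commands, in particular the bucket and role names, and then call cleanup_aws again without DRY_RUN to run them.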

1.5.2. OpenShift API for Data Protection (OADP) restore use case

The following use case shows how to use OADP to restore a backup to a different namespace.

1.5.2.1. Restoring an application to a different namespace using OADP

Restore a backup of an application by using OADP to a new target namespace, test-restore-application. To restore a backup, you create a restore custom resource (CR) as shown in the following example. In the restore CR, the source namespace refers to the application namespace that you included in the backup. You then verify the restore by changing your project to the new restored namespace and verifying the resources.

Prerequisites

  • You installed the OADP Operator.
  • You have the backup of an application to be restored.

Procedure

  1. Create a restore CR as shown in the following example:

    Example restore CR

    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: test-restore 1
      namespace: openshift-adp
    spec:
      backupName: <backup_name> 2
      restorePVs: true
      namespaceMapping:
        <application_namespace>: test-restore-application 3

    1
    The name of the restore CR.
    2
    Specify the name of the backup.
    3
    namespaceMapping maps the source application namespace to the target application namespace. Specify the application namespace that you backed up. test-restore-application is the target namespace where you want to restore the backup.
  2. Apply the restore CR by running the following command:

    $ oc apply -f <restore_cr_filename>

Verification

  1. Verify that the restore is in the Completed phase by running the following command:

    $ oc describe restores.velero.io <restore_name> -n openshift-adp
  2. Change to the restored namespace test-restore-application by running the following command:

    $ oc project test-restore-application
  3. Verify the restored resources such as persistent volume claim (pvc), service (svc), deployment, secret, and config map by running the following command:

    $ oc get pvc,svc,deployment,secret,configmap

    Example output

    NAME                          STATUS   VOLUME
    persistentvolumeclaim/mysql   Bound    pvc-9b3583db-...-14b86
    
    NAME               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
    service/mysql      ClusterIP   172....157     <none>        3306/TCP   2m56s
    service/todolist   ClusterIP   172.....15     <none>        8000/TCP   2m56s
    
    NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/mysql   0/1     1            0           2m55s
    
    NAME                                         TYPE                      DATA   AGE
    secret/builder-dockercfg-6bfmd               kubernetes.io/dockercfg   1      2m57s
    secret/default-dockercfg-hz9kz               kubernetes.io/dockercfg   1      2m57s
    secret/deployer-dockercfg-86cvd              kubernetes.io/dockercfg   1      2m57s
    secret/mysql-persistent-sa-dockercfg-rgp9b   kubernetes.io/dockercfg   1      2m57s
    
    NAME                                 DATA   AGE
    configmap/kube-root-ca.crt           1      2m57s
    configmap/openshift-service-ca.crt   1      2m57s
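
If you restore to different target namespaces repeatedly, you can generate the restore CR from parameters instead of editing a file each time. The following is a minimal sketch: render_restore_cr is an illustrative helper, not an OADP command, and it only prints the same fields that the example restore CR uses.

```shell
# Sketch only: render_restore_cr is an illustrative helper, not an OADP command.
# $1 = restore name, $2 = backup name, $3 = source namespace, $4 = target namespace
render_restore_cr() {
  cat <<EOF
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: $1
  namespace: openshift-adp
spec:
  backupName: $2
  restorePVs: true
  namespaceMapping:
    $3: $4
EOF
}

# Example call:
# render_restore_cr test-restore <backup_name> <application_namespace> test-restore-application | oc apply -f -
```

Printing the manifest to standard output keeps the helper free of side effects, so you can inspect the result before you pipe it to oc apply.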

1.6. Installing and configuring OADP

1.6.1. Installing OADP

You can use OpenShift API for Data Protection (OADP) with Red Hat OpenShift Service on AWS (ROSA) clusters to back up and restore application data.

Before installing OpenShift API for Data Protection (OADP), you must set up role and policy credentials for OADP so that it can use the Amazon Web Services API.

This process is performed in the following two stages:

  1. Prepare AWS credentials
  2. Install the OADP Operator and give it an IAM role
1.6.1.1. Preparing AWS credentials for OADP

An Amazon Web Services account must be prepared and configured to accept an OpenShift API for Data Protection (OADP) installation.

Procedure

  1. Create the following environment variables by running the following commands:

    Important

    Change the cluster name to match your ROSA cluster, and ensure you are logged into the cluster as an administrator. Ensure that all values are output correctly before continuing.

    $ export CLUSTER_NAME=my-cluster 1
      export ROSA_CLUSTER_ID=$(rosa describe cluster -c ${CLUSTER_NAME} --output json | jq -r .id)
      export REGION=$(rosa describe cluster -c ${CLUSTER_NAME} --output json | jq -r .region.id)
      export OIDC_ENDPOINT=$(oc get authentication.config.openshift.io cluster -o jsonpath='{.spec.serviceAccountIssuer}' | sed 's|^https://||')
      export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
      export CLUSTER_VERSION=$(rosa describe cluster -c ${CLUSTER_NAME} -o json | jq -r .version.raw_id | cut -f -2 -d '.')
      export ROLE_NAME="${CLUSTER_NAME}-openshift-oadp-aws-cloud-credentials"
      export SCRATCH="/tmp/${CLUSTER_NAME}/oadp"
      mkdir -p ${SCRATCH}
      echo "Cluster ID: ${ROSA_CLUSTER_ID}, Region: ${REGION}, OIDC Endpoint:
      ${OIDC_ENDPOINT}, AWS Account ID: ${AWS_ACCOUNT_ID}"
    1
    Replace my-cluster with your ROSA cluster name.
  2. On the AWS account, create an IAM policy to allow access to AWS S3:

    1. Check to see if the policy exists by running the following command:

      $ POLICY_ARN=$(aws iam list-policies --query "Policies[?PolicyName=='RosaOadpVer1'].{ARN:Arn}" --output text) 1
      1
      Replace RosaOadpVer1 with your policy name.
    2. Enter the following command to create the policy JSON file and then create the IAM policy in AWS:

      Note

      If the policy ARN is not found, the command creates the policy. If the policy ARN already exists, the if statement intentionally skips the policy creation.

      $ if [[ -z "${POLICY_ARN}" ]]; then
        cat << EOF > ${SCRATCH}/policy.json 1
        {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:CreateBucket",
              "s3:DeleteBucket",
              "s3:PutBucketTagging",
              "s3:GetBucketTagging",
              "s3:PutEncryptionConfiguration",
              "s3:GetEncryptionConfiguration",
              "s3:PutLifecycleConfiguration",
              "s3:GetLifecycleConfiguration",
              "s3:GetBucketLocation",
              "s3:ListBucket",
              "s3:GetObject",
              "s3:PutObject",
              "s3:DeleteObject",
              "s3:ListBucketMultipartUploads",
              "s3:AbortMultipartUpload",
              "s3:ListMultipartUploadParts",
              "ec2:DescribeSnapshots",
              "ec2:DescribeVolumes",
              "ec2:DescribeVolumeAttribute",
              "ec2:DescribeVolumesModifications",
              "ec2:DescribeVolumeStatus",
              "ec2:CreateTags",
              "ec2:CreateVolume",
              "ec2:CreateSnapshot",
              "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
          }
        ]}
      EOF
      
        POLICY_ARN=$(aws iam create-policy --policy-name "RosaOadpVer1" \
        --policy-document file://${SCRATCH}/policy.json --query Policy.Arn \
        --tags Key=rosa_openshift_version,Value=${CLUSTER_VERSION} Key=rosa_role_prefix,Value=ManagedOpenShift Key=operator_namespace,Value=openshift-adp Key=operator_name,Value=openshift-oadp \
        --output text)
        fi
      1
      SCRATCH is the temporary directory that you created when you set the environment variables.
    3. View the policy ARN by running the following command:

      $ echo ${POLICY_ARN}
  3. Create an IAM role trust policy for the cluster:

    1. Create the trust policy file by running the following command:

      $ cat <<EOF > ${SCRATCH}/trust-policy.json
        {
            "Version": "2012-10-17",
            "Statement": [{
              "Effect": "Allow",
              "Principal": {
                "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT}"
              },
              "Action": "sts:AssumeRoleWithWebIdentity",
              "Condition": {
                "StringEquals": {
                  "${OIDC_ENDPOINT}:sub": [
                    "system:serviceaccount:openshift-adp:openshift-adp-controller-manager",
                    "system:serviceaccount:openshift-adp:velero"]
                }
              }
            }]
        }
      EOF
    2. Create the role by running the following command:

      $ ROLE_ARN=$(aws iam create-role --role-name \
        "${ROLE_NAME}" \
        --assume-role-policy-document file://${SCRATCH}/trust-policy.json \
        --tags Key=rosa_cluster_id,Value=${ROSA_CLUSTER_ID} \
               Key=rosa_openshift_version,Value=${CLUSTER_VERSION} \
               Key=rosa_role_prefix,Value=ManagedOpenShift \
               Key=operator_namespace,Value=openshift-adp \
               Key=operator_name,Value=openshift-oadp \
        --query Role.Arn --output text)
    3. View the role ARN by running the following command:

      $ echo ${ROLE_ARN}
  4. Attach the IAM policy to the IAM role by running the following command:

    $ aws iam attach-role-policy --role-name "${ROLE_NAME}" \
      --policy-arn ${POLICY_ARN}
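
The first step asks you to confirm that every value was output correctly before continuing. That check can be sketched in shell. The following is a minimal sketch under stated assumptions: check_value and strip_scheme are illustrative helpers and are not part of the rosa, oc, or aws CLIs. The check treats the literal string null as missing because jq -r prints null when a field is absent.

```shell
# Sketch only: check_value and strip_scheme are illustrative helpers.
# $1 = label to report, $2 = value to check
check_value() {
  if [ -z "$2" ] || [ "$2" = "null" ]; then
    echo "missing value: $1" >&2
    return 1
  fi
}

# Remove a leading https:// so that the value fits into an OIDC provider ARN.
strip_scheme() {
  printf '%s\n' "${1#https://}"
}

# Example calls:
# check_value ROSA_CLUSTER_ID "${ROSA_CLUSTER_ID}" || exit 1
# check_value OIDC_ENDPOINT "${OIDC_ENDPOINT}" || exit 1
```

Running the checks before you create the policy and role avoids creating IAM resources with empty tags or a malformed trust policy.
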
1.6.1.2. Installing the OADP Operator and providing the IAM role

AWS Security Token Service (AWS STS) is a global web service that provides short-term credentials for IAM or federated users. Red Hat OpenShift Service on AWS (ROSA) with STS is the recommended credential mode for ROSA clusters. This document describes how to install OpenShift API for Data Protection (OADP) on ROSA with AWS STS.

Important

Restic is unsupported.

Kopia file system backup (FSB) is supported when backing up file systems that do not have Container Storage Interface (CSI) snapshotting support.

Example file systems include the following:

  • Amazon Elastic File System (EFS)
  • Network File System (NFS)
  • emptyDir volumes
  • Local volumes

For backing up volumes, OADP on ROSA with AWS STS supports only native snapshots and Container Storage Interface (CSI) snapshots.

In an Amazon ROSA cluster that uses STS authentication, restoring backed-up data in a different AWS region is not supported.

The Data Mover feature is not currently supported in ROSA clusters. You can use native AWS S3 tools for moving data.

Prerequisites

  • A Red Hat OpenShift Service on AWS (ROSA) cluster with the required access and tokens. For instructions, see the previous procedure Preparing AWS credentials for OADP. If you plan to use two different clusters for backing up and restoring, you must prepare AWS credentials, including ROLE_ARN, for each cluster.

Procedure

  1. Create a Red Hat OpenShift Service on AWS secret from your AWS token file by entering the following commands:

    1. Create the credentials file:

      $ cat <<EOF > ${SCRATCH}/credentials
        [default]
        role_arn = ${ROLE_ARN}
        web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
        region = <aws_region> 1
      EOF
      1
      Replace <aws_region> with the AWS region to use for the STS endpoint.
    2. Create a namespace for OADP:

      $ oc create namespace openshift-adp
    3. Create the Red Hat OpenShift Service on AWS secret:

      $ oc -n openshift-adp create secret generic cloud-credentials \
        --from-file=${SCRATCH}/credentials
      Note

      In Red Hat OpenShift Service on AWS versions 4.15 and later, the OADP Operator supports a new standardized STS workflow through the Operator Lifecycle Manager (OLM) and Cloud Credentials Operator (CCO). In this workflow, you do not need to create the preceding secret. You only need to supply the role ARN when you install OLM-managed Operators by using the Red Hat OpenShift Service on AWS web console. For more information, see Installing from OperatorHub using the web console.

      The preceding secret is created automatically by CCO.

  2. Install the OADP Operator:

    1. In the Red Hat OpenShift Service on AWS web console, browse to Operators → OperatorHub.
    2. Search for the OADP Operator.
    3. In the role_ARN field, paste the role_arn that you created previously and click Install.
  3. Create AWS cloud storage using your AWS credentials by entering the following command:

    $ cat << EOF | oc create -f -
      apiVersion: oadp.openshift.io/v1alpha1
      kind: CloudStorage
      metadata:
        name: ${CLUSTER_NAME}-oadp
        namespace: openshift-adp
      spec:
        creationSecret:
          key: credentials
          name: cloud-credentials
        enableSharedConfig: true
        name: ${CLUSTER_NAME}-oadp
        provider: aws
        region: $REGION
    EOF
  4. Check your application’s default storage class by entering the following command:

    $ oc get pvc -n <namespace>

    Example output

    NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    applog   Bound    pvc-351791ae-b6ab-4e8b-88a4-30f73caf5ef8   1Gi        RWO            gp3-csi        4d19h
    mysql    Bound    pvc-16b8e009-a20a-4379-accc-bc81fedd0621   1Gi        RWO            gp3-csi        4d19h

  5. Get the storage class by running the following command:

    $ oc get storageclass

    Example output

    NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    gp2                 kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   4d21h
    gp2-csi             ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
    gp3                 ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
    gp3-csi (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h

    Note

    The following storage classes will work:

    • gp3-csi
    • gp2-csi
    • gp3
    • gp2

    If the application or applications that are being backed up are all using persistent volumes (PVs) with Container Storage Interface (CSI), it is advisable to include the CSI plugin in the OADP DPA configuration.

  6. Create the DataProtectionApplication resource to configure the connection to the storage where the backups and volume snapshots are stored:

    1. If you are using only CSI volumes, deploy a Data Protection Application by entering the following command:

      $ cat << EOF | oc create -f -
        apiVersion: oadp.openshift.io/v1alpha1
        kind: DataProtectionApplication
        metadata:
          name: ${CLUSTER_NAME}-dpa
          namespace: openshift-adp
        spec:
          backupImages: true 1
          features:
            dataMover:
              enable: false
          backupLocations:
          - bucket:
              cloudStorageRef:
                name: ${CLUSTER_NAME}-oadp
              credential:
                key: credentials
                name: cloud-credentials
              prefix: velero
              default: true
              config:
                region: ${REGION}
          configuration:
            velero:
              defaultPlugins:
              - openshift
              - aws
              - csi
            nodeAgent:  2
              enable: false
              uploaderType: kopia 3
      EOF
      1
      ROSA supports internal image backup. Set this field to false if you do not want to use image backup.
      2
      See the important note regarding the nodeAgent attribute.
      3
      The type of uploader. The possible values are restic or kopia. The built-in Data Mover uses Kopia as the default uploader mechanism regardless of the value of the uploaderType field.
    2. If you are using CSI or non-CSI volumes, deploy a Data Protection Application by entering the following command:

    $ cat << EOF | oc create -f -
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ${CLUSTER_NAME}-dpa
        namespace: openshift-adp
      spec:
        backupImages: true 1
        backupLocations:
        - bucket:
            cloudStorageRef:
              name: ${CLUSTER_NAME}-oadp
            credential:
              key: credentials
              name: cloud-credentials
            prefix: velero
            default: true
            config:
              region: ${REGION}
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - aws
          nodeAgent: 2
            enable: false
            uploaderType: restic
        snapshotLocations:
          - velero:
              config:
                credentialsFile: /tmp/credentials/openshift-adp/cloud-credentials-credentials 3
                enableSharedConfig: "true" 4
                profile: default 5
                region: ${REGION} 6
              provider: aws
    EOF
    1
    ROSA supports internal image backup. Set this field to false if you do not want to use image backup.
    2
    See the important note regarding the nodeAgent attribute.
    3
    The credentialsFile field is the mounted location of the bucket credential on the pod.
    4
    The enableSharedConfig field allows the snapshotLocations to share or reuse the credential defined for the bucket.
    5
    Use the profile name set in the AWS credentials file.
    6
    Specify region as your AWS region. This must be the same as the cluster region.

    You are now ready to back up and restore Red Hat OpenShift Service on AWS applications, as described in Backing up applications.

Important

The enable parameter of nodeAgent is set to false in this configuration, because OADP does not support Restic in ROSA environments.

If you use OADP 1.2, replace this configuration:

nodeAgent:
  enable: false
  uploaderType: restic

with the following configuration:

restic:
  enable: false

If you want to use two different clusters for backing up and restoring, the two clusters must have the same AWS S3 storage names in both the cloud storage CR and the OADP DataProtectionApplication configuration.
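
The choice between the two DataProtectionApplication variants depends on whether your volumes are provisioned by a CSI driver. The following is a minimal sketch of that check. The uses_csi_driver helper is illustrative, and the rule that a provisioner name containing .csi. denotes a CSI driver is an assumption that matches the ebs.csi.aws.com and kubernetes.io/aws-ebs values in the example output but is not a general guarantee.

```shell
# Sketch only: uses_csi_driver is an illustrative helper, not an oc subcommand.
# Assumption: a provisioner name that contains ".csi." is a CSI driver.
uses_csi_driver() {
  case "$1" in
    *.csi.*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example calls:
# provisioner=$(oc get storageclass gp3-csi -o jsonpath='{.provisioner}')
# if uses_csi_driver "$provisioner"; then echo "CSI"; else echo "non-CSI"; fi
```

If every storage class that your application uses reports a CSI provisioner, the CSI-only variant with the csi plugin applies. Otherwise, use the variant that also defines snapshotLocations.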

1.6.1.3. Updating the IAM role ARN in the OADP Operator subscription

While installing the OADP Operator on a ROSA Security Token Service (STS) cluster, if you provide an incorrect IAM role Amazon Resource Name (ARN), the openshift-adp-controller pod gives an error. The credential requests that are generated contain the wrong IAM role ARN. To update the credential requests object with the correct IAM role ARN, you can edit the OADP Operator subscription and patch the IAM role ARN with the correct value. By editing the OADP Operator subscription, you do not have to uninstall and reinstall OADP to update the IAM role ARN.

Prerequisites

  • You have a Red Hat OpenShift Service on AWS STS cluster with the required access and tokens.
  • You have installed OADP on the ROSA STS cluster.

Procedure

  1. To verify that the OADP subscription has the wrong IAM role ARN environment variable set, run the following command:

    $ oc get sub -o yaml redhat-oadp-operator

    Example subscription

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      annotations:
      creationTimestamp: "2025-01-15T07:18:31Z"
      generation: 1
      labels:
        operators.coreos.com/redhat-oadp-operator.openshift-adp: ""
      name: redhat-oadp-operator
      namespace: openshift-adp
      resourceVersion: "77363"
      uid: 5ba00906-5ad2-4476-ae7b-ffa90986283d
    spec:
      channel: stable-1.4
      config:
        env:
        - name: ROLEARN
          value: arn:aws:iam::11111111:role/wrong-role-arn 1
      installPlanApproval: Manual
      name: redhat-oadp-operator
      source: prestage-operators
      sourceNamespace: openshift-marketplace
      startingCSV: oadp-operator.v1.4.2

    1
    Verify the value of ROLEARN you want to update.
  2. Update the ROLEARN field of the subscription with the correct role ARN by running the following command:

    $ oc patch subscription redhat-oadp-operator -p '{"spec": {"config": {"env": [{"name": "ROLEARN", "value": "<role_arn>"}]}}}' --type='merge'

    where:

    <role_arn>
    Specifies the IAM role ARN to be updated. For example, arn:aws:iam::160…​..6956:role/oadprosa…​..8wlf.
  3. Verify that the secret object is updated with correct role ARN value by running the following command:

    $ oc get secret cloud-credentials -o jsonpath='{.data.credentials}' | base64 -d

    Example output

    [default]
    sts_regional_endpoints = regional
    role_arn = arn:aws:iam::160.....6956:role/oadprosa.....8wlf
    web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token

  4. Configure the DataProtectionApplication custom resource (CR) manifest file as shown in the following example:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-rosa-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
      - bucket:
          config:
            region: us-east-1
          cloudStorageRef:
            name: <cloud_storage> 1
          credential:
            name: cloud-credentials
            key: credentials
          prefix: velero
          default: true
      configuration:
        velero:
          defaultPlugins:
          - aws
          - openshift
    1
    Specify the CloudStorage CR.
  5. Create the DataProtectionApplication CR by running the following command:

    $ oc create -f <dpa_manifest_file>
  6. Verify that the DataProtectionApplication CR is reconciled and the status is set to "True" by running the following command:

    $ oc get dpa -n openshift-adp -o yaml

    Example DataProtectionApplication

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    ...
    status:
        conditions:
        - lastTransitionTime: "2023-07-31T04:48:12Z"
          message: Reconcile complete
          reason: Complete
          status: "True"
          type: Reconciled

  7. Verify that the BackupStorageLocation CR is in an available state by running the following command:

    $ oc get backupstoragelocations.velero.io -n openshift-adp

    Example BackupStorageLocation

    NAME       PHASE       LAST VALIDATED   AGE   DEFAULT
    ts-dpa-1   Available   3s               6s    true
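
To compare the role ARN in the secret with the value that you patched into the subscription, you can extract the role_arn line from the decoded credentials. The following is a minimal sketch: role_arn_from_credentials is an illustrative helper that assumes the key = value layout shown in the example output.

```shell
# Sketch only: role_arn_from_credentials is an illustrative helper.
# It reads an AWS credentials file on standard input and prints the first role_arn value.
role_arn_from_credentials() {
  sed -n 's/^[[:space:]]*role_arn[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Example call:
# oc -n openshift-adp get secret cloud-credentials -o jsonpath='{.data.credentials}' \
#   | base64 -d | role_arn_from_credentials
```

If the printed value differs from the ARN that you supplied, the credential request has not yet picked up the patched subscription.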

1.7. Uninstalling OADP

1.7.1. Uninstalling the OpenShift API for Data Protection

You uninstall the OpenShift API for Data Protection (OADP) by deleting the OADP Operator. See Deleting Operators from a cluster for details.

1.8. OADP backing up

1.8.1. Backing up applications

Frequent backups might consume storage on the backup storage location. If you use non-local backups, for example, S3 buckets, check the frequency of backups, the retention time, and the amount of data in the persistent volumes (PVs). Because every backup is retained until it expires, also check the time to live (TTL) setting of the schedule.

You can back up applications by creating a Backup custom resource (CR). For more information, see Creating a Backup CR.

The Backup CR creates backup files for Kubernetes resources and internal images on S3 object storage.

1.8.1.1. Previewing resources before running backup and restore

OADP backs up application resources based on the type, namespace, or label. This means that you can view the resources after the backup is complete. Similarly, you can view the restored objects based on the namespace, persistent volume (PV), or label after a restore operation is complete. To preview the resources in advance, you can do a dry run of the backup and restore operations.

Prerequisites

  • You have installed the OADP Operator.

Procedure

  1. To preview the resources included in the backup before running the actual backup, run the following command:

    $ velero backup create <backup_name> --snapshot-volumes=false 1
    1
    Set the --snapshot-volumes parameter to false.
  2. To view more details about the backup resources, run the following command:

    $ velero describe backup <backup_name> --details 1
    1
    Specify the name of the backup.
  3. To preview the resources included in the restore before running the actual restore, run the following command:

    $ velero restore create --from-backup <backup_name> 1
    1
    Specify the name of the backup created to review the backup resources.
    Important

    The velero restore create command creates restore resources in the cluster. You must delete the resources created as part of the restore, after you review the resources.

  4. To view more details about the restore resources, run the following command:

    $ velero describe restore <restore_name> --details 1
    1
    Specify the name of the restore.

You can create backup hooks to run commands before or after the backup operation. See Creating backup hooks.

You can schedule backups by creating a Schedule CR instead of a Backup CR. See Scheduling backups using Schedule CR.

1.8.1.2. Known issues

Red Hat OpenShift Service on AWS 4 enforces a pod security admission (PSA) policy that can hinder the readiness of pods during a Restic restore process.

This issue has been resolved in the OADP 1.1.6 and OADP 1.2.2 releases. It is recommended that you upgrade to these releases.

1.8.2. Creating a Backup CR

You back up Kubernetes resources, internal images, and persistent volumes (PVs) by creating a Backup custom resource (CR).

Prerequisites

  • You must install the OpenShift API for Data Protection (OADP) Operator.
  • The DataProtectionApplication CR must be in a Ready state.
  • Backup location prerequisites:

    • You must have S3 object storage configured for Velero.
    • You must have a backup location configured in the DataProtectionApplication CR.
  • Snapshot location prerequisites:

    • Your cloud provider must have a native snapshot API or support Container Storage Interface (CSI) snapshots.
    • For CSI snapshots, you must create a VolumeSnapshotClass CR to register the CSI driver.
    • You must have a volume location configured in the DataProtectionApplication CR.

Procedure

  1. Retrieve the backupStorageLocations CRs by entering the following command:

    $ oc get backupstoragelocations.velero.io -n openshift-adp

    Example output

    NAMESPACE       NAME              PHASE       LAST VALIDATED   AGE   DEFAULT
    openshift-adp   velero-sample-1   Available   11s              31m

  2. Create a Backup CR, as in the following example:

    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: <backup>
      labels:
        velero.io/storage-location: default
      namespace: openshift-adp
    spec:
      hooks: {}
      includedNamespaces:
      - <namespace> 1
      includedResources: [] 2
      excludedResources: [] 3
      storageLocation: <velero-sample-1> 4
      ttl: 720h0m0s
      labelSelector: 5
        matchLabels:
          app: <label_1>
          app: <label_2>
          app: <label_3>
      orLabelSelectors: 6
      - matchLabels:
          app: <label_1>
          app: <label_2>
          app: <label_3>
    1
    Specify an array of namespaces to back up.
    2
    Optional: Specify an array of resources to include in the backup. Resources might be shortcuts (for example, 'po' for 'pods') or fully-qualified. If unspecified, all resources are included.
    3
    Optional: Specify an array of resources to exclude from the backup. Resources might be shortcuts (for example, 'po' for 'pods') or fully-qualified.
    4
    Specify the name of the backupStorageLocations CR.
    5
    Map of {key,value} pairs of backup resources that have all the specified labels.
    6
    Map of {key,value} pairs of backup resources that have one or more of the specified labels.
  3. Verify that the status of the Backup CR is Completed:

    $ oc get backups.velero.io -n openshift-adp <backup> -o jsonpath='{.status.phase}'
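
The ttl value in the example, 720h0m0s, is expressed in hours and corresponds to 30 days (30 × 24 = 720). The following is a minimal sketch for converting a retention period in whole days to that format. The ttl_from_days helper is illustrative and is not part of OADP or Velero.

```shell
# Sketch only: ttl_from_days is an illustrative helper.
# It converts a number of whole days to an hour-based duration string.
ttl_from_days() {
  echo "$(( $1 * 24 ))h0m0s"
}

# Example call: ttl_from_days 30
```

Deriving the value from days makes the intended retention period explicit when you review or change a Backup or Schedule CR.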

1.8.3. Creating backup hooks

When performing a backup, it is possible to specify one or more commands to execute in a container within a pod, based on the pod being backed up.

The commands can be configured to run before any custom action processing (Pre hooks), or after all custom actions have been completed and any additional items specified by the custom action have been backed up (Post hooks).

You create backup hooks to run commands in a container in a pod by editing the Backup custom resource (CR).

Procedure

  • Add a hook to the spec.hooks block of the Backup CR, as in the following example:

    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: <backup>
      namespace: openshift-adp
    spec:
      hooks:
        resources:
          - name: <hook_name>
            includedNamespaces:
            - <namespace> 1
            excludedNamespaces: 2
            - <namespace>
            includedResources:
            - pods 3
            excludedResources: [] 4
            labelSelector: 5
              matchLabels:
                app: velero
                component: server
            pre: 6
              - exec:
                  container: <container> 7
                  command:
                  - /bin/uname 8
                  - -a
                  onError: Fail 9
                  timeout: 30s 10
            post: 11
    ...
    1
    Optional: You can specify namespaces to which the hook applies. If this value is not specified, the hook applies to all namespaces.
    2
    Optional: You can specify namespaces to which the hook does not apply.
    3
    Currently, pods are the only supported resource that hooks can apply to.
    4
    Optional: You can specify resources to which the hook does not apply.
    5
    Optional: This hook only applies to objects matching the label. If this value is not specified, the hook applies to all objects.
    6
    Array of hooks to run before the backup.
    7
    Optional: If the container is not specified, the command runs in the first container in the pod.
    8
    The command that the hook runs in the container.
    9
    Allowed values for error handling are Fail and Continue. The default is Fail.
    10
    Optional: How long to wait for the commands to run. The default is 30s.
    11
    This block defines an array of hooks to run after the backup, with the same parameters as the pre-backup hooks.
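
Because pre and post hooks take the same parameters, one small generator can produce entries for either list. The following is a minimal sketch: render_exec_hook is an illustrative helper, not part of OADP. It prints an exec entry with the fields from the example and rejects onError values other than Fail and Continue.

```shell
# Sketch only: render_exec_hook is an illustrative helper, not part of OADP.
# $1 = container, $2 = onError (Fail or Continue), $3 = timeout, remaining arguments = command
render_exec_hook() (
  container=$1
  on_error=$2
  timeout=$3
  shift 3
  case "$on_error" in
    Fail|Continue) ;;
    *) echo "onError must be Fail or Continue" >&2; exit 1 ;;
  esac
  printf '%s\n' "- exec:"
  printf '%s\n' "    container: $container"
  printf '%s\n' "    command:"
  for arg in "$@"; do
    printf '%s\n' "    - $arg"
  done
  printf '%s\n' "    onError: $on_error"
  printf '%s\n' "    timeout: $timeout"
)

# Example call: render_exec_hook <container> Fail 30s /bin/uname -a
```

Indent the printed entry to match the pre or post key when you paste it into the Backup CR.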

1.8.4. Scheduling backups using Schedule CR

The schedule operation allows you to create a backup of your data at a particular time, specified by a Cron expression.

You schedule backups by creating a Schedule custom resource (CR) instead of a Backup CR.

Warning

Leave enough time in your backup schedule for a backup to finish before another backup is created.

For example, if a backup of a namespace typically takes 10 minutes, do not schedule backups more frequently than every 15 minutes.

Prerequisites

  • You must install the OpenShift API for Data Protection (OADP) Operator.
  • The DataProtectionApplication CR must be in a Ready state.

Procedure

  1. Retrieve the backupStorageLocations CRs:

    $ oc get backupStorageLocations -n openshift-adp

    Example output

    NAMESPACE       NAME              PHASE       LAST VALIDATED   AGE   DEFAULT
    openshift-adp   velero-sample-1   Available   11s              31m

  2. Create a Schedule CR, as in the following example:

    $ cat << EOF | oc apply -f -
    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: <schedule>
      namespace: openshift-adp
    spec:
      schedule: 0 7 * * * 1
      template:
        hooks: {}
        includedNamespaces:
        - <namespace> 2
        storageLocation: <velero-sample-1> 3
        defaultVolumesToFsBackup: true 4
        ttl: 720h0m0s
    EOF
    1
    Cron expression to schedule the backup, for example, 0 7 * * * to perform a backup every day at 7:00.
    Note

    To schedule a backup at specific intervals, enter the <duration_in_minutes> in the following format:

      schedule: "*/10 * * * *"

    Enter the minutes value between quotation marks (" ").

    2
    Array of namespaces to back up.
    3
    Name of the backupStorageLocations CR.
    4
    Optional: In OADP version 1.2 and later, add the defaultVolumesToFsBackup: true key-value pair to your configuration when performing backups of volumes with Restic. In OADP version 1.1, add the defaultVolumesToRestic: true key-value pair when you back up volumes with Restic.
  3. Verify that the status of the Schedule CR is Completed after the scheduled backup runs:

    $ oc get schedule -n openshift-adp <schedule> -o jsonpath='{.status.phase}'
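
The Schedule CR from step 2 can also be produced by a small shell function, which always quotes the cron expression so that interval expressions such as */10 * * * * stay valid YAML. The following is a minimal sketch, not part of OADP: the function name schedule_manifest and its argument order are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of OADP): print a Schedule CR manifest.
# Arguments: <schedule_name> <cron_expression> <namespace_to_back_up> <bsl_name>
schedule_manifest() {
    local name="$1" cron="$2" ns="$3" bsl="$4"
    # The cron expression is always emitted between quotation marks.
    cat <<EOF
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: ${name}
  namespace: openshift-adp
spec:
  schedule: "${cron}"
  template:
    includedNamespaces:
    - ${ns}
    storageLocation: ${bsl}
    ttl: 720h0m0s
EOF
}
```

For example, schedule_manifest <schedule> '0 7 * * *' <namespace> velero-sample-1 | oc apply -f - would apply the generated manifest. Pass the cron expression in single quotation marks so that the shell does not expand the asterisks.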

1.8.5. Deleting backups

You can delete a backup by creating the DeleteBackupRequest custom resource (CR) or by running the velero backup delete command as explained in the following procedures.

The volume backup artifacts are deleted at different times depending on the backup method:

  • Restic: The artifacts are deleted in the next full maintenance cycle, after the backup is deleted.
  • Container Storage Interface (CSI): The artifacts are deleted immediately when the backup is deleted.
  • Kopia: The artifacts are deleted after three full maintenance cycles of the Kopia repository, after the backup is deleted.
1.8.5.1. Deleting a backup by creating a DeleteBackupRequest CR

You can delete a backup by creating a DeleteBackupRequest custom resource (CR).

Prerequisites

  • You have run a backup of your application.

Procedure

  1. Create a DeleteBackupRequest CR manifest file:

    apiVersion: velero.io/v1
    kind: DeleteBackupRequest
    metadata:
      name: deletebackuprequest
      namespace: openshift-adp
    spec:
      backupName: <backup_name> 1
    1
    Specify the name of the backup.
  2. Apply the DeleteBackupRequest CR to delete the backup:

    $ oc apply -f <deletebackuprequest_cr_filename>
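
The manifest from step 1 can be generated for a given backup instead of being edited by hand. The following is a minimal sketch under stated assumptions: the function name delete_backup_request_manifest is hypothetical, and the metadata.name value deletebackuprequest-<backup_name> is an illustrative choice that keeps requests for different backups apart.

```shell
#!/bin/bash
# Hypothetical helper: print a DeleteBackupRequest CR for the given backup.
# Argument: <backup_name>
delete_backup_request_manifest() {
    local backup="$1"
    if [ -z "$backup" ]; then
        echo "usage: delete_backup_request_manifest <backup_name>" >&2
        return 1
    fi
    # Assumption: the request is named after the backup it deletes.
    cat <<EOF
apiVersion: velero.io/v1
kind: DeleteBackupRequest
metadata:
  name: deletebackuprequest-${backup}
  namespace: openshift-adp
spec:
  backupName: ${backup}
EOF
}
```

For example, delete_backup_request_manifest <backup_name> | oc apply -f - would create the request without an intermediate file.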
1.8.5.2. Deleting a backup by using the Velero CLI

You can delete a backup by using the Velero CLI.

Prerequisites

  • You have run a backup of your application.
  • You downloaded the Velero CLI and can access the Velero binary in your cluster.

Procedure

  • To delete the backup, run the following Velero command:

    $ velero backup delete <backup_name> -n openshift-adp 1
    1
    Specify the name of the backup.
1.8.5.3. About Kopia repository maintenance

There are two types of Kopia repository maintenance:

Quick maintenance
  • Runs every hour to keep the number of index blobs (n) low. A high number of indexes negatively affects the performance of Kopia operations.
  • Does not delete any metadata from the repository without ensuring that another copy of the same metadata exists.
Full maintenance
  • Runs every 24 hours to perform garbage collection of repository contents that are no longer needed.
  • snapshot-gc, a full maintenance task, finds all files and directory listings that are no longer accessible from snapshot manifests and marks them as deleted.
  • A full maintenance is a resource-costly operation, as it requires scanning all directories in all snapshots that are active in the cluster.
1.8.5.3.1. Kopia maintenance in OADP

The repo-maintain-job jobs are executed in the namespace where OADP is installed, as shown in the following example:

pod/repo-maintain-job-173...2527-2nbls                             0/1     Completed   0          168m
pod/repo-maintain-job-173....536-fl9tm                             0/1     Completed   0          108m
pod/repo-maintain-job-173...2545-55ggx                             0/1     Completed   0          48m

You can check the logs of the repo-maintain-job pods for more details about the cleanup and the removal of artifacts in the backup object storage. The logs contain a note, as shown in the following example, that states when the next full maintenance cycle is due:

not due for full maintenance cycle until 2024-00-00 18:29:4
Important

Three successful executions of a full maintenance cycle are required for the objects to be deleted from the backup object storage. This means you can expect up to 72 hours for all the artifacts in the backup object storage to be deleted.
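
To find the maintenance note without reading the whole log, you can filter the log text. The following is a minimal sketch: the function name next_full_maintenance_note is hypothetical, and it assumes that the note and its timestamp run to the end of the log line.

```shell
#!/bin/bash
# Hypothetical helper: read repo-maintain-job log text on stdin and print the
# most recent "not due for full maintenance cycle until ..." note, if any.
# Assumption: the note and its timestamp run to the end of the log line.
next_full_maintenance_note() {
    grep -o 'not due for full maintenance cycle until .*' | tail -n 1
}
```

For example, oc logs -n openshift-adp <repo_maintain_job_pod> | next_full_maintenance_note prints the note for one maintenance pod, or nothing if the log does not contain it.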

1.8.5.4. Deleting a backup repository

After you delete the backup, and after the Kopia repository maintenance cycles to delete the related artifacts are complete, the backup is no longer referenced by any metadata or manifest objects. You can then delete the backuprepository custom resource (CR) to complete the backup deletion process.

Prerequisites

  • You have deleted the backup of your application.
  • You have waited up to 72 hours after the backup is deleted. This time frame allows Kopia to run the repository maintenance cycles.

Procedure

  1. To get the name of the backup repository CR for a backup, run the following command:

    $ oc get backuprepositories.velero.io -n openshift-adp
  2. To delete the backup repository CR, run the following command:

    $ oc delete backuprepository <backup_repository_name> -n openshift-adp 1
    1
    Specify the name of the backup repository from the earlier step.
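
The 72-hour prerequisite follows from three full maintenance cycles of 24 hours each. The following is a minimal sketch of that arithmetic as a guard you could run before deleting the backup repository CR. The function name maintenance_window_elapsed is hypothetical, and the check is only a time-based heuristic: it does not confirm that three maintenance cycles actually succeeded.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if at least 72 hours (three 24-hour full
# maintenance cycles) have passed since the backup was deleted.
# Arguments: <deletion_time_epoch_seconds> [now_epoch_seconds]
maintenance_window_elapsed() {
    local deleted_at="$1"
    local now="${2:-$(date +%s)}"          # default: current time
    local window=$((3 * 24 * 60 * 60))     # 72 hours in seconds
    [ $((now - deleted_at)) -ge "$window" ]
}
```

For example, maintenance_window_elapsed <deletion_epoch> && oc delete backuprepository <backup_repository_name> -n openshift-adp runs the deletion only after the window has passed.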

1.9. OADP restoring

1.9.1. Restoring applications

You restore application backups by creating a Restore custom resource (CR). See Creating a Restore CR.

You can create restore hooks to run commands in a container in a pod by editing the Restore CR. See Creating restore hooks.

1.9.1.1. Previewing resources before running backup and restore

OADP backs up application resources based on the type, namespace, or label. This means that you can view the resources after the backup is complete. Similarly, you can view the restored objects based on the namespace, persistent volume (PV), or label after a restore operation is complete. To preview the resources in advance, you can do a dry run of the backup and restore operations.

Prerequisites

  • You have installed the OADP Operator.

Procedure

  1. To preview the resources included in the backup before running the actual backup, run the following command:

    $ velero backup create <backup_name> --snapshot-volumes=false 1
    1
    Set the --snapshot-volumes parameter to false.
  2. To view more details about the backup resources, run the following command:

    $ velero describe backup <backup_name> --details 1
    1
    Specify the name of the backup.
  3. To preview the resources included in the restore before running the actual restore, run the following command:

    $ velero restore create --from-backup <backup_name> 1
    1
    Specify the name of the backup created to review the backup resources.
    Important

    The velero restore create command creates restore resources in the cluster. You must delete the resources created as part of the restore, after you review the resources.

  4. To view more details about the restore resources, run the following command:

    $ velero describe restore <restore_name> --details 1
    1
    Specify the name of the restore.
1.9.1.2. Creating a Restore CR

You restore a Backup custom resource (CR) by creating a Restore CR.

Prerequisites

  • You must install the OpenShift API for Data Protection (OADP) Operator.
  • The DataProtectionApplication CR must be in a Ready state.
  • You must have a Velero Backup CR.
  • The persistent volume (PV) capacity must match the requested size at backup time. Adjust the requested size if needed.

Procedure

  1. Create a Restore CR, as in the following example:

    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: <restore>
      namespace: openshift-adp
    spec:
      backupName: <backup> 1
      includedResources: [] 2
      excludedResources:
      - nodes
      - events
      - events.events.k8s.io
      - backups.velero.io
      - restores.velero.io
      - resticrepositories.velero.io
      restorePVs: true 3
    1
    Name of the Backup CR.
    2
    Optional: Specify an array of resources to include in the restore process. Resource names can be short names (for example, po for pods) or fully qualified. If unspecified, all resources are included.
    3
    Optional: Set the restorePVs parameter to false to turn off the restore of PersistentVolumes from Container Storage Interface (CSI) VolumeSnapshot objects or, when VolumeSnapshotLocation is configured, from native snapshots.
  2. Verify that the status of the Restore CR is Completed by entering the following command:

    $ oc get restores.velero.io -n openshift-adp <restore> -o jsonpath='{.status.phase}'
  3. Verify that the backup resources have been restored by entering the following command:

    $ oc get all -n <namespace> 1
    1
    Namespace that you backed up.
  4. If you restore DeploymentConfig with volumes or if you use post-restore hooks, run the dc-post-restore.sh cleanup script by entering the following command:

    $ bash dc-post-restore.sh <restore_name>
    Note

    During the restore process, the OADP Velero plug-ins scale down the DeploymentConfig objects and restore the pods as standalone pods. This is done to prevent the cluster from deleting the restored DeploymentConfig pods immediately on restore and to allow the restore and post-restore hooks to complete their actions on the restored pods. The cleanup script shown below removes these disconnected pods and scales any DeploymentConfig objects back up to the appropriate number of replicas.

    Example 1.1. dc-post-restore.sh cleanup script

    #!/bin/bash
    set -e
    
    # Use sha256sum if it is available; otherwise fall back to shasum.
    if command -v sha256sum >/dev/null 2>&1; then
      CHECKSUM_CMD="sha256sum"
    else
      CHECKSUM_CMD="shasum -a 256"
    fi
    
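    label_name () {
        # Names of 63 characters or fewer can be used as label values as-is.
        if [ "${#1}" -le "63" ]; then
            echo "$1"
            return
        fi
        # Longer names: keep the first 57 characters and append the first
        # 6 characters of the SHA-256 digest, for 63 characters in total.
        sha=$(echo -n "$1" | $CHECKSUM_CMD)
        echo "${1:0:57}${sha:0:6}"
    }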
    
    if [[ $# -ne 1 ]]; then
        echo "usage: ${BASH_SOURCE} restore-name"
        exit 1
    fi
    
    echo "restore: $1"
    
    label=$(label_name $1)
    echo "label:   $label"
    
    echo Deleting disconnected restore pods
    oc delete pods --all-namespaces -l oadp.openshift.io/disconnected-from-dc=$label
    
    for dc in $(oc get dc --all-namespaces -l oadp.openshift.io/replicas-modified=$label -o jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{","}{.metadata.annotations.oadp\.openshift\.io/original-replicas}{","}{.metadata.annotations.oadp\.openshift\.io/original-paused}{"\n"}')
    do
        IFS=',' read -ra dc_arr <<< "$dc"
        if [ ${#dc_arr[0]} -gt 0 ]; then
            echo "Found deployment ${dc_arr[0]}/${dc_arr[1]}, setting replicas: ${dc_arr[2]}, paused: ${dc_arr[3]}"
            cat <<EOF | oc patch dc -n "${dc_arr[0]}" "${dc_arr[1]}" --patch-file /dev/stdin
    spec:
      replicas: ${dc_arr[2]}
      paused: ${dc_arr[3]}
    EOF
        fi
    done
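
The verification in step 2 of the procedure above reads the phase of the Restore CR once, but a restore can take some time to finish. The following is a minimal sketch of a polling loop around that command. The function name wait_for_phase and its arguments are hypothetical. The phases treated as terminal failures (Failed, PartiallyFailed, FailedValidation) are Velero restore phases.

```shell
#!/bin/bash
# Hypothetical helper: run a command repeatedly until it prints the expected
# phase, or give up after a number of attempts.
# Arguments: <expected_phase> <max_attempts> <sleep_seconds> <command...>
wait_for_phase() {
    local expected="$1" attempts="$2" delay="$3"
    shift 3
    local phase i
    for ((i = 1; i <= attempts; i++)); do
        # A failing command is treated as "phase not known yet".
        phase="$("$@" 2>/dev/null || true)"
        if [ "$phase" = "$expected" ]; then
            echo "phase: $phase"
            return 0
        fi
        # Stop early on phases that cannot turn into the expected one.
        case "$phase" in
            Failed|PartiallyFailed|FailedValidation)
                echo "phase: $phase" >&2
                return 1
                ;;
        esac
        sleep "$delay"
    done
    echo "timed out waiting for phase $expected (last: ${phase:-unknown})" >&2
    return 1
}
```

For example, wait_for_phase Completed 60 10 oc get restores.velero.io -n openshift-adp <restore> -o jsonpath='{.status.phase}' checks the phase every 10 seconds, up to 60 times.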
1.9.1.3. Creating restore hooks

You create restore hooks to run commands in a container in a pod by editing the Restore custom resource (CR).

You can create two types of restore hooks:

  • An init hook adds an init container to a pod to perform setup tasks before the application container starts.

    If you restore a Restic backup, the restic-wait init container is added before the restore hook init container.

  • An exec hook runs commands or scripts in a container of a restored pod.

Procedure

  • Add a hook to the spec.hooks block of the Restore CR, as in the following example:

    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: <restore>
      namespace: openshift-adp
    spec:
      hooks:
        resources:
          - name: <hook_name>
            includedNamespaces:
            - <namespace> 1
            excludedNamespaces:
            - <namespace>
            includedResources:
            - pods 2
            excludedResources: []
            labelSelector: 3
              matchLabels:
                app: velero
                component: server
            postHooks:
            - init:
                initContainers:
                - name: restore-hook-init
                  image: alpine:latest
                  volumeMounts:
                  - mountPath: /restores/pvc1-vm
                    name: pvc1-vm
                  command:
                  - /bin/ash
                  - -c
                timeout: 4
            - exec:
                container: <container> 5
                command:
                - /bin/bash 6
                - -c
                - "psql < /backup/backup.sql"
                waitTimeout: 5m 7
                execTimeout: 1m 8
                onError: Continue 9
    1
    Optional: Array of namespaces to which the hook applies. If this value is not specified, the hook applies to all namespaces.
    2
    Currently, pods are the only supported resource that hooks can apply to.
    3
    Optional: This hook only applies to objects matching the label selector.
    4
    Optional: Timeout specifies the maximum length of time Velero waits for initContainers to complete.
    5
    Optional: If the container is not specified, the command runs in the first container in the pod.
    6
    The command that the exec hook runs in the container.
    7
    Optional: How long to wait for a container to become ready. This should be long enough for the container to start and for any preceding hooks in the same container to complete. If not set, the restore process waits indefinitely.
    8
    Optional: How long to wait for the commands to run. The default is 30s.
    9
    Allowed values for error handling are Fail and Continue:
    • Continue: Only command failures are logged.
    • Fail: No more restore hooks run in any container in any pod. The status of the Restore CR will be PartiallyFailed.
Important

During a File System Backup (FSB) restore operation, a Deployment resource that references an ImageStream is not restored properly. The restored pod that runs the FSB and the postHook is terminated prematurely.

This happens because, during the restore operation, an OpenShift controller updates the spec.template.spec.containers[0].image field in the Deployment resource with an updated ImageStreamTag hash. The update triggers the rollout of a new pod, which terminates the pod on which Velero runs the FSB and the post-restore hook. For more information about image stream triggers, see "Triggering updates on image stream changes".

The workaround for this behavior is a two-step restore process:

  1. First, perform a restore excluding the Deployment resources, for example:

    $ velero restore create <RESTORE_NAME> \
      --from-backup <BACKUP_NAME> \
      --exclude-resources=deployment.apps
  2. After the first restore is successful, perform a second restore by including these resources, for example:

    $ velero restore create <RESTORE_NAME> \
      --from-backup <BACKUP_NAME> \
      --include-resources=deployment.apps
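
Because each Restore CR needs its own name, the two steps of the workaround use two different restore names. The following is a minimal sketch that prints both commands for review before you run them. The function name two_step_restore_commands and the name suffixes -no-deployments and -deployments are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the two velero commands of the workaround.
# Arguments: <restore_name_prefix> <backup_name>
# Assumption: the suffixes only serve to give each Restore a distinct name.
two_step_restore_commands() {
    local prefix="$1" backup="$2"
    # Step 1: restore everything except the Deployment resources.
    echo "velero restore create ${prefix}-no-deployments --from-backup ${backup} --exclude-resources=deployment.apps"
    # Step 2: restore only the Deployment resources.
    echo "velero restore create ${prefix}-deployments --from-backup ${backup} --include-resources=deployment.apps"
}
```

Run the first printed command, wait until that restore is successful, and only then run the second command.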

Legal Notice

Copyright © 2024 Red Hat, Inc.

OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

Modified versions must remove all Red Hat trademarks.

Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.

Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

Java® is a registered trademark of Oracle and/or its affiliates.

XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.

Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

All other trademarks are the property of their respective owners.
