Backup and restore
Backing up and restoring your OpenShift Container Platform cluster
Abstract
Chapter 1. Backup and restore
1.1. Control plane backup and restore operations
As a cluster administrator, you might need to stop an OpenShift Container Platform cluster for a period and restart it later. Some reasons for restarting a cluster are that you need to perform maintenance on a cluster or want to reduce resource costs. In OpenShift Container Platform, you can perform a graceful shutdown of a cluster so that you can easily restart the cluster later.
You must back up etcd data before shutting down a cluster; etcd is the key-value store for OpenShift Container Platform, which persists the state of all resource objects. An etcd backup plays a crucial role in disaster recovery. In OpenShift Container Platform, you can also replace an unhealthy etcd member.
When you want to get your cluster running again, restart the cluster gracefully.
A cluster’s certificates expire one year after the installation date. You can shut down a cluster and expect it to restart gracefully while the certificates are still valid. Although the cluster automatically retrieves the expired control plane certificates, you must still approve the certificate signing requests (CSRs).
You might run into several situations where OpenShift Container Platform does not work as expected, such as:
- You have a cluster that is not functional after the restart because of unexpected conditions, such as node failure or network connectivity issues.
- You have deleted something critical in the cluster by mistake.
- You have lost the majority of your control plane hosts, leading to etcd quorum loss.
You can always recover from a disaster situation by restoring your cluster to its previous state using the saved etcd snapshots.
1.2. Application backup and restore operations
As a cluster administrator, you can back up and restore applications running on OpenShift Container Platform by using the OpenShift API for Data Protection (OADP).
OADP backs up and restores Kubernetes resources and internal images, at the granularity of a namespace, by using the version of Velero that is appropriate for the version of OADP you install, according to the table in Downloading the Velero CLI tool. OADP backs up and restores persistent volumes (PVs) by using snapshots or Restic. For details, see OADP features.
1.2.1. OADP requirements
OADP has the following requirements:
- You must be logged in as a user with a cluster-admin role.
- You must have object storage for storing backups, such as one of the following storage types:
  - OpenShift Data Foundation
  - Amazon Web Services
  - Microsoft Azure
  - Google Cloud
  - S3-compatible object storage
  - IBM Cloud® Object Storage S3
If you want to use CSI backup on OCP 4.11 and later, install OADP 1.1.x.
OADP 1.0.x does not support CSI backup on OCP 4.11 and later. OADP 1.0.x includes Velero 1.7.x and expects the API group snapshot.storage.k8s.io/v1beta1, which is not present on OCP 4.11 and later.
The CloudStorage API is a Technology Preview feature only.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
To back up PVs with snapshots, you must have cloud storage that has a native snapshot API or supports Container Storage Interface (CSI) snapshots, such as the following providers:
- Amazon Web Services
- Microsoft Azure
- Google Cloud
- CSI snapshot-enabled cloud storage, such as Ceph RBD or Ceph FS
If you do not want to back up PVs by using snapshots, you can use Restic, which is installed by the OADP Operator by default.
1.2.2. Backing up and restoring applications
You back up applications by creating a Backup custom resource (CR). You can configure the following backup options:
- Creating backup hooks to run commands before or after the backup operation
- Scheduling backups
- Backing up applications with File System Backup: Kopia or Restic
You restore application backups by creating a Restore custom resource (CR). See Creating a Restore CR.
You can configure restore hooks to run commands in init containers or in the application container during the restore operation.
Chapter 2. Shutting down the cluster gracefully
This document describes the process to gracefully shut down your cluster. You might need to temporarily shut down your cluster for maintenance reasons, or to save on resource costs.
2.1. Prerequisites
Take an etcd backup prior to shutting down the cluster.
Important: Take an etcd backup before performing this procedure so that your cluster can be restored if you encounter any issues when restarting the cluster.
For example, the following conditions can cause the restarted cluster to malfunction:
- etcd data corruption during shutdown
- Node failure due to hardware
- Network connectivity issues
If your cluster fails to recover, follow the steps to restore to a previous cluster state.
2.2. Shutting down the cluster
You can shut down your cluster in a graceful manner so that it can be restarted at a later date.
You can shut down a cluster until a year from the installation date and expect it to restart gracefully. After a year from the installation date, the cluster certificates expire. However, you might need to manually approve the pending certificate signing requests (CSRs) to recover kubelet certificates when the cluster restarts.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have taken an etcd backup.
Procedure
If you are shutting the cluster down for an extended period, determine the date on which the certificates expire by running the following command:
$ oc -n openshift-kube-apiserver-operator get secret kube-apiserver-to-kubelet-signer -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'
Example output
2022-08-05T14:37:50Z
To ensure that the cluster can restart gracefully, plan to restart it on or before the specified date. As the cluster restarts, the process might require you to manually approve the pending certificate signing requests (CSRs) to recover kubelet certificates.
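To make the restart deadline concrete, the timestamp returned by the command above can be turned into a number of remaining days. The following is a minimal sketch, not part of the product: the function name days_until_expiry is hypothetical, and it assumes a Bash shell with GNU date, which accepts the -d option and the RFC 3339 timestamp format shown in the example output.

```shell
# days_until_expiry NOT_AFTER [NOW]
# Hypothetical helper: prints the whole number of days between NOW
# (default: the current time) and NOT_AFTER. Both arguments are timestamps
# that GNU date can parse, such as the certificate-not-after annotation value.
# A negative result means that the date has already passed.
days_until_expiry() {
  local not_after="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local end_s now_s
  end_s=$(date -u -d "$not_after" +%s) || return 1
  now_s=$(date -u -d "$now" +%s) || return 1
  echo $(( (end_s - now_s) / 86400 ))
}

# Usage against a live cluster (requires a logged-in oc session):
#   days_until_expiry "$(oc -n openshift-kube-apiserver-operator get secret \
#     kube-apiserver-to-kubelet-signer \
#     -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}')"
```

The optional second argument exists only so that the calculation can be checked against a fixed reference date.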
Mark all the nodes in the cluster as unschedulable. You can do this from your cloud provider’s web console, or by running the following loop:
$ for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do echo ${node} ; oc adm cordon ${node} ; done
Example output
ci-ln-mgdnf4b-72292-n547t-master-0
node/ci-ln-mgdnf4b-72292-n547t-master-0 cordoned
ci-ln-mgdnf4b-72292-n547t-master-1
node/ci-ln-mgdnf4b-72292-n547t-master-1 cordoned
ci-ln-mgdnf4b-72292-n547t-master-2
node/ci-ln-mgdnf4b-72292-n547t-master-2 cordoned
ci-ln-mgdnf4b-72292-n547t-worker-a-s7ntl
node/ci-ln-mgdnf4b-72292-n547t-worker-a-s7ntl cordoned
ci-ln-mgdnf4b-72292-n547t-worker-b-cmc9k
node/ci-ln-mgdnf4b-72292-n547t-worker-b-cmc9k cordoned
ci-ln-mgdnf4b-72292-n547t-worker-c-vcmtn
node/ci-ln-mgdnf4b-72292-n547t-worker-c-vcmtn cordoned
Evacuate the pods using the following method:
$ for node in $(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[*].metadata.name}'); do echo ${node} ; oc adm drain ${node} --delete-emptydir-data --ignore-daemonsets=true --timeout=15s --force ; done
Shut down all of the nodes in the cluster. You can do this from your cloud provider's web console, or by running the following loop. Shutting down the nodes by using one of these methods allows pods to terminate gracefully, which reduces the chance for data corruption.
Note: Ensure that the control plane node with the API VIP assigned is the last node processed in the loop. Otherwise, the shutdown command fails.
$ for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do oc debug node/${node} -- chroot /host shutdown -h 1; done
In this command, -h 1 indicates how long, in minutes, this process lasts before the control plane nodes are shut down. For large-scale clusters with 10 nodes or more, set it to -h 10 or longer to make sure all the compute nodes have time to shut down first.
Example output
Starting pod/ip-10-0-130-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Shutdown scheduled for Mon 2021-09-13 09:36:17 UTC, use 'shutdown -c' to cancel.
Removing debug pod ...
Starting pod/ip-10-0-150-116us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Shutdown scheduled for Mon 2021-09-13 09:36:29 UTC, use 'shutdown -c' to cancel.
Note: It is not necessary to drain control plane nodes of the standard pods that ship with OpenShift Container Platform prior to shutdown. Cluster administrators are responsible for ensuring a clean restart of their own workloads after the cluster is restarted. If you drained control plane nodes prior to shutdown because of custom workloads, you must mark the control plane nodes as schedulable before the cluster will be functional again after restart.
Shut off any cluster dependencies that are no longer needed, such as external storage or an LDAP server. Be sure to consult your vendor’s documentation before doing so.
Important: If you deployed your cluster on a cloud-provider platform, do not shut down, suspend, or delete the associated cloud resources. If you delete the cloud resources of a suspended virtual machine, OpenShift Container Platform might not restore successfully.
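The shutdown loop in this procedure processes nodes in whatever order oc get nodes returns them, while the accompanying note asks for the control plane node that holds the API VIP to be processed last. One way to enforce that order is sketched below. The function name order_vip_last is hypothetical, and the sketch assumes that you already know the name of the node that holds the VIP; it does not discover it for you.

```shell
# order_vip_last VIP_NODE NODE...
# Hypothetical helper: prints the given node names one per line, with
# VIP_NODE moved to the end of the list. If VIP_NODE is not among the
# nodes, the list is printed unchanged.
order_vip_last() {
  local vip_node="$1"
  shift
  local node found=0
  for node in "$@"; do
    if [ "$node" = "$vip_node" ]; then
      found=1
    else
      printf '%s\n' "$node"
    fi
  done
  if [ "$found" -eq 1 ]; then
    printf '%s\n' "$vip_node"
  fi
}

# Usage against a live cluster (requires a logged-in oc session);
# <vip_node> is a placeholder for the node that holds the API VIP:
#   for node in $(order_vip_last <vip_node> \
#       $(oc get nodes -o jsonpath='{.items[*].metadata.name}')); do
#     oc debug node/${node} -- chroot /host shutdown -h 1
#   done
```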
Chapter 3. Restarting the cluster gracefully
This document describes the process to restart your cluster after a graceful shutdown.
Even though the cluster is expected to be functional after the restart, the cluster might not recover due to unexpected conditions, for example:
- etcd data corruption during shutdown
- Node failure due to hardware
- Network connectivity issues
If your cluster fails to recover, follow the steps to restore to a previous cluster state.
3.1. Prerequisites
- You have gracefully shut down your cluster.
3.2. Restarting the cluster
You can restart your cluster after it has been shut down gracefully.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- This procedure assumes that you gracefully shut down the cluster.
Procedure
Turn on the control plane nodes.
- If you are using the admin.kubeconfig from the cluster installation and the API virtual IP address (VIP) is up, complete the following steps:
  - Set the KUBECONFIG environment variable to the admin.kubeconfig path.
  - For each control plane node in the cluster, run the following command:
    $ oc adm uncordon <node>
- If you do not have access to your admin.kubeconfig credentials, complete the following steps:
  - Use SSH to connect to a control plane node.
  - Copy the localhost-recovery.kubeconfig file to the /root directory.
  - Use that file to run the following command for each control plane node in the cluster:
    $ oc adm uncordon <node>
Power on any cluster dependencies, such as external storage or an LDAP server.
Start all cluster machines.
Use the appropriate method for your cloud environment to start the machines, for example, from your cloud provider’s web console.
Wait approximately 10 minutes before continuing to check the status of control plane nodes.
Verify that all control plane nodes are ready.
$ oc get nodes -l node-role.kubernetes.io/master
The control plane nodes are ready if the status is Ready, as shown in the following output:
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    master   75m   v1.27.3
ip-10-0-170-223.ec2.internal   Ready    master   75m   v1.27.3
ip-10-0-211-16.ec2.internal    Ready    master   75m   v1.27.3
If the control plane nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
Get the list of current CSRs:
$ oc get csr
Review the details of a CSR to verify that it is valid:
$ oc describe csr <csr_name>
Here, <csr_name> is the name of a CSR from the list of current CSRs.
Approve each valid CSR:
$ oc adm certificate approve <csr_name>
After the control plane nodes are ready, verify that all worker nodes are ready.
$ oc get nodes -l node-role.kubernetes.io/worker
The worker nodes are ready if the status is Ready, as shown in the following output:
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-179-95.ec2.internal    Ready    worker   64m   v1.27.3
ip-10-0-182-134.ec2.internal   Ready    worker   64m   v1.27.3
ip-10-0-250-100.ec2.internal   Ready    worker   64m   v1.27.3
If the worker nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
Get the list of current CSRs:
$ oc get csr
Review the details of a CSR to verify that it is valid:
$ oc describe csr <csr_name>
Here, <csr_name> is the name of a CSR from the list of current CSRs.
Approve each valid CSR:
$ oc adm certificate approve <csr_name>
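When many CSRs are listed, it can help to narrow the list to the ones that still await a decision before reviewing them one by one. The sketch below is an illustration only: the function name pending_csrs is hypothetical, and it assumes the default table output of oc get csr, in which the first column is the name and the last column is the condition. It only lists names; reviewing and approving each request remains a manual step as described above.

```shell
# pending_csrs
# Hypothetical helper: reads `oc get csr` table output on stdin and prints
# the name of each request whose last column (CONDITION) is exactly
# "Pending". The header row is skipped.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get csr | pending_csrs
#   oc describe csr <csr_name>      # review each listed request
```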
After the control plane and compute nodes are ready, mark all the nodes in the cluster as schedulable by running the following command:
$ for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do echo ${node} ; oc adm uncordon ${node} ; done
Verify that the cluster started properly.
Check that there are no degraded cluster Operators.
$ oc get clusteroperators
Check that there are no cluster Operators with the DEGRADED condition set to True.
NAME                      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication            4.14.0    True        False         False      59m
cloud-credential          4.14.0    True        False         False      85m
cluster-autoscaler        4.14.0    True        False         False      73m
config-operator           4.14.0    True        False         False      73m
console                   4.14.0    True        False         False      62m
csi-snapshot-controller   4.14.0    True        False         False      66m
dns                       4.14.0    True        False         False      76m
etcd                      4.14.0    True        False         False      76m
...
Check that all nodes are in the Ready state:
$ oc get nodes
Check that the status for all nodes is Ready.
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    master   82m   v1.27.3
ip-10-0-170-223.ec2.internal   Ready    master   82m   v1.27.3
ip-10-0-179-95.ec2.internal    Ready    worker   70m   v1.27.3
ip-10-0-182-134.ec2.internal   Ready    worker   70m   v1.27.3
ip-10-0-211-16.ec2.internal    Ready    master   82m   v1.27.3
ip-10-0-250-100.ec2.internal   Ready    worker   69m   v1.27.3
If the cluster did not start properly, you might need to restore your cluster using an etcd backup.
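The two verification checks in this procedure amount to scanning one column of each table. As a rough sketch, they could be automated as follows. The function names degraded_operators and not_ready_nodes are hypothetical, and the column positions are an assumption based on the example outputs in this chapter (DEGRADED as the fifth column, STATUS as the second).

```shell
# degraded_operators
# Hypothetical helper: reads `oc get clusteroperators` table output on stdin
# and prints the NAME of every row whose DEGRADED column (field 5) is "True".
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# not_ready_nodes
# Hypothetical helper: reads `oc get nodes` table output on stdin and prints
# the NAME of every row whose STATUS (field 2) does not begin with "Ready".
# A cordoned but healthy node reports "Ready,SchedulingDisabled" and is
# therefore not listed.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready(,|$)/ { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get clusteroperators | degraded_operators
#   oc get nodes | not_ready_nodes
```

Both helpers print nothing when every Operator and node passes the check, so empty output corresponds to a cluster that started properly by the criteria above.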
Chapter 4. OADP Application backup and restore
4.1. Introduction to OpenShift API for Data Protection
The OpenShift API for Data Protection (OADP) product safeguards customer applications on OpenShift Container Platform. It offers comprehensive disaster recovery protection, covering OpenShift Container Platform applications, application-related cluster resources, persistent volumes, and internal images. OADP is also capable of backing up both containerized applications and virtual machines (VMs).
However, OADP does not serve as a disaster recovery solution for etcd or OpenShift Operators.
OADP support is provided for customer workload namespaces and cluster-scoped resources.
Full cluster backup and restore are not supported.
4.1.1. OpenShift API for Data Protection APIs
OpenShift API for Data Protection (OADP) provides APIs that enable multiple approaches to customizing backups and preventing the inclusion of unnecessary or inappropriate resources.
OADP provides the following APIs:
4.1.1.1. Support for OpenShift API for Data Protection
| Version | OCP version | General availability | Full support ends | Maintenance ends | Extended Update Support (EUS) | Extended Update Support Term 2 (EUS Term 2) |
| 1.4 | | 10 Jul 2024 | Release of 1.5 | Release of 1.6 | 27 Jun 2026 (EUS must be on OCP 4.16) | 27 Jun 2027 (EUS Term 2 must be on OCP 4.16) |
| 1.3 | | 29 Nov 2023 | 10 Jul 2024 | Release of 1.5 | 31 Oct 2025 (EUS must be on OCP 4.14) | 31 Oct 2026 (EUS Term 2 must be on OCP 4.14) |
4.1.1.1.1. Unsupported versions of the OADP Operator
| Version | General availability | Full support ended | Maintenance ended |
| 1.2 | 14 Jun 2023 | 29 Nov 2023 | 10 Jul 2024 |
| 1.1 | 01 Sep 2022 | 14 Jun 2023 | 29 Nov 2023 |
| 1.0 | 09 Feb 2022 | 01 Sep 2022 | 14 Jun 2023 |
For more details about EUS, see Extended Update Support.
For more details about EUS Term 2, see Extended Update Support Term 2.
4.2. OADP release notes
4.2.1. OADP 1.4 release notes
The release notes for OpenShift API for Data Protection (OADP) describe new features and enhancements, deprecated features, product recommendations, known issues, and resolved issues.
For additional information about OADP, see OpenShift API for Data Protection (OADP) FAQs.
4.2.1.1. OADP 1.4.8 release notes
OpenShift API for Data Protection (OADP) 1.4.8 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.4.7. OADP 1.4.8 fixes several Common Vulnerabilities and Exposures (CVEs).
4.2.1.1.1. Resolved issues
- OADP 1.4.8 fixes the following CVEs
4.2.1.2. OADP 1.4.7 release notes
OpenShift API for Data Protection (OADP) 1.4.7 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.4.6.
4.2.1.3. OADP 1.4.6 release notes
OpenShift API for Data Protection (OADP) 1.4.6 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.4.5.
4.2.1.4. OADP 1.4.5 release notes
The OpenShift API for Data Protection (OADP) 1.4.5 release notes list new features and resolved issues.
4.2.1.4.1. New features
Collecting logs with the must-gather tool has been improved with a Markdown summary
You can collect logs and information about OpenShift API for Data Protection (OADP) custom resources by using the must-gather tool.
4.2.1.4.2. Resolved issues
- OADP 1.4.5 fixes the following CVEs
4.2.1.5. OADP 1.4.4 release notes
OpenShift API for Data Protection (OADP) 1.4.4 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.4.3.
4.2.1.5.1. Known issues
Issue with restoring stateful applications
When you restore a stateful application that uses the azurefile-csi storage class, the restore operation remains in the Finalizing phase.
4.2.1.6. OADP 1.4.3 release notes
The OpenShift API for Data Protection (OADP) 1.4.3 release notes list the following new feature.
4.2.1.6.1. New features
Notable changes in the kubevirt velero plugin in version 0.7.1
With this release, the kubevirt velero plugin is updated to version 0.7.1. Notable changes include the following:
- Virtual machine instances (VMIs) are no longer ignored from backup when the owner VM is excluded.
- Object graphs now include all extra objects during backup and restore operations.
- Optionally generated labels are now added to new firmware Universally Unique Identifiers (UUIDs) during restore operations.
- Switching VM run strategies during restore operations is now possible.
- Clearing a MAC address by label is now supported.
- The restore-specific checks during the backup operation are now skipped.
- The VirtualMachineClusterInstancetype and VirtualMachineClusterPreference custom resource definitions (CRDs) are now supported.
4.2.1.7. OADP 1.4.2 release notes
The OpenShift API for Data Protection (OADP) 1.4.2 release notes list new features, resolved issues and bugs, and known issues.
4.2.1.7.1. New features
Backing up different volumes in the same namespace by using the VolumePolicy feature is now possible
With this release, Velero provides resource policies to back up different volumes in the same namespace by using the VolumePolicy feature. The VolumePolicy feature supports the skip, snapshot, and fs-backup actions.
File system backup and data mover can now use short-term credentials
File system backup and data mover can now use short-term credentials such as AWS Security Token Service (STS) and Google Cloud WIF. With this support, backup is successfully completed without any PartiallyFailed status.
4.2.1.7.2. Resolved issues
DPA now reports errors if VSL contains an incorrect provider value
Previously, if the provider of a Volume Snapshot Location (VSL) spec was incorrect, the Data Protection Application (DPA) reconciled successfully. With this update, DPA reports errors and requests for a valid provider value. OADP-5044
Data Mover restore is successful irrespective of using different OADP namespaces for backup and restore
Previously, when backup operation was executed by using OADP installed in one namespace but was restored by using OADP installed in a different namespace, the Data Mover restore failed. With this update, Data Mover restore is now successful. OADP-5460
SSE-C backup works with the calculated MD5 of the secret key
Previously, backup failed with the following error:
Requests specifying Server Side Encryption with Customer provided keys must provide the client calculated MD5 of the secret key.
With this update, missing Server-Side Encryption with Customer-Provided Keys (SSE-C) base64 and MD5 hash are now fixed. As a result, SSE-C backup works with the calculated MD5 of the secret key. In addition, incorrect error handling for the customerKey is also fixed.
For a complete list of all issues resolved in this release, see the list of OADP 1.4.2 resolved issues in Jira.
4.2.1.7.3. Known issues
The nodeSelector spec is not supported for the Data Mover restore action
When a Data Protection Application (DPA) is created with the nodeSelector field set in the nodeAgent parameter, the Data Mover restore partially fails instead of completing the restore operation.
The S3 storage does not use proxy environment when TLS skip verify is specified
In the image registry backup, the S3 storage does not use the proxy environment when the insecureSkipTLSVerify parameter is set to true.
Kopia does not delete artifacts after backup expiration
Even after you delete a backup, Kopia does not delete the volume artifacts from the ${bucket_name}/kopia/$openshift-adp location.
4.2.1.8. OADP 1.4.1 release notes
The OpenShift API for Data Protection (OADP) 1.4.1 release notes list new features, resolved issues and bugs, and known issues.
4.2.1.8.1. New features
New DPA fields to update client qps and burst
You can now change Velero Server Kubernetes API queries per second and burst values by using the new Data Protection Application (DPA) fields. The new DPA fields are spec.configuration.velero.client-qps and spec.configuration.velero.client-burst.
Enabling non-default algorithms with Kopia
With this update, you can now configure the hash, encryption, and splitter algorithms in Kopia to select non-default options to optimize performance for different backup workloads.
To configure these algorithms, set the env variable of the velero pod in the podConfig section of the DPA configuration.
4.2.1.8.2. Resolved issues
Restoring a backup without pods is now successful
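Previously, restoring a backup without pods and having StorageClass VolumeBindingMode set to WaitForFirstConsumer resulted in the PartiallyFailed status with the following error: fail to patch dynamic PV, err: context deadline exceeded. With this update, restoring such a backup is successful without any PartiallyFailed status.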
PodVolumeBackup CR now displays correct message
Previously, the PodVolumeBackup custom resource (CR) displayed the following message:
get a podvolumebackup with status "InProgress" during the server starting, mark it as "Failed"
With this update, the message is now the following:
found a podvolumebackup with status "InProgress" during the server starting, mark it as "Failed".
Overriding imagePullPolicy is now possible with DPA
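Previously, OADP set the imagePullPolicy parameter to Always for all images. With this update, OADP checks whether each image contains a sha256 or sha512 digest. If it does, OADP sets imagePullPolicy to IfNotPresent; otherwise, imagePullPolicy is set to Always. You can override this policy by using the new spec.containerImagePullPolicy DPA field.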
OADP Velero can now retry updating the restore status if initial update fails
Previously, OADP Velero failed to update the restored CR status. This left the status at InProgress indefinitely. With this update, the restore CR status correctly proceeds to the Completed or Failed status.
Restoring BuildConfig Build from a different cluster is successful without any errors
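Previously, when performing a restore of the BuildConfig Build resource from a different cluster, the restore generated the following error: failed to verify certificate: x509: certificate signed by unknown authority. With this update, the restore of BuildConfig Build resources to a different cluster proceeds without generating the failed to verify certificate error.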
Restoring an empty PVC is successful
Previously, downloading data failed while restoring an empty persistent volume claim (PVC). It failed with the following error:
data path restore failed: Failed to run kopia restore: Unable to load snapshot : snapshot not found
With this update, the downloading of data proceeds to correct conclusion when restoring an empty PVC and the error message is not generated. OADP-3106
There is no Velero memory leak in CSI and DataMover plugins
Previously, a Velero memory leak was caused by using the CSI and DataMover plugins. When the backup ended, the Velero plugin instance was not deleted and the memory leak consumed memory until an Out of Memory (OOM) condition occurred. With this update, the memory leak no longer occurs.
Post-hook operation does not start before the related PVs are released
Previously, due to the asynchronous nature of the Data Mover operation, a post-hook might be attempted before the Data Mover persistent volume claim (PVC) releases the persistent volumes (PVs) of the related pods. This problem would cause the backup to fail with a PartiallyFailed status. With this update, the post-hook operation does not start until the related PVs are released, which eliminates the PartiallyFailed backup status.
Deploying a DPA works as expected in namespaces with more than 37 characters
When you install the OADP Operator in a namespace with more than 37 characters to create a new DPA, labeling the "cloud-credentials" Secret fails and the DPA reports the following error:
The generated label name is too long.
With this update, creating a DPA does not fail in namespaces with more than 37 characters in the name. OADP-3960
Restore is successfully completed by overriding the timeout error
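Previously, in a large-scale environment, the restore operation would result in a PartiallyFailed status with the following error: fail to patch dynamic PV, err: context deadline exceeded. With this update, the resourceTimeout Velero server argument is used to override this timeout error, resulting in a successful restore.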
For a complete list of all issues resolved in this release, see the list of OADP 1.4.1 resolved issues in Jira.
4.2.1.8.3. Known issues
Cassandra application pods enter into the CrashLoopBackoff status after restoring OADP
After OADP restores, the Cassandra application pods might enter the CrashLoopBackoff status. To work around this problem, delete the StatefulSet pods that are in the CrashLoopBackoff state. The StatefulSet controller then recreates these pods.
Deployment referencing ImageStream is not restored properly leading to corrupted pod and volume contents
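During a File System Backup (FSB) restore operation, a Deployment resource that references an ImageStream is not restored properly. The restored pod that runs the FSB and the postHook is terminated prematurely.
During the restore operation, the OpenShift Container Platform controller updates the spec.template.spec.containers[0].image field in the Deployment resource with an updated ImageStreamTag hash. The update triggers the rollout of a new pod, which terminates the pod on which velero runs the FSB and the post-hook.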
The workaround for this behavior is a two-step restore process:
Perform a restore excluding the Deployment resources, for example:
$ velero restore create <RESTORE_NAME> \
  --from-backup <BACKUP_NAME> \
  --exclude-resources=deployment.apps
Once the first restore is successful, perform a second restore by including these resources, for example:
$ velero restore create <RESTORE_NAME> \
  --from-backup <BACKUP_NAME> \
  --include-resources=deployment.apps
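The two-step workaround can be wrapped in a small helper so that both restores use the same backup and consistent names. This is a sketch under stated assumptions rather than a supported tool: the function name two_step_restore and the VELERO variable are hypothetical, the restore names are derived from a prefix that you choose, and the --wait flag of velero restore create is used so that each command blocks until the restore finishes. The exit status of the CLI might not reflect the final restore phase, so check the first restore with velero restore describe before relying on the second.

```shell
# Command used to invoke the Velero CLI; override it if your velero binary
# has a different name or path (assumption: it is on PATH as "velero").
VELERO="${VELERO:-velero}"

# two_step_restore BACKUP_NAME RESTORE_PREFIX
# Hypothetical helper: runs the two restores described above. The first
# restore excludes Deployment resources; the second restores only them.
# Restore names are RESTORE_PREFIX-1 and RESTORE_PREFIX-2.
two_step_restore() {
  local backup="$1" prefix="$2"
  "$VELERO" restore create "${prefix}-1" \
    --from-backup "$backup" \
    --exclude-resources=deployment.apps \
    --wait || return 1
  "$VELERO" restore create "${prefix}-2" \
    --from-backup "$backup" \
    --include-resources=deployment.apps \
    --wait
}

# Usage:
#   two_step_restore <BACKUP_NAME> <RESTORE_PREFIX>
```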
4.2.1.9. OADP 1.4.0 release notes
The OpenShift API for Data Protection (OADP) 1.4.0 release notes list resolved issues and known issues.
4.2.1.9.1. Resolved issues
Restore works correctly in OpenShift Container Platform 4.16
Previously, while restoring the deleted application namespace, the restore operation partially failed with the resource name may not be empty error in OpenShift Container Platform 4.16. With this update, the restore works as expected.
Data Mover backups work properly in the OpenShift Container Platform 4.16 cluster
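Previously, Velero was using the earlier version of SDK where the Spec.SourceVolumeMode field did not exist. As a consequence, Data Mover backups failed in the OpenShift Container Platform 4.16 cluster. With this update, Data Mover backups work as expected.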
For a complete list of all issues resolved in this release, see the list of OADP 1.4.0 resolved issues in Jira.
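4.2.1.9.2. Known issues
Backup fails when checksumAlgorithm is not set for MCG
While performing a backup of any application with Noobaa as the backup location, if the checksumAlgorithm configuration parameter is not set, the backup fails. If you do not provide a value for checksumAlgorithm in the Backup Storage Location (BSL) configuration, an empty value is added.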
For a complete list of all known issues in this release, see the list of OADP 1.4.0 known issues in Jira.
4.2.1.9.3. Upgrade notes
Always upgrade to the next minor version. Do not skip versions. To update to a later version, upgrade only one channel at a time. For example, to upgrade from OpenShift API for Data Protection (OADP) 1.1 to 1.3, upgrade first to 1.2, and then to 1.3.
4.2.1.9.3.1. Changes from OADP 1.3 to 1.4
The Velero server has been updated from version 1.12 to 1.14. Note that there are no changes in the Data Protection Application (DPA).
This changes the following:
- The velero-plugin-for-csi code is now available in the Velero code, which means an init container is no longer required for the plugin.
- Velero changed client Burst and QPS defaults from 30 and 20 to 100 and 100, respectively.
- The velero-plugin-for-aws plugin updated the default value of the spec.config.checksumAlgorithm field in BackupStorageLocation objects (BSLs) from "" (no checksum calculation) to the CRC32 algorithm. For more information, see Velero plugins for AWS Backup Storage Location. The checksum algorithm types are known to work only with AWS. Several S3 providers require the md5sum to be disabled by setting the checksum algorithm to "". Confirm md5sum algorithm support and configuration with your storage provider.
In OADP 1.4, the default value for BSLs created within DPA for this configuration is "". This default value means that the md5sum is not checked, which is consistent with OADP 1.3. For BSLs created within DPA, update it by using the spec.backupLocations[].velero.config.checksumAlgorithm field in the DPA. If your BSLs are created outside DPA, you can update this configuration by using spec.config.checksumAlgorithm in the BSLs.
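The two field paths named above map onto two different patch styles. The sketch below only builds the patch documents; it does not contact a cluster. The function names bsl_checksum_patch and dpa_checksum_patch are hypothetical. The DPA variant uses a JSON Patch because spec.backupLocations is a list, and it assumes that the velero.config object already exists at the given index.

```shell
# bsl_checksum_patch ALGORITHM
# Hypothetical helper: prints a JSON merge patch that sets
# spec.config.checksumAlgorithm on a BackupStorageLocation created
# outside the DPA. Pass "" to disable checksum calculation.
bsl_checksum_patch() {
  printf '{"spec":{"config":{"checksumAlgorithm":"%s"}}}\n' "$1"
}

# dpa_checksum_patch INDEX ALGORITHM
# Hypothetical helper: prints a JSON Patch (RFC 6902) that sets
# spec.backupLocations[INDEX].velero.config.checksumAlgorithm on a DPA.
# Assumes the velero.config object at that index already exists.
dpa_checksum_patch() {
  printf '[{"op":"add","path":"/spec/backupLocations/%s/velero/config/checksumAlgorithm","value":"%s"}]\n' "$1" "$2"
}

# Usage against a live cluster (requires a logged-in oc session);
# <bsl_name> and <dpa_name> are placeholders:
#   oc patch backupstoragelocation <bsl_name> -n openshift-adp \
#     --type merge -p "$(bsl_checksum_patch "")"
#   oc patch dpa <dpa_name> -n openshift-adp \
#     --type json -p "$(dpa_checksum_patch 0 "")"
```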
4.2.1.9.3.2. Backing up the DPA configuration
You must back up your current DataProtectionApplication (DPA) configuration.
Procedure
Save your current DPA configuration by running the following command:
Example command
$ oc get dpa -n openshift-adp -o yaml > dpa.orig.backup
4.2.1.9.3.3. Upgrading the OADP Operator
Use the following procedure when upgrading the OpenShift API for Data Protection (OADP) Operator.
Procedure
- Change your subscription channel for the OADP Operator from stable-1.3 to stable-1.4.
- Wait for the Operator and containers to update and restart.
4.2.1.9.4. Converting DPA to the new version
To upgrade from OADP 1.3 to 1.4, no Data Protection Application (DPA) changes are required.
4.2.1.9.5. Verifying the upgrade
Use the following procedure to verify the upgrade.
Procedure
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:
$ oc get all -n openshift-adp
Example output
NAME                                                     READY   STATUS    RESTARTS   AGE
pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
pod/restic-9cq4q                                         1/1     Running   0          94s
pod/restic-m4lts                                         1/1     Running   0          94s
pod/restic-pv4kr                                         1/1     Running   0          95s
pod/velero-588db7f655-n842v                              1/1     Running   0          95s

NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/restic   3         3         3       3            3           <none>          96s

NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
deployment.apps/velero                              1/1     1            1           96s

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
replicaset.apps/velero-588db7f655                              1         1         1       96s
- Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:
$ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
Example output
{"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}
- Verify the type is set to Reconciled.
- Verify the backup storage location and confirm that the PHASE is Available by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example output
NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
dpa-sample-1   Available   1s               3d16h   true
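The two status checks above can be combined into one scripted check. This is a sketch under assumptions: dpa-sample and dpa-sample-1 are the object names from the example output and must be replaced with your own, and the function names upgrade_ok and check_upgrade are hypothetical. It relies on the JSONPath filter syntax that oc get supports and on the status.phase field of the BackupStorageLocation resource.

```shell
# Succeed only when the DPA is reconciled and the BSL is available.
# $1 = status of the Reconciled condition, $2 = BSL phase.
upgrade_ok() {
  [ "$1" = "True" ] && [ "$2" = "Available" ]
}

# Read both values from the cluster and evaluate them.
# Object names come from the example output; replace them with your own.
check_upgrade() {
  reconciled=$(oc get dpa dpa-sample -n openshift-adp \
    -o jsonpath='{.status.conditions[?(@.type=="Reconciled")].status}')
  phase=$(oc get backupstoragelocations.velero.io dpa-sample-1 -n openshift-adp \
    -o jsonpath='{.status.phase}')
  upgrade_ok "$reconciled" "$phase"
}

# Usage: check_upgrade && echo "upgrade verified"
```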
4.2.2. OADP 1.3 release notes
The release notes for OpenShift API for Data Protection (OADP) 1.3 describe new features and enhancements, deprecated features, product recommendations, known issues, and resolved issues.
4.2.2.1. OADP 1.3.9 release notes
OpenShift API for Data Protection (OADP) 1.3.9 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.3.8. OADP 1.3.9 fixes several Common Vulnerabilities and Exposures (CVEs).
4.2.2.1.1. Resolved issues
- OADP 1.3.9 fixes the following CVEs
4.2.2.2. OADP 1.3.8 release notes
OpenShift API for Data Protection (OADP) 1.3.8 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.3.7.
4.2.2.3. OADP 1.3.7 release notes
OpenShift API for Data Protection (OADP) 1.3.7 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.3.6.
The following Common Vulnerabilities and Exposures (CVEs) have been fixed in OADP 1.3.7
4.2.2.3.1. New features
Collecting logs with the must-gather tool has been improved with a Markdown summary
You can collect logs and information about OADP custom resources by using the must-gather tool.
must-gather
must-gather
4.2.2.4. OADP 1.3.6 release notes
OpenShift API for Data Protection (OADP) 1.3.6 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.3.5.
4.2.2.5. OADP 1.3.5 release notes
OpenShift API for Data Protection (OADP) 1.3.5 is a Container Grade Only (CGO) release, which is released to refresh the health grades of the containers. No code was changed in the product itself compared to that of OADP 1.3.4.
4.2.2.6. OADP 1.3.4 release notes
The OpenShift API for Data Protection (OADP) 1.3.4 release notes list resolved issues and known issues.
4.2.2.6.1. Resolved issues
The backup spec.resourcepolicy.kind parameter is now case-insensitive
Previously, the
backup spec.resourcepolicy.kind
Use olm.maxOpenShiftVersion to prevent cluster upgrade to OCP 4.16 version
The cluster
operator-lifecycle-manager
olm.maxOpenShiftVersion
BSL and VSL are removed from the cluster
Previously, when any Data Protection Application (DPA) was modified to remove the Backup Storage Locations (BSL) or Volume Snapshot Locations (VSL) from the
backupLocations
snapshotLocations
DPA reconciles and validates the secret key
Previously, the Data Protection Application (DPA) reconciled successfully on the wrong Volume Snapshot Locations (VSL) secret key name. With this update, DPA validates the secret key name before reconciling on any VSL. OADP-3052
Velero’s cloud credential permissions are now restrictive
Previously, Velero’s cloud credentials were mounted with 0644 permissions. As a consequence, anyone could read the /credentials/cloud file.
Warning is displayed when ArgoCD managed namespace is included in the backup
A warning is displayed during the backup operation when ArgoCD and Velero manage the same namespace. OADP-4736
The list of security fixes that are included in this release is documented in the RHSA-2024:9960 advisory.
For a complete list of all issues resolved in this release, see the list of OADP 1.3.4 resolved issues in Jira.
4.2.2.6.2. Known issues
Cassandra application pods enter into the CrashLoopBackoff status after restore
After OADP restores, the Cassandra application pods might enter the CrashLoopBackoff status.
StatefulSet
CrashLoopBackoff
StatefulSet
defaultVolumesToFSBackup and defaultVolumesToFsBackup flags are not identical
The dpa.spec.configuration.velero.defaultVolumesToFSBackup flag is not identical to the backup.spec.defaultVolumesToFsBackup flag.
PodVolumeRestore works even though the restore is marked as failed
The podvolumerestore continues to work even though the restore is marked as failed.
Velero is unable to skip restoring of initContainer spec
Velero might restore the
restore-wait init
4.2.2.7. OADP 1.3.3 release notes
The OpenShift API for Data Protection (OADP) 1.3.3 release notes list resolved issues and known issues.
4.2.2.7.1. Resolved issues
OADP fails when its namespace name is longer than 37 characters
When installing the OADP Operator in a namespace with more than 37 characters and when creating a new DPA, labeling the
cloud-credentials
OADP image PullPolicy set to Always
In previous versions of OADP, the image PullPolicy of the adp-controller-manager and Velero pods was set to
Always
openshift-adp-controller-manager
IfNotPresent
The list of security fixes that are included in this release is documented in the RHSA-2024:4982 advisory.
For a complete list of all issues resolved in this release, see the list of OADP 1.3.3 resolved issues in Jira.
4.2.2.7.2. Known issues
Cassandra application pods enter into the CrashLoopBackoff status after restoring OADP
After OADP restores, the Cassandra application pods might enter the CrashLoopBackoff status.
StatefulSet
CrashLoopBackoff
StatefulSet
4.2.2.8. OADP 1.3.2 release notes
The OpenShift API for Data Protection (OADP) 1.3.2 release notes list resolved issues and known issues.
4.2.2.8.1. Resolved issues
DPA fails to reconcile if a valid custom secret is used for BSL
DPA fails to reconcile if a valid custom secret is used for Backup Storage Location (BSL), but the default secret is missing. The workaround is to create the required default cloud-credentials secret.
CVE-2023-45290: oadp-velero-container: Golang net/http: Memory exhaustion in Request.ParseMultipartForm
A flaw was found in the
net/http
multipart
Request.ParseMultipartForm
Request.FormValue
Request.PostFormValue
Request.FormFile
For more details, see CVE-2023-45290.
CVE-2023-45289: oadp-velero-container: Golang net/http/cookiejar: Incorrect forwarding of sensitive headers and cookies on HTTP redirect
A flaw was found in the
net/http/cookiejar
http.Client
Authorization
Cookie
For more details, see CVE-2023-45289.
CVE-2024-24783: oadp-velero-container: Golang crypto/x509: Verify panics on certificates with an unknown public key algorithm
A flaw was found in the
crypto/x509
Certificate.Verify
crypto/tls
Config.ClientAuth
VerifyClientCertIfGiven
RequireAndVerifyClientCert
For more details, see CVE-2024-24783.
CVE-2024-24784: oadp-velero-plugin-container: Golang net/mail: Comments in display names are incorrectly handled
A flaw was found in the
net/mail
ParseAddressList
For more details, see CVE-2024-24784.
CVE-2024-24785: oadp-velero-container: Golang: html/template: errors returned from MarshalJSON methods may break template escaping
A flaw was found in the
html/template
MarshalJSON
For more details, see CVE-2024-24785.
For a complete list of all issues resolved in this release, see the list of OADP 1.3.2 resolved issues in Jira.
4.2.2.8.2. Known issues
Cassandra application pods enter into the CrashLoopBackoff status after restoring OADP
After OADP restores, the Cassandra application pods might enter the CrashLoopBackoff status.
StatefulSet
CrashLoopBackoff
StatefulSet
4.2.2.9. OADP 1.3.1 release notes
The OpenShift API for Data Protection (OADP) 1.3.1 release notes list new features and resolved issues.
4.2.2.9.1. New features
OADP 1.3.0 Data Mover is now fully supported
The OADP built-in Data Mover, introduced in OADP 1.3.0 as a Technology Preview, is now fully supported for both containerized and virtual machine workloads.
4.2.2.9.2. Resolved issues
IBM Cloud® Object Storage is now supported as a backup storage provider
IBM Cloud® Object Storage is one of the AWS S3 compatible backup storage providers, which was unsupported previously. With this update, IBM Cloud® Object Storage is now supported as an AWS S3 compatible backup storage provider.
OADP operator now correctly reports the missing region error
Previously, when you specified
profile:default
region
missing region
missing region
Custom labels are not removed from the openshift-adp namespace
Previously, the
openshift-adp-controller-manager
openshift-adp
openshift-adp
OADP must-gather image collects CRDs
Previously, the OADP
must-gather
omg
must-gather
omg
Garbage collection has the correct description for the default frequency value
Previously, the
garbage-collection-frequency
garbage-collection-frequency
gc-controller
FIPS Mode flag is available in OperatorHub
By setting the
fips-compliant
true
CSI plugin does not panic with a nil pointer when csiSnapshotTimeout is set to a short duration
Previously, when the
csiSnapshotTimeout
plugin panicked: runtime error: invalid memory address or nil pointer dereference
With this fix, the backup fails with the following error:
Timed out awaiting reconciliation of volumesnapshot
For a complete list of all issues resolved in this release, see the list of OADP 1.3.1 resolved issues in Jira.
4.2.2.9.3. Known issues
Backup and storage restrictions for single-node OpenShift clusters deployed on IBM Power® and IBM Z® platforms
Review the following backup and storage-related restrictions for single-node OpenShift clusters that are deployed on IBM Power® and IBM Z® platforms:
- Storage: Only NFS storage is currently compatible with single-node OpenShift clusters deployed on IBM Power® and IBM Z® platforms.
- Backup: Only File System Backup methods, such as kopia and restic, are supported for backup and restore operations.
Cassandra application pods enter the CrashLoopBackoff status after restoring OADP
After OADP restores, the Cassandra application pods might enter the CrashLoopBackoff status.
StatefulSet
CrashLoopBackoff
StatefulSet
4.2.2.10. OADP 1.3.0 release notes
The OpenShift API for Data Protection (OADP) 1.3.0 release notes list new features, resolved issues and bugs, and known issues.
4.2.2.10.1. New features
Velero built-in DataMover is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
OADP 1.3 includes a built-in Data Mover that you can use to move Container Storage Interface (CSI) volume snapshots to a remote object store. The built-in Data Mover allows you to restore stateful applications from the remote object store if a failure, accidental deletion, or corruption of the cluster occurs. It uses Kopia as the uploader mechanism to read the snapshot data and to write to the Unified Repository.
Backing up applications with File System Backup: Kopia or Restic
Velero’s File System Backup (FSB) supports two backup libraries: the Restic path and the Kopia path.
Velero allows users to select between the two paths.
For backup, specify the path during the installation through the uploader-type flag.
restic
kopia
kopia
Google Cloud authentication
Google Cloud authentication enables you to use short-lived Google credentials.
Google Cloud with Workload Identity Federation enables you to use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This eliminates the maintenance and security risks associated with service account keys.
AWS ROSA STS authentication
You can use OpenShift API for Data Protection (OADP) with Red Hat OpenShift Service on AWS (ROSA) clusters to back up and restore application data.
ROSA provides seamless integration with a wide range of AWS compute, database, analytics, machine learning, networking, mobile, and other services to speed up the building and delivering of differentiating experiences to your customers.
You can subscribe to the service directly from your AWS account.
After the clusters are created, you can operate your clusters by using the OpenShift web console. The ROSA service also uses OpenShift APIs and command-line interface (CLI) tools.
4.2.2.10.2. Resolved issues
ACM applications were removed and re-created on managed clusters after restore
Applications on managed clusters were deleted and re-created upon restore activation. The OpenShift API for Data Protection (OADP) 1.2 backup and restore process is faster than in older versions. This performance change caused some ACM resources to be restored before others, which removed the applications from managed clusters. OADP-2686
Restic restore was partially failing due to Pod Security standard
During interoperability testing, OpenShift Container Platform 4.14 had the pod Security mode set to
enforce
podSecurity
Possible pod volume backup failure if Velero is installed in several namespaces
There was a regression in Pod Volume Backup (PVB) functionality when Velero was installed in several namespaces. The PVB controller was not properly limiting itself to PVBs in its own namespace. OADP-2308
OADP Velero plugins returning "received EOF, stopping recv loop" message
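In OADP, Velero plugins were started as separate processes. When the Velero operation completes, either successfully or not, they exit. Therefore, if you see a received EOF, stopping recv loop message in the debug logs, it indicates that a plugin operation has completed, not that an error has occurred.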
CVE-2023-39325 Multiple HTTP/2 enabled web servers are vulnerable to a DDoS attack (Rapid Reset Attack)
In previous releases of OADP, the HTTP/2 protocol was susceptible to a denial of service attack because request cancellation could reset multiple streams quickly. The server had to set up and tear down the streams while not hitting any server-side limit for the maximum number of active streams per connection. This resulted in a denial of service due to server resource consumption.
For more information, see CVE-2023-39325 (Rapid Reset Attack)
For a complete list of all issues resolved in this release, see the list of OADP 1.3.0 resolved issues in Jira.
4.2.2.10.3. Known issues
CSI plugin errors on nil pointer when csiSnapshotTimeout is set to a short duration
The CSI plugin errors on nil pointer when
csiSnapshotTimeout
PartiallyFailed
plugin panicked: runtime error: invalid memory address or nil pointer dereference
Backup is marked as PartiallyFailed when volumeSnapshotContent CR has an error
If any of the
VolumeSnapshotContent
VolumeSnapshotBeingCreated
WaitingForPluginOperationsPartiallyFailed
Performance issues when restoring 30,000 resources for the first time
When restoring 30,000 resources for the first time without an existing-resource-policy, the restore takes twice as long as the second and third attempts with the existing-resource-policy set to update.
Post restore hooks might start running before Datadownload operation has released the related PV
Due to the asynchronous nature of the Data Mover operation, a post-hook might be attempted before the related pods' persistent volumes (PVs) are released by the Data Mover persistent volume claim (PVC).
Google Cloud Workload Identity Federation VSL backup PartiallyFailed
VSL backup
PartiallyFailed
For a complete list of all known issues in this release, see the list of OADP 1.3.0 known issues in Jira.
4.2.2.10.4. Upgrade notes
Always upgrade to the next minor version. Do not skip versions. To update to a later version, upgrade only one channel at a time. For example, to upgrade from OpenShift API for Data Protection (OADP) 1.1 to 1.3, upgrade first to 1.2, and then to 1.3.
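The one-channel-at-a-time rule can be sketched as a small helper that lists the channels to step through. This is an illustration only: it assumes 1.x version numbers and the stable-1.x channel naming used elsewhere in this document, and the function name upgrade_path is hypothetical.

```shell
# Print the subscription channels to pass through, one minor version at a time.
# $1 = installed version (for example 1.1), $2 = target version (for example 1.3).
# Assumes 1.x versions and stable-1.x channel names.
upgrade_path() {
  minor=$(( ${1#1.} + 1 ))
  last=${2#1.}
  while [ "$minor" -le "$last" ]; do
    echo "stable-1.$minor"
    minor=$((minor + 1))
  done
}

upgrade_path 1.1 1.3
# prints:
# stable-1.2
# stable-1.3
```

Each printed channel is a separate upgrade step; wait for the Operator to settle on one channel before moving to the next.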
4.2.2.10.4.1. Changes from OADP 1.2 to 1.3
The Velero server has been updated from version 1.11 to 1.12.
OpenShift API for Data Protection (OADP) 1.3 uses the Velero built-in Data Mover instead of the VolumeSnapshotMover (VSM) or the Volsync Data Mover.
This changes the following:
- The spec.features.dataMover field and the VSM plugin are not compatible with OADP 1.3, and you must remove the configuration from the DataProtectionApplication (DPA) configuration.
- The Volsync Operator is no longer required for Data Mover functionality, and you can remove it.
- The custom resource definitions volumesnapshotbackups.datamover.oadp.openshift.io and volumesnapshotrestores.datamover.oadp.openshift.io are no longer required, and you can remove them.
- The secrets used for the OADP-1.2 Data Mover are no longer required, and you can remove them.
OADP 1.3 supports Kopia, which is an alternative file system backup tool to Restic.
To employ Kopia, use the new spec.configuration.nodeAgent field as shown in the following example:
Example
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
# ...
The spec.configuration.restic field is deprecated in OADP 1.3 and will be removed in a future version of OADP. To avoid seeing deprecation warnings, remove the restic key and its values, and use the following new syntax:
Example
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
# ...
In a future OADP release, it is planned that the kopia tool will become the default uploaderType value.
4.2.2.10.4.2. Upgrading from OADP 1.2 Technology Preview Data Mover
OpenShift API for Data Protection (OADP) 1.2 Data Mover backups cannot be restored with OADP 1.3. To prevent a gap in the data protection of your applications, complete the following steps before upgrading to OADP 1.3:
Procedure
- If your cluster backups are sufficient and Container Storage Interface (CSI) storage is available, back up the applications with a CSI backup.
- If you require off-cluster backups:
  - Back up the applications with a file system backup that uses the --default-volumes-to-fs-backup=true or backup.spec.defaultVolumesToFsBackup options.
  - Back up the applications with your object storage plugins, for example, velero-plugin-for-aws.
The default timeout value for the Restic file system backup is one hour. In OADP 1.3.1 and later, the default timeout value for Restic and Kopia is four hours.
To restore an OADP 1.2 Data Mover backup, you must uninstall OADP, and then install and configure OADP 1.2.
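For the file system backup option described in the procedure above, the Backup manifest can be generated from a small template. This is a sketch, not the only way to create the backup: the function name fsb_backup_manifest and the example names in the usage line are hypothetical, and the manifest uses only the velero.io/v1 Backup fields that appear in this document.

```shell
# Print a Backup manifest that uses file system backup for all pod volumes.
# $1 = backup name, $2 = application namespace to include.
fsb_backup_manifest() {
  cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: $1
  namespace: openshift-adp
spec:
  includedNamespaces:
  - $2
  defaultVolumesToFsBackup: true
EOF
}

# Usage (names are placeholders):
#   fsb_backup_manifest pre-upgrade-fsb mysql-persistent | oc apply -f -
```

Printing the manifest instead of applying it directly lets you review or store it before the upgrade.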
4.2.2.10.4.3. Backing up the DPA configuration
You must back up your current DataProtectionApplication (DPA) configuration.
Procedure
Save your current DPA configuration by running the following command:
Example
$ oc get dpa -n openshift-adp -o yaml > dpa.orig.backup
4.2.2.10.4.4. Upgrading the OADP Operator
Use the following sequence when upgrading the OpenShift API for Data Protection (OADP) Operator.
Procedure
- Change your subscription channel for the OADP Operator from stable-1.2 to stable-1.3.
- Allow time for the Operator and containers to update and restart.
4.2.2.10.4.5. Converting DPA to the new version
If you need to move backups off cluster with the Data Mover, reconfigure the DataProtectionApplication (DPA) as follows.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- In the Provided APIs section, click View more.
- Click Create instance in the DataProtectionApplication box.
- Click YAML View to display the current DPA parameters.
Example current DPA
spec:
  features:
    dataMover:
      enable: true
      credentialName: dm-credentials
  configuration:
    velero:
      defaultPlugins:
      - vsm
      - csi
      - openshift
# ...
- Update the DPA parameters:
  - Remove the features.dataMover key and values from the DPA.
  - Remove the VolumeSnapshotMover (VSM) plugin.
  - Add the nodeAgent key and values.
Example updated DPA
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
      - csi
      - openshift
# ...
- Wait for the DPA to reconcile successfully.
4.2.2.10.4.6. Verifying the upgrade
Use the following procedure to verify the upgrade.
Procedure
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:
$ oc get all -n openshift-adp
Example output
NAME                                                     READY   STATUS    RESTARTS   AGE
pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
pod/node-agent-9cq4q                                     1/1     Running   0          94s
pod/node-agent-m4lts                                     1/1     Running   0          94s
pod/node-agent-pv4kr                                     1/1     Running   0          95s
pod/velero-588db7f655-n842v                              1/1     Running   0          95s

NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
deployment.apps/velero                              1/1     1            1           96s

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
replicaset.apps/velero-588db7f655                              1         1         1       96s
- Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:
$ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
Example output
{"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}
- Verify the type is set to Reconciled.
- Verify the backup storage location and confirm that the PHASE is Available by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example output
NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
dpa-sample-1   Available   1s               3d16h   true
In OADP 1.3, you can start data movement off cluster per backup rather than by creating a DataProtectionApplication (DPA) configuration.
Example command
$ velero backup create example-backup --include-namespaces mysql-persistent --snapshot-move-data=true
Example configuration file
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: example-backup
  namespace: openshift-adp
spec:
  snapshotMoveData: true
  includedNamespaces:
  - mysql-persistent
  storageLocation: dpa-sample-1
  ttl: 720h0m0s
# ...
4.3. OADP performance
4.3.1. OADP recommended network settings
For a supported experience with OpenShift API for Data Protection (OADP), you should have a stable and resilient network across OpenShift Container Platform nodes, S3 storage, and supported cloud environments that meets the OpenShift Container Platform network requirement recommendations.
To ensure successful backup and restore operations for deployments whose remote S3 buckets are located off-cluster over suboptimal data paths, your network settings should meet the following minimum requirements:
- Bandwidth (network upload speed to object storage): Greater than 2 Mbps for small backups and 10-100 Mbps depending on the data volume for larger backups.
- Packet loss: 1%
- Packet corruption: 1%
- Latency: 100ms
Ensure that your OpenShift Container Platform network performs optimally and meets OpenShift Container Platform network requirements.
Although Red Hat provides support for standard backup and restore failures, it does not provide support for failures caused by network settings that do not meet the recommended thresholds.
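The thresholds listed above can be checked against your own measurements with a short helper. This is a sketch under stated assumptions: it treats 2 Mbps as a lower bound on bandwidth and reads the packet loss, packet corruption, and latency figures as upper limits, which is an interpretation of the list rather than something the list states. The measurements themselves must come from your own tooling, and the function name network_ok is hypothetical.

```shell
# Succeed when the measured values fall within the recommended thresholds.
# $1 = upload bandwidth in Mbps, $2 = packet loss in percent,
# $3 = packet corruption in percent, $4 = latency in milliseconds.
# Assumption: loss, corruption, and latency values are upper limits.
network_ok() {
  awk -v bw="$1" -v loss="$2" -v corrupt="$3" -v lat="$4" \
    'BEGIN { exit !(bw > 2 && loss <= 1 && corrupt <= 1 && lat <= 100) }'
}

# Usage: network_ok 25 0.2 0 40 && echo "network within recommended thresholds"
```

The comparison is delegated to awk because the measurements are usually fractional, which plain shell arithmetic does not handle.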
4.4. OADP features and plugins
OpenShift API for Data Protection (OADP) features provide options for backing up and restoring applications.
The default plugins enable Velero to integrate with certain cloud providers and to back up and restore OpenShift Container Platform resources.
4.4.1. OADP features
OpenShift API for Data Protection (OADP) supports the following features:
- Backup
  You can use OADP to back up all applications on the OpenShift Platform, or you can filter the resources by type, namespace, or label.
  OADP backs up Kubernetes objects and internal images by saving them as an archive file on object storage. OADP backs up persistent volumes (PVs) by creating snapshots with the native cloud snapshot API or with the Container Storage Interface (CSI). For cloud providers that do not support snapshots, OADP backs up resources and PV data with Restic.
  Note: You must exclude Operators from the backup of an application for backup and restore to succeed.
- Restore
  You can restore resources and PVs from a backup. You can restore all objects in a backup or filter the objects by namespace, PV, or label.
  Note: You must exclude Operators from the backup of an application for backup and restore to succeed.
- Schedule
  You can schedule backups at specified intervals.
- Hooks
  You can use hooks to run commands in a container on a pod, for example, fsfreeze to freeze a file system. You can configure a hook to run before or after a backup or restore. Restore hooks can run in an init container or in the application container.
4.4.2. OADP plugins
The OpenShift API for Data Protection (OADP) provides default Velero plugins that are integrated with storage providers to support backup and snapshot operations. You can create custom plugins based on the Velero plugins.
OADP also provides plugins for OpenShift Container Platform resource backups, OpenShift Virtualization resource backups, and Container Storage Interface (CSI) snapshots.
| OADP plugin | Function | Storage location |
|---|---|---|
| aws | Backs up and restores Kubernetes objects. | AWS S3 |
| aws | Backs up and restores volumes with snapshots. | AWS EBS |
| azure | Backs up and restores Kubernetes objects. | Microsoft Azure Blob storage |
| azure | Backs up and restores volumes with snapshots. | Microsoft Azure Managed Disks |
| gcp | Backs up and restores Kubernetes objects. | Google Cloud Storage |
| gcp | Backs up and restores volumes with snapshots. | Google Compute Engine Disks |
| openshift | Backs up and restores OpenShift Container Platform resources. [1] | Object store |
| kubevirt | Backs up and restores OpenShift Virtualization resources. [2] | Object store |
| csi | Backs up and restores volumes with CSI snapshots. [3] | Cloud storage that supports CSI snapshots |
| vsm | VolumeSnapshotMover relocates snapshots from the cluster into an object store to be used during a restore process to recover stateful applications, in situations such as cluster deletion. [4] | Object store |
[1] Mandatory.
[2] Virtual machine disks are backed up with CSI snapshots or Restic.
[3] The csi plugin uses the Kubernetes CSI snapshot API.
  - OADP 1.1 or later uses snapshot.storage.k8s.io/v1
  - OADP 1.0 uses snapshot.storage.k8s.io/v1beta1
[4] OADP 1.2 only.
4.4.3. About OADP Velero plugins
You can configure two types of plugins when you install Velero:
- Default cloud provider plugins
- Custom plugins
Both types of plugin are optional, but most users configure at least one cloud provider plugin.
4.4.3.1. Default Velero cloud provider plugins
You can install any of the following default Velero cloud provider plugins when you configure the oadp_v1alpha1_dpa.yaml file:
- aws (Amazon Web Services)
- gcp (Google Cloud)
- azure (Microsoft Azure)
- openshift (OpenShift Velero plugin)
- csi (Container Storage Interface)
- kubevirt (KubeVirt)
You specify the desired default plugins in the oadp_v1alpha1_dpa.yaml file.
Example file
The following .yaml file installs the openshift, aws, azure, and gcp plugins:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
spec:
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
      - azure
      - gcp
4.4.3.2. Custom Velero plugins
You can install a custom Velero plugin by specifying the plugin image and name when you configure the oadp_v1alpha1_dpa.yaml file.
You specify the desired custom plugins in the oadp_v1alpha1_dpa.yaml file.
Example file
The following .yaml file installs the default openshift, azure, and gcp plugins and a custom plugin that has the name custom-plugin-example and the image quay.io/example-repo/custom-velero-plugin:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
spec:
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - azure
      - gcp
      customPlugins:
      - name: custom-plugin-example
        image: quay.io/example-repo/custom-velero-plugin
4.4.3.3. Velero plugins returning "received EOF, stopping recv loop" message
Velero plugins are started as separate processes. After the Velero operation has completed, either successfully or not, they exit. Receiving a received EOF, stopping recv loop message in the debug logs therefore indicates that a plugin operation has completed, not that an error has occurred.
4.4.4. Supported architectures for OADP
OpenShift API for Data Protection (OADP) supports the following architectures:
- AMD64
- ARM64
- PPC64le
- s390x
OADP 1.2.0 and later versions support the ARM64 architecture.
4.4.5. OADP support for IBM Power and IBM Z
OpenShift API for Data Protection (OADP) is platform neutral. The information that follows relates only to IBM Power® and to IBM Z®.
- OADP 1.1.7 was tested successfully against OpenShift Container Platform 4.11 for both IBM Power® and IBM Z®. The sections that follow give testing and support information for OADP 1.1.7 in terms of backup locations for these systems.
- OADP 1.2.3 was tested successfully against OpenShift Container Platform 4.12, 4.13, 4.14, and 4.15 for both IBM Power® and IBM Z®. The sections that follow give testing and support information for OADP 1.2.3 in terms of backup locations for these systems.
- OADP 1.3.9 was tested successfully against OpenShift Container Platform 4.12, 4.13, 4.14, and 4.15 for both IBM Power® and IBM Z®. The sections that follow give testing and support information for OADP 1.3.9 in terms of backup locations for these systems.
- OADP 1.4.8 was tested successfully against OpenShift Container Platform 4.14, 4.15, and 4.16 for both IBM Power® and IBM Z®. The sections that follow give testing and support information for OADP 1.4.8 in terms of backup locations for these systems.
4.4.5.1. OADP support for target backup locations using IBM Power
- IBM Power® running with OpenShift Container Platform 4.12, 4.13, 4.14, and 4.15, and OADP 1.3.9 was tested successfully against an AWS S3 backup location target. Although the test involved only an AWS S3 target, Red Hat also supports running IBM Power® with OpenShift Container Platform 4.13, 4.14, and 4.15, and OADP 1.3.9 against all S3 backup location targets that are not AWS.
- IBM Power® running with OpenShift Container Platform 4.14, 4.15, and 4.16, and OADP 1.4.8 was tested successfully against an AWS S3 backup location target. Although the test involved only an AWS S3 target, Red Hat also supports running IBM Power® with OpenShift Container Platform 4.14, 4.15, and 4.16, and OADP 1.4.8 against all S3 backup location targets that are not AWS.
4.4.5.2. OADP testing and support for target backup locations using IBM Z
- IBM Z® running with OpenShift Container Platform 4.12, 4.13, 4.14, and 4.15, and OADP 1.3.9 was tested successfully against an AWS S3 backup location target. Although the test involved only an AWS S3 target, Red Hat also supports running IBM Z® with OpenShift Container Platform 4.13, 4.14, and 4.15, and OADP 1.3.9 against all S3 backup location targets that are not AWS.
- IBM Z® running with OpenShift Container Platform 4.14, 4.15, and 4.16, and OADP 1.4.8 was tested successfully against an AWS S3 backup location target. Although the test involved only an AWS S3 target, Red Hat also supports running IBM Z® with OpenShift Container Platform 4.14, 4.15, and 4.16, and OADP 1.4.8 against all S3 backup location targets that are not AWS.
4.4.5.2.1. Known issue of OADP using IBM Power® and IBM Z® platforms
- Currently, there are backup method restrictions for Single-node OpenShift clusters deployed on IBM Power® and IBM Z® platforms. Only NFS storage is currently compatible with Single-node OpenShift clusters on these platforms. In addition, only the File System Backup (FSB) methods such as Kopia and Restic are supported for backup and restore operations. There is currently no workaround for this issue.
4.4.6. OADP plugins known issues
The following section describes known issues in OpenShift API for Data Protection (OADP) plugins:
4.4.6.1. Velero plugin panics during imagestream backups due to a missing secret
When the backup and the Backup Storage Location (BSL) are managed outside the scope of the Data Protection Application (DPA), the OADP controller (that is, the DPA reconciliation) does not create the relevant oadp-<bsl_name>-<bsl_provider>-registry-secret secret.
When the backup is run, the OpenShift Velero plugin panics on the imagestream backup, with the following panic error:
2024-02-27T10:46:50.028951744Z time="2024-02-27T10:46:50Z" level=error msg="Error backing up item"
backup=openshift-adp/<backup name> error="error executing custom action (groupResource=imagestreams.image.openshift.io,
namespace=<BSL Name>, name=postgres): rpc error: code = Aborted desc = plugin panicked:
runtime error: index out of range with length 1, stack trace: goroutine 94…
4.4.6.1.1. Workaround to avoid the panic error
To avoid the Velero plugin panic error, perform the following steps:
Label the custom BSL with the relevant label:
$ oc label backupstoragelocations.velero.io <bsl_name> app.kubernetes.io/component=bsl
After the BSL is labeled, wait until the DPA reconciles.
Note: You can force the reconciliation by making any minor change to the DPA itself.
When the DPA reconciles, confirm that the relevant oadp-<bsl_name>-<bsl_provider>-registry-secret has been created and that the correct registry data has been populated into it:
$ oc -n openshift-adp get secret/oadp-<bsl_name>-<bsl_provider>-registry-secret -o json | jq -r '.data'
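The registry secret name follows a fixed pattern, so it can be derived from the BSL name and provider instead of being typed by hand. The following is a minimal sketch and not part of OADP: the function name registry_secret_name is hypothetical, and the sketch only builds the name for use with the oc command from the workaround.

```shell
# Hypothetical helper (not part of OADP): build the name of the registry
# secret that the DPA reconciliation is expected to create for a labeled BSL,
# following the oadp-<bsl_name>-<bsl_provider>-registry-secret pattern.
registry_secret_name() {
  bsl_name=$1
  bsl_provider=$2
  printf 'oadp-%s-%s-registry-secret\n' "$bsl_name" "$bsl_provider"
}

# Example use against a cluster (shown as a comment because it needs oc access):
#   oc -n openshift-adp get "secret/$(registry_secret_name my-bsl aws)" -o json | jq -r '.data'
```

The values my-bsl and aws in the commented example are placeholders for your own BSL name and provider.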
4.4.6.2. OpenShift ADP Controller segmentation fault
If you configure a DPA with both cloudstorage and restic, the openshift-adp-controller-manager pod crashes and restarts indefinitely until the pod fails with a crash loop segmentation fault.
You can have either velero or cloudstorage defined, because they are mutually exclusive fields.
- If you have both velero and cloudstorage defined, the openshift-adp-controller-manager fails.
- If you have neither velero nor cloudstorage defined, the openshift-adp-controller-manager fails.
For more information about this issue, see OADP-1054.
4.4.6.2.1. OpenShift ADP Controller segmentation fault workaround
You must define either velero or cloudstorage when you configure a DPA. If you define both APIs in your DPA, the openshift-adp-controller-manager pod fails with a crash loop segmentation fault.
4.4.7. OADP and FIPS
Federal Information Processing Standards (FIPS) are a set of computer security standards developed by the United States federal government in line with the Federal Information Security Management Act (FISMA).
OpenShift API for Data Protection (OADP) has been tested and works on FIPS-enabled OpenShift Container Platform clusters.
4.5. OADP use cases
4.5.1. Backup using OpenShift API for Data Protection and Red Hat OpenShift Data Foundation (ODF)
Following is a use case for using OADP and ODF to back up an application.
4.5.1.1. Backing up an application using OADP and ODF
In this use case, you back up an application by using OADP and store the backup in an object storage provided by Red Hat OpenShift Data Foundation (ODF).
- You create an object bucket claim (OBC) to configure the backup storage location. You use ODF to configure an Amazon S3-compatible object storage bucket. ODF provides MultiCloud Object Gateway (NooBaa MCG) and Ceph Object Gateway, also known as RADOS Gateway (RGW), object storage service. In this use case, you use NooBaa MCG as the backup storage location.
- You use the NooBaa MCG service with OADP by using the aws provider plugin.
- You configure the Data Protection Application (DPA) with the backup storage location (BSL).
- You create a backup custom resource (CR) and specify the application namespace to back up.
- You create and verify the backup.
Prerequisites
- You installed the OADP Operator.
- You installed the ODF Operator.
- You have an application with a database running in a separate namespace.
Procedure
Create an OBC manifest file to request a NooBaa MCG bucket as shown in the following example:
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: test-obc
  namespace: openshift-adp
spec:
  storageClassName: openshift-storage.noobaa.io
  generateBucketName: test-backup-bucket
where:
- test-obc: Specifies the name of the object bucket claim.
- test-backup-bucket: Specifies the name of the bucket.
Create the OBC by running the following command:
$ oc create -f <obc_file_name>
where:
- <obc_file_name>: Specifies the file name of the object bucket claim manifest.
When you create an OBC, ODF creates a secret and a config map with the same name as the object bucket claim. The secret has the bucket credentials, and the config map has information to access the bucket. To get the bucket name and bucket host from the generated config map, run the following command:
$ oc extract --to=- cm/test-obc
test-obc is the name of the OBC.
Example output
# BUCKET_NAME
backup-c20...41fd
# BUCKET_PORT
443
# BUCKET_REGION
# BUCKET_SUBREGION
# BUCKET_HOST
s3.openshift-storage.svc
To get the bucket credentials from the generated secret, run the following command:
$ oc extract --to=- secret/test-obc
Example output
# AWS_ACCESS_KEY_ID
ebYR....xLNMc
# AWS_SECRET_ACCESS_KEY
YXf...+NaCkdyC3QPym
Get the public URL for the S3 endpoint from the s3 route in the openshift-storage namespace by running the following command:
$ oc get route s3 -n openshift-storage
Create a cloud-credentials file with the object bucket credentials as shown in the following example:
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
Create the cloud-credentials secret with the cloud-credentials file content as shown in the following command:
$ oc create secret generic \
  cloud-credentials \
  -n openshift-adp \
  --from-file cloud=cloud-credentials
Configure the Data Protection Application (DPA) as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: oadp-backup
  namespace: openshift-adp
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
        - aws
        - openshift
        - csi
      defaultSnapshotMoveData: true
  backupLocations:
    - velero:
        config:
          profile: "default"
          region: noobaa
          s3Url: https://s3.openshift-storage.svc
          s3ForcePathStyle: "true"
          insecureSkipTLSVerify: "true"
        provider: aws
        default: true
        credential:
          key: cloud
          name: cloud-credentials
        objectStorage:
          bucket: <bucket_name>
          prefix: oadp
where:
- defaultSnapshotMoveData: Set to true to use the OADP Data Mover to enable movement of Container Storage Interface (CSI) snapshots to a remote object storage.
- s3Url: Specifies the S3 URL of ODF storage.
- <bucket_name>: Specifies the bucket name.
Create the DPA by running the following command:
$ oc apply -f <dpa_filename>
Verify that the DPA is created successfully by running the following command. In the example output, you can see that the status object has the type field set to Reconciled. This means that the DPA is successfully created.
$ oc get dpa -o yaml
Example output
apiVersion: v1
items:
- apiVersion: oadp.openshift.io/v1alpha1
  kind: DataProtectionApplication
  metadata:
    namespace: openshift-adp
    #...#
  spec:
    backupLocations:
    - velero:
        config:
          #...#
  status:
    conditions:
    - lastTransitionTime: "20....9:54:02Z"
      message: Reconcile complete
      reason: Complete
      status: "True"
      type: Reconciled
kind: List
metadata:
  resourceVersion: ""
Verify that the backup storage location (BSL) is available by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example output
NAME           PHASE       LAST VALIDATED   AGE   DEFAULT
dpa-sample-1   Available   3s               15s   true
Configure a backup CR as shown in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
  - <application_namespace>
where:
- <application_namespace>: Specifies the namespace for the application to back up.
Create the backup CR by running the following command:
$ oc apply -f <backup_cr_filename>
Verification
Verify that the backup object is in the Completed phase by running the following command. For more details, see the example output.
$ oc describe backup test-backup -n openshift-adp
Example output
Name:         test-backup
Namespace:    openshift-adp
# ....#
Status:
  Backup Item Operations Attempted:  1
  Backup Item Operations Completed:  1
  Completion Timestamp:              2024-09-25T10:17:01Z
  Expiration:                        2024-10-25T10:16:31Z
  Format Version:                    1.1.0
  Hook Status:
  Phase:  Completed
  Progress:
    Items Backed Up:  34
    Total Items:      34
  Start Timestamp:    2024-09-25T10:16:31Z
  Version:            1
Events:               <none>
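The cloud-credentials file used in this procedure has a fixed three-line layout, so it can be generated from the two values extracted from the OBC secret. The following is a minimal sketch under stated assumptions: the function name write_cloud_credentials is hypothetical, the key values are passed as arguments that you supply, and the restrictive file permissions are a precaution added here rather than something the procedure requires.

```shell
# Hypothetical helper (not part of OADP): write an AWS-style credentials file
# with a [default] profile, matching the cloud-credentials layout shown in
# the procedure.
#   $1: access key ID, $2: secret access key,
#   $3: output file (defaults to cloud-credentials)
write_cloud_credentials() {
  out=${3:-cloud-credentials}
  (
    # Assumption: limit the file to the current user because it holds secrets.
    umask 077
    printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$1" "$2" > "$out"
  )
}

# Example use (shown as a comment; the secret creation step needs oc access):
#   write_cloud_credentials "<AWS_ACCESS_KEY_ID>" "<AWS_SECRET_ACCESS_KEY>"
#   oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=cloud-credentials
```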
4.5.2. OpenShift API for Data Protection (OADP) restore use case
Following is a use case for using OADP to restore a backup to a different namespace.
4.5.2.1. Restoring an application to a different namespace using OADP
Restore a backup of an application by using OADP to a new target namespace, test-restore-application.
Prerequisites
- You installed the OADP Operator.
- You have the backup of an application to be restored.
Procedure
Create a restore CR as shown in the following example:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-restore
  namespace: openshift-adp
spec:
  backupName: <backup_name>
  restorePVs: true
  namespaceMapping:
    <application_namespace>: test-restore-application
where:
- test-restore: Specifies the name of the restore CR.
- <backup_name>: Specifies the name of the backup.
- <application_namespace>: Specifies the source application namespace that was backed up. namespaceMapping maps the source application namespace to the target application namespace. test-restore-application is the name of the target namespace where you want to restore the backup.
Apply the restore CR by running the following command:
$ oc apply -f <restore_cr_filename>
Verification
Verify that the restore is in the Completed phase by running the following command:
$ oc describe restores.velero.io <restore_name> -n openshift-adp
Change to the restored namespace test-restore-application by running the following command:
$ oc project test-restore-application
Verify the restored resources such as persistent volume claim (pvc), service (svc), deployment, secret, and config map by running the following command:
$ oc get pvc,svc,deployment,secret,configmap
Example output
NAME                          STATUS   VOLUME
persistentvolumeclaim/mysql   Bound    pvc-9b3583db-...-14b86

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/mysql      ClusterIP   172....157   <none>        3306/TCP   2m56s
service/todolist   ClusterIP   172.....15   <none>        8000/TCP   2m56s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   0/1     1            0           2m55s

NAME                                         TYPE                      DATA   AGE
secret/builder-dockercfg-6bfmd               kubernetes.io/dockercfg   1      2m57s
secret/default-dockercfg-hz9kz               kubernetes.io/dockercfg   1      2m57s
secret/deployer-dockercfg-86cvd              kubernetes.io/dockercfg   1      2m57s
secret/mysql-persistent-sa-dockercfg-rgp9b   kubernetes.io/dockercfg   1      2m57s

NAME                                 DATA   AGE
configmap/kube-root-ca.crt           1      2m57s
configmap/openshift-service-ca.crt   1      2m57s
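The restore CR in this use case differs between runs only in four values, so it can be rendered from arguments. The following is a minimal sketch, assuming the same openshift-adp namespace and restorePVs setting as the example above. The function name render_restore_cr is hypothetical and the sketch only prints the manifest.

```shell
# Hypothetical helper (not part of OADP): print a Restore CR that maps a
# source application namespace to a target namespace.
#   $1: restore name, $2: backup name,
#   $3: source (backed-up) namespace, $4: target namespace
render_restore_cr() {
  restore_name=$1
  backup_name=$2
  source_ns=$3
  target_ns=$4
  cat <<EOF
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: ${restore_name}
  namespace: openshift-adp
spec:
  backupName: ${backup_name}
  restorePVs: true
  namespaceMapping:
    ${source_ns}: ${target_ns}
EOF
}

# Example use (shown as a comment because it needs oc access):
#   render_restore_cr test-restore test-backup my-app test-restore-application | oc apply -f -
```

In the namespaceMapping entry, the key is the namespace that was backed up and the value is the namespace to restore into. The name my-app in the commented example is a placeholder.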
4.5.3. Including a self-signed CA certificate during backup
You can include a self-signed Certificate Authority (CA) certificate in the Data Protection Application (DPA) and then back up an application. You store the backup in a NooBaa bucket provided by Red Hat OpenShift Data Foundation (ODF).
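4.5.3.1. Backing up an application and its self-signed CA certificate
The s3.openshift-storage.svc service, provided by ODF, uses a Transport Layer Security (TLS) certificate that is signed with the self-signed service CA.
To prevent a certificate signed by unknown authority error, you must include a self-signed CA certificate in the backup storage location (BSL) section of the DataProtectionApplication custom resource (CR). For this situation, you must complete the following tasks:
- Request a NooBaa bucket by creating an object bucket claim (OBC).
- Extract the bucket details.
- Include a self-signed CA certificate in the DataProtectionApplication CR.
- Back up an application.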
Prerequisites
- You installed the OADP Operator.
- You installed the ODF Operator.
- You have an application with a database running in a separate namespace.
Procedure
Create an OBC manifest to request a NooBaa bucket as shown in the following example:
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: test-obc
  namespace: openshift-adp
spec:
  storageClassName: openshift-storage.noobaa.io
  generateBucketName: test-backup-bucket
where:
- test-obc: Specifies the name of the object bucket claim.
- test-backup-bucket: Specifies the name of the bucket.
Create the OBC by running the following command:
$ oc create -f <obc_file_name>
When you create an OBC, ODF creates a secret and a ConfigMap with the same name as the object bucket claim. The secret object contains the bucket credentials, and the ConfigMap object contains information to access the bucket. To get the bucket name and bucket host from the generated config map, run the following command:
$ oc extract --to=- cm/test-obc
test-obc is the name of the OBC.
Example output
# BUCKET_NAME
backup-c20...41fd
# BUCKET_PORT
443
# BUCKET_REGION
# BUCKET_SUBREGION
# BUCKET_HOST
s3.openshift-storage.svc
To get the bucket credentials from the secret object, run the following command:
$ oc extract --to=- secret/test-obc
Example output
# AWS_ACCESS_KEY_ID
ebYR....xLNMc
# AWS_SECRET_ACCESS_KEY
YXf...+NaCkdyC3QPym
Create a cloud-credentials file with the object bucket credentials by using the following example configuration:
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
Create the cloud-credentials secret with the cloud-credentials file content by running the following command:
$ oc create secret generic \
  cloud-credentials \
  -n openshift-adp \
  --from-file cloud=cloud-credentials
Extract the service CA certificate from the openshift-service-ca.crt config map by running the following command. Ensure that you encode the certificate in Base64 format and note the value to use in the next step.
$ oc get cm/openshift-service-ca.crt \
  -o jsonpath='{.data.service-ca\.crt}' | base64 -w0; echo
Example output
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0...
....gpwOHMwaG9CRmk5a3....FLS0tLS0K
Configure the DataProtectionApplication CR manifest file with the bucket name and CA certificate as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: oadp-backup
  namespace: openshift-adp
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
        - aws
        - openshift
        - csi
      defaultSnapshotMoveData: true
  backupLocations:
    - velero:
        config:
          profile: "default"
          region: noobaa
          s3Url: https://s3.openshift-storage.svc
          s3ForcePathStyle: "true"
          insecureSkipTLSVerify: "false"
        provider: aws
        default: true
        credential:
          key: cloud
          name: cloud-credentials
        objectStorage:
          bucket: <bucket_name>
          prefix: oadp
          caCert: <ca_cert>
where:
- insecureSkipTLSVerify: Specifies whether SSL/TLS security is enabled. If set to true, SSL/TLS security is disabled. If set to false, SSL/TLS security is enabled.
- <bucket_name>: Specifies the name of the bucket extracted in an earlier step.
- <ca_cert>: Specifies the Base64 encoded certificate from the previous step.
Create the DataProtectionApplication CR by running the following command:
$ oc apply -f <dpa_filename>
Verify that the DataProtectionApplication CR is created successfully by running the following command:
$ oc get dpa -o yaml
Example output
apiVersion: v1
items:
- apiVersion: oadp.openshift.io/v1alpha1
  kind: DataProtectionApplication
  metadata:
    namespace: openshift-adp
    #...#
  spec:
    backupLocations:
    - velero:
        config:
          #...#
  status:
    conditions:
    - lastTransitionTime: "20....9:54:02Z"
      message: Reconcile complete
      reason: Complete
      status: "True"
      type: Reconciled
kind: List
metadata:
  resourceVersion: ""
Verify that the backup storage location (BSL) is available by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example output
NAME           PHASE       LAST VALIDATED   AGE   DEFAULT
dpa-sample-1   Available   3s               15s   true
Configure the Backup CR by using the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
  - <application_namespace>
where:
- <application_namespace>: Specifies the namespace for the application to back up.
Create the Backup CR by running the following command:
$ oc apply -f <backup_cr_filename>
Verification
Verify that the Backup object is in the Completed phase by running the following command:
$ oc describe backup test-backup -n openshift-adp
Example output
Name:         test-backup
Namespace:    openshift-adp
# ....#
Status:
  Backup Item Operations Attempted:  1
  Backup Item Operations Completed:  1
  Completion Timestamp:              2024-09-25T10:17:01Z
  Expiration:                        2024-10-25T10:16:31Z
  Format Version:                    1.1.0
  Hook Status:
  Phase:  Completed
  Progress:
    Items Backed Up:  34
    Total Items:      34
  Start Timestamp:    2024-09-25T10:16:31Z
  Version:            1
Events:               <none>
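The caCert value must be a single-line Base64 string. The procedure uses base64 -w0, which is a GNU coreutils option. If you have already saved the CA certificate to a local PEM file, the following sketch encodes it on one line without relying on that option. The function name encode_ca_cert and the idea of working from a local file are assumptions for illustration, not steps from the procedure.

```shell
# Hypothetical helper (not part of OADP): Base64-encode a PEM file as a
# single line, suitable for pasting into the caCert field.
#   $1: path to the PEM-encoded CA certificate
encode_ca_cert() {
  # Strip the line wraps that some base64 implementations insert.
  base64 < "$1" | tr -d '\n'
  # End the output with one newline so it prints cleanly in a terminal.
  echo
}

# Example use (shown as a comment; service-ca.crt is a placeholder file name):
#   encode_ca_cert service-ca.crt
```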
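4.5.4. Using the legacy-aws Velero plugin
If you are using an AWS S3-compatible backup storage location, you might get a SignatureDoesNotMatch error while backing up your application. This error occurs because some backup storage locations still use the older versions of the S3 APIs, which are incompatible with the newer AWS SDK for Go V2. To resolve this issue, you can use the legacy-aws Velero plugin in the DataProtectionApplication custom resource (CR). The legacy-aws Velero plugin uses the older AWS SDK for Go V1, which is compatible with the legacy S3 APIs, ensuring successful backups.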
4.5.4.1. Using the legacy-aws Velero plugin in the DataProtectionApplication CR
In the following use case, you configure the DataProtectionApplication CR with the legacy-aws Velero plugin and then back up an application.
Depending on the backup storage location you choose, you can use either the legacy-aws or the aws plugin in your DataProtectionApplication CR. If you use both of the plugins in the DataProtectionApplication CR, the following error occurs:
aws and legacy-aws can not be both specified in DPA spec.configuration.velero.defaultPlugins
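Because the aws and legacy-aws plugins cannot be combined, it can help to check a planned defaultPlugins list before applying the CR. The following is a minimal sketch: the function name check_default_plugins is hypothetical, the plugin names are passed as plain arguments, and the message it prints is its own wording rather than the Operator's error.

```shell
# Hypothetical pre-flight check (not part of OADP): fail if a planned
# defaultPlugins list contains both the aws and the legacy-aws plugin.
#   $@: plugin names, for example: legacy-aws openshift csi
check_default_plugins() {
  has_aws=0
  has_legacy=0
  for plugin in "$@"; do
    case $plugin in
      aws) has_aws=1 ;;
      legacy-aws) has_legacy=1 ;;
    esac
  done
  if [ "$has_aws" -eq 1 ] && [ "$has_legacy" -eq 1 ]; then
    echo "defaultPlugins lists both aws and legacy-aws; keep only one" >&2
    return 1
  fi
  return 0
}

# Example use:
#   check_default_plugins legacy-aws openshift csi && echo "plugin list looks consistent"
```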
Prerequisites
- You have installed the OADP Operator.
- You have configured an AWS S3-compatible object storage as a backup location.
- You have an application with a database running in a separate namespace.
Procedure
Configure the DataProtectionApplication CR to use the legacy-aws Velero plugin as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: oadp-backup
  namespace: openshift-adp
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
        - legacy-aws
        - openshift
        - csi
      defaultSnapshotMoveData: true
  backupLocations:
    - velero:
        config:
          profile: "default"
          region: noobaa
          s3Url: https://s3.openshift-storage.svc
          s3ForcePathStyle: "true"
          insecureSkipTLSVerify: "true"
        provider: aws
        default: true
        credential:
          key: cloud
          name: cloud-credentials
        objectStorage:
          bucket: <bucket_name>
          prefix: oadp
where:
- legacy-aws: Specifies to use the legacy-aws plugin.
- <bucket_name>: Specifies the bucket name.
Create the DataProtectionApplication CR by running the following command:
$ oc apply -f <dpa_filename>
Verify that the DataProtectionApplication CR is created successfully by running the following command. In the example output, you can see the status object has the type field set to Reconciled and the status field set to "True". That status indicates that the DataProtectionApplication CR is successfully created.
$ oc get dpa -o yaml
Example output
apiVersion: v1
items:
- apiVersion: oadp.openshift.io/v1alpha1
  kind: DataProtectionApplication
  metadata:
    namespace: openshift-adp
    #...#
  spec:
    backupLocations:
    - velero:
        config:
          #...#
  status:
    conditions:
    - lastTransitionTime: "20....9:54:02Z"
      message: Reconcile complete
      reason: Complete
      status: "True"
      type: Reconciled
kind: List
metadata:
  resourceVersion: ""
Verify that the backup storage location (BSL) is available by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
You should see an output similar to the following example:
NAME           PHASE       LAST VALIDATED   AGE   DEFAULT
dpa-sample-1   Available   3s               15s   true
Configure a Backup CR as shown in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
  - <application_namespace>
where:
- <application_namespace>: Specifies the namespace for the application to back up.
Create the Backup CR by running the following command:
$ oc apply -f <backup_cr_filename>
Verification
Verify that the backup object is in the Completed phase by running the following command. For more details, see the example output.
$ oc describe backups.velero.io test-backup -n openshift-adp
Example output
Name:         test-backup
Namespace:    openshift-adp
# ....#
Status:
  Backup Item Operations Attempted:  1
  Backup Item Operations Completed:  1
  Completion Timestamp:              2024-09-25T10:17:01Z
  Expiration:                        2024-10-25T10:16:31Z
  Format Version:                    1.1.0
  Hook Status:
  Phase:  Completed
  Progress:
    Items Backed Up:  34
    Total Items:      34
  Start Timestamp:    2024-09-25T10:16:31Z
  Version:            1
Events:               <none>
4.6. Installing OADP
4.6.1. About installing OADP
As a cluster administrator, you install the OpenShift API for Data Protection (OADP) by installing the OADP Operator. The OADP Operator installs Velero 1.14.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
To back up Kubernetes resources and internal images, you must have object storage as a backup location, such as one of the following storage types:
- Amazon Web Services
- Microsoft Azure
- Google Cloud
- Multicloud Object Gateway
- IBM Cloud® Object Storage S3
- AWS S3 compatible object storage, such as Multicloud Object Gateway or MinIO
You can configure multiple backup storage locations within the same namespace for each individual OADP deployment.
Unless specified otherwise, "NooBaa" refers to the open source project that provides lightweight object storage, while "Multicloud Object Gateway (MCG)" refers to the Red Hat distribution of NooBaa.
For more information on the MCG, see Accessing the Multicloud Object Gateway with your applications.
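The CloudStorage API, which automates the creation of a bucket for object storage, is a Technology Preview feature only.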
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
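The CloudStorage API is a Technology Preview feature when you use a CloudStorage object and want OADP to use the CloudStorage API to automatically create an S3 bucket for use as a BackupStorageLocation.
The CloudStorage API supports manually creating a BackupStorageLocation object by specifying an existing S3 bucket. The CloudStorage API that creates an S3 bucket automatically is currently only enabled for AWS S3 storage.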
You can back up persistent volumes (PVs) by using snapshots or a File System Backup (FSB).
To back up PVs with snapshots, you must have a cloud provider that supports either a native snapshot API or Container Storage Interface (CSI) snapshots, such as one of the following cloud providers:
- Amazon Web Services
- Microsoft Azure
- Google Cloud
- CSI snapshot-enabled cloud provider, such as OpenShift Data Foundation
If you want to use CSI backup on OCP 4.11 and later, install OADP 1.1.x.
OADP 1.0.x does not support CSI backup on OCP 4.11 and later. OADP 1.0.x includes Velero 1.7.x and expects the API group snapshot.storage.k8s.io/v1beta1, which is not present on OCP 4.11 and later.
If your cloud provider does not support snapshots or if your storage is NFS, you can back up applications with File System Backup (Kopia or Restic) on object storage. For details, see Backing up applications with File System Backup: Kopia or Restic.
You create a default Secret and then you install the Data Protection Application.
4.6.1.1. AWS S3 compatible backup storage providers
OADP works with many S3-compatible object storage providers. Several object storage providers are certified and tested with every release of OADP. Various S3 providers are known to work with OADP but are not specifically tested and certified. These providers will be supported on a best-effort basis. Additionally, there are a few S3 object storage providers with known issues and limitations that are listed in this documentation.
Red Hat will provide support for OADP on any S3-compatible storage, but support will stop if the S3 endpoint is determined to be the root cause of an issue.
4.6.1.1.1. Certified backup storage providers
The following AWS S3 compatible object storage providers are fully supported by OADP through the AWS plugin for use as backup storage locations:
- MinIO
- Multicloud Object Gateway (MCG)
- Amazon Web Services (AWS) S3
- IBM Cloud® Object Storage S3
- Ceph RADOS Gateway (Ceph Object Gateway)
- Red Hat Container Storage
- Red Hat OpenShift Data Foundation
- NetApp ONTAP S3 Object Storage
Google Cloud and Microsoft Azure have their own Velero object store plugins.
4.6.1.1.2. Unsupported backup storage providers
The following AWS S3 compatible object storage providers are known to work with Velero through the AWS plugin for use as backup storage locations. However, they are unsupported and have not been tested by Red Hat:
- Oracle Cloud
- DigitalOcean
- NooBaa, unless installed using Multicloud Object Gateway (MCG)
- Tencent Cloud
- Quobyte
- Cloudian HyperStore
4.6.1.1.3. Backup storage providers with known limitations
The following AWS S3 compatible object storage providers are known to work with Velero through the AWS plugin with a limited feature set:
- Swift: Works as a backup storage location, but is not compatible with Restic for file system-based volume backup and restore.
4.6.1.2. Configuring Multicloud Object Gateway (MCG) for disaster recovery on OpenShift Data Foundation
If you use cluster storage for your MCG bucket backupStorageLocation on OpenShift Data Foundation, configure MCG as an external object store.
Failure to configure MCG as an external object store might lead to backups not being available.
Procedure
- Configure MCG as an external object store as described in Adding storage resources for hybrid or Multicloud.
4.6.1.3. About OADP update channels
When you install an OADP Operator, you choose an update channel. This channel determines which upgrades to the OADP Operator and to Velero you receive. You can switch channels at any time.
The following update channels are available:
- The stable channel is now deprecated. The stable channel contains the patches (z-stream updates) of OADP ClusterServiceVersion for OADP.v1.1.z and older versions from OADP.v1.0.z.
- The stable-1.0 channel is deprecated and is not supported.
- The stable-1.1 channel is deprecated and is not supported.
- The stable-1.2 channel is deprecated and is not supported.
- The stable-1.3 channel contains OADP.v1.3.z, the most recent OADP 1.3 ClusterServiceVersion.
- The stable-1.4 channel contains OADP.v1.4.z, the most recent OADP 1.4 ClusterServiceVersion.
For more information, see OpenShift Operator Life Cycles.
Which update channel is right for you?
- The stable channel is now deprecated. If you are already using the stable channel, you will continue to get updates from OADP.v1.1.z.
- Choose the stable-1.y update channel to install OADP 1.y and to continue receiving patches for it. If you choose this channel, you will receive all z-stream patches for version 1.y.z.
When must you switch update channels?
- If you have OADP 1.y installed, and you want to receive patches only for that y-stream, you must switch from the stable update channel to the stable-1.y update channel. You will then receive all z-stream patches for version 1.y.z.
- If you have OADP 1.0 installed, want to upgrade to OADP 1.1, and then receive patches only for OADP 1.1, you must switch from the stable-1.0 update channel to the stable-1.1 update channel. You will then receive all z-stream patches for version 1.1.z.
- If you have OADP 1.y installed, with y greater than 0, and want to switch to OADP 1.0, you must uninstall your OADP Operator and then reinstall it using the stable-1.0 update channel. You will then receive all z-stream patches for version 1.0.z.
You cannot switch from OADP 1.y to OADP 1.0 by switching update channels. You must uninstall the Operator and then reinstall it.
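The stable-1.y naming rule described above can be expressed as a small helper that maps an OADP version to its update channel. The following is a minimal sketch: the function name channel_for_version is hypothetical, and it only applies the naming rule. It does not check whether the resulting channel is deprecated or still supported.

```shell
# Hypothetical helper (not part of OADP): map an OADP x.y.z version to the
# stable-x.y update channel name described in this section.
#   $1: version string such as 1.4.8
channel_for_version() {
  case $1 in
    # Loose format check: three dot-separated parts that start with a digit.
    [0-9]*.[0-9]*.[0-9]*) ;;
    *)
      echo "expected a version in x.y.z form, got: $1" >&2
      return 1
      ;;
  esac
  # Drop the trailing .z component to keep the x.y stream.
  printf 'stable-%s\n' "${1%.*}"
}

# Example use:
#   channel_for_version 1.4.8
```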
4.6.1.4. Installation of OADP on multiple namespaces
You can install OpenShift API for Data Protection into multiple namespaces on the same cluster so that multiple project owners can manage their own OADP instance. This use case has been validated with File System Backup (FSB) and Container Storage Interface (CSI).
You install each instance of OADP as specified by the per-platform procedures contained in this document with the following additional requirements:
- All deployments of OADP on the same cluster must be the same version, for example, 1.4.0. Installing different versions of OADP on the same cluster is not supported.
- Each individual deployment of OADP must have a unique set of credentials and at least one BackupStorageLocation configuration. You can also use multiple BackupStorageLocation configurations within the same namespace.
- By default, each OADP deployment has cluster-level access across namespaces. OpenShift Container Platform administrators need to carefully review potential impacts, such as not backing up and restoring to and from the same namespace concurrently.
4.6.1.5. OADP support for backup data immutability
Starting with OADP 1.4, you can store OADP backups in an AWS S3 bucket with enabled versioning. The versioning support is only for AWS S3 buckets and not for S3-compatible buckets.
See the following list for specific cloud provider limitations:
- AWS S3 service supports backups because an S3 object lock applies only to versioned buckets. You can still update the object data for the new version. However, when backups are deleted, old versions of the objects are not deleted.
- OADP backups are not supported and might not work as expected when you enable immutability on Azure Storage Blob.
- Google Cloud storage policy only supports bucket-level immutability. Therefore, it is not feasible to implement it in the Google Cloud environment.
Depending on your storage provider, the immutability options are called differently:
- S3 object lock
- Object retention
- Bucket versioning
- Write Once Read Many (WORM) buckets
The primary reason for the absence of support for other S3-compatible object storage is that OADP initially saves the state of a backup as finalizing and then verifies whether any asynchronous operations are in progress.
4.6.1.6. Velero CPU and memory requirements based on collected data
The following recommendations are based on observations of performance made in the scale and performance lab. The backup and restore resources can be impacted by the type of plugin, the amount of resources required by that backup or restore, and the respective data contained in the persistent volumes (PVs) related to those resources.
4.6.1.6.1. CPU and memory requirement for configurations
| Configuration types | [1] Average usage | [2] Large usage | resourceTimeouts |
|---|---|---|---|
| CSI | Velero: CPU request 200m, limit 1000m; memory request 256Mi, limit 1024Mi | Velero: CPU request 200m, limit 2000m; memory request 256Mi, limit 2048Mi | N/A |
| Restic | [3] Restic: CPU request 1000m, limit 2000m; memory request 16Gi, limit 32Gi | [4] Restic: CPU request 2000m, limit 8000m; memory request 16Gi, limit 40Gi | 900m |
| [5] Data Mover | N/A | N/A | 10m for average usage; 60m for large usage |
- [1] Average usage: use these settings for most usage situations.
- [2] Large usage: use these settings for large usage situations, such as a large PV (500 GB usage), multiple namespaces (100+), or many pods within a single namespace (2000+ pods), and for optimal performance for backup and restore involving large datasets.
- [3] Restic resource usage corresponds to the amount and type of data. For example, many small files or large amounts of data can cause Restic to use large amounts of resources. The Velero documentation references 500m as a supplied default, but in most of our testing a 200m request with a 1000m limit was suitable. As cited in the Velero documentation, exact CPU and memory usage depends on the scale of files and directories, in addition to environmental limitations.
- [4] Increasing the CPU has a significant impact on improving backup and restore times.
- [5] Data Mover: the default `resourceTimeout` is 10m. Our tests show that restoring a large PV (500 GB usage) requires increasing the `resourceTimeout` to 60m.
The resource requirements listed throughout the guide are for average usage only. For large usage, adjust the settings as described in the table above.
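As a rough sketch of how the large-usage values from the table might be applied, the following shell snippet writes a `DataProtectionApplication` fragment with the large-usage Velero allocations. The file name `dpa-large-usage.yaml` is a hypothetical choice, and the field paths (`resourceTimeout` and `podConfig.resourceAllocations` under `spec.configuration.velero`) are the ones that appear in the DPA examples in this guide. The 60m timeout is the value that the Data Mover note associates with large PVs. Treat the numbers as a starting point to validate in your own environment.

```shell
# Hypothetical output file; merge the fragment into your own DPA manifest.
DPA_FRAGMENT="${TMPDIR:-/tmp}/dpa-large-usage.yaml"

# Large-usage Velero values from the table: CPU limit 2000m, memory limit 2048Mi.
cat > "$DPA_FRAGMENT" <<'EOF'
spec:
  configuration:
    velero:
      # Data Mover large-usage timeout from the table; the default is 10m.
      resourceTimeout: 60m
      podConfig:
        resourceAllocations:
          limits:
            cpu: "2"
            memory: 2048Mi
          requests:
            cpu: 200m
            memory: 256Mi
EOF

# Print the fragment for review before merging it into the DPA.
cat "$DPA_FRAGMENT"
```

Keeping the requests at the average-usage values while raising only the limits follows the table, which changes the limits but not the requests for large usage.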
4.6.1.6.2. NodeAgent CPU for large usage
Testing shows that increasing `NodeAgent` CPU can significantly improve backup and restore times.
You can tune your OpenShift Container Platform environment based on your performance analysis and preference. Use CPU limits in the workloads when you use Kopia for file system backups.
If you do not use CPU limits on the pods, the pods can use excess CPU when it is available. If you specify CPU limits, the pods might be throttled if they exceed their limits. Therefore, the use of CPU limits on the pods is considered an anti-pattern.
Ensure that you are accurately specifying CPU requests so that pods can take advantage of excess CPU. Resource allocation is guaranteed based on CPU requests rather than CPU limits.
Testing showed that running Kopia with 20 cores and 32 Gi memory supported backup and restore operations of over 100 GB of data, multiple namespaces, or over 2000 pods in a single namespace. Testing detected no CPU limiting or memory saturation with these resource specifications.
In some environments, you might need to adjust Ceph MDS pod resources to avoid pod restarts, which occur when default settings cause resource saturation.
For more information about how to set the pod resources limit in Ceph MDS pods, see Changing the CPU and memory resources on the rook-ceph pods.
4.6.2. Installing the OADP Operator
Install the OpenShift API for Data Protection (OADP) Operator on OpenShift Container Platform 4.14 by using Operator Lifecycle Manager (OLM).
The OADP Operator installs Velero 1.14.
4.6.2.1. Installing the OADP Operator
Install the OADP Operator by using the OpenShift Container Platform web console.
Prerequisites
- You must be logged in as a user with `cluster-admin` privileges.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Use the Filter by keyword field to find the OADP Operator.
- Select the OADP Operator and click Install.
- Click Install to install the Operator in the `openshift-adp` project.
- Click Operators → Installed Operators to verify the installation.
4.6.2.2. OADP-Velero-OpenShift Container Platform version relationship
Review the version relationship between OADP, Velero, and OpenShift Container Platform to decide compatible version combinations. This helps you select the appropriate OADP version for your cluster environment.
4.7. Configuring OADP with AWS S3 compatible storage
4.7.1. Configuring the OpenShift API for Data Protection with AWS S3 compatible storage
You install the OpenShift API for Data Protection (OADP) with Amazon Web Services (AWS) S3 compatible storage by installing the OADP Operator. The Operator installs Velero 1.14.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
You configure AWS for Velero, create a default `Secret`, and then install the Data Protection Application.
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. See Using Operator Lifecycle Manager on restricted networks for details.
4.7.1.1. About Amazon Simple Storage Service, Identity and Access Management, and GovCloud
Review Amazon Simple Storage Service (S3), Identity and Access Management (IAM), and AWS GovCloud requirements to configure backup storage with appropriate security controls. This helps you meet federal data security requirements and use correct endpoints.
AWS S3 is a storage solution of Amazon for the internet. As an authorized user, you can use this service to store and retrieve any amount of data whenever you want, from anywhere on the web.
You securely control access to Amazon S3 and other Amazon services by using the AWS Identity and Access Management (IAM) web service.
You can use IAM to manage permissions that control which AWS resources users can access. You use IAM to both authenticate, or verify that a user is who they claim to be, and to authorize, or grant permissions to use resources.
AWS GovCloud (US) is an Amazon storage solution developed to meet the stringent and specific data security requirements of the United States Federal Government. AWS GovCloud (US) works the same as Amazon S3 except for the following:
- You cannot copy the contents of an Amazon S3 bucket in the AWS GovCloud (US) regions directly to or from another AWS region.
- If you use Amazon S3 policies, use the AWS GovCloud (US) Amazon Resource Name (ARN) identifier to unambiguously specify a resource across all of AWS, such as in IAM policies, Amazon S3 bucket names, and API calls.
  - In AWS GovCloud (US) regions, ARNs have an identifier that is different from the one in other standard AWS regions, `arn:aws-us-gov`. If you need to specify the US-West or US-East region, use one of the following region codes in the ARN:
    - For US-West, use `us-gov-west-1`.
    - For US-East, use `us-gov-east-1`.
  - For all other standard regions, ARNs begin with `arn:aws`.
- In AWS GovCloud (US) regions, use the endpoints listed in the AWS GovCloud (US-East) and AWS GovCloud (US-West) rows of the "Amazon S3 endpoints" table on Amazon Simple Storage Service endpoints and quotas. If you are processing export-controlled data, use one of the SSL/TLS endpoints. If you have FIPS requirements, use a FIPS 140-2 endpoint such as https://s3-fips.us-gov-west-1.amazonaws.com or https://s3-fips.us-gov-east-1.amazonaws.com.
- To find the other AWS-imposed restrictions, see How Amazon Simple Storage Service Differs for AWS GovCloud (US).
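The ARN difference described above matters wherever a bucket ARN is written out by hand, for example in an IAM policy. The following shell sketch picks the ARN prefix from the region code. The function names `arn_prefix` and `bucket_arn` and the bucket name `my-backups` are illustrative assumptions, and the sketch only distinguishes GovCloud (US) regions from standard regions; it does not attempt to cover any other AWS partition.

```shell
# Print the ARN prefix for a region: GovCloud (US) regions use arn:aws-us-gov,
# and all other standard regions use arn:aws.
arn_prefix() {
  case "$1" in
    us-gov-*) printf 'arn:aws-us-gov\n' ;;
    *)        printf 'arn:aws\n' ;;
  esac
}

# Build an S3 bucket ARN. S3 bucket ARNs leave the region and account fields empty.
bucket_arn() {
  printf '%s:s3:::%s\n' "$(arn_prefix "$1")" "$2"
}

# "my-backups" is a placeholder bucket name.
bucket_arn us-gov-west-1 my-backups
bucket_arn us-west-2 my-backups
```

A helper like this could be used to fill in the `Resource` entries of a policy file so that the same script works in both kinds of region.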
4.7.1.2. Configuring Amazon Web Services
Configure Amazon Web Services (AWS) S3 storage and Identity and Access Management (IAM) credentials for backup storage with OADP. This provides the necessary permissions and storage infrastructure for data protection operations.
Prerequisites
- You must have the AWS CLI installed.
Procedure
- Set the `BUCKET` variable:
    $ BUCKET=<your_bucket>
- Set the `REGION` variable:
    $ REGION=<your_region>
- Create an AWS S3 bucket:
    $ aws s3api create-bucket \
        --bucket $BUCKET \
        --region $REGION \
        --create-bucket-configuration LocationConstraint=$REGION
  where:
  - `LocationConstraint`: Specifies the bucket configuration location constraint. `us-east-1` does not support `LocationConstraint`. If your region is `us-east-1`, omit `--create-bucket-configuration LocationConstraint=$REGION`.
- Create an IAM user:
    $ aws iam create-user --user-name velero
  where:
  - `velero`: Specifies the user name. If you want to use Velero to back up multiple clusters with multiple S3 buckets, create a unique user name for each cluster.
- Create a `velero-policy.json` file:
    $ cat > velero-policy.json <<EOF
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeVolumes",
                    "ec2:DescribeSnapshots",
                    "ec2:CreateTags",
                    "ec2:CreateVolume",
                    "ec2:CreateSnapshot",
                    "ec2:DeleteSnapshot"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts"
                ],
                "Resource": [
                    "arn:aws:s3:::${BUCKET}/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads"
                ],
                "Resource": [
                    "arn:aws:s3:::${BUCKET}"
                ]
            }
        ]
    }
    EOF
- Attach the policies to give the `velero` user the minimum necessary permissions:
    $ aws iam put-user-policy \
        --user-name velero \
        --policy-name velero \
        --policy-document file://velero-policy.json
- Create an access key for the `velero` user:
    $ aws iam create-access-key --user-name velero
  Example output:
    {
      "AccessKey": {
        "UserName": "velero",
        "Status": "Active",
        "CreateDate": "2017-07-31T22:24:41.576Z",
        "SecretAccessKey": <AWS_SECRET_ACCESS_KEY>,
        "AccessKeyId": <AWS_ACCESS_KEY_ID>
      }
    }
- Create a `credentials-velero` file:
    $ cat << EOF > ./credentials-velero
    [default]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
    EOF
  You use the `credentials-velero` file to create a `Secret` object for AWS before you install the Data Protection Application.
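The `us-east-1` exception in the bucket creation step is easy to miss in a script that runs against several regions. The following shell sketch builds the `aws s3api create-bucket` command line and leaves out the `LocationConstraint` flag for `us-east-1`. The function name `create_bucket_cmd` and the bucket name `my-velero-backups` are hypothetical, and the function only prints the command as a dry run rather than calling AWS.

```shell
# Print the aws s3api create-bucket command for a bucket and region.
# us-east-1 does not support LocationConstraint, so the flag is omitted there.
create_bucket_cmd() {
  bucket="$1"
  region="$2"
  cmd="aws s3api create-bucket --bucket $bucket --region $region"
  if [ "$region" != "us-east-1" ]; then
    cmd="$cmd --create-bucket-configuration LocationConstraint=$region"
  fi
  printf '%s\n' "$cmd"
}

# Dry run: print the commands instead of executing them.
create_bucket_cmd my-velero-backups us-east-1
create_bucket_cmd my-velero-backups eu-west-1
```

Printing the command first makes it possible to review the flags before running anything that creates cloud resources.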
4.7.1.3. About backup and snapshot locations and their secrets
Review backup location, snapshot location, and secret configuration requirements for the `DataProtectionApplication` custom resource (CR).
4.7.1.3.1. Backup locations
You can specify one of the following AWS S3-compatible object storage solutions as a backup location:
- Multicloud Object Gateway (MCG)
- Red Hat Container Storage
- Ceph RADOS Gateway, also known as Ceph Object Gateway
- Red Hat OpenShift Data Foundation
- MinIO
Velero backs up OpenShift Container Platform resources, Kubernetes objects, and internal images as an archive file on object storage.
4.7.1.3.2. Snapshot locations
If you use your cloud provider’s native snapshot API to back up persistent volumes, you must specify the cloud provider as the snapshot location.
If you use Container Storage Interface (CSI) snapshots, you do not need to specify a snapshot location because you create a `VolumeSnapshotClass` custom resource (CR) to register the CSI driver.
If you use File System Backup (FSB), you do not need to specify a snapshot location because FSB backs up the file system on object storage.
4.7.1.3.3. Secrets
If the backup and snapshot locations use the same credentials or if you do not require a snapshot location, you create a default `Secret`.
If the backup and snapshot locations use different credentials, you create two secret objects:
- Custom `Secret` for the backup location, which you specify in the `DataProtectionApplication` CR.
- Default `Secret` for the snapshot location, which is not referenced in the `DataProtectionApplication` CR.
The Data Protection Application requires a default `Secret`. Otherwise, the installation fails.
If you do not want to specify backup or snapshot locations during the installation, you can create a default `Secret` with an empty `credentials-velero` file.
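4.7.1.3.4. Creating a default Secret
You create a default `Secret` if your backup and snapshot locations use the same credentials or if you do not require a snapshot location.
The default name of the `Secret` is `cloud-credentials`.
The `DataProtectionApplication` custom resource (CR) requires a default `Secret`. Otherwise, the installation fails. If you do not specify the name of the backup location `Secret`, the default name is used.
If you do not want to use the backup location credentials during the installation, you can create a `Secret` with the default name by using an empty `credentials-velero` file.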
Prerequisites
- Your object storage and cloud storage, if any, must use the same credentials.
- You must configure object storage for Velero.
Procedure
- Create a `credentials-velero` file for the backup storage location in the appropriate format for your cloud provider. See the following example:
    [default]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
- Create a `Secret` custom resource (CR) with the default name:
    $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero
The `Secret` is referenced in the `spec.backupLocations.credential` block of the `DataProtectionApplication` CR when you install the Data Protection Application.
4.7.1.3.5. Creating profiles for different credentials
If your backup and snapshot locations use different credentials, you create separate profiles in the `credentials-velero` file.
Then, you create a `Secret` object and specify the profiles in the `DataProtectionApplication` custom resource (CR).
Procedure
- Create a `credentials-velero` file with separate profiles for the backup and snapshot locations, as in the following example:
    [backupStorage]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

    [volumeSnapshot]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
- Create a `Secret` object with the `credentials-velero` file:
    $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero
- Add the profiles to the `DataProtectionApplication` CR, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
    # ...
      backupLocations:
        - name: default
          velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
            config:
              region: us-east-1
              profile: "backupStorage"
            credential:
              key: cloud
              name: cloud-credentials
      snapshotLocations:
        - velero:
            provider: aws
            config:
              region: us-west-2
              profile: "volumeSnapshot"
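Because the `profile` values in the DPA must match section headers in the `credentials-velero` file, a quick local check before creating the `Secret` can catch a misspelled profile. The following shell sketch is one way to do that. The function name `has_profile` is hypothetical, and the demo writes a throwaway file with placeholder values rather than reading real credentials.

```shell
# Return success if the credentials file contains a [profile] section header.
has_profile() {
  grep -q "^\[$2\]" "$1"
}

# Demo with a throwaway file; the placeholder values are not real credentials.
CREDS_FILE="$(mktemp)"
cat > "$CREDS_FILE" <<'EOF'
[backupStorage]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

[volumeSnapshot]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
EOF

# Check each profile name that the DPA example references.
for profile in backupStorage volumeSnapshot; do
  if has_profile "$CREDS_FILE" "$profile"; then
    echo "profile $profile: found"
  else
    echo "profile $profile: missing"
  fi
done
```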
4.7.1.3.6. Creating an OADP SSE-C encryption key for additional data security
Configure server-side encryption with customer-provided keys (SSE-C) to add an additional layer of encryption for backup data stored in Amazon Web Services (AWS) S3. This protects backup data if AWS credentials become exposed.
Amazon Web Services (AWS) S3 applies server-side encryption with AWS S3 managed keys (SSE-S3) as the base level of encryption for every bucket in Amazon S3.
OpenShift API for Data Protection (OADP) encrypts data by using SSL/TLS, HTTPS, and the `velero-repo-credentials` secret.
The velero-plugin-for-aws plugin provides several additional encryption methods. You should review its configuration options and consider implementing additional encryption.
You can store your own encryption keys by using server-side encryption with customer-provided keys (SSE-C). This feature provides additional security if your AWS credentials become exposed.
Be sure to store cryptographic keys in a secure and safe manner. Encrypted data and backups cannot be recovered if you do not have the encryption key.
Prerequisites
- To make OADP mount a secret that contains your SSE-C key to the Velero pod at `/credentials`, use the following default secret name for AWS: `cloud-credentials`, and leave at least one of the following fields empty:
  - `dpa.spec.backupLocations[].velero.credential`
  - `dpa.spec.snapshotLocations[].velero.credential`
  This is a workaround for a known issue: https://issues.redhat.com/browse/OADP-3971.
  Note: The following procedure contains an example of a `spec:backupLocations` block that does not specify credentials.
- If you need the backup location to have credentials with a different name than `cloud-credentials`, you must add a snapshot location, such as the one in the following example, that does not contain a credential name. Because the following example does not contain a credential name, the snapshot location will use `cloud-credentials` as its secret for taking snapshots.
    snapshotLocations:
      - velero:
          config:
            profile: default
            region: <region>
          provider: aws
    # ...
Procedure
- Create an SSE-C encryption key. Generate a random number and save it as a file named `sse.key` by running the following command:
    $ dd if=/dev/urandom bs=1 count=32 > sse.key
- Create an OpenShift Container Platform secret:
  - If you are initially installing and configuring OADP, create the AWS credential and encryption key secret at the same time by running the following command:
    $ oc create secret generic cloud-credentials --namespace openshift-adp --from-file cloud=<path>/openshift_aws_credentials,customer-key=<path>/sse.key
  - If you are updating an existing installation, edit the values of the `cloud-credential` `secret` block of the `DataProtectionApplication` CR manifest, as in the following example:
    apiVersion: v1
    data:
      cloud: W2Rfa2V5X2lkPSJBS0lBVkJRWUIyRkQ0TlFHRFFPQiIKYXdzX3NlY3JldF9hY2Nlc3Nfa2V5P<snip>rUE1mNWVSbTN5K2FpeWhUTUQyQk1WZHBOIgo=
      customer-key: v+<snip>TFIiq6aaXPbj8dhos=
    kind: Secret
    # ...
- Edit the value of the `customerKeyEncryptionFile` attribute in the `backupLocations` block of the `DataProtectionApplication` CR manifest, as in the following example:
    spec:
      backupLocations:
        - velero:
            config:
              customerKeyEncryptionFile: /credentials/customer-key
              profile: default
    # ...
  Warning: You must restart the Velero pod to remount the secret credentials properly on an existing installation.
The installation is complete, and you can back up and restore OpenShift Container Platform resources. The data saved in AWS S3 storage is encrypted with the new key, and you cannot download it from the AWS S3 console or API without the additional encryption key.
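Because encrypted backups cannot be recovered without the key, it can be worth checking the key file before you store it in a secret. The following shell sketch generates a key with the same `dd` invocation as the procedure, restricts its permissions, and confirms that it is exactly 32 bytes, which is the key length that AES256 expects. The scratch directory and the variable names `KEY_DIR`, `SSE_KEY`, and `KEY_SIZE` are assumptions for illustration; in practice you would write the key to a location that you back up securely.

```shell
# Generate a 32-byte random key in a scratch directory (hypothetical location).
KEY_DIR="$(mktemp -d)"
SSE_KEY="$KEY_DIR/sse.key"
dd if=/dev/urandom bs=1 count=32 2>/dev/null > "$SSE_KEY"

# Restrict access so that only the current user can read the key.
chmod 600 "$SSE_KEY"

# AES256 uses a 256-bit key, so the file must be exactly 32 bytes.
KEY_SIZE="$(wc -c < "$SSE_KEY" | tr -d ' ')"
if [ "$KEY_SIZE" -ne 32 ]; then
  echo "unexpected key size: $KEY_SIZE bytes" >&2
else
  echo "key size OK: $KEY_SIZE bytes"
fi
```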
Verification
To verify that you cannot download the encrypted files without the inclusion of an additional key, create a test file, upload it, and then try to download it.
- Create a test file by running the following command:
    $ echo "encrypt me please" > test.txt
- Upload the test file by running the following command:
    $ aws s3api put-object \
        --bucket <bucket> \
        --key test.txt \
        --body test.txt \
        --sse-customer-key fileb://sse.key \
        --sse-customer-algorithm AES256
- Try to download the file. In either the Amazon web console or the terminal, run the following command:
    $ s3cmd get s3://<bucket>/test.txt test.txt
  The download fails because the file is encrypted with an additional key.
- Download the file with the additional encryption key by running the following command:
    $ aws s3api get-object \
        --bucket <bucket> \
        --key test.txt \
        --sse-customer-key fileb://sse.key \
        --sse-customer-algorithm AES256 \
        downloaded.txt
- Read the file contents by running the following command:
    $ cat downloaded.txt
  Example output:
    encrypt me please
4.7.1.3.6.1. Downloading a file with an SSE-C encryption key for files backed up by Velero
When you are verifying an SSE-C encryption key, you can also download the file with the additional encryption key for files that were backed up with Velero.
Procedure
Download the file with the additional encryption key for files backed up by Velero by running the following command:
    $ aws s3api get-object \
        --bucket <bucket> \
        --key velero/backups/mysql-persistent-customerkeyencryptionfile4/mysql-persistent-customerkeyencryptionfile4.tar.gz \
        --sse-customer-key fileb://sse.key \
        --sse-customer-algorithm AES256 \
        --debug \
        velero_download.tar.gz
4.7.1.4. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the `DataProtectionApplication` API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a `Secret` with the default name, `cloud-credentials`.
- If the backup and snapshot locations use different credentials, you must create a `Secret` with the default name, `cloud-credentials`, which contains separate profiles for the backup and snapshot location credentials.
Note: If you do not want to specify backup or snapshot locations during the installation, you can create a default `Secret` with an empty `credentials-velero` file. If there is no default `Secret`, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the `DataProtectionApplication` manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
      configuration:
        velero:
          defaultPlugins:
            - openshift
            - aws
          resourceTimeout: 10m
        nodeAgent:
          enable: true
          uploaderType: kopia
          podConfig:
            nodeSelector: <node_selector>
      backupLocations:
        - name: default
          velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
            config:
              region: <region>
              profile: "default"
              s3ForcePathStyle: "true"
              s3Url: <s3_url>
            credential:
              key: cloud
              name: cloud-credentials
      snapshotLocations:
        - name: default
          velero:
            provider: aws
            config:
              region: <region>
              profile: "default"
            credential:
              key: cloud
              name: cloud-credentials
  where:
  - `namespace`: Specifies the default namespace for OADP, which is `openshift-adp`. The namespace is a variable and is configurable.
  - `openshift`: Specifies that the `openshift` plugin is mandatory.
  - `resourceTimeout`: Specifies how many minutes to wait for several Velero resources, such as Velero CRD availability, volumeSnapshot deletion, and backup repository availability, before timeout occurs. The default is 10m.
  - `nodeAgent`: Specifies the administrative agent that routes the administrative requests to servers.
  - `enable`: Set this value to `true` if you want to enable `nodeAgent` and perform File System Backup.
  - `uploaderType`: Specifies the uploader type. Enter `kopia` or `restic` as your uploader. You cannot change the selection after the installation. For the Built-in DataMover you must use Kopia. The `nodeAgent` deploys a daemon set, which means that the `nodeAgent` pods run on each working node. You can configure File System Backup by adding `spec.defaultVolumesToFsBackup: true` to the `Backup` CR.
  - `nodeSelector`: Specifies the nodes on which Kopia or Restic are available. By default, Kopia or Restic run on all nodes.
  - `bucket`: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
  - `prefix`: Specifies a prefix for Velero backups, for example, `velero`, if the bucket is used for multiple purposes.
  - `s3ForcePathStyle`: Specifies whether to force path style URLs for S3 objects (Boolean). Not required for AWS S3. Required only for S3 compatible storage.
  - `s3Url`: Specifies the URL of the object store that you are using to store backups. Not required for AWS S3. Required only for S3 compatible storage.
  - `name` (under `backupLocations`): Specifies the name of the `Secret` object that you created. If you do not specify this value, the default name, `cloud-credentials`, is used. If you specify a custom name, the custom name is used for the backup location.
  - `snapshotLocations`: Specifies a snapshot location, unless you use CSI snapshots or a File System Backup (FSB) to back up PVs.
  - `region` (under `snapshotLocations`): Specifies that the snapshot location must be in the same region as the PVs.
  - `name` (under `snapshotLocations`): Specifies the name of the `Secret` object that you created. If you do not specify this value, the default name, `cloud-credentials`, is used. If you specify a custom name, the custom name is used for the snapshot location. If your backup and snapshot locations use different credentials, create separate profiles in the `credentials-velero` file.
- Click Create.
Verification
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:
    $ oc get all -n openshift-adp
  Example output:
    NAME                                                     READY   STATUS    RESTARTS   AGE
    pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
    pod/node-agent-9cq4q                                     1/1     Running   0          94s
    pod/node-agent-m4lts                                     1/1     Running   0          94s
    pod/node-agent-pv4kr                                     1/1     Running   0          95s
    pod/velero-588db7f655-n842v                              1/1     Running   0          95s

    NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
    service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

    NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

    NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
    deployment.apps/velero                              1/1     1            1           96s

    NAME                                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
    replicaset.apps/velero-588db7f655                              1         1         1       96s
- Verify that the `DataProtectionApplication` (DPA) is reconciled by running the following command:
    $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
  Example output:
    {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}
- Verify the `type` is set to `Reconciled`.
- Verify the backup storage location and confirm that the `PHASE` is `Available` by running the following command:
    $ oc get backupstoragelocations.velero.io -n openshift-adp
  Example output:
    NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
    dpa-sample-1   Available   1s               3d16h   true
- Verify that the `PHASE` is `Available`.
4.7.1.4.1. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the `Velero` pod by editing the `DataProtectionApplication` custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
- Edit the values in the `spec.configuration.velero.podConfig.resourceAllocations` block of the `DataProtectionApplication` CR manifest, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      configuration:
        velero:
          podConfig:
            nodeSelector: <node_selector>
            resourceAllocations:
              limits:
                cpu: "1"
                memory: 1024Mi
              requests:
                cpu: 200m
                memory: 256Mi
  where:
  - `nodeSelector`: Specifies the node selector to be supplied to Velero podSpec.
  - `resourceAllocations`: Specifies the resource allocations listed for average usage.
Note: Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
Use the `nodeSelector` field to select which nodes can run the pod. Any label that you specify in the `nodeSelector` field must match the labels on the node.
4.7.1.4.2. Enabling self-signed CA certificates
You must enable a self-signed CA certificate for object storage by editing the `DataProtectionApplication` custom resource (CR) manifest to prevent a `certificate signed by unknown authority` error.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
- Edit the `spec.backupLocations.velero.objectStorage.caCert` parameter and `spec.backupLocations.velero.config` parameters of the `DataProtectionApplication` CR manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      backupLocations:
        - name: default
          velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket>
              prefix: <prefix>
              caCert: <base64_encoded_cert_string>
            config:
              insecureSkipTLSVerify: "false"
    # ...
  where:
  - `caCert`: Specifies the Base64-encoded CA certificate string.
  - `insecureSkipTLSVerify`: Specifies the `insecureSkipTLSVerify` configuration. The configuration can be set to either `"true"` or `"false"`. If set to `"true"`, SSL/TLS security is disabled. If set to `"false"`, SSL/TLS security is enabled.
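The `caCert` value is a single Base64 string, so the certificate file has to be encoded without line wraps before you paste it into the manifest. The following shell sketch shows one way to produce and sanity-check such a string. The variable names `CA_FILE` and `CA_CERT_B64` are hypothetical, the certificate body is a placeholder rather than a real certificate, and `base64 -d` assumes a GNU-style `base64` command.

```shell
# Stand-in for a PEM file; replace with the CA certificate of your object store.
CA_FILE="$(mktemp)"
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'placeholder' '-----END CERTIFICATE-----' > "$CA_FILE"

# Encode the file and strip line wraps so that the value fits on one YAML line.
CA_CERT_B64="$(base64 < "$CA_FILE" | tr -d '\n')"
echo "caCert: $CA_CERT_B64"

# Round-trip check: decoding must reproduce the original file.
printf '%s' "$CA_CERT_B64" | base64 -d | cmp -s - "$CA_FILE" && echo "round trip OK"
```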
4.7.1.4.3. Using CA certificates with the velero command aliased for Velero deployment
You might want to use the Velero CLI without installing it locally on your system by creating an alias for it.
Prerequisites
- You must be logged in to the OpenShift Container Platform cluster as a user with the `cluster-admin` role.
- You must have the OpenShift CLI (`oc`) installed.
Procedure
- To use an aliased Velero command, run the following command:
    $ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
- Check that the alias is working by running the following command:
    $ velero version
  Example output:
    Client:
        Version: v1.12.1-OADP
        Git commit: -
    Server:
        Version: v1.12.1-OADP
- To use a CA certificate with this command, you can add a certificate to the Velero deployment by running the following commands:
    $ CA_CERT=$(oc -n openshift-adp get dataprotectionapplications.oadp.openshift.io <dpa-name> -o jsonpath='{.spec.backupLocations[0].velero.objectStorage.caCert}')
    $ [[ -n $CA_CERT ]] && echo "$CA_CERT" | base64 -d | oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "cat > /tmp/your-cacert.txt" || echo "DPA BSL has no caCert"
    $ velero describe backup <backup_name> --details --cacert /tmp/<your_cacert>.txt
- To fetch the backup logs, run the following command:
    $ velero backup logs <backup_name> --cacert /tmp/<your_cacert>.txt
  You can use these logs to view failures and warnings for the resources that you cannot back up.
- If the Velero pod restarts, the `/tmp/your-cacert.txt` file disappears, and you must re-create the `/tmp/your-cacert.txt` file by re-running the commands from the previous step.
- You can check if the `/tmp/your-cacert.txt` file still exists, in the file location where you stored it, by running the following command:
    $ oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "ls /tmp/your-cacert.txt"
  Example output:
    /tmp/your-cacert.txt
In a future release of OpenShift API for Data Protection (OADP), we plan to mount the certificate to the Velero pod so that this step is not required.
4.7.1.4.4. Configuring node agents and node labels
The Data Protection Application (DPA) uses the `nodeSelector` field to select which nodes can run the node agent. Any label that you specify in the `nodeSelector` field must be present on the node.
Procedure
- Run the node agent on any node that you choose by adding a custom label:
    $ oc label node/<node_name> node-role.kubernetes.io/nodeAgent=""
  Note: Any label specified must match the labels on each node.
- Use the same custom label in the `DPA.spec.configuration.nodeAgent.podConfig.nodeSelector` field, which you used for labeling nodes:
    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/nodeAgent: ""
  The following example is an anti-pattern of `nodeSelector` and does not work unless both labels, `node-role.kubernetes.io/infra: ""` and `node-role.kubernetes.io/worker: ""`, are on the node:
    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
            node-role.kubernetes.io/worker: ""
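The anti-pattern above fails because every entry in `nodeSelector` must match: the entries are combined with AND, not OR. The following shell sketch models that rule locally with plain strings. It does not query a cluster; the function name `selector_matches` and the sample label list are illustrative assumptions.

```shell
# Return success only if every selector label appears in the node's label list.
# Both arguments are space-separated key=value lists; this mirrors the AND
# semantics of nodeSelector and is not a call to any cluster API.
selector_matches() {
  node_labels=" $1 "
  for wanted in $2; do
    case "$node_labels" in
      *" $wanted "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Hypothetical node that carries only the worker role label.
NODE_LABELS="node-role.kubernetes.io/worker= kubernetes.io/os=linux"

selector_matches "$NODE_LABELS" "node-role.kubernetes.io/worker=" \
  && echo "worker selector: node is eligible"
selector_matches "$NODE_LABELS" "node-role.kubernetes.io/infra= node-role.kubernetes.io/worker=" \
  || echo "infra+worker selector: node is not eligible"
```

In this model, a node with only the worker label satisfies the single-label selector but not the two-label selector, which is why a dedicated custom label is the simpler choice.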
4.7.1.5. Configuring the backup storage location with an MD5 checksum algorithm
You can configure the Backup Storage Location (BSL) in the Data Protection Application (DPA) to use an MD5 checksum algorithm for both Amazon Simple Storage Service (Amazon S3) and S3-compatible storage providers. The checksum algorithm calculates the checksum for uploading and downloading objects to Amazon S3. You can use one of the following options to set the `checksumAlgorithm` field in the `spec.backupLocations.velero.config.checksumAlgorithm` section of the DPA:
- `CRC32`
- `CRC32C`
- `SHA1`
- `SHA256`
You can also set the `checksumAlgorithm` field to an empty value to skip the MD5 checksum check. If you do not set a value for the `checksumAlgorithm` field, the default value is `CRC32`.
Prerequisites
- You have installed the OADP Operator.
- You have configured Amazon S3, or S3-compatible object storage as a backup location.
Procedure
Configure the BSL in the DPA as shown in the following example:
Example Data Protection Application
apiVersion: oadp.openshift.io/v1alpha1 kind: DataProtectionApplication metadata: name: test-dpa namespace: openshift-adp spec: backupLocations: - name: default velero: config: checksumAlgorithm: "" insecureSkipTLSVerify: "true" profile: "default" region: <bucket_region> s3ForcePathStyle: "true" s3Url: <bucket_url> credential: key: cloud name: cloud-credentials default: true objectStorage: bucket: <bucket_name> prefix: velero provider: aws configuration: velero: defaultPlugins: - openshift - aws - csiwhere:
checksumAlgorithm-
Specifies the
checksumAlgorithm. In this example, thechecksumAlgorithmfield is set to an empty value. You can select an option from the following list:CRC32,CRC32C,SHA1,SHA256.
ImportantIf you are using Noobaa as the object storage provider, and you do not set the
field in the DPA, an empty value ofspec.backupLocations.velero.config.checksumAlgorithmis added to the BSL configuration.checksumAlgorithmThe empty value is only added for BSLs that are created using the DPA. This value is not added if you create the BSL by using any other method.
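Before editing the DPA, it can help to check a candidate value against the options listed in this section. The following is a minimal sketch and not part of OADP: the function name is_valid_checksum_algorithm is hypothetical, and the accepted values are only those named above plus the empty value.

```shell
# Hypothetical helper (not an OADP command): succeeds only for the
# checksumAlgorithm values named in this section, including the empty value.
is_valid_checksum_algorithm() {
  case "$1" in
    ""|CRC32|CRC32C|SHA1|SHA256) return 0 ;;
    *) return 1 ;;
  esac
}

# Try an empty value, a listed value, and a value that is not listed.
for value in "" CRC32C MD5; do
  if is_valid_checksum_algorithm "$value"; then
    echo "checksumAlgorithm='$value': accepted"
  else
    echo "checksumAlgorithm='$value': not in the documented list"
  fi
done
```

The check is purely local and does not contact the cluster or the object storage provider.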
4.7.1.6. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the velero server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the velero server by configuring the Data Protection Application (DPA). Use the dpa.configuration.velero.client-burst and dpa.configuration.velero.client-qps fields of the DPA to set the burst and QPS values.
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the client-burst and the client-qps fields in the DPA as shown in the following example:
Example Data Protection Application
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: restic
        velero:
          client-burst: 500
          client-qps: 300
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
where:
client-burst
    Specifies the client-burst value. In this example, the client-burst field is set to 500.
client-qps
    Specifies the client-qps value. In this example, the client-qps field is set to 300.
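To get a feel for how the two values interact, the following sketch estimates an upper bound on the number of requests that can be sent in a time window. It assumes token-bucket semantics, that is, a bucket that starts full with client-burst tokens and refills at client-qps tokens per second, which is how Kubernetes client rate limiting generally behaves. The function name max_requests_in_window is hypothetical.

```shell
# Hypothetical helper: upper bound on the requests allowed in a window of
# $3 seconds, assuming a token bucket that starts full with $1 tokens
# (client-burst) and refills at $2 tokens per second (client-qps).
max_requests_in_window() {
  burst=$1
  qps=$2
  seconds=$3
  echo $(( burst + qps * seconds ))
}

# Values from the example DPA: client-burst 500, client-qps 300.
max_requests_in_window 500 300 10
```

With the example values, at most 500 + 300 × 10 = 3500 requests fit in a 10-second window. Treat this as a rough planning figure under the stated assumption, not as a guarantee.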
4.7.1.7. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the imagePullPolicy field of the Velero and node agent pods to Always for all images.
In OADP 1.4.1 or later, the Operator first checks if each image has the sha256 or sha512 digest and sets the imagePullPolicy field accordingly:
- If the image has the digest, the Operator sets imagePullPolicy to IfNotPresent.
- If the image does not have the digest, the Operator sets imagePullPolicy to Always.
You can also override the imagePullPolicy field by using the spec.imagePullPolicy field in the Data Protection Application (DPA).
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the spec.imagePullPolicy field in the DPA as shown in the following example:
Example Data Protection Application
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: kopia
        velero:
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
      imagePullPolicy: Never
where:
imagePullPolicy
    Specifies the value for imagePullPolicy. In this example, the imagePullPolicy field is set to Never.
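The default behavior described for OADP 1.4.1 or later can be pictured as a simple rule on the image reference. The following sketch only illustrates that rule; it is not the Operator's code, the function name default_image_pull_policy is hypothetical, and both image references are made-up examples.

```shell
# Illustration of the rule described above, not the Operator's actual code.
# Prints the pull policy that would apply by default to the image
# reference passed as $1.
default_image_pull_policy() {
  case "$1" in
    *@sha256:*|*@sha512:*) echo "IfNotPresent" ;;  # reference pinned by digest
    *) echo "Always" ;;                             # tag only, or no tag
  esac
}

# Both image references below are made-up examples.
default_image_pull_policy "registry.example.com/oadp/velero@sha256:0123abcd"
default_image_pull_policy "registry.example.com/oadp/velero:latest"
```

Setting spec.imagePullPolicy in the DPA replaces this default for all images.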
4.7.1.8. Enabling CSI in the DataProtectionApplication CR
You enable the Container Storage Interface (CSI) in the DataProtectionApplication custom resource (CR) in order to back up persistent volumes with CSI snapshots.
Prerequisites
- The cloud provider must support CSI snapshots.
Procedure
Edit the DataProtectionApplication CR, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    ...
    spec:
      configuration:
        velero:
          defaultPlugins:
          - openshift
          - csi
where:
csi
    Specifies the csi default plugin.
4.7.1.9. Disabling the node agent in DataProtectionApplication
If you are not using Restic, Kopia, or DataMover for your backups, you can disable the nodeAgent field in the DataProtectionApplication custom resource (CR). Before you disable nodeAgent, ensure the OADP Operator is idle and not running any backups.
Procedure
To disable the nodeAgent, set the enable flag to false. See the following example:
Example DataProtectionApplication CR
    # ...
    configuration:
      nodeAgent:
        enable: false
        uploaderType: kopia
    # ...
where:
enable
    Enables or disables the node agent. In this example, the value false disables the node agent.
To enable the nodeAgent, set the enable flag to true. See the following example:
Example DataProtectionApplication CR
    # ...
    configuration:
      nodeAgent:
        enable: true
        uploaderType: kopia
    # ...
where:
enable
    Enables or disables the node agent. In this example, the value true enables the node agent.
You can set up a job to enable and disable the nodeAgent field in the DataProtectionApplication CR. For more information, see "Running tasks in pods using jobs".
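If you toggle the node agent from a script or a job, one option is a JSON merge patch on the DPA. The following is a minimal sketch under stated assumptions: the function name node_agent_patch is hypothetical, and <dpa_name> in the commented command is a placeholder for your own DPA name.

```shell
# Hypothetical helper: prints a JSON merge patch that sets
# spec.configuration.nodeAgent.enable to the value passed as $1.
node_agent_patch() {
  case "$1" in
    true|false) ;;
    *) echo "usage: node_agent_patch true|false" >&2; return 1 ;;
  esac
  printf '{"spec":{"configuration":{"nodeAgent":{"enable":%s}}}}\n' "$1"
}

# Print the patch that disables the node agent.
node_agent_patch false

# Applying it would look like this (<dpa_name> is a placeholder):
#   oc patch dpa <dpa_name> -n openshift-adp --type merge -p "$(node_agent_patch false)"
```

The function only builds the patch text, so you can review it before running the oc patch command against a cluster.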
4.8. Configuring OADP with IBM Cloud
4.8.1. Configuring the OpenShift API for Data Protection with IBM Cloud
You install the OpenShift API for Data Protection (OADP) Operator on an IBM Cloud cluster to back up and restore applications on the cluster. You configure IBM Cloud Object Storage (COS) to store the backups.
4.8.1.1. Configuring the COS instance
You create an IBM Cloud Object Storage (COS) instance to store the OADP backup data. After you create the COS instance, configure the HMAC service credentials.
Prerequisites
- You have an IBM Cloud Platform account.
- You installed the IBM Cloud CLI.
- You are logged in to IBM Cloud.
Procedure
Install the IBM Cloud Object Storage (COS) plugin by running the following command:
    $ ibmcloud plugin install cos -f
Set a bucket name by running the following command:
    $ BUCKET=<bucket_name>
Set a bucket region by running the following command:
    $ REGION=<bucket_region>
where:
<bucket_region>
    Specifies the bucket region. For example, eu-gb.
Create a resource group by running the following command:
    $ ibmcloud resource group-create <resource_group_name>
Set the target resource group by running the following command:
    $ ibmcloud target -g <resource_group_name>
Verify that the target resource group is correctly set by running the following command:
    $ ibmcloud target
Example output
    API endpoint:     https://cloud.ibm.com
    Region:
    User:             test-user
    Account:          Test Account (fb6......e95) <-> 2...122
    Resource group:   Default
In the example output, the resource group is set to Default.
Set a resource group name by running the following command:
    $ RESOURCE_GROUP=<resource_group>
where:
<resource_group>
    Specifies the resource group name. For example, "default".
Create an IBM Cloud service-instance resource by running the following command:
    $ ibmcloud resource service-instance-create \
      <service_instance_name> \
      <service_name> \
      <service_plan> \
      <region_name>
where:
<service_instance_name>
    Specifies a name for the service-instance resource.
<service_name>
    Specifies the service name. Alternatively, you can specify a service ID.
<service_plan>
    Specifies the service plan for your IBM Cloud account.
<region_name>
    Specifies the region name.
Refer to the following example command:
    $ ibmcloud resource service-instance-create test-service-instance cloud-object-storage \
      standard \
      global \
      -d premium-global-deployment
where:
cloud-object-storage
    Specifies the service name.
-d premium-global-deployment
    Specifies the deployment name.
Extract the service instance ID by running the following command:
    $ SERVICE_INSTANCE_ID=$(ibmcloud resource service-instance test-service-instance --output json | jq -r '.[0].id')
Create a COS bucket by running the following command:
    $ ibmcloud cos bucket-create \
      --bucket $BUCKET \
      --ibm-service-instance-id $SERVICE_INSTANCE_ID \
      --region $REGION
Variables such as $BUCKET, $SERVICE_INSTANCE_ID, and $REGION are replaced by the values you set previously.
Create HMAC credentials by running the following command:
    $ ibmcloud resource service-key-create test-key Writer --instance-name test-service-instance --parameters {\"HMAC\":true}
Extract the access key ID and the secret access key from the HMAC credentials and save them in the credentials-velero file. You can use the credentials-velero file to create a secret for the backup storage location. Run the following command:
    $ cat > credentials-velero << __EOF__
    [default]
    aws_access_key_id=$(ibmcloud resource service-key test-key -o json | jq -r '.[0].credentials.cos_hmac_keys.access_key_id')
    aws_secret_access_key=$(ibmcloud resource service-key test-key -o json | jq -r '.[0].credentials.cos_hmac_keys.secret_access_key')
    __EOF__
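Because the last command of the procedure fills the file from jq output, a missing HMAC field ends up in the file as the literal string null rather than as an error. The following sketch checks the generated file before you create a secret from it. The function name check_velero_credentials is hypothetical, and the sample file uses made-up values.

```shell
# Hypothetical helper: succeeds only if the file passed as $1 contains
# non-empty aws_access_key_id and aws_secret_access_key entries.
# jq -r prints the literal string "null" for a missing field, so that
# value is rejected as well.
check_velero_credentials() {
  file=$1
  for key in aws_access_key_id aws_secret_access_key; do
    value=$(sed -n "s/^${key}=//p" "$file")
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "missing or empty ${key} in ${file}" >&2
      return 1
    fi
  done
}

# Example with made-up values written to a temporary file.
tmpfile=$(mktemp)
printf '[default]\naws_access_key_id=EXAMPLEKEY\naws_secret_access_key=null\n' > "$tmpfile"
check_velero_credentials "$tmpfile" || echo "credentials file needs attention"
rm -f "$tmpfile"
```

In practice you would call the function on the credentials-velero file itself.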
4.8.1.2. Creating a default Secret
You create a default Secret if your backup and snapshot locations use the same credentials or if you do not require a snapshot location.
The DataProtectionApplication custom resource (CR) requires a default Secret. Otherwise, the installation will fail. If the name of the backup location Secret is not specified, the default name is used.
If you do not want to use the backup location credentials during the installation, you can create a Secret with the default name by using an empty credentials-velero file.
Prerequisites
- Your object storage and cloud storage, if any, must use the same credentials.
- You must configure object storage for Velero.
Procedure
- Create a credentials-velero file for the backup storage location in the appropriate format for your cloud provider.
- Create a Secret custom resource (CR) with the default name:
    $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero
The Secret is referenced in the spec.backupLocations.credential block of the DataProtectionApplication CR when you install the Data Protection Application.
4.8.1.3. Creating secrets for different credentials
Create separate Secret objects when your backup and snapshot locations use different credentials.
Procedure
- Create a credentials-velero file for the snapshot location in the appropriate format for your cloud provider.
- Create a Secret for the snapshot location with the default name:
    $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero
- Create a credentials-velero file for the backup location in the appropriate format for your object storage.
- Create a Secret for the backup location with a custom name:
    $ oc create secret generic <custom_secret> -n openshift-adp --from-file cloud=credentials-velero
- Add the Secret with the custom name to the DataProtectionApplication CR, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
    ...
      backupLocations:
        - velero:
            provider: <provider>
            default: true
            credential:
              key: cloud
              name: <custom_secret>
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
where:
custom_secret
    Specifies the backup location Secret with custom name.
4.8.1.4. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the DataProtectionApplication API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a Secret with the default name, cloud-credentials.
Note: If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file. If there is no default Secret, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the DataProtectionApplication manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      namespace: openshift-adp
      name: <dpa_name>
    spec:
      configuration:
        velero:
          defaultPlugins:
          - openshift
          - aws
          - csi
      backupLocations:
        - velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            config:
              insecureSkipTLSVerify: 'true'
              profile: default
              region: <region_name>
              s3ForcePathStyle: 'true'
              s3Url: <s3_url>
            credential:
              key: cloud
              name: cloud-credentials
where:
provider
    Specifies that the provider is aws when you use IBM Cloud as a backup storage location.
bucket
    Specifies the IBM Cloud Object Storage (COS) bucket name.
region
    Specifies the COS region name, for example, eu-gb.
s3Url
    Specifies the S3 URL of the COS bucket. For example, http://s3.eu-gb.cloud-object-storage.appdomain.cloud. Here, eu-gb is the region name. Replace the region name according to your bucket region.
name
    Specifies the name of the secret you created by using the access key and the secret access key from the HMAC credentials.
- Click Create.
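The s3Url value in the manifest differs between buckets only in the region name. The following sketch builds the value from a region, assuming that other regions follow the same URL pattern as the eu-gb example in this section. The function name cos_s3_url is hypothetical.

```shell
# Hypothetical helper: builds the s3Url value from a bucket region,
# following the URL pattern of the eu-gb example in this section.
cos_s3_url() {
  printf 'http://s3.%s.cloud-object-storage.appdomain.cloud\n' "$1"
}

# Region taken from the example in this section.
cos_s3_url "eu-gb"
```

If you set the REGION variable when you configured the COS instance, you can pass "$REGION" to the function instead of a literal region name.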
Verification
Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:
    $ oc get all -n openshift-adp
    NAME                                                     READY   STATUS    RESTARTS   AGE
    pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
    pod/node-agent-9cq4q                                     1/1     Running   0          94s
    pod/node-agent-m4lts                                     1/1     Running   0          94s
    pod/node-agent-pv4kr                                     1/1     Running   0          95s
    pod/velero-588db7f655-n842v                              1/1     Running   0          95s

    NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
    service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

    NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

    NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/oadp-operator-controller-manager   1/1     1            1           2m9s
    deployment.apps/velero                             1/1     1            1           96s

    NAME                                                          DESIRED   CURRENT   READY   AGE
    replicaset.apps/oadp-operator-controller-manager-67d9494d47   1         1         1       2m9s
    replicaset.apps/velero-588db7f655                             1         1         1       96s
Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:
    $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
    {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}
- Verify the type is set to Reconciled.
Verify the backup storage location and confirm that the PHASE is Available by running the following command:
    $ oc get backupstoragelocations.velero.io -n openshift-adp
    NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
    dpa-sample-1   Available   1s               3d16h   true
- Verify that the PHASE is Available.
4.8.1.5. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the Velero pod by editing the DataProtectionApplication custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
Edit the values in the spec.configuration.velero.podConfig.resourceAllocations block of the DataProtectionApplication CR manifest, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      configuration:
        velero:
          podConfig:
            nodeSelector: <node_selector>
            resourceAllocations:
              limits:
                cpu: "1"
                memory: 1024Mi
              requests:
                cpu: 200m
                memory: 256Mi
where:
nodeSelector
    Specifies the node selector to be supplied to Velero podSpec.
resourceAllocations
    Specifies the resource allocations listed for average usage.
Note: Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
4.8.1.6. Configuring node agents and node labels
The Data Protection Application (DPA) uses the nodeSelector field to select which nodes can run the node agent. The nodeSelector field is the recommended form of node selection constraint.
Procedure
Run the node agent on any node that you choose by adding a custom label:
    $ oc label node/<node_name> node-role.kubernetes.io/nodeAgent=""
Note: Any label specified must match the labels on each node.
Use the same custom label in the DPA.spec.configuration.nodeAgent.podConfig.nodeSelector field, which you used for labeling nodes:
    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/nodeAgent: ""
The following example is an anti-pattern of nodeSelector and does not work unless both labels, node-role.kubernetes.io/infra: "" and node-role.kubernetes.io/worker: "", are on the node:
    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
            node-role.kubernetes.io/worker: ""
4.8.1.7. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the velero server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the velero server by configuring the Data Protection Application (DPA). Use the dpa.configuration.velero.client-burst and dpa.configuration.velero.client-qps fields of the DPA to set the burst and QPS values.
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the client-burst and the client-qps fields in the DPA as shown in the following example:
Example Data Protection Application
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: restic
        velero:
          client-burst: 500
          client-qps: 300
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
where:
client-burst
    Specifies the client-burst value. In this example, the client-burst field is set to 500.
client-qps
    Specifies the client-qps value. In this example, the client-qps field is set to 300.
4.8.1.8. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the imagePullPolicy field of the Velero and node agent pods to Always for all images.
In OADP 1.4.1 or later, the Operator first checks if each image has the sha256 or sha512 digest and sets the imagePullPolicy field accordingly:
- If the image has the digest, the Operator sets imagePullPolicy to IfNotPresent.
- If the image does not have the digest, the Operator sets imagePullPolicy to Always.
You can also override the imagePullPolicy field by using the spec.imagePullPolicy field in the Data Protection Application (DPA).
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the spec.imagePullPolicy field in the DPA as shown in the following example:
Example Data Protection Application
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: kopia
        velero:
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
      imagePullPolicy: Never
where:
imagePullPolicy
    Specifies the value for imagePullPolicy. In this example, the imagePullPolicy field is set to Never.
4.8.1.9. Configuring the DPA with more than one BSL
Configure the DataProtectionApplication (DPA) custom resource (CR) with more than one BackupStorageLocation (BSL) CR.
For example, you have configured the following two BSLs:
- Configured one BSL in the DPA and set it as the default BSL.
- Created another BSL independently by using the BackupStorageLocation CR.
Because you have already set the BSL created through the DPA as the default, you cannot also set the independently created BSL as the default. At any given time, only one BSL can be the default BSL.
Prerequisites
- You must install the OADP Operator.
- You must create the secrets by using the credentials provided by the cloud provider.
Procedure
Configure the DataProtectionApplication CR with more than one BackupStorageLocation CR. See the following example:
Example DPA
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    #...
    backupLocations:
      - name: aws
        velero:
          provider: aws
          default: true
          objectStorage:
            bucket: <bucket_name>
            prefix: <prefix>
          config:
            region: <region_name>
            profile: "default"
          credential:
            key: cloud
            name: cloud-credentials
      - name: odf
        velero:
          provider: aws
          default: false
          objectStorage:
            bucket: <bucket_name>
            prefix: <prefix>
          config:
            profile: "default"
            region: <region_name>
            s3Url: <url>
            insecureSkipTLSVerify: "true"
            s3ForcePathStyle: "true"
          credential:
            key: cloud
            name: <custom_secret_name_odf>
    #...
where:
name: aws
    Specifies a name for the first BSL.
default: true
    Indicates that this BSL is the default BSL. If a BSL is not set in the Backup CR, the default BSL is used. You can set only one BSL as the default.
<bucket_name>
    Specifies the bucket name.
<prefix>
    Specifies a prefix for Velero backups. For example, velero.
<region_name>
    Specifies the AWS region for the bucket.
cloud-credentials
    Specifies the name of the default Secret object that you created.
name: odf
    Specifies a name for the second BSL.
<url>
    Specifies the URL of the S3 endpoint.
<custom_secret_name_odf>
    Specifies the correct name for the Secret. For example, custom_secret_name_odf. If you do not specify a Secret name, the default name is used.
Specify the BSL to be used in the backup CR. See the following example.
Example backup CR
    apiVersion: velero.io/v1
    kind: Backup
    # ...
    spec:
      includedNamespaces:
      - <namespace>
      storageLocation: <backup_storage_location>
      defaultVolumesToFsBackup: true
where:
<namespace>
    Specifies the namespace to back up.
<backup_storage_location>
    Specifies the storage location.
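Because only one BSL can be the default, a quick text check on a DPA manifest can catch a second default: true entry before you apply it. The following is a rough sketch: the function name count_default_bsl is hypothetical, it matches text lines rather than parsing YAML, and it does not see BSLs that were created outside the manifest.

```shell
# Hypothetical helper: counts lines of the form "default: true" in the
# manifest file passed as $1. This is a plain text match, not a YAML parse.
count_default_bsl() {
  grep -c '^[[:space:]]*default:[[:space:]]*true[[:space:]]*$' "$1"
}

# Made-up manifest fragment with one default and one non-default BSL.
manifest=$(mktemp)
cat > "$manifest" << 'EOF'
backupLocations:
  - name: aws
    velero:
      default: true
  - name: odf
    velero:
      default: false
EOF

if [ "$(count_default_bsl "$manifest")" -eq 1 ]; then
  echo "exactly one default BSL"
else
  echo "expected exactly one default BSL"
fi
rm -f "$manifest"
```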
4.8.1.10. Disabling the node agent in DataProtectionApplication
If you are not using Restic, Kopia, or DataMover for your backups, you can disable the nodeAgent field in the DataProtectionApplication custom resource (CR). Before you disable nodeAgent, ensure the OADP Operator is idle and not running any backups.
Procedure
To disable the nodeAgent, set the enable flag to false. See the following example:
Example DataProtectionApplication CR
    # ...
    configuration:
      nodeAgent:
        enable: false
        uploaderType: kopia
    # ...
where:
enable
    Enables or disables the node agent. In this example, the value false disables the node agent.
To enable the nodeAgent, set the enable flag to true. See the following example:
Example DataProtectionApplication CR
    # ...
    configuration:
      nodeAgent:
        enable: true
        uploaderType: kopia
    # ...
where:
enable
    Enables or disables the node agent. In this example, the value true enables the node agent.
You can set up a job to enable and disable the nodeAgent field in the DataProtectionApplication CR. For more information, see "Running tasks in pods using jobs".
4.9. Configuring OADP with Azure
4.9.1. Configuring the OpenShift API for Data Protection with Microsoft Azure
Configure the OpenShift API for Data Protection (OADP) with Microsoft Azure to back up and restore cluster resources by using Azure storage. This provides data protection capabilities for your OpenShift Container Platform clusters.
The OADP Operator installs Velero 1.14.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
You configure Azure for Velero, create a default Secret, and then install the Data Protection Application.
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. See Using Operator Lifecycle Manager on restricted networks for details.
4.9.1.1. Configuring Microsoft Azure
Configure Microsoft Azure storage and service principal credentials for backup storage with OADP. This provides the necessary authentication and storage infrastructure for data protection operations.
Prerequisites
- You must have the Azure CLI installed.
Tools that use Azure services should always have restricted permissions to make sure that Azure resources are safe. Therefore, instead of having applications sign in as a fully privileged user, Azure offers service principals. An Azure service principal is a name that can be used with applications, hosted services, or automated tools.
This identity is used for access to resources.
- Create a service principal
- Sign in using a service principal and password
- Sign in using a service principal and certificate
- Manage service principal roles
- Create an Azure resource using a service principal
- Reset service principal credentials
For more details, see Create an Azure service principal with Azure CLI.
Procedure
Log in to Azure:
    $ az login
Set the AZURE_RESOURCE_GROUP variable:
    $ AZURE_RESOURCE_GROUP=Velero_Backups
Create an Azure resource group:
    $ az group create -n $AZURE_RESOURCE_GROUP --location CentralUS
where:
CentralUS
    Specifies your location.
Set the AZURE_STORAGE_ACCOUNT_ID variable:
    $ AZURE_STORAGE_ACCOUNT_ID="velero$(uuidgen | cut -d '-' -f5 | tr '[A-Z]' '[a-z]')"
Create an Azure storage account:
    $ az storage account create \
        --name $AZURE_STORAGE_ACCOUNT_ID \
        --resource-group $AZURE_RESOURCE_GROUP \
        --sku Standard_GRS \
        --encryption-services blob \
        --https-only true \
        --kind BlobStorage \
        --access-tier Hot
Set the BLOB_CONTAINER variable:
    $ BLOB_CONTAINER=velero
Create an Azure Blob storage container:
    $ az storage container create \
        -n $BLOB_CONTAINER \
        --public-access off \
        --account-name $AZURE_STORAGE_ACCOUNT_ID
Create a service principal and credentials for velero:
    $ AZURE_SUBSCRIPTION_ID=`az account list --query '[?isDefault].id' -o tsv`
    $ AZURE_TENANT_ID=`az account list --query '[?isDefault].tenantId' -o tsv`
Create a service principal with the Contributor role, assigning a specific --role and --scopes:
    $ AZURE_CLIENT_SECRET=`az ad sp create-for-rbac --name "velero" \
        --role "Contributor" \
        --query 'password' -o tsv \
        --scopes /subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_RESOURCE_GROUP`
The CLI generates a password for you. Ensure you capture the password.
After creating the service principal, obtain the client ID:
    $ AZURE_CLIENT_ID=`az ad app credential list --id <your_app_id>`
Note: For this to be successful, you must know your Azure application ID.
Save the service principal credentials in the credentials-velero file:
    $ cat << EOF > ./credentials-velero
    AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
    AZURE_TENANT_ID=${AZURE_TENANT_ID}
    AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
    AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
    AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
    AZURE_CLOUD_NAME=AzurePublicCloud
    EOF
You use the credentials-velero file to add Azure as a replication repository.
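The storage account name built in the procedure has to satisfy the Azure naming rule for storage accounts: 3 to 24 characters, lowercase letters and digits only. The following sketch checks a candidate name locally before you call az storage account create. The function name is_valid_storage_account_name is hypothetical, and the UUID is a fixed, made-up value so that the result is reproducible.

```shell
# Hypothetical helper: checks a name against the Azure storage account
# naming rule (3 to 24 characters, lowercase letters and digits only).
is_valid_storage_account_name() {
  name=$1
  len=${#name}
  if [ "$len" -lt 3 ] || [ "$len" -gt 24 ]; then
    return 1
  fi
  # Remove every allowed character; anything left over is not allowed.
  leftover=$(printf '%s' "$name" | LC_ALL=C tr -d 'a-z0-9')
  [ -z "$leftover" ]
}

# Same derivation as in the procedure, applied to a fixed, made-up UUID.
sample_uuid="123E4567-E89B-12D3-A456-426614174000"
candidate="velero$(printf '%s' "$sample_uuid" | cut -d '-' -f5 | tr '[A-Z]' '[a-z]')"
is_valid_storage_account_name "$candidate" && echo "$candidate is a valid storage account name"
```

The prefix velero plus the last 12-character UUID field gives an 18-character name, which stays within the limit. A name such as Velero_Backups, which is fine for a resource group, would be rejected here because of the uppercase letters and the underscore.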
4.9.1.2. About backup and snapshot locations and their secrets
Review backup location, snapshot location, and secret configuration requirements for the DataProtectionApplication custom resource (CR).
4.9.1.2.1. Backup locations
You can specify one of the following AWS S3-compatible object storage solutions as a backup location:
- Multicloud Object Gateway (MCG)
- Red Hat Container Storage
- Ceph RADOS Gateway; also known as Ceph Object Gateway
- Red Hat OpenShift Data Foundation
- MinIO
Velero backs up OpenShift Container Platform resources, Kubernetes objects, and internal images as an archive file on object storage.
4.9.1.2.2. Snapshot locations
If you use your cloud provider’s native snapshot API to back up persistent volumes, you must specify the cloud provider as the snapshot location.
If you use Container Storage Interface (CSI) snapshots, you do not need to specify a snapshot location because you will create a VolumeSnapshotClass CR to register the CSI driver.
If you use File System Backup (FSB), you do not need to specify a snapshot location because FSB backs up the file system on object storage.
4.9.1.2.3. Secrets
If the backup and snapshot locations use the same credentials or if you do not require a snapshot location, you create a default Secret.
If the backup and snapshot locations use different credentials, you create two secret objects:
- Custom Secret for the backup location, which you specify in the DataProtectionApplication CR.
- Default Secret for the snapshot location, which is not referenced in the DataProtectionApplication CR.
The Data Protection Application requires a default Secret. Otherwise, the installation will fail.
If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file.
4.9.1.3. About authenticating OADP with Azure
Review authentication methods for OADP with Azure to select the appropriate authentication approach for your security requirements.
You can authenticate OADP with Azure by using the following methods:
- A Velero-specific service principal with secret-based authentication.
- A Velero-specific storage account access key with secret-based authentication.
4.9.1.4. Using a service principal or a storage account access key
You create a default Secret object and reference it in the backup storage location custom resource. The credentials file for the Secret object can contain information about the Azure service principal or a storage account access key.
The default name of the Secret is cloud-credentials-azure.
The DataProtectionApplication custom resource (CR) requires a default Secret. Otherwise, the installation will fail. If the name of the backup location Secret is not specified, the default name is used.
If you do not want to use the backup location credentials during the installation, you can create a Secret with the default name by using an empty credentials-velero file.
Prerequisites
- You have access to the OpenShift cluster as a user with cluster-admin privileges.
- You have an Azure subscription with appropriate permissions.
- You have installed OADP.
- You have configured an object storage for storing the backups.
Procedure
Create a credentials-velero file for the backup storage location in the appropriate format for your cloud provider.
You can use one of the following two methods to authenticate OADP with Azure.
Use the service principal with secret-based authentication. See the following example:
    AZURE_SUBSCRIPTION_ID=<azure_subscription_id>
    AZURE_TENANT_ID=<azure_tenant_id>
    AZURE_CLIENT_ID=<azure_client_id>
    AZURE_CLIENT_SECRET=<azure_client_secret>
    AZURE_RESOURCE_GROUP=<azure_resource_group>
    AZURE_CLOUD_NAME=<azure_cloud_name>
Use a storage account access key. See the following example:
    AZURE_STORAGE_ACCOUNT_ACCESS_KEY=<azure_storage_account_access_key>
    AZURE_SUBSCRIPTION_ID=<azure_subscription_id>
    AZURE_RESOURCE_GROUP=<azure_resource_group>
    AZURE_CLOUD_NAME=<azure_cloud_name>
Create a Secret custom resource (CR) with the default name:
    $ oc create secret generic cloud-credentials-azure -n openshift-adp --from-file cloud=credentials-velero
Reference the Secret in the spec.backupLocations.velero.credential block of the DataProtectionApplication CR when you install the Data Protection Application as shown in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
    ...
      backupLocations:
        - velero:
            config:
              resourceGroup: <azure_resource_group>
              storageAccount: <azure_storage_account_id>
              subscriptionId: <azure_subscription_id>
            credential:
              key: cloud
              name: <custom_secret>
            provider: azure
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
      snapshotLocations:
        - velero:
            config:
              resourceGroup: <azure_resource_group>
              subscriptionId: <azure_subscription_id>
              incremental: "true"
            provider: azure
where:
<custom_secret>
    Specifies the backup location Secret with custom name.
You can configure the Data Protection Application by setting Velero resource allocations or enabling self-signed CA certificates.
4.9.1.5. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the Velero pod by editing the DataProtectionApplication custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
Edit the values in the spec.configuration.velero.podConfig.resourceAllocations block of the DataProtectionApplication CR manifest, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      configuration:
        velero:
          podConfig:
            nodeSelector: <node_selector>
            resourceAllocations:
              limits:
                cpu: "1"
                memory: 1024Mi
              requests:
                cpu: 200m
                memory: 256Mi
where:
nodeSelector
    Specifies the node selector to be supplied to Velero podSpec.
resourceAllocations
    Specifies the resource allocations listed for average usage.
Note: Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
Use the nodeSelector field to select which nodes can run the node agent. The nodeSelector field is the recommended form of node selection constraint.
4.9.1.6. Enabling self-signed CA certificates
You must enable a self-signed CA certificate for object storage by editing the DataProtectionApplication custom resource (CR) manifest to prevent a certificate signed by unknown authority error.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
Edit the spec.backupLocations.velero.objectStorage.caCert parameter and the spec.backupLocations.velero.config parameters of the DataProtectionApplication CR manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      backupLocations:
        - name: default
          velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket>
              prefix: <prefix>
              caCert: <base64_encoded_cert_string>
            config:
              insecureSkipTLSVerify: "false"
    # ...
where:
caCert: Specifies the Base64-encoded CA certificate string.
insecureSkipTLSVerify: Specifies the insecureSkipTLSVerify configuration. The configuration can be set to either "true" or "false". If set to "true", SSL/TLS security is disabled. If set to "false", SSL/TLS security is enabled.
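The caCert value is a single Base64 string, so a PEM file has to be encoded before you paste it into the manifest. The following sketch shows one way to do that. The encode_ca_cert function name and the file path you pass to it are assumptions for illustration only.

```shell
# Hypothetical helper: prints a PEM file as a single-line Base64 string,
# suitable for pasting into the caCert field.
encode_ca_cert() {
  # tr removes the line wrapping that some base64 implementations add.
  base64 < "$1" | tr -d '\n'
}
```

For example, encode_ca_cert ./ca.pem prints the encoded certificate, where ./ca.pem stands for wherever your CA certificate is stored.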
4.9.1.6.1. Using CA certificates with the velero command aliased for Velero deployment
You might want to use the Velero CLI without installing it locally on your system by creating an alias for it.
Prerequisites
- You must be logged in to the OpenShift Container Platform cluster as a user with the cluster-admin role.
- You must have the OpenShift CLI (oc) installed.
Procedure
To use an aliased Velero command, run the following command:
    $ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
Check that the alias is working by running the following command:
    $ velero version
    Client:
        Version: v1.12.1-OADP
        Git commit: -
    Server:
        Version: v1.12.1-OADP
To use a CA certificate with this command, you can add a certificate to the Velero deployment by running the following commands:
    $ CA_CERT=$(oc -n openshift-adp get dataprotectionapplications.oadp.openshift.io <dpa-name> -o jsonpath='{.spec.backupLocations[0].velero.objectStorage.caCert}')
    $ [[ -n $CA_CERT ]] && echo "$CA_CERT" | base64 -d | oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "cat > /tmp/your-cacert.txt" || echo "DPA BSL has no caCert"
    $ velero describe backup <backup_name> --details --cacert /tmp/your-cacert.txt
To fetch the backup logs, run the following command:
    $ velero backup logs <backup_name> --cacert /tmp/your-cacert.txt
You can use these logs to view failures and warnings for the resources that you cannot back up.
- If the Velero pod restarts, the /tmp/your-cacert.txt file disappears, and you must re-create it by re-running the commands from the previous step.
- You can check if the /tmp/your-cacert.txt file still exists, in the file location where you stored it, by running the following command:
    $ oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "ls /tmp/your-cacert.txt"
    /tmp/your-cacert.txt
In a future release of OpenShift API for Data Protection (OADP), we plan to mount the certificate to the Velero pod so that this step is not required.
4.9.1.7. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the DataProtectionApplication API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a Secret with the default name, cloud-credentials-azure.
- If the backup and snapshot locations use different credentials, you must create two Secrets:
  - Secret with a custom name for the backup location. You add this Secret to the DataProtectionApplication CR.
  - Secret with another custom name for the snapshot location. You add this Secret to the DataProtectionApplication CR.
Note
If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file. If there is no default Secret, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the DataProtectionApplication manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
      configuration:
        velero:
          defaultPlugins:
            - azure
            - openshift
          resourceTimeout: 10m
        nodeAgent:
          enable: true
          uploaderType: kopia
          podConfig:
            nodeSelector: <node_selector>
      backupLocations:
        - velero:
            config:
              resourceGroup: <azure_resource_group>
              storageAccount: <azure_storage_account_id>
              subscriptionId: <azure_subscription_id>
            credential:
              key: cloud
              name: cloud-credentials-azure
            provider: azure
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
      snapshotLocations:
        - velero:
            config:
              resourceGroup: <azure_resource_group>
              subscriptionId: <azure_subscription_id>
              incremental: "true"
            name: default
            provider: azure
            credential:
              key: cloud
              name: cloud-credentials-azure
where:
namespace: Specifies the default namespace for OADP, which is openshift-adp. The namespace is a variable and is configurable.
openshift: Specifies that the openshift plugin is mandatory.
resourceTimeout: Specifies how many minutes to wait for several Velero resources such as Velero CRD availability, volumeSnapshot deletion, and backup repository availability, before timeout occurs. The default is 10m.
nodeAgent: Specifies the administrative agent that routes the administrative requests to servers.
enable: Set this value to true if you want to enable nodeAgent and perform File System Backup.
uploaderType: Specifies the uploader type. Enter kopia or restic as your uploader. You cannot change the selection after the installation. For the Built-in DataMover you must use Kopia. The nodeAgent deploys a daemon set, which means that the nodeAgent pods run on each working node. You can configure File System Backup by adding spec.defaultVolumesToFsBackup: true to the Backup CR.
nodeSelector: Specifies the nodes on which Kopia or Restic are available. By default, Kopia or Restic run on all nodes.
resourceGroup: Specifies the Azure resource group.
storageAccount: Specifies the Azure storage account ID.
subscriptionId: Specifies the Azure subscription ID.
name: Specifies the name of the Secret object. If you do not specify this value, the default name, cloud-credentials-azure, is used. If you specify a custom name, the custom name is used for the backup location.
bucket: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
prefix: Specifies a prefix for Velero backups, for example, velero, if the bucket is used for multiple purposes.
snapshotLocations: Specifies the snapshot location. You do not need to specify a snapshot location if you use CSI snapshots or Restic to back up PVs.
name: Specifies the name of the Secret object that you created. If you do not specify this value, the default name, cloud-credentials-azure, is used. If you specify a custom name, the custom name is used for the backup location.
- Click Create.
Verification
Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:
    $ oc get all -n openshift-adp
    NAME                                                     READY   STATUS    RESTARTS   AGE
    pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
    pod/node-agent-9cq4q                                     1/1     Running   0          94s
    pod/node-agent-m4lts                                     1/1     Running   0          94s
    pod/node-agent-pv4kr                                     1/1     Running   0          95s
    pod/velero-588db7f655-n842v                              1/1     Running   0          95s

    NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
    service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

    NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

    NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
    deployment.apps/velero                              1/1     1            1           96s

    NAME                                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
    replicaset.apps/velero-588db7f655                              1         1         1       96s
Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:
    $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
    {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}
- Verify the type is set to Reconciled.
Verify the backup storage location and confirm that the PHASE is Available by running the following command:
    $ oc get backupstoragelocations.velero.io -n openshift-adp
    NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
    dpa-sample-1   Available   1s               3d16h   true
- Verify that the PHASE is Available.
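The reconcile check can be scripted rather than read by eye. The following is a rough sketch, assuming the status JSON is printed in the compact form shown in the example output, with the status and type keys adjacent. It uses plain text matching and is not a general JSON parser. The dpa_is_reconciled function name is hypothetical.

```shell
# Hypothetical helper: reads the DPA status JSON on stdin and succeeds only
# if it contains a Reconciled condition whose status is "True".
# Assumes the compact key order shown in the sample output above.
dpa_is_reconciled() {
  status_json=$(cat)
  case "$status_json" in
    *'"status":"True","type":"Reconciled"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

You could pipe the output of the oc get dpa command shown above into dpa_is_reconciled and act on its exit status.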
4.9.1.8. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the velero server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the velero server by using the dpa.configuration.velero.client-burst and dpa.configuration.velero.client-qps fields of the DPA.
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the client-burst and the client-qps fields in the DPA as shown in the following example:
Example Data Protection Application
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: restic
        velero:
          client-burst: 500
          client-qps: 300
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
where:
client-burst: Specifies the client-burst value. In this example, the client-burst field is set to 500.
client-qps: Specifies the client-qps value. In this example, the client-qps field is set to 300.
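To get a feel for how the two values interact, the following sketch estimates how long a batch of requests takes under a simple token-bucket model: up to burst requests go out immediately, and the remainder are sent at qps requests per second. The token-bucket model is an assumption made for illustration and is not taken from the Velero source. The min_seconds_for_requests function name is hypothetical.

```shell
# Sketch under a token-bucket assumption (not taken from the Velero source):
# a client may send `burst` requests immediately, then `qps` more per second.
# Prints the minimum whole number of seconds needed to send all requests.
min_seconds_for_requests() (
  requests=$1
  burst=$2
  qps=$3
  if [ "$requests" -le "$burst" ]; then
    echo 0
  else
    # Integer ceiling of (requests - burst) / qps.
    echo $(( (requests - burst + qps - 1) / qps ))
  fi
)
```

With the example values, min_seconds_for_requests 1100 500 300 prints 2: the first 500 requests fit in the burst, and the remaining 600 take two seconds at 300 per second.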
4.9.1.9. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the imagePullPolicy field to Always for all images.
In OADP 1.4.1 or later, the Operator first checks if each image has the sha256 or sha512 digest and sets the imagePullPolicy field accordingly:
- If the image has the digest, the Operator sets imagePullPolicy to IfNotPresent.
- If the image does not have the digest, the Operator sets imagePullPolicy to Always.
You can also override the imagePullPolicy field by using the spec.imagePullPolicy field in the Data Protection Application (DPA).
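The default selection for OADP 1.4.1 or later can be summarized as a small decision on the image reference. The following sketch mirrors the documented behavior only and is not the Operator's actual code. The default_image_pull_policy function name and the image references in the usage note are hypothetical.

```shell
# Sketch of the documented default: images pinned by a sha256 or sha512
# digest get IfNotPresent, all other images get Always.
default_image_pull_policy() {
  case "$1" in
    *@sha256:*|*@sha512:*) echo "IfNotPresent" ;;
    *) echo "Always" ;;
  esac
}
```

For example, default_image_pull_policy registry.example.com/oadp/velero@sha256:0123abcd prints IfNotPresent, while the same image referenced by a tag such as :latest prints Always.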
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the spec.imagePullPolicy field in the DPA as shown in the following example:
Example Data Protection Application
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: kopia
        velero:
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
      imagePullPolicy: Never
where:
imagePullPolicy: Specifies the value for imagePullPolicy. In this example, the imagePullPolicy field is set to Never.
4.9.1.9.1. Configuring node agents and node labels
The Data Protection Application (DPA) uses the nodeSelector field to select which nodes can run the node agent. The nodeSelector field is the recommended form of node selection constraint.
Procedure
Run the node agent on any node that you choose by adding a custom label:
    $ oc label node/<node_name> node-role.kubernetes.io/nodeAgent=""
Note
Any label specified must match the labels on each node.
Use the same custom label in the DPA.spec.configuration.nodeAgent.podConfig.nodeSelector field, which you used for labeling nodes:
    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/nodeAgent: ""
The following example is an anti-pattern of nodeSelector and does not work unless both labels, node-role.kubernetes.io/infra: "" and node-role.kubernetes.io/worker: "", are on the node:
    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
            node-role.kubernetes.io/worker: ""
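The anti-pattern fails because a node qualifies only if it carries every label in the selector. The following sketch illustrates that all-labels-must-match rule with label keys only. It is a simplified model for illustration and ignores label values. The node_matches_selector function name is hypothetical.

```shell
# Sketch of nodeSelector matching: a node qualifies only if it carries
# every label key in the selector. Keys are passed as space-separated
# lists for brevity; real selectors also compare label values.
node_matches_selector() (
  node_labels=" $1 "
  for wanted in $2; do
    case "$node_labels" in
      *" $wanted "*) ;;   # label present, keep checking
      *) exit 1 ;;        # one missing label disqualifies the node
    esac
  done
  exit 0
)
```

A node labeled only with node-role.kubernetes.io/infra does not match a selector that lists both the infra and worker labels, so the node agent is not scheduled on it.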
4.9.1.9.2. Enabling CSI in the DataProtectionApplication CR
You enable the Container Storage Interface (CSI) in the DataProtectionApplication custom resource (CR) in order to back up persistent volumes with CSI snapshots.
Prerequisites
- The cloud provider must support CSI snapshots.
Procedure
Edit the DataProtectionApplication CR, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    ...
    spec:
      configuration:
        velero:
          defaultPlugins:
            - openshift
            - csi
where:
csi: Specifies the csi default plugin.
4.9.1.9.3. Disabling the node agent in DataProtectionApplication
If you are not using Restic, Kopia, or DataMover for your backups, you can disable the nodeAgent field in the DataProtectionApplication custom resource (CR). Before you disable nodeAgent, ensure the OADP Operator is idle and not running any backups.
Procedure
To disable the nodeAgent, set the enable flag to false. See the following example:
Example DataProtectionApplication CR
    # ...
    configuration:
      nodeAgent:
        enable: false
        uploaderType: kopia
    # ...
where:
enable: Set to false to disable the node agent.
To enable the nodeAgent, set the enable flag to true. See the following example:
Example DataProtectionApplication CR
    # ...
    configuration:
      nodeAgent:
        enable: true
        uploaderType: kopia
    # ...
where:
enable: Set to true to enable the node agent.
You can set up a job to enable and disable the nodeAgent field in the DataProtectionApplication CR. For more information, see "Running tasks in pods using jobs".
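A job that toggles the node agent needs a patch body for the DPA. The following sketch builds a JSON merge patch for the enable flag. The node_agent_patch function name is hypothetical, and the DPA name used in the usage note is an assumption.

```shell
# Hypothetical helper (not part of OADP): prints a JSON merge patch that
# sets spec.configuration.nodeAgent.enable to the given boolean.
node_agent_patch() {
  case "$1" in
    true|false)
      printf '{"spec":{"configuration":{"nodeAgent":{"enable":%s}}}}\n' "$1" ;;
    *)
      echo "usage: node_agent_patch true|false" >&2
      return 1 ;;
  esac
}
```

Assuming a DPA named dpa-sample in the openshift-adp namespace, the patch could be applied with a command such as:
    $ oc patch dpa dpa-sample -n openshift-adp --type merge -p "$(node_agent_patch false)"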
4.10. Configuring OADP with Google Cloud
4.10.1. Configuring the OpenShift API for Data Protection with Google Cloud
You install the OpenShift API for Data Protection (OADP) with Google Cloud by installing the OADP Operator. The Operator installs Velero 1.14.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
You configure Google Cloud for Velero, create a default Secret, and then install the Data Protection Application.
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. See Using Operator Lifecycle Manager on restricted networks for details.
4.10.1.1. Configuring Google Cloud
You configure Google Cloud for the OpenShift API for Data Protection (OADP).
Prerequisites
- You must have the gcloud and gsutil CLI tools installed. See the Google Cloud documentation for details.
Procedure
Log in to Google Cloud:
    $ gcloud auth login
Set the BUCKET variable:
    $ BUCKET=<bucket>
where:
bucket: Specifies the bucket name.
Create the storage bucket:
    $ gsutil mb gs://$BUCKET/
Set the PROJECT_ID variable to your active project:
    $ PROJECT_ID=$(gcloud config get-value project)
Create a service account:
    $ gcloud iam service-accounts create velero \
        --display-name "Velero service account"
List your service accounts:
    $ gcloud iam service-accounts list
Set the SERVICE_ACCOUNT_EMAIL variable to match its email value:
    $ SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
        --filter="displayName:Velero service account" \
        --format 'value(email)')
Attach the policies to give the velero user the minimum necessary permissions:
    $ ROLE_PERMISSIONS=(
        compute.disks.get
        compute.disks.create
        compute.disks.createSnapshot
        compute.snapshots.get
        compute.snapshots.create
        compute.snapshots.useReadOnly
        compute.snapshots.delete
        compute.zones.get
        storage.objects.create
        storage.objects.delete
        storage.objects.get
        storage.objects.list
        iam.serviceAccounts.signBlob
    )
Create the velero.server custom role:
    $ gcloud iam roles create velero.server \
        --project $PROJECT_ID \
        --title "Velero Server" \
        --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"
Add IAM policy binding to the project:
    $ gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
        --role projects/$PROJECT_ID/roles/velero.server
Update the IAM service account:
    $ gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}
Save the IAM service account keys to the credentials-velero file in the current directory:
    $ gcloud iam service-accounts keys create credentials-velero \
        --iam-account $SERVICE_ACCOUNT_EMAIL
You use the credentials-velero file to create a Secret object for Google Cloud before you install the Data Protection Application.
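The --permissions argument in the role creation step is built by joining the permission list with commas, which relies on IFS controlling how a list expands inside double quotes. The following sketch shows the same technique with positional parameters so that it also runs in a plain POSIX shell. The join_with_commas function name is hypothetical.

```shell
# Joins its arguments with commas, the same way the role creation command
# expands "${ROLE_PERMISSIONS[*]}" with IFS set to ",".
join_with_commas() (
  IFS=","
  # "$*" joins all positional parameters using the first character of IFS.
  echo "$*"
)
```

For example, join_with_commas compute.disks.get compute.disks.create prints compute.disks.get,compute.disks.create. The function body runs in a subshell, so the IFS change does not leak into the calling shell.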
4.10.1.2. About backup and snapshot locations and their secrets
Review backup location, snapshot location, and secret configuration requirements for the DataProtectionApplication custom resource (CR).
4.10.1.2.1. Backup locations
You can specify one of the following AWS S3-compatible object storage solutions as a backup location:
- Multicloud Object Gateway (MCG)
- Red Hat Container Storage
- Ceph RADOS Gateway; also known as Ceph Object Gateway
- Red Hat OpenShift Data Foundation
- MinIO
Velero backs up OpenShift Container Platform resources, Kubernetes objects, and internal images as an archive file on object storage.
4.10.1.2.2. Snapshot locations
If you use your cloud provider's native snapshot API to back up persistent volumes, you must specify the cloud provider as the snapshot location.
If you use Container Storage Interface (CSI) snapshots, you do not need to specify a snapshot location because you will create a VolumeSnapshotClass CR to register the CSI driver.
If you use File System Backup (FSB), you do not need to specify a snapshot location because FSB backs up the file system on object storage.
4.10.1.2.3. Secrets
If the backup and snapshot locations use the same credentials or if you do not require a snapshot location, you create a default Secret.
If the backup and snapshot locations use different credentials, you create two secret objects:
- Custom Secret for the backup location, which you specify in the DataProtectionApplication CR.
- Default Secret for the snapshot location, which is not referenced in the DataProtectionApplication CR.
The Data Protection Application requires a default Secret. Otherwise, the installation will fail.
If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file.
4.10.1.2.4. Creating a default Secret
You create a default Secret if your backup and snapshot locations use the same credentials or if you do not require a snapshot location.
The default name of the Secret is cloud-credentials-gcp.
The DataProtectionApplication custom resource (CR) requires a default Secret. Otherwise, the installation will fail. If the name of the backup location Secret is not specified, the default name is used.
If you do not want to use the backup location credentials during the installation, you can create a Secret with the default name by using an empty credentials-velero file.
Prerequisites
- Your object storage and cloud storage, if any, must use the same credentials.
- You must configure object storage for Velero.
Procedure
- Create a credentials-velero file for the backup storage location in the appropriate format for your cloud provider.
- Create a Secret custom resource (CR) with the default name:
    $ oc create secret generic cloud-credentials-gcp -n openshift-adp --from-file cloud=credentials-velero
The Secret is referenced in the spec.backupLocations.credential block of the DataProtectionApplication CR when you install the Data Protection Application.
4.10.1.2.5. Creating secrets for different credentials
Create separate Secret objects for the backup location and the snapshot location when they use different credentials.
Procedure
- Create a credentials-velero file for the snapshot location in the appropriate format for your cloud provider.
- Create a Secret for the snapshot location with the default name:
    $ oc create secret generic cloud-credentials-gcp -n openshift-adp --from-file cloud=credentials-velero
- Create a credentials-velero file for the backup location in the appropriate format for your object storage.
- Create a Secret for the backup location with a custom name:
    $ oc create secret generic <custom_secret> -n openshift-adp --from-file cloud=credentials-velero
- Add the Secret with the custom name to the DataProtectionApplication CR, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
    ...
      backupLocations:
        - velero:
            provider: gcp
            default: true
            credential:
              key: cloud
              name: <custom_secret>
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
      snapshotLocations:
        - velero:
            provider: gcp
            default: true
            config:
              project: <project>
              snapshotLocation: us-west1
where:
custom_secret: Specifies the backup location Secret with custom name.
4.10.1.2.6. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the Velero pod by editing the DataProtectionApplication custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
Edit the values in the spec.configuration.velero.podConfig.resourceAllocations block of the DataProtectionApplication CR manifest, as in the following example:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      configuration:
        velero:
          podConfig:
            nodeSelector: <node_selector>
            resourceAllocations:
              limits:
                cpu: "1"
                memory: 1024Mi
              requests:
                cpu: 200m
                memory: 256Mi
where:
nodeSelector: Specifies the node selector to be supplied to Velero podSpec.
resourceAllocations: Specifies the resource allocations listed for average usage.
Note
Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
Use the nodeSelector field to select the nodes on which the pods can run. Any label that you specify in nodeSelector must match the labels on each node.
4.10.1.2.7. Enabling self-signed CA certificates
You must enable a self-signed CA certificate for object storage by editing the DataProtectionApplication custom resource (CR) manifest to prevent a certificate signed by unknown authority error.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
Edit the spec.backupLocations.velero.objectStorage.caCert parameter and the spec.backupLocations.velero.config parameters of the DataProtectionApplication CR manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      backupLocations:
        - name: default
          velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket>
              prefix: <prefix>
              caCert: <base64_encoded_cert_string>
            config:
              insecureSkipTLSVerify: "false"
    # ...
where:
caCert: Specifies the Base64-encoded CA certificate string.
insecureSkipTLSVerify: Specifies the insecureSkipTLSVerify configuration. The configuration can be set to either "true" or "false". If set to "true", SSL/TLS security is disabled. If set to "false", SSL/TLS security is enabled.
4.10.1.2.8. Using CA certificates with the velero command aliased for Velero deployment
You might want to use the Velero CLI without installing it locally on your system by creating an alias for it.
Prerequisites
- You must be logged in to the OpenShift Container Platform cluster as a user with the cluster-admin role.
- You must have the OpenShift CLI (oc) installed.
Procedure
To use an aliased Velero command, run the following command:
    $ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
Check that the alias is working by running the following command:
    $ velero version
    Client:
        Version: v1.12.1-OADP
        Git commit: -
    Server:
        Version: v1.12.1-OADP
To use a CA certificate with this command, you can add a certificate to the Velero deployment by running the following commands:
    $ CA_CERT=$(oc -n openshift-adp get dataprotectionapplications.oadp.openshift.io <dpa-name> -o jsonpath='{.spec.backupLocations[0].velero.objectStorage.caCert}')
    $ [[ -n $CA_CERT ]] && echo "$CA_CERT" | base64 -d | oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "cat > /tmp/your-cacert.txt" || echo "DPA BSL has no caCert"
    $ velero describe backup <backup_name> --details --cacert /tmp/your-cacert.txt
To fetch the backup logs, run the following command:
    $ velero backup logs <backup_name> --cacert /tmp/your-cacert.txt
You can use these logs to view failures and warnings for the resources that you cannot back up.
- If the Velero pod restarts, the /tmp/your-cacert.txt file disappears, and you must re-create it by re-running the commands from the previous step.
- You can check if the /tmp/your-cacert.txt file still exists, in the file location where you stored it, by running the following command:
    $ oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "ls /tmp/your-cacert.txt"
    /tmp/your-cacert.txt
In a future release of OpenShift API for Data Protection (OADP), we plan to mount the certificate to the Velero pod so that this step is not required.
4.10.1.3. Google workload identity federation cloud authentication
Applications running outside Google Cloud use service account keys, such as usernames and passwords, to gain access to Google Cloud resources. These service account keys might become a security risk if they are not properly managed.
With Google’s workload identity federation, you can use Identity and Access Management (IAM) to offer IAM roles, including the ability to impersonate service accounts, to external identities. This eliminates the maintenance and security risks associated with service account keys.
Workload identity federation handles encrypting and decrypting certificates, extracting user attributes, and validation. Identity federation externalizes authentication, passing it over to Security Token Services (STS), and reduces the demands on individual developers. Authorization and controlling access to resources remain the responsibility of the application.
Google workload identity federation is available for OADP 1.3.x and later.
When backing up volumes, OADP on Google Cloud with Google workload identity federation authentication only supports CSI snapshots.
OADP on Google Cloud with Google workload identity federation authentication does not support Volume Snapshot Locations (VSL) backups. VSL backups finish with a PartiallyFailed phase.
If you do not use Google workload identity federation cloud authentication, continue to Installing the Data Protection Application.
Prerequisites
- You have installed a cluster in manual mode with Google Cloud Workload Identity configured.
- You have access to the Cloud Credential Operator utility (ccoctl) and to the associated workload identity pool.
Procedure
Create an oadp-credrequest directory by running the following command:
    $ mkdir -p oadp-credrequest
Create a CredentialsRequest manifest file, oadp-credrequest/credrequest.yaml, as follows:
    $ echo 'apiVersion: cloudcredential.openshift.io/v1
    kind: CredentialsRequest
    metadata:
      name: oadp-operator-credentials
      namespace: openshift-cloud-credential-operator
    spec:
      providerSpec:
        apiVersion: cloudcredential.openshift.io/v1
        kind: GCPProviderSpec
        permissions:
        - compute.disks.get
        - compute.disks.create
        - compute.disks.createSnapshot
        - compute.snapshots.get
        - compute.snapshots.create
        - compute.snapshots.useReadOnly
        - compute.snapshots.delete
        - compute.zones.get
        - storage.objects.create
        - storage.objects.delete
        - storage.objects.get
        - storage.objects.list
        - iam.serviceAccounts.signBlob
        skipServiceCheck: true
      secretRef:
        name: cloud-credentials-gcp
        namespace: <OPERATOR_INSTALL_NS>
      serviceAccountNames:
      - velero
    ' > oadp-credrequest/credrequest.yaml
Use the ccoctl utility to process the CredentialsRequest objects in the oadp-credrequest directory by running the following command:
    $ ccoctl gcp create-service-accounts \
        --name=<name> \
        --project=<gcp_project_id> \
        --credentials-requests-dir=oadp-credrequest \
        --workload-identity-pool=<pool_id> \
        --workload-identity-provider=<provider_id>
The manifests/openshift-adp-cloud-credentials-gcp-credentials.yaml file is now available to use in the following steps.
Create a namespace by running the following command:
    $ oc create namespace <OPERATOR_INSTALL_NS>
Apply the credentials to the namespace by running the following command:
    $ oc apply -f manifests/openshift-adp-cloud-credentials-gcp-credentials.yaml
4.10.1.4. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the DataProtectionApplication API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a Secret with the default name, cloud-credentials-gcp.
- If the backup and snapshot locations use different credentials, you must create two Secrets:
  - Secret with a custom name for the backup location. You add this Secret to the DataProtectionApplication CR.
  - Secret with another custom name for the snapshot location. You add this Secret to the DataProtectionApplication CR.
Note
If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file. If there is no default Secret, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the DataProtectionApplication manifest:
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: <OPERATOR_INSTALL_NS>
    spec:
      configuration:
        velero:
          defaultPlugins:
            - gcp
            - openshift
          resourceTimeout: 10m
        nodeAgent:
          enable: true
          uploaderType: kopia
          podConfig:
            nodeSelector: <node_selector>
      backupLocations:
        - velero:
            provider: gcp
            default: true
            credential:
              key: cloud
              name: cloud-credentials-gcp
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>
      snapshotLocations:
        - velero:
            provider: gcp
            default: true
            config:
              project: <project>
              snapshotLocation: us-west1
            credential:
              key: cloud
              name: cloud-credentials-gcp
      backupImages: true
where:
namespace: Specifies the default namespace for OADP, which is openshift-adp. The namespace is a variable and is configurable.
openshift: Specifies that the openshift plugin is mandatory.
resourceTimeout: Specifies how many minutes to wait for several Velero resources such as Velero CRD availability, volumeSnapshot deletion, and backup repository availability, before timeout occurs. The default is 10m.
nodeAgent: Specifies the administrative agent that routes the administrative requests to servers.
enable: Set this value to true if you want to enable nodeAgent and perform File System Backup.
uploaderType: Specifies the uploader type. Enter kopia or restic as your uploader. You cannot change the selection after the installation. For the Built-in DataMover you must use Kopia. The nodeAgent deploys a daemon set, which means that the nodeAgent pods run on each working node. You can configure File System Backup by adding spec.defaultVolumesToFsBackup: true to the Backup CR.
nodeSelector: Specifies the nodes on which Kopia or Restic are available. By default, Kopia or Restic run on all nodes.
key: Specifies the secret key that contains credentials. For Google workload identity federation cloud authentication use service_account.json.
name: Specifies the secret name that contains credentials. If you do not specify this value, the default name, cloud-credentials-gcp, is used.
bucket: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
prefix: Specifies a prefix for Velero backups, for example, velero, if the bucket is used for multiple purposes.
snapshotLocations: Specifies a snapshot location, unless you use CSI snapshots or Restic to back up PVs.
snapshotLocation: Specifies that the snapshot location must be in the same region as the PVs.
name: Specifies the name of the Secret object that you created. If you do not specify this value, the default name, cloud-credentials-gcp, is used. If you specify a custom name, the custom name is used for the backup location.
backupImages: Specifies that Google workload identity federation supports internal image backup. Set this field to false if you do not want to use image backup.
- Click Create.
Verification
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:

    $ oc get all -n openshift-adp

    NAME                                                     READY   STATUS    RESTARTS   AGE
    pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
    pod/node-agent-9cq4q                                     1/1     Running   0          94s
    pod/node-agent-m4lts                                     1/1     Running   0          94s
    pod/node-agent-pv4kr                                     1/1     Running   0          95s
    pod/velero-588db7f655-n842v                              1/1     Running   0          95s

    NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
    service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

    NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

    NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
    deployment.apps/velero                              1/1     1            1           96s

    NAME                                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
    replicaset.apps/velero-588db7f655                              1         1         1       96s

- Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:

    $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'

    {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}

- Verify that the type is set to Reconciled.
- Verify the backup storage location and confirm that the PHASE is Available by running the following command:

    $ oc get backupstoragelocations.velero.io -n openshift-adp

    NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
    dpa-sample-1   Available   1s               3d16h   true

- Verify that the PHASE is Available.
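The uploaderType description in this procedure mentions that File System Backup is configured by adding spec.defaultVolumesToFsBackup: true to the Backup CR. A minimal sketch of such a Backup CR might look like the following. The backup name and the <application_namespace> placeholder are hypothetical values chosen for illustration and are not part of the procedure above.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: example-fsb-backup        # hypothetical name, pick your own
  namespace: openshift-adp        # namespace where the DPA is installed
spec:
  includedNamespaces:
    - <application_namespace>     # placeholder: namespace to back up
  defaultVolumesToFsBackup: true  # back up pod volumes with the node agent uploader
```

With this field set, volumes in the included namespaces are backed up through the node agent rather than through snapshots, so the nodeAgent block in the DPA must have enable set to true.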
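4.10.1.5. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the velero server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the velero server by using the dpa.configuration.velero.client-burst and dpa.configuration.velero.client-qps fields of the Data Protection Application (DPA).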
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the client-burst and the client-qps fields in the DPA as shown in the following example:

  Example Data Protection Application

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: restic
        velero:
          client-burst: 500
          client-qps: 300
          defaultPlugins:
            - openshift
            - aws
            - kubevirt

  where:
  - client-burst: Specifies the client-burst value. In this example, the client-burst field is set to 500.
  - client-qps: Specifies the client-qps value. In this example, the client-qps field is set to 300.
4.10.1.6. Configuring node agents and node labels
The Data Protection Application (DPA) uses the nodeSelector field to select which nodes can run the node agent. The nodeSelector field is the recommended form of node selection constraint.
Procedure
- Run the node agent on any node that you choose by adding a custom label:

    $ oc label node/<node_name> node-role.kubernetes.io/nodeAgent=""

  Note: Any label specified must match the labels on each node.

- Use the same custom label in the DPA.spec.configuration.nodeAgent.podConfig.nodeSelector field, which you used for labeling nodes:

    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/nodeAgent: ""

  The following example is an anti-pattern of nodeSelector and does not work unless both labels, node-role.kubernetes.io/infra: "" and node-role.kubernetes.io/worker: "", are on the node:

    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
            node-role.kubernetes.io/worker: ""
4.10.1.7. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the imagePullPolicy field of the Velero and node agent pods to Always for all images.
In OADP 1.4.1 or later, the Operator first checks if each image has the sha256 or sha512 digest and sets the imagePullPolicy field accordingly:
- If the image has the digest, the Operator sets imagePullPolicy to IfNotPresent.
- If the image does not have the digest, the Operator sets imagePullPolicy to Always.
You can also override the imagePullPolicy field by using the spec.imagePullPolicy field in the Data Protection Application (DPA).
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the spec.imagePullPolicy field in the DPA as shown in the following example:

  Example Data Protection Application

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: kopia
        velero:
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
      imagePullPolicy: Never

  where:
  - imagePullPolicy: Specifies the value for imagePullPolicy. In this example, the imagePullPolicy field is set to Never.
4.10.1.7.1. Enabling CSI in the DataProtectionApplication CR
You enable the Container Storage Interface (CSI) in the DataProtectionApplication custom resource (CR) in order to back up persistent volumes with CSI snapshots.
Prerequisites
- The cloud provider must support CSI snapshots.
Procedure
- Edit the DataProtectionApplication CR, as in the following example:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    ...
    spec:
      configuration:
        velero:
          defaultPlugins:
            - openshift
            - csi

  where:
  - csi: Specifies the csi default plugin.
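Backing up with CSI snapshots also relies on a VolumeSnapshotClass that registers the CSI driver. The following is a minimal sketch, assuming the Velero CSI plugin selects the class by the velero.io/csi-volumesnapshot-class label. The class name is hypothetical, and <csi_driver> is a placeholder for the driver used by your storage.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: example-snapclass                       # hypothetical name
  labels:
    velero.io/csi-volumesnapshot-class: "true"  # lets the Velero CSI plugin pick this class
driver: <csi_driver>                            # placeholder: your CSI driver name
deletionPolicy: Retain                          # keep snapshot content if the VolumeSnapshot object is deleted
```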
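4.10.1.7.2. Disabling the node agent in DataProtectionApplication
If you are not using Restic, Kopia, or DataMover for your backups, you can disable the nodeAgent field in the DataProtectionApplication custom resource (CR). Before you disable nodeAgent, ensure that the OADP Operator is idle and not running any backups.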
Procedure
- To disable the nodeAgent, set the enable flag to false. See the following example:

  Example DataProtectionApplication CR

    # ...
    configuration:
      nodeAgent:
        enable: false
        uploaderType: kopia
    # ...

  where:
  - enable: Controls whether the node agent runs. The value false disables the node agent.

- To enable the nodeAgent, set the enable flag to true. See the following example:

  Example DataProtectionApplication CR

    # ...
    configuration:
      nodeAgent:
        enable: true
        uploaderType: kopia
    # ...

  where:
  - enable: Controls whether the node agent runs. The value true enables the node agent.

You can set up a job to enable and disable the nodeAgent field in the DataProtectionApplication CR. For more information, see "Running tasks in pods using jobs".
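As a rough illustration of such a job, the following sketch patches the enable flag with oc patch from inside a pod. It is an assumption-laden example, not the documented method: the job name, the dpa-patcher service account (which would need RBAC permission to patch DataProtectionApplication resources), and the <image_with_oc_cli> placeholder are all hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: disable-node-agent              # hypothetical name
  namespace: openshift-adp
spec:
  backoffLimit: 2
  template:
    spec:
      serviceAccountName: dpa-patcher   # hypothetical; must be allowed to patch the DPA
      restartPolicy: Never
      containers:
        - name: patch-dpa
          image: <image_with_oc_cli>    # placeholder: any image that ships the oc binary
          command:
            - /bin/sh
            - -c
            # Merge-patch the DPA so that spec.configuration.nodeAgent.enable becomes false.
            - >-
              oc patch dataprotectionapplications.oadp.openshift.io <dpa_name>
              -n openshift-adp --type merge
              -p '{"spec":{"configuration":{"nodeAgent":{"enable":false}}}}'
```

Changing false to true in the patch body gives the matching job that re-enables the node agent.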
4.11. Configuring OADP with MCG
4.11.1. Configuring the OpenShift API for Data Protection with Multicloud Object Gateway
Configure OpenShift API for Data Protection (OADP) to use Multicloud Object Gateway (MCG), a component of OpenShift Data Foundation, as a backup storage location by setting up credentials, secrets, and the Data Protection Application.
You can install the OpenShift API for Data Protection (OADP) with MCG by installing the OADP Operator. The Operator installs Velero 1.14.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
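The CloudStorage API is a Technology Preview feature only.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can create a Secret custom resource (CR) for the backup location and install the Data Protection Application.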
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. For details, see Using Operator Lifecycle Manager on restricted networks.
4.11.1.1. Retrieving Multicloud Object Gateway credentials
Retrieve the Multicloud Object Gateway (MCG) bucket credentials to create a Secret custom resource (CR) for OpenShift API for Data Protection (OADP).
Although the MCG Operator is deprecated, the MCG plugin is still available for OpenShift Data Foundation. To download the plugin, browse to Download Red Hat OpenShift Data Foundation and download the appropriate MCG plugin for your operating system.
Prerequisites
- You must deploy OpenShift Data Foundation by using the appropriate Red Hat OpenShift Data Foundation deployment guide.
Procedure
- Create an MCG bucket. For more information, see Managing hybrid and multicloud resources.
- Obtain the S3 endpoint, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and the bucket name by running the oc describe command on the bucket resource.
- Create a credentials-velero file:

    $ cat << EOF > ./credentials-velero
    [default]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
    EOF

  You can use the credentials-velero file to create a Secret object when you install the Data Protection Application.
4.11.1.2. About backup and snapshot locations and their secrets
Review backup location, snapshot location, and secret configuration requirements for the DataProtectionApplication custom resource (CR).
4.11.1.2.1. Backup locations
You can specify one of the following AWS S3-compatible object storage solutions as a backup location:
- Multicloud Object Gateway (MCG)
- Red Hat Container Storage
- Ceph RADOS Gateway; also known as Ceph Object Gateway
- Red Hat OpenShift Data Foundation
- MinIO
Velero backs up OpenShift Container Platform resources, Kubernetes objects, and internal images as an archive file on object storage.
4.11.1.2.2. Snapshot locations
If you use your cloud provider’s native snapshot API to back up persistent volumes, you must specify the cloud provider as the snapshot location.
If you use Container Storage Interface (CSI) snapshots, you do not need to specify a snapshot location because you will create a VolumeSnapshotClass CR to register the CSI driver.
If you use File System Backup (FSB), you do not need to specify a snapshot location because FSB backs up the file system on object storage.
4.11.1.2.3. Secrets
If the backup and snapshot locations use the same credentials or if you do not require a snapshot location, you create a default Secret.
If the backup and snapshot locations use different credentials, you create two secret objects:
- Custom Secret for the backup location, which you specify in the DataProtectionApplication CR.
- Default Secret for the snapshot location, which is not referenced in the DataProtectionApplication CR.
The Data Protection Application requires a default Secret. Otherwise, the installation will fail.
If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file.
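4.11.1.2.4. Creating a default Secret
You create a default Secret if your backup and snapshot locations use the same credentials or if you do not require a snapshot location.
The default name of the Secret is cloud-credentials.
The DataProtectionApplication custom resource (CR) requires a default Secret. Otherwise, the installation will fail. If the name of the backup location Secret is not specified, the default name is used.
If you do not want to use the backup location credentials during the installation, you can create a Secret with the default name by using an empty credentials-velero file.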
Prerequisites
- Your object storage and cloud storage, if any, must use the same credentials.
- You must configure object storage for Velero.
Procedure
- Create a credentials-velero file for the backup storage location in the appropriate format for your cloud provider.

  See the following example:

    [default]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

- Create a Secret custom resource (CR) with the default name:

    $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero

  The Secret is referenced in the spec.backupLocations.credential block of the DataProtectionApplication CR when you install the Data Protection Application.
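If you prefer to keep the Secret as a manifest, for example in a Git-managed configuration, the oc create secret command above can be approximated declaratively. The following is a sketch under the assumption that the credentials use the AWS-style format shown in this procedure. The cloud key matches the key: cloud entry in the credential block of the DPA examples.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials        # default Secret name used by the DPA
  namespace: openshift-adp
type: Opaque
stringData:
  # The key name "cloud" corresponds to "--from-file cloud=credentials-velero".
  cloud: |
    [default]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
```

Using stringData avoids Base64-encoding the file contents by hand; the API server stores the value encoded under data.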
4.11.1.2.5. Creating secrets for different credentials
Create separate Secret objects when your backup and snapshot locations use different credentials.
Procedure
- Create a credentials-velero file for the snapshot location in the appropriate format for your cloud provider.
- Create a Secret for the snapshot location with the default name:

    $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero

- Create a credentials-velero file for the backup location in the appropriate format for your object storage.
- Create a Secret for the backup location with a custom name:

    $ oc create secret generic <custom_secret> -n openshift-adp --from-file cloud=credentials-velero

- Add the Secret with the custom name to the DataProtectionApplication CR, as in the following example:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
    ...
      backupLocations:
        - velero:
            config:
              profile: "default"
              region: <region_name>
              s3Url: <url>
              insecureSkipTLSVerify: "true"
              s3ForcePathStyle: "true"
            provider: aws
            default: true
            credential:
              key: cloud
              name: <custom_secret>
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>

  where:
  - region_name: Specifies the region, following the naming convention of the documentation of your object storage server.
  - custom_secret: Specifies the backup location Secret with custom name.
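To show how the two Secrets relate, the following fragment sketches a DPA spec in which the backup location names the custom Secret and the snapshot location omits the credential block, so that the default Secret applies. The aws snapshot provider and its region and profile settings are assumptions made for illustration; substitute the provider and configuration that match your cloud.

```yaml
spec:
  backupLocations:
    - velero:
        provider: aws
        default: true
        credential:
          key: cloud
          name: <custom_secret>      # custom Secret, referenced explicitly
        objectStorage:
          bucket: <bucket_name>
          prefix: <prefix>
  snapshotLocations:
    - velero:
        provider: aws                # assumption: AWS-native snapshots
        config:
          region: <region_name>
          profile: "default"
        # No credential block here: the default Secret, cloud-credentials, is used.
```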
4.11.1.2.6. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the Velero pod by editing the DataProtectionApplication custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
- Edit the values in the spec.configuration.velero.podConfig.resourceAllocations block of the DataProtectionApplication CR manifest, as in the following example:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      configuration:
        velero:
          podConfig:
            nodeSelector: <node_selector>
            resourceAllocations:
              limits:
                cpu: "1"
                memory: 1024Mi
              requests:
                cpu: 200m
                memory: 256Mi

  where:
  - nodeSelector: Specifies the node selector to be supplied to Velero podSpec.
  - resourceAllocations: Specifies the resource allocations listed for average usage.

Note: Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
Use the nodeSelector field to select which nodes can run the node agent. The nodeSelector field is the simplest recommended form of node selection constraint.
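Because Kopia runs in the node agent pods, the adjustment mentioned in the note would typically be made on the node agent rather than on the Velero pod. The following fragment is a sketch that assumes the nodeAgent podConfig block accepts the same resourceAllocations structure as the Velero podConfig block. The CPU and memory figures are hypothetical starting points, not recommendations.

```yaml
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
      podConfig:
        nodeSelector:
          node-role.kubernetes.io/nodeAgent: ""   # label from the node agent labeling procedure
        resourceAllocations:
          limits:
            cpu: "2"          # hypothetical value
            memory: 4Gi       # hypothetical value
          requests:
            cpu: 500m         # hypothetical value
            memory: 1Gi       # hypothetical value
```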
4.11.1.2.7. Enabling self-signed CA certificates
You must enable a self-signed CA certificate for object storage by editing the DataProtectionApplication custom resource (CR) manifest to prevent a certificate signed by unknown authority error.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
- Edit the spec.backupLocations.velero.objectStorage.caCert parameter and the spec.backupLocations.velero.config parameters of the DataProtectionApplication CR manifest:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
    spec:
    # ...
      backupLocations:
        - name: default
          velero:
            provider: aws
            default: true
            objectStorage:
              bucket: <bucket>
              prefix: <prefix>
              caCert: <base64_encoded_cert_string>
            config:
              insecureSkipTLSVerify: "false"
    # ...

  where:
  - caCert: Specifies the Base64-encoded CA certificate string.
  - insecureSkipTLSVerify: Specifies the insecureSkipTLSVerify configuration. The configuration can be set to either "true" or "false". If set to "true", SSL/TLS security is disabled. If set to "false", SSL/TLS security is enabled.
4.11.1.2.8. Using CA certificates with the velero command aliased for Velero deployment
You might want to use the Velero CLI without installing it locally on your system by creating an alias for it.
Prerequisites
- You must be logged in to the OpenShift Container Platform cluster as a user with the cluster-admin role.
- You must have the OpenShift CLI (oc) installed.
Procedure
- To use an aliased Velero command, run the following command:

    $ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'

- Check that the alias is working by running the following command:

    $ velero version

    Client:
        Version: v1.12.1-OADP
        Git commit: -
    Server:
        Version: v1.12.1-OADP

- To use a CA certificate with this command, you can add a certificate to the Velero deployment by running the following commands:

    $ CA_CERT=$(oc -n openshift-adp get dataprotectionapplications.oadp.openshift.io <dpa-name> -o jsonpath='{.spec.backupLocations[0].velero.objectStorage.caCert}')

    $ [[ -n $CA_CERT ]] && echo "$CA_CERT" | base64 -d | oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "cat > /tmp/your-cacert.txt" || echo "DPA BSL has no caCert"

    $ velero describe backup <backup_name> --details --cacert /tmp/<your_cacert>.txt

- To fetch the backup logs, run the following command:

    $ velero backup logs <backup_name> --cacert /tmp/<your_cacert>.txt

  You can use these logs to view failures and warnings for the resources that you cannot back up.

- If the Velero pod restarts, the /tmp/your-cacert.txt file disappears, and you must re-create the /tmp/your-cacert.txt file by re-running the commands from the previous step.
- You can check if the /tmp/your-cacert.txt file still exists, in the file location where you stored it, by running the following command:

    $ oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "ls /tmp/your-cacert.txt"

    /tmp/your-cacert.txt

In a future release of OpenShift API for Data Protection (OADP), we plan to mount the certificate to the Velero pod so that this step is not required.
4.11.1.3. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the DataProtectionApplication API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a Secret with the default name, cloud-credentials.
- If the backup and snapshot locations use different credentials, you must create two Secrets:
  - Secret with a custom name for the backup location. You add this Secret to the DataProtectionApplication CR.
  - Secret with another custom name for the snapshot location. You add this Secret to the DataProtectionApplication CR.
Note: If you do not want to specify backup or snapshot locations during the installation, you can create a default Secret with an empty credentials-velero file. If there is no default Secret, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the DataProtectionApplication manifest:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: <dpa_sample>
      namespace: openshift-adp
    spec:
      configuration:
        velero:
          defaultPlugins:
            - aws
            - openshift
          resourceTimeout: 10m
        nodeAgent:
          enable: true
          uploaderType: kopia
          podConfig:
            nodeSelector: <node_selector>
      backupLocations:
        - velero:
            config:
              profile: "default"
              region: <region_name>
              s3Url: <url>
              insecureSkipTLSVerify: "true"
              s3ForcePathStyle: "true"
            provider: aws
            default: true
            credential:
              key: cloud
              name: cloud-credentials
            objectStorage:
              bucket: <bucket_name>
              prefix: <prefix>

  where:
  - namespace: Specifies the default namespace for OADP, which is openshift-adp. The namespace is a variable and is configurable.
  - aws: Specifies that an object store plugin corresponding to your storage locations is required. For all S3 providers, the required plugin is aws. For Azure and Google Cloud object stores, the azure or gcp plugin is required.
  - openshift: Specifies that the openshift plugin is mandatory.
  - resourceTimeout: Specifies how many minutes to wait for several Velero resources, such as Velero CRD availability, volumeSnapshot deletion, and backup repository availability, before a timeout occurs. The default is 10m.
  - nodeAgent: Specifies the administrative agent that routes the administrative requests to servers.
  - enable: Set this value to true if you want to enable nodeAgent and perform File System Backup.
  - uploaderType: Specifies the uploader type. Enter kopia or restic as your uploader. You cannot change the selection after the installation. For the Built-in DataMover you must use Kopia. The nodeAgent deploys a daemon set, which means that the nodeAgent pods run on each working node. You can configure File System Backup by adding spec.defaultVolumesToFsBackup: true to the Backup CR.
  - nodeSelector: Specifies the nodes on which Kopia or Restic are available. By default, Kopia or Restic run on all nodes.
  - region: Specifies the region, following the naming convention of the documentation of your object storage server.
  - s3Url: Specifies the URL of the S3 endpoint.
  - name: Specifies the name of the Secret object that you created. If you do not specify this value, the default name, cloud-credentials, is used. If you specify a custom name, the custom name is used for the backup location.
  - bucket: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
  - prefix: Specifies a prefix for Velero backups, for example, velero, if the bucket is used for multiple purposes.
- Click Create.
Verification
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:

    $ oc get all -n openshift-adp

    NAME                                                     READY   STATUS    RESTARTS   AGE
    pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
    pod/node-agent-9cq4q                                     1/1     Running   0          94s
    pod/node-agent-m4lts                                     1/1     Running   0          94s
    pod/node-agent-pv4kr                                     1/1     Running   0          95s
    pod/velero-588db7f655-n842v                              1/1     Running   0          95s

    NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
    service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

    NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

    NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
    deployment.apps/velero                              1/1     1            1           96s

    NAME                                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
    replicaset.apps/velero-588db7f655                              1         1         1       96s

- Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:

    $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'

    {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}

- Verify that the type is set to Reconciled.
- Verify the backup storage location and confirm that the PHASE is Available by running the following command:

    $ oc get backupstoragelocations.velero.io -n openshift-adp

    NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
    dpa-sample-1   Available   1s               3d16h   true

- Verify that the PHASE is Available.
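4.11.1.4. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the velero server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the velero server by using the dpa.configuration.velero.client-burst and dpa.configuration.velero.client-qps fields of the Data Protection Application (DPA).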
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the client-burst and the client-qps fields in the DPA as shown in the following example:

  Example Data Protection Application

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: restic
        velero:
          client-burst: 500
          client-qps: 300
          defaultPlugins:
            - openshift
            - aws
            - kubevirt

  where:
  - client-burst: Specifies the client-burst value. In this example, the client-burst field is set to 500.
  - client-qps: Specifies the client-qps value. In this example, the client-qps field is set to 300.
4.11.1.5. Configuring node agents and node labels
The Data Protection Application (DPA) uses the nodeSelector field to select which nodes can run the node agent. The nodeSelector field is the recommended form of node selection constraint.
Procedure
- Run the node agent on any node that you choose by adding a custom label:

    $ oc label node/<node_name> node-role.kubernetes.io/nodeAgent=""

  Note: Any label specified must match the labels on each node.

- Use the same custom label in the DPA.spec.configuration.nodeAgent.podConfig.nodeSelector field, which you used for labeling nodes:

    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/nodeAgent: ""

  The following example is an anti-pattern of nodeSelector and does not work unless both labels, node-role.kubernetes.io/infra: "" and node-role.kubernetes.io/worker: "", are on the node:

    configuration:
      nodeAgent:
        enable: true
        podConfig:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
            node-role.kubernetes.io/worker: ""
4.11.1.6. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the imagePullPolicy field of the Velero and node agent pods to Always for all images.
In OADP 1.4.1 or later, the Operator first checks if each image has the sha256 or sha512 digest and sets the imagePullPolicy field accordingly:
- If the image has the digest, the Operator sets imagePullPolicy to IfNotPresent.
- If the image does not have the digest, the Operator sets imagePullPolicy to Always.
You can also override the imagePullPolicy field by using the spec.imagePullPolicy field in the Data Protection Application (DPA).
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the spec.imagePullPolicy field in the DPA as shown in the following example:

  Example Data Protection Application

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
      name: test-dpa
      namespace: openshift-adp
    spec:
      backupLocations:
        - name: default
          velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: "default"
              region: <bucket_region>
              s3ForcePathStyle: "true"
              s3Url: <bucket_url>
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: <bucket_name>
              prefix: velero
            provider: aws
      configuration:
        nodeAgent:
          enable: true
          uploaderType: kopia
        velero:
          defaultPlugins:
            - openshift
            - aws
            - kubevirt
            - csi
      imagePullPolicy: Never

  where:
  - imagePullPolicy: Specifies the value for imagePullPolicy. In this example, the imagePullPolicy field is set to Never.
4.11.1.6.1. Enabling CSI in the DataProtectionApplication CR
You enable the Container Storage Interface (CSI) in the DataProtectionApplication custom resource (CR) in order to back up persistent volumes with CSI snapshots.
Prerequisites
- The cloud provider must support CSI snapshots.
Procedure
- Edit the DataProtectionApplication CR, as in the following example:

    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    ...
    spec:
      configuration:
        velero:
          defaultPlugins:
            - openshift
            - csi

  where:
  - csi: Specifies the csi default plugin.
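4.11.1.6.2. Disabling the node agent in DataProtectionApplication
If you are not using Restic, Kopia, or DataMover for your backups, you can disable the nodeAgent field in the DataProtectionApplication custom resource (CR). Before you disable nodeAgent, ensure that the OADP Operator is idle and not running any backups.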
Procedure
- To disable the nodeAgent, set the enable flag to false. See the following example:

  Example DataProtectionApplication CR

    # ...
    configuration:
      nodeAgent:
        enable: false
        uploaderType: kopia
    # ...

  where:
  - enable: Controls whether the node agent runs. The value false disables the node agent.

- To enable the nodeAgent, set the enable flag to true. See the following example:

  Example DataProtectionApplication CR

    # ...
    configuration:
      nodeAgent:
        enable: true
        uploaderType: kopia
    # ...

  where:
  - enable: Controls whether the node agent runs. The value true enables the node agent.

You can set up a job to enable and disable the nodeAgent field in the DataProtectionApplication CR. For more information, see "Running tasks in pods using jobs".
4.12. Configuring OADP with ODF
4.12.1. Configuring the OpenShift API for Data Protection with OpenShift Data Foundation
Install the OpenShift API for Data Protection (OADP) with OpenShift Data Foundation by installing the OADP Operator and configuring a backup location and a snapshot location. You then install the Data Protection Application.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
You can configure Multicloud Object Gateway or any AWS S3-compatible object storage as a backup location.
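The CloudStorage API is a Technology Preview feature only.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can create a Secret custom resource (CR) for the backup location and install the Data Protection Application.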
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. For details, see Using Operator Lifecycle Manager on restricted networks.
4.12.1.1. About backup and snapshot locations and their secrets
Review backup location, snapshot location, and secret configuration requirements for the DataProtectionApplication custom resource (CR).
4.12.1.1.1. Backup locations
You can specify one of the following AWS S3-compatible object storage solutions as a backup location:
- Multicloud Object Gateway (MCG)
- Red Hat Container Storage
- Ceph RADOS Gateway; also known as Ceph Object Gateway
- Red Hat OpenShift Data Foundation
- MinIO
Velero backs up OpenShift Container Platform resources, Kubernetes objects, and internal images as an archive file on object storage.
4.12.1.1.2. Snapshot locations
If you use your cloud provider’s native snapshot API to back up persistent volumes, you must specify the cloud provider as the snapshot location.
If you use Container Storage Interface (CSI) snapshots, you do not need to specify a snapshot location because you will create a VolumeSnapshotClass CR to register the CSI driver.
If you use File System Backup (FSB), you do not need to specify a snapshot location because FSB backs up the file system on object storage.
4.12.1.1.3. Secrets
If the backup and snapshot locations use the same credentials or if you do not require a snapshot location, you create a default `Secret`.
If the backup and snapshot locations use different credentials, you create two secret objects:
- Custom `Secret` for the backup location, which you specify in the `DataProtectionApplication` CR.
- Default `Secret` for the snapshot location, which is not referenced in the `DataProtectionApplication` CR.
The Data Protection Application requires a default `Secret`. Otherwise, the installation fails.
If you do not want to specify backup or snapshot locations during the installation, you can create a default `Secret` with an empty `credentials-velero` file.
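4.12.1.1.4. Creating a default Secret
You create a default `Secret` if your backup and snapshot locations use the same credentials or if you do not require a snapshot location.
The default name of the `Secret` is `cloud-credentials`, unless your backup storage provider has a default plugin, such as `aws`, `azure`, or `gcp`. In that case, the default name is specified in the provider-specific OADP installation procedure.
The `DataProtectionApplication` custom resource (CR) requires a default `Secret`. Otherwise, the installation fails. If the name of the backup location `Secret` is not specified, the default name is used.
If you do not want to use the backup location credentials during the installation, you can create a `Secret` with the default name by using an empty `credentials-velero` file.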
Prerequisites
- Your object storage and cloud storage, if any, must use the same credentials.
- You must configure object storage for Velero.
Procedure
- Create a `credentials-velero` file for the backup storage location in the appropriate format for your cloud provider. See the following example:

      [default]
      aws_access_key_id=<AWS_ACCESS_KEY_ID>
      aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

- Create a `Secret` custom resource (CR) with the default name:

      $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero

The `Secret` is referenced in the `spec.backupLocations.credential` block of the `DataProtectionApplication` CR when you install the Data Protection Application.
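The two steps of this procedure could be scripted roughly as follows. This is a minimal sketch, not part of the documented procedure: the `write_credentials` helper, the `umask` call, and the `command -v oc` guard are illustrative additions, and the key values are placeholders unless the corresponding environment variables are set.

```shell
#!/bin/sh
# Sketch: write an AWS-style credentials file for the default Secret.
# The key values are placeholders (assumption); export real values before running.
AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-<AWS_ACCESS_KEY_ID>}"
AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-<AWS_SECRET_ACCESS_KEY>}"

write_credentials() {
  # $1: path of the credentials file to create (hypothetical helper)
  (
    umask 077  # keep the file readable by the current user only
    printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
      "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > "$1"
  )
}

write_credentials credentials-velero

# Create the Secret only when the oc client is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  oc create secret generic cloud-credentials -n openshift-adp \
    --from-file cloud=credentials-velero
fi
```

Writing the file inside a subshell keeps the restrictive `umask` from leaking into the rest of the session.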
4.12.1.1.5. Creating secrets for different credentials
Create separate `Secret` objects if your backup and snapshot locations use different credentials.
Procedure
- Create a `credentials-velero` file for the snapshot location in the appropriate format for your cloud provider.
- Create a `Secret` for the snapshot location with the default name:

      $ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero

- Create a `credentials-velero` file for the backup location in the appropriate format for your object storage.
- Create a `Secret` for the backup location with a custom name:

      $ oc create secret generic <custom_secret> -n openshift-adp --from-file cloud=credentials-velero

- Add the `Secret` with the custom name to the `DataProtectionApplication` CR, as in the following example:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: <dpa_sample>
        namespace: openshift-adp
      spec:
      # ...
        backupLocations:
          - velero:
              provider: <provider>
              default: true
              credential:
                key: cloud
                name: <custom_secret>
              objectStorage:
                bucket: <bucket_name>
                prefix: <prefix>

  where:
  - `custom_secret`: Specifies the backup location `Secret` with a custom name.
4.12.1.1.6. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the `Velero` pod by editing the `DataProtectionApplication` custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
- Edit the values in the `spec.configuration.velero.podConfig.resourceAllocations` block of the `DataProtectionApplication` CR manifest, as in the following example:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: <dpa_sample>
      spec:
      # ...
        configuration:
          velero:
            podConfig:
              nodeSelector: <node_selector>
              resourceAllocations:
                limits:
                  cpu: "1"
                  memory: 1024Mi
                requests:
                  cpu: 200m
                  memory: 256Mi

  where:
  - `nodeSelector`: Specifies the node selector to be supplied to the Velero podSpec.
  - `resourceAllocations`: Specifies the resource allocations. The values listed are for average usage.

Note: Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
Use the `nodeSelector` field to select which nodes can run the node agent. The `nodeSelector` field is the simplest recommended form of node selection constraint.
4.12.1.1.6.1. Adjusting Ceph CPU and memory requirements based on collected data
The following recommendations are based on observations of performance made in the scale and performance lab. The changes are specifically related to Red Hat OpenShift Data Foundation (ODF). If working with ODF, consult the appropriate tuning guides for official recommendations.
4.12.1.1.6.1.1. CPU and memory requirement for configurations
Backup and restore operations require large amounts of CephFS `PersistentVolumes` (PVs). To avoid Ceph MDS pods restarting with an `out-of-memory` (OOM) error, the following configuration is suggested:
| Configuration types | Request | Max limit |
|---|---|---|
| CPU | Request changed to 3 | Max limit to 3 |
| Memory | Request changed to 8 Gi | Max limit to 128 Gi |
4.12.1.1.7. Enabling self-signed CA certificates
You must enable a self-signed CA certificate for object storage by editing the `DataProtectionApplication` custom resource (CR) manifest to prevent a `certificate signed by unknown authority` error.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
- Edit the `spec.backupLocations.velero.objectStorage.caCert` parameter and the `spec.backupLocations.velero.config` parameters of the `DataProtectionApplication` CR manifest:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: <dpa_sample>
      spec:
      # ...
        backupLocations:
          - name: default
            velero:
              provider: aws
              default: true
              objectStorage:
                bucket: <bucket>
                prefix: <prefix>
                caCert: <base64_encoded_cert_string>
              config:
                insecureSkipTLSVerify: "false"
      # ...

  where:
  - `caCert`: Specifies the Base64-encoded CA certificate string.
  - `insecureSkipTLSVerify`: Specifies the `insecureSkipTLSVerify` configuration. The configuration can be set to either `"true"` or `"false"`. If set to `"true"`, SSL/TLS security is disabled. If set to `"false"`, SSL/TLS security is enabled.
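One way to produce the single-line Base64 string for the `caCert` field is sketched below. The `encode_ca_cert` helper and the `ca-bundle.pem` file name are assumptions for illustration; the `tr` call strips the line wrapping that some `base64` implementations add.

```shell
#!/bin/sh
# Sketch: encode a PEM CA certificate as one Base64 line for the caCert field.
encode_ca_cert() {
  # $1: path to a PEM-encoded CA certificate (hypothetical file name)
  base64 < "$1" | tr -d '\n'
}

# Usage, assuming ca-bundle.pem exists in the current directory:
#   encode_ca_cert ca-bundle.pem
```

Paste the printed string as the value of `caCert` in the manifest.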
4.12.1.1.8. Using CA certificates with the velero command aliased for Velero deployment
You might want to use the Velero CLI without installing it locally on your system by creating an alias for it.
Prerequisites
- You must be logged in to the OpenShift Container Platform cluster as a user with the `cluster-admin` role.
- You must have the OpenShift CLI (`oc`) installed.
Procedure
- To use an aliased Velero command, run the following command:

      $ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'

- Check that the alias is working by running the following command:

      $ velero version
      Client:
          Version: v1.12.1-OADP
          Git commit: -
      Server:
          Version: v1.12.1-OADP

- To use a CA certificate with this command, you can add a certificate to the Velero deployment by running the following commands:

      $ CA_CERT=$(oc -n openshift-adp get dataprotectionapplications.oadp.openshift.io <dpa-name> -o jsonpath='{.spec.backupLocations[0].velero.objectStorage.caCert}')
      $ [[ -n $CA_CERT ]] && echo "$CA_CERT" | base64 -d | oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "cat > /tmp/your-cacert.txt" || echo "DPA BSL has no caCert"
      $ velero describe backup <backup_name> --details --cacert /tmp/<your_cacert>.txt

- To fetch the backup logs, run the following command:

      $ velero backup logs <backup_name> --cacert /tmp/<your_cacert>.txt

  You can use these logs to view failures and warnings for the resources that you cannot back up.
- If the Velero pod restarts, the `/tmp/your-cacert.txt` file disappears, and you must re-create the `/tmp/your-cacert.txt` file by re-running the commands from the previous step.
- You can check if the `/tmp/your-cacert.txt` file still exists, in the file location where you stored it, by running the following command:

      $ oc exec -n openshift-adp -i deploy/velero -c velero -- bash -c "ls /tmp/your-cacert.txt"
      /tmp/your-cacert.txt

In a future release of OpenShift API for Data Protection (OADP), we plan to mount the certificate to the Velero pod so that this step is not required.
4.12.1.2. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the `DataProtectionApplication` API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a `Secret` with the default name, `cloud-credentials`.
- If the backup and snapshot locations use different credentials, you must create two `Secrets`:
  - `Secret` with a custom name for the backup location. You add this `Secret` to the `DataProtectionApplication` CR.
  - `Secret` with another custom name for the snapshot location. You add this `Secret` to the `DataProtectionApplication` CR.
Note: If you do not want to specify backup or snapshot locations during the installation, you can create a default `Secret` with an empty `credentials-velero` file. If there is no default `Secret`, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the `DataProtectionApplication` manifest:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: <dpa_sample>
        namespace: openshift-adp
      spec:
        configuration:
          velero:
            defaultPlugins:
              - aws
              - kubevirt
              - csi
              - openshift
            resourceTimeout: 10m
          nodeAgent:
            enable: true
            uploaderType: kopia
            podConfig:
              nodeSelector: <node_selector>
        backupLocations:
          - velero:
              provider: gcp
              default: true
              credential:
                key: cloud
                name: <default_secret>
              objectStorage:
                bucket: <bucket_name>
                prefix: <prefix>

  where:
  - `namespace`: Specifies the default namespace for OADP, which is `openshift-adp`. The namespace is a variable and is configurable.
  - `aws`: Specifies that an object store plugin corresponding to your storage locations is required. For all S3 providers, the required plugin is `aws`. For Azure and Google Cloud object stores, the `azure` or `gcp` plugin is required.
  - `kubevirt`: Optional: The `kubevirt` plugin is used with OpenShift Virtualization.
  - `csi`: Specifies the `csi` default plugin if you use CSI snapshots to back up PVs. The `csi` plugin uses the Velero CSI beta snapshot APIs. You do not need to configure a snapshot location.
  - `openshift`: Specifies that the `openshift` plugin is mandatory.
  - `resourceTimeout`: Specifies how many minutes to wait for several Velero resources, such as Velero CRD availability, volumeSnapshot deletion, and backup repository availability, before a timeout occurs. The default is 10m.
  - `nodeAgent`: Specifies the administrative agent that routes the administrative requests to servers.
  - `enable`: Set this value to `true` if you want to enable `nodeAgent` and perform File System Backup.
  - `uploaderType`: Specifies the uploader type. Enter `kopia` or `restic` as your uploader. You cannot change the selection after the installation. For the Built-in DataMover you must use Kopia. The `nodeAgent` deploys a daemon set, which means that the `nodeAgent` pods run on each worker node. You can configure File System Backup by adding `spec.defaultVolumesToFsBackup: true` to the `Backup` CR.
  - `nodeSelector`: Specifies the nodes on which Kopia or Restic are available. By default, Kopia or Restic run on all nodes.
  - `provider`: Specifies the backup provider.
  - `name`: Specifies the correct default name for the `Secret`, for example, `cloud-credentials-gcp`, if you use a default plugin for the backup provider. If specifying a custom name, then the custom name is used for the backup location. If you do not specify a `Secret` name, the default name is used.
  - `bucket`: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
  - `prefix`: Specifies a prefix for Velero backups, for example, `velero`, if the bucket is used for multiple purposes.
- Click Create.
Verification
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:

      $ oc get all -n openshift-adp
      NAME                                                     READY   STATUS    RESTARTS   AGE
      pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
      pod/node-agent-9cq4q                                     1/1     Running   0          94s
      pod/node-agent-m4lts                                     1/1     Running   0          94s
      pod/node-agent-pv4kr                                     1/1     Running   0          95s
      pod/velero-588db7f655-n842v                              1/1     Running   0          95s

      NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
      service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
      service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

      NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

      NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/oadp-operator-controller-manager   1/1     1            1           2m9s
      deployment.apps/velero                             1/1     1            1           96s

      NAME                                                          DESIRED   CURRENT   READY   AGE
      replicaset.apps/oadp-operator-controller-manager-67d9494d47   1         1         1       2m9s
      replicaset.apps/velero-588db7f655                             1         1         1       96s

- Verify that the `DataProtectionApplication` (DPA) is reconciled by running the following command:

      $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
      {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}

- Verify the `type` is set to `Reconciled`.
- Verify the backup storage location and confirm that the `PHASE` is `Available` by running the following command:

      $ oc get backupstoragelocations.velero.io -n openshift-adp
      NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
      dpa-sample-1   Available   1s               3d16h   true

- Verify that the `PHASE` is `Available`.
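The reconciliation check can also be done in a script instead of by eye. The following is a rough sketch under one loud assumption: the status JSON has the same shape and key order as the example output in the verification steps, with a single condition. The `dpa_is_reconciled` helper is hypothetical and is not an OADP command.

```shell
#!/bin/sh
# Sketch: succeed only if the DPA status JSON reports Reconciled=True.
# Assumption: keys appear in the order shown in the example output.
dpa_is_reconciled() {
  status_json="$(cat)"  # read the jsonpath output from stdin
  case "$status_json" in
    *'"status":"True","type":"Reconciled"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a cluster:
#   oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}' | dpa_is_reconciled \
#     && echo "DPA reconciled"
```

A JSON-aware tool would be more robust than string matching if the status ever carries several conditions.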
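4.12.1.3. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the `velero` server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the `velero` server by configuring the Data Protection Application (DPA). Use the `dpa.configuration.velero.client-burst` and `dpa.configuration.velero.client-qps` fields of the DPA to set the burst and QPS values.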
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the `client-burst` and the `client-qps` fields in the DPA as shown in the following example:

  Example Data Protection Application

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: test-dpa
        namespace: openshift-adp
      spec:
        backupLocations:
          - name: default
            velero:
              config:
                insecureSkipTLSVerify: "true"
                profile: "default"
                region: <bucket_region>
                s3ForcePathStyle: "true"
                s3Url: <bucket_url>
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: <bucket_name>
                prefix: velero
              provider: aws
        configuration:
          nodeAgent:
            enable: true
            uploaderType: restic
          velero:
            client-burst: 500
            client-qps: 300
            defaultPlugins:
              - openshift
              - aws
              - kubevirt

  where:
  - `client-burst`: Specifies the `client-burst` value. In this example, the `client-burst` field is set to 500.
  - `client-qps`: Specifies the `client-qps` value. In this example, the `client-qps` field is set to 300.
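To get a feel for how the two values interact, the sketch below estimates how long a batch of requests would take. It assumes a simplified model in which the first `client-burst` requests go out immediately and the rest are sent at exactly `client-qps` per second; the real limiter may behave differently, and the `seconds_to_send` helper is purely illustrative.

```shell
#!/bin/sh
# Sketch: whole seconds needed to send N requests under a simplified burst/QPS model.
# Assumption: burst requests are free, the remainder drains at qps per second.
seconds_to_send() {
  # $1: total requests, $2: client-burst, $3: client-qps
  remaining=$(( $1 - $2 ))
  if [ "$remaining" -le 0 ]; then
    echo 0
  else
    # Integer division rounded up.
    echo $(( (remaining + $3 - 1) / $3 ))
  fi
}

seconds_to_send 2000 500 300   # prints 5
```

With the example values, 2000 requests leave 1500 after the burst, and 1500 / 300 gives 5 seconds.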
4.12.1.4. Configuring node agents and node labels
The Data Protection Application (DPA) uses the `nodeSelector` field to select which nodes can run the node agent. The `nodeSelector` field is the recommended form of node selection constraint.
Procedure
- Run the node agent on any node that you choose by adding a custom label:

      $ oc label node/<node_name> node-role.kubernetes.io/nodeAgent=""

  Note: Any label specified must match the labels on each node.
- Use the same custom label in the `DPA.spec.configuration.nodeAgent.podConfig.nodeSelector` field, which you used for labeling nodes:

      configuration:
        nodeAgent:
          enable: true
          podConfig:
            nodeSelector:
              node-role.kubernetes.io/nodeAgent: ""

  The following example is an anti-pattern of `nodeSelector` and does not work unless both labels, `node-role.kubernetes.io/infra: ""` and `node-role.kubernetes.io/worker: ""`, are on the node:

      configuration:
        nodeAgent:
          enable: true
          podConfig:
            nodeSelector:
              node-role.kubernetes.io/infra: ""
              node-role.kubernetes.io/worker: ""
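The reason the two-label selector is an anti-pattern is that every label in `nodeSelector` must be present on the node. A small sketch of that "all labels must match" rule follows; the `node_matches` helper and the space-separated `key=value` format are assumptions made for illustration, not how the scheduler is implemented.

```shell
#!/bin/sh
# Sketch: a node matches only if it carries every label in the selector.
# Labels are passed as space-separated key=value words (illustrative format).
node_matches() {
  # $1: labels on the node, $2: labels required by nodeSelector
  node_labels=" $1 "
  for want in $2; do
    case "$node_labels" in
      *" $want "*) ;;      # label present, keep checking
      *) return 1 ;;       # one missing label rejects the node
    esac
  done
  return 0
}

# A worker node satisfies a single-label selector but not the infra+worker pair.
worker="node-role.kubernetes.io/worker= kubernetes.io/os=linux"
node_matches "$worker" "node-role.kubernetes.io/worker=" && echo "match"
node_matches "$worker" "node-role.kubernetes.io/infra= node-role.kubernetes.io/worker=" || echo "no match"
```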
4.12.1.5. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the `imagePullPolicy` field of the Velero and node agent pods to `Always` for all images.
In OADP 1.4.1 or later, the Operator first checks if each image has the `sha256` or `sha512` digest and sets the `imagePullPolicy` field accordingly:
- If the image has the digest, the Operator sets `imagePullPolicy` to `IfNotPresent`.
- If the image does not have the digest, the Operator sets `imagePullPolicy` to `Always`.
You can also override the `imagePullPolicy` field by using the `spec.imagePullPolicy` field in the Data Protection Application (DPA).
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the `spec.imagePullPolicy` field in the DPA as shown in the following example:

  Example Data Protection Application

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: test-dpa
        namespace: openshift-adp
      spec:
        backupLocations:
          - name: default
            velero:
              config:
                insecureSkipTLSVerify: "true"
                profile: "default"
                region: <bucket_region>
                s3ForcePathStyle: "true"
                s3Url: <bucket_url>
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: <bucket_name>
                prefix: velero
              provider: aws
        configuration:
          nodeAgent:
            enable: true
            uploaderType: kopia
          velero:
            defaultPlugins:
              - openshift
              - aws
              - kubevirt
              - csi
        imagePullPolicy: Never

  where:
  - `imagePullPolicy`: Specifies the value for `imagePullPolicy`. In this example, the `imagePullPolicy` field is set to `Never`.
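The default behavior described for OADP 1.4.1 and later can be mirrored in a few lines. This sketch only restates the rule from the text; it is not the Operator's code, and the `default_pull_policy` helper and image references are hypothetical.

```shell
#!/bin/sh
# Sketch: pick the default pull policy from the image reference.
# Mirrors the documented rule only: a digest means IfNotPresent, otherwise Always.
default_pull_policy() {
  # $1: image reference (hypothetical examples below)
  case "$1" in
    *@sha256:*|*@sha512:*) echo IfNotPresent ;;
    *) echo Always ;;
  esac
}

default_pull_policy "registry.example.com/oadp/velero@sha256:0123abcd"
default_pull_policy "registry.example.com/oadp/velero:latest"
```

Setting `spec.imagePullPolicy` in the DPA replaces this default for all images.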
4.12.1.5.1. Creating an Object Bucket Claim for disaster recovery on OpenShift Data Foundation
If you use cluster storage for your Multicloud Object Gateway (MCG) bucket `backupStorageLocation` on OpenShift Data Foundation, create an Object Bucket Claim (OBC) by using the OpenShift web console.
Warning: Failure to configure an Object Bucket Claim (OBC) might lead to backups not being available.
Unless specified otherwise, "NooBaa" refers to the open source project that provides lightweight object storage, while "Multicloud Object Gateway (MCG)" refers to the Red Hat distribution of NooBaa.
For more information on the MCG, see Accessing the Multicloud Object Gateway with your applications.
Procedure
- Create an Object Bucket Claim (OBC) using the OpenShift web console as described in Creating an Object Bucket Claim using the OpenShift Web Console.
4.12.1.5.2. Enabling CSI in the DataProtectionApplication CR
You enable the Container Storage Interface (CSI) in the `DataProtectionApplication` custom resource (CR) to back up persistent volumes with CSI snapshots.
Prerequisites
- The cloud provider must support CSI snapshots.
Procedure
- Edit the `DataProtectionApplication` CR, as in the following example:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      # ...
      spec:
        configuration:
          velero:
            defaultPlugins:
              - openshift
              - csi

  where:
  - `csi`: Specifies the `csi` default plugin.
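4.12.1.5.3. Disabling the node agent in DataProtectionApplication
If you are not using `Restic`, `Kopia`, or `DataMover` for your backups, you can disable the `nodeAgent` field in the `DataProtectionApplication` custom resource (CR). Before you disable `nodeAgent`, ensure the OADP Operator is idle and not running any backups.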
Procedure
- To disable the `nodeAgent`, set the `enable` flag to `false`. See the following example:

  Example `DataProtectionApplication` CR

      # ...
      configuration:
        nodeAgent:
          enable: false
          uploaderType: kopia
      # ...

  where:
  - `enable`: Set to `false` to disable the node agent.
- To enable the `nodeAgent`, set the `enable` flag to `true`. See the following example:

  Example `DataProtectionApplication` CR

      # ...
      configuration:
        nodeAgent:
          enable: true
          uploaderType: kopia
      # ...

  where:
  - `enable`: Set to `true` to enable the node agent.

You can set up a job to enable and disable the `nodeAgent` field in the `DataProtectionApplication` CR. For more information, see "Running tasks in pods using jobs".
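Whether the toggle is run by hand or from a job, one possible way to flip the flag without editing the whole manifest is a merge patch. The following is a sketch under assumptions: the `node_agent_patch` helper is hypothetical, and `<dpa_name>` is a placeholder for your DPA.

```shell
#!/bin/sh
# Sketch: build the merge-patch body that sets spec.configuration.nodeAgent.enable.
node_agent_patch() {
  # $1: true or false
  printf '{"spec":{"configuration":{"nodeAgent":{"enable":%s}}}}' "$1"
}

# Usage against a cluster (DPA name is a placeholder):
#   oc patch dpa <dpa_name> -n openshift-adp --type merge -p "$(node_agent_patch false)"
```

As noted above, confirm that no backups are running before you apply the `false` variant.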
4.13. Configuring OADP with OpenShift Virtualization
4.13.1. Configuring the OpenShift API for Data Protection with OpenShift Virtualization
You can install the OpenShift API for Data Protection (OADP) with OpenShift Virtualization by installing the OADP Operator and configuring a backup location. Then, you can install the Data Protection Application.
Back up and restore virtual machines by using the OpenShift API for Data Protection.
OpenShift API for Data Protection with OpenShift Virtualization supports the following backup and restore storage options:
- Container Storage Interface (CSI) backups
- Container Storage Interface (CSI) backups with DataMover
The following storage options are excluded:
- File system backup and restore
- Volume snapshot backups and restores
For more information, see Backing up applications with File System Backup: Kopia or Restic.
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. See Using Operator Lifecycle Manager on restricted networks for details.
Red Hat only supports the combination of OADP versions 1.3.0 and later, and OpenShift Virtualization versions 4.14 and later.
OADP versions before 1.3.0 are not supported for backup and restore of OpenShift Virtualization.
4.13.1.1. Installing and configuring OADP with OpenShift Virtualization
As a cluster administrator, you install OADP by installing the OADP Operator.
The latest version of the OADP Operator installs Velero 1.14.
Prerequisites
- Access to the cluster as a user with the `cluster-admin` role.
Procedure
- Install the OADP Operator according to the instructions for your storage provider.
- Install the Data Protection Application (DPA) with the `kubevirt` and `openshift` OADP plugins.
- Back up virtual machines by creating a `Backup` custom resource (CR).
  Warning: Red Hat support is limited to only the following options:
  - CSI backups
  - CSI backups with DataMover
- You restore the `Backup` CR by creating a `Restore` CR.
4.13.1.2. Installing the Data Protection Application
You install the Data Protection Application (DPA) by creating an instance of the `DataProtectionApplication` API.
Prerequisites
- You must install the OADP Operator.
- You must configure object storage as a backup location.
- If you use snapshots to back up PVs, your cloud provider must support either a native snapshot API or Container Storage Interface (CSI) snapshots.
- If the backup and snapshot locations use the same credentials, you must create a `Secret` with the default name, `cloud-credentials`.
Note: If you do not want to specify backup or snapshot locations during the installation, you can create a default `Secret` with an empty `credentials-velero` file. If there is no default `Secret`, the installation will fail.
Procedure
- Click Operators → Installed Operators and select the OADP Operator.
- Under Provided APIs, click Create instance in the DataProtectionApplication box.
- Click YAML View and update the parameters of the `DataProtectionApplication` manifest:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: <dpa_sample>
        namespace: openshift-adp
      spec:
        configuration:
          velero:
            defaultPlugins:
              - kubevirt
              - gcp
              - csi
              - openshift
            resourceTimeout: 10m
          nodeAgent:
            enable: true
            uploaderType: kopia
            podConfig:
              nodeSelector: <node_selector>
        backupLocations:
          - velero:
              provider: gcp
              default: true
              credential:
                key: cloud
                name: <default_secret>
              objectStorage:
                bucket: <bucket_name>
                prefix: <prefix>

  where:
  - `namespace`: Specifies the default namespace for OADP, which is `openshift-adp`. The namespace is a variable and is configurable.
  - `kubevirt`: Specifies that the `kubevirt` plugin is mandatory for OpenShift Virtualization.
  - `gcp`: Specifies the plugin for the backup provider, for example, `gcp`, if it exists.
  - `csi`: Specifies that the `csi` plugin is mandatory for backing up PVs with CSI snapshots. The `csi` plugin uses the Velero CSI beta snapshot APIs. You do not need to configure a snapshot location.
  - `openshift`: Specifies that the `openshift` plugin is mandatory.
  - `resourceTimeout`: Specifies how many minutes to wait for several Velero resources, such as Velero CRD availability, volumeSnapshot deletion, and backup repository availability, before a timeout occurs. The default is 10m.
  - `nodeAgent`: Specifies the administrative agent that routes the administrative requests to servers.
  - `enable`: Set this value to `true` if you want to enable `nodeAgent` and perform File System Backup.
  - `uploaderType`: Specifies the uploader type. Enter `kopia` as your uploader to use the Built-in DataMover. The `nodeAgent` deploys a daemon set, which means that the `nodeAgent` pods run on each worker node. You can configure File System Backup by adding `spec.defaultVolumesToFsBackup: true` to the `Backup` CR.
  - `nodeSelector`: Specifies the nodes on which Kopia is available. By default, Kopia runs on all nodes.
  - `provider`: Specifies the backup provider.
  - `name`: Specifies the correct default name for the `Secret`, for example, `cloud-credentials-gcp`, if you use a default plugin for the backup provider. If specifying a custom name, then the custom name is used for the backup location. If you do not specify a `Secret` name, the default name is used.
  - `bucket`: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
  - `prefix`: Specifies a prefix for Velero backups, for example, `velero`, if the bucket is used for multiple purposes.
- Click Create.
Verification
- Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:

      $ oc get all -n openshift-adp
      NAME                                                     READY   STATUS    RESTARTS   AGE
      pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
      pod/node-agent-9cq4q                                     1/1     Running   0          94s
      pod/node-agent-m4lts                                     1/1     Running   0          94s
      pod/node-agent-pv4kr                                     1/1     Running   0          95s
      pod/velero-588db7f655-n842v                              1/1     Running   0          95s

      NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
      service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140   <none>        8443/TCP   2m8s
      service/openshift-adp-velero-metrics-svc                   ClusterIP   172.30.10.0     <none>        8085/TCP   8h

      NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      daemonset.apps/node-agent   3         3         3       3            3           <none>          96s

      NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/oadp-operator-controller-manager   1/1     1            1           2m9s
      deployment.apps/velero                             1/1     1            1           96s

      NAME                                                          DESIRED   CURRENT   READY   AGE
      replicaset.apps/oadp-operator-controller-manager-67d9494d47   1         1         1       2m9s
      replicaset.apps/velero-588db7f655                             1         1         1       96s

- Verify that the `DataProtectionApplication` (DPA) is reconciled by running the following command:

      $ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'
      {"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}

- Verify the `type` is set to `Reconciled`.
- Verify the backup storage location and confirm that the `PHASE` is `Available` by running the following command:

      $ oc get backupstoragelocations.velero.io -n openshift-adp
      NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
      dpa-sample-1   Available   1s               3d16h   true

- Verify that the `PHASE` is `Available`.
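When several backup storage locations exist, scanning the `PHASE` column by eye gets tedious. The sketch below filters the table output for rows that are not `Available`. It assumes the default column layout shown in the example output, with `PHASE` as the second column; the `unavailable_bsls` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: list backup storage locations whose PHASE is not Available.
# Assumption: default table output, NAME in column 1 and PHASE in column 2.
unavailable_bsls() {
  # Skip the header row, then print the name of every non-Available row.
  awk 'NR > 1 && $2 != "Available" { print $1 }'
}

# Usage against a cluster:
#   oc get backupstoragelocations.velero.io -n openshift-adp | unavailable_bsls
```

An empty result means every listed location reported `Available`.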
If you run a backup of a Microsoft Windows virtual machine (VM) immediately after the VM reboots, the backup might fail with a `PartiallyFailed` error. In that case, retry the backup a few minutes after the VM boots.
4.13.1.3. Backing up a single VM
If you have a namespace with multiple virtual machines (VMs), and want to back up only one of them, you can use the label selector to filter the VM that needs to be included in the backup. You can filter the VM by using the `app: vmname` label.
Prerequisites
- You have installed the OADP Operator.
- You have multiple VMs running in a namespace.
- You have added the `kubevirt` plugin in the `DataProtectionApplication` (DPA) custom resource (CR).
- You have configured the `BackupStorageLocation` CR in the `DataProtectionApplication` CR and `BackupStorageLocation` is available.
Procedure
- Configure the `Backup` CR as shown in the following example:

  Example `Backup` CR

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: vmbackupsingle
        namespace: openshift-adp
      spec:
        snapshotMoveData: true
        includedNamespaces:
        - <vm_namespace>
        labelSelector:
          matchLabels:
            app: <vm_app_name>
        storageLocation: <backup_storage_location_name>

  where:
  - `vm_namespace`: Specifies the name of the namespace where you have created the VMs.
  - `vm_app_name`: Specifies the VM name that needs to be backed up.
  - `backup_storage_location_name`: Specifies the name of the `BackupStorageLocation` CR.
- To create a `Backup` CR, run the following command:

      $ oc apply -f <backup_cr_file_name>

  where:
  - `backup_cr_file_name`: Specifies the name of the `Backup` CR file.
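If you back up single VMs often, the `Backup` CR from this procedure can be rendered from a small template instead of being edited by hand. This is a sketch only: the `render_vm_backup` helper, the output file name, and the example values (`vm-demo`, `test-vm`, `dpa-sample-1`) are hypothetical.

```shell
#!/bin/sh
# Sketch: render the single-VM Backup CR from three parameters.
render_vm_backup() {
  # $1: VM namespace, $2: value of the VM's app label, $3: BackupStorageLocation name
  cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: vmbackupsingle
  namespace: openshift-adp
spec:
  snapshotMoveData: true
  includedNamespaces:
  - $1
  labelSelector:
    matchLabels:
      app: $2
  storageLocation: $3
EOF
}

# Example with hypothetical values; the file can then be passed to oc apply -f.
render_vm_backup vm-demo test-vm dpa-sample-1 > vm-backup.yaml
```

The rendered file is the `<backup_cr_file_name>` argument for the `oc apply -f` command in the procedure.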
4.13.1.4. Restoring a single VM
After you have backed up a single virtual machine (VM) by using the label selector in the `Backup` custom resource (CR), you can create a `Restore` CR and point it to the backup. This restore operation restores a single VM.
Prerequisites
- You have installed the OADP Operator.
- You have backed up a single VM by using the label selector.
Procedure
- Configure the `Restore` CR as shown in the following example:

  Example `Restore` CR

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: vmrestoresingle
        namespace: openshift-adp
      spec:
        backupName: vmbackupsingle
        restorePVs: true

  where:
  - `vmbackupsingle`: Specifies the name of the backup of a single VM.
- To restore the single VM, run the following command:

      $ oc apply -f <restore_cr_file_name>

  where:
  - `restore_cr_file_name`: Specifies the name of the `Restore` CR file.

Note: When you restore a backup of VMs, you might notice that the Ceph storage capacity allocated for the restore is higher than expected. This behavior is observed only during the `kubevirt` restore and if the volume type of the VM is `block`.
Use the `rbd sparsify` tool to reclaim space on target volumes. For more details, see Reclaiming space on target volumes.
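4.13.1.5. Restoring a single VM from a backup of multiple VMs
If you have a backup containing multiple virtual machines (VMs), and you want to restore only one VM, you can use the `LabelSelectors` section in the `Restore` CR to select the VM to restore. To ensure that the persistent volume claim (PVC) attached to the VM is correctly restored, and the restored VM is not stuck in a `Provisioning` status, use both the `app: <vm_name>` and the `kubevirt.io/created-by` labels. To match the `kubevirt.io/created-by` label, use the UID of `DataVolume` of the VM.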
Prerequisites
- You have installed the OADP Operator.
- You have labeled the VMs that need to be backed up.
- You have a backup of multiple VMs.
Procedure
- Before you take a backup of many VMs, ensure that the VMs are labeled by running the following command:

      $ oc label vm <vm_name> app=<vm_name> -n openshift-adp

- Configure the label selectors in the `Restore` CR as shown in the following example:

  Example `Restore` CR

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: singlevmrestore
        namespace: openshift-adp
      spec:
        backupName: multiplevmbackup
        restorePVs: true
        LabelSelectors:
          - matchLabels:
              kubevirt.io/created-by: <datavolume_uid>
          - matchLabels:
              app: <vm_name>

  where:
  - `datavolume_uid`: Specifies the UID of `DataVolume` of the VM that you want to restore. For example, `b6…53a-ddd7-4d9d-9407-a0c…e5`.
  - `vm_name`: Specifies the name of the VM that you want to restore. For example, `test-vm`.
- To restore a VM, run the following command:

      $ oc apply -f <restore_cr_file_name>

  where:
  - `restore_cr_file_name`: Specifies the name of the `Restore` CR file.
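4.13.1.6. Configuring the DPA with client burst and QPS settings
The burst setting determines how many requests can be sent to the `velero` server before the limit is applied. After the burst limit is reached, the queries per second (QPS) setting determines how many additional requests can be sent per second.
You can set the burst and QPS values of the `velero` server by configuring the Data Protection Application (DPA). Use the `dpa.configuration.velero.client-burst` and `dpa.configuration.velero.client-qps` fields of the DPA to set the burst and QPS values.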
Prerequisites
- You have installed the OADP Operator.
Procedure
- Configure the `client-burst` and the `client-qps` fields in the DPA as shown in the following example:

  Example Data Protection Application

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: test-dpa
        namespace: openshift-adp
      spec:
        backupLocations:
          - name: default
            velero:
              config:
                insecureSkipTLSVerify: "true"
                profile: "default"
                region: <bucket_region>
                s3ForcePathStyle: "true"
                s3Url: <bucket_url>
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: <bucket_name>
                prefix: velero
              provider: aws
        configuration:
          nodeAgent:
            enable: true
            uploaderType: restic
          velero:
            client-burst: 500
            client-qps: 300
            defaultPlugins:
              - openshift
              - aws
              - kubevirt

  where:
  - `client-burst`: Specifies the `client-burst` value. In this example, the `client-burst` field is set to 500.
  - `client-qps`: Specifies the `client-qps` value. In this example, the `client-qps` field is set to 300.
4.13.1.7. Overriding the imagePullPolicy setting in the DPA
In OADP 1.4.0 or earlier, the Operator sets the imagePullPolicy field to Always.
In OADP 1.4.1 or later, the Operator first checks whether each image has a sha256 or sha512 digest and sets the imagePullPolicy field accordingly:
- If the image has the digest, the Operator sets imagePullPolicy to IfNotPresent.
- If the image does not have the digest, the Operator sets imagePullPolicy to Always.
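The digest rule can be sketched as a small shell function. This only illustrates the decision described here and is not the Operator's actual code; the function name pull_policy_for_image and the image references are hypothetical.

```shell
# Sketch of the digest rule; not the Operator's implementation.
# pull_policy_for_image is a hypothetical helper name.
pull_policy_for_image() {
  case "$1" in
    *@sha256:*|*@sha512:*) echo "IfNotPresent" ;;  # image reference pinned by digest
    *)                     echo "Always" ;;        # tag-only image reference
  esac
}

# Hypothetical image references:
pull_policy_for_image "registry.example.com/oadp/velero@sha256:0123abcd"
pull_policy_for_image "registry.example.com/oadp/velero:latest"
```

An image pinned by digest cannot change behind its reference, so a locally cached copy is safe to reuse; a tag can be moved, which is the usual reason to pull it every time.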
You can also override the imagePullPolicy field by using the spec.imagePullPolicy field in the Data Protection Application (DPA).
Prerequisites
- You have installed the OADP Operator.
Procedure
Configure the spec.imagePullPolicy field in the DPA as shown in the following example:

Example Data Protection Application

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: test-dpa
  namespace: openshift-adp
spec:
  backupLocations:
    - name: default
      velero:
        config:
          insecureSkipTLSVerify: "true"
          profile: "default"
          region: <bucket_region>
          s3ForcePathStyle: "true"
          s3Url: <bucket_url>
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: <bucket_name>
          prefix: velero
        provider: aws
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
        - openshift
        - aws
        - kubevirt
        - csi
  imagePullPolicy: Never

where:
- imagePullPolicy: Specifies the value for imagePullPolicy. In this example, the imagePullPolicy field is set to Never.
4.13.1.8. About incremental back up support
OADP supports incremental backups of block and Filesystem persistent volumes.

| Volume mode | FSB - Restic | FSB - Kopia | CSI | CSI Data Mover |
|---|---|---|---|---|
| Filesystem | S [1], I [2] | S [1], I [2] | S [1] | S [1], I [2] |
| Block | N [3] | N [3] | S [1] | S [1], I [2] |

| Volume mode | FSB - Restic | FSB - Kopia | CSI | CSI Data Mover |
|---|---|---|---|---|
| Filesystem | N [3] | N [3] | S [1] | S [1], I [2] |
| Block | N [3] | N [3] | S [1] | S [1], I [2] |

[1] Backup supported
[2] Incremental backup supported
[3] Not supported

The CSI Data Mover backups use Kopia regardless of the uploaderType setting.
4.14. Configuring OADP with multiple backup storage locations
4.14.1. Configuring the OpenShift API for Data Protection (OADP) with more than one Backup Storage Location
Configure multiple backup storage locations (BSLs) in the Data Protection Application (DPA) to store backups across different regions or storage providers. This provides flexibility and redundancy for your backup strategy.
OADP supports multiple credentials for configuring more than one BSL, so that you can specify the credentials to use with any BSL.
4.14.1.1. Configuring the DPA with more than one BSL
Configure the DataProtectionApplication custom resource (CR) with more than one BackupStorageLocation CR.
For example, you have configured the following two BSLs:
- Configured one BSL in the DPA and set it as the default BSL.
- Created another BSL independently by using the BackupStorageLocation CR.
Because you have already set the BSL created through the DPA as the default, you cannot also set the independently created BSL as the default. At any given time, only one BSL can be the default BSL.
Prerequisites
- You must install the OADP Operator.
- You must create the secrets by using the credentials provided by the cloud provider.
Procedure
Configure the DataProtectionApplication CR with more than one BackupStorageLocation CR. See the following example:

Example DPA

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
#...
backupLocations:
  - name: aws
    velero:
      provider: aws
      default: true
      objectStorage:
        bucket: <bucket_name>
        prefix: <prefix>
      config:
        region: <region_name>
        profile: "default"
      credential:
        key: cloud
        name: cloud-credentials
  - name: odf
    velero:
      provider: aws
      default: false
      objectStorage:
        bucket: <bucket_name>
        prefix: <prefix>
      config:
        profile: "default"
        region: <region_name>
        s3Url: <url>
        insecureSkipTLSVerify: "true"
        s3ForcePathStyle: "true"
      credential:
        key: cloud
        name: <custom_secret_name_odf>
#...

where:
- name: aws: Specifies a name for the first BSL.
- default: true: Indicates that this BSL is the default BSL. If a BSL is not set in the Backup CR, the default BSL is used. You can set only one BSL as the default.
- <bucket_name>: Specifies the bucket name.
- <prefix>: Specifies a prefix for Velero backups. For example, velero.
- <region_name>: Specifies the AWS region for the bucket.
- cloud-credentials: Specifies the name of the default Secret object that you created.
- name: odf: Specifies a name for the second BSL.
- <url>: Specifies the URL of the S3 endpoint.
- <custom_secret_name_odf>: Specifies the correct name for the Secret. For example, custom_secret_name_odf. If you do not specify a Secret name, the default name is used.

Specify the BSL to be used in the backup CR. See the following example.

Example backup CR

apiVersion: velero.io/v1
kind: Backup
# ...
spec:
  includedNamespaces:
    - <namespace>
  storageLocation: <backup_storage_location>
  defaultVolumesToFsBackup: true

where:
- <namespace>: Specifies the namespace to back up.
- <backup_storage_location>: Specifies the storage location.
4.14.1.2. Configuring two backup BSLs with different cloud credentials
Configure two backup storage locations with different cloud credentials to back up applications to multiple storage targets. With this setup, you can distribute backups across different storage providers for redundancy.
Prerequisites
- You must install the OADP Operator.
- You must configure two backup storage locations: AWS S3 and Multicloud Object Gateway (MCG).
- You must have an application with a database deployed on a Red Hat OpenShift cluster.
Procedure
Create the first Secret for the AWS S3 storage provider with the default name by running the following command:

$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=<aws_credentials_file_name>

where:
- <aws_credentials_file_name>: Specifies the name of the cloud credentials file for AWS S3.

Create the second Secret for MCG with a custom name by running the following command:

$ oc create secret generic mcg-secret -n openshift-adp --from-file cloud=<MCG_credentials_file_name>

where:
- <MCG_credentials_file_name>: Specifies the name of the cloud credentials file for MCG. Note the name of the mcg-secret custom secret.

Configure the DPA with the two BSLs as shown in the following example.

Example DPA

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: two-bsl-dpa
  namespace: openshift-adp
spec:
  backupLocations:
    - name: aws
      velero:
        config:
          profile: default
          region: <region_name>
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: <bucket_name>
          prefix: velero
        provider: aws
    - name: mcg
      velero:
        config:
          insecureSkipTLSVerify: "true"
          profile: noobaa
          region: <region_name>
          s3ForcePathStyle: "true"
          s3Url: <s3_url>
        credential:
          key: cloud
          name: mcg-secret
        objectStorage:
          bucket: <bucket_name_mcg>
          prefix: velero
        provider: aws
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
        - openshift
        - aws

where:
- <region_name>: Specifies the AWS region for the bucket.
- <bucket_name>: Specifies the AWS S3 bucket name.
- region: <region_name> (in the mcg BSL): Specifies the region, following the naming convention in the MCG documentation.
- <s3_url>: Specifies the URL of the S3 endpoint for MCG.
- mcg-secret: Specifies the name of the custom secret for MCG storage.
- <bucket_name_mcg>: Specifies the MCG bucket name.

Create the DPA by running the following command:

$ oc create -f <dpa_file_name>

where:
- <dpa_file_name>: Specifies the file name of the DPA you configured.
Verify that the DPA has reconciled by running the following command:

$ oc get dpa -o yaml

Verify that the BSLs are available by running the following command:

$ oc get bsl

Example output

NAME   PHASE       LAST VALIDATED   AGE     DEFAULT
aws    Available   5s               3m28s   true
mcg    Available   5s               3m28s

Create a backup CR with the default BSL.

Note: In the following example, the storageLocation field is not specified in the backup CR.

Example backup CR

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup1
  namespace: openshift-adp
spec:
  includedNamespaces:
    - <mysql_namespace>
  defaultVolumesToFsBackup: true

where:
- <mysql_namespace>: Specifies the namespace for the application installed in the cluster.

Create a backup by running the following command:

$ oc apply -f <backup_file_name>

where:
- <backup_file_name>: Specifies the name of the backup CR file.

Verify that the backup completed with the default BSL by running the following command:

$ oc get backups.velero.io <backup_name> -o yaml

where:
- <backup_name>: Specifies the name of the backup.

Create a backup CR by using MCG as the BSL. In the following example, note that the second storageLocation value is specified at the time of backup CR creation. The example uses a different name from the first backup so that the two Backup objects do not collide.

Example backup CR

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup2
  namespace: openshift-adp
spec:
  includedNamespaces:
    - <mysql_namespace>
  storageLocation: mcg
  defaultVolumesToFsBackup: true

where:
- <mysql_namespace>: Specifies the namespace for the application installed in the cluster.
- mcg: Specifies the second storage location.

Create a second backup by running the following command:

$ oc apply -f <backup_file_name>

where:
- <backup_file_name>: Specifies the name of the backup CR file.

Verify that the backup completed with the storage location as MCG by running the following command:

$ oc get backups.velero.io <backup_name> -o yaml

where:
- <backup_name>: Specifies the name of the backup.
4.15. Configuring OADP with multiple Volume Snapshot Locations
4.15.1. Configuring the OpenShift API for Data Protection (OADP) with more than one Volume Snapshot Location
Configure multiple Volume Snapshot Locations (VSLs) in the Data Protection Application (DPA) to store volume snapshots across different cloud provider regions. This provides geographic redundancy and regional disaster recovery capabilities.
4.15.1.1. Configuring the DPA with more than one VSL
Configure the DataProtectionApplication custom resource (CR) with more than one Volume Snapshot Location (VSL).
Procedure
Configure the DPA CR with more than one VSL as shown in the following example:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
#...
snapshotLocations:
  - velero:
      config:
        profile: default
        region: <region>
      credential:
        key: cloud
        name: cloud-credentials
      provider: aws
  - velero:
      config:
        profile: default
        region: <region>
      credential:
        key: cloud
        name: <custom_credential>
      provider: aws
#...

where:
- <region>: Specifies the region. The snapshot location must be in the same region as the persistent volumes.
- <custom_credential>: Specifies the custom credential name.
4.16. Uninstalling OADP
4.16.1. Uninstalling the OpenShift API for Data Protection
You uninstall the OpenShift API for Data Protection (OADP) by deleting the OADP Operator. See Deleting Operators from a cluster for details.
4.17. OADP backing up
4.17.1. Backing up applications
Frequent backups might consume storage on the backup storage location. If you use non-local backups, for example, S3 buckets, check the frequency of backups, the retention time, and the amount of data in the persistent volumes (PVs). Because every backup remains until it expires, also check the time to live (TTL) setting of the schedule.
You can back up applications by creating a Backup custom resource (CR). The following points apply to the Backup CR:
- The Backup CR creates backup files for Kubernetes resources and internal images on S3 object storage.
- If you use Velero's snapshot feature to back up data stored on the persistent volume, only snapshot-related information is stored in the S3 bucket along with the OpenShift object data.
- If your cloud provider has a native snapshot API or supports CSI snapshots, the Backup CR backs up persistent volumes (PVs) by creating snapshots. For more information about working with CSI snapshots, see Backing up persistent volumes with CSI snapshots.
If the underlying storage or the backup bucket is part of the same cluster, the data might be lost in case of disaster.
For more information about CSI volume snapshots, see CSI volume snapshots.
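The CloudStorage API is a Technology Preview feature.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The CloudStorage API is a Technology Preview feature when you use a CloudStorage object and want OADP to use the CloudStorage API to automatically create an S3 bucket for use as a BackupStorageLocation.
The CloudStorage API supports manually creating a BackupStorageLocation object by specifying an existing S3 bucket. The CloudStorage API that creates an S3 bucket automatically is currently enabled only for AWS S3 storage.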
- If your cloud provider does not support snapshots or if your applications are on NFS data volumes, you can create backups by using Kopia or Restic. See Backing up applications with File System Backup: Kopia or Restic.
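…/.snapshot: read-only file system error
The …/.snapshot directory is a snapshot copy directory, which is used by several NFS servers. This directory has read-only access by default, so Velero cannot restore to this directory.
Do not give Velero write access to the .snapshot directory, and disable client access to this directory.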
The OpenShift API for Data Protection (OADP) does not support backing up volume snapshots that were created by other software.
4.17.1.1. Previewing resources before running backup and restore
OADP backs up application resources based on the type, namespace, or label. This means that you can view the resources after the backup is complete. Similarly, you can view the restored objects based on the namespace, persistent volume (PV), or label after a restore operation is complete. To preview the resources in advance, you can do a dry run of the backup and restore operations.
Prerequisites
- You have installed the OADP Operator.
Procedure
To preview the resources included in the backup before running the actual backup, run the following command:

$ velero backup create <backup-name> --snapshot-volumes=false

where:
- --snapshot-volumes: Specify the value of the --snapshot-volumes parameter as false.

To view more details about the backup resources, run the following command:

$ velero describe backup <backup_name> --details

where:
- <backup_name>: Specify the name of the backup.

To preview the resources included in the restore before running the actual restore, run the following command:

$ velero restore create --from-backup <backup-name>

where:
- <backup-name>: Specify the name of the backup created to review the backup resources.

Important: The velero restore create command creates restore resources in the cluster. You must delete the resources created as part of the restore after you review the resources.

To view more details about the restore resources, run the following command:

$ velero describe restore <restore_name> --details

where:
- <restore_name>: Specify the name of the restore.

You can create backup hooks to run commands before or after the backup operation. See Creating backup hooks.
You can schedule backups by creating a Schedule CR instead of a Backup CR. See Scheduling backups using Schedule CR.
4.17.1.2. Known issues
OpenShift Container Platform 4.14 enforces a pod security admission (PSA) policy that can hinder the readiness of pods during a Restic restore process.
This issue has been resolved in the OADP 1.1.6 and OADP 1.2.2 releases, therefore it is recommended that users upgrade to these releases.
For more information, see Restic restore partially failing on OCP 4.15 due to changed PSA policy.
4.17.2. Creating a Backup CR
You back up Kubernetes resources, internal images, and persistent volumes (PVs) by creating a Backup custom resource (CR).
Prerequisites
- You must install the OpenShift API for Data Protection (OADP) Operator.
- The DataProtectionApplication CR must be in a Ready state.
- Backup location prerequisites:
  - You must have S3 object storage configured for Velero.
  - You must have a backup location configured in the DataProtectionApplication CR.
- Snapshot location prerequisites:
  - Your cloud provider must have a native snapshot API or support Container Storage Interface (CSI) snapshots.
  - For CSI snapshots, you must create a VolumeSnapshotClass CR to register the CSI driver.
  - You must have a volume location configured in the DataProtectionApplication CR.
Procedure
Retrieve the backupStorageLocations CRs by entering the following command:

$ oc get backupstoragelocations.velero.io -n openshift-adp

Example output

NAMESPACE       NAME              PHASE       LAST VALIDATED   AGE   DEFAULT
openshift-adp   velero-sample-1   Available   11s              31m

Create a Backup CR, as in the following example:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup>
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  hooks: {}
  includedNamespaces:
    - <namespace> # 1
  includedResources: [] # 2
  excludedResources: [] # 3
  storageLocation: <velero-sample-1> # 4
  ttl: 720h0m0s # 5
  labelSelector: # 6
    matchLabels:
      app: <label_1>
      app: <label_2>
      app: <label_3>
  orLabelSelectors: # 7
    - matchLabels:
        app: <label_1>
        app: <label_2>
        app: <label_3>

- 1: Specify an array of namespaces to back up.
- 2: Optional: Specify an array of resources to include in the backup. Resources might be shortcuts (for example, 'po' for 'pods') or fully-qualified. If unspecified, all resources are included.
- 3: Optional: Specify an array of resources to exclude from the backup. Resources might be shortcuts (for example, 'po' for 'pods') or fully-qualified.
- 4: Specify the name of the backupStorageLocations CR.
- 5: The ttl field defines the retention time of the created backup and the backed up data. For example, if you are using Restic as the backup tool, the backed up data items and data contents of the persistent volumes (PVs) are stored until the backup expires. Storing this data consumes space in the target backup locations. Frequent backups consume additional storage because new backups are created before earlier, unexpired backups time out.
- 6: Map of {key,value} pairs of backup resources that have all the specified labels.
- 7: Map of {key,value} pairs of backup resources that have one or more of the specified labels.

Verification
Verify that the status of the Backup CR is Completed:

$ oc get backups.velero.io -n openshift-adp <backup> -o jsonpath='{.status.phase}'
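Because a backup takes time to finish, the verification command usually has to be repeated. The following is a minimal polling sketch built around that same command. The helper name wait_for_backup, the default of 60 attempts, and the 10-second delay are assumptions for illustration; the failure phase names (Failed, PartiallyFailed, FailedValidation) are taken from Velero and are not listed in this section.

```shell
# Hypothetical helper: poll the Backup CR until it reaches a terminal phase.
# Arguments: backup name, number of polls (assumed default 60),
# seconds between polls (assumed default 10).
wait_for_backup() {
  name=$1
  attempts=${2:-60}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase=$(oc get backups.velero.io -n openshift-adp "$name" -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed) echo "$phase"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation) echo "$phase"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timeout"
  return 2
}

# Usage: wait_for_backup <backup>
```

The function prints the final phase and returns a nonzero status for anything other than Completed, so it can gate a later step in a script.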
4.17.3. Backing up persistent volumes with CSI snapshots
You back up persistent volumes with Container Storage Interface (CSI) snapshots by editing the VolumeSnapshotClass custom resource (CR) of the cloud storage before you create the Backup CR.
For more information, see Creating a Backup CR.
4.17.3.1. Backing up persistent volumes with CSI snapshots
Prerequisites
- The cloud provider must support CSI snapshots.
- You must enable CSI in the DataProtectionApplication CR.
Procedure
Add the metadata.labels.velero.io/csi-volumesnapshot-class: "true" key-value pair to the VolumeSnapshotClass CR:

Example configuration file

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: <volume_snapshot_class_name>
  labels:
    velero.io/csi-volumesnapshot-class: "true" # 1
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true" # 2
driver: <csi_driver>
deletionPolicy: <deletion_policy_type> # 3

- 1: Must be set to true.
- 2: If you are restoring this volume in another cluster with the same driver, make sure that you set the snapshot.storage.kubernetes.io/is-default-class parameter to false instead of setting it to true. Otherwise, the restore will partially fail. Annotation values are strings, so quote the value.
- 3: OADP supports the Retain and Delete deletion policy types for CSI and Data Mover backup and restore.

Next steps
- You can now create a Backup CR.
4.17.4. Backing up applications with File System Backup: Kopia or Restic
You can use OADP to back up and restore Kubernetes volumes attached to pods from the file system of the volumes. This process is called File System Backup (FSB) or Pod Volume Backup (PVB). It is accomplished by using modules from the open source backup tools Restic or Kopia.
If your cloud provider does not support snapshots or if your applications are on NFS data volumes, you can create backups by using FSB.
FSB integration with OADP provides a solution for backing up and restoring almost any type of Kubernetes volumes. This integration is an additional capability of OADP and is not a replacement for existing functionality.
You back up Kubernetes resources, internal images, and persistent volumes with Kopia or Restic by editing the Backup custom resource (CR).
You do not need to specify a snapshot location in the DataProtectionApplication CR.
In OADP version 1.3 and later, you can use either Kopia or Restic for backing up applications.
For the Built-in DataMover, you must use Kopia.
In OADP version 1.2 and earlier, you can only use Restic for backing up applications.
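…/.snapshot: read-only file system error
The …/.snapshot directory is a snapshot copy directory, which is used by several NFS servers. This directory has read-only access by default, so Velero cannot restore to this directory.
Do not give Velero write access to the .snapshot directory, and disable client access to this directory.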
4.17.4.1. Backing up applications with File System Backup
Prerequisites
- You must install the OpenShift API for Data Protection (OADP) Operator.
- You must not disable the default nodeAgent installation by setting spec.configuration.nodeAgent.enable to false in the DataProtectionApplication CR.
- You must select Kopia or Restic as the uploader by setting spec.configuration.nodeAgent.uploaderType to kopia or restic in the DataProtectionApplication CR.
- The DataProtectionApplication CR must be in a Ready state.
Procedure
Create the Backup CR, as in the following example:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup>
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  defaultVolumesToFsBackup: true # 1
...

- 1: In OADP version 1.2 and later, add the defaultVolumesToFsBackup: true setting within the spec block. In OADP version 1.1, add defaultVolumesToRestic: true.
4.17.5. Creating backup hooks
When performing a backup, you can specify one or more commands to execute in a container within a pod, based on the pod being backed up.
You can configure the commands to run before any custom action processing (Pre hooks), or after all custom actions have been completed and any additional items specified by the custom action have been backed up (Post hooks).
You create backup hooks to run commands in a container in a pod by editing the Backup custom resource (CR).
Procedure
Add a hook to the spec.hooks block of the Backup CR, as in the following example:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup>
  namespace: openshift-adp
spec:
  hooks:
    resources:
      - name: <hook_name>
        includedNamespaces:
          - <namespace> # 1
        excludedNamespaces: # 2
          - <namespace>
        includedResources:
          - pods # 3
        excludedResources: [] # 4
        labelSelector: # 5
          matchLabels:
            app: velero
            component: server
        pre: # 6
          - exec:
              container: <container> # 7
              command:
                - /bin/uname # 8
                - -a
              onError: Fail # 9
              timeout: 30s # 10
        post: # 11
...

- 1: Optional: You can specify namespaces to which the hook applies. If this value is not specified, the hook applies to all namespaces.
- 2: Optional: You can specify namespaces to which the hook does not apply.
- 3: Currently, pods are the only supported resource that hooks can apply to.
- 4: Optional: You can specify resources to which the hook does not apply.
- 5: Optional: This hook only applies to objects matching the label. If this value is not specified, the hook applies to all objects.
- 6: Array of hooks to run before the backup.
- 7: Optional: If the container is not specified, the command runs in the first container in the pod.
- 8: This is the entry point for the init container being added.
- 9: Allowed values for error handling are Fail and Continue. The default is Fail.
- 10: Optional: How long to wait for the commands to run. The default is 30s.
- 11: This block defines an array of hooks to run after the backup, with the same parameters as the pre-backup hooks.
4.17.6. Scheduling backups using Schedule CR
The schedule operation allows you to create a backup of your data at a particular time, specified by a Cron expression.
You schedule backups by creating a Schedule custom resource (CR) instead of a Backup CR.
Leave enough time in your backup schedule for a backup to finish before another backup is created.
For example, if a backup of a namespace typically takes 10 minutes, do not schedule backups more frequently than every 15 minutes.
Prerequisites
- You must install the OpenShift API for Data Protection (OADP) Operator.
- The DataProtectionApplication CR must be in a Ready state.
Procedure
Retrieve the backupStorageLocations CRs:

$ oc get backupStorageLocations -n openshift-adp

Example output

NAMESPACE       NAME              PHASE       LAST VALIDATED   AGE   DEFAULT
openshift-adp   velero-sample-1   Available   11s              31m

Create a Schedule CR, as in the following example:

$ cat << EOF | oc apply -f -
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: <schedule>
  namespace: openshift-adp
spec:
  schedule: 0 7 * * * # 1
  template:
    hooks: {}
    includedNamespaces:
      - <namespace> # 2
    storageLocation: <velero-sample-1> # 3
    defaultVolumesToFsBackup: true # 4
    ttl: 720h0m0s # 5
EOF

Note: To schedule a backup at specific intervals, enter the <duration_in_minutes> in the following format:

schedule: "*/10 * * * *"

Enter the minutes value between quotation marks (" ").

- 1: cron expression to schedule the backup, for example, 0 7 * * * to perform a backup every day at 7:00.
- 2: Array of namespaces to back up.
- 3: Name of the backupStorageLocations CR.
- 4: Optional: In OADP version 1.2 and later, add the defaultVolumesToFsBackup: true key-value pair to your configuration when performing backups of volumes with Restic. In OADP version 1.1, add the defaultVolumesToRestic: true key-value pair when you back up volumes with Restic.
- 5: The ttl field defines the retention time of the created backup and the backed up data. For example, if you are using Restic as the backup tool, the backed up data items and data contents of the persistent volumes (PVs) are stored until the backup expires. Storing this data consumes space in the target backup locations. Frequent backups consume additional storage because new backups are created before earlier, unexpired backups time out.
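The interaction between schedule frequency and ttl can be estimated with simple arithmetic: roughly ttl divided by the schedule interval backups exist at the same time. The following sketch assumes a fixed interval in whole hours; retained_backups is a hypothetical helper name, and the result is an estimate only because expired backups are removed some time after their TTL elapses.

```shell
# Hypothetical helper: estimate how many unexpired backups coexist.
# Arguments: TTL in hours, schedule interval in hours (both whole numbers).
retained_backups() {
  ttl_hours=$1
  interval_hours=$2
  # Integer division rounded up.
  echo $(( (ttl_hours + interval_hours - 1) / interval_hours ))
}

# Daily schedule (0 7 * * *) with ttl: 720h0m0s
retained_backups 720 24
```

With the values from the Schedule example, about 30 backups are retained at once, which is the figure to multiply by the size of the backed up data when you size the backup location.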
Verification
Verify that the status of the Schedule CR is Completed after the scheduled backup runs:

$ oc get schedule -n openshift-adp <schedule> -o jsonpath='{.status.phase}'
4.17.7. Deleting backups
You can delete a backup by creating the DeleteBackupRequest custom resource (CR) or by running the velero backup delete command, as explained in the following procedures.
The volume backup artifacts are deleted at different times depending on the backup method:
- Restic: The artifacts are deleted in the next full maintenance cycle, after the backup is deleted.
- Container Storage Interface (CSI): The artifacts are deleted immediately when the backup is deleted.
- Kopia: The artifacts are deleted after three full maintenance cycles of the Kopia repository, after the backup is deleted.
4.17.7.1. Deleting a backup by creating a DeleteBackupRequest CR
You can delete a backup by creating a DeleteBackupRequest custom resource (CR).
Prerequisites
- You have run a backup of your application.
Procedure
Create a DeleteBackupRequest CR manifest file:

apiVersion: velero.io/v1
kind: DeleteBackupRequest
metadata:
  name: deletebackuprequest
  namespace: openshift-adp
spec:
  backupName: <backup_name> # 1

- 1: Specify the name of the backup.

Apply the DeleteBackupRequest CR to delete the backup:

$ oc apply -f <deletebackuprequest_cr_filename>
4.17.7.2. Deleting a backup by using the Velero CLI
You can delete a backup by using the Velero CLI.
Prerequisites
- You have run a backup of your application.
- You downloaded the Velero CLI and can access the Velero binary in your cluster.
Procedure
To delete the backup, run the following Velero command:

$ velero backup delete <backup_name> -n openshift-adp

where:
- <backup_name>: Specify the name of the backup.
4.17.7.3. About Kopia repository maintenance
There are two types of Kopia repository maintenance:
- Quick maintenance
  - Runs every hour to keep the number of index blobs (n) low. A high number of indexes negatively affects the performance of Kopia operations.
  - Does not delete any metadata from the repository without ensuring that another copy of the same metadata exists.
- Full maintenance
  - Runs every 24 hours to perform garbage collection of repository contents that are no longer needed.
  - snapshot-gc, a full maintenance task, finds all files and directory listings that are no longer accessible from snapshot manifests and marks them as deleted.
  - A full maintenance is a resource-costly operation, as it requires scanning all directories in all snapshots that are active in the cluster.
4.17.7.3.1. Kopia maintenance in OADP
The repo-maintain-job jobs are executed in the namespace where OADP is installed, as shown in the following example:
pod/repo-maintain-job-173...2527-2nbls 0/1 Completed 0 168m
pod/repo-maintain-job-173....536-fl9tm 0/1 Completed 0 108m
pod/repo-maintain-job-173...2545-55ggx 0/1 Completed 0 48m
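You can check the logs of the repo-maintain-job for more details about the cleanup and the removal of artifacts in the backup object storage. The repo-maintain-job logs include a note, as shown in the following example, when the next full maintenance cycle is due: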
not due for full maintenance cycle until 2024-00-00 18:29:4
Three successful executions of a full maintenance cycle are required for the objects to be deleted from the backup object storage. This means you can expect up to 72 hours for all the artifacts in the backup object storage to be deleted.
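The 72-hour figure follows directly from the two values stated in this section, as the following short calculation shows. The variable names are chosen for illustration only.

```shell
# Values taken from the text: full maintenance runs every 24 hours, and
# three successful full cycles are required before objects are deleted.
full_maintenance_interval_hours=24
required_full_cycles=3

max_deletion_wait_hours=$(( full_maintenance_interval_hours * required_full_cycles ))
echo "Artifacts can take up to ${max_deletion_wait_hours} hours to be deleted"
```

If a full maintenance run fails, it does not count toward the three cycles, so the actual wait can be longer than this upper estimate for the normal case.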
4.17.7.4. Deleting a backup repository
After you delete the backup, and after the Kopia repository maintenance cycles to delete the related artifacts are complete, the backup is no longer referenced by any metadata or manifest objects. You can then delete the backuprepository custom resource (CR) to complete the backup deletion process.
Prerequisites
- You have deleted the backup of your application.
- You have waited up to 72 hours after the backup is deleted. This time frame allows Kopia to run the repository maintenance cycles.
Procedure
To get the name of the backup repository CR for a backup, run the following command:

$ oc get backuprepositories.velero.io -n openshift-adp

To delete the backup repository CR, run the following command:

$ oc delete backuprepository <backup_repository_name> -n openshift-adp

where:
- <backup_repository_name>: Specify the name of the backup repository from the earlier step.
4.17.8. About Kopia
Kopia is a fast and secure open-source backup and restore tool that allows you to create encrypted snapshots of your data and save the snapshots to remote or cloud storage of your choice.
Kopia supports network and local storage locations, and many cloud or remote storage locations, including:
- Amazon S3 and any cloud storage that is compatible with S3
- Azure Blob Storage
- Google Cloud Storage platform
Kopia uses content-addressable storage for snapshots:
- Snapshots are always incremental; data that is already included in previous snapshots is not re-uploaded to the repository. A file is only uploaded to the repository again if it is modified.
- Stored data is deduplicated; if multiple copies of the same file exist, only one of them is stored.
- If files are moved or renamed, Kopia can recognize that they have the same content and does not upload them again.
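The effect of content-addressable storage can be illustrated with a toy store in which the object key is the hash of the file content. This is a sketch of the general idea only and does not reflect Kopia's actual repository format; store_dir, put_object, and the file names are hypothetical, and the sha256sum utility is assumed to be available.

```shell
# Toy content-addressable store; Kopia's real repository format differs.
store_dir=$(mktemp -d)

put_object() {
  # The object key is the SHA-256 of the file content, not the file name.
  key=$(sha256sum "$1" | cut -d ' ' -f 1)
  # Copy ("upload") only if no object with this content exists yet.
  [ -e "$store_dir/$key" ] || cp "$1" "$store_dir/$key"
  echo "$key"
}

work_dir=$(mktemp -d)
printf 'same data\n' > "$work_dir/a.txt"
cp "$work_dir/a.txt" "$work_dir/renamed.txt"    # renamed copy, same content
printf 'changed data\n' > "$work_dir/b.txt"     # different content

key_a=$(put_object "$work_dir/a.txt")
key_renamed=$(put_object "$work_dir/renamed.txt")
key_b=$(put_object "$work_dir/b.txt")

object_count=$(ls "$store_dir" | wc -l | tr -d ' ')
echo "$object_count objects stored for 3 files"
```

The renamed copy maps to the same key as the original, so it is not stored a second time; only the file with different content adds a new object.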
4.17.8.1. OADP integration with Kopia
OADP 1.3 supports Kopia as the backup mechanism for pod volume backup in addition to Restic. You must choose one or the other at installation by setting the uploaderType field in the DataProtectionApplication custom resource (CR). The possible values are restic or kopia. If you do not specify an uploaderType, OADP 1.3 defaults to using Kopia as the backup mechanism.
Using the Kopia client to modify the Kopia backup repositories is not supported and can affect the integrity of Kopia backups. OADP does not support directly connecting to the Kopia repository and can offer support only on a best-effort basis.
The following example shows a DataProtectionApplication CR configured for using Kopia:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
# ...
4.18. OADP restoring
4.18.1. Restoring applications
You restore application backups by creating a Restore custom resource (CR). See Creating a Restore CR.
You can create restore hooks to run commands in a container in a pod by editing the Restore CR.
4.18.1.1. Previewing resources before running backup and restore Copier lienLien copié sur presse-papiers!
OADP backs up application resources based on the type, namespace, or label. This means that you can view the resources after the backup is complete. Similarly, you can view the restored objects based on the namespace, persistent volume (PV), or label after a restore operation is complete. To preview the resources in advance, you can do a dry run of the backup and restore operations.
Prerequisites
- You have installed the OADP Operator.
Procedure
To preview the resources included in the backup before running the actual backup, run the following command:
$ velero backup create <backup_name> --snapshot-volumes=false
- --snapshot-volumes: Specify the value of the --snapshot-volumes parameter as false.
To view more details about the backup resources, run the following command:
$ velero describe backup <backup_name> --details
- <backup_name>: Specify the name of the backup.
To preview the resources included in the restore before running the actual restore, run the following command:
$ velero restore create --from-backup <backup_name>
- <backup_name>: Specify the name of the backup created to review the backup resources.
Important
The velero restore create command creates restore resources in the cluster. You must delete the resources created as part of the restore after you review them.
To view more details about the restore resources, run the following command:
$ velero describe restore <restore_name> --details
- <restore_name>: Specify the name of the restore.
4.18.1.2. Creating a Restore CR
You restore a Backup custom resource (CR) by creating a Restore CR.
When you restore a stateful application that uses the azurefile-csi storage class, the restore operation remains in the Finalizing phase.
Prerequisites
- You must install the OpenShift API for Data Protection (OADP) Operator.
- The DataProtectionApplication CR must be in a Ready state.
- You must have a Velero Backup CR.
- The persistent volume (PV) capacity must match the requested size at backup time. Adjust the requested size if needed.
Procedure
Create a Restore CR, as in the following example:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore>
  namespace: openshift-adp
spec:
  backupName: <backup> # <1>
  includedResources: [] # <2>
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  restorePVs: true # <3>
<1> Name of the Backup CR.
<2> Optional: Specify an array of resources to include in the restore process. Resources might be shortcuts (for example, po for pods) or fully-qualified. If unspecified, all resources are included.
<3> Optional: The restorePVs parameter can be set to false to turn off restore of PersistentVolumes from VolumeSnapshot of Container Storage Interface (CSI) snapshots or from native snapshots when VolumeSnapshotLocation is configured.
Verify that the status of the Restore CR is Completed by entering the following command:
$ oc get restores.velero.io -n openshift-adp <restore> -o jsonpath='{.status.phase}'
Verify that the backup resources have been restored by entering the following command:
$ oc get all -n <namespace>
- <namespace>: Namespace that you backed up.
If you restore DeploymentConfig objects with volumes or if you use post-restore hooks, run the dc-post-restore.sh cleanup script by entering the following command. The script takes the name of the restore as its only argument:
$ bash dc-post-restore.sh <restore_name>
Note
During the restore process, the OADP Velero plug-ins scale down the DeploymentConfig objects and restore the pods as standalone pods. This is done to prevent the cluster from deleting the restored DeploymentConfig pods immediately on restore and to allow the restore and post-restore hooks to complete their actions on the restored pods. The cleanup script shown below removes these disconnected pods and scales any DeploymentConfig objects back up to the appropriate number of replicas.
dc-post-restore.sh cleanup script (formerly dc-restic-post-restore.sh)
#!/bin/bash
set -e

# if sha256sum exists, use it to check the integrity of the file
if command -v sha256sum >/dev/null 2>&1; then
  CHECKSUM_CMD="sha256sum"
else
  CHECKSUM_CMD="shasum -a 256"
fi

# Derive a label value of at most 63 characters from the restore name.
label_name () {
    if [ "${#1}" -le "63" ]; then
        echo $1
        return
    fi
    sha=$(echo -n $1|$CHECKSUM_CMD)
    echo "${1:0:57}${sha:0:6}"
}

if [[ $# -ne 1 ]]; then
    echo "usage: ${BASH_SOURCE} restore-name"
    exit 1
fi

echo "restore: $1"

label=$(label_name $1)
echo "label: $label"

echo Deleting disconnected restore pods
oc delete pods --all-namespaces -l oadp.openshift.io/disconnected-from-dc=$label

# Scale each modified DeploymentConfig back to its original replicas and paused state.
for dc in $(oc get dc --all-namespaces -l oadp.openshift.io/replicas-modified=$label -o jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{","}{.metadata.annotations.oadp\.openshift\.io/original-replicas}{","}{.metadata.annotations.oadp\.openshift\.io/original-paused}{"\n"}')
do
    IFS=',' read -ra dc_arr <<< "$dc"
    if [ ${#dc_arr[0]} -gt 0 ]; then
        echo Found deployment ${dc_arr[0]}/${dc_arr[1]}, setting replicas: ${dc_arr[2]}, paused: ${dc_arr[3]}
        cat <<EOF | oc patch dc -n ${dc_arr[0]} ${dc_arr[1]} --patch-file /dev/stdin
spec:
  replicas: ${dc_arr[2]}
  paused: ${dc_arr[3]}
EOF
    fi
done
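The cleanup script selects pods and DeploymentConfig objects by a label derived from the restore name. The following sketch isolates that derivation so the result can be inspected without a cluster. It is a re-implementation for illustration, assuming the same rule as the script's label_name helper (names up to 63 characters are used as-is; longer names keep their first 57 characters plus the first 6 hex digits of their SHA-256). It uses cut instead of bash substring expansion, and the sample restore names are hypothetical.

```shell
#!/bin/sh
# Stand-alone version of the label_name helper from the cleanup script,
# written with cut so that it also runs under a plain POSIX shell.

if command -v sha256sum >/dev/null 2>&1; then
  CHECKSUM_CMD="sha256sum"
else
  CHECKSUM_CMD="shasum -a 256"
fi

label_name() {
  name="$1"
  # Kubernetes label values are limited to 63 characters.
  if [ "${#name}" -le 63 ]; then
    echo "$name"
    return
  fi
  # Keep the first 57 characters and append the first 6 hex digits of the
  # SHA-256 of the full name, which gives exactly 63 characters.
  sha=$(printf '%s' "$name" | $CHECKSUM_CMD | cut -c 1-6)
  prefix=$(printf '%s' "$name" | cut -c 1-57)
  echo "${prefix}${sha}"
}

# Hypothetical restore names, one short and one longer than 63 characters.
short_label=$(label_name "hello-world")
long_name="restore-0123456789-0123456789-0123456789-0123456789-0123456789-0123456789"
long_label=$(label_name "$long_name")
echo "short: $short_label"
echo "long:  $long_label (${#long_label} characters)"
```

Printing the label this way can help when checking by hand which objects the cleanup script would match, for example with `oc get pods --all-namespaces -l oadp.openshift.io/disconnected-from-dc=<label>`.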
4.18.1.3. Creating restore hooks
You create restore hooks to run commands in a container in a pod by editing the Restore custom resource (CR).
You can create two types of restore hooks:
- An init hook adds an init container to a pod to perform setup tasks before the application container starts.
  If you restore a Restic backup, the restic-wait init container is added before the restore hook init container.
- An exec hook runs commands or scripts in a container of a restored pod.
Procedure
Add a hook to the spec.hooks block of the Restore CR, as in the following example:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore>
  namespace: openshift-adp
spec:
  hooks:
    resources:
      - name: <hook_name>
        includedNamespaces:
        - <namespace> # <1>
        excludedNamespaces:
        - <namespace>
        includedResources:
        - pods # <2>
        excludedResources: []
        labelSelector: # <3>
          matchLabels:
            app: velero
            component: server
        postHooks:
        - init:
            initContainers:
            - name: restore-hook-init
              image: alpine:latest
              volumeMounts:
              - mountPath: /restores/pvc1-vm
                name: pvc1-vm
              command:
              - /bin/ash # <4>
              - -c
            timeout: # <5>
        - exec:
            container: <container> # <6>
            command:
            - /bin/bash
            - -c
            - "psql < /backup/backup.sql"
            waitTimeout: 5m # <7>
            execTimeout: 1m # <8>
            onError: Continue # <9>
<1> Optional: Array of namespaces to which the hook applies. If this value is not specified, the hook applies to all namespaces.
<2> Currently, pods are the only supported resource that hooks can apply to.
<3> Optional: This hook only applies to objects matching the label selector.
<4> This is the entrypoint for the init container being added.
<5> Optional: Timeout specifies the maximum length of time Velero waits for initContainers to complete.
<6> Optional: If the container is not specified, the command runs in the first container in the pod.
<7> Optional: How long to wait for a container to become ready. This should be long enough for the container to start and for any preceding hooks in the same container to complete. If not set, the restore process waits indefinitely.
<8> Optional: How long to wait for the commands to run. The default is 30s.
<9> Allowed values for error handling are Fail and Continue:
- Continue: Only command failures are logged.
- Fail: No more restore hooks run in any container in any pod. The status of the Restore CR will be PartiallyFailed.
Important
During a File System Backup (FSB) restore operation, a Deployment resource that references an ImageStream is not restored properly. The restored pod that runs the FSB and the postHook is terminated prematurely.
This happens because, during the restore operation, the OpenShift controller updates the spec.template.spec.containers[0].image field in the Deployment resource with an updated ImageStreamTag hash. The update triggers the rollout of a new pod, which terminates the pod on which velero runs the FSB and the post-restore hook.
The workaround for this behavior is a two-step restore process:
First, perform a restore excluding the Deployment resources, for example:
$ velero restore create <RESTORE_NAME> \
  --from-backup <BACKUP_NAME> \
  --exclude-resources=deployment.apps
After the first restore is successful, perform a second restore by including these resources, for example:
$ velero restore create <RESTORE_NAME> \
  --from-backup <BACKUP_NAME> \
  --include-resources=deployment.apps
4.19. OADP and ROSA
4.19.1. Backing up applications on ROSA clusters using OADP
Use OpenShift API for Data Protection (OADP) with Red Hat OpenShift Service on AWS (ROSA) clusters to back up and restore application data.
ROSA is a fully-managed, turnkey application platform that allows you to deliver value to your customers by building and deploying applications.
ROSA provides seamless integration with a wide range of Amazon Web Services (AWS) compute, database, analytics, machine learning, networking, mobile, and other services to speed up the building and delivery of differentiating experiences to your customers.
You can subscribe to the service directly from your AWS account.
After you create your clusters, you can operate your clusters with the OpenShift Container Platform web console or through Red Hat OpenShift Cluster Manager. You can also use ROSA with OpenShift APIs and command-line interface (CLI) tools.
For additional information about ROSA installation, see Installing Red Hat OpenShift Service on AWS (ROSA) interactive walk-through.
Before installing OpenShift API for Data Protection (OADP), you must set up role and policy credentials for OADP so that it can use the Amazon Web Services API.
This process is performed in the following two stages:
- Prepare AWS credentials
- Install the OADP Operator and give it an IAM role
4.19.1.1. Preparing AWS credentials for OADP
Prepare and configure an Amazon Web Services account to install OpenShift API for Data Protection (OADP).
Procedure
Create the following environment variables by running the following commands:
Important
Change the cluster name to match your ROSA cluster, and ensure you are logged into the cluster as an administrator. Verify that every field prints a value before continuing.
$ export CLUSTER_NAME=<my_cluster>
- <my_cluster>: Replace <my_cluster> with your cluster name.
$ export ROSA_CLUSTER_ID=$(rosa describe cluster -c ${CLUSTER_NAME} --output json | jq -r .id)
$ export REGION=$(rosa describe cluster -c ${CLUSTER_NAME} --output json | jq -r .region.id)
$ export OIDC_ENDPOINT=$(oc get authentication.config.openshift.io cluster -o jsonpath='{.spec.serviceAccountIssuer}' | sed 's|^https://||')
$ export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
$ export CLUSTER_VERSION=$(rosa describe cluster -c ${CLUSTER_NAME} -o json | jq -r .version.raw_id | cut -f -2 -d '.')
$ export ROLE_NAME="${CLUSTER_NAME}-openshift-oadp-aws-cloud-credentials"
$ export SCRATCH="/tmp/${CLUSTER_NAME}/oadp"
$ mkdir -p ${SCRATCH}
$ echo "Cluster ID: ${ROSA_CLUSTER_ID}, Region: ${REGION}, OIDC Endpoint: ${OIDC_ENDPOINT}, AWS Account ID: ${AWS_ACCOUNT_ID}"
On the AWS account, create an IAM policy to allow access to AWS S3:
Check to see if the policy exists by running the following command:
$ POLICY_ARN=$(aws iam list-policies --query "Policies[?PolicyName=='RosaOadpVer1'].{ARN:Arn}" --output text)
- RosaOadpVer1: Replace RosaOadpVer1 with your policy name.
Enter the following command to create the policy JSON file and then create the policy:
Note
If the policy ARN is not found, the command creates the policy. If the policy ARN already exists, the if statement intentionally skips the policy creation.
$ if [[ -z "${POLICY_ARN}" ]]; then
cat << EOF > ${SCRATCH}/policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:PutBucketTagging",
        "s3:GetBucketTagging",
        "s3:PutEncryptionConfiguration",
        "s3:GetEncryptionConfiguration",
        "s3:PutLifecycleConfiguration",
        "s3:GetLifecycleConfiguration",
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "ec2:DescribeSnapshots",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumeAttribute",
        "ec2:DescribeVolumesModifications",
        "ec2:DescribeVolumeStatus",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    }
  ]
}
EOF
POLICY_ARN=$(aws iam create-policy --policy-name "RosaOadpVer1" \
  --policy-document file:///${SCRATCH}/policy.json --query Policy.Arn \
  --tags Key=rosa_openshift_version,Value=${CLUSTER_VERSION} Key=rosa_role_prefix,Value=ManagedOpenShift Key=operator_namespace,Value=openshift-oadp Key=operator_name,Value=openshift-oadp \
  --output text)
fi
- SCRATCH: The name of the temporary directory created for storing the files.
View the policy ARN by running the following command:
$ echo ${POLICY_ARN}
Create an IAM role trust policy for the cluster:
Create the trust policy file by running the following command:
$ cat <<EOF > ${SCRATCH}/trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT}"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "${OIDC_ENDPOINT}:sub": [
          "system:serviceaccount:openshift-adp:openshift-adp-controller-manager",
          "system:serviceaccount:openshift-adp:velero"]
      }
    }
  }]
}
EOF
Create the role by running the following command:
$ ROLE_ARN=$(aws iam create-role --role-name \
  "${ROLE_NAME}" \
  --assume-role-policy-document file://${SCRATCH}/trust-policy.json \
  --tags Key=rosa_cluster_id,Value=${ROSA_CLUSTER_ID} Key=rosa_openshift_version,Value=${CLUSTER_VERSION} Key=rosa_role_prefix,Value=ManagedOpenShift Key=operator_namespace,Value=openshift-adp Key=operator_name,Value=openshift-oadp \
  --query Role.Arn --output text)
View the role ARN by running the following command:
$ echo ${ROLE_ARN}
Attach the IAM policy to the IAM role by running the following command:
$ aws iam attach-role-policy --role-name "${ROLE_NAME}" \
  --policy-arn ${POLICY_ARN}
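The preceding procedure depends on every exported variable holding a real value. A small guard such as the following sketch can catch an empty variable, or the literal string null that jq -r prints for a missing field, before any IAM resources are created. The require_vars function and the DEMO_ variables are hypothetical helpers for illustration and are not part of OADP, rosa, or the AWS CLI.

```shell
#!/bin/sh
# require_vars NAME...: succeed only if every named variable is set to a
# non-empty value other than the literal string "null" (which is what
# `jq -r` prints when a field is missing). Missing names go to stderr.
require_vars() {
  missing=0
  for var_name in "$@"; do
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ] || [ "$var_value" = "null" ]; then
      echo "missing or empty: $var_name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demo with placeholder values (hypothetical, for illustration only).
DEMO_REGION="us-east-1"
DEMO_ROLE_ARN=""
if require_vars DEMO_REGION DEMO_ROLE_ARN 2>/dev/null; then
  demo_result="all variables set"
else
  demo_result="at least one variable is empty"
fi
echo "$demo_result"
```

In the procedure, a call such as `require_vars CLUSTER_NAME ROSA_CLUSTER_ID REGION OIDC_ENDPOINT AWS_ACCOUNT_ID CLUSTER_VERSION ROLE_NAME SCRATCH || echo "fix the variables above before continuing"` could follow the export commands.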
4.19.1.2. Installing the OADP Operator and providing the IAM role
Install OpenShift API for Data Protection (OADP) on clusters with AWS STS. AWS Security Token Service (AWS STS) is a global web service that provides short-term credentials for IAM or federated users. OpenShift Container Platform with STS is the recommended credential mode.
Restic is unsupported.
Kopia file system backup (FSB) is supported when backing up file systems that do not have Container Storage Interface (CSI) snapshotting support.
Example file systems include the following:
- Amazon Elastic File System (EFS)
- Network File System (NFS)
- emptyDir volumes
- Local volumes
For backing up volumes, OADP on ROSA with AWS STS supports only native snapshots and Container Storage Interface (CSI) snapshots.
In an Amazon ROSA cluster that uses STS authentication, restoring backed-up data in a different AWS region is not supported.
The Data Mover feature is not currently supported in ROSA clusters. You can use native AWS S3 tools for moving data.
Prerequisites
- An OpenShift Container Platform ROSA cluster with the required access and tokens. For instructions, see the previous procedure Preparing AWS credentials for OADP. If you plan to use two different clusters for backing up and restoring, you must prepare AWS credentials, including ROLE_ARN, for each cluster.
Procedure
Create an OpenShift Container Platform secret from your AWS token file by entering the following commands:
Create the credentials file:
$ cat <<EOF > ${SCRATCH}/credentials
[default]
role_arn = ${ROLE_ARN}
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
region = <aws_region>
EOF
- <aws_region>: Replace <aws_region> with the AWS region to use for the STS endpoint.
Create a namespace for OADP:
$ oc create namespace openshift-adp
Create the OpenShift Container Platform secret:
$ oc -n openshift-adp create secret generic cloud-credentials \
  --from-file=${SCRATCH}/credentials
Note
In OpenShift Container Platform versions 4.14 and later, the OADP Operator supports a new standardized STS workflow through the Operator Lifecycle Manager (OLM) and Cloud Credentials Operator (CCO). In this workflow, you do not need to create the preceding secret. You only need to supply the role ARN during the installation of OLM-managed operators by using the OpenShift Container Platform web console. For more information, see Installing from OperatorHub using the web console.
In that workflow, the preceding secret is created automatically by CCO.
Install the OADP Operator:
- In the OpenShift Container Platform web console, browse to Operators → OperatorHub.
- Search for the OADP Operator.
- In the role_ARN field, paste the role_arn that you created previously and click Install.
Create AWS cloud storage using your AWS credentials by entering the following command:
$ cat << EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: CloudStorage
metadata:
  name: ${CLUSTER_NAME}-oadp
  namespace: openshift-adp
spec:
  creationSecret:
    key: credentials
    name: cloud-credentials
  enableSharedConfig: true
  name: ${CLUSTER_NAME}-oadp
  provider: aws
  region: $REGION
EOF
Check the storage class that your application uses by entering the following command:
$ oc get pvc -n <namespace>
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
applog   Bound    pvc-351791ae-b6ab-4e8b-88a4-30f73caf5ef8   1Gi        RWO            gp3-csi        4d19h
mysql    Bound    pvc-16b8e009-a20a-4379-accc-bc81fedd0621   1Gi        RWO            gp3-csi        4d19h
Get the storage class by running the following command:
$ oc get storageclass
NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2                 kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   4d21h
gp2-csi             ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
gp3                 ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
gp3-csi (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
Note
The following storage classes will work:
- gp3-csi
- gp2-csi
- gp3
- gp2
If all of the applications that are being backed up use persistent volumes (PVs) with Container Storage Interface (CSI), it is advisable to include the CSI plugin in the OADP DPA configuration.
Create the DataProtectionApplication resource to configure the connection to the storage where the backups and volume snapshots are stored:
If you are using only CSI volumes, deploy a Data Protection Application by entering the following command:
$ cat << EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: ${CLUSTER_NAME}-dpa
  namespace: openshift-adp
spec:
  backupImages: true
  features:
    dataMover:
      enable: false
  backupLocations:
  - bucket:
      cloudStorageRef:
        name: ${CLUSTER_NAME}-oadp
      credential:
        key: credentials
        name: cloud-credentials
      prefix: velero
      default: true
      config:
        region: ${REGION}
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
      - csi
    nodeAgent:
      enable: false
      uploaderType: kopia
EOF
where:
backupImages
  ROSA supports internal image backup. Set this field to false if you do not want to use image backup.
nodeAgent
  See the important note regarding the nodeAgent attribute at the end of this procedure.
uploaderType
  Specifies the type of uploader. The built-in Data Mover uses Kopia as the default uploader mechanism regardless of the value of the uploaderType field.
If you are using CSI or non-CSI volumes, deploy a Data Protection Application by entering the following command:
$ cat << EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: ${CLUSTER_NAME}-dpa
  namespace: openshift-adp
spec:
  backupImages: true
  backupLocations:
  - bucket:
      cloudStorageRef:
        name: ${CLUSTER_NAME}-oadp
      credential:
        key: credentials
        name: cloud-credentials
      prefix: velero
      default: true
      config:
        region: ${REGION}
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
    nodeAgent:
      enable: false
      uploaderType: restic
  snapshotLocations:
  - velero:
      config:
        credentialsFile: /tmp/credentials/openshift-adp/cloud-credentials-credentials
        enableSharedConfig: "true"
        profile: default
        region: ${REGION}
      provider: aws
EOF
where:
backupImages
  ROSA supports internal image backup. Set this field to false if you do not want to use image backup.
nodeAgent
  See the important note regarding the nodeAgent attribute at the end of this procedure.
credentialsFile
  Specifies the mounted location of the bucket credential on the pod.
enableSharedConfig
  Specifies whether the snapshotLocations can share or reuse the credential defined for the bucket.
profile
  Specifies the profile name set in the AWS credentials file.
region
  Specifies your AWS region. This must be the same as the cluster region.
You are now ready to back up and restore OpenShift Container Platform applications, as described in Backing up applications.
Important
The enable parameter of restic is set to false in this configuration, because OADP does not support Restic in ROSA environments.
If you want to use two different clusters for backing up and restoring, the two clusters must have the same AWS S3 storage names in both the cloud storage CR and the OADP DataProtectionApplication configuration.
4.19.1.3. Updating the IAM role ARN in the OADP Operator subscription
Update the OADP Operator subscription to fix an installation error due to incorrect IAM role Amazon Resource Name (ARN).
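While installing the OADP Operator on a ROSA Security Token Service (STS) cluster, if you provide an incorrect IAM role Amazon Resource Name (ARN), the openshift-adp-controller pod reports an error. You can correct this by updating the ROLEARN value in the OADP Operator subscription, as described in the following procedure.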
Prerequisites
- You have a Red Hat OpenShift Service on AWS STS cluster with the required access and tokens.
- You have installed OADP on the ROSA STS cluster.
Procedure
To verify that the OADP subscription has the wrong IAM role ARN environment variable set, run the following command:
$ oc get sub -o yaml redhat-oadp-operator
Example subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  annotations:
  creationTimestamp: "2025-01-15T07:18:31Z"
  generation: 1
  labels:
    operators.coreos.com/redhat-oadp-operator.openshift-adp: ""
  name: redhat-oadp-operator
  namespace: openshift-adp
  resourceVersion: "77363"
  uid: 5ba00906-5ad2-4476-ae7b-ffa90986283d
spec:
  channel: stable-1.4
  config:
    env:
    - name: ROLEARN
      value: arn:aws:iam::11111111:role/wrong-role-arn
  installPlanApproval: Manual
  name: redhat-oadp-operator
  source: prestage-operators
  sourceNamespace: openshift-marketplace
  startingCSV: oadp-operator.v1.4.2
where:
ROLEARN
  Verify the value of ROLEARN you want to update.
Update the ROLEARN field of the subscription with the correct role ARN by running the following command:
$ oc patch subscription redhat-oadp-operator -p '{"spec": {"config": {"env": [{"name": "ROLEARN", "value": "<role_arn>"}]}}}' --type='merge'
where:
<role_arn>
  Specifies the IAM role ARN to be updated. For example, arn:aws:iam::160…..6956:role/oadprosa…..8wlf.
Verify that the secret object is updated with the correct role ARN value by running the following command:
$ oc get secret cloud-credentials -o jsonpath='{.data.credentials}' | base64 -d
Example output
[default]
sts_regional_endpoints = regional
role_arn = arn:aws:iam::160.....6956:role/oadprosa.....8wlf
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
Configure the DataProtectionApplication custom resource (CR) manifest file as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: test-rosa-dpa
  namespace: openshift-adp
spec:
  backupLocations:
  - bucket:
      config:
        region: us-east-1
      cloudStorageRef:
        name: <cloud_storage>
      credential:
        name: cloud-credentials
        key: credentials
      prefix: velero
      default: true
  configuration:
    velero:
      defaultPlugins:
      - aws
      - openshift
where:
<cloud_storage>
  Specifies the CloudStorage CR.
Create the DataProtectionApplication CR by running the following command:
$ oc create -f <dpa_manifest_file>
Verify that the DataProtectionApplication CR is reconciled and the status is set to "True" by running the following command:
$ oc get dpa -n openshift-adp -o yaml
Example DataProtectionApplication
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
...
status:
  conditions:
  - lastTransitionTime: "2023-07-31T04:48:12Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled
Verify that the BackupStorageLocation CR is in an available state by running the following command:
$ oc get backupstoragelocations.velero.io -n openshift-adp
Example BackupStorageLocation
NAME       PHASE       LAST VALIDATED   AGE   DEFAULT
ts-dpa-1   Available   3s               6s    true
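Because the error in this section comes from a mistyped role ARN, a format check before patching the subscription can catch obvious mistakes. The following sketch assumes only the general IAM role ARN shape, arn:<partition>:iam::<12-digit account ID>:role/<name>. The two helper functions and the sample ARN with account ID 123456789012 are hypothetical, and a passing check does not prove that the role exists or has the right trust policy.

```shell
#!/bin/sh
# is_iam_role_arn ARN: succeed if ARN has the general shape
# arn:<partition>:iam::<12-digit account ID>:role/<role name or path>.
is_iam_role_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws[a-z-]*:iam::[0-9]{12}:role/[^[:space:]]+$'
}

# role_arn_from_credentials: read an AWS credentials file on stdin and print
# the value of the first role_arn entry.
role_arn_from_credentials() {
  sed -n 's/^[[:space:]]*role_arn[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Demo with placeholder values. The second ARN is the wrong value from the
# example subscription; its account ID has only 8 digits.
for candidate in \
  "arn:aws:iam::123456789012:role/example-oadp-role" \
  "arn:aws:iam::11111111:role/wrong-role-arn"
do
  if is_iam_role_arn "$candidate"; then
    echo "looks valid:   $candidate"
  else
    echo "looks invalid: $candidate"
  fi
done
```

To check the value that is actually stored in the secret, the decoded credentials from the verification step could be piped through the second helper, for example `oc get secret cloud-credentials -o jsonpath='{.data.credentials}' | base64 -d | role_arn_from_credentials`.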
4.19.1.4. Example: Performing a backup with OADP and OpenShift Container Platform
Perform a backup by using OpenShift API for Data Protection (OADP) with OpenShift Container Platform. In the following example, the hello-world application has no persistent volumes (PVs) attached.
Either Data Protection Application (DPA) configuration will work.
Procedure
Create a workload to back up by running the following commands:
$ oc create namespace hello-world
$ oc new-app -n hello-world --image=docker.io/openshift/hello-openshift
Expose the route by running the following command:
$ oc expose service/hello-openshift -n hello-world
Check that the application is working by running the following command:
$ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`
You should see an output similar to the following example:
Hello OpenShift!
Back up the workload by running the following command:
$ cat << EOF | oc create -f -
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: hello-world
  namespace: openshift-adp
spec:
  includedNamespaces:
  - hello-world
  storageLocation: ${CLUSTER_NAME}-dpa-1
  ttl: 720h0m0s
EOF
Wait until the backup is completed by running the following command:
$ watch "oc -n openshift-adp get backup hello-world -o json | jq .status"
You should see an output similar to the following example:
{
  "completionTimestamp": "2022-09-07T22:20:44Z",
  "expiration": "2022-10-07T22:20:22Z",
  "formatVersion": "1.1.0",
  "phase": "Completed",
  "progress": {
    "itemsBackedUp": 58,
    "totalItems": 58
  },
  "startTimestamp": "2022-09-07T22:20:22Z",
  "version": 1
}
Delete the demo workload by running the following command:
$ oc delete ns hello-world
Restore the workload from the backup by running the following command:
$ cat << EOF | oc create -f -
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: hello-world
  namespace: openshift-adp
spec:
  backupName: hello-world
EOF
Wait for the restore to finish by running the following command:
$ watch "oc -n openshift-adp get restore hello-world -o json | jq .status"
You should see an output similar to the following example:
{
  "completionTimestamp": "2022-09-07T22:25:47Z",
  "phase": "Completed",
  "progress": {
    "itemsRestored": 38,
    "totalItems": 38
  },
  "startTimestamp": "2022-09-07T22:25:28Z",
  "warnings": 9
}
Check that the workload is restored by running the following command:
$ oc -n hello-world get pods
You should see an output similar to the following example:
NAME                              READY   STATUS    RESTARTS   AGE
hello-openshift-9f885f7c6-kdjpj   1/1     Running   0          90s
Check that the application responds through its route by running the following command:
$ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`
You should see an output similar to the following example:
Hello OpenShift!
Note
For troubleshooting tips, see the troubleshooting documentation.
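The example above uses watch to follow the backup and restore status interactively. In a script, the same wait can be expressed as a polling loop. The following is a minimal sketch, not part of OADP or Velero: wait_for_phase and fake_phase are hypothetical helpers, and the loop treats Failed, PartiallyFailed, and FailedValidation as terminal failure phases.

```shell
#!/bin/sh
# wait_for_phase WANT TRIES DELAY CMD...: run CMD repeatedly until it prints
# WANT (return 0), prints a terminal failure phase (return 1), or TRIES
# attempts have been made (return 2). The last observed phase is printed.
wait_for_phase() {
  want="$1"
  tries="$2"
  delay="$3"
  shift 3
  attempt=0
  phase=""
  while [ "$attempt" -lt "$tries" ]; do
    # A failing query (for example, the object does not exist yet) counts
    # as "no phase yet" rather than as an error.
    phase=$("$@" 2>/dev/null) || phase=""
    case "$phase" in
      "$want")
        echo "$phase"
        return 0
        ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "$phase"
        return 1
        ;;
    esac
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "timed out (last phase: ${phase:-unknown})"
  return 2
}

# Stand-in for the real query so that the sketch runs without a cluster.
fake_phase() {
  echo "Completed"
}
wait_for_phase Completed 3 0 fake_phase
```

Against a cluster, the stand-in would be replaced by the real query, for example `wait_for_phase Completed 60 5 oc -n openshift-adp get backup hello-world -o jsonpath='{.status.phase}'`, and likewise with `get restore` for the restore step.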
4.19.1.5. Cleaning up a cluster after a backup with OADP and ROSA STS
Uninstall the OpenShift API for Data Protection (OADP) Operator together with the backups and the S3 bucket from the hello-world example.
Procedure
Delete the workload by running the following command:
$ oc delete ns hello-world
Delete the Data Protection Application (DPA) by running the following command:
$ oc -n openshift-adp delete dpa ${CLUSTER_NAME}-dpa
Delete the cloud storage by running the following command:
$ oc -n openshift-adp delete cloudstorage ${CLUSTER_NAME}-oadp
Warning
If this command hangs, you might need to delete the finalizer by running the following command:
$ oc -n openshift-adp patch cloudstorage ${CLUSTER_NAME}-oadp -p '{"metadata":{"finalizers":null}}' --type=merge
If the Operator is no longer required, remove it by running the following command:
$ oc -n openshift-adp delete subscription oadp-operator
Remove the Operator namespace by running the following command:
$ oc delete ns openshift-adp
If the backup and restore resources are no longer required, remove them from the cluster by running the following command:
$ oc delete backups.velero.io hello-world
To delete the backup, the restore, and the remote objects in AWS S3, run the following command:
$ velero backup delete hello-world
If you no longer need the custom resource definitions (CRDs), remove them from the cluster by running the following command:
$ for CRD in `oc get crds | grep velero | awk '{print $1}'`; do oc delete crd $CRD; done
Delete the AWS S3 bucket by running the following commands:
$ aws s3 rm s3://${CLUSTER_NAME}-oadp --recursive
$ aws s3api delete-bucket --bucket ${CLUSTER_NAME}-oadp
Detach the policy from the role by running the following command:
$ aws iam detach-role-policy --role-name "${ROLE_NAME}" --policy-arn "${POLICY_ARN}"
Delete the role by running the following command:
$ aws iam delete-role --role-name "${ROLE_NAME}"
4.20. OADP and AWS STS
4.20.1. Backing up applications on AWS STS using OADP
You install the OpenShift API for Data Protection (OADP) with Amazon Web Services (AWS) by installing the OADP Operator. The Operator installs Velero 1.14.
Starting from OADP 1.0.4, all OADP 1.0.z versions can only be used as a dependency of the Migration Toolkit for Containers Operator and are not available as a standalone Operator.
You configure AWS for Velero, create a default Secret, and then install the Data Protection Application.
To install the OADP Operator in a restricted network environment, you must first disable the default OperatorHub sources and mirror the Operator catalog. See Using Operator Lifecycle Manager on restricted networks for details.
You can install OADP on an AWS Security Token Service (AWS STS) cluster manually. AWS provides AWS STS as a web service that enables you to request temporary, limited-privilege credentials for users. You use STS to provide trusted users with temporary access to resources through API calls, your AWS console, or the AWS command-line interface (CLI).
Before installing OpenShift API for Data Protection (OADP), you must set up role and policy credentials for OADP so that it can use the Amazon Web Services API.
This process is performed in the following two stages:
- Prepare AWS credentials.
- Install the OADP Operator and give it an IAM role.
4.20.1.1. Preparing AWS STS credentials for OADP
An Amazon Web Services account must be prepared and configured to accept an OpenShift API for Data Protection (OADP) installation. Prepare the AWS credentials by using the following procedure.
Procedure
Define the CLUSTER_NAME environment variable by running the following command:
$ export CLUSTER_NAME=<AWS_cluster_name>
- <AWS_cluster_name>: The variable can be set to any value.
Retrieve all of the details of the cluster, such as the AWS_ACCOUNT_ID and OIDC_ENDPOINT, by running the following commands:
$ export CLUSTER_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}')
$ export AWS_CLUSTER_ID=$(oc get clusterversion version -o jsonpath='{.spec.clusterID}{"\n"}')
$ export OIDC_ENDPOINT=$(oc get authentication.config.openshift.io cluster -o jsonpath='{.spec.serviceAccountIssuer}' | sed 's|^https://||')
$ export REGION=$(oc get infrastructures cluster -o jsonpath='{.status.platformStatus.aws.region}' --allow-missing-template-keys=false || echo us-east-2)
$ export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
$ export ROLE_NAME="${CLUSTER_NAME}-openshift-oadp-aws-cloud-credentials"
Create a temporary directory to store all of the files by running the following commands:
$ export SCRATCH="/tmp/${CLUSTER_NAME}/oadp"
$ mkdir -p ${SCRATCH}
Display all of the gathered details by running the following command:
$ echo "Cluster ID: ${AWS_CLUSTER_ID}, Region: ${REGION}, OIDC Endpoint: ${OIDC_ENDPOINT}, AWS Account ID: ${AWS_ACCOUNT_ID}"
On the AWS account, create an IAM policy to allow access to AWS S3:
Check to see if the policy exists by running the following commands:
$ export POLICY_NAME="OadpVer1"
- POLICY_NAME: The variable can be set to any value.
$ POLICY_ARN=$(aws iam list-policies --query "Policies[?PolicyName=='$POLICY_NAME'].{ARN:Arn}" --output text)
Enter the following command to create the policy JSON file and then create the policy:
Note
If the policy ARN is not found, the command creates the policy. If the policy ARN already exists, the if statement intentionally skips the policy creation.
$ if [[ -z "${POLICY_ARN}" ]]; then
cat << EOF > ${SCRATCH}/policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:PutBucketTagging",
        "s3:GetBucketTagging",
        "s3:PutEncryptionConfiguration",
        "s3:GetEncryptionConfiguration",
        "s3:PutLifecycleConfiguration",
        "s3:GetLifecycleConfiguration",
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "ec2:DescribeSnapshots",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumeAttribute",
        "ec2:DescribeVolumesModifications",
        "ec2:DescribeVolumeStatus",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    }
  ]
}
EOF
POLICY_ARN=$(aws iam create-policy --policy-name $POLICY_NAME \
  --policy-document file:///${SCRATCH}/policy.json --query Policy.Arn \
  --tags Key=openshift_version,Value=${CLUSTER_VERSION} Key=operator_namespace,Value=openshift-adp Key=operator_name,Value=oadp \
  --output text)
fi
- SCRATCH: The name for a temporary directory created for storing the files.
View the policy ARN by running the following command:
$ echo ${POLICY_ARN}
Create an IAM role trust policy for the cluster:
Create the trust policy file by running the following command:
$ cat <<EOF > ${SCRATCH}/trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT}"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "${OIDC_ENDPOINT}:sub": [
          "system:serviceaccount:openshift-adp:openshift-adp-controller-manager",
          "system:serviceaccount:openshift-adp:velero"]
      }
    }
  }]
}
EOF
Create an IAM role that uses the trust policy by running the following command:
$ ROLE_ARN=$(aws iam create-role --role-name \
  "${ROLE_NAME}" \
  --assume-role-policy-document file://${SCRATCH}/trust-policy.json \
  --tags Key=cluster_id,Value=${AWS_CLUSTER_ID} Key=openshift_version,Value=${CLUSTER_VERSION} Key=operator_namespace,Value=openshift-adp Key=operator_name,Value=oadp \
  --query Role.Arn --output text)
View the role ARN by running the following command:
$ echo ${ROLE_ARN}
Attach the IAM policy to the IAM role by running the following command:
$ aws iam attach-role-policy --role-name "${ROLE_NAME}" --policy-arn ${POLICY_ARN}
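Later steps reuse the variables set in this procedure, for example ROLE_ARN in the credentials file and REGION in the CloudStorage resource, so an empty value might only surface later as a confusing failure. The following is a minimal sketch of a guard, assuming a POSIX-compatible shell. The require_vars function is a hypothetical helper and is not part of the oc or aws CLIs.

```shell
# Hypothetical helper (an assumption for illustration, not an OADP or AWS tool).
# Returns 0 only if every named shell variable is set to a non-empty value.
require_vars() {
  _rv_missing=0
  for _rv_name in "$@"; do
    # Look up the value of the variable whose name is stored in _rv_name.
    eval "_rv_value=\${${_rv_name}:-}"
    if [ -z "${_rv_value}" ]; then
      echo "missing: ${_rv_name}" >&2
      _rv_missing=1
    fi
  done
  return "${_rv_missing}"
}

# Example usage, run in the same shell session as the procedure:
# require_vars CLUSTER_NAME AWS_CLUSTER_ID OIDC_ENDPOINT REGION \
#   AWS_ACCOUNT_ID ROLE_NAME SCRATCH POLICY_ARN ROLE_ARN
```

Because POLICY_ARN and ROLE_ARN are assigned without export in the procedure, the check only sees them when it runs in the same shell session.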
4.20.1.1.1. Setting Velero CPU and memory resource allocations
You set the CPU and memory resource allocations for the Velero pod by editing the DataProtectionApplication custom resource (CR) manifest.
Prerequisites
- You must have the OpenShift API for Data Protection (OADP) Operator installed.
Procedure
Edit the values in the spec.configuration.velero.podConfig.resourceAllocations block of the DataProtectionApplication CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_sample>
spec:
# ...
  configuration:
    velero:
      podConfig:
        nodeSelector: <node_selector>
        resourceAllocations:
          limits:
            cpu: "1"
            memory: 1024Mi
          requests:
            cpu: 200m
            memory: 256Mi
where:
nodeSelector: Specifies the node selector to be supplied to Velero podSpec.
resourceAllocations: Specifies the resource allocations listed for average usage.
Note: Kopia is an option in OADP 1.3 and later releases. You can use Kopia for file system backups, and Kopia is your only option for Data Mover cases with the built-in Data Mover.
Kopia is more resource intensive than Restic, and you might need to adjust the CPU and memory requirements accordingly.
4.20.1.2. Installing the OADP Operator and providing the IAM role
AWS Security Token Service (AWS STS) is a global web service that provides short-term credentials for IAM or federated users. This document describes how to install OpenShift API for Data Protection (OADP) on an AWS STS cluster manually.
Restic and Kopia are not supported in the OADP AWS STS environment. Verify that the Restic and Kopia node agent is disabled. For backing up volumes, OADP on AWS STS supports only native snapshots and Container Storage Interface (CSI) snapshots.
In an AWS cluster that uses STS authentication, restoring backed-up data in a different AWS region is not supported.
The Data Mover feature is not currently supported in AWS STS clusters. You can use native AWS S3 tools for moving data.
Prerequisites
- An OpenShift Container Platform AWS STS cluster with the required access and tokens. For instructions, see the previous procedure, Preparing AWS STS credentials for OADP. If you plan to use two different clusters for backing up and restoring, you must prepare AWS credentials, including ROLE_ARN, for each cluster.
Procedure
Create an OpenShift Container Platform secret from your AWS token file by entering the following commands:
Create the credentials file:
$ cat <<EOF > ${SCRATCH}/credentials
[default]
role_arn = ${ROLE_ARN}
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
EOF
Create a namespace for OADP:
$ oc create namespace openshift-adp
Create the OpenShift Container Platform secret:
$ oc -n openshift-adp create secret generic cloud-credentials \
  --from-file=${SCRATCH}/credentials
Note: In OpenShift Container Platform versions 4.14 and later, the OADP Operator supports a new standardized STS workflow through the Operator Lifecycle Manager (OLM) and Cloud Credentials Operator (CCO). In this workflow, you do not need to create the preceding secret. You only need to supply the role ARN during the installation of OLM-managed operators by using the OpenShift Container Platform web console. For more information, see Installing from OperatorHub using the web console.
The preceding secret is created automatically by CCO.
Install the OADP Operator:
- In the OpenShift Container Platform web console, browse to Operators → OperatorHub.
- Search for the OADP Operator.
- In the role_ARN field, paste the role_arn that you created previously and click Install.
Create AWS cloud storage using your AWS credentials by entering the following command:
$ cat << EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: CloudStorage
metadata:
  name: ${CLUSTER_NAME}-oadp
  namespace: openshift-adp
spec:
  creationSecret:
    key: credentials
    name: cloud-credentials
  enableSharedConfig: true
  name: ${CLUSTER_NAME}-oadp
  provider: aws
  region: $REGION
EOF
Check your application's default storage class by entering the following command:
$ oc get pvc -n <namespace>
Example output
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
applog   Bound    pvc-351791ae-b6ab-4e8b-88a4-30f73caf5ef8   1Gi        RWO            gp3-csi        4d19h
mysql    Bound    pvc-16b8e009-a20a-4379-accc-bc81fedd0621   1Gi        RWO            gp3-csi        4d19h
Get the storage class by running the following command:
$ oc get storageclass
Example output
NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2                 kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   4d21h
gp2-csi             ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
gp3                 ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
gp3-csi (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
Note: The following storage classes will work:
- gp3-csi
- gp2-csi
- gp3
- gp2
If the application or applications that are being backed up are all using persistent volumes (PVs) with Container Storage Interface (CSI), it is advisable to include the CSI plugin in the OADP DPA configuration.
Create the DataProtectionApplication resource to configure the connection to the storage where the backups and volume snapshots are stored:
If you are using only CSI volumes, deploy a Data Protection Application by entering the following command:
$ cat << EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: ${CLUSTER_NAME}-dpa
  namespace: openshift-adp
spec:
  backupImages: true
  features:
    dataMover:
      enable: false
  backupLocations:
  - bucket:
      cloudStorageRef:
        name: ${CLUSTER_NAME}-oadp
      credential:
        key: credentials
        name: cloud-credentials
      prefix: velero
      default: true
      config:
        region: ${REGION}
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
      - csi
    restic:
      enable: false
EOF
where:
backupImages: Set this field to false if you do not want to use image backup.
If you are using CSI or non-CSI volumes, deploy a Data Protection Application by entering the following command:
$ cat << EOF | oc create -f -
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: ${CLUSTER_NAME}-dpa
  namespace: openshift-adp
spec:
  backupImages: true
  features:
    dataMover:
      enable: false
  backupLocations:
  - bucket:
      cloudStorageRef:
        name: ${CLUSTER_NAME}-oadp
      credential:
        key: credentials
        name: cloud-credentials
      prefix: velero
      default: true
      config:
        region: ${REGION}
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
    nodeAgent:
      enable: false
      uploaderType: restic
  snapshotLocations:
    - velero:
        config:
          credentialsFile: /tmp/credentials/openshift-adp/cloud-credentials-credentials
          enableSharedConfig: "true"
          profile: default
          region: ${REGION}
        provider: aws
EOF
where:
backupImages: Set this field to false if you do not want to use image backup.
nodeAgent: See the important note regarding the nodeAgent attribute.
credentialsFile: The mounted location of the bucket credential on the pod.
enableSharedConfig: Allows the snapshotLocations to share or reuse the credential defined for the bucket.
profile: Use the profile name set in the AWS credentials file.
region: Specify your AWS region. This must be the same as the cluster region.
You are now ready to back up and restore OpenShift Container Platform applications, as described in Backing up applications.
Important: If you use OADP 1.2, replace this configuration:
nodeAgent:
  enable: false
  uploaderType: restic
with the following configuration:
restic:
  enable: false
If you want to use two different clusters for backing up and restoring, the two clusters must have the same AWS S3 storage names in both the cloud storage CR and the OADP DataProtectionApplication configuration.
4.20.1.3. Backing up workload on OADP AWS STS, with an optional cleanup
4.20.1.3.1. Performing a backup with OADP and AWS STS
The following example uses a hello-world application to show a backup and restore with OADP on AWS STS.
Either Data Protection Application (DPA) configuration will work.
Create a workload to back up by running the following commands:
$ oc create namespace hello-world
$ oc new-app -n hello-world --image=docker.io/openshift/hello-openshift
Expose the route by running the following command:
$ oc expose service/hello-openshift -n hello-world
Check that the application is working by running the following command:
$ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`
Example output
Hello OpenShift!
Back up the workload by running the following command:
$ cat << EOF | oc create -f -
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: hello-world
  namespace: openshift-adp
spec:
  includedNamespaces:
  - hello-world
  storageLocation: ${CLUSTER_NAME}-dpa-1
  ttl: 720h0m0s
EOF
Wait until the backup is completed by running the following command:
$ watch "oc -n openshift-adp get backup hello-world -o json | jq .status"
Example output
{
  "completionTimestamp": "2022-09-07T22:20:44Z",
  "expiration": "2022-10-07T22:20:22Z",
  "formatVersion": "1.1.0",
  "phase": "Completed",
  "progress": {
    "itemsBackedUp": 58,
    "totalItems": 58
  },
  "startTimestamp": "2022-09-07T22:20:22Z",
  "version": 1
}
Delete the demo workload by running the following command:
$ oc delete ns hello-world
Restore the workload from the backup by running the following command:
$ cat << EOF | oc create -f -
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: hello-world
  namespace: openshift-adp
spec:
  backupName: hello-world
EOF
Wait for the restore to finish by running the following command:
$ watch "oc -n openshift-adp get restore hello-world -o json | jq .status"
Example output
{
  "completionTimestamp": "2022-09-07T22:25:47Z",
  "phase": "Completed",
  "progress": {
    "itemsRestored": 38,
    "totalItems": 38
  },
  "startTimestamp": "2022-09-07T22:25:28Z",
  "warnings": 9
}
Check that the workload is restored by running the following command:
$ oc -n hello-world get pods
Example output
NAME                              READY   STATUS    RESTARTS   AGE
hello-openshift-9f885f7c6-kdjpj   1/1     Running   0          90s
Check that the application responds through its route by running the following command:
$ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`
Example output
Hello OpenShift!
For troubleshooting tips, see the OADP team’s troubleshooting documentation.
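The watch commands in this procedure rely on someone reading the status by eye. As a sketch of how the same wait could be scripted, the following helpers poll until the phase is a finished one. The names is_terminal_phase, wait_for_phase, and POLL_SECONDS are hypothetical, and the list of terminal phases is an assumption that you might need to adjust for your Velero version.

```shell
# Hypothetical helpers for scripting the "wait until completed" steps.

# Succeeds if the given phase means that processing has stopped.
is_terminal_phase() {
  case "$1" in
    Completed|PartiallyFailed|Failed|FailedValidation) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_phase MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND until it prints a terminal phase or MAX_TRIES is used up.
# Prints the last phase seen and returns 0 only for "Completed".
wait_for_phase() {
  _wp_tries="$1"
  shift
  _wp_phase=""
  while [ "${_wp_tries}" -gt 0 ]; do
    _wp_phase="$("$@" 2>/dev/null)"
    if is_terminal_phase "${_wp_phase}"; then
      break
    fi
    _wp_tries=$((_wp_tries - 1))
    if [ "${_wp_tries}" -gt 0 ]; then
      sleep "${POLL_SECONDS:-5}"
    fi
  done
  echo "${_wp_phase:-unknown}"
  [ "${_wp_phase}" = "Completed" ]
}

# Example usage against the backup created in this procedure:
# wait_for_phase 60 oc -n openshift-adp get backup hello-world -o jsonpath='{.status.phase}'
```

Passing the status command as arguments keeps the polling logic independent of the resource type, so the same function can be pointed at the restore by replacing backup with restore in the usage line.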
4.20.1.3.2. Cleaning up a cluster after a backup with OADP and AWS STS
If you need to uninstall the OpenShift API for Data Protection (OADP) Operator together with the backups and the S3 bucket from this example, follow these instructions.
Procedure
Delete the workload by running the following command:
$ oc delete ns hello-world
Delete the Data Protection Application (DPA) by running the following command:
$ oc -n openshift-adp delete dpa ${CLUSTER_NAME}-dpa
Delete the cloud storage by running the following command:
$ oc -n openshift-adp delete cloudstorage ${CLUSTER_NAME}-oadp
Important: If this command hangs, you might need to delete the finalizer by running the following command:
$ oc -n openshift-adp patch cloudstorage ${CLUSTER_NAME}-oadp -p '{"metadata":{"finalizers":null}}' --type=merge
If the Operator is no longer required, remove it by running the following command:
$ oc -n openshift-adp delete subscription oadp-operator
Remove the Operator namespace by running the following command:
$ oc delete ns openshift-adp
If the backup and restore resources are no longer required, remove them from the cluster by running the following command:
$ oc delete backups.velero.io hello-world
To delete the backup, restore, and remote objects in AWS S3, run the following command:
$ velero backup delete hello-world
If you no longer need the custom resource definitions (CRDs), remove them from the cluster by running the following command:
$ for CRD in `oc get crds | grep velero | awk '{print $1}'`; do oc delete crd $CRD; done
Delete the AWS S3 bucket by running the following commands:
$ aws s3 rm s3://${CLUSTER_NAME}-oadp --recursive
$ aws s3api delete-bucket --bucket ${CLUSTER_NAME}-oadp
Detach the policy from the role by running the following command:
$ aws iam detach-role-policy --role-name "${ROLE_NAME}" --policy-arn "${POLICY_ARN}"
Delete the role by running the following command:
$ aws iam delete-role --role-name "${ROLE_NAME}"
4.21. OADP and 3scale
4.21.1. Backing up and restoring 3scale API Management by using OADP
With Red Hat 3scale API Management, you can manage your APIs for internal or external users. You can deploy 3scale components on-premise, in the cloud, as a managed service, or in any combination based on your requirements.
With OpenShift API for Data Protection (OADP), you can safeguard 3scale API Management deployments by backing up application resources, persistent volumes, and configurations.
You can use the OpenShift API for Data Protection (OADP) Operator to back up and restore your 3scale API Management on-cluster storage databases without affecting your running services.
You can configure OADP to perform the following operations with 3scale API Management:
- Create a backup of 3scale components by following the steps in Backing up 3scale API Management.
- Restore the components to scale up the 3scale operator and deployment by following the steps in Restoring 3scale API Management.
4.21.2. Backing up 3scale API Management by using OADP
Back up Red Hat 3scale API Management components, including the 3scale Operator, MySQL database, and Redis database, by using OpenShift API for Data Protection (OADP). This helps you protect your API management infrastructure and provides recovery in case of data loss.
For more information about installing and configuring Red Hat 3scale API Management, see Installing 3scale API Management on OpenShift and Red Hat 3scale API Management.
4.21.2.1. Creating the Data Protection Application
Create a Data Protection Application (DPA) custom resource (CR) to configure backup storage and Velero settings for Red Hat 3scale API Management. This helps you set up the backup infrastructure required for protecting your 3scale components.
Procedure
Create a YAML file with the following configuration:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
        - csi
      resourceTimeout: 10m
    nodeAgent:
      enable: true
      uploaderType: kopia
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: <bucket_name>
          prefix: <prefix>
        config:
          region: <region>
          profile: "default"
          s3ForcePathStyle: "true"
          s3Url: <s3_url>
        credential:
          key: cloud
          name: cloud-credentials
where:
<bucket_name>: Specifies a bucket as the backup storage location. If the bucket is not a dedicated bucket for Velero backups, you must specify a prefix.
<prefix>: Specifies a prefix for Velero backups, for example, velero, if the bucket is used for multiple purposes.
<region>: Specifies a region for the backup storage location.
<s3_url>: Specifies the URL of the object store that you are using to store backups.
Create the DPA CR by running the following command:
$ oc create -f dpa.yaml
4.21.2.2. Backing up the 3scale API Management operator, secret, and APIManager
Back up the Red Hat 3scale API Management operator resources, including the Secret and APIManager custom resources (CRs).
Prerequisites
- You created the Data Protection Application (DPA).
Procedure
Back up your 3scale operator CRs, such as operatorgroup, namespaces, and subscriptions, by creating a YAML file with the following configuration:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: operator-install-backup
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  includedNamespaces:
  - threescale
  includedResources:
  - operatorgroups
  - subscriptions
  - namespaces
  itemOperationTimeout: 1h0m0s
  snapshotMoveData: false
  ttl: 720h0m0s
where:
operator-install-backup: Specifies the value of the metadata.name parameter in the backup. This is the same value used in the spec.backupName parameter when restoring the 3scale operator.
threescale: Specifies the namespace where the 3scale operator is installed.
Note: You can also back up and restore ReplicationControllers, Deployment, and Pod objects to ensure that all manually set environments are backed up and restored. This does not affect the flow of restoration.
Create a backup CR by running the following command:
$ oc create -f backup.yaml
Example output
backup.velero.io/operator-install-backup created
Back up the Secret CR by creating a YAML file with the following configuration:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: operator-resources-secrets
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  includedNamespaces:
  - threescale
  includedResources:
  - secrets
  itemOperationTimeout: 1h0m0s
  labelSelector:
    matchLabels:
      app: 3scale-api-management
  snapshotMoveData: false
  snapshotVolumes: false
  ttl: 720h0m0s
where:
name: Specifies the value of the metadata.name parameter in the backup. Use this value in the spec.backupName parameter when restoring the Secret.
Create the Secret backup CR by running the following command:
$ oc create -f backup-secret.yaml
Example output
backup.velero.io/operator-resources-secrets created
Back up the APIManager CR by creating a YAML file with the following configuration:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: operator-resources-apim
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  includedNamespaces:
  - threescale
  includedResources:
  - apimanagers
  itemOperationTimeout: 1h0m0s
  snapshotMoveData: false
  snapshotVolumes: false
  storageLocation: ts-dpa-1
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - ts-dpa-1
where:
name: Specifies the value of the metadata.name parameter in the backup. Use this value in the spec.backupName parameter when restoring the APIManager.
Create the APIManager backup CR by running the following command:
$ oc create -f backup-apimanager.yaml
Example output
backup.velero.io/operator-resources-apim created
4.21.2.3. Backing up a MySQL database
Back up a MySQL database by creating a persistent volume claim (PVC) to store the database dump. This helps you preserve your 3scale system database data for recovery scenarios.
Prerequisites
- You have backed up the Red Hat 3scale API Management operator.
Procedure
Create a YAML file with the following configuration for adding an additional PVC:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-claim
  namespace: threescale
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp3-csi
  volumeMode: Filesystem
Create the additional PVC by running the following command:
$ oc create -f ts_pvc.yml
Attach the PVC to the system database pod by editing the system-mysql deployment to use the MySQL dump:
$ oc edit deployment system-mysql -n threescale
  volumeMounts:
    - name: example-claim
      mountPath: /var/lib/mysqldump/data
    - name: mysql-storage
      mountPath: /var/lib/mysql/data
    - name: mysql-extra-conf
      mountPath: /etc/my-extra.d
    - name: mysql-main-conf
      mountPath: /etc/my-extra
    ...
  serviceAccount: amp
  volumes:
    - name: example-claim
      persistentVolumeClaim:
        claimName: example-claim
    ...
where:
claimName: Specifies the PVC that contains the dumped data.
Create a YAML file with the following configuration to back up the MySQL database:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: mysql-backup
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: true
  hooks:
    resources:
    - name: dumpdb
      pre:
      - exec:
          command:
          - /bin/sh
          - -c
          - mysqldump -u $MYSQL_USER --password=$MYSQL_PASSWORD system --no-tablespaces > /var/lib/mysqldump/data/dump.sql
          container: system-mysql
          onError: Fail
          timeout: 5m
  includedNamespaces:
  - threescale
  includedResources:
  - deployment
  - pods
  - replicationControllers
  - persistentvolumeclaims
  - persistentvolumes
  itemOperationTimeout: 1h0m0s
  labelSelector:
    matchLabels:
      app: 3scale-api-management
      threescale_component_element: mysql
  snapshotMoveData: false
  ttl: 720h0m0s
where:
mysql-backup: Specifies the value of the metadata.name parameter in the backup. Use this value in the spec.backupName parameter when restoring the MySQL database.
/var/lib/mysqldump/data/dump.sql: Specifies the file where the dumped data is written.
includedResources: Specifies the resources to back up.
Back up the MySQL database by running the following command:
$ oc create -f mysql.yaml
Example output
backup.velero.io/mysql-backup created
Verification
Verify that the MySQL backup is completed by running the following command:
$ oc get backups.velero.io mysql-backup -o yaml
Example output
status:
  completionTimestamp: "2025-04-17T13:25:19Z"
  errors: 1
  expiration: "2025-05-17T13:25:16Z"
  formatVersion: 1.1.0
  hookStatus: {}
  phase: Completed
  progress: {}
  startTimestamp: "2025-04-17T13:25:16Z"
  version: 1
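The verification output contains both a phase field and an errors field, and it can be easier to read them side by side than to scan the full YAML. The following is a minimal sketch of a filter that does this. The summarize_status function is a hypothetical helper, and it is a line-oriented filter rather than a YAML parser: it assumes that phase: and errors: each appear only once in the output, as in the example above.

```shell
# Hypothetical helper: reads "oc get ... -o yaml" output on standard input
# and prints the phase and error count on one line.
summarize_status() {
  awk '
    $1 == "phase:"  { phase = $2 }
    $1 == "errors:" { errors = $2 }
    END {
      # Fall back to defaults when a field is absent from the output.
      if (phase == "") phase = "unknown"
      if (errors == "") errors = 0
      printf "phase=%s errors=%s\n", phase, errors
    }'
}

# Example usage:
# oc get backups.velero.io mysql-backup -n openshift-adp -o yaml | summarize_status
```

The same filter can be applied to other Backup objects by changing the backup name in the usage line.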
4.21.2.4. Backing up the back-end Redis database
Back up the back-end Redis database by configuring Velero annotations and creating a backup CR with the required resources. This helps you preserve your 3scale back-end Redis data for recovery scenarios.
Prerequisites
- You backed up the Red Hat 3scale API Management operator.
- You backed up your MySQL database.
- The Redis queues have been drained before performing the backup.
Procedure
Edit the annotations on the backend-redis deployment by running the following command:
$ oc edit deployment backend-redis -n threescale
annotations:
  post.hook.backup.velero.io/command: >-
    ["/bin/bash", "-c", "redis-cli CONFIG SET auto-aof-rewrite-percentage 100"]
  pre.hook.backup.velero.io/command: >-
    ["/bin/bash", "-c", "redis-cli CONFIG SET auto-aof-rewrite-percentage 0"]
Create a YAML file with the following configuration to back up the Redis database:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: redis-backup
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: true
  includedNamespaces:
  - threescale
  includedResources:
  - deployment
  - pods
  - replicationcontrollers
  - persistentvolumes
  - persistentvolumeclaims
  itemOperationTimeout: 1h0m0s
  labelSelector:
    matchLabels:
      app: 3scale-api-management
      threescale_component: backend
      threescale_component_element: redis
  snapshotMoveData: false
  snapshotVolumes: false
  ttl: 720h0m0s
where:
name: Specifies the value of the metadata.name parameter in the backup. Use this value in the spec.backupName parameter when restoring the Redis database.
Back up the Redis database by running the following command:
$ oc create -f redis-backup.yaml
Example output
backup.velero.io/redis-backup created
Verification
Verify that the Redis backup is completed by running the following command:
$ oc get backups.velero.io redis-backup -o yaml
Example output
status:
  completionTimestamp: "2025-04-17T13:25:19Z"
  errors: 1
  expiration: "2025-05-17T13:25:16Z"
  formatVersion: 1.1.0
  hookStatus: {}
  phase: Completed
  progress: {}
  startTimestamp: "2025-04-17T13:25:16Z"
  version: 1
4.21.3. Restoring 3scale API Management by using OADP
You can restore Red Hat 3scale API Management components by restoring the backed up 3scale operator resources. You can also restore databases such as MySQL and Redis.
After the data has been restored, you can scale up the 3scale operator and deployment.
Prerequisites
- You installed and configured Red Hat 3scale API Management. For more information, see Installing 3scale API Management on OpenShift and Red Hat 3scale API Management.
- You backed up the 3scale operator, and databases such as MySQL and Redis.
- Ensure that you are restoring 3scale on the same cluster that it was backed up from.
- If you want to restore 3scale on a different cluster, ensure that the original backed-up cluster and the cluster you want to restore the operator on are using the same custom domain.
4.21.3.1. Restoring the 3scale API Management operator, secrets, and APIManager
You can restore the Red Hat 3scale API Management operator resources, and both the Secret and APIManager custom resources (CRs).
Prerequisites
- You backed up the 3scale operator.
- You backed up the MySQL and Redis databases.
- You are restoring the database on the same cluster where it was backed up.
- If you are restoring the operator to a different cluster from the one that you backed up from, install and configure OADP with nodeAgent enabled on the destination cluster. Ensure that the OADP configuration is the same as it was on the source cluster.
Procedure
Delete the 3scale operator custom resource definitions (CRDs) along with the threescale namespace by running the following command:
$ oc delete project threescale
Example output
"threescale" project deleted successfully
Create a YAML file with the following configuration to restore the 3scale operator:
Example restore.yaml file
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: operator-installation-restore
  namespace: openshift-adp
spec:
  backupName: operator-install-backup
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  itemOperationTimeout: 4h0m0s
where:
backupName: Specifies the 3scale operator backup to restore.
Restore the 3scale operator by running the following command:
$ oc create -f restore.yaml
Example output
restore.velero.io/operator-installation-restore created
Manually create the s3-credentials Secret object by running the following command:
$ oc apply -f - <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: threescale
stringData:
  AWS_ACCESS_KEY_ID: <ID_123456>
  AWS_SECRET_ACCESS_KEY: <ID_98765544>
  AWS_BUCKET: <mybucket.example.com>
  AWS_REGION: <us-east-1>
type: Opaque
EOF
Scale down the 3scale operator by running the following command:
$ oc scale deployment threescale-operator-controller-manager-v2 --replicas=0 -n threescale
Example output
deployment.apps/threescale-operator-controller-manager-v2 scaled
Create a YAML file with the following configuration to restore the Secret:
Example restore-secret.yaml file
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: operator-resources-secrets
  namespace: openshift-adp
spec:
  backupName: operator-resources-secrets
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  itemOperationTimeout: 4h0m0s
where:
backupName: Specifies the Secret backup to restore.
Restore the Secret by running the following command:
$ oc create -f restore-secret.yaml
Example output
restore.velero.io/operator-resources-secrets created
Create a YAML file with the following configuration to restore APIManager:
Example restore-apimanager.yaml file
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: operator-resources-apim
  namespace: openshift-adp
spec:
  backupName: operator-resources-apim
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  itemOperationTimeout: 4h0m0s
Restore the APIManager by running the following command:
$ oc create -f restore-apimanager.yaml
Example output
restore.velero.io/operator-resources-apim created
Scale up the 3scale operator by running the following command:
$ oc scale deployment threescale-operator-controller-manager-v2 --replicas=1 -n threescale
Example output
deployment.apps/threescale-operator-controller-manager-v2 scaled
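The three Restore manifests in this procedure differ only in metadata.name and spec.backupName, where spec.backupName must match the metadata.name of the Backup that you created earlier. As a sketch of that relationship, the following hypothetical render_restore function prints a Restore manifest for a given pair of names. The function name and its arguments are assumptions for illustration; the manifest body is copied from the examples above.

```shell
# Hypothetical helper: prints a Velero Restore manifest that refers to an
# existing Backup. Usage: render_restore RESTORE_NAME BACKUP_NAME
render_restore() {
  _rr_restore_name="$1"
  _rr_backup_name="$2"
  cat <<EOF
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: ${_rr_restore_name}
  namespace: openshift-adp
spec:
  backupName: ${_rr_backup_name}
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  itemOperationTimeout: 4h0m0s
EOF
}

# Example usage:
# render_restore operator-resources-apim operator-resources-apim | oc create -f -
```

Printing the manifest to standard output lets you review it before piping it to oc create.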
4.21.3.2. Restoring a MySQL database
Restoring a MySQL database re-creates the following resources:
- The Pod, ReplicationController, and Deployment objects.
- The additional persistent volumes (PVs) and associated persistent volume claims (PVCs).
- The MySQL dump, which the example-claim PVC contains.
Do not delete the default PV and PVC associated with the database. If you do, your backups are deleted.
Prerequisites
- You restored the Secret and APIManager custom resources (CRs).
Procedure
Scale down the Red Hat 3scale API Management operator by running the following command:
$ oc scale deployment threescale-operator-controller-manager-v2 --replicas=0 -n threescale
Example output
deployment.apps/threescale-operator-controller-manager-v2 scaled
Create the following script to scale down the 3scale component deployments:
$ vi ./scaledowndeployment.sh
Example script:
for deployment in apicast-production apicast-staging backend-cron backend-listener backend-redis backend-worker system-app system-memcache system-mysql system-redis system-searchd system-sidekiq zync zync-database zync-que; do
    oc scale deployment/$deployment --replicas=0 -n threescale
done
Scale down all the 3scale component deployments by running the following script:
$ ./scaledowndeployment.sh
Example output
deployment.apps.openshift.io/apicast-production scaled
deployment.apps.openshift.io/apicast-staging scaled
deployment.apps.openshift.io/backend-cron scaled
deployment.apps.openshift.io/backend-listener scaled
deployment.apps.openshift.io/backend-redis scaled
deployment.apps.openshift.io/backend-worker scaled
deployment.apps.openshift.io/system-app scaled
deployment.apps.openshift.io/system-memcache scaled
deployment.apps.openshift.io/system-mysql scaled
deployment.apps.openshift.io/system-redis scaled
deployment.apps.openshift.io/system-searchd scaled
deployment.apps.openshift.io/system-sidekiq scaled
deployment.apps.openshift.io/zync scaled
deployment.apps.openshift.io/zync-database scaled
deployment.apps.openshift.io/zync-que scaled
Delete the system-mysql Deployment object by running the following command:
$ oc delete deployment system-mysql -n threescale
Example output
Warning: apps.openshift.io/v1 deployment is deprecated in v4.14+, unavailable in v4.10000+
deployment.apps.openshift.io "system-mysql" deleted
Create the following YAML file to restore the MySQL database:
Example
restore-mysql.yamlfileapiVersion: velero.io/v1 kind: Restore metadata: name: restore-mysql namespace: openshift-adp spec: backupName: mysql-backup1 excludedResources: - nodes - events - events.events.k8s.io - backups.velero.io - restores.velero.io - csinodes.storage.k8s.io - volumeattachments.storage.k8s.io - backuprepositories.velero.io - resticrepositories.velero.io hooks: resources: - name: restoreDB postHooks: - exec: command: - /bin/sh - '-c' - > sleep 30 mysql -h 127.0.0.1 -D system -u root --password=$MYSQL_ROOT_PASSWORD < /var/lib/mysqldump/data/dump.sql2 container: system-mysql execTimeout: 80s onError: Fail waitTimeout: 5m itemOperationTimeout: 1h0m0s restorePVs: trueRestore the MySQL database by running the following command:
$ oc create -f restore-mysql.yamlExample output
restore.velerio.io/restore-mysql created
Verification
Verify that the PodVolumeRestore restore is completed by running the following command:
$ oc get podvolumerestores.velero.io -n openshift-adp
Example output
NAME NAMESPACE POD UPLOADER TYPE VOLUME STATUS TOTALBYTES BYTESDONE AGE
restore-mysql-rbzvm threescale system-mysql-2-kjkhl kopia mysql-storage Completed 771879108 771879108 40m
restore-mysql-z7x7l threescale system-mysql-2-kjkhl kopia example-claim Completed 380415 380415 40m
Verify that the additional PVC has been restored by running the following command:
$ oc get pvc -n threescale
Example output
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
backend-redis-storage Bound pvc-3dca410d-3b9f-49d4-aebf-75f47152e09d 1Gi RWO gp3-csi <unset> 68m
example-claim Bound pvc-cbaa49b0-06cd-4b1a-9e90-0ef755c67a54 1Gi RWO gp3-csi <unset> 57m
mysql-storage Bound pvc-4549649f-b9ad-44f7-8f67-dd6b9dbb3896 1Gi RWO gp3-csi <unset> 68m
system-redis-storage Bound pvc-04dadafd-8a3e-4d00-8381-6041800a24fc 1Gi RWO gp3-csi <unset> 68m
system-searchd Bound pvc-afbf606c-d4a8-4041-8ec6-54c5baf1a3b9 1Gi RWO gp3-csi <unset> 68m
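The PodVolumeRestore check above can also be scripted instead of read by eye. The following is a minimal sketch, not part of the documented procedure: the function name incomplete_restores is hypothetical, and it assumes the phase of each PodVolumeRestore is exposed in .status.phase.

```shell
# Reads "name<TAB>phase" lines on stdin and prints the name of every
# restore whose phase is not "Completed".
# Exit status is 0 only when no such restore is found.
incomplete_restores() {
  awk -F '\t' '
    NF >= 1 && $2 != "Completed" { print $1; pending = 1 }
    END { exit pending ? 1 : 0 }
  '
}

# Usage against a cluster (assumes the oc client is logged in):
#   oc get podvolumerestores.velero.io -n openshift-adp \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
#     | incomplete_restores
```

The jsonpath output is used rather than the default table because the table header contains a column name with a space (UPLOADER TYPE), which makes positional parsing unreliable.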
4.21.3.3. Restoring the back-end Redis database
You can restore the back-end Redis database by deleting the deployment and specifying which resources you do not want to restore.
Prerequisites
- You restored the Red Hat 3scale API Management operator resources, Secret, and APIManager custom resources.
- You restored the MySQL database.
Procedure
Delete the backend-redis deployment by running the following command:
$ oc delete deployment backend-redis -n threescale
Example output
Warning: apps.openshift.io/v1 deployment is deprecated in v4.14+, unavailable in v4.10000+
deployment.apps.openshift.io "backend-redis" deleted
Create a YAML file with the following configuration to restore the Redis database:
Example restore-backend.yaml file
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-backend
  namespace: openshift-adp
spec:
  backupName: redis-backup
  excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
    - csinodes.storage.k8s.io
    - volumeattachments.storage.k8s.io
    - backuprepositories.velero.io
  itemOperationTimeout: 1h0m0s
  restorePVs: true
where:
- backupName: Specifies the Redis backup to restore.
Restore the Redis database by running the following command:
$ oc create -f restore-backend.yaml
Example output
restore.velero.io/restore-backend created
Verification
Verify that the PodVolumeRestore restore is completed by running the following command:
$ oc get podvolumerestores.velero.io -n openshift-adp
Example output
NAME NAMESPACE POD UPLOADER TYPE VOLUME STATUS TOTALBYTES BYTESDONE AGE
restore-backend-jmrwx threescale backend-redis-1-bsfmv kopia backend-redis-storage Completed 76123 76123 21m
4.21.3.4. Scaling up the 3scale API Management operator and deployment
You can scale up the Red Hat 3scale API Management operator and any deployment that was manually scaled down. After a few minutes, 3scale installation should be fully functional, and its state should match the backed-up state.
Prerequisites
- You restored the 3scale operator resources, and both the Secret and APIManager custom resources (CRs).
- You restored the MySQL and back-end Redis databases.
- You ensured that there are no scaled-up deployments and no extra pods running. Some system-mysql or backend-redis pods might be running detached from deployments after restoration. You can remove them after the restoration is successful.
Procedure
Scale up the 3scale operator by running the following command:
$ oc scale deployment threescale-operator-controller-manager-v2 --replicas=1 -n threescale
Example output
deployment.apps/threescale-operator-controller-manager-v2 scaled
Verify that the 3scale operator was deployed by checking that its pod is running:
$ oc get pods -n threescale
Example output
NAME READY STATUS RESTARTS AGE
threescale-operator-controller-manager-v2-79546bd8c-b4qbh 1/1 Running 0 2m5s
Create the following script to scale up the deployments:
$ vi ./scaledeployment.sh
Example script file:
for deployment in apicast-production apicast-staging backend-cron backend-listener backend-redis backend-worker system-app system-memcache system-mysql system-redis system-searchd system-sidekiq zync zync-database zync-que; do
    oc scale deployment/$deployment --replicas=1 -n threescale
done
Scale up the deployments by running the following script:
$ ./scaledeployment.sh
Example output
deployment.apps.openshift.io/apicast-production scaled
deployment.apps.openshift.io/apicast-staging scaled
deployment.apps.openshift.io/backend-cron scaled
deployment.apps.openshift.io/backend-listener scaled
deployment.apps.openshift.io/backend-redis scaled
deployment.apps.openshift.io/backend-worker scaled
deployment.apps.openshift.io/system-app scaled
deployment.apps.openshift.io/system-memcache scaled
deployment.apps.openshift.io/system-mysql scaled
deployment.apps.openshift.io/system-redis scaled
deployment.apps.openshift.io/system-searchd scaled
deployment.apps.openshift.io/system-sidekiq scaled
deployment.apps.openshift.io/zync scaled
deployment.apps.openshift.io/zync-database scaled
deployment.apps.openshift.io/zync-que scaled
Get the 3scale-admin route to log in to the 3scale UI by running the following command:
$ oc get routes -n threescale
Example output
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
backend backend-3scale.apps.custom-cluster-name.openshift.com backend-listener http edge/Allow None
zync-3scale-api-b4l4d api-3scale-apicast-production.apps.custom-cluster-name.openshift.com apicast-production gateway edge/Redirect None
zync-3scale-api-b6sns api-3scale-apicast-staging.apps.custom-cluster-name.openshift.com apicast-staging gateway edge/Redirect None
zync-3scale-master-7sc4j master.apps.custom-cluster-name.openshift.com system-master http edge/Redirect None
zync-3scale-provider-7r2nm 3scale-admin.apps.custom-cluster-name.openshift.com system-provider http edge/Redirect None
zync-3scale-provider-mjxlb 3scale.apps.custom-cluster-name.openshift.com system-developer http edge/Redirect None
In this example, 3scale-admin.apps.custom-cluster-name.openshift.com is the 3scale-admin URL.
Use the URL from this output to log in to the 3scale UI as an administrator, and verify that the data that existed when you took the backup is available.
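The scale-down and scale-up scripts in this procedure differ only in the replica count, so they can be folded into one helper. The following is a minimal sketch under stated assumptions: the function name scale_3scale and the OC override variable are hypothetical and exist only so that the commands can be printed instead of run.

```shell
# Component deployments named in the scale-up and scale-down scripts.
DEPLOYMENTS="apicast-production apicast-staging backend-cron backend-listener backend-redis backend-worker system-app system-memcache system-mysql system-redis system-searchd system-sidekiq zync zync-database zync-que"

# scale_3scale <replicas> [namespace]
# OC is an assumption of this sketch: set OC="echo oc" to print the
# commands instead of running them against the cluster.
scale_3scale() {
  replicas=$1
  namespace=${2:-threescale}
  case $replicas in
    ''|*[!0-9]*) echo "replicas must be a non-negative integer" >&2; return 2 ;;
  esac
  for deployment in $DEPLOYMENTS; do
    ${OC:-oc} scale "deployment/$deployment" --replicas="$replicas" -n "$namespace" || return 1
  done
}

# Usage:
#   scale_3scale 0    # before restoring the databases
#   scale_3scale 1    # after the restores are complete
```

Stopping at the first failed oc scale call is a design choice of this sketch: it avoids leaving the components in a partly scaled state without the administrator noticing.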
4.22. OADP Data Mover
4.22.1. About the OADP Data Mover
Use the OpenShift API for Data Protection (OADP) built-in Data Mover to move Container Storage Interface (CSI) volume snapshots to remote object storage and restore stateful applications after cluster failures. This provides disaster recovery capabilities for both containerized and virtual machine workloads.
The Data Mover uses Kopia as the uploader mechanism to read the snapshot data and write to the unified repository.
OADP supports CSI snapshots on the following:
- Red Hat OpenShift Data Foundation
- Any other cloud storage provider with the Container Storage Interface (CSI) driver that supports the Kubernetes Volume Snapshot API
4.22.1.1. Data Mover support
Review Data Mover support and compatibility across OADP versions to understand which backups can be restored. This helps you plan version upgrades and backup strategies.
The OADP built-in Data Mover, which was introduced in OADP 1.3 as a Technology Preview, is now fully supported for both containerized and virtual machine workloads.
- Supported
- The Data Mover backups taken with OADP 1.3 can be restored using OADP 1.3 and later.
- Not supported
- Backups taken with OADP 1.1 or OADP 1.2 using the Data Mover feature cannot be restored using OADP 1.3 and later.
OADP 1.1 and OADP 1.2 are no longer supported. The DataMover feature in OADP 1.1 or OADP 1.2 was a Technology Preview and was never supported. DataMover backups taken with OADP 1.1 or OADP 1.2 cannot be restored on later versions of OADP.
4.22.1.2. Enabling the built-in Data Mover
Enable the built-in Data Mover by configuring the CSI plugin and node agent in the DataProtectionApplication custom resource (CR).
Procedure
Include the CSI plugin and enable the node agent in the DataProtectionApplication custom resource (CR) as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
      - openshift
      - aws
      - csi
      defaultSnapshotMoveData: true
      defaultVolumesToFSBackup:
      featureFlags:
      - EnableCSI
# ...
where:
- enable: Specifies the flag to enable the node agent.
- uploaderType: Specifies the type of uploader. The possible values are restic or kopia. The built-in Data Mover uses Kopia as the default uploader mechanism regardless of the value of the uploaderType field.
- csi: Specifies the CSI plugin included in the list of default plugins.
- defaultVolumesToFSBackup: Specifies the default behavior for volumes. In OADP 1.3.1 and later, set to true if you use Data Mover only for volumes that opt out of fs-backup. Set to false if you use Data Mover by default for volumes.
4.22.1.3. Built-in Data Mover controller and custom resource definitions (CRDs)
Review the custom resource definitions (CRDs) that the built-in Data Mover uses to manage volume snapshot backup and restore operations. This helps you understand how Data Mover handles data upload, download, and repository management.
The built-in Data Mover feature introduces three new API objects defined as CRDs for managing backup and restore:
- DataDownload: Represents a data download of a volume snapshot. The CSI plugin creates one DataDownload object per volume to be restored. The DataDownload CR includes information about the target volume, the specified Data Mover, the progress of the current data download, the specified backup repository, and the result of the current data download after the process is complete.
- DataUpload: Represents a data upload of a volume snapshot. The CSI plugin creates one DataUpload object per CSI snapshot. The DataUpload CR includes information about the specified snapshot, the specified Data Mover, the specified backup repository, the progress of the current data upload, and the result of the current data upload after the process is complete.
- BackupRepository: Represents and manages the lifecycle of the backup repositories. OADP creates a backup repository per namespace when the first CSI snapshot backup or restore for a namespace is requested.
4.22.1.4. About incremental backup support
OADP supports incremental backups of block and Filesystem volumes. The following tables summarize the support by volume mode and backup method:
| Volume mode | FSB - Restic | FSB - Kopia | CSI | CSI Data Mover |
|---|---|---|---|---|
| Filesystem | S [1], I [2] | S [1], I [2] | S [1] | S [1], I [2] |
| Block | N [3] | N [3] | S [1] | S [1], I [2] |
| Volume mode | FSB - Restic | FSB - Kopia | CSI | CSI Data Mover |
|---|---|---|---|---|
| Filesystem | N [3] | N [3] | S [1] | S [1], I [2] |
| Block | N [3] | N [3] | S [1] | S [1], I [2] |
[1] Backup supported
[2] Incremental backup supported
[3] Not supported
The CSI Data Mover backups use Kopia regardless of uploaderType.
4.22.2. Backing up and restoring CSI snapshots data movement
You can back up and restore persistent volumes by using the OADP 1.3 Data Mover.
4.22.2.1. Backing up persistent volumes with CSI snapshots
You can use the OADP Data Mover to back up Container Storage Interface (CSI) volume snapshots to a remote object store.
Prerequisites
- You have access to the cluster with the cluster-admin role.
- You have installed the OADP Operator.
- You have included the CSI plugin and enabled the node agent in the DataProtectionApplication custom resource (CR).
- You have an application with persistent volumes running in a separate namespace.
- You have added the metadata.labels.velero.io/csi-volumesnapshot-class: "true" key-value pair to the VolumeSnapshotClass CR.
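The label prerequisite can be met by editing the VolumeSnapshotClass CR or by using oc label. The following is a minimal sketch, assuming the class already exists: the function name label_snapshot_class and the OC override variable are hypothetical, and the class name is supplied by the caller.

```shell
# label_snapshot_class <volumesnapshotclass_name>
# Adds the velero.io/csi-volumesnapshot-class=true label to the class.
# OC is an assumption of this sketch: set OC="echo oc" to print the
# command instead of running it.
label_snapshot_class() {
  [ -n "$1" ] || { echo "usage: label_snapshot_class <name>" >&2; return 2; }
  ${OC:-oc} label volumesnapshotclass "$1" \
    velero.io/csi-volumesnapshot-class=true --overwrite
}

# Usage:
#   label_snapshot_class <volumesnapshotclass_name>
```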
Procedure
Create a YAML file for the Backup object, as in the following example:
kind: Backup
apiVersion: velero.io/v1
metadata:
  name: backup
  namespace: openshift-adp
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup:
  includedNamespaces:
  - mysql-persistent
  itemOperationTimeout: 4h0m0s
  snapshotMoveData: true
  storageLocation: default
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - dpa-sample-1
# ...
where:
- defaultVolumesToFsBackup: Set to true if you use Data Mover only for volumes that opt out of fs-backup. Set to false if you use Data Mover by default for volumes.
- snapshotMoveData: Set to true to enable movement of CSI snapshots to remote object storage.
- ttl: Defines the retention time of the created backup and the backed up data. For example, if you are using Restic as the backup tool, the backed up data items and data contents of the persistent volumes (PVs) are stored until the backup expires. Storing this data consumes space in the target backup locations. Frequent backups consume additional storage because new backups are created before earlier, unexpired backups time out.
Apply the manifest:
$ oc create -f backup.yaml
A DataUpload CR is created after the snapshot creation is complete.
Note
If you format the volume by using the XFS filesystem and the volume is at 100% capacity, the backup fails with a no space left on device error. For example:
Error: relabel failed /var/lib/kubelet/pods/3ac..34/volumes/ \
kubernetes.io~csi/pvc-684..12c/mount: lsetxattr /var/lib/kubelet/ \
pods/3ac..34/volumes/kubernetes.io~csi/pvc-68..2c/mount/data-xfs-103: \
no space left on device
In this scenario, consider resizing the volume or using a different filesystem type, for example, ext4, so that the backup completes successfully.
Verification
Verify that the snapshot data is successfully transferred to the remote object store by monitoring the status.phase field of the DataUpload CR. Possible values are In Progress, Completed, Failed, or Canceled. The object store is configured in the backupLocations stanza of the DataProtectionApplication CR.
Run the following command to get a list of all DataUpload objects:
$ oc get datauploads -A
Example output
NAMESPACE NAME STATUS STARTED BYTES DONE TOTAL BYTES STORAGE LOCATION AGE NODE
openshift-adp backup-test-1-sw76b Completed 9m47s 108104082 108104082 dpa-sample-1 9m47s ip-10-0-150-57.us-west-2.compute.internal
openshift-adp mongo-block-7dtpf Completed 14m 1073741824 1073741824 dpa-sample-1 14m ip-10-0-150-57.us-west-2.compute.internal
Check the value of the status.phase field of the specific DataUpload object by running the following command:
$ oc get datauploads <dataupload_name> -o yaml
Example output
apiVersion: velero.io/v2alpha1
kind: DataUpload
metadata:
  name: backup-test-1-sw76b
  namespace: openshift-adp
spec:
  backupStorageLocation: dpa-sample-1
  csiSnapshot:
    snapshotClass: ""
    storageClass: gp3-csi
    volumeSnapshot: velero-mysql-fq8sl
  operationTimeout: 10m0s
  snapshotType: CSI
  sourceNamespace: mysql-persistent
  sourcePVC: mysql
status:
  completionTimestamp: "2023-11-02T16:57:02Z"
  node: ip-10-0-150-57.us-west-2.compute.internal
  path: /host_pods/15116bac-cc01-4d9b-8ee7-609c3bef6bde/volumes/kubernetes.io~csi/pvc-eead8167-556b-461a-b3ec-441749e291c4/mount
  phase: Completed
  progress:
    bytesDone: 108104082
    totalBytes: 108104082
  snapshotID: 8da1c5febf25225f4577ada2aeb9f899
  startTimestamp: "2023-11-02T16:56:22Z"
where:
- phase: Completed indicates that the snapshot data is successfully transferred to the remote object store.
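Monitoring status.phase by hand can be replaced with a small polling loop. The following is a minimal sketch, not an OADP feature: the function name wait_for_phase and its return codes are hypothetical. It treats Completed as success and Failed or Canceled as terminal errors, and it keeps polling for any other value.

```shell
# wait_for_phase <timeout_seconds> <interval_seconds> <command...>
# Runs <command...> repeatedly; the command must print the current phase.
# Returns 0 on Completed, 1 on Failed or Canceled, 124 on timeout.
wait_for_phase() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    phase=$("$@")
    case $phase in
      Completed) return 0 ;;
      Failed|Canceled) echo "phase: $phase" >&2; return 1 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 124
}

# Usage (the DataUpload name is a placeholder):
#   wait_for_phase 600 10 oc get datauploads <dataupload_name> \
#     -n openshift-adp -o jsonpath='{.status.phase}'
```

Use a positive interval. With an interval of 0 the elapsed counter never advances, so a phase that never becomes terminal would loop indefinitely.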
4.22.2.2. Restoring CSI volume snapshots
You can restore a volume snapshot by creating a Restore CR.
You cannot restore Volsync backups from OADP 1.2 with the OADP 1.3 built-in Data Mover. It is recommended to do a file system backup of all of your workloads with Restic before upgrading to OADP 1.3.
Prerequisites
- You have access to the cluster with the cluster-admin role.
- You have an OADP Backup CR from which to restore the data.
Procedure
Create a YAML file for the Restore CR, as in the following example:
Example Restore CR
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore
  namespace: openshift-adp
spec:
  backupName: <backup>
# ...
Apply the manifest:
$ oc create -f restore.yaml
A DataDownload CR is created when the restore starts.
Verification
You can monitor the status of the restore process by checking the status.phase field of the DataDownload CR. Possible values are In Progress, Completed, Failed, or Canceled.
To get a list of all DataDownload objects, run the following command:
$ oc get datadownloads -A
Example output
NAMESPACE NAME STATUS STARTED BYTES DONE TOTAL BYTES STORAGE LOCATION AGE NODE
openshift-adp restore-test-1-sk7lg Completed 7m11s 108104082 108104082 dpa-sample-1 7m11s ip-10-0-150-57.us-west-2.compute.internal
Enter the following command to check the value of the status.phase field of the specific DataDownload object:
$ oc get datadownloads <datadownload_name> -o yaml
Example output
apiVersion: velero.io/v2alpha1
kind: DataDownload
metadata:
  name: restore-test-1-sk7lg
  namespace: openshift-adp
spec:
  backupStorageLocation: dpa-sample-1
  operationTimeout: 10m0s
  snapshotID: 8da1c5febf25225f4577ada2aeb9f899
  sourceNamespace: mysql-persistent
  targetVolume:
    namespace: mysql-persistent
    pv: ""
    pvc: mysql
status:
  completionTimestamp: "2023-11-02T17:01:24Z"
  node: ip-10-0-150-57.us-west-2.compute.internal
  phase: Completed
  progress:
    bytesDone: 108104082
    totalBytes: 108104082
  startTimestamp: "2023-11-02T17:00:52Z"
where:
- phase: Completed indicates that the CSI snapshot data is successfully restored.
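While a DataDownload is still running, the status.progress fields shown in the example output give a rough progress figure. The following is a minimal sketch: the function name progress_percent is hypothetical, and it uses integer arithmetic, so the result is rounded down.

```shell
# progress_percent <bytes_done> <total_bytes>
# Prints the completed share as a whole percentage. Prints 0 when the
# total is missing or zero, for example before the transfer has started.
progress_percent() {
  done_bytes=${1:-0}
  total_bytes=${2:-0}
  if [ "$total_bytes" -eq 0 ]; then
    echo 0
    return 0
  fi
  echo $((done_bytes * 100 / total_bytes))
}

# Usage (the DataDownload name is a placeholder):
#   set -- $(oc get datadownloads <datadownload_name> -n openshift-adp \
#     -o jsonpath='{.status.progress.bytesDone} {.status.progress.totalBytes}')
#   progress_percent "$1" "$2"
```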
4.22.2.3. Deletion policy for OADP 1.3
The deletion policy determines rules for removing data from a system, specifying when and how deletion occurs based on factors such as retention periods, data sensitivity, and compliance requirements. It manages data removal effectively while meeting regulations and preserving valuable information.
4.22.2.3.1. Deletion policy guidelines for OADP 1.3
Review the following deletion policy guidelines for OADP 1.3:
- In OADP 1.3.x, when using any type of backup and restore method, you can set the deletionPolicy field to Retain or Delete in the VolumeSnapshotClass custom resource (CR).
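One way to set the deletionPolicy field is a merge patch on the VolumeSnapshotClass CR. The following is a minimal sketch under stated assumptions: the function name deletion_policy_patch is hypothetical, and it assumes that the storage driver allows deletionPolicy to be changed on an existing class.

```shell
# deletion_policy_patch <Retain|Delete>
# Prints a JSON merge patch for the deletionPolicy field and rejects
# any other value.
deletion_policy_patch() {
  case $1 in
    Retain|Delete) printf '{"deletionPolicy":"%s"}\n' "$1" ;;
    *) echo "deletionPolicy must be Retain or Delete" >&2; return 2 ;;
  esac
}

# Usage (the class name is a placeholder):
#   oc patch volumesnapshotclass <volumesnapshotclass_name> --type=merge \
#     -p "$(deletion_policy_patch Retain)"
```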
4.22.3. Overriding Kopia hashing, encryption, and splitter algorithms
Override the default values of Kopia hashing, encryption, and splitter algorithms by using specific environment variables in the Data Protection Application (DPA).
4.22.3.1. Configuring the DPA to override Kopia hashing, encryption, and splitter algorithms
Configure the Data Protection Application (DPA) to override the default Kopia hashing, encryption, and splitter algorithms by setting environment variables in the Velero pod configuration. This helps you improve Kopia performance and compare performance metrics for your backup operations.
The configuration of the Kopia algorithms for splitting, hashing, and encryption in the Data Protection Application (DPA) applies only during the initial Kopia repository creation and cannot be changed later.
To use different Kopia algorithms, ensure that the object storage does not contain any previous Kopia repositories of backups. Configure a new object storage in the Backup Storage Location (BSL) or specify a unique prefix for the object storage in the BSL configuration.
Prerequisites
- You have installed the OADP Operator.
- You have created the secret by using the credentials provided by the cloud provider.
Procedure
Configure the DPA with the environment variables for hashing, encryption, and splitter as shown in the following example.
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
#...
configuration:
  nodeAgent:
    enable: true
    uploaderType: kopia
  velero:
    defaultPlugins:
    - openshift
    - aws
    - csi
    defaultSnapshotMoveData: true
    podConfig:
      env:
        - name: KOPIA_HASHING_ALGORITHM
          value: <hashing_algorithm_name>
        - name: KOPIA_ENCRYPTION_ALGORITHM
          value: <encryption_algorithm_name>
        - name: KOPIA_SPLITTER_ALGORITHM
          value: <splitter_algorithm_name>
where:
- enable: Set to true to enable the nodeAgent.
- uploaderType: Specifies the uploader type as kopia.
- csi: Include the csi plugin.
- <hashing_algorithm_name>: Specifies a hashing algorithm. For example, BLAKE3-256.
- <encryption_algorithm_name>: Specifies an encryption algorithm. For example, CHACHA20-POLY1305-HMAC-SHA256.
- <splitter_algorithm_name>: Specifies a splitter algorithm. For example, DYNAMIC-8M-RABINKARP.
4.22.3.2. Use case for overriding Kopia hashing, encryption, and splitter algorithms
Back up an application by using Kopia environment variables for hashing, encryption, and splitter. Store the backup in an AWS S3 bucket and verify the environment variables by connecting to the Kopia repository.
Prerequisites
- You have installed the OADP Operator.
- You have an AWS S3 bucket configured as the backup storage location.
- You have created the secret by using the credentials provided by the cloud provider.
- You have installed the Kopia client.
- You have an application with persistent volumes running in a separate namespace.
Procedure
Configure the Data Protection Application (DPA) as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
  namespace: openshift-adp
spec:
  backupLocations:
  - name: aws
    velero:
      config:
        profile: default
        region: <region_name>
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: <bucket_name>
        prefix: velero
      provider: aws
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
      - openshift
      - aws
      - csi
      defaultSnapshotMoveData: true
      podConfig:
        env:
          - name: KOPIA_HASHING_ALGORITHM
            value: BLAKE3-256
          - name: KOPIA_ENCRYPTION_ALGORITHM
            value: CHACHA20-POLY1305-HMAC-SHA256
          - name: KOPIA_SPLITTER_ALGORITHM
            value: DYNAMIC-8M-RABINKARP
where:
- <dpa_name>: Specifies a name for the DPA.
- <region_name>: Specifies the region for the backup storage location.
- cloud-credentials: Specifies the name of the default Secret object.
- <bucket_name>: Specifies the AWS S3 bucket name.
- csi: Include the csi plugin.
- BLAKE3-256: Specifies the hashing algorithm as BLAKE3-256.
- CHACHA20-POLY1305-HMAC-SHA256: Specifies the encryption algorithm as CHACHA20-POLY1305-HMAC-SHA256.
- DYNAMIC-8M-RABINKARP: Specifies the splitter algorithm as DYNAMIC-8M-RABINKARP.
Create the DPA by running the following command:
$ oc create -f <dpa_file_name>
Replace <dpa_file_name> with the file name of the DPA you configured.
Verify that the DPA has reconciled by running the following command:
$ oc get dpa -o yaml
Create a backup CR as shown in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
  - <application_namespace>
  defaultVolumesToFsBackup: true
Replace <application_namespace> with the namespace for the application installed in the cluster.
Create a backup by running the following command:
$ oc apply -f <backup_file_name>
Replace <backup_file_name> with the name of the backup CR file.
Verify that the backup completed by running the following command:
$ oc get backups.velero.io <backup_name> -o yaml
Replace <backup_name> with the name of the backup.
Verification
Connect to the Kopia repository by running the following command:
$ kopia repository connect s3 \
  --bucket=<bucket_name> \
  --prefix=velero/kopia/<application_namespace> \
  --password=static-passw0rd \
  --access-key="<aws_s3_access_key>" \
  --secret-access-key="<aws_s3_secret_access_key>"
where:
- <bucket_name>: Specifies the AWS S3 bucket name.
- <application_namespace>: Specifies the namespace for the application.
- static-passw0rd: This is the Kopia password to connect to the repository.
- <aws_s3_access_key>: Specifies the AWS S3 access key.
- <aws_s3_secret_access_key>: Specifies the AWS S3 storage provider secret access key.
If you are using a storage provider other than AWS S3, you must add --endpoint, the bucket endpoint URL parameter, to the command.
Verify that Kopia uses the environment variables that are configured in the DPA for the backup by running the following command:
$ kopia repository status
Example output
Hash: BLAKE3-256
Encryption: CHACHA20-POLY1305-HMAC-SHA256
Splitter: DYNAMIC-8M-RABINKARP
Format version: 3
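The comparison between the kopia repository status output and the values set in the DPA can be automated. The following is a minimal sketch: the function name check_kopia_algorithms is hypothetical, and it assumes that the status output contains Hash:, Encryption:, and Splitter: lines in the form shown in the example output.

```shell
# check_kopia_algorithms <hash> <encryption> <splitter>
# Reads `kopia repository status` output on stdin and returns 0 only if
# all three reported algorithms match the expected names.
check_kopia_algorithms() {
  awk -v want_hash="$1" -v want_enc="$2" -v want_splitter="$3" '
    $1 == "Hash:"       { got_hash = $2 }
    $1 == "Encryption:" { got_enc = $2 }
    $1 == "Splitter:"   { got_splitter = $2 }
    END {
      ok = (got_hash == want_hash && got_enc == want_enc && got_splitter == want_splitter)
      if (!ok) {
        printf "mismatch: hash=%s encryption=%s splitter=%s\n", \
          got_hash, got_enc, got_splitter > "/dev/stderr"
      }
      exit ok ? 0 : 1
    }
  '
}

# Usage:
#   kopia repository status | check_kopia_algorithms \
#     BLAKE3-256 CHACHA20-POLY1305-HMAC-SHA256 DYNAMIC-8M-RABINKARP
```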
4.22.3.3. Benchmarking Kopia hashing, encryption, and splitter algorithms
Run Kopia commands to benchmark the hashing, encryption, and splitter algorithms. Based on the benchmarking results, you can select the most suitable algorithm for your workload. You run the Kopia benchmarking commands from a pod on the cluster. The benchmarking results can vary depending on CPU speed, available RAM, disk speed, current I/O load, and so on.
The configuration of the Kopia algorithms for splitting, hashing, and encryption in the Data Protection Application (DPA) applies only during the initial Kopia repository creation and cannot be changed later.
To use different Kopia algorithms, ensure that the object storage does not contain any previous Kopia repositories of backups. Configure a new object storage in the Backup Storage Location (BSL) or specify a unique prefix for the object storage in the BSL configuration.
Prerequisites
- You have installed the OADP Operator.
- You have an application with persistent volumes running in a separate namespace.
- You have run a backup of the application with Container Storage Interface (CSI) snapshots.
Procedure
Configure the must-gather pod as shown in the following example. Make sure you are using the oadp-mustgather image for OADP version 1.3 and later.
Example pod configuration
apiVersion: v1
kind: Pod
metadata:
  name: oadp-mustgather-pod
  labels:
    purpose: user-interaction
spec:
  containers:
  - name: oadp-mustgather-container
    image: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
    command: ["sleep"]
    args: ["infinity"]
The Kopia client is available in the oadp-mustgather image.
Create the pod by running the following command:
$ oc apply -f <pod_config_file_name>
Replace <pod_config_file_name> with the name of the YAML file for the pod configuration.
Verify that the Security Context Constraints (SCC) on the pod is anyuid, so that Kopia can connect to the repository.
$ oc describe pod/oadp-mustgather-pod | grep scc
Example output
openshift.io/scc: anyuid
Open a remote shell in the pod by running the following command:
$ oc -n openshift-adp rsh pod/oadp-mustgather-pod
Connect to the Kopia repository by running the following command:
sh-5.1# kopia repository connect s3 \
  --bucket=<bucket_name> \
  --prefix=velero/kopia/<application_namespace> \
  --password=static-passw0rd \
  --access-key="<access_key>" \
  --secret-access-key="<secret_access_key>" \
  --endpoint=<bucket_endpoint>
where:
- <bucket_name>: Specifies the object storage provider bucket name.
- <application_namespace>: Specifies the namespace for the application.
- static-passw0rd: This is the Kopia password to connect to the repository.
- <access_key>: Specifies the object storage provider access key.
- <secret_access_key>: Specifies the object storage provider secret access key.
- <bucket_endpoint>: Specifies the bucket endpoint. You do not need to specify the bucket endpoint if you are using AWS S3 as the storage provider.
This is an example command. The command can vary based on the object storage provider.
To benchmark the hashing algorithm, run the following command:
sh-5.1# kopia benchmark hashing
Example output
Benchmarking hash 'BLAKE2B-256' (100 x 1048576 bytes, parallelism 1)
Benchmarking hash 'BLAKE2B-256-128' (100 x 1048576 bytes, parallelism 1)
Fastest option for this machine is: --block-hash=BLAKE3-256
To benchmark the encryption algorithm, run the following command:
sh-5.1# kopia benchmark encryption
Example output
Benchmarking encryption 'AES256-GCM-HMAC-SHA256'
Benchmarking encryption 'CHACHA20-POLY1305-HMAC-SHA256'
Fastest option for this machine is: --encryption=AES256-GCM-HMAC-SHA256
To benchmark the splitter algorithm, run the following command:
sh-5.1# kopia benchmark splitter
Example output
splitting 16 blocks of 32MiB each, parallelism 1
DYNAMIC 747.6 MB/s count:107 min:9467 10th:2277562 25th:2971794 50th:4747177 75th:7603998 90th:8388608 max:8388608
DYNAMIC-128K-BUZHASH 718.5 MB/s count:3183 min:3076 10th:80896 25th:104312 50th:157621 75th:249115 90th:262144 max:262144
DYNAMIC-128K-RABINKARP 164.4 MB/s count:3160 min:9667 10th:80098 25th:106626 50th:162269 75th:250655 90th:262144 max:262144
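The hashing and encryption benchmarks print the fastest option themselves, but the splitter output shown above does not. The following is a minimal sketch for picking the highest-throughput splitter: the function name fastest_by_throughput is hypothetical, and it assumes that each result line contains a throughput value followed by the MB/s token, with the algorithm name immediately before the value.

```shell
# fastest_by_throughput
# Reads benchmark lines on stdin and prints the name and throughput of
# the entry with the highest "<value> MB/s" figure.
fastest_by_throughput() {
  awk '
    {
      for (i = 3; i <= NF; i++) {
        if ($i == "MB/s" && ($(i - 1) + 0) > best) {
          best = $(i - 1) + 0
          name = $(i - 2)
        }
      }
    }
    END { if (name != "") print name, best }
  '
}

# Usage inside the must-gather pod:
#   kopia benchmark splitter | fastest_by_throughput
```

Throughput is only one input to the choice. The count and percentile columns describe the chunk sizes each splitter produces, which also affects how the repository behaves for a given workload.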
4.23. APIs used with OADP
You can use the following APIs with OADP:
- Velero API: Velero API documentation is maintained by Velero and is not maintained by Red Hat. For more information, see API types (Velero documentation).
- OADP API: The following are the OADP APIs:
  - DataProtectionApplicationSpec
  - BackupLocation
  - SnapshotLocation
  - ApplicationConfig
  - VeleroConfig
  - CustomPlugin
  - ResticConfig
  - PodConfig
  - Features
  - DataMover
For more information, see OADP Operator (Go documentation).
4.23.1. DataProtectionApplicationSpec type
The following are the DataProtectionApplicationSpec properties:
| Property | Type | Description |
|---|---|---|
| | | Defines the list of configurations to use for |
| | | Defines the list of configurations to use for |
| | map [ UnsupportedImageKey ] string | Can be used to override the deployed dependent images for development. Options are |
| | | Used to add annotations to pods deployed by Operators. |
| | | Defines the configuration of the DNS of a pod. |
| | | Defines the DNS parameters of a pod in addition to those generated from |
| | *bool | Used to specify whether or not you want to deploy a registry for enabling backup and restore of images. |
| | | Used to define the data protection application’s server configuration. |
| | | Defines the configuration for the DPA to enable the Technology Preview features. |
4.23.2. BackupLocation type
The following are the BackupLocation properties:
| Property | Type | Description |
|---|---|---|
| | | Location to store volume snapshots, as described in Backup Storage Location. |
| | | Automates creation of a bucket at some cloud storage providers for use as a backup storage location. |
The
bucket
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
4.23.3. SnapshotLocation type
The following are the SnapshotLocation properties:
| Property | Type | Description |
|---|---|---|
| | | Location to store volume snapshots, as described in Volume Snapshot Location. |
4.23.4. ApplicationConfig type
The following are the ApplicationConfig properties:
| Property | Type | Description |
|---|---|---|
| | | Defines the configuration for the Velero server. |
| | | Defines the configuration for the Restic server. |
4.23.5. VeleroConfig type
The following are the VeleroConfig properties:
| Property | Type | Description |
|---|---|---|
| | [] string | Defines the list of features to enable for the Velero instance. |
| | [] string | The following types of default Velero plugins can be installed: |
| | | Used for installation of custom Velero plugins. |
| | | Represents a config map that is created if defined for use in conjunction with the |
| | | To install Velero without a default backup storage location, you must set the |
| | | Defines the configuration of the |
| | | Velero server’s log level (use |
4.23.6. CustomPlugin type
The following are the CustomPlugin properties:
4.23.7. ResticConfig type
The following are the ResticConfig properties:
| Property | Type | Description |
|---|---|---|
| | *bool | If set to |
| | []int64 | Defines the Linux groups to be applied to the |
| | | A user-supplied duration string that defines the Restic timeout. Default value is |
| | | Defines the configuration of the |
4.23.8. PodConfig type
The following are the PodConfig properties:
| Property | Type | Description |
|---|---|---|
| | | Defines the |
| | | Defines the list of tolerations to be applied to a Velero deployment or a Restic |
| | | Set specific resource |
| | | Labels to add to pods. |
4.23.9. Features type
The following are the Features properties:
| Property | Type | Description |
|---|---|---|
| | | Defines the configuration of the Data Mover. |
4.23.10. DataMover type
The following are the DataMover properties:
| Property | Type | Description |
|---|---|---|
| | | If set to |
| | | User-supplied Restic |
| | | A user-supplied duration string for |
4.24. Advanced OADP features and functionalities
This document provides information about advanced features and functionalities of OpenShift API for Data Protection (OADP).
4.24.1. Working with different Kubernetes API versions on the same cluster
4.24.1.1. Listing the Kubernetes API group versions on a cluster
A source cluster might offer multiple versions of an API, where one of these versions is the preferred API version. For example, a source cluster with an API named Example might be available in the example.com/v1 and example.com/v1beta2 API groups.
If you use Velero to back up and restore such a source cluster, Velero backs up only the version of that resource that uses the preferred version of its Kubernetes API.
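To return to the above example, if example.com/v1 is the preferred API, then Velero backs up only the version of a resource that uses example.com/v1. Moreover, the target cluster must have example.com/v1 registered in its set of available API resources for Velero to restore the resource on the target cluster.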
Therefore, you need to generate a list of the Kubernetes API group versions on your target cluster to be sure the preferred API version is registered in its set of available API resources.
Procedure
- Enter the following command:
$ oc api-resources
4.24.1.2. About Enable API Group Versions
By default, Velero only backs up resources that use the preferred version of the Kubernetes API. However, Velero also includes a feature, Enable API Group Versions, that overcomes this limitation. When enabled on the source cluster, this feature causes Velero to back up all Kubernetes API group versions that are supported on the cluster, not only the preferred one. After the versions are stored in the backup .tar file, they are available to be restored on the destination cluster.
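For example, a source cluster with an API named Example might be available in the example.com/v1 and example.com/v1beta2 API groups, with example.com/v1 being the preferred API.
Without the Enable API Group Versions feature enabled, Velero backs up only the preferred API group version for Example, which is example.com/v1. With the feature enabled, Velero also backs up example.com/v1beta2.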
When the Enable API Group Versions feature is enabled on the destination cluster, Velero selects the version to restore on the basis of the order of priority of API group versions.
Enable API Group Versions is still in beta.
Velero uses the following algorithm to assign priorities to API versions, with 1 as the top priority:
1. Preferred version of the destination cluster
2. Preferred version of the source cluster
3. Common non-preferred supported version with the highest Kubernetes version priority
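The priority order above can be sketched in Python. This is a minimal illustration under stated assumptions, not Velero's implementation: the function names, the input shape (plain lists of version strings such as v1 or v1beta2), and the handling of version names that do not follow the Kubernetes pattern are all hypothetical.

```python
import re

# Matches Kubernetes-style version names such as v1, v2beta1, or v1alpha3.
_VERSION_RE = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")


def version_priority(version):
    """Sort key: GA before beta before alpha, then higher numbers first."""
    match = _VERSION_RE.match(version)
    if match is None:
        # Assumption: names that do not follow the pattern rank lowest.
        return (0, 0, 0)
    major, stage, stage_number = match.groups()
    stage_rank = {None: 3, "beta": 2, "alpha": 1}[stage]
    return (stage_rank, int(major), int(stage_number or 0))


def choose_restore_version(backed_up, source_preferred, dest_supported, dest_preferred):
    """Pick the API version to restore, following the three priorities."""
    # Only versions present in the backup AND supported by the destination qualify.
    common = set(backed_up) & set(dest_supported)
    if dest_preferred in common:      # priority 1
        return dest_preferred
    if source_preferred in common:    # priority 2
        return source_preferred
    if common:                        # priority 3
        return max(common, key=version_priority)
    return None                       # nothing in common, nothing to restore
```

For example, with a backup that contains v1 and v1beta2 (v1 preferred on the source) and a destination that supports v2 and v1beta2 (v2 preferred), neither preferred version is shared, so the sketch falls through to the common version v1beta2.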
4.24.1.3. Using Enable API Group Versions
You can use Velero’s Enable API Group Versions feature to back up all Kubernetes API group versions that are supported on a cluster, not only the preferred one.
Enable API Group Versions is still in beta.
Procedure
- Configure the EnableAPIGroupVersions feature flag:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
...
spec:
  configuration:
    velero:
      featureFlags:
      - EnableAPIGroupVersions
4.24.2. Backing up data from one cluster and restoring it to another cluster
4.24.2.1. About backing up data from one cluster and restoring it on another cluster
OpenShift API for Data Protection (OADP) is designed to back up and restore application data in the same OpenShift Container Platform cluster. Migration Toolkit for Containers (MTC) is designed to migrate containers, including application data, from one OpenShift Container Platform cluster to another cluster.
You can use OADP to back up application data from one OpenShift Container Platform cluster and restore it on another cluster. However, doing so is more complicated than using MTC or using OADP to back up and restore on the same cluster.
To successfully use OADP to back up data from one cluster and restore it to another cluster, you must take into account the following factors, in addition to the prerequisites and procedures that apply to using OADP to back up and restore data on the same cluster:
- Operators
- Use of Velero
- UID and GID ranges
4.24.2.1.1. Operators
You must exclude Operators from the backup of an application for backup and restore to succeed.
4.24.2.1.2. Use of Velero
Velero, which OADP is built upon, does not natively support migrating persistent volume snapshots across cloud providers. To migrate volume snapshot data between cloud platforms, you must either enable the Velero Restic file system backup option, which backs up volume contents at the file system level, or use the OADP Data Mover for CSI snapshots.
In OADP 1.1 and earlier, the Velero Restic file system backup option is called restic. In OADP 1.2 and later, the Velero Restic file system backup option is called file-system-backup.
- You must also use Velero’s File System Backup to migrate data between AWS regions or between Microsoft Azure regions.
- Velero does not support restoring data to a cluster with an earlier Kubernetes version than the source cluster.
- It is theoretically possible to migrate workloads to a destination with a later Kubernetes version than the source, but you must consider the compatibility of API groups between clusters for each custom resource. If a Kubernetes version upgrade breaks the compatibility of core or native API groups, you must first update the impacted custom resources.
4.24.2.2. About determining which pod volumes to back up
Before you start a backup operation by using File System Backup (FSB), you must specify which pods contain a volume that you want to back up. Velero refers to this process as "discovering" the appropriate pod volumes.
Velero supports two approaches for determining pod volumes. Use the opt-in or the opt-out approach to allow Velero to decide between an FSB, a volume snapshot, or a Data Mover backup.
- Opt-in approach: With the opt-in approach, volumes are backed up using snapshot or Data Mover by default. FSB is used on specific volumes that are opted-in by annotations.
- Opt-out approach: With the opt-out approach, volumes are backed up using FSB by default. Snapshots or Data Mover is used on specific volumes that are opted-out by annotations.
4.24.2.2.1. Limitations
- FSB does not support backing up and restoring hostpath volumes. However, FSB does support backing up and restoring local volumes.
- Velero uses a static, common encryption key for all backup repositories it creates. This static key means that anyone who can access your backup storage can also decrypt your backup data. It is essential that you limit access to backup storage.
- For PVCs, every incremental backup chain is maintained across pod reschedules.
For pod volumes that are not PVCs, such as emptyDir volumes, if a pod is deleted or recreated, for example, by a ReplicaSet or a deployment, the next backup of those volumes will be a full backup and not an incremental backup. It is assumed that the lifecycle of a pod volume is defined by its pod.
- Even though backup data can be kept incrementally, backing up large files, such as a database, can take a long time. This is because FSB uses deduplication to find the difference that needs to be backed up.
- FSB reads and writes data from volumes by accessing the file system of the node on which the pod is running. For this reason, FSB can only back up volumes that are mounted from a pod and not directly from a PVC. Some Velero users have overcome this limitation by running a staging pod, such as a BusyBox or Alpine container with an infinite sleep, to mount these PVC and PV pairs before performing a Velero backup.
- FSB expects volumes to be mounted under <hostPath>/<pod UID>, with <hostPath> being configurable. Some Kubernetes systems, for example, vCluster, do not mount volumes under the <pod UID> subdirectory, and FSB does not work with them as expected.
4.24.2.2.2. Backing up pod volumes by using the opt-in method
You can use the opt-in method to specify which volumes need to be backed up by File System Backup (FSB). You can do this by using the backup.velero.io/backup-volumes annotation.
Procedure
- On each pod that contains one or more volumes that you want to back up, enter the following command:
$ oc -n <your_pod_namespace> annotate pod/<your_pod_name> \
  backup.velero.io/backup-volumes=<your_volume_name_1>,<your_volume_name_2>,...,<your_volume_name_n>
where:
<your_volume_name_x>
Specifies the name of the xth volume in the pod specification.
4.24.2.2.3. Backing up pod volumes by using the opt-out method
When using the opt-out approach, all pod volumes are backed up by using File System Backup (FSB), although there are some exceptions:
- Volumes that mount the default service account token, secrets, and configuration maps.
- hostPath volumes
You can use the opt-out method to specify which volumes not to back up. You can do this by using the backup.velero.io/backup-volumes-excludes annotation.
Procedure
- On each pod that contains one or more volumes that you do not want to back up, run the following command:
$ oc -n <your_pod_namespace> annotate pod/<your_pod_name> \
  backup.velero.io/backup-volumes-excludes=<your_volume_name_1>,<your_volume_name_2>,...,<your_volume_name_n>
where:
<your_volume_name_x>
Specifies the name of the xth volume in the pod specification.
You can enable this behavior for all Velero backups by running the velero install command with the --default-volumes-to-fs-backup flag.
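When many pods need the opt-in or opt-out annotation, it can help to generate the oc annotate invocation instead of typing it. The following Python sketch builds the argument list only and does not contact a cluster. The function name build_annotate_command and its parameters are hypothetical; the two annotation keys are the ones named in this section.

```python
OPT_IN_KEY = "backup.velero.io/backup-volumes"
OPT_OUT_KEY = "backup.velero.io/backup-volumes-excludes"


def build_annotate_command(namespace, pod, volumes, opt_out=False):
    """Return the argv list for annotating one pod with its volume names."""
    if not volumes:
        raise ValueError("at least one volume name is required")
    key = OPT_OUT_KEY if opt_out else OPT_IN_KEY
    # Volume names are joined with commas and no spaces, as one annotation value.
    value = ",".join(volumes)
    return ["oc", "-n", namespace, "annotate", f"pod/{pod}", f"{key}={value}"]


# Example: opt two volumes of a (hypothetical) pod in to FSB.
command = build_annotate_command("my-app", "db-0", ["data", "logs"])
print(" ".join(command))
```

The returned list can be passed to subprocess.run on a workstation where oc is installed and logged in. Building a list rather than a single string avoids shell quoting issues with unusual volume names.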
4.24.2.3. UID and GID ranges
If you back up data from one cluster and restore it to another cluster, problems might occur with UID (User ID) and GID (Group ID) ranges. The following section explains these potential issues and mitigations:
- Summary of the issues
- The namespace UID and GID ranges might change depending on the destination cluster. OADP does not back up and restore OpenShift UID range metadata. If the backed up application requires a specific UID, ensure the range is available upon restore. For more information about OpenShift’s UID and GID ranges, see A Guide to OpenShift and UIDs.
- Detailed description of the issues
When you create a namespace in OpenShift Container Platform by using the shell command oc create namespace, OpenShift Container Platform assigns the namespace a unique User ID (UID) range from its available pool of UIDs, a Supplemental Group (GID) range, and unique SELinux MCS labels. This information is stored in the metadata.annotations field of the cluster. This information is part of the Security Context Constraints (SCC) annotations, which comprise the following components:
- openshift.io/sa.scc.mcs
- openshift.io/sa.scc.supplemental-groups
- openshift.io/sa.scc.uid-range
When you use OADP to restore the namespace, it automatically uses the information in metadata.annotations without resetting it for the destination cluster. As a result, the workload might not have access to the backed up data if any of the following is true:
- There is an existing namespace with other SCC annotations, for example, on another cluster. In this case, OADP uses the existing namespace during the backup instead of the namespace you want to restore.
- A label selector was used during the backup, but the namespace in which the workloads are executed does not have the label. In this case, OADP does not back up the namespace, but creates a new namespace during the restore that does not contain the annotations of the backed up namespace. This results in a new UID range being assigned to the namespace.
This can be an issue for customer workloads if OpenShift Container Platform assigns a securityContext UID to a pod based on namespace annotations that have changed since the persistent volume data was backed up.
- The UID of the container no longer matches the UID of the file owner.
- An error occurs because OpenShift Container Platform has not changed the UID range of the destination cluster to match the backup cluster data. As a result, the backup cluster has a different UID than the destination cluster, which means that the application cannot read or write data on the destination cluster.
- Mitigations
You can use one or more of the following mitigations to resolve the UID and GID range issues:
Simple mitigations:
- If you use a label selector in the Backup CR to filter the objects to include in the backup, be sure to add this label selector to the namespace that contains the workspace.
- Remove any pre-existing version of a namespace on the destination cluster before attempting to restore a namespace with the same name.
Advanced mitigations:
- Fix UID ranges after migration by Resolving overlapping UID ranges in OpenShift namespaces after migration. Step 1 is optional.
For an in-depth discussion of UID and GID ranges in OpenShift Container Platform with an emphasis on overcoming issues in backing up data on one cluster and restoring it on another, see A Guide to OpenShift and UIDs.
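To check whether restored files will still be accessible, you can compare the UID that owns the backed up data with the openshift.io/sa.scc.uid-range annotation of the namespace on the destination cluster. The following Python sketch assumes that the annotation value uses the <start>/<size> form, for example 1000680000/10000. The function names are hypothetical, and other value forms are not handled.

```python
def parse_uid_range(annotation):
    """Parse a '<start>/<size>' annotation value into (first_uid, last_uid)."""
    start_text, size_text = annotation.split("/", 1)
    start, size = int(start_text), int(size_text)
    if size <= 0:
        raise ValueError("range size must be positive")
    return start, start + size - 1


def uid_in_range(uid, annotation):
    """Return True if uid falls inside the namespace UID range."""
    first, last = parse_uid_range(annotation)
    return first <= uid <= last


# A file owner UID recorded at backup time, checked against the range
# that the destination namespace received (both values are examples).
print(uid_in_range(1000680000, "1000680000/10000"))
print(uid_in_range(1000680000, "1000720000/10000"))
```

If the check returns False for the destination namespace, the container UID and the file owner UID no longer match, which is the situation that the mitigations in this section address.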
4.24.2.4. Backing up data from one cluster and restoring it to another cluster
In general, you back up data from one OpenShift Container Platform cluster and restore it on another OpenShift Container Platform cluster in the same way that you back up and restore data to the same cluster. However, there are some additional prerequisites and differences in the procedure when backing up data from one OpenShift Container Platform cluster and restoring it on another.
Prerequisites
- All relevant prerequisites for backing up and restoring on your platform (for example, AWS, Microsoft Azure, Google Cloud, and so on), especially the prerequisites for the Data Protection Application (DPA), are described in the relevant sections of this guide.
Procedure
Make the following additions to the procedures given for your platform:
- Ensure that the backup storage location (BSL) and volume snapshot location have the same names and paths to restore resources to another cluster.
- Share the same object storage location credentials across the clusters.
- For best results, use OADP to create the namespace on the destination cluster.
- If you use the Velero file-system-backup option, enable the --default-volumes-to-fs-backup flag for use during backup by running the following command:
$ velero backup create <backup_name> --default-volumes-to-fs-backup <any_other_options>
Note
In OADP 1.2 and later, the Velero Restic option is called file-system-backup.
- Before restoring a CSI backup, edit the snapshot.storage.kubernetes.io/is-default-class parameter of the VolumeSnapshotClass.
4.24.3. OADP storage class mapping
4.24.3.1. Storage class mapping
Storage class mapping allows you to define rules or policies specifying which storage class should be applied to different types of data. This feature automates the process of determining storage classes based on access frequency, data importance, and cost considerations. It optimizes storage efficiency and cost-effectiveness by ensuring that data is stored in the most suitable storage class for its characteristics and usage patterns.
You can use the change-storage-class-config config map to define the storage class mapping.
4.24.3.1.1. Storage class mapping with Migration Toolkit for Containers
You can use the Migration Toolkit for Containers (MTC) to migrate containers, including application data, from one OpenShift Container Platform cluster to another cluster and for storage class mapping and conversion. You can convert the storage class of a persistent volume (PV) by migrating it within the same cluster. To do so, you must create and run a migration plan in the MTC web console.
4.24.3.1.2. Mapping storage classes with OADP
You can use OpenShift API for Data Protection (OADP) with the Velero plugin v1.1.0 and later to change the storage class of a persistent volume (PV) during restores, by configuring a storage class mapping in the config map in the Velero namespace.
To deploy the config map with OADP, use the change-storage-class-config file name.
Procedure
- Change the storage class mapping by running the following command:
$ cat change-storageclass.yaml
- Create a config map in the Velero namespace as shown in the following example:
Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: openshift-adp
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  standard-csi: ssd-csi
- Save your storage class mapping preferences by running the following command:
$ oc create -f change-storage-class-config
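If you maintain several storage class mappings, you can generate the config map from a dictionary instead of editing it by hand. The following Python sketch reproduces the example config map and prints it as JSON, which oc create -f also accepts. The function name build_storage_class_config_map is hypothetical; the metadata and label values are copied from the example in this procedure.

```python
import json


def build_storage_class_config_map(mapping, namespace="openshift-adp"):
    """Build the change-storage-class-config manifest as a plain dict.

    mapping: source storage class name -> target storage class name.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "change-storage-class-config",
            "namespace": namespace,
            "labels": {
                "velero.io/plugin-config": "",
                "velero.io/change-storage-class": "RestoreItemAction",
            },
        },
        # Each key is a storage class in the backup; each value replaces it on restore.
        "data": dict(mapping),
    }


manifest = build_storage_class_config_map({"standard-csi": "ssd-csi"})
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file and pass that file to oc create -f to apply the mapping.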
4.25. OADP troubleshooting
4.25.1. Troubleshooting
Troubleshoot OpenShift API for Data Protection (OADP) issues by using diagnostic tools such as the Velero CLI, webhooks, and the must-gather tool.
You can troubleshoot OADP issues by using the following methods:
- Debug Velero custom resources (CRs) by using the OpenShift CLI tool or the Velero CLI tool. The Velero CLI tool provides more detailed logs and information.
- Debug Velero or Restic pod crashes that are caused by a lack of memory or CPU by using Pods crash or restart due to lack of memory or CPU.
- Debug issues with Velero and admission webhooks by using Issues with Velero and admission webhooks.
- Check OADP installation issues, OADP Operator issues, backup and restore CR issues, and Restic issues.
- Use the available OADP timeouts to reduce errors, retries, or failures.
- Collect logs and CR information by using the must-gather tool.
- Monitor and analyze the workload performance with the help of OADP monitoring.
4.25.2. Velero CLI tool
Download the velero CLI tool, and use the velero CLI to debug Backup and Restore custom resources (CRs).
4.25.2.1. Downloading the Velero CLI tool
Download and install the velero CLI tool by following the instructions on the Velero website.
Prerequisites
- You have access to a Kubernetes cluster, v1.16 or later, with DNS and container networking enabled.
- You have installed kubectl locally.
Procedure
- Open a browser and navigate to "Install the CLI" on the Velero website.
- Follow the appropriate procedure for macOS, GitHub, or Windows.
- Download the Velero version appropriate for your version of OADP and OpenShift Container Platform.
4.25.2.1.1. OADP-Velero-OpenShift Container Platform version relationship
Review the version relationship between OADP, Velero, and OpenShift Container Platform to decide compatible version combinations. This helps you select the appropriate OADP version for your cluster environment.
4.25.2.2. Accessing the Velero binary in the Velero deployment in the cluster
Use a shell command to access the Velero binary in the Velero deployment in the cluster.
Prerequisites
- Your DataProtectionApplication custom resource has a status of Reconcile complete.
Procedure
Enter the following command to set the needed alias:
$ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
4.25.2.3. Debugging Velero resources with the OpenShift CLI tool
Debug a failed backup or restore by checking Velero custom resources (CRs) and the Velero pod logs with the OpenShift CLI tool.
Procedure
- Retrieve a summary of warnings and errors associated with a Backup or Restore CR by using the following oc describe command:
$ oc describe <velero_cr> <cr_name>
- Retrieve the Velero pod logs by using the following oc logs command:
$ oc logs pod/<velero>
- Specify the Velero log level in the DataProtectionApplication resource as shown in the following example.
Note
This option is available starting from OADP 1.0.3.
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: velero-sample
spec:
  configuration:
    velero:
      logLevel: warning
The following logLevel values are available:
- trace
- debug
- info
- warning
- error
- fatal
- panic
Use the info logLevel value for most logs.
4.25.2.4. Debugging Velero resources with the Velero CLI tool
Debug Backup and Restore custom resources (CRs) by using the Velero CLI tool.
Procedure
- Use the oc exec command to run a Velero CLI command:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> <command> <cr_name>
Example oc exec command
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
- List all Velero CLI commands by using the following velero --help option:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  --help
- Retrieve the logs of a Backup or Restore CR by using the following velero logs command:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> logs <cr_name>
Example velero logs command
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
- Retrieve a summary of warnings and errors associated with a Backup or Restore CR by using the following velero describe command:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> describe <cr_name>
Example velero describe command
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
The following types of restore errors and warnings are shown in the output of a velero describe request:
- Velero: A list of messages related to the operation of Velero itself, for example, messages related to connecting to the cloud, reading a backup file, and so on
- Cluster: A list of messages related to backing up or restoring cluster-scoped resources
- Namespaces: A list of messages related to backing up or restoring resources stored in namespaces
One or more errors in one of these categories results in a Restore operation receiving the status of PartiallyFailed and not Completed. Warnings do not lead to a change in the completion status.
Consider the following points for these restore errors:
- For resource-specific errors, that is, Cluster and Namespaces errors, the restore describe --details output includes a resource list that includes all resources that Velero restored. For any resource that has such an error, check if the resource is actually in the cluster.
- If there are Velero errors but no resource-specific errors in the output of a describe command, it is possible that the restore completed without any actual problems in restoring workloads. In this case, carefully validate post-restore applications.
For example, if the output contains PodVolumeRestore or node agent-related errors, check the status of PodVolumeRestores and DataDownloads. If none of these are failed or still running, then volume data might have been fully restored.
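The relationship between the error categories and the restore status can be summarized in a short Python sketch. The input shape, a dictionary that maps each category name to a list of error messages, is a hypothetical simplification for illustration and is not the format that the velero describe command prints.

```python
CATEGORIES = ("Velero", "Cluster", "Namespaces")


def expected_restore_phase(errors):
    """Any error in any category means PartiallyFailed; warnings are ignored."""
    has_error = any(errors.get(category) for category in CATEGORIES)
    return "PartiallyFailed" if has_error else "Completed"


def needs_resource_check(errors):
    """True when resource-specific (Cluster or Namespaces) errors exist."""
    return bool(errors.get("Cluster") or errors.get("Namespaces"))


# Example: only Velero-level errors are present.
errors = {"Velero": ["error reading backup file"], "Cluster": [], "Namespaces": []}
print(expected_restore_phase(errors))
print(needs_resource_check(errors))
```

In the example, the phase is PartiallyFailed but no resource-specific errors exist, which corresponds to the case where the restore might have completed without actual problems and the applications should be validated after the restore.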
4.25.3. Pods crash or restart due to lack of memory or CPU
Resolve Velero or Restic pod crashes caused by insufficient memory or CPU by configuring resource requests in the DataProtectionApplication custom resource (CR).
Ensure that the values for the resource request fields follow the same format as Kubernetes resource requirements.
If you do not specify configuration.velero.podConfig.resourceAllocations or configuration.restic.podConfig.resourceAllocations, the following default resources specification applies to a Velero or Restic pod:
requests:
  cpu: 500m
  memory: 128Mi
4.25.3.1. Setting resource requests for a Velero pod
Use the configuration.velero.podConfig.resourceAllocations specification field in the oadp_v1alpha1_dpa.yaml file to set specific resource requests for a Velero pod.
Procedure
- Set the cpu and memory resource requests as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
...
configuration:
  velero:
    podConfig:
      resourceAllocations:
        requests:
          cpu: 200m
          memory: 256Mi
The resourceAllocations listed are for average usage.
4.25.3.2. Setting resource requests for a Restic pod
Use the configuration.restic.podConfig.resourceAllocations specification field to set specific resource requests for a Restic pod.
Procedure
- Set the cpu and memory resource requests as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
...
configuration:
  restic:
    podConfig:
      resourceAllocations:
        requests:
          cpu: 1000m
          memory: 16Gi
The resourceAllocations listed are for average usage.
4.25.4. Restoring workarounds for Velero backups that use admission webhooks
Resolve restore failures caused by admission webhooks by applying workarounds for workloads such as Knative and IBM AppConnect resources. This helps you to successfully restore workloads that have mutating or validating admission webhooks.
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload. Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
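For example, creating or restoring a top-level object such as service.serving.knative.dev typically creates child resources automatically.
Velero plugins are started as separate processes. After a Velero operation has completed, either successfully or not, the plugin process exits. Receiving a received EOF, stopping recv loop message in the debug logs indicates that a plugin operation has completed. It does not mean that an error has occurred.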
4.25.4.1. Restoring Knative resources
Resolve issues with restoring Knative resources that use admission webhooks by restoring the top-level service.serving.knative.dev Service resource first.
Procedure
- Restore the top-level service.serving.knative.dev Service resource by using the following command:
$ velero restore create <restore_name> \
  --from-backup=<backup_name> --include-resources \
  service.serving.knative.dev
4.25.4.2. Restoring IBM AppConnect resources
Troubleshoot Velero restore failures for IBM® AppConnect resources that use admission webhooks. Verify your webhook rules and check that the installed Operator supports the backup’s version to successfully complete the restore.
Procedure
- Check if you have any mutating admission plugins of kind: MutatingWebhookConfiguration in the cluster:
$ oc get mutatingwebhookconfigurations
- Examine the YAML file of each kind: MutatingWebhookConfiguration to ensure that none of its rules block creation of the objects that are experiencing issues. For more information, see the official Kubernetes documentation.
- Check that any spec.version in type: Configuration.appconnect.ibm.com/v1beta1 used at backup time is supported by the installed Operator.
4.25.4.3. Avoiding the Velero plugin panic error
Label a custom Backup Storage Location (BSL) to resolve Velero plugin panic errors during imagestream backups when the BSL is managed outside the DataProtectionApplication (DPA).
A missing secret can cause a panic error for the Velero plugin during image stream backups. When the backup and the BSL are managed outside the scope of the DPA, the OADP controller does not create the relevant oadp-<bsl_name>-<bsl_provider>-registry-secret secret.
During the backup operation, the OpenShift Velero plugin panics on the imagestream backup, with the following panic error:
2024-02-27T10:46:50.028951744Z time="2024-02-27T10:46:50Z" level=error msg="Error backing up item"
backup=openshift-adp/<backup name> error="error executing custom action (groupResource=imagestreams.image.openshift.io,
namespace=<BSL Name>, name=postgres): rpc error: code = Aborted desc = plugin panicked:
runtime error: index out of range with length 1, stack trace: goroutine 94…
Procedure
- Label the custom BSL with the relevant label by using the following command:
$ oc label backupstoragelocations.velero.io <bsl_name> app.kubernetes.io/component=bsl
- After the BSL is labeled, wait until the DPA reconciles.
Note
You can force the reconciliation by making any minor change to the DPA itself.
Verification
- After the DPA is reconciled, confirm that the secret has been created and that the correct registry data has been populated into it by entering the following command:
$ oc -n openshift-adp get secret/oadp-<bsl_name>-<bsl_provider>-registry-secret -o json | jq -r '.data'
4.25.4.4. Workaround for OpenShift ADP Controller segmentation fault
Define either velero or cloudstorage, but not both, when you configure a Data Protection Application (DPA). Otherwise, the openshift-adp-controller-manager pod fails with a segmentation fault in the following cases:
- If you define both velero and cloudstorage, the openshift-adp-controller-manager fails.
- If you define neither velero nor cloudstorage, the openshift-adp-controller-manager fails.
For more information about this issue, see OADP-1054.
4.25.5. OADP installation issues
Resolve common installation issues with the Data Protection Application (DPA), such as invalid backup storage directories and incorrect cloud provider credentials. This helps you successfully install and configure OADP in your environment.
4.25.5.1. Resolving invalid directories in backup storage
Resolve the "Backup storage contains invalid top-level directories" error by specifying a prefix for the bucket.
Procedure
- If the object storage is not dedicated to Velero, you must specify a prefix for the bucket by setting the spec.backupLocations.velero.objectStorage.prefix parameter in the DataProtectionApplication manifest.
4.25.5.2. Resolving incorrect AWS credentials
Resolve credential errors such as InvalidAccessKeyId or NoCredentialProviders by correcting the format of the credentials-velero file.
If you incorrectly format the credentials-velero file that is used to create the Secret object, errors such as the following can occur:
- The oadp-aws-registry pod log displays the following error message:
InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.
- The Velero pod log displays the following error message:
NoCredentialProviders: no valid providers in chain.
Procedure
Ensure that the
file is correctly formatted, as shown in the following example:credentials-velero[default] aws_access_key_id=AKIAIOSFODNN7EXAMPLE aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEYwhere:
[default]- Specifies the AWS default profile.
aws_access_key_id-
Do not enclose the values with quotation marks (
",').
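The formatting rules above can be checked before you create the Secret. The following is a minimal sketch, not part of OADP: the check_velero_credentials function name is hypothetical, and it assumes the file uses only the [default] profile and the two keys shown in the example.

```shell
# check_velero_credentials FILE
# Hypothetical helper (not shipped with OADP). Returns 0 when the file has a
# [default] profile, both AWS keys, and no quoted values; returns 1 otherwise.
check_velero_credentials() {
  file="$1"
  # Reject values that start with a single or double quotation mark.
  if grep -Eq "^[[:space:]]*aws_(access_key_id|secret_access_key)[[:space:]]*=[[:space:]]*[\"']" "$file"; then
    echo "quoted value found in $file" >&2
    return 1
  fi
  # Require the profile header and both keys to be present.
  grep -q '^\[default\]' "$file" &&
    grep -Eq '^[[:space:]]*aws_access_key_id[[:space:]]*=' "$file" &&
    grep -Eq '^[[:space:]]*aws_secret_access_key[[:space:]]*=' "$file"
}
```

For example, run check_velero_credentials credentials-velero and inspect the exit status before passing the file to oc create secret.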
4.25.6. OADP Operator issues
Resolve issues with the OpenShift API for Data Protection (OADP) Operator, such as silent failures that prevent proper operation. This helps you restore normal Operator functionality and ensure successful backup and restore operations.
4.25.6.1. Resolving silent failure of the OADP Operator
Resolve the silent failure issue where the OADP Operator reports a Running status even though it is not working correctly.
To fix this issue, retrieve a list of backup storage locations (BSLs) and check the manifest of each BSL for credential issues.
Procedure
Retrieve a list of BSLs by using either the OpenShift or Velero command-line interface (CLI):
- Retrieve a list of BSLs by using the OpenShift CLI (oc):
$ oc get backupstoragelocations.velero.io -A
- Retrieve a list of BSLs by using the velero CLI:
$ velero backup-location get -n <oadp_operator_namespace>
Use the list of BSLs from the previous step and run the following command to examine the manifest of each BSL for an error:
$ oc get backupstoragelocations.velero.io -n <namespace> -o yaml
Example output
apiVersion: v1
items:
- apiVersion: velero.io/v1
  kind: BackupStorageLocation
  metadata:
    creationTimestamp: "2023-11-03T19:49:04Z"
    generation: 9703
    name: example-dpa-1
    namespace: openshift-adp-operator
    ownerReferences:
    - apiVersion: oadp.openshift.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: DataProtectionApplication
      name: example-dpa
      uid: 0beeeaff-0287-4f32-bcb1-2e3c921b6e82
    resourceVersion: "24273698"
    uid: ba37cd15-cf17-4f7d-bf03-8af8655cea83
  spec:
    config:
      enableSharedConfig: "true"
      region: us-west-2
    credential:
      key: credentials
      name: cloud-credentials
    default: true
    objectStorage:
      bucket: example-oadp-operator
      prefix: example
    provider: aws
  status:
    lastValidationTime: "2023-11-10T22:06:46Z"
    message: "BackupStorageLocation \"example-dpa-1\" is unavailable: rpc error: code = Unknown desc = WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: d3f2e099-70a0-467b-997e-ff62345e3b54"
    phase: Unavailable
kind: List
metadata:
  resourceVersion: ""
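When a cluster has many BSLs, reading each manifest is slow. The following sketch narrows the list to BSLs whose status.phase is not Available, using the standard custom-columns output format of oc. The function names are hypothetical, and the sketch assumes that a healthy BSL reports the phase Available, in contrast to the Unavailable phase shown in the example output.

```shell
# Print "NAMESPACE/NAME PHASE" for every row whose third column is not "Available".
# Reads a three-column table with a header row on stdin.
filter_unavailable_bsls() {
  awk 'NR > 1 && $3 != "Available" { print $1 "/" $2 " " $3 }'
}

# Hypothetical wrapper: query all namespaces and keep only problem BSLs.
# Requires a logged-in oc session; it is defined here but not called.
list_unavailable_bsls() {
  oc get backupstoragelocations.velero.io -A \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase |
    filter_unavailable_bsls
}
```

After you identify a problem BSL, inspect its status.message field in the full manifest for the credential error.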
4.25.7. OADP timeouts
Configure OADP timeout parameters for Restic, Velero, Data Mover, CSI snapshots, and item operations to allow complex or resource-intensive processes to complete successfully. This helps you reduce errors, retries, and failures caused by premature termination of backup and restore operations.
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
Review the following OADP timeout instructions:
4.25.7.1. Restic timeout
Configure the Restic timeout parameter to prevent backup failures for large persistent volumes or long-running backup operations. This helps you avoid timeout errors when backing up data greater than 500GB or when backups exceed the default one-hour limit.
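Use the spec.configuration.nodeAgent.timeout parameter to set the Restic timeout. The default value is 1h.
Use the Restic timeout parameter in the nodeAgent section for the following scenarios:
- For Restic backups with total PV data usage that is greater than 500GB.
- If backups are timing out with the following error: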
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete"
Procedure
Edit the values in the spec.configuration.nodeAgent.timeout block of the DataProtectionApplication custom resource (CR) manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
      timeout: 1h
# ...
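As an alternative to editing the manifest by hand, the same field can be set with a JSON merge patch. The following is a sketch under stated assumptions: the node_agent_timeout_patch function name is hypothetical, the 3h value is only an illustration, and the DPA is assumed to live in the openshift-adp namespace.

```shell
# Hypothetical helper: print a JSON merge patch that sets the node agent timeout.
node_agent_timeout_patch() {
  printf '{"spec":{"configuration":{"nodeAgent":{"timeout":"%s"}}}}\n' "$1"
}

# Example usage against a live cluster (shown as a comment, not executed here):
#   oc patch dpa <dpa_name> -n openshift-adp \
#     --type merge -p "$(node_agent_timeout_patch 3h)"
```

A merge patch changes only the timeout key and leaves the other nodeAgent settings, such as enable and uploaderType, untouched.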
4.25.7.2. Velero resource timeout
Configure the resourceTimeout parameter in the DataProtectionApplication custom resource (CR).
Use the resourceTimeout parameter for the following scenarios:
- For backups with total PV data usage that is greater than 1 TB. Use the parameter as a timeout value when Velero tries to clean up or delete the Container Storage Interface (CSI) snapshots, before marking the backup as complete.
- A sub-task of this cleanup tries to patch VSC, and this timeout can be used for that task.
- To create or ensure a backup repository is ready for filesystem based backups for Restic or Kopia.
- To check if the Velero CRD is available in the cluster before restoring the custom resource (CR) or resource from the backup.
Procedure
Edit the values in the spec.configuration.velero.resourceTimeout block of the DataProtectionApplication CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    velero:
      resourceTimeout: 10m
# ...
4.25.7.2.1. Velero default item operation timeout
Configure the defaultItemOperationTimeout parameter in the DataProtectionApplication CR. The default value is 1h.
Use the defaultItemOperationTimeout parameter for the following scenarios:
- Only with Data Mover 1.2.x.
- When defaultItemOperationTimeout is defined in the Data Protection Application (DPA), it applies to both backup and restore operations. You can use itemOperationTimeout to define only the backup or only the restore of those CRs, as described in the following "Item operation timeout - restore" and "Item operation timeout - backup" sections.
Procedure
Edit the values in the spec.configuration.velero.defaultItemOperationTimeout block of the DataProtectionApplication CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    velero:
      defaultItemOperationTimeout: 1h
# ...
4.25.7.3. Data Mover timeout
Configure the Data Mover timeout parameter in the DataProtectionApplication CR. The default value is 10m.
Use the Data Mover timeout for the following scenarios:
- If creation of VolumeSnapshotBackups (VSBs) and VolumeSnapshotRestores (VSRs) times out after 10 minutes.
- For large scale environments with total PV data usage that is greater than 500GB. Set the timeout for 1h.
- With the VolumeSnapshotMover (VSM) plugin.
Procedure
Edit the values in the spec.features.dataMover.timeout block of the DataProtectionApplication CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  features:
    dataMover:
      timeout: 10m
# ...
4.25.7.4. CSI snapshot timeout
Configure the CSISnapshotTimeout parameter in the Backup CR. The default value is 10m.
Typically, the default value for CSISnapshotTimeout does not require adjustment.
Procedure
Edit the values in the spec.csiSnapshotTimeout block of the Backup CR manifest, as in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
spec:
  csiSnapshotTimeout: 10m
# ...
4.25.7.5. Item operation timeout - restore
Configure the ItemOperationTimeout parameter in the Restore CR. The default value is 1h.
Procedure
Edit the values in the Restore.spec.itemOperationTimeout block of the Restore CR manifest, as in the following example:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore_name>
spec:
  itemOperationTimeout: 1h
# ...
4.25.7.6. Item operation timeout - backup
Configure the ItemOperationTimeout parameter in the Backup CR to set how long Velero waits for asynchronous BackupItemAction operations. The default value is 1h.
Procedure
Edit the values in the Backup.spec.itemOperationTimeout block of the Backup CR manifest, as in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
spec:
  itemOperationTimeout: 1h
# ...
4.25.8. Backup and Restore CR issues
Resolve common issues with Backup and Restore custom resources (CRs).
4.25.8.1. Troubleshooting issue where backup CR cannot retrieve volume
Resolve the InvalidVolume.NotFound error in the Backup CR.
If the PV and the snapshot locations are in different regions, the Backup CR displays the following error message:
InvalidVolume.NotFound: The volume vol-xxxx does not exist.
Procedure
- Edit the value of the spec.snapshotLocations.velero.config.region key in the DataProtectionApplication manifest so that the snapshot location is in the same region as the PV.
- Create a new Backup CR.
4.25.8.2. Troubleshooting issue where backup CR status remains in progress
Resolve the issue where an interrupted backup causes the Backup CR status to remain in the InProgress phase.
Procedure
Retrieve the details of the Backup CR by running the following command:
$ oc -n <namespace> exec deployment/velero -c velero -- ./velero \
  backup describe <backup>
Delete the Backup CR by running the following command:
$ oc delete backups.velero.io <backup> -n openshift-adp
You do not need to clean up the backup location because an in progress Backup CR has not uploaded files to object storage.
- Create a new Backup CR.
View the Velero backup details by running the following command:
$ velero backup describe <backup_name> --details
4.25.8.3. Troubleshooting issue where backup CR status remains partially failed
Resolve the PartiallyFailed status of a Backup CR that is caused by a missing label on the VolumeSnapshotClass object.
If the backup created based on the CSI snapshot class is missing a label, the CSI snapshot plugin fails to create a snapshot. As a result, the Velero pod logs an error similar to the following message:
time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=openshift-adp/user1-backup-check5 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=busy1, name=pvc1-user1): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass ocs-storagecluster-ceph-rbd: failed to get volumesnapshotclass for provisioner openshift-storage.rbd.csi.ceph.com, ensure that the desired volumesnapshot class has the velero.io/csi-volumesnapshot-class label" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=busybox-79799557b5-vprq
Procedure
Delete the Backup CR by running the following command:
$ oc delete backups.velero.io <backup> -n openshift-adp
- If required, clean up the stored data on the BackupStorageLocation resource to free up space.
Apply the velero.io/csi-volumesnapshot-class=true label to the VolumeSnapshotClass object by running the following command:
$ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
- Create a new Backup CR.
4.25.9. Restic issues
Troubleshoot common Restic issues during application backups and restores to maintain reliable data protection. Common Restic issues include NFS permission errors, backup custom resource re-creation failures, and restore failures caused by pod security admission policy changes.
4.25.9.1. Troubleshooting Restic permission errors for NFS data volumes
Create a supplemental group and add its group ID to the DataProtectionApplication CR to resolve Restic permission errors on NFS data volumes that have the root_squash parameter enabled.
If your NFS data volumes have the root_squash parameter enabled, Restic maps to nfsnobody and does not have permission to create backups. The Restic log displays the following error message:
controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied".
Procedure
- Create a supplemental group for Restic on the NFS data volume.
- Set the setgid bit on the NFS directories so that group ownership is inherited.
Add the spec.configuration.nodeAgent.supplementalGroups parameter and the group ID to the DataProtectionApplication manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
# ...
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
      supplementalGroups:
      - <group_id>
# ...
where:
<group_id>: Specifies the supplemental group ID.
- Wait for the Restic pods to restart so that the changes are applied.
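4.25.9.2. Troubleshooting Restic Backup CR issue that cannot be re-created after bucket is emptied
Resolve the issue where a Restic Backup CR for a namespace cannot be re-created after the object storage bucket is emptied. Velero does not re-create or update the Restic repository from the ResticRepository manifest.
For more information, see Velero issue 4421.
The velero pod log displays the following error message: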
stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?
Procedure
Remove the related Restic repository from the namespace by running the following command:
$ oc delete resticrepository -n openshift-adp <name_of_the_restic_repository>
In the following error log, mysql-persistent is the problematic Restic repository.
time="2021-12-29T18:29:14Z" level=info msg="1 errors encountered backup up item" backup=velero/backup65 logSource="pkg/backup/backup.go:431" name=mysql-7d99fc949-qbkds
time="2021-12-29T18:29:14Z" level=error msg="Error backing up item" backup=velero/backup65 error="pod volume backup failed: error running restic backup, stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?\ns3:http://minio-minio.apps.mayap-oadp-veleo-1234.qe.devcluster.openshift.com/mayapvelerooadp2/velero1/restic/mysql-persistent\n: exit status 1" error.file="/remote-source/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds
4.25.9.3. Troubleshooting restic restore partially failed issue on OpenShift Container Platform 4.14 onward due to changed PSA policy
Resolve a partial failure of Restic restore on OpenShift Container Platform 4.14 onward caused by Pod Security Admission (PSA) policy enforcement by adjusting the restore-resource-priorities field in the DataProtectionApplication CR so that SecurityContextConstraints (SCC) resources are restored before pods.
From 4.14 onward, OpenShift Container Platform enforces a PSA policy that can hinder the readiness of pods during a Restic restore process. If an SCC resource is not found when a pod is created, and the PSA policy on the pod is not set up to meet the required standards, pod admission is denied.
Review the following example error:
\"level=error\" in line#2273: time=\"2023-06-12T06:50:04Z\"
level=error msg=\"error restoring mysql-869f9f44f6-tp5lv: pods\\\
"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity\\\
"restricted:v1.24\\\": privileged (container \\\"mysql\\\
" must not set securityContext.privileged=true),
allowPrivilegeEscalation != false (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
"RuntimeDefault\\\" or \\\"Localhost\\\")\" logSource=\"/remote-source/velero/app/pkg/restore/restore.go:1388\" restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n
velero container contains \"level=error\" in line#2447: time=\"2023-06-12T06:50:05Z\"
level=error msg=\"Namespace todolist-mariadb,
resource restore error: error restoring pods/todolist-mariadb/mysql-869f9f44f6-tp5lv: pods \\\
"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity \\\"restricted:v1.24\\\": privileged (container \\\
"mysql\\\" must not set securityContext.privileged=true),
allowPrivilegeEscalation != false (containers \\\
"restic-wait\\\",\\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
"RuntimeDefault\\\" or \\\"Localhost\\\")\"
logSource=\"/remote-source/velero/app/pkg/controller/restore_controller.go:510\"
restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n]",
Procedure
In your DPA custom resource (CR), check or set the restore-resource-priorities field on the Velero server to ensure that securitycontextconstraints is listed in order before pods in the list of resources:
$ oc get dpa -o yaml
# ...
configuration:
  restic:
    enable: true
  velero:
    args:
      restore-resource-priorities: 'securitycontextconstraints,customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io'
    defaultPlugins:
    - gcp
    - openshift
where:
restore-resource-priorities: If you have an existing restore resource priority list, ensure you combine that existing list with the complete list.
Ensure that the security standards for the application pods are aligned, as provided in Fixing PodSecurity Admission warnings for deployments, to prevent deployment warnings. If the application is not aligned with security standards, an error can occur regardless of the SCC.
Note: This solution is temporary, and ongoing discussions are in progress to address it.
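The ordering requirement in the restore-resource-priorities list can be verified mechanically, which helps when you merge an existing list with the complete list. The following is a minimal sketch; the scc_before_pods function name is hypothetical, and it assumes the list is passed as a single comma-separated string.

```shell
# scc_before_pods LIST
# Hypothetical helper: returns 0 only if both "securitycontextconstraints"
# and "pods" appear in the comma-separated LIST and the first occurrence of
# "securitycontextconstraints" comes before the first occurrence of "pods".
scc_before_pods() {
  printf '%s\n' "$1" | tr ',' '\n' | awk '
    $0 == "securitycontextconstraints" && !scc { scc = NR }
    $0 == "pods" && !pods { pods = NR }
    END { exit !(scc && pods && scc < pods) }'
}
```

For example, scc_before_pods "securitycontextconstraints,namespaces,pods" succeeds, while a list that places pods first, or omits securitycontextconstraints, fails.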
4.25.10. Using the must-gather tool
Collect logs and information about OADP custom resources by using the must-gather tool.
4.25.10.1. Using the must-gather tool
Run the must-gather tool with one of the following data collection options:
- Default configuration: This configuration collects pod logs and OADP and Velero custom resource (CR) information for all namespaces where the OADP Operator is installed.
- Timeout: Data collection can take a long time if there are many failed Backup CRs. You can improve performance by setting a timeout value.
- Insecure TLS connections: If a custom CA certificate is used, use the must-gather tool with insecure TLS connections.
The must-gather tool generates a Markdown output file with the collected information.
Prerequisites
- You have logged in to the OpenShift Container Platform cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
- You are using OADP 1.3 or 1.4.
Procedure
- Navigate to the directory where you want to store the must-gather data.
- Run the oc adm must-gather command for one of the following data collection options:
  - To use the default configuration of the must-gather tool, run one of the following commands:
    For OADP 1.3, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
    For OADP 1.4, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4
  - To use the timeout flag with the must-gather tool, run one of the following commands:
    For OADP 1.3, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3 -- /usr/bin/gather --request-timeout <timeout>
    Replace <timeout> with a timeout value.
    For OADP 1.4, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4 -- /usr/bin/gather --request-timeout 1m
    In this example, the timeout is 1 minute.
  - To use the insecure TLS connection flag with the must-gather tool, run one of the following commands:
    For OADP 1.3, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3 -- /usr/bin/gather --skip-tls
    For OADP 1.4, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4 -- /usr/bin/gather --skip-tls
  - To use a combination of the insecure TLS connection and the timeout flags with the must-gather tool, run one of the following commands:
    For OADP 1.3, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3 -- /usr/bin/gather --request-timeout 15s --skip-tls=true
    By default, the --skip-tls flag value is false. Set the value to true to allow insecure TLS connections. Specify a timeout value.
    For OADP 1.4, run the following command:
    $ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4 -- /usr/bin/gather --request-timeout 15s --skip-tls
    In this example, the timeout is 15 seconds. By default, the --skip-tls flag value is false. Set the value to true to allow insecure TLS connections.
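Because the commands above differ only in the image tag and the gather flags, a small wrapper can reduce copy-and-paste errors. The following sketch is illustrative only: the must_gather_image function name is hypothetical, and it accepts only the two OADP versions that this procedure covers.

```shell
# must_gather_image VERSION
# Hypothetical helper: print the must-gather image for OADP 1.3 or 1.4,
# or return 1 for any other version.
must_gather_image() {
  case "$1" in
    1.3|1.4)
      printf 'registry.redhat.io/oadp/oadp-mustgather-rhel9:v%s\n' "$1"
      ;;
    *)
      echo "unsupported OADP version: $1" >&2
      return 1
      ;;
  esac
}

# Example usage against a live cluster (shown as a comment, not executed here):
#   oc adm must-gather --image="$(must_gather_image 1.4)" \
#     -- /usr/bin/gather --request-timeout 1m
```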
Verification
- Verify that the Markdown output file is generated at the following location:
must-gather.local.89…054550/registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.5-sha256-0…84/clusters/a4…86/oadp-must-gather-summary.md
- Review the must-gather data in the Markdown file by opening the file in a Markdown previewer. For an example output, refer to the following image. You can upload this output file to a support case on the Red Hat Customer Portal.
Figure 4.1. Example markdown output of must-gather tool
4.25.11. OADP monitoring
Monitor OADP operations by using the OpenShift Container Platform monitoring stack to create service monitors, configure alerting rules, and view metrics. This helps you track backup and restore performance, manage clusters, and receive alerts for important events.
4.25.11.1. OADP monitoring setup
Set up OADP monitoring by enabling User Workload Monitoring and configuring the OpenShift Container Platform monitoring stack to retrieve Velero metrics. This helps you create alerting rules, query metrics, and optionally visualize data by using Prometheus-compatible tools such as Grafana.
Monitoring metrics requires enabling monitoring for the user-defined projects and creating a ServiceMonitor resource to scrape those metrics from the OADP service endpoint in the openshift-adp namespace.
The OADP support for Prometheus metrics is offered on a best-effort basis and is not fully supported.
For more information about setting up the monitoring stack, see Configuring user workload monitoring.
Prerequisites
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- You have created a cluster monitoring config map.
Procedure
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:
$ oc edit configmap cluster-monitoring-config -n openshift-monitoring
Add or enable the enableUserWorkload option in the data section's config.yaml field:
apiVersion: v1
kind: ConfigMap
data:
  config.yaml: |
    enableUserWorkload: true
metadata:
# ...
where:
enableUserWorkload: Add this option or set to true.
Wait a short period of time, and then verify the User Workload Monitoring setup by checking that the following components are up and running in the openshift-user-workload-monitoring namespace:
$ oc get pods -n openshift-user-workload-monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6844b4b99c-b57j9   2/2     Running   0          43s
prometheus-user-workload-0             5/5     Running   0          32s
prometheus-user-workload-1             5/5     Running   0          32s
thanos-ruler-user-workload-0           3/3     Running   0          32s
thanos-ruler-user-workload-1           3/3     Running   0          32s
Verify the existence of the user-workload-monitoring-config ConfigMap in the openshift-user-workload-monitoring namespace. If it exists, skip the remaining steps in this procedure.
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found
Create a user-workload-monitoring-config ConfigMap object for the User Workload Monitoring, and save it under the 2_configure_user_workload_monitoring.yaml file name:
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
Apply the 2_configure_user_workload_monitoring.yaml file:
$ oc apply -f 2_configure_user_workload_monitoring.yaml
configmap/user-workload-monitoring-config created
4.25.11.2. Creating OADP service monitor
Create a ServiceMonitor resource to scrape Velero metrics.
OADP provides an openshift-adp-velero-metrics-svc service. The service monitor that you create must point to the openshift-adp-velero-metrics-svc service.
Procedure
Ensure the openshift-adp-velero-metrics-svc service exists. It should contain the app.kubernetes.io/name=velero label, which is used as the selector for the ServiceMonitor object.
$ oc get svc -n openshift-adp -l app.kubernetes.io/name=velero
Example output
NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
openshift-adp-velero-metrics-svc   ClusterIP   172.30.38.244   <none>        8085/TCP   1h
Create a ServiceMonitor YAML file that matches the existing service label, and save the file as 3_create_oadp_service_monitor.yaml. The service monitor is created in the openshift-adp namespace which has the openshift-adp-velero-metrics-svc service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: oadp-service-monitor
  name: oadp-service-monitor
  namespace: openshift-adp
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    targetPort: 8085
    scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: "velero"
Apply the 3_create_oadp_service_monitor.yaml file:
$ oc apply -f 3_create_oadp_service_monitor.yaml
Example output
servicemonitor.monitoring.coreos.com/oadp-service-monitor created
Verification
Confirm that the new service monitor is in an Up state by using the Administrator perspective of the OpenShift Container Platform web console. Wait a few minutes for the service monitor to reach the Up state.
- Navigate to the Observe → Targets page.
- Ensure the Filter is unselected or that the User source is selected and type openshift-adp in the Text search field.
- Verify that the Status for the service monitor is Up.
Figure 4.2. OADP metrics targets
4.25.11.3. Creating an alerting rule
Create a PrometheusRule resource to define an alerting rule for OADP.
The OpenShift Container Platform monitoring stack receives alerts configured by using alerting rules. To create an alerting rule for the OADP project, use one of the metrics scraped with the user workload monitoring.
Procedure
Create a PrometheusRule YAML file with the sample OADPBackupFailing alert and save it as 4_create_oadp_alert_rule.yaml.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sample-oadp-alert
  namespace: openshift-adp
spec:
  groups:
  - name: sample-oadp-backup-alert
    rules:
    - alert: OADPBackupFailing
      annotations:
        description: 'OADP had {{$value | humanize}} backup failures over the last 2 hours.'
        summary: OADP has issues creating backups
      expr: |
        increase(velero_backup_failure_total{job="openshift-adp-velero-metrics-svc"}[2h]) > 0
      for: 5m
      labels:
        severity: warning
In this sample, the Alert displays under the following conditions:
- There is an increase of new failing backups during the last 2 hours that is greater than 0 and the state persists for at least 5 minutes.
- If the time of the first increase is less than 5 minutes, the Alert will be in a Pending state, after which it will turn into a Firing state.
Apply the 4_create_oadp_alert_rule.yaml file, which creates the PrometheusRule object in the openshift-adp namespace:
$ oc apply -f 4_create_oadp_alert_rule.yaml
Example output
prometheusrule.monitoring.coreos.com/sample-oadp-alert created
Verification
After the Alert is triggered, you can view it in the following ways:
- In the Developer perspective, select the Observe menu.
In the Administrator perspective under the Observe → Alerting menu, select User in the Filter box. Otherwise, by default only the Platform Alerts are displayed.
Figure 4.3. OADP backup failing alert
4.25.11.4. List of available metrics
Review the following table for a list of Velero metrics and their types.
| Metric name | Description | Type |
|---|---|---|
| | Size, in bytes, of a backup | Gauge |
| | Current number of existent backups | Gauge |
| | Total number of attempted backups | Counter |
| | Total number of successful backups | Counter |
| | Total number of partially failed backups | Counter |
| | Total number of failed backups | Counter |
| | Total number of validation failed backups | Counter |
| | Time taken to complete backup, in seconds | Histogram |
| | Total count of observations for a bucket in the histogram for the metric | Counter |
| | Total count of observations for the metric | Counter |
| | Total sum of observations for the metric | Counter |
| | Total number of attempted backup deletions | Counter |
| | Total number of successful backup deletions | Counter |
| | Total number of failed backup deletions | Counter |
| | Last time a backup ran successfully, UNIX timestamp in seconds | Gauge |
| | Total number of items backed up | Gauge |
| | Total number of errors encountered during backup | Gauge |
| | Total number of warned backups | Counter |
| | Last status of the backup. A value of 1 is success, 0 is failure | Gauge |
| | Current number of existent restores | Gauge |
| | Total number of attempted restores | Counter |
| | Total number of failed restores failing validations | Counter |
| | Total number of successful restores | Counter |
| | Total number of partially failed restores | Counter |
| | Total number of failed restores | Counter |
| | Total number of attempted volume snapshots | Counter |
| | Total number of successful volume snapshots | Counter |
| | Total number of failed volume snapshots | Counter |
| | Total number of CSI attempted volume snapshots | Counter |
| | Total number of CSI successful volume snapshots | Counter |
| | Total number of CSI failed volume snapshots | Counter |
4.25.11.5. Viewing metrics using the Observe UI
Review metrics in the OpenShift Container Platform web console from the Administrator or Developer perspective, which must have access to the openshift-adp project.
Procedure
Navigate to the Observe → Metrics page:
If you are using the Developer perspective, follow these steps:
- Select Custom query, or click the Show PromQL link.
- Type the query and click Enter.
If you are using the Administrator perspective, type the expression in the text field and select Run Queries.
Figure 4.4. OADP metrics query
Chapter 5. Control plane backup and restore
5.1. Backing up etcd
etcd is the key-value store for OpenShift Container Platform, which persists the state of all resource objects.
Back up your cluster’s etcd data regularly and store it in a secure location, ideally outside the OpenShift Container Platform environment. Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation; otherwise, the backup will contain expired certificates. It is also recommended to take etcd backups during non-peak usage hours because the etcd snapshot has a high I/O cost.
Be sure to take an etcd backup before you update your cluster. Taking a backup before you update is important because when you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an OpenShift Container Platform 4.17.5 cluster must use an etcd backup that was taken from 4.17.5.
Back up your cluster’s etcd data by performing a single invocation of the backup script on a control plane host. Do not take a backup for each control plane host.
After you have an etcd backup, you can restore to a previous cluster state.
5.1.1. Backing up etcd data
Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.
Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have checked whether the cluster-wide proxy is enabled.
Tip: You can check whether the proxy is enabled by reviewing the output of oc get proxy cluster -o yaml. The proxy is enabled if the httpProxy, httpsProxy, and noProxy fields have values set.
Procedure
Start a debug session as root for a control plane node:
$ oc debug --as-root node/<node_name>
Change your root directory to /host in the debug shell:
sh-4.4# chroot /host
If the cluster-wide proxy is enabled, export the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables by running the following commands:
$ export HTTP_PROXY=http://<your_proxy.example.com>:8080
$ export HTTPS_PROXY=https://<your_proxy.example.com>:8080
$ export NO_PROXY=<example.com>
Run the cluster-backup.sh script in the debug shell and pass in the location to save the backup to.
Tip: The cluster-backup.sh script is maintained as a component of the etcd Cluster Operator and is a wrapper around the etcdctl snapshot save command.
sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
Example script output
found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-6
found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7
found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
ede95fe6b88b87ba86a03c15e669fb4aa5bf0991c180d3c6895ce72eaade54a1
etcdctl version: 3.4.14
API version: 3.4
{"level":"info","ts":1624647639.0188997,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db.part"}
{"level":"info","ts":"2021-06-25T19:00:39.030Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1624647639.0301006,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.0.5:2379"}
{"level":"info","ts":"2021-06-25T19:00:40.215Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1624647640.6032252,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.0.5:2379","size":"114 MB","took":1.584090459}
{"level":"info","ts":1624647640.6047094,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db"}
Snapshot saved at /home/core/assets/backup/snapshot_2021-06-25_190035.db
{"hash":3866667823,"revision":31407,"totalKey":12828,"totalSize":114446336}
snapshot db and kube resources are successfully saved to /home/core/assets/backup
In this example, two files are created in the /home/core/assets/backup/ directory on the control plane host:
- snapshot_<datetimestamp>.db: This file is the etcd snapshot. The cluster-backup.sh script confirms its validity.
- static_kuberesources_<datetimestamp>.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.
Note: If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required to restore from the etcd snapshot.
Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted.
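Because a restore needs both files, it can help to confirm that a backup directory holds a matching pair before you copy it off the host. The following is a minimal sketch: check_backup_dir is a hypothetical helper, and it assumes that the snapshot and the static pod archive share the same <datetimestamp> in their file names, as the naming pattern above suggests. It does not validate the contents of either file.

```shell
# check_backup_dir DIR
# Hypothetical helper: for every snapshot_<stamp>.db in DIR, look for a
# non-empty static_kuberesources_<stamp>.tar.gz with the same stamp.
# Returns 0 if at least one complete pair exists, 1 otherwise.
check_backup_dir() {
    dir=$1
    status=1
    for snap in "$dir"/snapshot_*.db; do
        # An unmatched glob stays literal; skip it.
        [ -e "$snap" ] || continue
        stamp=${snap##*/snapshot_}
        stamp=${stamp%.db}
        if [ -s "$snap" ] && [ -s "$dir/static_kuberesources_$stamp.tar.gz" ]; then
            echo "complete backup: $stamp"
            status=0
        else
            echo "incomplete backup: $stamp" >&2
        fi
    done
    return $status
}

# Usage on the control plane host, with the path from the example above:
#   check_backup_dir /home/core/assets/backup
```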
5.1.3. Creating automated etcd backups
The automated backup feature for etcd supports both recurring and single backups. Recurring backups create a cron job that starts a single backup each time the job triggers.
Automating etcd backups is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
5.1.3.1. Enabling automated etcd backups
Follow these steps to enable automated backups for etcd.
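Enabling the TechPreviewNoUpgrade feature set on your cluster prevents minor version updates. The TechPreviewNoUpgrade feature set cannot be disabled. Do not enable this feature set on production clusters.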
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have access to the OpenShift CLI (oc).
Procedure
Create a FeatureGate custom resource (CR) file named enable-tech-preview-no-upgrade.yaml with the following contents:
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
Apply the CR and enable automated backups:
$ oc apply -f enable-tech-preview-no-upgrade.yaml
It takes time to enable the related APIs. Verify the creation of the custom resource definition (CRD) by running the following command:
$ oc get crd | grep backup
Example output
backups.config.openshift.io 2023-10-25T13:32:43Z
etcdbackups.operator.openshift.io 2023-10-25T13:32:04Z
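Because the APIs take time to appear, a script might want an explicit check instead of reading the grep output by eye. The following is a minimal sketch: backup_crds_ready is a hypothetical helper that reads the output of oc get crd on standard input and succeeds only when both CRD names from the example output above are present.

```shell
# backup_crds_ready
# Hypothetical helper: reads `oc get crd` output on stdin and returns 0
# only if both backup-related CRDs appear in the first column.
backup_crds_ready() {
    crd_listing=$(cat)
    for crd in backups.config.openshift.io etcdbackups.operator.openshift.io; do
        printf '%s\n' "$crd_listing" |
            awk -v want="$crd" '$1 == want { found = 1 } END { exit !found }' ||
            return 1
    done
}

# Usage against a live cluster:
#   oc get crd | backup_crds_ready && echo "backup APIs are available"
```

Matching on the whole first column avoids a false positive from a name that merely contains the word backup.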
5.1.3.2. Creating a single etcd backup
Follow these steps to create a single etcd backup by creating and applying a custom resource (CR).
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have access to the OpenShift CLI (oc).
- You have a PVC to save backup data to.
Procedure
Create a CR file named etcd-single-backup.yaml with contents such as the following example:
apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
  name: etcd-single-backup
  namespace: openshift-etcd
spec:
  pvcName: etcd-backup-pvc # (1)
(1) The name of the persistent volume claim (PVC) to save the backup to. Adjust this value according to your environment.
Apply the CR to start a single backup:
$ oc apply -f etcd-single-backup.yaml
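If you take single backups more than once, generating the CR from a template keeps the PVC name and the CR name in one place. The following is a minimal sketch: render_etcd_backup is a hypothetical helper that prints the same CR as the example above for a given name and PVC.

```shell
# render_etcd_backup NAME PVC_NAME
# Hypothetical helper: prints an EtcdBackup CR with the fields used in
# the example above.
render_etcd_backup() {
    backup_name=$1
    pvc_name=$2
    cat <<EOF
apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
  name: $backup_name
  namespace: openshift-etcd
spec:
  pvcName: $pvc_name
EOF
}

# Print the CR from the example above.
render_etcd_backup etcd-single-backup etcd-backup-pvc
```

The rendered CR can be piped straight to the CLI, for example render_etcd_backup etcd-single-backup etcd-backup-pvc | oc apply -f -.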
5.1.3.3. Creating recurring etcd backups
Follow these steps to create automated recurring backups of etcd.
Use dynamically-provisioned storage to keep the created etcd backup data in a safe, external location if possible. If dynamically-provisioned storage is not available, consider storing the backup data on an NFS share to make backup recovery more accessible.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have access to the OpenShift CLI (oc).
Procedure
If dynamically-provisioned storage is available, complete the following steps to create automated recurring backups:
Create a persistent volume claim (PVC) file named etcd-backup-pvc.yaml with contents such as the following example:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: etcd-backup-pvc
  namespace: openshift-etcd
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: standard-csi
  volumeMode: Filesystem
Note: Each of the following providers requires changes to the accessModes and storageClassName keys:
Provider                                                   accessModes value   storageClassName value
AWS with the versioned-installer-efc_operator-ci profile   - ReadWriteMany     efs-sc
Google Cloud Platform                                      - ReadWriteMany     filestore-csi
Microsoft Azure                                            - ReadWriteMany     azurefile-csi
Apply the PVC by running the following command:
$ oc apply -f etcd-backup-pvc.yaml
Verify the creation of the PVC by running the following command:
$ oc get pvc
Example output
NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
etcd-backup-pvc   Pending                                      standard-csi   51s
Note: Dynamic PVCs stay in the Pending state until they are mounted.
If dynamically-provisioned storage is unavailable, create a local storage PVC by completing the following steps:
Warning: If you delete or otherwise lose access to the node that contains the stored backup data, you can lose data.
Create a StorageClass CR file named etcd-backup-local-storage.yaml with the following contents:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: etcd-backup-local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
Apply the StorageClass CR by running the following command:
$ oc apply -f etcd-backup-local-storage.yaml
Create a PV named etcd-backup-pv-fs.yaml from the applied StorageClass with content such as the following example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-backup-pv-fs
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <example-master-node>
Tip: Run the following command to list the available nodes:
$ oc get nodes
Verify the creation of the PV by running the following command:
$ oc get pv
Example output
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
etcd-backup-pv-fs   100Gi      RWX            Delete           Available           local-storage            10s
Create a PVC named etcd-backup-pvc.yaml with contents such as the following example:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: etcd-backup-pvc
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi # (1)
  storageClassName: local-storage
(1) The amount of storage available to the PVC. Adjust this value for your requirements.
Apply the PVC by running the following command:
$ oc apply -f etcd-backup-pvc.yaml
Create a custom resource (CR) file named etcd-recurring-backup.yaml. The contents of the CR define the schedule and retention type of automated backups.
For the default retention type of RetentionNumber with 15 retained backups, use contents such as the following example:
apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: etcd-recurring-backup
spec:
  etcd:
    schedule: "20 4 * * *" # (1)
    timeZone: "UTC"
    pvcName: etcd-backup-pvc
(1) The CronTab schedule for recurring backups. Adjust this value for your needs.
To use retention based on the maximum number of backups, add the following key-value pairs to the etcd key:
spec:
  etcd:
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 5
Warning: A known issue causes the number of retained backups to be one greater than the configured value.
For retention based on the file size of backups, use the following:
spec:
  etcd:
    retentionPolicy:
      retentionType: RetentionSize
      retentionSize:
        maxSizeOfBackupsGb: 20 # (1)
(1) The maximum file size of the retained backups in gigabytes. Adjust this value for your needs. Defaults to 10 GB if unspecified.
Warning: A known issue causes the maximum size of retained backups to be up to 10 GB greater than the configured value.
Create the cron job defined by the CR by running the following command:
$ oc create -f etcd-recurring-backup.yaml
To find the created cron job, run the following command:
$ oc get cronjob -n openshift-etcd
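A malformed schedule value is easy to introduce when editing the CR by hand. The following is a minimal sketch of a sanity check to run before creating the CR: valid_cron_schedule is a hypothetical helper, and it is deliberately narrow. It accepts only five whitespace-separated fields made of digits and the characters * / , and -, so it might reject schedules that the cluster would accept, such as ones that use month or weekday names.

```shell
# valid_cron_schedule SCHEDULE
# Hypothetical helper: returns 0 if SCHEDULE has exactly five fields and
# every field uses only digits and the characters * / , -
# The body runs in a subshell so that "set -f" does not leak to the caller.
valid_cron_schedule() (
    set -f              # keep "*" from expanding to file names
    set -- $1           # split the schedule into fields
    [ $# -eq 5 ] || exit 1
    for field in "$@"; do
        case $field in
            *[!0-9*/,-]*) exit 1 ;;
        esac
    done
)

# Example with the schedule from the CR above.
if valid_cron_schedule "20 4 * * *"; then
    echo "schedule has a plausible five-field form"
fi
```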
5.2. Replacing an unhealthy etcd member
This document describes the process to replace a single unhealthy etcd member.
This process depends on whether the etcd member is unhealthy because the machine is not running or the node is not ready, or whether it is unhealthy because the etcd pod is crashlooping.
If you have lost the majority of your control plane hosts, follow the disaster recovery procedure to restore to a previous cluster state instead of this procedure.
If the control plane certificates are not valid on the member being replaced, then you must follow the procedure to recover from expired control plane certificates instead of this procedure.
If a control plane node is lost and a new one is created, the etcd cluster Operator handles generating the new TLS certificates and adding the node as an etcd member.
5.2.1. Prerequisites
- Take an etcd backup prior to replacing an unhealthy etcd member.
5.2.2. Identifying an unhealthy etcd member
You can identify if your cluster has an unhealthy etcd member.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
Procedure
Check the status of the EtcdMembersAvailable status condition using the following command:
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}{end}'
Review the output:
2 of 3 members are available, ip-10-0-131-183.ec2.internal is unhealthy
This example output shows that the ip-10-0-131-183.ec2.internal etcd member is unhealthy.
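When this check runs from a script, the member name has to be pulled out of the message text. The following is a minimal sketch: unhealthy_members is a hypothetical helper, and it assumes that the message keeps the "<name> is unhealthy" wording shown in the example output above.

```shell
# unhealthy_members
# Hypothetical helper: reads the EtcdMembersAvailable message on stdin
# and prints each word that is directly followed by "is unhealthy".
unhealthy_members() {
    awk '{
        for (i = 2; i < NF; i++)
            if ($i == "is" && $(i + 1) ~ /^unhealthy,?$/)
                print $(i - 1)
    }'
}

# Example with the message from the output above.
echo "2 of 3 members are available, ip-10-0-131-183.ec2.internal is unhealthy" |
    unhealthy_members
```

For the example message, the helper prints ip-10-0-131-183.ec2.internal. If the message does not contain that wording, it prints nothing.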
5.2.3. Determining the state of the unhealthy etcd member
The steps to replace an unhealthy etcd member depend on which of the following states your etcd member is in:
- The machine is not running or the node is not ready
- The etcd pod is crashlooping
This procedure determines which state your etcd member is in. This enables you to know which procedure to follow to replace the unhealthy etcd member.
If you are aware that the machine is not running or the node is not ready, but you expect it to return to a healthy state soon, then you do not need to perform a procedure to replace the etcd member. The etcd cluster Operator will automatically sync when the machine or node returns to a healthy state.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have identified an unhealthy etcd member.
Procedure
Determine if the machine is not running:
$ oc get machines -A -ojsonpath='{range .items[*]}{@.status.nodeRef.name}{"\t"}{@.status.providerStatus.instanceState}{"\n"}' | grep -v running
Example output
ip-10-0-131-183.ec2.internal stopped (1)
(1) This output lists the node and the status of the node’s machine. If the status is anything other than running, then the machine is not running.
If the machine is not running, then follow the Replacing an unhealthy etcd member whose machine is not running or whose node is not ready procedure.
Determine if the node is not ready.
If either of the following scenarios is true, then the node is not ready.
If the machine is running, then check whether the node is unreachable:
$ oc get nodes -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}{" "}' | grep unreachable
Example output
ip-10-0-131-183.ec2.internal node-role.kubernetes.io/master node.kubernetes.io/unreachable node.kubernetes.io/unreachable (1)
(1) If the node is listed with an unreachable taint, then the node is not ready.
If the node is still reachable, then check whether the node is listed as NotReady:
$ oc get nodes -l node-role.kubernetes.io/master | grep "NotReady"
Example output
ip-10-0-131-183.ec2.internal NotReady master 122m v1.27.3 (1)
(1) If the node is listed as NotReady, then the node is not ready.
If the node is not ready, then follow the Replacing an unhealthy etcd member whose machine is not running or whose node is not ready procedure.
Determine if the etcd pod is crashlooping.
If the machine is running and the node is ready, then check whether the etcd pod is crashlooping.
Verify that all control plane nodes are listed as Ready:
$ oc get nodes -l node-role.kubernetes.io/master
Example output
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-131-183.ec2.internal   Ready    master   6h13m   v1.27.3
ip-10-0-164-97.ec2.internal    Ready    master   6h13m   v1.27.3
ip-10-0-154-204.ec2.internal   Ready    master   6h13m   v1.27.3
Check whether the status of an etcd pod is either Error or CrashLoopBackOff:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
etcd-ip-10-0-131-183.ec2.internal   2/3   Error     7   6h9m (1)
etcd-ip-10-0-164-97.ec2.internal    3/3   Running   0   6h6m
etcd-ip-10-0-154-204.ec2.internal   3/3   Running   0   6h6m
(1) Because the status of this pod is Error, the etcd pod is crashlooping.
If the etcd pod is crashlooping, then follow the Replacing an unhealthy etcd member whose etcd pod is crashlooping procedure.
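The checks in this procedure form a short decision sequence: machine state first, then node readiness, then pod status. The following is a minimal sketch of that sequence: choose_procedure is a hypothetical helper, and its three arguments are values that you have already collected with the commands above (the yes/no readiness flag is an assumption of this sketch, not an output format of oc).

```shell
# choose_procedure MACHINE_STATE NODE_READY POD_STATUS
# Hypothetical helper that mirrors the order of checks in the procedure.
#   MACHINE_STATE  instance state of the machine, for example running or stopped
#   NODE_READY     yes or no (summarizes the unreachable and NotReady checks)
#   POD_STATUS     status of the etcd pod, for example Running or Error
choose_procedure() {
    machine_state=$1
    node_ready=$2
    pod_status=$3
    if [ "$machine_state" != "running" ] || [ "$node_ready" != "yes" ]; then
        echo "machine not running or node not ready"
    elif [ "$pod_status" = "Error" ] || [ "$pod_status" = "CrashLoopBackOff" ]; then
        echo "etcd pod crashlooping"
    else
        echo "no replacement procedure indicated"
    fi
}

# Example: the stopped machine from the output above.
choose_procedure stopped no Running
```

The order matters: a pod on a stopped machine can still report a stale status, so the pod check is only meaningful once the machine and node checks have passed.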
5.2.4. Replacing the unhealthy etcd member
Depending on the state of your unhealthy etcd member, use one of the following procedures:
5.2.4.1. Replacing an unhealthy etcd member whose machine is not running or whose node is not ready
This procedure details the steps to replace an etcd member that is unhealthy either because the machine is not running or because the node is not ready.
If your cluster uses a control plane machine set, see "Recovering a degraded etcd Operator" in "Troubleshooting the control plane machine set" for an etcd recovery procedure.
Prerequisites
- You have identified the unhealthy etcd member.
- You have verified that either the machine is not running or the node is not ready.
Important: You must wait if the other control plane nodes are powered off. The control plane nodes must remain powered off until the replacement of an unhealthy etcd member is complete.
- You have access to the cluster as a user with the cluster-admin role.
- You have taken an etcd backup.
Important: Before you perform this procedure, take an etcd backup so that you can restore your cluster if you experience any issues.
Procedure
Remove the unhealthy member.
Choose a pod that is not on the affected node:
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
etcd-ip-10-0-131-183.ec2.internal   3/3   Running   0   123m
etcd-ip-10-0-164-97.ec2.internal    3/3   Running   0   123m
etcd-ip-10-0-154-204.ec2.internal   3/3   Running   0   124m
Connect to the running etcd container, passing in the name of a pod that is not on the affected node:
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
View the member list:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+------------------------------+---------------------------+---------------------------+
| ID               | STATUS  | NAME                         | PEER ADDRS                | CLIENT ADDRS              |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 6fc1e7c9db35841d | started | ip-10-0-131-183.ec2.internal | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| 757b6793e2408b6c | started | ip-10-0-164-97.ec2.internal  | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
Take note of the ID and the name of the unhealthy etcd member because these values are needed later in the procedure. The etcdctl endpoint health command lists the removed member until the replacement procedure is finished and a new member is added.
Remove the unhealthy etcd member by providing the ID to the etcdctl member remove command:
sh-4.2# etcdctl member remove 6fc1e7c9db35841d
Example output
Member 6fc1e7c9db35841d removed from cluster ead669ce1fbfb346
View the member list again and verify that the member was removed:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+------------------------------+---------------------------+---------------------------+
| ID               | STATUS  | NAME                         | PEER ADDRS                | CLIENT ADDRS              |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 757b6793e2408b6c | started | ip-10-0-164-97.ec2.internal  | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
You can now exit the node shell.
Turn off the quorum guard by entering the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
This command ensures that you can successfully re-create secrets and roll out the static pods.
Important: After you turn off the quorum guard, the cluster might be unreachable for a short time while the remaining etcd instances reboot to reflect the configuration change.
Note: etcd cannot tolerate any additional member failure when running with two members. Restarting either remaining member breaks the quorum and causes downtime in your cluster. The quorum guard protects etcd from restarts due to configuration changes that could cause downtime, so it must be disabled to complete this procedure.
Delete the affected node by running the following command:
$ oc delete node <node_name>
Example command
$ oc delete node ip-10-0-131-183.ec2.internal
Remove the old secrets for the unhealthy etcd member that was removed.
List the secrets for the unhealthy etcd member that was removed.
$ oc get secrets -n openshift-etcd | grep ip-10-0-131-183.ec2.internal # (1)
(1) Pass in the name of the unhealthy etcd member that you took note of earlier in this procedure.
There is a peer, serving, and metrics secret as shown in the following output:
Example output
etcd-peer-ip-10-0-131-183.ec2.internal              kubernetes.io/tls   2   47m
etcd-serving-ip-10-0-131-183.ec2.internal           kubernetes.io/tls   2   47m
etcd-serving-metrics-ip-10-0-131-183.ec2.internal   kubernetes.io/tls   2   47m
Delete the secrets for the unhealthy etcd member that was removed.
Delete the peer secret:
$ oc delete secret -n openshift-etcd etcd-peer-ip-10-0-131-183.ec2.internal
Delete the serving secret:
$ oc delete secret -n openshift-etcd etcd-serving-ip-10-0-131-183.ec2.internal
Delete the metrics secret:
$ oc delete secret -n openshift-etcd etcd-serving-metrics-ip-10-0-131-183.ec2.internal
Check whether a control plane machine set exists by entering the following command:
$ oc -n openshift-machine-api get controlplanemachineset
If the control plane machine set exists, delete and re-create the control plane machine. After this machine is re-created, a new revision is forced and etcd scales up automatically. For more information, see "Replacing an unhealthy etcd member whose machine is not running or whose node is not ready".
If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps. Otherwise, you must create the new control plane by using the same method that was used to originally create it.
Obtain the machine for the unhealthy member.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped (1)
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
(1) This is the control plane machine for the unhealthy node, ip-10-0-131-183.ec2.internal.
Delete the machine of the unhealthy member:
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 # (1)
(1) Specify the name of the control plane machine for the unhealthy node.
A new machine is automatically provisioned after deleting the machine of the unhealthy member.
Verify that a new machine was created:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-133-53.ec2.internal    aws:///us-east-1a/i-015b0888fe17bc2c8   running (1)
clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
(1) The new machine, clustername-8qw5l-master-3, is being created and is ready once the phase changes from Provisioning to Running.
It might take a few minutes for the new machine to be created. The etcd cluster Operator automatically syncs when the machine or node returns to a healthy state.
Note: Verify the subnet IDs that you are using for your machine sets to ensure that they end up in the correct availability zone.
If the control plane machine set does not exist, delete and re-create the control plane machine. After this machine is re-created, a new revision is forced and etcd scales up automatically.
If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps. Otherwise, you must create the new control plane by using the same method that was used to originally create it.
Obtain the machine for the unhealthy member.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped (1)
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
(1) This is the control plane machine for the unhealthy node, ip-10-0-131-183.ec2.internal.
Save the machine configuration to a file on your file system:
$ oc get machine clustername-8qw5l-master-0 \
    -n openshift-machine-api \
    -o yaml \
    > new-master-machine.yaml
Specify the name of the control plane machine for the unhealthy node.
Edit the new-master-machine.yaml file that was created in the previous step to assign a new name and remove unnecessary fields.
Remove the entire status section:
status:
  addresses:
  - address: 10.0.131.183
    type: InternalIP
  - address: ip-10-0-131-183.ec2.internal
    type: InternalDNS
  - address: ip-10-0-131-183.ec2.internal
    type: Hostname
  lastUpdated: "2020-04-20T17:44:29Z"
  nodeRef:
    kind: Node
    name: ip-10-0-131-183.ec2.internal
    uid: acca4411-af0d-4387-b73e-52b2484295ad
  phase: Running
  providerStatus:
    apiVersion: awsproviderconfig.openshift.io/v1beta1
    conditions:
    - lastProbeTime: "2020-04-20T16:53:50Z"
      lastTransitionTime: "2020-04-20T16:53:50Z"
      message: machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0fdb85790d76d0c3f
    instanceState: stopped
    kind: AWSMachineProviderStatus
Change the metadata.name field to a new name.
Keep the same base name as the old machine and change the ending number to the next available number. In this example, clustername-8qw5l-master-0 is changed to clustername-8qw5l-master-3.
For example:
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  ...
  name: clustername-8qw5l-master-3
  ...
Remove the spec.providerID field:
providerID: aws:///us-east-1a/i-0fdb85790d76d0c3f
Delete the machine of the unhealthy member:
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 # (1)
(1) Specify the name of the control plane machine for the unhealthy node.
Verify that the machine was deleted:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
Create the new machine by using the new-master-machine.yaml file:
$ oc apply -f new-master-machine.yaml
Verify that the new machine was created:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-133-53.ec2.internal    aws:///us-east-1a/i-015b0888fe17bc2c8   running (1)
clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
(1) The new machine, clustername-8qw5l-master-3, is being created and is ready once the phase changes from Provisioning to Running.
It might take a few minutes for the new machine to be created. The etcd cluster Operator automatically syncs when the machine or node returns to a healthy state.
Turn the quorum guard back on by entering the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
You can verify that the unsupportedConfigOverrides section is removed from the object by entering this command:
$ oc get etcd/cluster -oyaml
If you are using single-node OpenShift, restart the node. Otherwise, you might encounter the following error in the etcd cluster Operator:
Example output
EtcdCertSignerControllerDegraded: [Operation cannot be fulfilled on secrets "etcd-peer-sno-0": the object has been modified; please apply your changes to the latest version and try again, Operation cannot be fulfilled on secrets "etcd-serving-sno-0": the object has been modified; please apply your changes to the latest version and try again, Operation cannot be fulfilled on secrets "etcd-serving-metrics-sno-0": the object has been modified; please apply your changes to the latest version and try again]
Verification
Verify that all etcd pods are running properly.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
etcd-ip-10-0-133-53.ec2.internal    3/3   Running   0   7m49s
etcd-ip-10-0-164-97.ec2.internal    3/3   Running   0   123m
etcd-ip-10-0-154-204.ec2.internal   3/3   Running   0   124m
If the output from the previous command lists only two pods, you can manually force an etcd redeployment. In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge (1)
- (1) The forceRedeploymentReason value must be unique, which is why a timestamp is appended.
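The patch payload for a forced redeployment can be assembled in a small helper so that the unique timestamp suffix is explicit. This is a minimal sketch: build_redeploy_patch is a hypothetical function name, not part of oc, and the default timestamp relies on GNU date, as the command above does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the merge patch that forces an etcd redeployment.
#   $1 = reason prefix (for example "recovery")
#   $2 = optional unique suffix; defaults to a nanosecond timestamp (GNU date)
build_redeploy_patch() {
  local prefix="$1"
  local stamp="${2:-$(date --rfc-3339=ns)}"
  printf '{"spec": {"forceRedeploymentReason": "%s-%s"}}' "$prefix" "$stamp"
}

# Usage (requires cluster-admin access to a cluster):
#   oc patch etcd cluster --type=merge -p "$(build_redeploy_patch recovery)"
```

Passing the suffix as an argument keeps the helper deterministic when you want to inspect the payload before applying it.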
Verify that there are exactly three etcd members.
Connect to the running etcd container, passing in the name of a pod that was not on the affected node:
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
View the member list:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+------------------------------+---------------------------+---------------------------+
| ID               | STATUS  | NAME                         | PEER ADDRS                | CLIENT ADDRS              |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 5eb0d6b8ca24730c | started | ip-10-0-133-53.ec2.internal  | https://10.0.133.53:2380  | https://10.0.133.53:2379  |
| 757b6793e2408b6c | started | ip-10-0-164-97.ec2.internal  | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
If the output from the previous command lists more than three etcd members, you must carefully remove the unwanted member.
Warning
Be sure to remove the correct etcd member; removing a good etcd member might lead to quorum loss.
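The member count check can also be scripted against the table output. The following sketch assumes the table layout shown in the example output (data rows begin with a 16-character hexadecimal member ID); count_etcd_members is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the member rows in "etcdctl member list -w table"
# output read from standard input. Header and border rows are ignored because
# only rows that start with a 16-hex-digit ID are matched.
count_etcd_members() {
  grep -Ec '^\| *[0-9a-f]{16} *\|' || true
}

# Usage, from inside the etcd container:
#   etcdctl member list -w table | count_etcd_members
```

A result other than 3 on a three-node control plane indicates that a member is missing or that an unwanted member is still registered.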
5.2.4.2. Replacing an unhealthy etcd member whose etcd pod is crashlooping
This procedure details the steps to replace an etcd member that is unhealthy because the etcd pod is crashlooping.
Prerequisites
- You have identified the unhealthy etcd member.
- You have verified that the etcd pod is crashlooping.
- You have access to the cluster as a user with the cluster-admin role.
- You have taken an etcd backup.
Important
It is important to take an etcd backup before performing this procedure so that your cluster can be restored if you encounter any issues.
Procedure
Stop the crashlooping etcd pod.
Debug the node that hosts the crashlooping etcd pod.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc debug node/ip-10-0-131-183.ec2.internal (1)
- (1) Replace this with the name of the unhealthy node.
Change your root directory to /host:
sh-4.2# chroot /host
Move the existing etcd pod file out of the kubelet manifest directory:
sh-4.2# mkdir /var/lib/etcd-backup
sh-4.2# mv /etc/kubernetes/manifests/etcd-pod.yaml /var/lib/etcd-backup/
Move the etcd data directory to a different location:
sh-4.2# mv /var/lib/etcd/ /tmp
You can now exit the node shell.
Remove the unhealthy member.
Choose a pod that is not on the affected node.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
etcd-ip-10-0-131-183.ec2.internal   2/3   Error     7   6h9m
etcd-ip-10-0-164-97.ec2.internal    3/3   Running   0   6h6m
etcd-ip-10-0-154-204.ec2.internal   3/3   Running   0   6h6m
Connect to the running etcd container, passing in the name of a pod that is not on the affected node.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
View the member list:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+------------------------------+---------------------------+---------------------------+
| ID               | STATUS  | NAME                         | PEER ADDRS                | CLIENT ADDRS              |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 62bcf33650a7170a | started | ip-10-0-131-183.ec2.internal | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| b78e2856655bc2eb | started | ip-10-0-164-97.ec2.internal  | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| d022e10b498760d5 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
Take note of the ID and the name of the unhealthy etcd member, because these values are needed later in the procedure.
Remove the unhealthy etcd member by providing the ID to the etcdctl member remove command:
sh-4.2# etcdctl member remove 62bcf33650a7170a
Example output
Member 62bcf33650a7170a removed from cluster ead669ce1fbfb346
View the member list again and verify that the member was removed:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+------------------------------+---------------------------+---------------------------+
| ID               | STATUS  | NAME                         | PEER ADDRS                | CLIENT ADDRS              |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| b78e2856655bc2eb | started | ip-10-0-164-97.ec2.internal  | https://10.0.164.97:2380  | https://10.0.164.97:2379  |
| d022e10b498760d5 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+
You can now exit the node shell.
Turn off the quorum guard by entering the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
This command ensures that you can successfully re-create secrets and roll out the static pods.
Remove the old secrets for the unhealthy etcd member that was removed.
List the secrets for the unhealthy etcd member that was removed.
$ oc get secrets -n openshift-etcd | grep ip-10-0-131-183.ec2.internal (1)
- (1) Pass in the name of the unhealthy etcd member that you took note of earlier in this procedure.
There is a peer, serving, and metrics secret as shown in the following output:
Example output
etcd-peer-ip-10-0-131-183.ec2.internal              kubernetes.io/tls   2   47m
etcd-serving-ip-10-0-131-183.ec2.internal           kubernetes.io/tls   2   47m
etcd-serving-metrics-ip-10-0-131-183.ec2.internal   kubernetes.io/tls   2   47m
Delete the secrets for the unhealthy etcd member that was removed.
Delete the peer secret:
$ oc delete secret -n openshift-etcd etcd-peer-ip-10-0-131-183.ec2.internal
Delete the serving secret:
$ oc delete secret -n openshift-etcd etcd-serving-ip-10-0-131-183.ec2.internal
Delete the metrics secret:
$ oc delete secret -n openshift-etcd etcd-serving-metrics-ip-10-0-131-183.ec2.internal
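The three secret names follow one pattern per member, so they can be derived from the member name instead of being typed by hand. This is a sketch only: etcd_secret_names is a hypothetical helper, and the name pattern is taken from the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the peer, serving, and metrics secret names
# for one etcd member, one name per line.
#   $1 = member (node) name noted earlier in the procedure
etcd_secret_names() {
  local node="$1"
  printf '%s\n' \
    "etcd-peer-${node}" \
    "etcd-serving-${node}" \
    "etcd-serving-metrics-${node}"
}

# Usage (requires cluster-admin access to a cluster):
#   for s in $(etcd_secret_names ip-10-0-131-183.ec2.internal); do
#     oc delete secret -n openshift-etcd "$s"
#   done
```

Compare the generated names with the output of the oc get secrets command before deleting anything.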
Force etcd redeployment.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge (1)
- (1) The forceRedeploymentReason value must be unique, which is why a timestamp is appended.
When the etcd cluster Operator performs a redeployment, it ensures that all control plane nodes have a functioning etcd pod.
Turn the quorum guard back on by entering the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
You can verify that the unsupportedConfigOverrides section is removed from the object by entering this command:
$ oc get etcd/cluster -oyaml
If you are using single-node OpenShift, restart the node. Otherwise, you might encounter the following error in the etcd cluster Operator:
Example output
EtcdCertSignerControllerDegraded: [Operation cannot be fulfilled on secrets "etcd-peer-sno-0": the object has been modified; please apply your changes to the latest version and try again, Operation cannot be fulfilled on secrets "etcd-serving-sno-0": the object has been modified; please apply your changes to the latest version and try again, Operation cannot be fulfilled on secrets "etcd-serving-metrics-sno-0": the object has been modified; please apply your changes to the latest version and try again]
Verification
Verify that the new member is available and healthy.
Connect to the running etcd container again.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
Verify that all members are healthy:
sh-4.2# etcdctl endpoint health
Example output
https://10.0.131.183:2379 is healthy: successfully committed proposal: took = 16.671434ms
https://10.0.154.204:2379 is healthy: successfully committed proposal: took = 16.698331ms
https://10.0.164.97:2379 is healthy: successfully committed proposal: took = 16.621645ms
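When this check runs as part of a script, the health output can be reduced to a single exit status. The sketch below assumes the line format shown in the example output; all_endpoints_healthy is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "etcdctl endpoint health" output on standard input
# and succeed only if at least one endpoint is listed and every non-empty line
# reports "is healthy".
all_endpoints_healthy() {
  awk 'NF { total++; if ($0 ~ / is healthy: /) ok++ }
       END { exit !(total > 0 && ok == total) }'
}

# Usage, from inside the etcd container:
#   etcdctl endpoint health | all_endpoints_healthy && echo "all members healthy"
```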
5.2.4.3. Replacing an unhealthy bare metal etcd member whose machine is not running or whose node is not ready
This procedure details the steps to replace a bare metal etcd member that is unhealthy either because the machine is not running or because the node is not ready.
If you are running installer-provisioned infrastructure or you used the Machine API to create your machines, follow these steps. Otherwise you must create the new control plane node using the same method that was used to originally create it.
Prerequisites
- You have identified the unhealthy bare metal etcd member.
- You have verified that either the machine is not running or the node is not ready.
- You have access to the cluster as a user with the cluster-admin role.
- You have taken an etcd backup.
Important
You must take an etcd backup before performing this procedure so that your cluster can be restored if you encounter any issues.
Procedure
Verify and remove the unhealthy member.
Choose a pod that is not on the affected node:
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide
Example output
etcd-openshift-control-plane-0   5/5   Running   11   3h56m   192.168.10.9    openshift-control-plane-0   <none>   <none>
etcd-openshift-control-plane-1   5/5   Running   0    3h54m   192.168.10.10   openshift-control-plane-1   <none>   <none>
etcd-openshift-control-plane-2   5/5   Running   0    3h58m   192.168.10.11   openshift-control-plane-2   <none>   <none>
Connect to the running etcd container, passing in the name of a pod that is not on the affected node:
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc rsh -n openshift-etcd etcd-openshift-control-plane-0
View the member list:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
| ID               | STATUS  | NAME                      | PEER ADDRS                  | CLIENT ADDRS                | IS LEARNER |
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
| 7a8197040a5126c8 | started | openshift-control-plane-2 | https://192.168.10.11:2380/ | https://192.168.10.11:2379/ | false      |
| 8d5abe9669a39192 | started | openshift-control-plane-1 | https://192.168.10.10:2380/ | https://192.168.10.10:2379/ | false      |
| cc3830a72fc357f9 | started | openshift-control-plane-0 | https://192.168.10.9:2380/  | https://192.168.10.9:2379/  | false      |
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
Take note of the ID and the name of the unhealthy etcd member, because these values are required later in the procedure. The etcdctl endpoint health command will list the removed member until the replacement procedure is completed and the new member is added.
Remove the unhealthy etcd member by providing the ID to the etcdctl member remove command:
Warning
Be sure to remove the correct etcd member; removing a good etcd member might lead to quorum loss.
sh-4.2# etcdctl member remove 7a8197040a5126c8
Example output
Member 7a8197040a5126c8 removed from cluster b23536c33f2cdd1b
View the member list again and verify that the member was removed:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
| ID               | STATUS  | NAME                      | PEER ADDRS                  | CLIENT ADDRS                | IS LEARNER |
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
| cc3830a72fc357f9 | started | openshift-control-plane-0 | https://192.168.10.9:2380/  | https://192.168.10.9:2379/  | false      |
| 8d5abe9669a39192 | started | openshift-control-plane-1 | https://192.168.10.10:2380/ | https://192.168.10.10:2379/ | false      |
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
You can now exit the node shell.
Important
After you remove the member, the cluster might be unreachable for a short time while the remaining etcd instances reboot.
Turn off the quorum guard by entering the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
This command ensures that you can successfully re-create secrets and roll out the static pods.
Remove the old secrets for the unhealthy etcd member that was removed by running the following commands.
List the secrets for the unhealthy etcd member that was removed.
$ oc get secrets -n openshift-etcd | grep openshift-control-plane-2
Pass in the name of the unhealthy etcd member that you took note of earlier in this procedure.
There is a peer, serving, and metrics secret as shown in the following output:
etcd-peer-openshift-control-plane-2              kubernetes.io/tls   2   134m
etcd-serving-metrics-openshift-control-plane-2   kubernetes.io/tls   2   134m
etcd-serving-openshift-control-plane-2           kubernetes.io/tls   2   134m
Delete the secrets for the unhealthy etcd member that was removed.
Delete the peer secret:
$ oc delete secret etcd-peer-openshift-control-plane-2 -n openshift-etcd
secret "etcd-peer-openshift-control-plane-2" deleted
Delete the serving secret:
$ oc delete secret etcd-serving-openshift-control-plane-2 -n openshift-etcd
secret "etcd-serving-openshift-control-plane-2" deleted
Delete the metrics secret:
$ oc delete secret etcd-serving-metrics-openshift-control-plane-2 -n openshift-etcd
secret "etcd-serving-metrics-openshift-control-plane-2" deleted
Obtain the machine for the unhealthy member.
In a terminal that has access to the cluster as a
cluster-admin user, run the following command:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                             PHASE     TYPE   REGION   ZONE   AGE     NODE                        PROVIDERID                                                                                              STATE
examplecluster-control-plane-0   Running                          3h11m   openshift-control-plane-0   baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e   externally provisioned
examplecluster-control-plane-1   Running                          3h11m   openshift-control-plane-1   baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1   externally provisioned
examplecluster-control-plane-2   Running                          3h11m   openshift-control-plane-2   baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135   externally provisioned (1)
examplecluster-compute-0         Running                          165m    openshift-compute-0         baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f         provisioned
examplecluster-compute-1         Running                          165m    openshift-compute-1         baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9         provisioned
- (1) This is the control plane machine for the unhealthy node, examplecluster-control-plane-2.
Ensure that the Bare Metal Operator is available by running the following command:
$ oc get clusteroperator baremetal
Example output
NAME        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal   4.14.0    True        False         False      3d15h
Remove the old BareMetalHost object by running the following command:
$ oc delete bmh openshift-control-plane-2 -n openshift-machine-api
Example output
baremetalhost.metal3.io "openshift-control-plane-2" deleted
Delete the machine of the unhealthy member by running the following command:
$ oc delete machine -n openshift-machine-api examplecluster-control-plane-2
After you remove the BareMetalHost and Machine objects, the Machine controller automatically deletes the Node object.
If deletion of the machine is delayed for any reason or the command is obstructed and delayed, you can force deletion by removing the machine object finalizer field.
Important
Do not interrupt machine deletion by pressing Ctrl+c. You must allow the command to proceed to completion. Open a new terminal window to edit and delete the finalizer fields.
A new machine is automatically provisioned after deleting the machine of the unhealthy member.
Edit the machine configuration by running the following command:
$ oc edit machine -n openshift-machine-api examplecluster-control-plane-2
Delete the following fields in the Machine custom resource, and then save the updated file:
finalizers:
- machine.machine.openshift.io
Example output
machine.machine.openshift.io/examplecluster-control-plane-2 edited
Verify that the machine was deleted by running the following command:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                             PHASE     TYPE   REGION   ZONE   AGE     NODE                        PROVIDERID                                                                                              STATE
examplecluster-control-plane-0   Running                          3h11m   openshift-control-plane-0   baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e   externally provisioned
examplecluster-control-plane-1   Running                          3h11m   openshift-control-plane-1   baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1   externally provisioned
examplecluster-compute-0         Running                          165m    openshift-compute-0         baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f         provisioned
examplecluster-compute-1         Running                          165m    openshift-compute-1         baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9         provisioned
Verify that the node has been deleted by running the following command:
$ oc get nodes
NAME                        STATUS   ROLES    AGE     VERSION
openshift-control-plane-0   Ready    master   3h24m   v1.27.3
openshift-control-plane-1   Ready    master   3h24m   v1.27.3
openshift-compute-0         Ready    worker   176m    v1.27.3
openshift-compute-1         Ready    worker   176m    v1.27.3
Create the new BareMetalHost object and the secret to store the BMC credentials:
$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: openshift-control-plane-2-bmc-secret
  namespace: openshift-machine-api
data:
  password: <password>
  username: <username>
type: Opaque
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-control-plane-2
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: disabled
  bmc:
    address: redfish://10.46.61.18:443/redfish/v1/Systems/1
    credentialsName: openshift-control-plane-2-bmc-secret
    disableCertificateVerification: true
  bootMACAddress: 48:df:37:b0:8a:a0
  bootMode: UEFI
  externallyProvisioned: false
  online: true
  rootDeviceHints:
    deviceName: /dev/disk/by-id/scsi-<serial_number>
  userData:
    name: master-user-data-managed
    namespace: openshift-machine-api
EOF
Note
The username and password can be found in the other bare metal hosts' secrets. The protocol to use in bmc:address can be taken from other bmh objects.
Important
If you reuse the BareMetalHost object definition from an existing control plane host, do not leave the externallyProvisioned field set to true.
Existing control plane BareMetalHost objects may have the externallyProvisioned flag set to true if they were provisioned by the OpenShift Container Platform installation program.
After the inspection is complete, the BareMetalHost object is created and available to be provisioned.
Verify the creation process using available BareMetalHost objects:
$ oc get bmh -n openshift-machine-api
NAME                        STATE                    CONSUMER                         ONLINE   ERROR   AGE
openshift-control-plane-0   externally provisioned   examplecluster-control-plane-0   true             4h48m
openshift-control-plane-1   externally provisioned   examplecluster-control-plane-1   true             4h48m
openshift-control-plane-2   available                examplecluster-control-plane-3   true             47m
openshift-compute-0         provisioned              examplecluster-compute-0         true             4h48m
openshift-compute-1         provisioned              examplecluster-compute-1         true             4h48m
Verify that a new machine has been created:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                             PHASE     TYPE   REGION   ZONE   AGE     NODE                        PROVIDERID                                                                                              STATE
examplecluster-control-plane-0   Running                          3h11m   openshift-control-plane-0   baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e   externally provisioned (1)
examplecluster-control-plane-1   Running                          3h11m   openshift-control-plane-1   baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1   externally provisioned
examplecluster-control-plane-2   Running                          3h11m   openshift-control-plane-2   baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135   externally provisioned
examplecluster-compute-0         Running                          165m    openshift-compute-0         baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f         provisioned
examplecluster-compute-1         Running                          165m    openshift-compute-1         baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9         provisioned
- (1) The new machine, clustername-8qw5l-master-3, is being created and is ready after the phase changes from Provisioning to Running.
It should take a few minutes for the new machine to be created. The etcd cluster Operator will automatically sync when the machine or node returns to a healthy state.
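The Secret manifest earlier in this procedure places the BMC credentials under the data field, and Kubernetes expects values in that field to be base64-encoded. The following sketch shows one way to produce the encoded strings; b64 is a hypothetical helper name and the credential values are placeholders, not values from this procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: base64-encode one value for use under a Secret's
# "data" field. printf avoids encoding a trailing newline, and tr removes
# any line wrapping that base64 adds for long inputs.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Usage with placeholder credentials:
#   b64 'admin'
#   b64 'example-password'
```

Paste the resulting strings in place of the username and password placeholders in the manifest.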
Verify that the bare metal host becomes provisioned and that no error is reported by running the following command:
$ oc get bmh -n openshift-machine-api
Example output
NAME                        STATE                    CONSUMER                         ONLINE   ERROR   AGE
openshift-control-plane-0   externally provisioned   examplecluster-control-plane-0   true             4h48m
openshift-control-plane-1   externally provisioned   examplecluster-control-plane-1   true             4h48m
openshift-control-plane-2   provisioned              examplecluster-control-plane-3   true             47m
openshift-compute-0         provisioned              examplecluster-compute-0         true             4h48m
openshift-compute-1         provisioned              examplecluster-compute-1         true             4h48m
Verify that the new node is added and in a ready state by running this command:
$ oc get nodes
Example output
NAME                        STATUS   ROLES    AGE     VERSION
openshift-control-plane-0   Ready    master   4h26m   v1.27.3
openshift-control-plane-1   Ready    master   4h26m   v1.27.3
openshift-control-plane-2   Ready    master   12m     v1.27.3
openshift-compute-0         Ready    worker   3h58m   v1.27.3
openshift-compute-1         Ready    worker   3h58m   v1.27.3
Turn the quorum guard back on by entering the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
You can verify that the unsupportedConfigOverrides section is removed from the object by entering this command:
$ oc get etcd/cluster -oyaml
If you are using single-node OpenShift, restart the node. Otherwise, you might encounter the following error in the etcd cluster Operator:
Example output
EtcdCertSignerControllerDegraded: [Operation cannot be fulfilled on secrets "etcd-peer-sno-0": the object has been modified; please apply your changes to the latest version and try again, Operation cannot be fulfilled on secrets "etcd-serving-sno-0": the object has been modified; please apply your changes to the latest version and try again, Operation cannot be fulfilled on secrets "etcd-serving-metrics-sno-0": the object has been modified; please apply your changes to the latest version and try again]
Verification
Verify that all etcd pods are running properly.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
etcd-openshift-control-plane-0   5/5   Running   0   105m
etcd-openshift-control-plane-1   5/5   Running   0   107m
etcd-openshift-control-plane-2   5/5   Running   0   103m
If the output from the previous command lists only two pods, you can manually force an etcd redeployment. In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge (1)
- (1) The forceRedeploymentReason value must be unique, which is why a timestamp is appended.
To verify that there are exactly three etcd members, connect to the running etcd container, passing in the name of a pod that was not on the affected node. In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc rsh -n openshift-etcd etcd-openshift-control-plane-0
View the member list:
sh-4.2# etcdctl member list -w table
Example output
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
| ID               | STATUS  | NAME                      | PEER ADDRS                  | CLIENT ADDRS                | IS LEARNER |
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
| 7a8197040a5126c8 | started | openshift-control-plane-2 | https://192.168.10.11:2380  | https://192.168.10.11:2379  | false      |
| 8d5abe9669a39192 | started | openshift-control-plane-1 | https://192.168.10.10:2380  | https://192.168.10.10:2379  | false      |
| cc3830a72fc357f9 | started | openshift-control-plane-0 | https://192.168.10.9:2380   | https://192.168.10.9:2379   | false      |
+------------------+---------+---------------------------+-----------------------------+-----------------------------+------------+
Note
If the output from the previous command lists more than three etcd members, you must carefully remove the unwanted member.
Verify that all etcd members are healthy by running the following command:
# etcdctl endpoint health --cluster
Example output
https://192.168.10.10:2379 is healthy: successfully committed proposal: took = 8.973065ms
https://192.168.10.9:2379 is healthy: successfully committed proposal: took = 11.559829ms
https://192.168.10.11:2379 is healthy: successfully committed proposal: took = 11.665203ms
Validate that all nodes are at the latest revision by running the following command:
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Example output
AllNodesAtLatestRevision
5.3. Disaster recovery
5.3.1. About disaster recovery
The disaster recovery documentation provides information for administrators on how to recover from several disaster situations that might occur with their OpenShift Container Platform cluster. As an administrator, you might need to follow one or more of the following procedures to return your cluster to a working state.
Disaster recovery requires you to have at least one healthy control plane host.
- Restoring to a previous cluster state
This solution handles situations where you want to restore your cluster to a previous state, for example, if an administrator deletes something critical. This also includes situations where you have lost the majority of your control plane hosts, leading to etcd quorum loss and the cluster going offline. As long as you have taken an etcd backup, you can follow this procedure to restore your cluster to a previous state.
If applicable, you might also need to recover from expired control plane certificates.
Warning
Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. This procedure should only be used as a last resort.
Prior to performing a restore, see About restoring cluster state for more information on the impact to the cluster.
Note
If you have a majority of your masters still available and have an etcd quorum, then follow the procedure to replace a single unhealthy etcd member.
- Recovering from expired control plane certificates
- This solution handles situations where your control plane certificates have expired. For example, if you shut down your cluster before the first certificate rotation, which occurs 24 hours after installation, your certificates will not be rotated and will expire. You can follow this procedure to recover from expired control plane certificates.
5.3.2. Restoring to a previous cluster state
To restore the cluster to a previous state, you must have previously backed up the etcd data.
5.3.2.1. About restoring cluster state
You can use an etcd backup to restore your cluster to a previous state. This can be used to recover from the following situations:
- The cluster has lost the majority of control plane hosts (quorum loss).
- An administrator has deleted something critical and must restore to recover the cluster.
Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. This should only be used as a last resort.
If you are able to retrieve data using the Kubernetes API server, then etcd is available and you should not restore using an etcd backup.
Restoring etcd effectively takes a cluster back in time and all clients will experience a conflicting, parallel history. This can impact the behavior of watching components like kubelets, Kubernetes controller managers, SDN controllers, and persistent volume controllers.
It can cause Operator churn when the content in etcd does not match the actual content on disk, causing Operators for the Kubernetes API server, Kubernetes controller manager, Kubernetes scheduler, and etcd to get stuck when files on disk conflict with content in etcd. This can require manual actions to resolve the issues.
In extreme cases, the cluster can lose track of persistent volumes, delete critical workloads that no longer exist, reimage machines, and rewrite CA bundles with expired certificates.
5.3.2.2. Restoring to a previous cluster state
You can use a saved etcd backup to restore your cluster to a previous state.
If your cluster uses a control plane machine set, see "Recovering a degraded etcd Operator" in "Troubleshooting the control plane machine set" for an etcd recovery procedure.
When you restore your cluster, you must use an
etcd
etcd
Prerequisites
- Access to the cluster as a user with the cluster-admin role through a certificate-based kubeconfig file, like the one that was used during installation.
- A healthy control plane host to use as the recovery host.
- SSH access to control plane hosts.
- A backup directory containing both the etcd snapshot and the resources for the static pods, which were from the same backup. The file names in the directory must be in the following formats: snapshot_<datetimestamp>.db and static_kuberesources_<datetimestamp>.tar.gz.
For non-recovery control plane nodes, it is not required to establish SSH connectivity or to stop the static pods. You can delete and recreate other non-recovery, control plane machines, one by one.
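Before you start, it can help to confirm that the backup directory really holds a matched pair of files. The following is a minimal sketch, not part of the product tooling: check_backup_dir is a hypothetical helper, and it assumes that files from the same backup share the same <datetimestamp> suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the cluster): succeed when the given
# directory holds a snapshot_<datetimestamp>.db file and a
# static_kuberesources_<datetimestamp>.tar.gz file with the same timestamp.
check_backup_dir() {
  local dir="$1" snap stamp
  for snap in "$dir"/snapshot_*.db; do
    # If the glob matched nothing, the literal pattern is skipped here.
    [ -e "$snap" ] || continue
    stamp="${snap##*/snapshot_}"   # strip the directory and the prefix
    stamp="${stamp%.db}"           # strip the extension
    if [ -e "$dir/static_kuberesources_${stamp}.tar.gz" ]; then
      echo "matching pair found for timestamp ${stamp}"
      return 0
    fi
  done
  echo "no matching snapshot/static_kuberesources pair in ${dir}" >&2
  return 1
}
```

For example, check_backup_dir /home/core/assets/backup returns a non-zero status if either file is missing or if the timestamps differ.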
Procedure
- Select a control plane host to use as the recovery host. This is the host that you will run the restore operation on.
Establish SSH connectivity to each of the control plane nodes, including the recovery host.
kube-apiserver becomes inaccessible after the restore process starts, so you cannot access the control plane nodes. For this reason, it is recommended to establish SSH connectivity to each control plane host in a separate terminal.
Important: If you do not complete this step, you will not be able to access the control plane hosts to complete the restore procedure, and you will be unable to recover your cluster from this state.
Copy the etcd backup directory to the recovery control plane host.
This procedure assumes that you copied the backup directory containing the etcd snapshot and the resources for the static pods to the /home/core/ directory of your recovery control plane host.
Stop the static pods on any other control plane nodes.
Note: You do not need to stop the static pods on the recovery host.
- Access a control plane host that is not the recovery host.
Move the existing etcd pod file out of the kubelet manifest directory by running:
$ sudo mv -v /etc/kubernetes/manifests/etcd-pod.yaml /tmp
Verify that the etcd pods are stopped by using:
$ sudo crictl ps | grep etcd | egrep -v "operator|etcd-guard"
If the output of this command is not empty, wait a few minutes and check again.
Move the existing kube-apiserver file out of the kubelet manifest directory by running:
$ sudo mv -v /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp
Verify that the kube-apiserver containers are stopped by running:
$ sudo crictl ps | grep kube-apiserver | egrep -v "operator|guard"
If the output of this command is not empty, wait a few minutes and check again.
Move the existing kube-controller-manager file out of the kubelet manifest directory by using:
$ sudo mv -v /etc/kubernetes/manifests/kube-controller-manager-pod.yaml /tmp
Verify that the kube-controller-manager containers are stopped by running:
$ sudo crictl ps | grep kube-controller-manager | egrep -v "operator|guard"
If the output of this command is not empty, wait a few minutes and check again.
Move the existing kube-scheduler file out of the kubelet manifest directory by using:
$ sudo mv -v /etc/kubernetes/manifests/kube-scheduler-pod.yaml /tmp
Verify that the kube-scheduler containers are stopped by using:
$ sudo crictl ps | grep kube-scheduler | egrep -v "operator|guard"
If the output of this command is not empty, wait a few minutes and check again.
Move the etcd data directory to a different location with the following example:
$ sudo mv -v /var/lib/etcd/ /tmp
If the /etc/kubernetes/manifests/keepalived.yaml file exists and the node is deleted, follow these steps:
Move the /etc/kubernetes/manifests/keepalived.yaml file out of the kubelet manifest directory:
$ sudo mv -v /etc/kubernetes/manifests/keepalived.yaml /tmp
Verify that any containers managed by the keepalived daemon are stopped:
$ sudo crictl ps --name keepalived
The output of this command should be empty. If it is not empty, wait a few minutes and check again.
Check if the control plane has any Virtual IPs (VIPs) assigned to it:
$ ip -o address | egrep '<api_vip>|<ingress_vip>'
For each reported VIP, run the following command to remove it:
$ sudo ip address del <reported_vip> dev <reported_vip_device>
- Repeat this step on each of the other control plane hosts that is not the recovery host.
- Access the recovery control plane host.
If the keepalived daemon is in use, verify that the recovery control plane node owns the VIP:
$ ip -o address | grep <api_vip>
The address of the VIP is highlighted in the output if it exists. This command returns an empty string if the VIP is not set or configured incorrectly.
If the cluster-wide proxy is enabled, be sure that you have exported the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables.
Tip: You can check whether the proxy is enabled by reviewing the output of oc get proxy cluster -o yaml. The proxy is enabled if the httpProxy, httpsProxy, and noProxy fields have values set.
Run the restore script on the recovery control plane host and pass in the path to the etcd backup directory:
$ sudo -E /usr/local/bin/cluster-restore.sh /home/core/assets/backup
Example script output
...stopping kube-scheduler-pod.yaml
...stopping kube-controller-manager-pod.yaml
...stopping etcd-pod.yaml
...stopping kube-apiserver-pod.yaml
Waiting for container etcd to stop
.complete
Waiting for container etcdctl to stop
.............................complete
Waiting for container etcd-metrics to stop
complete
Waiting for container kube-controller-manager to stop
complete
Waiting for container kube-apiserver to stop
..........................................................................................complete
Waiting for container kube-scheduler to stop
complete
Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod
starting kube-apiserver-pod.yaml
static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml
starting kube-controller-manager-pod.yaml
static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.yaml
starting kube-scheduler-pod.yaml
static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
The cluster-restore.sh script must show that etcd, kube-apiserver, kube-controller-manager, and kube-scheduler pods are stopped and then started at the end of the restore process.
Note: The restore process can cause nodes to enter the NotReady state if the node certificates were updated after the last etcd backup.
Check the nodes to ensure they are in the Ready state.
Run the following command:
$ oc get nodes -w
Sample output
NAME                STATUS   ROLES          AGE     VERSION
host-172-25-75-28   Ready    master         3d20h   v1.27.3
host-172-25-75-38   Ready    infra,worker   3d20h   v1.27.3
host-172-25-75-40   Ready    master         3d20h   v1.27.3
host-172-25-75-65   Ready    master         3d20h   v1.27.3
host-172-25-75-74   Ready    infra,worker   3d20h   v1.27.3
host-172-25-75-79   Ready    worker         3d20h   v1.27.3
host-172-25-75-86   Ready    worker         3d20h   v1.27.3
host-172-25-75-98   Ready    infra,worker   3d20h   v1.27.3
It can take several minutes for all nodes to report their state.
If any nodes are in the NotReady state, log in to the nodes and remove all of the PEM files from the /var/lib/kubelet/pki directory on each node. You can SSH into the nodes or use the terminal window in the web console.
$ ssh -i <ssh-key-path> core@<master-hostname>
Sample pki directory
sh-4.4# pwd
/var/lib/kubelet/pki
sh-4.4# ls
kubelet-client-2022-04-28-11-24-09.pem  kubelet-server-2022-04-28-11-24-15.pem
kubelet-client-current.pem              kubelet-server-current.pem
Restart the kubelet service on all control plane hosts.
From the recovery host, run:
$ sudo systemctl restart kubelet.service
- Repeat this step on all other control plane hosts.
Approve the pending Certificate Signing Requests (CSRs):
Note: Clusters with no worker nodes, such as single-node clusters or clusters consisting of three schedulable control plane nodes, will not have any pending CSRs to approve. You can skip all the commands listed in this step.
Get the list of current CSRs by running:
$ oc get csr
Example output
NAME        AGE    SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2s94x   8m3s   kubernetes.io/kubelet-serving                 system:node:<node_name>                                                     Pending
csr-4bd6t   8m3s   kubernetes.io/kubelet-serving                 system:node:<node_name>                                                     Pending
csr-4hl85   13m    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-zhhhp   3m8s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
Review the details of a CSR to verify that it is valid by running:
$ oc describe csr <csr_name>
Here, <csr_name> is the name of a CSR from the list of current CSRs.
Approve each valid node-bootstrapper CSR by running:
$ oc adm certificate approve <csr_name>
For user-provisioned installations, approve each valid kubelet service CSR by running:
$ oc adm certificate approve <csr_name>
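Because each CSR should be reviewed before it is approved, it can be convenient to list only the names of the pending requests. The sketch below filters text in the tabular format that oc get csr prints; pending_csrs is a hypothetical helper and assumes that CONDITION is the last column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of CSRs whose CONDITION is "Pending".
# Reads the tabular output of `oc get csr` on standard input.
pending_csrs() {
  # Skip the header row; the name is the first field, the condition the last.
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

For example, piping oc get csr into pending_csrs prints one CSR name per line, which you can then inspect with oc describe csr before approving.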
Verify that the single member control plane has started successfully.
From the recovery host, verify that the etcd container is running by using:
$ sudo crictl ps | grep etcd | egrep -v "operator|etcd-guard"
Example output
3ad41b7908e32 36f86e2eeaaffe662df0d21041eb22b8198e0e58abeeae8c743c3e6e977e8009 About a minute ago Running etcd 0 7c05f8af362f0
From the recovery host, verify that the etcd pod is running by using:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
NAME                                READY   STATUS    RESTARTS   AGE
etcd-ip-10-0-143-125.ec2.internal   1/1     Running   1          2m47s
If the status is Pending, or the output lists more than one running etcd pod, wait a few minutes and check again.
If you are using the OVNKubernetes network plugin, you must restart ovnkube-controlplane pods.
Delete all of the ovnkube-controlplane pods by running:
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane
Verify that all of the ovnkube-controlplane pods were redeployed by using:
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-control-plane
If you are using the OVN-Kubernetes network plugin, restart the Open Virtual Network (OVN) Kubernetes pods on all the nodes one by one. Use the following steps to restart OVN-Kubernetes pods on each node:
Important: Restart OVN-Kubernetes pods in the following order:
- The recovery control plane host
- The other control plane hosts (if available)
- The other nodes
Note: Validating and mutating admission webhooks can reject pods. If you add any additional webhooks with the failurePolicy set to Fail, then they can reject pods and the restoration process can fail. You can avoid this by saving and deleting webhooks while restoring the cluster state. After the cluster state is restored successfully, you can enable the webhooks again.
Alternatively, you can temporarily set the failurePolicy to Ignore while restoring the cluster state. After the cluster state is restored successfully, you can set the failurePolicy to Fail.
Remove the northbound database (nbdb) and southbound database (sbdb). Access the recovery host and the remaining nodes by using Secure Shell (SSH), and run the following command:
$ sudo rm -f /var/lib/ovn-ic/etc/*.db
Restart the Open vSwitch services. Access the node by using Secure Shell (SSH) and run the following command:
$ sudo systemctl restart ovs-vswitchd ovsdb-server
Delete the ovnkube-node pod on the node by running the following command, replacing <node> with the name of the node that you are restarting:
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
Check the status of the OVN pods by running the following command:
$ oc get po -n openshift-ovn-kubernetes
If any OVN pods are in the Terminating status, delete the node that is running that OVN pod by running the following command. Replace <node> with the name of the node you are deleting:
$ oc delete node <node>
Use SSH to log in to the OVN pod node with the Terminating status by running the following command:
$ ssh -i <ssh-key-path> core@<node>
Move all PEM files from the /var/lib/kubelet/pki directory by running the following command:
$ sudo mv /var/lib/kubelet/pki/* /tmp
Restart the kubelet service by running the following command:
$ sudo systemctl restart kubelet.service
Return to the recovery etcd machine and list the CSRs by running the following command:
$ oc get csr
Example output
NAME         AGE    SIGNERNAME                      REQUESTOR                 CONDITION
csr-<uuid>   8m3s   kubernetes.io/kubelet-serving   system:node:<node_name>   Pending
Approve all new CSRs by running the following command, replacing csr-<uuid> with the name of the CSR:
$ oc adm certificate approve csr-<uuid>
Verify that the node is back by running the following command:
$ oc get nodes
Verify that the ovnkube-node pod is running again with:
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
Note: It might take several minutes for the pods to restart.
Delete and re-create other non-recovery, control plane machines, one by one. After the machines are re-created, a new revision is forced and etcd automatically scales up.
If you use a user-provisioned bare metal installation, you can re-create a control plane machine by using the same method that you used to originally create it. For more information, see "Installing a user-provisioned cluster on bare metal".
Warning: Do not delete and re-create the machine for the recovery host.
If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps:
Warning: Do not delete and re-create the machine for the recovery host.
For bare metal installations on installer-provisioned infrastructure, control plane machines are not re-created. For more information, see "Replacing a bare-metal control plane node".
Obtain the machine for one of the lost control plane hosts.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc get machines -n openshift-machine-api -o wide
Example output:
NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
In this example, clustername-8qw5l-master-0 (state stopped) is the control plane machine for the lost control plane host, ip-10-0-131-183.ec2.internal.
Delete the machine of the lost control plane host by running:
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0
Specify the name of the control plane machine for the lost control plane host.
A new machine is automatically provisioned after deleting the machine of the lost control plane host.
Verify that a new machine has been created by running:
$ oc get machines -n openshift-machine-api -o wide
Example output:
NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal   aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-173-171.ec2.internal   aws:///us-east-1a/i-015b0888fe17bc2c8   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
The new machine, clustername-8qw5l-master-3, is being created and is ready after the phase changes from Provisioning to Running.
It might take a few minutes for the new machine to be created. The etcd cluster Operator will automatically sync when the machine or node returns to a healthy state.
Repeat these steps for each lost control plane host that is not the recovery host.
Turn off the quorum guard by entering:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
This command ensures that you can successfully re-create secrets and roll out the static pods.
In a separate terminal window within the recovery host, export the recovery kubeconfig file by running:
$ export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig
Force etcd redeployment.
In the same terminal window where you exported the recovery kubeconfig file, run:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
The forceRedeploymentReason value must be unique, which is why a timestamp is appended.
When the etcd cluster Operator performs a redeployment, the existing nodes are started with new pods similar to the initial bootstrap scale up.
Turn the quorum guard back on by entering:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
You can verify that the unsupportedConfigOverrides section is removed from the object by running:
$ oc get etcd/cluster -oyaml
Verify all nodes are updated to the latest revision.
In a terminal that has access to the cluster as a cluster-admin user, run:
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing status condition for etcd to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7
In this example, the latest revision number is 7.
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
After etcd is redeployed, force new rollouts for the control plane. kube-apiserver will reinstall itself on the other nodes because the kubelet is connected to API servers using an internal load balancer.
In a terminal that has access to the cluster as a cluster-admin user, run:
Force a new rollout for kube-apiserver:
$ oc patch kubeapiserver cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Verify all nodes are updated to the latest revision.
$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing status condition to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7
In this example, the latest revision number is 7.
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
Force a new rollout for the Kubernetes controller manager by running the following command:
$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Verify all nodes are updated to the latest revision by running:
$ oc get kubecontrollermanager -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing status condition to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7
In this example, the latest revision number is 7.
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
Force a new rollout for the kube-scheduler by running:
$ oc patch kubescheduler cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Verify all nodes are updated to the latest revision by using:
$ oc get kubescheduler -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing status condition to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7
In this example, the latest revision number is 7.
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
Verify that all control plane hosts have started and joined the cluster.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc -n openshift-etcd get pods -l k8s-app=etcd
Example output
etcd-ip-10-0-143-125.ec2.internal   2/2   Running   0   9h
etcd-ip-10-0-154-194.ec2.internal   2/2   Running   0   9h
etcd-ip-10-0-173-171.ec2.internal   2/2   Running   0   9h
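The four NodeInstallerProgressing checks in this procedure all follow the same rule: the rollout is finished only when the reason is AllNodesAtLatestRevision and the message names a single revision. A minimal sketch of that rule follows; rollout_complete is a hypothetical helper, and it assumes that an in-progress message separates the revisions with a semicolon, as in the examples in this procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the reason/message pair printed by
# the NodeInstallerProgressing jsonpath queries shows a finished rollout.
#   $1 = reason line, $2 = message line
rollout_complete() {
  local reason="$1" message="$2"
  [ "$reason" = "AllNodesAtLatestRevision" ] || return 1
  # A message such as "2 nodes are at revision 6; 1 nodes are at revision 7"
  # lists several revisions, separated by ";", while the update is running.
  case "$message" in
    *";"*) return 1 ;;
    *)     return 0 ;;
  esac
}
```

You could call it with the two lines that each oc get command prints and repeat the check after a few minutes while it returns a non-zero status.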
To ensure that all workloads return to normal operation following a recovery procedure, restart all control plane nodes.
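On completion of the previous procedural steps, you might need to wait a few minutes for all services to return to their restored state. For example, authentication by using oc login might not immediately work until the OAuth server pods are restarted.
Consider using the system:admin kubeconfig file for immediate authentication. You can authenticate with this file by issuing the following command:
$ export KUBECONFIG=<installation_directory>/auth/kubeconfig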
Issue the following command to display your authenticated user name:
$ oc whoami
5.3.2.3. Restoring a cluster manually from an etcd backup
The restore procedure described in the section "Restoring to a previous cluster state":
- Requires the complete recreation of 2 control plane nodes, which might be a complex procedure for clusters installed with the UPI installation method, since a UPI installation does not create any Machine or ControlPlaneMachineset for the control plane nodes.
- Uses the script /usr/local/bin/cluster-restore.sh, which starts a new single-member etcd cluster and then scales it to three members.
In contrast, this procedure:
- Does not require recreating any control plane nodes.
- Directly starts a three-member etcd cluster.
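If the cluster uses a MachineSet for the control plane, it is suggested to use the procedure in "Restoring to a previous cluster state" for a simpler etcd recovery.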
When you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an OpenShift Container Platform 4.7.2 cluster must use an etcd backup that was taken from 4.7.2.
Prerequisites
- Access to the cluster as a user with the cluster-admin role; for example, the kubeadmin user.
- SSH access to all control plane hosts, with a host user allowed to become root; for example, the default core host user.
- A backup directory containing both a previous etcd snapshot and the resources for the static pods from the same backup. The file names in the directory must be in the following formats: snapshot_<datetimestamp>.db and static_kuberesources_<datetimestamp>.tar.gz.
Procedure
Use SSH to connect to each of the control plane nodes.
The Kubernetes API server becomes inaccessible after the restore process starts, so you cannot access the control plane nodes. For this reason, it is recommended to open an SSH connection to each control plane host in a separate terminal.
Important: If you do not complete this step, you will not be able to access the control plane hosts to complete the restore procedure, and you will be unable to recover your cluster from this state.
Copy the etcd backup directory to each control plane host.
This procedure assumes that you copied the backup directory containing the etcd snapshot and the resources for the static pods to the /home/core/assets directory of each control plane host. You might need to create the assets folder if it does not exist yet.
Stop the static pods on all the control plane nodes, one host at a time.
Move the existing Kubernetes API Server static pod manifest out of the kubelet manifest directory.
$ mkdir -p /root/manifests-backup
$ mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /root/manifests-backup/
Verify that the Kubernetes API Server containers have stopped with the command:
$ crictl ps | grep kube-apiserver | grep -E -v "operator|guard"
The output of this command should be empty. If it is not empty, wait a few minutes and check again.
If the Kubernetes API Server containers are still running, terminate them manually with the following command:
$ crictl stop <container_id>
Repeat the same steps for kube-controller-manager-pod.yaml, kube-scheduler-pod.yaml and finally etcd-pod.yaml.
Stop the kube-controller-manager pod with the following command:
$ mv /etc/kubernetes/manifests/kube-controller-manager-pod.yaml /root/manifests-backup/
Check if the containers are stopped using the following command:
$ crictl ps | grep kube-controller-manager | grep -E -v "operator|guard"
Stop the kube-scheduler pod using the following command:
$ mv /etc/kubernetes/manifests/kube-scheduler-pod.yaml /root/manifests-backup/
Check if the containers are stopped using the following command:
$ crictl ps | grep kube-scheduler | grep -E -v "operator|guard"
Stop the etcd pod using the following command:
$ mv /etc/kubernetes/manifests/etcd-pod.yaml /root/manifests-backup/
Check if the containers are stopped using the following command:
$ crictl ps | grep etcd | grep -E -v "operator|guard"
On each control plane host, save the current etcd data by moving it into the backup folder:
$ mkdir /home/core/assets/old-member-data
$ mv /var/lib/etcd/member /home/core/assets/old-member-data
This data will be useful in case the etcd backup restore does not work and the etcd cluster must be restored to the current state.
Find the correct etcd parameters for each control plane host.
The value for <ETCD_NAME> is unique for each control plane host, and it is equal to the value of the ETCD_NAME variable in the manifest /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/restore-etcd-pod/pod.yaml file in the specific control plane host. It can be found with the command:
RESTORE_ETCD_POD_YAML="/etc/kubernetes/static-pod-resources/etcd-certs/configmaps/restore-etcd-pod/pod.yaml"
cat $RESTORE_ETCD_POD_YAML | \
  grep -A 1 $(cat $RESTORE_ETCD_POD_YAML | grep 'export ETCD_NAME' | grep -Eo 'NODE_.+_ETCD_NAME') | \
  grep -Po '(?<=value: ").+(?=")'
The value for <UUID> can be generated in a control plane host with the command:
$ uuidgen
Note: The value for <UUID> must be generated only once. After generating UUID on one control plane host, do not generate it again on the others. The same UUID will be used in the next steps on all control plane hosts.
The value for ETCD_NODE_PEER_URL should be set like the following example:
https://<IP_CURRENT_HOST>:2380
The correct IP can be found from the <ETCD_NAME> of the specific control plane host, with the command:
$ echo <ETCD_NAME> | \
  sed -E 's/[.-]/_/g' | \
  xargs -I {} grep {} /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-scripts/etcd.env | \
  grep "IP" | grep -Po '(?<=").+(?=")'
The value for <ETCD_INITIAL_CLUSTER> should be set like the following, where <ETCD_NAME_n> is the <ETCD_NAME> of each control plane host.
Note: The port used must be 2380 and not 2379. The port 2379 is used for etcd database management and is configured directly in the etcd start command in the container.
Example output
<ETCD_NAME_0>=<ETCD_NODE_PEER_URL_0>,<ETCD_NAME_1>=<ETCD_NODE_PEER_URL_1>,<ETCD_NAME_2>=<ETCD_NODE_PEER_URL_2>
Each <ETCD_NODE_PEER_URL_n> specifies the ETCD_NODE_PEER_URL value from the corresponding control plane host.
The <ETCD_INITIAL_CLUSTER> value remains the same across all control plane hosts. The same value is required in the next steps on every control plane host.
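Assembling the <ETCD_INITIAL_CLUSTER> string by hand is easy to get wrong, so a small helper can build it from the values gathered on each host. This is a sketch only: build_initial_cluster is a hypothetical function, the host names and IP addresses in the example call are placeholders, and it assumes IPv4 addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the <ETCD_INITIAL_CLUSTER> string from
# "<ETCD_NAME>=<IP>" arguments, one argument per control plane host.
build_initial_cluster() {
  local pair name ip out=""
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    ip="${pair#*=}"      # text after the first "="
    # Peer URLs use port 2380, not the client port 2379.
    out="${out:+${out},}${name}=https://${ip}:2380"
  done
  printf '%s\n' "$out"
}

# Placeholder names and addresses, for illustration only.
build_initial_cluster master-0=192.0.2.10 master-1=192.0.2.11 master-2=192.0.2.12
```

Generating the string once and reusing it on every host keeps the value identical across the control plane, as the procedure requires.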
Regenerate the etcd database from the backup.
Such operation must be executed on each control plane host.
Copy the etcd backup to the /var/lib/etcd directory with the command:
$ cp /home/core/assets/backup/<snapshot_yyyy-mm-dd_hhmmss>.db /var/lib/etcd
Identify the correct etcdctl image before proceeding. Use the following command to retrieve the image from the backup of the pod manifest:
$ jq -r '.spec.containers[]|select(.name=="etcdctl")|.image' /root/manifests-backup/etcd-pod.yaml
$ podman run --rm -it --entrypoint="/bin/bash" -v /var/lib/etcd:/var/lib/etcd:z <image-hash>
Check that the version of the etcdctl tool is the version of the etcd server where the backup was created:
$ etcdctl version
Run the following command to regenerate the etcd database, using the correct values for the current host:
$ ETCDCTL_API=3 /usr/bin/etcdctl snapshot restore /var/lib/etcd/<snapshot_yyyy-mm-dd_hhmmss>.db \
  --name "<ETCD_NAME>" \
  --initial-cluster="<ETCD_INITIAL_CLUSTER>" \
  --initial-cluster-token "openshift-etcd-<UUID>" \
  --initial-advertise-peer-urls "<ETCD_NODE_PEER_URL>" \
  --data-dir="/var/lib/etcd/restore-<UUID>" \
  --skip-hash-check=true
Note: The quotes are mandatory when regenerating the etcd database.
Record the values printed in the added member logs; for example:
Example output
2022-06-28T19:52:43Z info membership/cluster.go:421 added member {"cluster-id": "c5996b7c11c30d6b", "local-member-id": "0", "added-peer-id": "56cd73b614699e7", "added-peer-peer-urls": ["https://10.0.91.5:2380"], "added-peer-is-learner": false}
2022-06-28T19:52:43Z info membership/cluster.go:421 added member {"cluster-id": "c5996b7c11c30d6b", "local-member-id": "0", "added-peer-id": "1f63d01b31bb9a9e", "added-peer-peer-urls": ["https://10.0.90.221:2380"], "added-peer-is-learner": false}
2022-06-28T19:52:43Z info membership/cluster.go:421 added member {"cluster-id": "c5996b7c11c30d6b", "local-member-id": "0", "added-peer-id": "fdc2725b3b70127c", "added-peer-peer-urls": ["https://10.0.94.214:2380"], "added-peer-is-learner": false}
- Exit from the container.
- Repeat these steps on the other control plane hosts, checking that the values printed in the added member logs are the same for all control plane hosts.
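To compare the added member values between hosts without reading the logs by eye, you could reduce each host's log to its cluster and peer IDs. The following is a minimal sketch, assuming the log lines have the shape shown in the example output; summarize_added_members is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce "added member" log lines read from standard
# input to a sorted, de-duplicated list of cluster-id and added-peer-id pairs.
summarize_added_members() {
  grep 'added member' \
    | grep -Eo '"(cluster-id|added-peer-id)": "[0-9a-f]+"' \
    | sort -u
}
```

Saving this summary on each control plane host and comparing the files with diff should show no differences when the restore used the same values everywhere.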
Move the regenerated etcd database to the default location.
This operation must be executed on each control plane host.
Move the regenerated database (the member folder created by the previous etcdctl snapshot restore command) to the default etcd location /var/lib/etcd:
$ mv /var/lib/etcd/restore-<UUID>/member /var/lib/etcd
Restore the SELinux context for the /var/lib/etcd/member folder in the /var/lib/etcd directory:
$ restorecon -vR /var/lib/etcd/
Remove the leftover files and directories:
$ rm -rf /var/lib/etcd/restore-<UUID>
$ rm /var/lib/etcd/<snapshot_yyyy-mm-dd_hhmmss>.db
Important: When you are finished, the /var/lib/etcd directory must contain only the folder member.
- Repeat these steps on the other control plane hosts.
Restart the etcd cluster.
- The following steps must be executed on all control plane hosts, but one host at a time.
Move the etcd static pod manifest back to the kubelet manifest directory, in order to make kubelet start the related containers:
$ mv /root/manifests-backup/etcd-pod.yaml /etc/kubernetes/manifests
Verify that all the etcd containers have started:
$ crictl ps | grep etcd | grep -v operator
Example output
38c814767ad983 f79db5a8799fd2c08960ad9ee22f784b9fbe23babe008e8a3bf68323f004c840 28 seconds ago Running etcd-health-monitor 2 fe4b9c3d6483c
e1646b15207c6 9d28c15860870e85c91d0e36b45f7a6edd3da757b113ec4abb4507df88b17f06 About a minute ago Running etcd-metrics 0 fe4b9c3d6483c
08ba29b1f58a7 9d28c15860870e85c91d0e36b45f7a6edd3da757b113ec4abb4507df88b17f06 About a minute ago Running etcd 0 fe4b9c3d6483c
2ddc9eda16f53 9d28c15860870e85c91d0e36b45f7a6edd3da757b113ec4abb4507df88b17f06 About a minute ago Running etcdctl
If the output of this command is empty, wait a few minutes and check again.
Check the status of the etcd cluster.
On any of the control plane hosts, check the status of the etcd cluster with the following command:
$ crictl exec -it $(crictl ps | grep etcdctl | awk '{print $1}') etcdctl endpoint status -w table
Example output
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.89.133:2379 | 682e4a83a0cec6c0 |   3.5.0 |   67 MB |      true |      false |         2 |        218 |                218 |        |
|  https://10.0.92.74:2379 | 450bcf6999538512 |   3.5.0 |   67 MB |     false |      false |         2 |        218 |                218 |        |
| https://10.0.93.129:2379 | 358efa9c1d91c3d6 |   3.5.0 |   67 MB |     false |      false |         2 |        218 |                218 |        |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
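The status table can also be checked mechanically. The sketch below reads the table on standard input and succeeds only when at least one member is listed, exactly one member reports IS LEADER as true, and the ERRORS column is empty for every member. The function name check_etcd_status_table is hypothetical, and the column positions are an assumption based on the example output above.

```shell
# Hypothetical helper: validate `etcdctl endpoint status -w table` output.
check_etcd_status_table() {
  awk -F'|' '
    # Data rows start with "|" and carry an endpoint URL in the first column.
    /^\|/ && $2 ~ /:\/\// {
      members++
      if ($6 ~ /true/) leaders++     # column 5 of the table: IS LEADER
      gsub(/ /, "", $11)             # column 10 of the table: ERRORS
      if ($11 != "") errors++
    }
    END {
      printf "members=%d leaders=%d errors=%d\n", members, leaders, errors
      exit !(members > 0 && leaders == 1 && errors == 0)
    }
  '
}
```

For example, pipe the command from the step above into the function, omitting -it because the output is no longer going to a terminal: crictl exec $(crictl ps | grep etcdctl | awk '{print $1}') etcdctl endpoint status -w table | check_etcd_status_table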
Restart the other static pods.
The following steps must be executed on all control plane hosts, but one host at a time.
Move the Kubernetes API Server static pod manifest back to the kubelet manifest directory to make kubelet start the related containers with the command:
$ mv /root/manifests-backup/kube-apiserver-pod.yaml /etc/kubernetes/manifests
Verify that all the Kubernetes API Server containers have started:
$ crictl ps | grep kube-apiserver | grep -v operator
Note
If the output of this command is empty, wait a few minutes and check again.
Repeat the same steps for the kube-controller-manager-pod.yaml and kube-scheduler-pod.yaml files.
Restart the kubelets on all nodes using the following command:
$ systemctl restart kubelet
Start the remaining control plane pods using the following command:
$ mv /root/manifests-backup/kube-* /etc/kubernetes/manifests/
Check that the kube-apiserver, kube-scheduler and kube-controller-manager pods start correctly:
$ crictl ps | grep -E 'kube-(apiserver|scheduler|controller-manager)' | grep -v -E 'operator|guard'
Wipe the OVN databases using the following commands:
for NODE in $(oc get node -o name | sed 's:node/::g')
do
  oc debug node/${NODE} -- chroot /host /bin/bash -c 'rm -f /var/lib/ovn-ic/etc/ovn*.db && systemctl restart ovs-vswitchd ovsdb-server'
  oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName=${NODE} --wait
  oc -n openshift-ovn-kubernetes wait pod -l app=ovnkube-node --field-selector=spec.nodeName=${NODE} --for condition=ContainersReady --timeout=600s
done
5.3.2.5. Issues and workarounds for restoring a persistent storage state
If your OpenShift Container Platform cluster uses persistent storage of any form, some of the cluster's state is typically stored outside etcd. It might be an Elasticsearch cluster running in a pod or a database running in a StatefulSet object.
The contents of persistent volumes (PVs) are never part of the etcd snapshot. When you restore an OpenShift Container Platform cluster from an etcd snapshot, non-critical workloads might gain access to critical data, or vice-versa.
The following are some example scenarios that produce an out-of-date status:
- A MySQL database is running in a pod backed by a PV object. Restoring OpenShift Container Platform from an etcd snapshot does not bring back the volume on the storage provider, and does not produce a running MySQL pod, despite the pod repeatedly attempting to start. You must manually restore this pod by restoring the volume on the storage provider, and then editing the PV to point to the new volume.
- Pod P1 is using volume A, which is attached to node X. If the etcd snapshot is taken while another pod uses the same volume on node Y, then when the etcd restore is performed, pod P1 might not be able to start correctly due to the volume still being attached to node Y. OpenShift Container Platform is not aware of the attachment, and does not automatically detach it. When this occurs, the volume must be manually detached from node Y so that the volume can attach on node X, and then pod P1 can start.
- Cloud provider or storage provider credentials were updated after the etcd snapshot was taken. This causes any CSI drivers or Operators that depend on those credentials to stop working. You might have to manually update the credentials required by those drivers or Operators.
- A device is removed or renamed from OpenShift Container Platform nodes after the etcd snapshot is taken. The Local Storage Operator creates symlinks for each PV that it manages from /dev/disk/by-id or /dev directories. This situation might cause the local PVs to refer to devices that no longer exist.
To fix this problem, an administrator must:
- Manually remove the PVs with invalid devices.
- Remove symlinks from respective nodes.
- Delete LocalVolume or LocalVolumeSet objects (see Storage → Configuring persistent storage → Persistent storage using local volumes → Deleting the Local Storage Operator Resources).
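To find symlinks that point at devices that no longer exist, a check along the following lines can be run on each affected node. This is a sketch, not part of the documented procedure: the function name find_dangling_symlinks is hypothetical, and the directory to scan depends on where the Local Storage Operator places its symlinks in your cluster.

```shell
# Hypothetical helper: print every symbolic link under the given directory
# whose target no longer exists.
find_dangling_symlinks() {
  local dir="$1"
  # `test -e` follows the link, so it fails when the target device is gone.
  find "$dir" -type l ! -exec test -e {} \; -print
}
```

For example, find_dangling_symlinks /mnt/local-storage lists broken links under that path. The path /mnt/local-storage is an assumption here; substitute the symlink directory used on your nodes.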
5.3.3. Recovering from expired control plane certificates
5.3.3.1. Recovering from expired control plane certificates
The cluster can automatically recover from expired control plane certificates.
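However, you must manually approve the pending node-bootstrapper certificate signing requests (CSRs) to recover kubelet certificates. For user-provisioned installations, you might also have to approve pending kubelet serving CSRs.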
Use the following steps to approve the pending CSRs:
Procedure
Get the list of current CSRs:
$ oc get csr
Example output
NAME        AGE    SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2s94x   8m3s   kubernetes.io/kubelet-serving                 system:node:<node_name>                                                     Pending
csr-4bd6t   8m3s   kubernetes.io/kubelet-serving                 system:node:<node_name>                                                     Pending
csr-4hl85   13m    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-zhhhp   3m8s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...
Review the details of a CSR to verify that it is valid:
$ oc describe csr <csr_name>
where <csr_name> is the name of a CSR from the list of current CSRs.
Approve each valid node-bootstrapper CSR:
$ oc adm certificate approve <csr_name>
For user-provisioned installations, approve each valid kubelet serving CSR:
$ oc adm certificate approve <csr_name>
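When many CSRs are pending, it can help to list only the node-bootstrapper requests before reviewing them. The following sketch filters the text output of oc get csr; the function name pending_bootstrapper_csrs is hypothetical, and the filter assumes the requestor string and the trailing CONDITION column shown in the example output above.

```shell
# Hypothetical helper: read `oc get csr` output on stdin and print the names
# of Pending CSRs requested by the node-bootstrapper service account.
pending_bootstrapper_csrs() {
  awk '
    # Skip the header row; CONDITION is the last column.
    NR > 1 && $NF == "Pending" &&
    /serviceaccount:openshift-machine-config-operator:node-bootstrapper/ {
      print $1
    }
  '
}
```

For example, oc get csr | pending_bootstrapper_csrs prints one CSR name per line. Review each name with oc describe csr before approving it; the filter only narrows the list and does not verify that a request is valid.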
Legal Notice
Copyright © Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of the OpenJS Foundation.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.