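8.7. Disaster recovery for a hosted cluster by using OADP

You can use the OpenShift API for Data Protection (OADP) Operator to perform disaster recovery on Amazon Web Services (AWS) and bare metal.

The disaster recovery process with OpenShift API for Data Protection (OADP) involves the following steps:

Preparing your platform, such as Amazon Web Services or bare metal, to use OADP
Backing up the data plane workload
Backing up the control plane workload
Restoring a hosted cluster by using OADP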

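8.7.1. Prerequisites

You must meet the following prerequisites on the management cluster:

You installed the OADP Operator.
You created a storage class.
You have access to the cluster with cluster-admin privileges.
You have access to the OADP subscription through a catalog source.
You have access to a cloud storage provider that is compatible with OADP, such as S3, Microsoft Azure, Google Cloud Platform, or MinIO.
In a disconnected environment, you have access to a self-hosted storage provider that is compatible with OADP, such as Red Hat OpenShift Data Foundation or MinIO.
Your hosted control plane pods are up and running.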
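
Some of these prerequisites can be checked from the command line. The following is a minimal sketch, not an official procedure. The function name check_oadp_prereqs is hypothetical, and the sketch assumes that the OADP Operator is installed in the openshift-adp namespace and that its ClusterServiceVersion name contains oadp-operator.

```shell
# Sketch: report on a few prerequisites from the management cluster.
# Assumptions: OADP lives in openshift-adp and its CSV name contains "oadp-operator".
check_oadp_prereqs() {
  local ns="${1:-openshift-adp}" failed=0

  # The OADP Operator is installed (a matching ClusterServiceVersion exists).
  oc get csv -n "$ns" -o name 2>/dev/null | grep -q 'oadp-operator' \
    || { echo "OADP Operator CSV not found in $ns"; failed=1; }

  # At least one storage class exists.
  [ -n "$(oc get storageclass -o name 2>/dev/null)" ] \
    || { echo "no storage class found"; failed=1; }

  # The current user can perform any action on any resource cluster-wide.
  [ "$(oc auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ] \
    || { echo "current user lacks cluster-admin privileges"; failed=1; }

  return "$failed"
}
```

The function prints one line per failed check and returns a nonzero status if any check fails. It does not cover the storage provider or catalog source prerequisites.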

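8.7.2. Preparing AWS to use OADP

To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on Amazon Web Services (AWS) S3 compatible storage. After you create the DataProtectionApplication object, a new velero deployment and new node-agent pods are created in the openshift-adp namespace.

To prepare AWS to use OADP, see "Configuring the OpenShift API for Data Protection with Multicloud Object Gateway".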
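
After the DataProtectionApplication object is created, you can confirm that the velero deployment and the node-agent pods came up. The following sketch is one way to do that. The function name verify_oadp_workloads is hypothetical, and the sketch assumes that the daemon set behind the node-agent pods is named node-agent.

```shell
# Sketch: confirm that the DataProtectionApplication produced the expected workloads.
# Assumption: the node-agent pods are managed by a daemon set named "node-agent".
verify_oadp_workloads() {
  # Wait until the velero deployment finishes rolling out.
  oc rollout status deployment/velero -n openshift-adp --timeout=120s || return 1
  # Wait until the node-agent daemon set finishes rolling out.
  oc rollout status daemonset/node-agent -n openshift-adp --timeout=120s || return 1
}
```

The function returns a nonzero status if either rollout does not complete within the timeout.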

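Additional resources

Configuring the OpenShift API for Data Protection with Multicloud Object Gateway

Next steps

Backing up the data plane workload
Backing up the control plane workload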

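8.7.3. Preparing bare metal to use OADP

To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on bare metal. After you create the DataProtectionApplication object, a new velero deployment and new node-agent pods are created in the openshift-adp namespace.

To prepare bare metal to use OADP, see "Configuring the OpenShift API for Data Protection with AWS S3 compatible storage".

Additional resources

Configuring the OpenShift API for Data Protection with AWS S3 compatible storage

Next steps

Backing up the data plane workload
Backing up the control plane workload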

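8.7.4. Backing up the data plane workload

If the data plane workload is not important, you can skip this procedure. To back up the data plane workload by using the OADP Operator, see "Backing up applications".

Additional resources

Backing up applications

Next steps

Restoring a hosted cluster by using OADP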

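8.7.5. Backing up the control plane workload

You can back up the control plane workload by creating the Backup custom resource (CR).

To monitor and observe the backup process, see "Observing the backup and restore process".

Procedure

Pause the reconciliation of the HostedCluster resource by running the following command: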

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

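Get the infrastructure ID of your hosted cluster by running the following command:
```
$ oc get hostedcluster -n local-cluster <hosted_cluster_name> -o=jsonpath="{.spec.infraID}"
```
Note the infrastructure ID, because you need it in the next step.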

Pause the reconciliation of the cluster.cluster.x-k8s.io resource by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch cluster.cluster.x-k8s.io \
  -n local-cluster-<hosted_cluster_name> <hosted_cluster_infra_id> \
  --type json -p '[{"op": "add", "path": "/spec/paused", "value": true}]'

Pause the reconciliation of the NodePool resource by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

Pause the reconciliation of the AgentCluster resource by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused=true --all

Pause the reconciliation of the AgentMachine resource by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused=true --all

Annotate the HostedCluster resource to prevent the deletion of the hosted control plane namespace by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace=true

Create a YAML file that defines the Backup CR:
Example 8.1. Example backup-control-plane.yaml file
```
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_resource_name> # 1
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
spec:
  hooks: {}
  includedNamespaces: # 2
  - <hosted_cluster_namespace> # 3
  - <hosted_control_plane_namespace> # 4
  includedResources:
  - sa
  - role
  - rolebinding
  - pod
  - pvc
  - pv
  - bmh
  - configmap
  - infraenv # 5
  - priorityclasses
  - pdb
  - agents
  - hostedcluster
  - nodepool
  - secrets
  - hostedcontrolplane
  - cluster
  - agentcluster
  - agentmachinetemplate
  - agentmachine
  - machinedeployment
  - machineset
  - machine
  excludedResources: []
  storageLocation: default
  ttl: 2h0m0s
  snapshotMoveData: true # 6
  datamover: "velero" # 7
  defaultVolumesToFsBackup: true # 8
```
1: Replace <backup_resource_name> with a name for your Backup resource.
2: Selects the namespaces to back up objects from. You must include both the hosted cluster namespace and the hosted control plane namespace.
3: Replace <hosted_cluster_namespace> with the name of the hosted cluster namespace, for example, clusters.
4: Replace <hosted_control_plane_namespace> with the name of the hosted control plane namespace, for example, cluster-hosted.
5: You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the backup process.
6, 7: Enable CSI volume snapshots and automatically upload the control plane workload to the cloud storage.
8: Sets the fs-backup method as the default backup method for persistent volumes (PVs). This setting is useful when you use a combination of Container Storage Interface (CSI) volume snapshots and the fs-backup method.
Note
If you want to use CSI volume snapshots, you must add the backup.velero.io/backup-volumes-excludes=<pv_name> annotation to your PVs.
Apply the Backup CR by running the following command:
```
$ oc apply -f backup-control-plane.yaml
```
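
The HostedCluster and NodePool pause commands from the start of this procedure can be collected into one function, which is convenient when you repeat the backup. The following is a sketch under stated assumptions: the function name pause_hosted_cluster and the variables KUBECONFIG_MGMT, HC_NAMESPACE, HC_NAME, and NODEPOOL_NAME are hypothetical placeholders that you set yourself. It does not cover the cluster.cluster.x-k8s.io, AgentCluster, or AgentMachine steps.

```shell
# Sketch: pause HostedCluster and NodePool reconciliation, then read the field back.
# KUBECONFIG_MGMT, HC_NAMESPACE, HC_NAME, and NODEPOOL_NAME are placeholder variables.
pause_hosted_cluster() {
  local patch='[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

  oc --kubeconfig "$KUBECONFIG_MGMT" patch hostedcluster \
    -n "$HC_NAMESPACE" "$HC_NAME" --type json -p "$patch" || return 1
  oc --kubeconfig "$KUBECONFIG_MGMT" patch nodepool \
    -n "$HC_NAMESPACE" "$NODEPOOL_NAME" --type json -p "$patch" || return 1

  # Print the stored value so that a patch that did not take effect is visible.
  oc --kubeconfig "$KUBECONFIG_MGMT" get hostedcluster \
    -n "$HC_NAMESPACE" "$HC_NAME" -o jsonpath='{.spec.pausedUntil}'
}
```

When both patches succeed, the last command prints the value of spec.pausedUntil as stored on the HostedCluster resource.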

Verification

Verify that the value of status.phase is Completed by running the following command:

$ oc get backups.velero.io <backup_resource_name> -n openshift-adp \
  -o jsonpath='{.status.phase}'
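
Because a backup can take a while, you might prefer to poll the phase instead of rerunning the command by hand. The following sketch loops until Velero reports a terminal phase. The function name wait_for_backup and the 10-second default interval are illustrative choices, not part of OADP.

```shell
# Sketch: poll the Backup CR until Velero reports a terminal phase.
# Usage: wait_for_backup <backup_resource_name> [interval_seconds]
wait_for_backup() {
  local name="$1" interval="${2:-10}" phase
  while true; do
    phase=$(oc get backups.velero.io "$name" -n openshift-adp \
      -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed)
        echo "backup $name: $phase"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name: $phase" >&2; return 1 ;;
      *)
        # New, InProgress, and other non-terminal phases: wait and retry.
        sleep "$interval" ;;
    esac
  done
}
```

The loop has no upper bound on the number of attempts, so interrupt it if the Backup CR never leaves a non-terminal phase.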

Next steps

Restoring a hosted cluster by using OADP

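8.7.6. Restoring a hosted cluster by using OADP

You can restore the hosted cluster by creating the Restore custom resource (CR).

If you are using an in-place update, InfraEnv does not need spare nodes. You need to re-provision the worker nodes from the new management cluster.
If you are using a replace update, you need some spare nodes for InfraEnv to deploy the worker nodes.

Important

After you back up your hosted cluster, you must destroy it to initiate the restore process. To initiate node provisioning, you must back up the workloads in the data plane before you delete the hosted cluster.

Prerequisites

You completed the steps in "Removing a cluster by using the console" to delete your hosted cluster.
You completed the steps in "Removing remaining resources after removing a cluster".

To monitor and observe the restore process, see "Observing the backup and restore process".

Procedure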

Verify that no pods and persistent volume claims (PVCs) are present in the hosted control plane namespace by running the following command:
```
$ oc get pod,pvc -n <hosted_control_plane_namespace>
```
Expected output
```
No resources found
```
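Create a YAML file that defines the Restore CR:
Example restore-hosted-cluster.yaml file
```
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore_resource_name> # 1
  namespace: openshift-adp
spec:
  backupName: <backup_resource_name> # 2
  restorePVs: true # 3
  existingResourcePolicy: update # 4
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
```
1: Replace <restore_resource_name> with a name for your Restore resource.
2: Replace <backup_resource_name> with the name of your Backup resource.
3: Initiates the recovery of persistent volumes (PVs) and their pods.
4: Ensures that existing objects are overwritten with the backed up content.
Important
You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the restore process. The infraenv resource is required to re-provision the new nodes.

Apply the Restore CR by running the following command: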

$ oc apply -f restore-hosted-cluster.yaml

Verify that the value of status.phase is Completed by running the following command:

$ oc get hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> \
  -o jsonpath='{.status.phase}'

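After the restore process is complete, start the reconciliation of the HostedCluster and NodePool resources that you paused while backing up the control plane workload:

Start the reconciliation of the HostedCluster resource by running the following command: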

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json \
  -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'

Start the reconciliation of the NodePool resource by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json \
  -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'

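Start the reconciliation of the Agent provider resources that you paused while backing up the control plane workload:

Start the reconciliation of the AgentCluster resource by running the following command: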

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace>  \
  cluster.x-k8s.io/paused- --overwrite=true --all

Start the reconciliation of the AgentMachine resource by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace>  \
  cluster.x-k8s.io/paused- --overwrite=true --all

Remove the hypershift.openshift.io/skip-delete-hosted-controlplane-namespace annotation from the HostedCluster resource by running the following command, so that you do not have to delete the hosted control plane namespace manually. The trailing hyphen after the annotation key tells oc to remove the annotation:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace- \
  --overwrite=true

Scale the NodePool resource to the desired number of replicas by running the following command:

$ oc --kubeconfig <management_cluster_kubeconfig_file> \
  scale nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --replicas <replica_count> # 1

1: Replace <replica_count> with an integer value, for example, 3.
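
In addition to the checks in this procedure, it can help to inspect the Restore CR itself, because Velero records the restore phase together with warning and error counts in its status. The following sketch prints a one-line summary. The function name restore_summary is hypothetical, and the sketch assumes that the Restore CR is in the openshift-adp namespace.

```shell
# Sketch: summarize the outcome of a Restore CR in one line.
# Usage: restore_summary <restore_resource_name>
restore_summary() {
  local name="$1" status phase warnings errors
  # Separate the fields with "|" so that an unset counter stays an empty field.
  status=$(oc get restores.velero.io "$name" -n openshift-adp \
    -o jsonpath='{.status.phase}{"|"}{.status.warnings}{"|"}{.status.errors}')
  IFS='|' read -r phase warnings errors <<EOF
$status
EOF
  echo "restore $name: phase=${phase:-unknown} warnings=${warnings:-0} errors=${errors:-0}"
  # Succeed only when Velero reports the restore as Completed.
  [ "$phase" = "Completed" ]
}
```

The function returns a nonzero status for any phase other than Completed, so you can use it as a gate before you resume reconciliation in a script.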

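8.7.7. Observing the backup and restore process

When you use OpenShift API for Data Protection (OADP) to back up and restore a hosted cluster, you can monitor and observe the process.

Procedure

Observe the backup process by running the following command: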

$ watch "oc get backups.velero.io -n openshift-adp <backup_resource_name> -o jsonpath='{.status}'"

Observe the restore process by running the following command:

$ watch "oc get restores.velero.io -n openshift-adp <restore_resource_name> -o jsonpath='{.status}'"

Observe the Velero logs by running the following command:

$ oc logs -n openshift-adp -ldeploy=velero -f

Observe the progress of all of the OADP objects by running the following command:

$ watch "echo BackupRepositories:;echo;oc get backuprepositories.velero.io -A;echo; echo BackupStorageLocations: ;echo; oc get backupstoragelocations.velero.io -A;echo;echo DataUploads: ;echo;oc get datauploads.velero.io -A;echo;echo DataDownloads: ;echo;oc get datadownloads.velero.io -n openshift-adp; echo;echo VolumeSnapshotLocations: ;echo;oc get volumesnapshotlocations.velero.io -A;echo;echo Backups:;echo;oc get backup -A; echo;echo Restores:;echo;oc get restore -A"
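
The one-line watch command is hard to read and to modify. The following sketch expresses the same idea as a loop over resource kinds. The function name oadp_status is hypothetical. Unlike the command above, the sketch queries every kind across all namespaces and by its fully qualified velero.io name.

```shell
# Sketch: a loop-based variant of the one-line watch command.
oadp_status() {
  for kind in backuprepositories backupstoragelocations datauploads \
              datadownloads volumesnapshotlocations backups restores; do
    echo "${kind}:"
    oc get "${kind}.velero.io" -A
    echo
  done
}
# Refresh every 2 seconds until interrupted:
#   while true; do clear; oadp_status; sleep 2; done
```

To track another Velero resource kind, add its plural name to the list.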

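8.7.8. Using the velero CLI to describe the Backup and Restore resources

When you use OpenShift API for Data Protection, you can get more details about the Backup and Restore resources by using the velero command-line interface (CLI).

Procedure

Create an alias to use the velero CLI from a container by running the following command: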

$ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'

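Get details about your Restore custom resource (CR) by running the following command:
```
$ velero restore describe <restore_resource_name> --details # 1
```
1: Replace <restore_resource_name> with the name of your Restore resource.
Get details about your Backup CR by running the following command:
```
$ velero backup describe <backup_resource_name> --details # 1
```
1: Replace <backup_resource_name> with the name of your Backup resource.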
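
Shell aliases are usually not expanded in non-interactive scripts, so a function can be a more portable way to reach the velero CLI in the container. The following sketch mirrors the alias from this procedure. The function name velero_exec is hypothetical, and the sketch assumes that the binary is at ./velero in the container, as in the alias.

```shell
# Sketch: a shell function with the same effect as the alias, usable in scripts.
# Assumption: the velero binary is at ./velero in the velero container.
velero_exec() {
  oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero "$@"
}
# Example: velero_exec backup describe <backup_resource_name> --details
```

All arguments are passed through unchanged, so any velero subcommand that works with the alias should work here as well.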
