9.7. Disaster recovery for a hosted cluster by using OADP

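You can use the OpenShift API for Data Protection (OADP) Operator to perform disaster recovery on Amazon Web Services (AWS) and bare metal.

The disaster recovery process with OADP involves the following steps:

- Prepare your platform, such as Amazon Web Services or bare metal, to use OADP.
- Back up the data plane workload.
- Back up the control plane workload.
- Restore the hosted cluster by using OADP.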

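9.7.1. Prerequisites

You must meet the following prerequisites on the management cluster:

- You installed the OADP Operator.
- You created a storage class.
- You have access to the cluster with cluster-admin privileges.
- You have access to the OADP subscription through a catalog source.
- You have access to a cloud storage provider that is compatible with OADP, such as S3, Microsoft Azure, Google Cloud, or MinIO.
- In a disconnected environment, you have access to a self-hosted storage provider that is compatible with OADP, such as Red Hat OpenShift Data Foundation or MinIO.
- Your hosted control plane pods are up and running.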
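
Some of these prerequisites can be checked from the command line before you start. The following is a minimal sketch, not an official procedure: the function name check_prereqs is hypothetical, and the sketch covers only the cluster-admin, storage class, and pod checks. It assumes that oc is already logged in to the management cluster.

```shell
# Sketch only: check_prereqs is a hypothetical helper name.
# Usage: check_prereqs <hosted_control_plane_namespace>
check_prereqs() (
  hcp_ns="$1"

  # cluster-admin: the current user must be allowed every verb on every resource.
  [ "$(oc auth can-i '*' '*' --all-namespaces)" = "yes" ] \
    || { echo "current user lacks cluster-admin privileges" >&2; exit 1; }

  # At least one storage class must exist.
  [ -n "$(oc get storageclass -o name)" ] \
    || { echo "no storage class found" >&2; exit 1; }

  # List hosted control plane pods that are neither Running nor Succeeded.
  not_ready="$(oc get pods -n "$hcp_ns" \
    --field-selector=status.phase!=Running,status.phase!=Succeeded -o name)"
  [ -z "$not_ready" ] \
    || { echo "pods not running in $hcp_ns: $not_ready" >&2; exit 1; }

  echo "prerequisites look satisfied"
)
```

The function returns a non-zero status on the first failed check, so it can gate a backup script. The OADP subscription and object storage prerequisites still need to be confirmed separately.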

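9.7.2. Preparing AWS to use OADP

To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on Amazon Web Services (AWS) S3 compatible storage. After you create the DataProtectionApplication object, a new velero deployment and new node-agent pods are created in the openshift-adp namespace.

To prepare AWS to use OADP, see "Configuring the OpenShift API for Data Protection with Multicloud Object Gateway".

Next steps

- Backing up the data plane workload
- Backing up the control plane workload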

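9.7.3. Preparing bare metal to use OADP

To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on bare metal. After you create the DataProtectionApplication object, a new velero deployment and new node-agent pods are created in the openshift-adp namespace.

To prepare bare metal to use OADP, see "Configuring the OpenShift API for Data Protection with AWS S3 compatible storage".

Next steps

- Backing up the data plane workload
- Backing up the control plane workload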
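
On either platform, you might want to confirm that the velero deployment and the node-agent pods exist after you create the DataProtectionApplication object. The following sketch assumes that the node-agent pod names contain the string node-agent. The function name verify_oadp_workloads and the 120-second timeout are illustrative choices.

```shell
# Sketch only: verify_oadp_workloads is a hypothetical helper name.
verify_oadp_workloads() (
  # Wait until the velero deployment has finished rolling out.
  oc -n openshift-adp rollout status deployment/velero --timeout=120s || exit 1

  # Print the node-agent pods; grep exits non-zero if none are found.
  oc -n openshift-adp get pods -o name | grep node-agent
)
```

A non-zero exit status means that either the rollout did not finish in time or no node-agent pod was listed.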

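9.7.4. Backing up the data plane workload

If the data plane workload is not important, you can skip this procedure. To back up the data plane workload by using the OADP Operator, see "Backing up applications".

Next steps

- Restoring a hosted cluster by using OADP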

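9.7.5. Backing up the control plane workload

You can back up the control plane workload by creating the Backup custom resource (CR). The steps vary depending on whether your platform is AWS or bare metal.

9.7.5.1. Backing up the control plane workload on AWS

You can back up the control plane workload by creating the Backup custom resource (CR).

To monitor and observe the backup process, see "Observing the backup and restore process".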

Procedure

Pause the reconciliation of the HostedCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

Get the infrastructure ID of your hosted cluster by running the following command:

oc get hostedcluster -n local-cluster <hosted_cluster_name> -o=jsonpath="{.spec.infraID}"

Note the infrastructure ID to use in the next step.

Pause the reconciliation of the cluster.cluster.x-k8s.io resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch cluster.cluster.x-k8s.io \
  -n local-cluster-<hosted_cluster_name> <hosted_cluster_infra_id> \
  --type json -p '[{"op": "add", "path": "/spec/paused", "value": true}]'

Pause the reconciliation of the NodePool resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

Pause the reconciliation of the AgentCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused=true --all

Pause the reconciliation of the AgentMachine resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused=true --all

Annotate the HostedCluster resource to prevent the deletion of the hosted control plane namespace by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace=true

Create a YAML file that defines the Backup CR:

Example 9.1. Example backup-control-plane.yaml file

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_resource_name> # 1
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
spec:
  hooks: {}
  includedNamespaces: # 2
  - <hosted_cluster_namespace> # 3
  - <hosted_control_plane_namespace> # 4
  includedResources:
  - sa
  - role
  - rolebinding
  - pod
  - pvc
  - pv
  - bmh
  - configmap
  - infraenv # 5
  - priorityclasses
  - pdb
  - agents
  - hostedcluster
  - nodepool
  - secrets
  - hostedcontrolplane
  - cluster
  - agentcluster
  - agentmachinetemplate
  - agentmachine
  - machinedeployment
  - machineset
  - machine
  excludedResources: []
  storageLocation: default
  ttl: 2h0m0s
  snapshotMoveData: true # 6
  datamover: "velero" # 7
  defaultVolumesToFsBackup: true # 8

1: Replace <backup_resource_name> with a name for your Backup resource.
2: Selects the specific namespaces to back up objects from. You must include the hosted cluster namespace and the hosted control plane namespace.
3: Replace <hosted_cluster_namespace> with the name of the hosted cluster namespace, for example, clusters.
4: Replace <hosted_control_plane_namespace> with the name of the hosted control plane namespace, for example, cluster-hosted.
5: You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the backup process.
6, 7: Enable CSI volume snapshots and automatically upload the control plane workload to the cloud storage.
8: Sets the fs-backup method as the default backup method for persistent volumes (PVs). This setting is useful when you use a combination of Container Storage Interface (CSI) volume snapshots and the fs-backup method.

Note

If you want to use CSI volume snapshots, you must add the backup.velero.io/backup-volumes-excludes=<pv_name> annotation to your PVs.

Apply the Backup CR by running the following command:
```
oc apply -f backup-control-plane.yaml
```

Verification

Verify that the value of status.phase is Completed by running the following command:

oc get backups.velero.io <backup_resource_name> -n openshift-adp \
  -o jsonpath='{.status.phase}'

Next steps

- Restoring a hosted cluster by using OADP
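
Because a backup can take a while to reach the Completed phase, the verification command can be wrapped in a polling loop. The following is a rough sketch under stated assumptions: the function name wait_for_backup, the default of 60 attempts, and the 10-second delay are illustrative. The phase names Failed, PartiallyFailed, and FailedValidation are the Velero terminal failure phases as I understand them.

```shell
# Sketch only: wait_for_backup is a hypothetical helper name.
# Usage: wait_for_backup <backup_resource_name> [attempts] [delay_seconds]
wait_for_backup() (
  name="$1"; tries="${2:-60}"; delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Same query as the verification step above.
    phase="$(oc get backups.velero.io "$name" -n openshift-adp \
      -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed)
        echo "backup $name completed"; exit 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2; exit 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for backup $name" >&2
  exit 1
)
```

For example, `wait_for_backup <backup_resource_name> 120 15` polls for up to 30 minutes.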

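9.7.5.2. Backing up the control plane workload on a bare-metal platform

You can back up the control plane workload by creating the Backup custom resource (CR).

To monitor and observe the backup process, see "Observing the backup and restore process".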

Procedure

Pause the reconciliation of the HostedCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

Get the infrastructure ID of your hosted cluster by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  get hostedcluster -n <hosted_cluster_namespace> \
  <hosted_cluster_name> -o=jsonpath="{.spec.infraID}"

Note the infrastructure ID to use in the next step.

Pause the reconciliation of the cluster.cluster.x-k8s.io resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate cluster -n <hosted_control_plane_namespace> \
  <hosted_cluster_infra_id> cluster.x-k8s.io/paused=true

Pause the reconciliation of the NodePool resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'

Pause the reconciliation of the AgentCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused=true --all

Pause the reconciliation of the AgentMachine resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused=true --all

If you are backing up and restoring to the same management cluster, annotate the HostedCluster resource to prevent the deletion of the hosted control plane namespace by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace=true

Create a YAML file that defines the Backup CR:

Example 9.2. Example backup-control-plane.yaml file

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_resource_name> # 1
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
spec:
  hooks: {}
  includedNamespaces: # 2
  - <hosted_cluster_namespace> # 3
  - <hosted_control_plane_namespace> # 4
  - <agent_namespace> # 5
  includedResources:
  - sa
  - role
  - rolebinding
  - pod
  - pvc
  - pv
  - bmh
  - configmap
  - infraenv
  - priorityclasses
  - pdb
  - agents
  - hostedcluster
  - nodepool
  - secrets
  - services
  - deployments
  - hostedcontrolplane
  - cluster
  - agentcluster
  - agentmachinetemplate
  - agentmachine
  - machinedeployment
  - machineset
  - machine
  excludedResources: []
  storageLocation: default
  ttl: 2h0m0s
  snapshotMoveData: true # 6
  datamover: "velero" # 7
  defaultVolumesToFsBackup: true # 8

1: Replace <backup_resource_name> with a name for your Backup resource.
2: Selects the specific namespaces to back up objects from. You must include the hosted cluster namespace and the hosted control plane namespace.
3: Replace <hosted_cluster_namespace> with the name of the hosted cluster namespace, for example, clusters.
4: Replace <hosted_control_plane_namespace> with the name of the hosted control plane namespace, for example, cluster-hosted.
5: Replace <agent_namespace> with the namespace where the Agent, BMH, and InfraEnv CRs are located, for example, agents.
6, 7: Enable CSI volume snapshots and automatically upload the control plane workload to the cloud storage.
8: Sets the fs-backup method as the default backup method for persistent volumes (PVs). This setting is useful when you use a combination of Container Storage Interface (CSI) volume snapshots and the fs-backup method.

Note

If you want to use CSI volume snapshots, you must add the backup.velero.io/backup-volumes-excludes=<pv_name> annotation to your PVs.

Apply the Backup CR by running the following command:
```
oc apply -f backup-control-plane.yaml
```

Verification

Verify that the value of status.phase is Completed by running the following command:

oc get backups.velero.io <backup_resource_name> -n openshift-adp \
  -o jsonpath='{.status.phase}'

Next steps

- Restoring a hosted cluster by using OADP
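
The pause steps of the bare-metal procedure can be collected into one function so that they always run in the same order. The following is a sketch, not a supported script: the function names paused_patch and pause_hosted_cluster and the positional argument order are assumptions, while the oc invocations mirror the commands in the procedure. The optional skip-delete annotation step is left out.

```shell
# Sketch only: helper names and argument order are hypothetical.
paused_patch() {
  # Build the JSON patch that sets spec.pausedUntil to the given string value.
  printf '[{"op": "add", "path": "/spec/pausedUntil", "value": "%s"}]' "$1"
}

# Usage: pause_hosted_cluster <kubeconfig> <hosted_cluster_namespace> \
#          <hosted_cluster_name> <node_pool_name> <hosted_control_plane_namespace>
pause_hosted_cluster() (
  kubeconfig="$1"; hc_ns="$2"; hc_name="$3"; np_name="$4"; hcp_ns="$5"
  patch="$(paused_patch true)"

  # Pause the HostedCluster resource.
  oc --kubeconfig "$kubeconfig" patch hostedcluster -n "$hc_ns" "$hc_name" \
    --type json -p "$patch" || exit 1

  # Look up the infrastructure ID, then pause the Cluster API cluster resource.
  infra_id="$(oc --kubeconfig "$kubeconfig" get hostedcluster -n "$hc_ns" \
    "$hc_name" -o=jsonpath='{.spec.infraID}')" || exit 1
  oc --kubeconfig "$kubeconfig" annotate cluster -n "$hcp_ns" \
    "$infra_id" cluster.x-k8s.io/paused=true || exit 1

  # Pause the NodePool resource.
  oc --kubeconfig "$kubeconfig" patch nodepool -n "$hc_ns" "$np_name" \
    --type json -p "$patch" || exit 1

  # Pause the Agent provider resources.
  for kind in agentcluster agentmachine; do
    oc --kubeconfig "$kubeconfig" annotate "$kind" -n "$hcp_ns" \
      cluster.x-k8s.io/paused=true --all || exit 1
  done
)
```

The function stops at the first failing command, which avoids taking a backup of a cluster that is only partly paused.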

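9.7.6. Restoring a hosted cluster by using OADP

You can restore a hosted cluster into the same management cluster or into a new management cluster.

9.7.6.1. Restoring a hosted cluster into the same management cluster by using OADP

You can restore the hosted cluster by creating the Restore custom resource (CR).

- If you use an in-place update, the InfraEnv resource does not need spare nodes. You need to re-provision the worker nodes from the new management cluster.
- If you use a replace update, you need some spare nodes for the InfraEnv resource to deploy the worker nodes.

Important

After you back up your hosted cluster, you must destroy it to initiate the restore process. To initiate node provisioning, you must back up the workloads in the data plane before you delete the hosted cluster.

Prerequisites

- You completed the steps in "Removing a cluster by using the console" to delete your hosted cluster.
- You completed the steps in "Removing remaining resources after removing a cluster".

To monitor and observe the restore process, see "Observing the backup and restore process".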

Procedure

Verify that no pods and persistent volume claims (PVCs) are present in the hosted control plane namespace by running the following command:
```
oc get pod,pvc -n <hosted_control_plane_namespace>
```
Expected output
```
No resources found
```

Create a YAML file that defines the Restore CR:

Example restore-hosted-cluster.yaml file

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore_resource_name> # 1
  namespace: openshift-adp
spec:
  backupName: <backup_resource_name> # 2
  restorePVs: true # 3
  existingResourcePolicy: update # 4
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io

1: Replace <restore_resource_name> with a name for your Restore resource.
2: Replace <backup_resource_name> with the name of your Backup resource.
3: Initiates the recovery of persistent volumes (PVs) and their pods.
4: Ensures that existing objects are overwritten with the backed-up content.

Important

You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the restore process. The infraenv resource is required to re-provision new nodes.

Apply the Restore CR by running the following command:
```
oc apply -f restore-hosted-cluster.yaml
```

Verify that the value of status.phase is Completed by running the following command:

oc get restore.velero.io <restore_resource_name> -n openshift-adp \
  -o jsonpath='{.status.phase}'

After the restore process is complete, start the reconciliation of the HostedCluster and NodePool resources that you paused while backing up the control plane workload:

Start the reconciliation of the HostedCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json \
  -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'

Start the reconciliation of the NodePool resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json \
  -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'

Start the reconciliation of the Agent provider resources that you paused while backing up the control plane workload:

Start the reconciliation of the AgentCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused- --overwrite=true --all

Start the reconciliation of the AgentMachine resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused- --overwrite=true --all

Remove the hypershift.openshift.io/skip-delete-hosted-controlplane-namespace annotation from the HostedCluster resource, so that you do not have to delete the hosted control plane namespace manually, by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace- \
  --overwrite=true
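
The first steps of this procedure, checking that the hosted control plane namespace is empty and then applying the Restore CR, can be combined so that the restore never starts against a namespace that still holds pods or PVCs. The following is a minimal sketch: the function name restore_if_empty is hypothetical, and the default file name is taken from the example above.

```shell
# Sketch only: restore_if_empty is a hypothetical helper name.
# Usage: restore_if_empty <hosted_control_plane_namespace> [restore_yaml_file]
restore_if_empty() (
  hcp_ns="$1"; restore_file="${2:-restore-hosted-cluster.yaml}"

  # List any leftover pods and PVCs in one call; -o name prints nothing if none exist.
  remaining="$(oc get pod,pvc -n "$hcp_ns" -o name)" || exit 1
  if [ -n "$remaining" ]; then
    echo "namespace $hcp_ns still contains:" >&2
    echo "$remaining" >&2
    exit 1
  fi

  # The namespace is empty, so it is safe to create the Restore CR.
  oc apply -f "$restore_file"
)
```

If the function fails, finish the cluster removal prerequisites and run it again.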

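9.7.6.2. Restoring a hosted cluster into a new management cluster by using OADP

You can restore the hosted cluster into a new management cluster by creating the Restore custom resource (CR).

- If you use an in-place update, the InfraEnv resource does not need spare nodes. Instead, you need to re-provision the worker nodes from the new management cluster.
- If you use a replace update, you need some spare nodes for the InfraEnv resource to deploy the worker nodes.

Prerequisites

- You configured the new management cluster to use OpenShift API for Data Protection (OADP). The new management cluster must have the same Data Protection Application (DPA) as the management cluster that you backed up from, so that the Restore CR can access the backup storage.
- You configured the networking settings of the new management cluster to resolve the DNS of the hosted cluster.
  - The DNS of the host must resolve to the IP of both the new management cluster and the hosted cluster.
  - The hosted cluster must resolve to the IP of the new management cluster.

To monitor and observe the restore process, see "Observing the backup and restore process".

Important

Complete the following steps on the new management cluster that you are restoring the hosted cluster to, not on the management cluster that you created the backup from.

Procedure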

Create a YAML file that defines the Restore CR:

Example restore-hosted-cluster.yaml file

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore_resource_name> # 1
  namespace: openshift-adp
spec:
  includedNamespaces: # 2
  - <hosted_cluster_namespace> # 3
  - <hosted_control_plane_namespace> # 4
  - <agent_namespace> # 5
  backupName: <backup_resource_name> # 6
  cleanupBeforeRestore: CleanupRestored
  veleroManagedClustersBackupName: <managed_cluster_name> # 7
  veleroCredentialsBackupName: <credentials_backup_name>
  veleroResourcesBackupName: <resources_backup_name>
  restorePVs: true # 8
  preserveNodePorts: true
  existingResourcePolicy: update # 9
  excludedResources:
  - pod
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - pv
  - pvc

1: Replace <restore_resource_name> with a name for your Restore resource.
2: Selects the specific namespaces to restore objects from. You must include the hosted cluster namespace and the hosted control plane namespace.
3: Replace <hosted_cluster_namespace> with the name of the hosted cluster namespace, for example, clusters.
4: Replace <hosted_control_plane_namespace> with the name of the hosted control plane namespace, for example, cluster-hosted.
5: Replace <agent_namespace> with the namespace where the Agent, BMH, and InfraEnv CRs are located, for example, agents.
6: Replace <backup_resource_name> with the name of your Backup resource.
7: You can omit this field if you are not using Red Hat Advanced Cluster Management.
8: Initiates the recovery of persistent volumes (PVs) and their pods.
9: Ensures that existing objects are overwritten with the backed-up content.

Apply the Restore CR by running the following command:

oc --kubeconfig <restore_management_kubeconfig> apply -f restore-hosted-cluster.yaml

Verify that the value of status.phase is Completed by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  get restore.velero.io <restore_resource_name> \
  -n openshift-adp -o jsonpath='{.status.phase}'

Verify that all CRs are restored by running the following commands:

oc --kubeconfig <restore_management_kubeconfig> get infraenv -n <agent_namespace>

oc --kubeconfig <restore_management_kubeconfig> get agent -n <agent_namespace>

oc --kubeconfig <restore_management_kubeconfig> get bmh -n <agent_namespace>

oc --kubeconfig <restore_management_kubeconfig> get hostedcluster -n <hosted_cluster_namespace>

oc --kubeconfig <restore_management_kubeconfig> get nodepool -n <hosted_cluster_namespace>

oc --kubeconfig <restore_management_kubeconfig> get agentmachine -n <hosted_control_plane_namespace>

oc --kubeconfig <restore_management_kubeconfig> get agentcluster -n <hosted_control_plane_namespace>

If you plan to use the new management cluster as your main management cluster going forward, complete the following steps. Otherwise, if you plan to use the management cluster that you backed up from as your main management cluster, complete steps 5 - 8 in "Restoring a hosted cluster into the same management cluster by using OADP".

Remove the Cluster API deployment from the management cluster that you backed up from by running the following command:
```
oc --kubeconfig <backup_management_kubeconfig> delete deploy cluster-api \
  -n <hosted_control_plane_namespace>
```
Because only one Cluster API can access a cluster at a time, this step ensures that the Cluster API of the new management cluster works correctly.

After the restore process is complete, start the reconciliation of the HostedCluster and NodePool resources that you paused while backing up the control plane workload:

Start the reconciliation of the HostedCluster resource by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json \
  -p '[{"op": "replace", "path": "/spec/pausedUntil", "value": "false"}]'

Start the reconciliation of the NodePool resource by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json \
  -p '[{"op": "replace", "path": "/spec/pausedUntil", "value": "false"}]'

Verify that the hosted cluster reports that the hosted control plane is available by running the following command:
```
oc --kubeconfig <restore_management_kubeconfig> get hostedcluster
```
Verify that the hosted cluster reports that the cluster Operators are available by running the following command:
```
oc get co --kubeconfig <hosted_cluster_kubeconfig>
```

Start the reconciliation of the Agent provider resources that you paused while backing up the control plane workload:

Start the reconciliation of the AgentCluster resource by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  annotate agentcluster -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused- --overwrite=true --all

Start the reconciliation of the AgentMachine resource by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  annotate agentmachine -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused- --overwrite=true --all

Start the reconciliation of the Cluster resource by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  annotate cluster -n <hosted_control_plane_namespace> \
  cluster.x-k8s.io/paused- --overwrite=true --all

Verify that the node pool is working as expected by running the following command:

oc --kubeconfig <restore_management_kubeconfig> \
  get nodepool -n <hosted_cluster_namespace>

Example output

NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
hosted-0   hosted-0   3               3               False         False        4.17.11   False             False

Optional: To ensure that no conflicts exist and that the new management cluster continues to function, remove the HostedCluster resources from the management cluster that you backed up from by completing the following steps:

In the management cluster that you backed up from, set the spec.preserveOnDelete parameter to true in the ClusterDeployment resource by running the following command:

oc --kubeconfig <backup_management_kubeconfig> patch \
  -n <hosted_control_plane_namespace> \
  ClusterDeployment/<hosted_cluster_name> -p \
  '{"spec":{"preserveOnDelete":true}}' \
  --type=merge

This step ensures that the hosts are not deprovisioned.

Delete the machines by running the following commands. The first command removes the finalizers from the machine, and the second command deletes it:

oc --kubeconfig <backup_management_kubeconfig> patch \
  machine <machine_name> -n <hosted_control_plane_namespace> -p \
  '[{"op":"remove","path":"/metadata/finalizers"}]' \
  --type=json

oc --kubeconfig <backup_management_kubeconfig> \
  delete machine <machine_name> \
  -n <hosted_control_plane_namespace>

Delete the AgentCluster and Cluster resources by running the following commands:

oc --kubeconfig <backup_management_kubeconfig> \
  delete agentcluster <hosted_cluster_name> \
  -n <hosted_control_plane_namespace>

oc --kubeconfig <backup_management_kubeconfig> \
  patch cluster <cluster_name> \
  -n <hosted_control_plane_namespace> \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]' \
  --type=json

oc --kubeconfig <backup_management_kubeconfig> \
  delete cluster <cluster_name> \
  -n <hosted_control_plane_namespace>

If you use Red Hat Advanced Cluster Management, delete the managed cluster by running the following commands:

oc --kubeconfig <backup_management_kubeconfig> \
  patch managedcluster <hosted_cluster_name> \
  -n <hosted_cluster_namespace> \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]' \
  --type=json

oc --kubeconfig <backup_management_kubeconfig> \
  delete managedcluster <hosted_cluster_name> \
  -n <hosted_cluster_namespace>

Delete the HostedCluster resource by running the following command:

oc --kubeconfig <backup_management_kubeconfig> \
  delete hostedcluster \
  -n <hosted_cluster_namespace> <hosted_cluster_name>
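
The step in this procedure that verifies the restored CRs runs seven separate oc get commands. As a convenience, they can be looped over in one function that reports which kinds are missing. This is a sketch under stated assumptions: the function name list_restored_crs and the OK/MISSING output format are made up for illustration, and a kind counts as restored if at least one object of that kind exists in the expected namespace.

```shell
# Sketch only: list_restored_crs is a hypothetical helper name.
# Usage: list_restored_crs <restore_management_kubeconfig> <agent_namespace> \
#          <hosted_cluster_namespace> <hosted_control_plane_namespace>
list_restored_crs() (
  kubeconfig="$1"; agent_ns="$2"; hc_ns="$3"; hcp_ns="$4"
  status=0

  # Each entry is <kind>:<namespace>, matching the verification commands above.
  for pair in \
    "infraenv:$agent_ns" "agent:$agent_ns" "bmh:$agent_ns" \
    "hostedcluster:$hc_ns" "nodepool:$hc_ns" \
    "agentmachine:$hcp_ns" "agentcluster:$hcp_ns"; do
    kind="${pair%%:*}"
    ns="${pair#*:}"
    found="$(oc --kubeconfig "$kubeconfig" get "$kind" -n "$ns" -o name)"
    if [ -z "$found" ]; then
      echo "MISSING $kind in $ns"
      status=1
    else
      echo "OK $kind in $ns"
    fi
  done
  exit "$status"
)
```

The function checks every kind before it returns, so a single run shows all missing resource kinds at once.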

9.7.7. Observing the backup and restore process

When you use OpenShift API for Data Protection (OADP) to back up and restore a hosted cluster, you can monitor and observe the process.

Procedure

Observe the backup process by running the following command:

watch "oc get backups.velero.io -n openshift-adp <backup_resource_name> -o jsonpath='{.status}'"

Observe the restore process by running the following command:

watch "oc get restores.velero.io -n openshift-adp <restore_resource_name> -o jsonpath='{.status}'"

Observe the Velero logs by running the following command:
```
oc logs -n openshift-adp -ldeploy=velero -f
```

Observe the progress of all OADP objects by running the following command:

watch "echo BackupRepositories:;echo;oc get backuprepositories.velero.io -A;echo; echo BackupStorageLocations: ;echo; oc get backupstoragelocations.velero.io -A;echo;echo DataUploads: ;echo;oc get datauploads.velero.io -A;echo;echo DataDownloads: ;echo;oc get datadownloads.velero.io -n openshift-adp; echo;echo VolumeSnapshotLocations: ;echo;oc get volumesnapshotlocations.velero.io -A;echo;echo Backups:;echo;oc get backup -A; echo;echo Restores:;echo;oc get restore -A"
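
The one-line watch command above is hard to read and to modify. One possible alternative, shown here as a sketch, is a small function that loops over the same Velero resource kinds. The function name oadp_overview is hypothetical. Unlike the original command, this sketch lists every kind across all namespaces with -A and uses fully qualified resource names throughout.

```shell
# Sketch only: oadp_overview is a hypothetical helper name.
oadp_overview() {
  for kind in backuprepositories backupstoragelocations datauploads \
              datadownloads volumesnapshotlocations backups restores; do
    echo "== ${kind} =="
    # Fully qualified name avoids clashes with other APIs that define "backup" or "restore".
    oc get "${kind}.velero.io" -A
    echo
  done
}

# To refresh every 2 seconds, similar to watch:
#   while true; do clear; oadp_overview; sleep 2; done
```

Adding or removing a resource kind then only requires a change to the list in the for loop.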

9.7.8. Using the velero CLI to describe the Backup and Restore resources

When you use OpenShift API for Data Protection (OADP), you can get more details about the Backup and Restore resources by using the velero command-line interface (CLI).

Procedure

Create an alias to use the velero CLI from a container by running the following command:

alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'

Get details of your Restore custom resource (CR) by running the following command:
```
velero restore describe <restore_resource_name> --details
```
Replace <restore_resource_name> with the name of your Restore resource.

Get details of your Backup CR by running the following command:
```
velero backup describe <backup_resource_name> --details
```
Replace <backup_resource_name> with the name of your Backup resource.
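
Shell aliases are usually not expanded in non-interactive scripts, so a shell function can be a more portable way to reach the velero CLI from automation. The following sketch wraps the same oc exec invocation as the alias, but omits the -it flags on the assumption that a script has no terminal attached.

```shell
# Sketch only: a function with the same name and target as the alias above.
velero() {
  # "$@" forwards every argument unchanged to the velero binary in the container.
  oc -n openshift-adp exec deployment/velero -c velero -- ./velero "$@"
}

# Example calls:
#   velero backup describe <backup_resource_name> --details
#   velero restore describe <restore_resource_name> --details
```

In an interactive shell where the alias is already defined, remove it with unalias velero before you define the function.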
