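15.3. Performing an image-based upgrade for single-node OpenShift clusters with the Lifecycle Agent

You can use the Lifecycle Agent to perform a manual image-based upgrade of a single-node OpenShift cluster.

When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade CR is automatically created. You update this CR to specify the image repository of the seed image and to move through the different stages.

15.3.1. Moving to the Prep stage of the image-based upgrade with Lifecycle Agent

When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade custom resource (CR) is automatically created.

After you have created all the resources that you need during the upgrade, you can move on to the Prep stage. For more information, see the "Creating ConfigMap objects for the image-based upgrade with Lifecycle Agent" section.

Note

In a disconnected environment, if the release image registry of the seed cluster differs from the release image registry of the target cluster, you must create an ImageDigestMirrorSet (IDMS) resource to configure alternative mirrored repository locations. For more information, see "Configuring image registry repository mirroring".

You can retrieve the release registry used in the seed image by running the following command: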

skopeo inspect docker://<imagename> | jq -r '.Labels."com.openshift.lifecycle-agent.seed_cluster_info" | fromjson | .release_registry'

Prerequisites

You have created resources to back up and restore your clusters.

Procedure

Check that you have patched your ImageBasedUpgrade CR:

apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle
  seedImageRef:
    version: 4.15.2 
    image: <seed_container_image> 
    pullSecretRef: <seed_pull_secret> 
  autoRollbackOnFailure: {}
#    initMonitorTimeoutSeconds: 1800 
  extraManifests: 
  - name: example-extra-manifests-cm
    namespace: openshift-lifecycle-agent
  - name: example-catalogsources-cm
    namespace: openshift-lifecycle-agent
  oadpContent: 
  - name: oadp-cm-example
    namespace: openshift-adp

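seedImageRef.version: Specify the target platform version. The value must match the version of the seed image.
seedImageRef.image: Specify the repository from which the target cluster can pull the seed image.
seedImageRef.pullSecretRef: If the images are in a private registry, specify the reference to a secret that holds the credentials.
autoRollbackOnFailure.initMonitorTimeoutSeconds: (Optional) Specify the time frame in seconds after the first reboot within which the upgrade must complete; otherwise a rollback starts. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
extraManifests: (Optional) Specify the list of ConfigMap resources that contain the custom catalog sources to retain after the upgrade and the extra manifests to apply to the target cluster that are not part of the seed image.
oadpContent: Add the oadpContent section with the OADP ConfigMap information.

To start the Prep stage, change the value of the stage field to Prep in the ImageBasedUpgrade CR by running the following command: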
```
oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Prep"}}' --type=merge -n openshift-lifecycle-agent
```
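Every stage transition in this section uses the same merge patch with a different stage name. As a minimal sketch, the patch can be built by a small helper that rejects misspelled stage names before anything reaches the cluster. The function names `ibu_stage_patch` and `ibu_set_stage` are hypothetical and not part of the Lifecycle Agent or `oc`; the `oc patch` invocation is the one shown above.

```shell
# Print the merge patch that sets spec.stage to the requested stage.
# Only the four stage names used in this section are accepted.
ibu_stage_patch() {
  case "$1" in
    Idle|Prep|Upgrade|Rollback)
      printf '{"spec": {"stage": "%s"}}\n' "$1"
      ;;
    *)
      echo "unknown stage: $1" >&2
      return 1
      ;;
  esac
}

# Apply the patch to the ImageBasedUpgrade CR named "upgrade".
ibu_set_stage() {
  patch=$(ibu_stage_patch "$1") || return 1
  oc patch imagebasedupgrades.lca.openshift.io upgrade \
    -p="$patch" --type=merge -n openshift-lifecycle-agent
}
```

For example, `ibu_set_stage Prep` would issue the command above, while `ibu_set_stage prep` fails locally with an error message.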
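If you provide ConfigMap objects for OADP resources and extra manifests, the Lifecycle Agent validates the specified ConfigMap objects during the Prep stage. You might encounter the following issues:
- Validation warnings or errors if the Lifecycle Agent detects any issues with the extraManifests parameters.
- Validation errors if the Lifecycle Agent detects any issues with the oadpContent parameters.
Validation warnings do not block the Upgrade stage, but you must decide whether it is safe to proceed with the upgrade. These warnings, for example missing CRDs, missing namespaces, or dry-run failures, update the status.conditions for the Prep stage and the annotation fields in the ImageBasedUpgrade CR with details about the warning.
Example validation warning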
```
[...]
metadata:
  annotations:
    extra-manifest.lca.openshift.io/validation-warning: '...'
[...]
```
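However, validation errors, such as adding MachineConfig or Operator manifests to the extra manifests, cause the Prep stage to fail and block the Upgrade stage.
When the validations pass, the cluster creates a new ostree stateroot, which involves pulling and unpacking the seed image and running host-level commands. Finally, all the required images are precached on the target cluster.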
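To read only the validation warning rather than the whole CR, a JSONPath query against the annotation can help. This is a sketch under the assumption that the warning is stored under the `extra-manifest.lca.openshift.io/validation-warning` annotation key shown in the example; the function name `ibu_validation_warning` is hypothetical. Dots inside the annotation key are escaped with a backslash so that JSONPath does not treat them as path separators.

```shell
# Print the extra-manifest validation warning, if any, from the ibu CR.
# Prints nothing when the annotation is absent.
ibu_validation_warning() {
  oc get ibu upgrade \
    -o jsonpath='{.metadata.annotations.extra-manifest\.lca\.openshift\.io/validation-warning}'
}
```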

Verification

Check the status of the ImageBasedUpgrade CR by running the following command:

oc get ibu -o yaml

Example output

  conditions:
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: In progress
    observedGeneration: 13
    reason: InProgress
    status: "False"
    type: Idle
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed
    observedGeneration: 13
    reason: Completed
    status: "False"
    type: PrepInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep stage completed successfully
    observedGeneration: 13
    reason: Completed
    status: "True"
    type: PrepCompleted
  observedGeneration: 13
  validNextStages:
  - Idle
  - Upgrade
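
Instead of reading the full YAML repeatedly, the status of a single condition can be polled. The following is a minimal sketch, assuming the condition types shown in the example output (for example `PrepCompleted`); the function names, the default of 60 attempts, and the 10-second delay are illustrative choices, not values defined by the Lifecycle Agent.

```shell
# Print the status ("True" or "False") of one condition type on the ibu CR.
ibu_condition_status() {
  oc get ibu upgrade \
    -o jsonpath="{.status.conditions[?(@.type==\"$1\")].status}"
}

# Poll until the given condition reports "True" or the attempts run out.
# $1 = condition type, $2 = attempts (default 60), $3 = delay in seconds (default 10)
ibu_wait_for() {
  attempts=${2:-60}
  delay=${3:-10}
  while [ "$attempts" -gt 0 ]; do
    [ "$(ibu_condition_status "$1")" = "True" ] && return 0
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `ibu_wait_for PrepCompleted` returns success once the Prep stage has completed and returns a non-zero status if the condition never becomes "True" within the allotted attempts.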

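15.3.2. Moving to the Upgrade stage of the image-based upgrade with Lifecycle Agent

After you generate the seed image and complete the Prep stage, you can upgrade the target cluster. During the upgrade process, the OADP Operator creates a backup of the artifacts specified in the OADP custom resources (CRs), then the Lifecycle Agent upgrades the cluster.

If the upgrade fails or stops, an automatic rollback is initiated. If you have an issue after the upgrade, you can initiate a manual rollback. For more information about manual rollback, see "Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent".

Prerequisites

You have completed the Prep stage.

Procedure

To move to the Upgrade stage, change the value of the stage field to Upgrade in the ImageBasedUpgrade CR by running the following command: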

oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Upgrade"}}' --type=merge

Check the status of the ImageBasedUpgrade CR by running the following command:

oc get ibu -o yaml

Example output

status:
  conditions:
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: In progress
    observedGeneration: 5
    reason: InProgress
    status: "False"
    type: Idle
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed
    observedGeneration: 5
    reason: Completed
    status: "False"
    type: PrepInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed successfully
    observedGeneration: 5
    reason: Completed
    status: "True"
    type: PrepCompleted
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: |-
      Waiting for system to stabilize: one or more health checks failed
        - one or more ClusterOperators not yet ready: authentication
        - one or more MachineConfigPools not yet ready: master
        - one or more ClusterServiceVersions not yet ready: sriov-fec.v2.8.0
    observedGeneration: 1
    reason: InProgress
    status: "True"
    type: UpgradeInProgress
  observedGeneration: 1
  rollbackAvailabilityExpiration: "2024-05-19T14:01:52Z"
  validNextStages:
  - Rollback

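The OADP Operator creates a backup of the data specified in the OADP Backup and Restore CRs, and the target cluster reboots.

Monitor the status of the CR by running the following command: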
```
oc get ibu -o yaml
```
If you are satisfied with the upgrade, finalize the changes by patching the value of the stage field to Idle in the ImageBasedUpgrade CR by running the following command:
```
oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Idle"}}' --type=merge
```
Important
You cannot roll back the changes once you move to the Idle stage after an upgrade.
The Lifecycle Agent deletes all resources created during the upgrade process.
You can remove the OADP Operator and its configuration files after a successful upgrade. For more information, see "Deleting Operators from a cluster".

Verification

Check the status of the ImageBasedUpgrade CR by running the following command:

oc get ibu -o yaml

Example output

status:
  conditions:
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: In progress
    observedGeneration: 5
    reason: InProgress
    status: "False"
    type: Idle
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed
    observedGeneration: 5
    reason: Completed
    status: "False"
    type: PrepInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Prep completed successfully
    observedGeneration: 5
    reason: Completed
    status: "True"
    type: PrepCompleted
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Upgrade completed
    observedGeneration: 1
    reason: Completed
    status: "False"
    type: UpgradeInProgress
  - lastTransitionTime: "2024-01-01T09:00:00Z"
    message: Upgrade completed
    observedGeneration: 1
    reason: Completed
    status: "True"
    type: UpgradeCompleted
  observedGeneration: 1
  rollbackAvailabilityExpiration: "2024-01-01T09:00:00Z"
  validNextStages:
  - Idle
  - Rollback

Check the status of the cluster restoration by running the following command:

oc get restores -n openshift-adp -o custom-columns=NAME:.metadata.name,Status:.status.phase,Reason:.status.failureReason

Example output

NAME             Status      Reason
acm-klusterlet   Completed   <none> 
apache-app       Completed   <none>
localvolume      Completed   <none>

The acm-klusterlet restore is specific to RHACM environments only.
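
When many Restore CRs exist, scanning the table by eye is error-prone. As a rough sketch, the custom-columns output above can be piped through a filter that reports only restores whose Status column is not "Completed". The function name `failed_restores` is hypothetical, and the filter assumes the three-column layout (NAME, Status, Reason) produced by the command in this procedure.

```shell
# Read the custom-columns table on stdin, skip the header row, and print
# every restore whose Status column is not "Completed".
# Exit status is 1 if at least one such restore was found, 0 otherwise.
failed_restores() {
  awk 'NR > 1 && NF >= 2 && $2 != "Completed" { print $1 ": " $2; bad = 1 }
       END { exit bad + 0 }'
}
```

It would be used by appending `| failed_restores` to the `oc get restores` command shown above.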

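15.3.3. Moving to the Rollback stage of the image-based upgrade with Lifecycle Agent

An automatic rollback is initiated if the upgrade does not complete within the time frame specified in the initMonitorTimeoutSeconds field after rebooting.

Example ImageBasedUpgrade CR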

apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle
  seedImageRef:
    version: 4.15.2
    image: <seed_container_image>
  autoRollbackOnFailure: {}
#    initMonitorTimeoutSeconds: 1800 
[...]

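autoRollbackOnFailure.initMonitorTimeoutSeconds: (Optional) Specify the time frame in seconds after the first reboot within which the upgrade must complete; otherwise a rollback starts. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.

You can manually roll back the changes if you encounter unresolvable issues after an upgrade.

Prerequisites

You have logged in to the hub cluster as a user with cluster-admin privileges.
You have ensured that the control plane certificates on the original stateroot are valid. If the certificates have expired, see "Recovering from expired control plane certificates".

Procedure

To move to the Rollback stage, patch the value of the stage field to Rollback in the ImageBasedUpgrade CR by running the following command: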
```
oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Rollback"}}' --type=merge
```
The Lifecycle Agent reboots the cluster with the previously installed version of OpenShift Container Platform and restores the applications.
If you are satisfied with the changes, finalize the rollback by patching the value of the stage field to Idle in the ImageBasedUpgrade CR by running the following command:
```
oc patch imagebasedupgrades.lca.openshift.io upgrade -p='{"spec": {"stage": "Idle"}}' --type=merge -n openshift-lifecycle-agent
```
Warning
If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.

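15.3.4. Troubleshooting image-based upgrades with Lifecycle Agent

Perform troubleshooting steps on the managed clusters that are affected by an issue.

Important

If you are using the ImageBasedGroupUpgrade CR to upgrade your clusters, ensure that the lcm.openshift.io/ibgu-<stage>-completed or lcm.openshift.io/ibgu-<stage>-failed cluster labels are updated properly after you perform troubleshooting or recovery steps on the managed clusters. This ensures that TALM continues to manage the image-based upgrade for the cluster.

15.3.4.1. Collecting logs

You can use the oc adm must-gather CLI to collect information for debugging and troubleshooting.

Procedure

Collect data about the Operators by running the following command: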

oc adm must-gather \
  --dest-dir=must-gather/tmp \
  --image=$(oc -n openshift-lifecycle-agent get deployment.apps/lifecycle-agent-controller-manager -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \
  --image=quay.io/konveyor/oadp-must-gather:latest \
  --image=quay.io/openshift/origin-must-gather:latest

--image=quay.io/konveyor/oadp-must-gather:latest: (Optional) Add this option if you need to gather more information from the OADP Operator.
--image=quay.io/openshift/origin-must-gather:latest: (Optional) Add this option if you need to gather more information from the SR-IOV Operator.

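15.3.4.2. AbortFailed or FinalizeFailed error

Issue

During the finalize stage, or when you stop the process at the Prep stage, the Lifecycle Agent cleans up the following resources:

- Stateroot that is no longer required
- Precaching resources
- OADP CRs
- ImageBasedUpgrade CR

If the Lifecycle Agent fails to perform these steps, it transitions to the AbortFailed or FinalizeFailed state. The condition message and the logs show which steps failed.

Example error message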

message: failed to delete all the backup CRs. Perform cleanup manually then add 'lca.openshift.io/manual-cleanup-done' annotation to ibu CR to transition back to Idle
observedGeneration: 5
reason: AbortFailed
status: "False"
type: Idle

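Resolution

Inspect the logs to determine why the failure occurred.
To prompt the Lifecycle Agent to retry the cleanup, add the lca.openshift.io/manual-cleanup-done annotation to the ImageBasedUpgrade CR.
After observing this annotation, the Lifecycle Agent retries the cleanup and, if it is successful, the ImageBasedUpgrade stage transitions to Idle.
If the cleanup fails again, you can manually clean up the resources.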
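
The annotation step above can be sketched with `oc annotate`. This section does not state which value the annotation needs, so the sketch assumes that the presence of the key is what matters and sets an empty value; treat that as an assumption to verify. The function name `ibu_mark_cleanup_done` is hypothetical.

```shell
# Add the manual-cleanup-done annotation to the ImageBasedUpgrade CR.
# ASSUMPTION: an empty value is sufficient; the source names only the key.
# --overwrite avoids an error if the key is still present from an earlier attempt.
ibu_mark_cleanup_done() {
  oc annotate ibu upgrade 'lca.openshift.io/manual-cleanup-done=' --overwrite
}
```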

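15.3.4.2.1. Cleaning up stateroot manually

Issue

When you stop at the Prep stage, the Lifecycle Agent cleans up the new stateroot. When finalizing after a successful upgrade or a rollback, the Lifecycle Agent cleans up the old stateroot. If this step fails, inspect the logs to determine why the failure occurred.

Resolution

Check if there are any existing deployments in the stateroot by running the following command: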
```
ostree admin status
```
If there are any, clean up the existing deployment by running the following command:
```
ostree admin undeploy <index_of_deployment>
```
After cleaning up all the deployments of the stateroot, wipe the stateroot directory by running the following commands:

Warning

Ensure that the booted deployment is not in this stateroot.

stateroot="<stateroot_to_delete>"

unshare -m /bin/sh -c "mount -o remount,rw /sysroot && rm -rf /sysroot/ostree/deploy/${stateroot}"
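
Because the wipe is irreversible, a guard that enforces the warning above may be worthwhile. The sketch below assumes that `ostree admin status` marks the booted deployment with a leading `*` followed by the stateroot (OS) name; verify this against the output on your node before relying on it. Both function names are hypothetical.

```shell
# Read `ostree admin status` text on stdin and print the stateroot name of
# the booted deployment (ASSUMPTION: the line that starts with "*").
booted_stateroot() {
  awk '$1 == "*" { print $2; exit }'
}

# Succeed only if both names are non-empty and the candidate stateroot ($1)
# differs from the booted stateroot ($2).
safe_to_wipe() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}
```

A possible use is to run `safe_to_wipe "$stateroot" "$(ostree admin status | booted_stateroot)"` and execute the `unshare` command only when it succeeds. The non-empty check also prevents an unset `stateroot` variable from expanding the `rm -rf` path to the whole deploy directory.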

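15.3.4.2.2. Cleaning up OADP resources manually

Issue

Automatic cleanup of OADP resources can fail because of connection issues between the Lifecycle Agent and the S3 backend. After you restore the connection and add the lca.openshift.io/manual-cleanup-done annotation, the Lifecycle Agent can successfully clean up the backup resources.

Resolution

Check the backend connectivity by running the following command: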

oc get backupstoragelocations.velero.io -n openshift-adp

Example output

NAME                          PHASE       LAST VALIDATED   AGE   DEFAULT
dataprotectionapplication-1   Available   33s              8d    true

Remove all backup resources, and then add the lca.openshift.io/manual-cleanup-done annotation to the ImageBasedUpgrade CR.

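15.3.4.3. LVM Storage volume contents not restored

When LVM Storage is used to provide dynamic persistent volume storage, LVM Storage might not restore the persistent volume contents if it is configured incorrectly.

15.3.4.3.1. Missing LVM Storage-related fields in Backup CR

Issue

Your Backup CRs might be missing fields that are needed to restore your persistent volumes. To determine whether you have this issue, check the events in your application pod by running the following command: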

oc describe pod <your_app_name>

Example output showing missing LVM Storage-related fields in the Backup CR

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  58s (x2 over 66s)  default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled         56s                default-scheduler  Successfully assigned default/db-1234 to sno1.example.lab
  Warning  FailedMount       24s (x7 over 55s)  kubelet            MountVolume.SetUp failed for volume "pvc-1234" : rpc error: code = Unknown desc = VolumeID is not found

Resolution

You must include logicalvolumes.topolvm.io in the application Backup CR. Without this resource, the application restores its persistent volume claims and persistent volume manifests correctly, but the logicalvolume associated with the persistent volume is not restored properly after pivot.

Example Backup CR

apiVersion: velero.io/v1
kind: Backup
metadata:
  labels:
    velero.io/storage-location: default
  name: small-app
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  includedNamespaceScopedResources:
  - secrets
  - persistentvolumeclaims
  - deployments
  - statefulsets
  includedClusterScopedResources: 
  - persistentVolumes
  - volumesnapshotcontents
  - logicalvolumes.topolvm.io

includedClusterScopedResources: To restore the persistent volumes for your application, you must configure this section as shown.
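
Before starting an upgrade, a quick text check on the Backup manifest can catch the missing entry. This is only a sketch: it looks for a list item `logicalvolumes.topolvm.io` anywhere in the manifest read from stdin and does not verify that the item sits under includedClusterScopedResources. The function name `backup_lists_logicalvolumes` is hypothetical.

```shell
# Succeed if the manifest on stdin contains a YAML list item
# "- logicalvolumes.topolvm.io"; fail otherwise.
backup_lists_logicalvolumes() {
  grep -Eq '^[[:space:]]*-[[:space:]]*logicalvolumes\.topolvm\.io[[:space:]]*$'
}
```

For a manifest saved to a file, redirect the file to the function's standard input and act on the exit status.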

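15.3.4.3.2. Missing LVM Storage-related fields in Restore CR

Issue

The expected resources for the applications are restored, but the persistent volume contents are not preserved after upgrading.

List the persistent volumes for your applications by running the following command before pivot: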

oc get pv,pvc,logicalvolumes.topolvm.io -A

Example output before pivot

NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
persistentvolume/pvc-1234   1Gi        RWO            Retain           Bound    default/pvc-db   lvms-vg1                4h45m

NAMESPACE   NAME                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     persistentvolumeclaim/pvc-db   Bound    pvc-1234   1Gi        RWO            lvms-vg1       4h45m

NAMESPACE   NAME                                AGE
            logicalvolume.topolvm.io/pvc-1234   4h45m

List the persistent volumes for your applications by running the following command after pivot:

oc get pv,pvc,logicalvolumes.topolvm.io -A

Example output after pivot

NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
persistentvolume/pvc-1234   1Gi        RWO            Delete           Bound    default/pvc-db   lvms-vg1                19s

NAMESPACE   NAME                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     persistentvolumeclaim/pvc-db   Bound    pvc-1234   1Gi        RWO            lvms-vg1       19s

NAMESPACE   NAME                                AGE
            logicalvolume.topolvm.io/pvc-1234   18s

Resolution

This issue occurs because the logicalvolume status is not preserved in the Restore CR. This status is important because Velero requires it to reference the volumes that must be preserved after pivoting. You must include the following fields in the application Restore CR:

Example Restore CR

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: sample-vote-app
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
  annotations:
    lca.openshift.io/apply-wave: "3"
spec:
  backupName:
    sample-vote-app
  restorePVs: true 
  restoreStatus: 
    includedResources:
      - logicalvolumes

restorePVs: To preserve the persistent volumes for your application, you must set restorePVs to true.
restoreStatus: To preserve the persistent volumes for your application, you must configure this section as shown.

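15.3.4.4. Debugging failed Backup and Restore CRs

Issue

The backup or restoration of artifacts failed.

Resolution

You can debug Backup and Restore CRs and retrieve logs with the Velero CLI tool. The Velero CLI tool provides more detailed information than the OpenShift CLI tool.

Describe the Backup CR that contains errors by running the following command: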

oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero describe backup -n openshift-adp backup-acm-klusterlet --details

Describe the Restore CR that contains errors by running the following command:

oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero describe restore -n openshift-adp restore-acm-klusterlet --details

Download the backed up resources to a local directory by running the following command:

oc exec -n openshift-adp velero-7c87d58c7b-sw6fc -c velero -- ./velero backup download -n openshift-adp backup-acm-klusterlet -o ~/backup-acm-klusterlet.tar.gz
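
The three commands above hard-code a pod name (`velero-7c87d58c7b-sw6fc`) that differs on every cluster. As a sketch, `oc exec` can target the deployment instead of a specific pod. The deployment name `velero` is an assumption inferred from the pod name in the examples; confirm it with `oc get deployments -n openshift-adp`. The wrapper name `velero_exec` is hypothetical.

```shell
# Run the velero binary inside a pod of the Velero deployment.
# ASSUMPTION: the deployment in openshift-adp is named "velero".
velero_exec() {
  oc exec -n openshift-adp deploy/velero -c velero -- ./velero "$@"
}
```

With this wrapper, the first command above becomes `velero_exec describe backup -n openshift-adp backup-acm-klusterlet --details`.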
