1.5. Troubleshooting ocm-controller errors after a Red Hat Advanced Cluster Management upgrade
After you upgrade from 2.7.x to 2.8.x, the ocm-controller in the multicluster-engine namespace crashes.
1.5.1. Symptom: Troubleshooting ocm-controller errors after a Red Hat Advanced Cluster Management upgrade
When you try to list the ManagedClusterSet and ManagedClusterSetBinding custom resources, the following error message appears:
Error from server: request to convert CR from an invalid group/version: cluster.open-cluster-management.io/v1beta1
This message indicates that the migration of the ManagedClusterSets and ManagedClusterSetBindings custom resource definitions from v1beta1 to v1beta2 failed.
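As an additional check, you can inspect which API versions each custom resource definition still records as stored. The following is a minimal sketch and not part of the documented procedure: the function name `show_stored_versions` is hypothetical, and it assumes only the standard `.status.storedVersions` field that every CustomResourceDefinition carries.

```shell
# Hypothetical helper (not from the product documentation): print the API
# versions that each CRD records as stored. A lingering v1beta1 entry is
# consistent with objects that were not migrated to v1beta2.
show_stored_versions() {
  for crd in \
    managedclustersets.cluster.open-cluster-management.io \
    managedclustersetbindings.cluster.open-cluster-management.io
  do
    # .status.storedVersions is a standard CustomResourceDefinition field.
    versions=$(oc get crd "$crd" -o jsonpath='{.status.storedVersions}') || return 1
    printf '%s: %s\n' "$crd" "$versions"
  done
}
```

Calling `show_stored_versions` while logged in to the hub cluster prints one line per custom resource definition.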
1.5.2. Resolving the problem: Troubleshooting ocm-controller errors after a Red Hat Advanced Cluster Management upgrade
To resolve this error, you must start the API migration manually. Complete the following steps:
Revert the cluster-manager to a previous release. First, pause the multiclusterengine with the following command:
oc annotate mce multiclusterengine pause=true
Run the following commands to replace the image of the cluster-manager deployment with the previous version and to save the current registration image reference in ORIGIN_REGISTRATION_IMAGE, which a later step uses to restore it:
oc patch deployment cluster-manager -n multicluster-engine -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"registration-operator","image":"registry.redhat.io/multicluster-engine/registration-operator-rhel8@sha256:35999c3a1022d908b6fe30aa9b85878e666392dbbd685e9f3edcb83e3336d19f"}]}}}}'
export ORIGIN_REGISTRATION_IMAGE=$(oc get clustermanager cluster-manager -o jsonpath='{.spec.registrationImagePullSpec}')
Replace the registration image reference in the ClusterManager resource with the previous version. Run the following command:
oc patch clustermanager cluster-manager --type='json' -p='[{"op": "replace", "path": "/spec/registrationImagePullSpec", "value": "registry.redhat.io/multicluster-engine/registration-rhel8@sha256:a3c22aa4326859d75986bf24322068f0aff2103cccc06e1001faaf79b9390515"}]'
Run the following commands to revert the ManagedClusterSets and ManagedClusterSetBindings custom resource definitions to the previous release:
oc annotate crds managedclustersets.cluster.open-cluster-management.io operator.open-cluster-management.io/version-
oc annotate crds managedclustersetbindings.cluster.open-cluster-management.io operator.open-cluster-management.io/version-
Restart the cluster-manager and wait for the custom resource definitions to be recreated. Run the following commands:
oc -n multicluster-engine delete pods -l app=cluster-manager
oc wait crds managedclustersets.cluster.open-cluster-management.io --for=jsonpath="{.metadata.annotations['operator\.open-cluster-management\.io/version']}"="2.3.3" --timeout=120s
oc wait crds managedclustersetbindings.cluster.open-cluster-management.io --for=jsonpath="{.metadata.annotations['operator\.open-cluster-management\.io/version']}"="2.3.3" --timeout=120s
Start the storage version migration with the following commands:
oc patch StorageVersionMigration managedclustersets.cluster.open-cluster-management.io --type='json' -p='[{"op":"replace", "path":"/spec/resource/version", "value":"v1beta1"}]'
oc patch StorageVersionMigration managedclustersets.cluster.open-cluster-management.io --type='json' --subresource status -p='[{"op":"remove", "path":"/status/conditions"}]'
oc patch StorageVersionMigration managedclustersetbindings.cluster.open-cluster-management.io --type='json' -p='[{"op":"replace", "path":"/spec/resource/version", "value":"v1beta1"}]'
oc patch StorageVersionMigration managedclustersetbindings.cluster.open-cluster-management.io --type='json' --subresource status -p='[{"op":"remove", "path":"/status/conditions"}]'
Run the following commands to wait for the migration to complete:
oc wait storageversionmigration managedclustersets.cluster.open-cluster-management.io --for=condition=Succeeded --timeout=120s
oc wait storageversionmigration managedclustersetbindings.cluster.open-cluster-management.io --for=condition=Succeeded --timeout=120s
Restore the cluster-manager to Red Hat Advanced Cluster Management 2.9. This might take several minutes. Run the following commands:
oc annotate mce multiclusterengine pause-
oc patch clustermanager cluster-manager --type='json' -p='[{"op": "replace", "path": "/spec/registrationImagePullSpec", "value": "'$ORIGIN_REGISTRATION_IMAGE'"}]'
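The storage version migration commands in this procedure are identical for both resources, so they can be wrapped in a loop. The following is a minimal sketch under that assumption: the function names `restart_storage_migration` and `restart_all_storage_migrations` are hypothetical, and unlike the procedure, the sketch waits for each migration before it patches the next one.

```shell
# Hypothetical helper: restart the storage version migration for one resource
# and wait for it to succeed. The oc commands mirror those in the procedure.
restart_storage_migration() {
  name="$1"
  # Set the migration's resource version to v1beta1, as in the procedure.
  oc patch StorageVersionMigration "$name" --type='json' \
    -p='[{"op":"replace", "path":"/spec/resource/version", "value":"v1beta1"}]' || return 1
  # Remove the recorded status conditions, as in the procedure
  # (presumably so that the migration is evaluated again).
  oc patch StorageVersionMigration "$name" --type='json' --subresource status \
    -p='[{"op":"remove", "path":"/status/conditions"}]' || return 1
  # Block until the migration reports Succeeded, or fail after 120 seconds.
  oc wait storageversionmigration "$name" --for=condition=Succeeded --timeout=120s
}

# Hypothetical helper: apply the steps above to both resources and stop at
# the first failure.
restart_all_storage_migrations() {
  for name in \
    managedclustersets.cluster.open-cluster-management.io \
    managedclustersetbindings.cluster.open-cluster-management.io
  do
    restart_storage_migration "$name" || return 1
  done
}
```

Run `restart_all_storage_migrations` only after the custom resource definitions have been recreated. A nonzero exit status means that a patch or wait command failed for one of the resources.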
1.5.2.1. Verification
To verify that Red Hat Advanced Cluster Management is recovered, run the following commands:
oc get managedclusterset
oc get managedclustersetbinding -A
After you run the commands, the ManagedClusterSets and ManagedClusterSetBindings resources are listed without error messages.
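If you want a single pass or fail result, the two verification commands can be wrapped in a small function. This is a minimal sketch, not part of the documented procedure; the function name `verify_clustersets` and its output strings are hypothetical.

```shell
# Hypothetical helper: run both list commands and report whether each one
# succeeds. Listing output is discarded; error messages from oc stay visible.
verify_clustersets() {
  status=0
  if oc get managedclusterset >/dev/null; then
    echo "ManagedClusterSets: OK"
  else
    echo "ManagedClusterSets: FAILED"
    status=1
  fi
  if oc get managedclustersetbinding -A >/dev/null; then
    echo "ManagedClusterSetBindings: OK"
  else
    echo "ManagedClusterSetBindings: FAILED"
    status=1
  fi
  return "$status"
}
```

The function returns 0 only when both list commands succeed, so it can be used in a condition such as `verify_clustersets && echo "recovered"`.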