1.4. Troubleshooting ocm-controller errors after a Red Hat Advanced Cluster Management upgrade
After you upgrade from 2.7.x to 2.8.x, the ocm-controller in the multicluster-engine namespace crashes.
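1.4.1. Symptom: Troubleshooting ocm-controller errors after a Red Hat Advanced Cluster Management upgrade
After you attempt to list the ManagedClusterSet and ManagedClusterSetBinding custom resource definitions, the following error message appears: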
Error from server: request to convert CR from an invalid group/version: cluster.open-cluster-management.io/v1beta1
The preceding message indicates that the migration of the ManagedClusterSets and ManagedClusterSetBindings custom resource definitions from v1beta1 to v1beta2 failed.
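The symptom can be checked for in a script by matching the error text. The following is a minimal sketch, not part of the product: the function names is_conversion_error and check_clusterset_symptom are hypothetical, and the match string is copied from the error message shown earlier in this section.

```shell
# Return 0 when the given text contains the CRD conversion error for v1beta1.
is_conversion_error() {
  case "$1" in
    *"request to convert CR from an invalid group/version: cluster.open-cluster-management.io/v1beta1"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical wrapper: list ManagedClusterSets and report whether the
# symptom is present. Both stdout and stderr of oc are inspected.
check_clusterset_symptom() {
  local output
  output=$(oc get managedclusterset 2>&1) || true
  if is_conversion_error "$output"; then
    echo "conversion error detected"
    return 0
  fi
  echo "conversion error not detected"
  return 1
}
```

Calling check_clusterset_symptom on an affected cluster should report the error; on a healthy cluster it returns a nonzero status.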
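1.4.2. Resolving the problem: Troubleshooting ocm-controller errors after a Red Hat Advanced Cluster Management upgrade
To resolve this problem, you must start the API migration manually. Complete the following steps: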
Revert cluster-manager to the previous version. Pause multiclusterengine by running the following command:
    oc annotate mce multiclusterengine pause=true
Run the following commands to replace the image of the cluster-manager deployment with the previous version and to save the current registration image reference in the ORIGIN_REGISTRATION_IMAGE variable, which is used later to restore it:
    oc patch deployment cluster-manager -n multicluster-engine -p \
      '{"spec":{"template":{"spec":{"containers":[{"name":"registration-operator","image":"registry.redhat.io/multicluster-engine/registration-operator-rhel8@sha256:35999c3a1022d908b6fe30aa9b85878e666392dbbd685e9f3edcb83e3336d19f"}]}}}}'
    export ORIGIN_REGISTRATION_IMAGE=$(oc get clustermanager cluster-manager -o jsonpath='{.spec.registrationImagePullSpec}')
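If the jsonpath query returns nothing, ORIGIN_REGISTRATION_IMAGE is exported as an empty string and the later restore step would write an empty image reference. The following sketch adds a guard around the capture. The function name capture_registration_image is hypothetical; the oc get command is the one used in this step.

```shell
# Print the current registration image reference, or fail if it is empty.
capture_registration_image() {
  local image
  image=$(oc get clustermanager cluster-manager \
    -o jsonpath='{.spec.registrationImagePullSpec}') || return 1
  if [ -z "$image" ]; then
    echo "registrationImagePullSpec is empty; not continuing" >&2
    return 1
  fi
  printf '%s\n' "$image"
}
```

A possible use is ORIGIN_REGISTRATION_IMAGE=$(capture_registration_image) && export ORIGIN_REGISTRATION_IMAGE, which leaves the variable unexported when the capture fails.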
Replace the registration image reference in the ClusterManager resource with the previous version. Run the following command:
    oc patch clustermanager cluster-manager --type='json' -p='[{"op": "replace", "path": "/spec/registrationImagePullSpec", "value": "registry.redhat.io/multicluster-engine/registration-rhel8@sha256:a3c22aa4326859d75986bf24322068f0aff2103cccc06e1001faaf79b9390515"}]'
Run the following commands to revert the ManagedClusterSets and ManagedClusterSetBindings custom resource definitions to the previous version:
    oc annotate crds managedclustersets.cluster.open-cluster-management.io operator.open-cluster-management.io/version-
    oc annotate crds managedclustersetbindings.cluster.open-cluster-management.io operator.open-cluster-management.io/version-
Restart cluster-manager and wait for the custom resource definitions to be recreated. Run the following commands:
    oc -n multicluster-engine delete pods -l app=cluster-manager
    oc wait crds managedclustersets.cluster.open-cluster-management.io --for=jsonpath="{.metadata.annotations['operator\.open-cluster-management\.io/version']}"="2.3.3" --timeout=120s
    oc wait crds managedclustersetbindings.cluster.open-cluster-management.io --for=jsonpath="{.metadata.annotations['operator\.open-cluster-management\.io/version']}"="2.3.3" --timeout=120s
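On clients where oc wait does not accept --for=jsonpath, the same condition can be polled with oc get. The following is a sketch under that assumption. The function name wait_for_crd_version and its timeout and interval parameters are hypothetical; the jsonpath expression and the expected value 2.3.3 come from the commands in this step.

```shell
# Poll a CRD until its operator version annotation equals the expected value.
# Arguments: CRD name, expected version, timeout in seconds (default 120),
# polling interval in seconds (default 5).
wait_for_crd_version() {
  local crd expected timeout interval elapsed current
  crd=$1
  expected=$2
  timeout=${3:-120}
  interval=${4:-5}
  elapsed=0
  while :; do
    # A failed lookup (for example, the CRD is not recreated yet) counts as "no value".
    current=$(oc get crd "$crd" \
      -o jsonpath="{.metadata.annotations['operator\.open-cluster-management\.io/version']}" \
      2>/dev/null) || current=""
    if [ "$current" = "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $crd to reach version $expected" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, wait_for_crd_version managedclustersets.cluster.open-cluster-management.io 2.3.3 120 waits up to roughly two minutes, matching the --timeout=120s value used with oc wait.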
Start the storage version migration by running the following commands:
    oc patch StorageVersionMigration managedclustersets.cluster.open-cluster-management.io --type='json' -p='[{"op":"replace", "path":"/spec/resource/version", "value":"v1beta1"}]'
    oc patch StorageVersionMigration managedclustersets.cluster.open-cluster-management.io --type='json' --subresource status -p='[{"op":"remove", "path":"/status/conditions"}]'
    oc patch StorageVersionMigration managedclustersetbindings.cluster.open-cluster-management.io --type='json' -p='[{"op":"replace", "path":"/spec/resource/version", "value":"v1beta1"}]'
    oc patch StorageVersionMigration managedclustersetbindings.cluster.open-cluster-management.io --type='json' --subresource status -p='[{"op":"remove", "path":"/status/conditions"}]'
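The four patch commands apply the same two operations to each of the two StorageVersionMigration resources: set the resource version to v1beta1, then clear the status conditions. The following sketch expresses that as a loop. The function names restart_storage_migration and restart_clusterset_migrations are hypothetical; the oc patch arguments are the ones from this step.

```shell
# Apply both patches to a single StorageVersionMigration resource.
restart_storage_migration() {
  local name
  name=$1
  # Point the migration at the v1beta1 version of the resource.
  oc patch StorageVersionMigration "$name" --type='json' \
    -p='[{"op":"replace", "path":"/spec/resource/version", "value":"v1beta1"}]' || return 1
  # Remove the recorded conditions from the status subresource.
  oc patch StorageVersionMigration "$name" --type='json' --subresource status \
    -p='[{"op":"remove", "path":"/status/conditions"}]' || return 1
}

# Run the migration restart for both cluster set resources, stopping on the first failure.
restart_clusterset_migrations() {
  local name
  for name in \
    managedclustersets.cluster.open-cluster-management.io \
    managedclustersetbindings.cluster.open-cluster-management.io
  do
    restart_storage_migration "$name" || return 1
  done
}
```

Stopping at the first failed patch is a design choice of this sketch, so that a failure on one resource is not hidden by the output of later commands.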
Wait for the migration to complete by running the following commands:
    oc wait storageversionmigration managedclustersets.cluster.open-cluster-management.io --for=condition=Succeeded --timeout=120s
    oc wait storageversionmigration managedclustersetbindings.cluster.open-cluster-management.io --for=condition=Succeeded --timeout=120s
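Restore cluster-manager to Red Hat Advanced Cluster Management 2.12. This might take several minutes. Run the following commands, which resume multiclusterengine and restore the registration image reference that was saved in ORIGIN_REGISTRATION_IMAGE:
    oc annotate mce multiclusterengine pause-
    oc patch clustermanager cluster-manager --type='json' -p='[{"op": "replace", "path": "/spec/registrationImagePullSpec", "value": "'$ORIGIN_REGISTRATION_IMAGE'"}]'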
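The restore patch depends on ORIGIN_REGISTRATION_IMAGE still being set in the current shell. If the variable was lost, for example because a new terminal session was opened, the patch would write an empty value. The following sketch checks the variable before it changes anything. The function name restore_cluster_manager is hypothetical; the two oc commands are the ones from this step.

```shell
# Resume multiclusterengine and restore the saved registration image,
# refusing to run when the saved image reference is missing.
restore_cluster_manager() {
  if [ -z "${ORIGIN_REGISTRATION_IMAGE:-}" ]; then
    echo "ORIGIN_REGISTRATION_IMAGE is not set; refusing to patch" >&2
    return 1
  fi
  oc annotate mce multiclusterengine pause- || return 1
  oc patch clustermanager cluster-manager --type='json' \
    -p="[{\"op\": \"replace\", \"path\": \"/spec/registrationImagePullSpec\", \"value\": \"${ORIGIN_REGISTRATION_IMAGE}\"}]"
}
```

The check runs before the pause annotation is removed, so a missing variable leaves the cluster in its paused state rather than half restored.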
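1.4.2.1. Verification
To verify that Red Hat Advanced Cluster Management is recovered, run the following commands:
    oc get managedclusterset
    oc get managedclustersetbinding -A
After you run the commands, the ManagedClusterSets and ManagedClusterSetBindings resources are listed without error messages.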
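The verification can also be scripted by relying on the exit status of the two list commands. The following is a minimal sketch; the function name verify_clustersets and its summary message are hypothetical.

```shell
# Run both list commands and report success only if neither returns an error.
verify_clustersets() {
  oc get managedclusterset || return 1
  oc get managedclustersetbinding -A || return 1
  echo "ManagedClusterSets and ManagedClusterSetBindings listed without errors"
}
```

The function prints the normal oc output first, then the summary line, and returns a nonzero status as soon as either command fails.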