1.13. 证书更改后导入的集群离线故障排除


支持安装自定义 apiserver 证书,但在更改证书信息前导入的一个或多个集群 处于离线状态

1.13.1. 症状:证书更改后集群处于离线状态

完成更新证书 secret 的步骤后,在线的一个或多个集群现在在控制台中显示 离线状态

1.13.2. 鉴别问题: 证书更改后集群处于离线状态

更新自定义 API 服务器证书信息后,在新证书前导入并运行的集群会处于 offline 状态。

表示证书有问题的错误会出现在离线受管集群的 open-cluster-management-agent 命名空间中的 pod 日志中。以下示例与日志中显示的错误类似:

请参阅以下 work-agent 日志:

E0917 03:04:05.874759       1 manifestwork_controller.go:179] Reconcile work test-1-klusterlet-addon-workmgr fails with err: Failed to update work status with err Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/namespaces/test-1/manifestworks/test-1-klusterlet-addon-workmgr": x509: certificate signed by unknown authority
E0917 03:04:05.874887       1 base_controller.go:231] "ManifestWorkAgent" controller failed to sync "test-1-klusterlet-addon-workmgr", err: Failed to update work status with err Get "api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/namespaces/test-1/manifestworks/test-1-klusterlet-addon-workmgr": x509: certificate signed by unknown authority
E0917 03:04:37.245859       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1.ManifestWork: failed to list *v1.ManifestWork: Get "api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/namespaces/test-1/manifestworks?resourceVersion=607424": x509: certificate signed by unknown authority

请参阅以下 registration-agent 日志:

I0917 02:27:41.525026       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management-agent", Name:"open-cluster-management-agent", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ManagedClusterAvailableConditionUpdated' update managed cluster "test-1" available condition to "True", due to "Managed cluster is available"
E0917 02:58:26.315984       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1beta1.CertificateSigningRequest: Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/managedclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dtest-1&resourceVersion=607408&timeout=9m33s&timeoutSeconds=573&watch=true"": x509: certificate signed by unknown authority
E0917 02:58:26.598343       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1.ManagedCluster: Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/managedclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dtest-1&resourceVersion=607408&timeout=9m33s&timeoutSeconds=573&watch=true": x509: certificate signed by unknown authority
E0917 02:58:27.613963       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1.ManagedCluster: failed to list *v1.ManagedCluster: Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/managedclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dtest-1&resourceVersion=607408&timeout=9m33s&timeoutSeconds=573&watch=true"": x509: certificate signed by unknown authority

1.13.3. 解决问题: 证书更改后集群处于离线状态

如果您的受管集群是 local-cluster,或者受管集群是使用 Red Hat Advanced Cluster Management for Kubernetes 创建的,则必须等待 10 分钟或更长时间重新导入受管集群。

要立即重新导入受管集群,您可以删除 hub 集群上的受管集群导入 secret,并使用 Red Hat Advanced Cluster Management 重新导入它。运行以下命令:

oc delete secret -n <cluster_name> <cluster_name>-import

<cluster_name> 替换为您要导入的受管集群的名称。

如果要重新导入使用 Red Hat Advanced Cluster Management 导入的受管集群,请完成以下步骤以再次导入受管集群:

  1. 在 hub 集群中,运行以下命令来重新创建受管集群导入 secret:

    oc delete secret -n <cluster_name> <cluster_name>-import

    <cluster_name> 替换为您要导入的受管集群的名称。

  2. 在 hub 集群中,运行以下命令来将受管集群导入 secret 公开给 YAML 文件:

    oc get secret -n <cluster_name> <cluster_name>-import -ojsonpath='{.data.import\.yaml}' | base64 --decode  > import.yaml

    <cluster_name> 替换为您要导入的受管集群的名称。

  3. 在受管集群中,运行以下命令应用 import.yaml 文件:

    oc apply -f import.yaml

注:前面的步骤不会从 hub 集群中分离受管集群。步骤使用受管集群中的当前设置更新所需的清单,包括新证书信息。

Red Hat logoGithubRedditYoutubeTwitter

学习

尝试、购买和销售

社区

关于红帽文档

通过我们的产品和服务,以及可以信赖的内容,帮助红帽用户创新并实现他们的目标。

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

© 2024 Red Hat, Inc.