18.5. 关于 ClusterGroupUpgrade CR
Topology Aware Lifecycle Manager(TALM)为一组集群从 ClusterGroupUpgrade
CR 构建补救计划。您可以在 ClusterGroupUpgrade
CR 中定义以下规格:
- 组中的集群
-
阻塞
ClusterGroupUpgrade
CR - 适用的受管策略列表
- 并发更新数
- 适用的 Canary 更新
- 更新前和更新之后执行的操作
- 更新数据
当 TALM 通过对指定集群进行补救时,ClusterGroupUpgrade
CR 可以具有以下状态:
-
UpgradeNotStarted
-
UpgradeCannotStart
-
UpgradeNotComplete
-
UpgradeTimedOut
-
UpgradeCompleted
-
PrecachingRequired
当 TALM 完成集群更新后,集群不会在同一 ClusterGroupUpgrade
CR 控制下再次更新。在以下情况下,必须创建新的 ClusterGroupUpgrade
CR:
- 当您需要再次更新集群时
-
当集群在更新后变为与
inform
策略不符合时
18.5.1. UpgradeNotStarted 状态
ClusterGroupUpgrade
CR 的初始状态是 UpgradeNotStarted
。
TALM 根据以下字段构建补救计划:
-
clusterSelector
字段指定您要更新的集群标签。 -
clusters
字段指定要更新的集群列表。 -
canaries
字段指定集群进行 Canary 更新。 -
maxConcurrency
字段指定批处理中要更新的集群数量。
您可以组合使用 cluster
和 clusterSelector
字段来创建组合的集群列表。
补救计划从 canaries
字段中列出的集群开始。每个 canary 集群组成一个集群批处理。
在更新 canary 集群的过程中任何错误都会停止更新过程。
在成功创建补救计划后,以及 enable
字段被设为 true
后,ClusterGroupUpgrade
CR 转变为 UpgradeNotCompleted
状态 。此时,TALM 开始使用特定的受管策略更新不合规的集群。
只有 ClusterGroupUpgrade
CR 处于 UpgradeNotStarted
或 UpgradeCannotStart
状态时,才能更改 spec
字段。
UpgradeNotStarted
状态的 ClusterGroupUpgrade
CR 示例
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-upgrade-complete namespace: default spec: clusters: 1 - spoke1 enable: false managedPolicies: 2 - policy1-common-cluster-version-policy - policy2-common-nto-sub-policy remediationStrategy: 3 canaries: 4 - spoke1 maxConcurrency: 1 5 timeout: 240 status: 6 conditions: - message: The ClusterGroupUpgrade CR is not enabled reason: UpgradeNotStarted status: "False" type: Ready copiedPolicies: - cgu-upgrade-complete-policy1-common-cluster-version-policy - cgu-upgrade-complete-policy2-common-nto-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-nto-sub-policy namespace: default placementBindings: - cgu-upgrade-complete-policy1-common-cluster-version-policy - cgu-upgrade-complete-policy2-common-nto-sub-policy placementRules: - cgu-upgrade-complete-policy1-common-cluster-version-policy - cgu-upgrade-complete-policy2-common-nto-sub-policy remediationPlan: - - spoke1
18.5.2. UpgradeCannotStart 状态
在 UpgradeCannotStart
状态下,因为以下原因,更新无法启动:
- 阻塞系统中缺少的 CR
- 阻塞还没有完成的 CR
18.5.3. UpgradeNotCompleted 状态
在 UpgradeNotCompleted
状态下,TALM 按照 UpgradeNotStarted
状态中定义的补救计划强制实施策略。
在当前批处理的所有集群与所有受管策略兼容后,对后续批处理的策略强制启动。如果批处理超时,TALM 会进入下一个批处理。批处理的超时值是 spec.timeout
字段除以补救计划中的批处理数量。
受管策略会按照 ClusterGroupUpgrade
CR 中的 managedPolicies
字段中列出的顺序进行应用。一个受管策略被应用于指定的集群。在指定的集群符合了当前的策略后,下一个受管策略将应用到下一个不合规的集群。
处于 UpgradeNotCompleted
状态的 ClusterGroupUpgrade
CR 示例
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-upgrade-complete namespace: default spec: clusters: - spoke1 enable: true 1 managedPolicies: - policy1-common-cluster-version-policy - policy2-common-nto-sub-policy remediationStrategy: maxConcurrency: 1 timeout: 240 status: 2 conditions: - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant reason: UpgradeNotCompleted status: "False" type: Ready copiedPolicies: - cgu-upgrade-complete-policy1-common-cluster-version-policy - cgu-upgrade-complete-policy2-common-nto-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-nto-sub-policy namespace: default placementBindings: - cgu-upgrade-complete-policy1-common-cluster-version-policy - cgu-upgrade-complete-policy2-common-nto-sub-policy placementRules: - cgu-upgrade-complete-policy1-common-cluster-version-policy - cgu-upgrade-complete-policy2-common-nto-sub-policy remediationPlan: - - spoke1 status: currentBatch: 1 remediationPlanForBatch: 3 spoke1: 0
18.5.4. UpgradeTimedOut 状态
在 UpgradeTimedOut
状态下,TALM 会每小时检查 ClusterGroupUpgrade
CR 的所有策略是否合规。检查会一直进行,直到 ClusterGroupUpgrade
CR 被删除或更新已完成。如果由于网络、CPU 或其他问题导致更新会延长,则定期检查允许更新完成。
在两个情况下,TALM 过渡到 UpgradeTimedOut
状态:
- 在当前批处理包含 Canary 更新时,批处理中的集群不会遵循批处理超时中的所有受管策略。
-
当集群不遵循
remediationStrategy
字段指定的timeout
值中的受管策略时。
如果策略合规,TALM 会转换为 UpgradeCompleted
状态。
18.5.5. UpgradeCompleted 状态
在 UpgradeCompleted
状态中,集群更新已完成。
UpgradeCompleted
状态的 ClusterGroupUpgrade
CR 示例
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-upgrade-complete namespace: default spec: actions: afterCompletion: deleteObjects: true 1 clusters: - spoke1 enable: true managedPolicies: - policy1-common-cluster-version-policy - policy2-common-nto-sub-policy remediationStrategy: maxConcurrency: 1 timeout: 240 status: 2 conditions: - message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies reason: UpgradeCompleted status: "True" type: Ready managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-nto-sub-policy namespace: default remediationPlan: - - spoke1 status: remediationPlanForBatch: spoke1: -2 3
在 PrecachingRequired
状态中,集群需要预先缓存的镜像,然后才能启动更新。有关预缓存功能的更多信息,请参阅"使用容器镜像预缓存功能"部分。
18.5.6. 阻塞 ClusterGroupUpgrade CR
您可以创建多个 ClusterGroupUpgrade
CR,并控制应用程序的顺序。
例如,如果您创建了 ClusterGroupUpgrade
CR C,它会阻塞 ClusterGroupUpgrade
CR A 的启动,那么 ClusterGroupUpgrade
CR A 将无法启动,直到 ClusterGroupUpgrade
CR C 变为 UpgradeComplete
状态。
一个 ClusterGroupUpgrade
CR 可以有多个阻塞 CR。在这种情况下,所有块 CR 都必须在升级当前 CR 升级前完成。
先决条件
- 安装 Topology Aware Lifecycle Manager(TALM)。
- 置备一个或多个受管集群。
-
以具有
cluster-admin
特权的用户身份登录。 - 在 hub 集群中创建 RHACM 策略。
流程
将
ClusterGroupUpgrade
CR 的内容保存到cgu-a.yaml
、cgu-b.yaml
和cgu-c.yaml
文件中。apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-a namespace: default spec: blockingCRs: 1 - name: cgu-c namespace: default clusters: - spoke1 - spoke2 - spoke3 enable: false managedPolicies: - policy1-common-cluster-version-policy - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy remediationStrategy: canaries: - spoke1 maxConcurrency: 2 timeout: 240 status: conditions: - message: The ClusterGroupUpgrade CR is not enabled reason: UpgradeNotStarted status: "False" type: Ready copiedPolicies: - cgu-a-policy1-common-cluster-version-policy - cgu-a-policy2-common-pao-sub-policy - cgu-a-policy3-common-ptp-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-pao-sub-policy namespace: default - name: policy3-common-ptp-sub-policy namespace: default placementBindings: - cgu-a-policy1-common-cluster-version-policy - cgu-a-policy2-common-pao-sub-policy - cgu-a-policy3-common-ptp-sub-policy placementRules: - cgu-a-policy1-common-cluster-version-policy - cgu-a-policy2-common-pao-sub-policy - cgu-a-policy3-common-ptp-sub-policy remediationPlan: - - spoke1 - - spoke2
- 1
- 定义阻塞 CR。
cgu-a
更新无法启动,直到cgu-c
完成后。
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-b namespace: default spec: blockingCRs: 1 - name: cgu-a namespace: default clusters: - spoke4 - spoke5 enable: false managedPolicies: - policy1-common-cluster-version-policy - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy - policy4-common-sriov-sub-policy remediationStrategy: maxConcurrency: 1 timeout: 240 status: conditions: - message: The ClusterGroupUpgrade CR is not enabled reason: UpgradeNotStarted status: "False" type: Ready copiedPolicies: - cgu-b-policy1-common-cluster-version-policy - cgu-b-policy2-common-pao-sub-policy - cgu-b-policy3-common-ptp-sub-policy - cgu-b-policy4-common-sriov-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-pao-sub-policy namespace: default - name: policy3-common-ptp-sub-policy namespace: default - name: policy4-common-sriov-sub-policy namespace: default placementBindings: - cgu-b-policy1-common-cluster-version-policy - cgu-b-policy2-common-pao-sub-policy - cgu-b-policy3-common-ptp-sub-policy - cgu-b-policy4-common-sriov-sub-policy placementRules: - cgu-b-policy1-common-cluster-version-policy - cgu-b-policy2-common-pao-sub-policy - cgu-b-policy3-common-ptp-sub-policy - cgu-b-policy4-common-sriov-sub-policy remediationPlan: - - spoke4 - - spoke5 status: {}
- 1
cgu-b
更新无法启动,直到cgu-a
完成后。
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-c namespace: default spec: 1 clusters: - spoke6 enable: false managedPolicies: - policy1-common-cluster-version-policy - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy - policy4-common-sriov-sub-policy remediationStrategy: maxConcurrency: 1 timeout: 240 status: conditions: - message: The ClusterGroupUpgrade CR is not enabled reason: UpgradeNotStarted status: "False" type: Ready copiedPolicies: - cgu-c-policy1-common-cluster-version-policy - cgu-c-policy4-common-sriov-sub-policy managedPoliciesCompliantBeforeUpgrade: - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy4-common-sriov-sub-policy namespace: default placementBindings: - cgu-c-policy1-common-cluster-version-policy - cgu-c-policy4-common-sriov-sub-policy placementRules: - cgu-c-policy1-common-cluster-version-policy - cgu-c-policy4-common-sriov-sub-policy remediationPlan: - - spoke6 status: {}
- 1
cgu-c
更新没有任何阻塞 CR。当enable
字段设为true
时,TALM 会启动cgu-c
更新。
通过为每个相关 CR 运行以下命令创建
ClusterGroupUpgrade
CR:$ oc apply -f <name>.yaml
通过为每个相关 CR 运行以下命令启动更新过程:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \ --type merge -p '{"spec":{"enable":true}}'
以下示例显示
enable
字段设为true
的ClusterGroupUpgrade
CR:带有阻塞 CR 的
cgu-a
示例apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-a namespace: default spec: blockingCRs: - name: cgu-c namespace: default clusters: - spoke1 - spoke2 - spoke3 enable: true managedPolicies: - policy1-common-cluster-version-policy - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy remediationStrategy: canaries: - spoke1 maxConcurrency: 2 timeout: 240 status: conditions: - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-c]' 1 reason: UpgradeCannotStart status: "False" type: Ready copiedPolicies: - cgu-a-policy1-common-cluster-version-policy - cgu-a-policy2-common-pao-sub-policy - cgu-a-policy3-common-ptp-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-pao-sub-policy namespace: default - name: policy3-common-ptp-sub-policy namespace: default placementBindings: - cgu-a-policy1-common-cluster-version-policy - cgu-a-policy2-common-pao-sub-policy - cgu-a-policy3-common-ptp-sub-policy placementRules: - cgu-a-policy1-common-cluster-version-policy - cgu-a-policy2-common-pao-sub-policy - cgu-a-policy3-common-ptp-sub-policy remediationPlan: - - spoke1 - - spoke2 status: {}
- 1
- 显示阻塞 CR 的列表。
带有阻塞 CR 的
cgu-b
示例apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-b namespace: default spec: blockingCRs: - name: cgu-a namespace: default clusters: - spoke4 - spoke5 enable: true managedPolicies: - policy1-common-cluster-version-policy - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy - policy4-common-sriov-sub-policy remediationStrategy: maxConcurrency: 1 timeout: 240 status: conditions: - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-a]' 1 reason: UpgradeCannotStart status: "False" type: Ready copiedPolicies: - cgu-b-policy1-common-cluster-version-policy - cgu-b-policy2-common-pao-sub-policy - cgu-b-policy3-common-ptp-sub-policy - cgu-b-policy4-common-sriov-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy2-common-pao-sub-policy namespace: default - name: policy3-common-ptp-sub-policy namespace: default - name: policy4-common-sriov-sub-policy namespace: default placementBindings: - cgu-b-policy1-common-cluster-version-policy - cgu-b-policy2-common-pao-sub-policy - cgu-b-policy3-common-ptp-sub-policy - cgu-b-policy4-common-sriov-sub-policy placementRules: - cgu-b-policy1-common-cluster-version-policy - cgu-b-policy2-common-pao-sub-policy - cgu-b-policy3-common-ptp-sub-policy - cgu-b-policy4-common-sriov-sub-policy remediationPlan: - - spoke4 - - spoke5 status: {}
- 1
- 显示阻塞 CR 的列表。
带有阻塞 CR 的
cgu-c
示例apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-c namespace: default spec: clusters: - spoke6 enable: true managedPolicies: - policy1-common-cluster-version-policy - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy - policy4-common-sriov-sub-policy remediationStrategy: maxConcurrency: 1 timeout: 240 status: conditions: - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant 1 reason: UpgradeNotCompleted status: "False" type: Ready copiedPolicies: - cgu-c-policy1-common-cluster-version-policy - cgu-c-policy4-common-sriov-sub-policy managedPoliciesCompliantBeforeUpgrade: - policy2-common-pao-sub-policy - policy3-common-ptp-sub-policy managedPoliciesForUpgrade: - name: policy1-common-cluster-version-policy namespace: default - name: policy4-common-sriov-sub-policy namespace: default placementBindings: - cgu-c-policy1-common-cluster-version-policy - cgu-c-policy4-common-sriov-sub-policy placementRules: - cgu-c-policy1-common-cluster-version-policy - cgu-c-policy4-common-sriov-sub-policy remediationPlan: - - spoke6 status: currentBatch: 1 remediationPlanForBatch: spoke6: 0
- 1
cgu-c
更新没有任何阻塞 CR。