18.5. About the ClusterGroupUpgrade CR
The Topology Aware Lifecycle Manager (TALM) builds the remediation plan for a group of clusters from the ClusterGroupUpgrade CR. You can define the following specifications in a ClusterGroupUpgrade CR:
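- Clusters in the group
- Blocking ClusterGroupUpgrade CRs
- Applicable list of managed policies
- Number of concurrent updates
- Applicable canary updates
- Actions to perform before and after the update
- Update timing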
As TALM works through remediation of the policies on the specified clusters, the ClusterGroupUpgrade CR can have the following states:
- UpgradeNotStarted
- UpgradeCannotStart
- UpgradeNotCompleted
- UpgradeTimedOut
- UpgradeCompleted
- PrecachingRequired
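The current state can be read from the CR itself. The following is a minimal sketch, assuming that the state is reported as the reason of the Ready condition, as in the example CRs in this section. The function name cgu_state is hypothetical, and default is used as the fallback namespace only because the examples here use it.

```shell
# cgu_state NAME [NAMESPACE]
# Prints the reason of the Ready condition of a ClusterGroupUpgrade CR,
# for example UpgradeNotStarted or UpgradeCompleted.
# Assumption: the state is carried in the Ready condition's reason field.
cgu_state() {
  name=$1
  namespace=${2:-default}
  oc --namespace="$namespace" get "clustergroupupgrade.ran.openshift.io/$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'
}

# Usage (requires a logged-in oc session on the hub cluster):
#   cgu_state cgu-a default
```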
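After TALM completes a cluster update, the cluster does not update again under the control of the same ClusterGroupUpgrade CR. You must create a new ClusterGroupUpgrade CR in the following cases:
- When you need to update the cluster again
- When the cluster becomes non-compliant with the inform policy after being updated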
18.5.1. The UpgradeNotStarted state
The initial state of the ClusterGroupUpgrade CR is UpgradeNotStarted.
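TALM builds a remediation plan based on the following fields:
- The clusterSelector field specifies the labels of the clusters that you want to update.
- The clusters field specifies a list of clusters to update.
- The canaries field specifies the clusters for canary updates.
- The maxConcurrency field specifies the number of clusters to update in a batch.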
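You can use the clusters and clusterSelector fields together to create a combined list of clusters.
The remediation plan starts with the clusters listed in the canaries field. Each canary cluster forms its own single-cluster batch.
Any failure during the update of a canary cluster stops the update process.
The ClusterGroupUpgrade CR transitions to the UpgradeNotCompleted state after the remediation plan is successfully created and the enable field is set to true. At this point, TALM starts to update the non-compliant clusters with the specified managed policies.
You can change the spec fields only while the ClusterGroupUpgrade CR is in the UpgradeNotStarted or UpgradeCannotStart state.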
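The batching rules above can be illustrated with a short shell sketch. It is not TALM's implementation: the function name build_plan is hypothetical, and this section does not state how TALM orders the remaining clusters or handles a cluster that is listed both as a canary and as a regular cluster. The sketch assumes input order is kept and that canaries are not scheduled a second time.

```shell
# build_plan MAX_CONCURRENCY "CANARIES" "CLUSTERS"
# Prints one batch per line: each canary cluster alone, then the remaining
# clusters in groups of at most MAX_CONCURRENCY.
build_plan() {
  max=$1
  canaries=$2
  clusters=$3
  # Each canary cluster forms its own batch.
  for c in $canaries; do
    echo "$c"
  done
  batch=""
  count=0
  for c in $clusters; do
    # Assumption: clusters already scheduled as canaries are skipped.
    case " $canaries " in
      *" $c "*) continue ;;
    esac
    batch="${batch:+$batch }$c"
    count=$((count + 1))
    if [ "$count" -eq "$max" ]; then
      echo "$batch"
      batch=""
      count=0
    fi
  done
  # Emit the last, partially filled batch, if any.
  if [ -n "$batch" ]; then
    echo "$batch"
  fi
}

build_plan 2 "spoke1" "spoke1 spoke2 spoke3 spoke4"
# prints:
#   spoke1
#   spoke2 spoke3
#   spoke4
```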
Example ClusterGroupUpgrade CR in the UpgradeNotStarted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  clusters:
  - spoke1
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-nto-sub-policy
  remediationStrategy:
    canaries:
    - spoke1
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-nto-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-nto-sub-policy
    namespace: default
  placementBindings:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-nto-sub-policy
  placementRules:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-nto-sub-policy
  remediationPlan:
  - - spoke1
18.5.2. The UpgradeCannotStart state
In the UpgradeCannotStart state, the update cannot start for one of the following reasons:
- Blocking CRs are missing from the system
- Blocking CRs have not yet finished
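18.5.3. The UpgradeNotCompleted state
In the UpgradeNotCompleted state, TALM enforces the policies by following the remediation plan defined in the UpgradeNotStarted state.
Policy enforcement for the next batch starts as soon as all the clusters in the current batch are compliant with all the managed policies. If a batch times out, TALM moves on to the next batch. The timeout value of a batch is the spec.timeout field divided by the number of batches in the remediation plan.
The managed policies are applied in the order in which they are listed in the managedPolicies field of the ClusterGroupUpgrade CR. One managed policy is applied to the specified clusters at a time. After the specified clusters comply with the current policy, the next managed policy is applied to the next non-compliant cluster.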
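As a worked example of the batch timeout rule, the following sketch divides a timeout value by the number of batches. The function name batch_timeout is hypothetical, and integer division is an assumption: this section does not say how TALM rounds the result.

```shell
# batch_timeout TOTAL_TIMEOUT NUMBER_OF_BATCHES
# Prints the per-batch timeout, in the same unit as TOTAL_TIMEOUT.
# Assumption: integer division; TALM's rounding is not described here.
batch_timeout() {
  total=$1
  batches=$2
  if [ "$batches" -le 0 ]; then
    echo "number of batches must be positive" >&2
    return 1
  fi
  echo $((total / batches))
}

batch_timeout 240 1   # one batch: prints 240
batch_timeout 240 3   # three batches: prints 80
```

With the timeout of 240 used in the examples in this section, a remediation plan with a single batch gets the full 240, and a plan with three batches gets 80 per batch.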
Example ClusterGroupUpgrade CR in the UpgradeNotCompleted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-nto-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant
    reason: UpgradeNotCompleted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-nto-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-nto-sub-policy
    namespace: default
  placementBindings:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-nto-sub-policy
  placementRules:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-nto-sub-policy
  remediationPlan:
  - - spoke1
  status:
    currentBatch: 1
    remediationPlanForBatch:
      spoke1: 0
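18.5.4. The UpgradeTimedOut state
In the UpgradeTimedOut state, TALM checks every hour whether all the policies for the ClusterGroupUpgrade CR are compliant. The checks continue until the ClusterGroupUpgrade CR is deleted or the updates are completed. The periodic checks allow the updates to complete if they are prolonged by network, CPU, or other issues.
TALM transitions to the UpgradeTimedOut state in two cases:
- When the current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.
- When the clusters do not comply with the managed policies within the timeout value specified in the remediationStrategy field.
If the policies are compliant, TALM transitions to the UpgradeCompleted state.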
18.5.5. The UpgradeCompleted state
In the UpgradeCompleted state, the cluster updates are complete.
Example ClusterGroupUpgrade CR in the UpgradeCompleted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  actions:
    afterCompletion:
      deleteObjects: true
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-nto-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies
    reason: UpgradeCompleted
    status: "True"
    type: Ready
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-nto-sub-policy
    namespace: default
  remediationPlan:
  - - spoke1
  status:
    remediationPlanForBatch:
      spoke1: -2
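In the PrecachingRequired state, the clusters need to have images pre-cached before the update can start. For more information about pre-caching, see the "Using the container image pre-cache feature" section.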
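18.5.6. Blocking ClusterGroupUpgrade CRs
You can create multiple ClusterGroupUpgrade CRs and control the order in which they are applied.
For example, if you create ClusterGroupUpgrade CR C that blocks the start of ClusterGroupUpgrade CR A, then ClusterGroupUpgrade CR A cannot start until ClusterGroupUpgrade CR C reaches the UpgradeCompleted state.
One ClusterGroupUpgrade CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.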
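The "all blocking CRs must complete" rule can be sketched as a small shell helper. This is an illustration only, not TALM's logic: the function name blocking_pending and the NAME=STATE argument format are hypothetical, and cgu-d is a made-up CR name used to show a second blocking CR.

```shell
# blocking_pending NAME=STATE...
# Prints the names of the blocking CRs that have not reached UpgradeCompleted.
# An empty result means that the blocked CR is free to start.
# Any other state, including an empty one for a missing CR, counts as pending.
blocking_pending() {
  pending=""
  for entry in "$@"; do
    name=${entry%%=*}
    state=${entry#*=}
    if [ "$state" != "UpgradeCompleted" ]; then
      pending="${pending:+$pending }$name"
    fi
  done
  echo "$pending"
}

blocking_pending "cgu-c=UpgradeNotCompleted" "cgu-d=UpgradeCompleted"
# prints: cgu-c
```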
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
1. Save the content of the ClusterGroupUpgrade CRs in the cgu-a.yaml, cgu-b.yaml, and cgu-c.yaml files.

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-a
  namespace: default
spec:
  blockingCRs: # (1)
  - name: cgu-c
    namespace: default
  clusters:
  - spoke1
  - spoke2
  - spoke3
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  remediationStrategy:
    canaries:
    - spoke1
    maxConcurrency: 2
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  placementBindings:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  placementRules:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  remediationPlan:
  - - spoke1
  - - spoke2

(1) Defines the blocking CRs. The cgu-a update cannot start until cgu-c is complete.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-b
  namespace: default
spec:
  blockingCRs: # (1)
  - name: cgu-a
    namespace: default
  clusters:
  - spoke4
  - spoke5
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke4
  - - spoke5
  status: {}

(1) The cgu-b update cannot start until cgu-a is complete.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-c
  namespace: default
spec: # (1)
  clusters:
  - spoke6
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  managedPoliciesCompliantBeforeUpgrade:
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke6
  status: {}

(1) The cgu-c update does not have any blocking CRs. TALM starts the cgu-c update when the enable field is set to true.
2. Create the ClusterGroupUpgrade CRs by running the following command for each relevant CR:

$ oc apply -f <name>.yaml

3. Start the update process by running the following command for each relevant CR:

$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \
--type merge -p '{"spec":{"enable":true}}'

The following examples show ClusterGroupUpgrade CRs where the enable field is set to true:

Example for cgu-a with blocking CRs

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-a
  namespace: default
spec:
  blockingCRs:
  - name: cgu-c
    namespace: default
  clusters:
  - spoke1
  - spoke2
  - spoke3
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  remediationStrategy:
    canaries:
    - spoke1
    maxConcurrency: 2
    timeout: 240
status:
  conditions:
  - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-c]' # (1)
    reason: UpgradeCannotStart
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  placementBindings:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  placementRules:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  remediationPlan:
  - - spoke1
  - - spoke2
  status: {}

(1) Shows the list of blocking CRs.
Example for cgu-b with blocking CRs

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-b
  namespace: default
spec:
  blockingCRs:
  - name: cgu-a
    namespace: default
  clusters:
  - spoke4
  - spoke5
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-a]' # (1)
    reason: UpgradeCannotStart
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke4
  - - spoke5
  status: {}

(1) Shows the list of blocking CRs.
Example for cgu-c, which has no blocking CRs

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-c
  namespace: default
spec:
  clusters:
  - spoke6
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant # (1)
    reason: UpgradeNotCompleted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  managedPoliciesCompliantBeforeUpgrade:
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke6
  status:
    currentBatch: 1
    remediationPlanForBatch:
      spoke6: 0

(1) The cgu-c update does not have any blocking CRs.
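The create and enable steps of this procedure can be wrapped in one helper that loops over the CR names. This is a convenience sketch rather than part of the documented procedure: the function name apply_and_enable is hypothetical, and it assumes that each CR is saved as NAME.yaml in the current directory, as in the first step. The oc commands are the ones shown in the procedure.

```shell
# apply_and_enable NAMESPACE NAME...
# Creates every CR from NAME.yaml, then sets spec.enable to true on each one.
# Stops at the first oc command that fails.
apply_and_enable() {
  namespace=$1
  shift
  for name in "$@"; do
    oc apply -f "${name}.yaml" || return 1
  done
  for name in "$@"; do
    oc --namespace="$namespace" patch "clustergroupupgrade.ran.openshift.io/${name}" \
      --type merge -p '{"spec":{"enable":true}}' || return 1
  done
}

# Usage (requires a logged-in oc session on the hub cluster):
#   apply_and_enable default cgu-a cgu-b cgu-c
```

All CRs are created before any of them is enabled, so that a blocking CR such as cgu-c already exists when cgu-a is enabled.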