12.6. 更新受管集群上的策略
Topology Aware Lifecycle Manager (TALM)修复了在 ClusterGroupUpgrade
自定义资源 (CR) 中指定的集群的 inform
策略。TALM 通过 PlacementBinding
CR 中的 bindingOverrides.remediationAction
和 subFilter
的规格控制 Policy
CR 中的 remediationAction
规格来修复 inform
策略。每个策略都有自己的对应的 RHACM 放置规则和 RHACM 放置绑定。
例如,TALM 将每个集群从当前批处理添加到与适用受管策略相对应的放置规则。如果集群已与策略兼容,TALM 会在兼容集群上跳过应用该策略。TALM 然后进入到将下一个策略应用到还没有合规的集群的步骤。TALM 在批处理中完成更新后,所有集群都会从与策略关联的放置规则中删除。然后,下一个批处理的更新会启动。
如果 spoke 集群没有向 RHACM 报告任何合规状态,则 hub 集群上的受管策略可能会缺少 TALM 需要的状态信息。TALM 通过以下方法处理这些情况:
-
如果缺少策略的
status.compliant
字段,TALM 忽略策略并添加日志条目。然后,TALM 继续查看策略的status.status
字段。 -
如果缺少策略的
status.status
,TALM 会生成错误。 -
如果策略的
status.status
字段中缺少集群的合规状态,TALM 会将该集群视为与该策略不兼容。
ClusterGroupUpgrade
CR 的 batchTimeoutAction
决定升级失败时是否有什么情况。您可以指定 continue
跳过失败的集群,并继续升级其他集群,或者指定 abort
以停止所有集群的策略补救。超时后,TALM 会删除它创建的所有资源,以确保对集群不进行进一步的更新。
升级策略示例
apiVersion: policy.open-cluster-management.io/v1 kind: Policy metadata: name: ocp-4.4.16.4 namespace: platform-upgrade spec: disabled: false policy-templates: - objectDefinition: apiVersion: policy.open-cluster-management.io/v1 kind: ConfigurationPolicy metadata: name: upgrade spec: namespaceselector: exclude: - kube-* include: - '*' object-templates: - complianceType: musthave objectDefinition: apiVersion: config.openshift.io/v1 kind: ClusterVersion metadata: name: version spec: channel: stable-4.16 desiredUpdate: version: 4.4.16.4 upstream: https://api.openshift.com/api/upgrades_info/v1/graph status: history: - state: Completed version: 4.4.16.4 remediationAction: inform severity: low remediationAction: inform
有关 RHACM 策略的更多信息,请参阅策略概述。
12.6.1. 使用 TALM 为安装的受管集群配置 Operator 订阅
Topology Aware Lifecycle Manager (TALM) 只能在 Operator 的 Subscription
自定义资源(CR) 包含 status.state.AtLatestKnown
字段时批准 Operator 的安装计划。
流程
将
status.state.AtLatestKnown
字段添加到 Operator 的Subscription
CR 中:Subscription CR 示例
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cluster-logging namespace: openshift-logging annotations: ran.openshift.io/ztp-deploy-wave: "2" spec: channel: "stable" name: cluster-logging source: redhat-operators sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown 1
- 1
status.state: AtLatestKnown
字段用于 Operator 目录中可用的最新 Operator 版本。
注意当 registry 中有新版本的 Operator 时,相关的策略将变为不合规。
-
使用
ClusterGroupUpgrade
CR 将更改的Subscription
策略应用到受管集群。
12.6.2. 将更新策略应用到受管集群
您可以通过应用策略来更新受管集群。
先决条件
- 安装 Topology Aware Lifecycle Manager(TALM)。
- TALM 4.16 需要 RHACM 2.9 或更高版本。
- 置备一个或多个受管集群。
-
以具有
cluster-admin
特权的用户身份登录。 - 在 hub 集群中创建 RHACM 策略。
流程
将
ClusterGroupUpgrade
CR 的内容保存到cgu-1.yaml
文件中。apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-1 namespace: default spec: managedPolicies: 1 - policy1-common-cluster-version-policy - policy2-common-nto-sub-policy - policy3-common-ptp-sub-policy - policy4-common-sriov-sub-policy enable: false clusters: 2 - spoke1 - spoke2 - spoke5 - spoke6 remediationStrategy: maxConcurrency: 2 3 timeout: 240 4 batchTimeoutAction: 5
运行以下命令来创建
ClusterGroupUpgrade
CR:$ oc create -f cgu-1.yaml
运行以下命令,检查 hub 集群中是否已创建
ClusterGroupUpgrade
CR:$ oc get cgu --all-namespaces
输出示例
NAMESPACE NAME AGE STATE DETAILS default cgu-1 8m55 NotEnabled Not Enabled
运行以下命令检查更新的状态:
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
输出示例
{ "computedMaxConcurrency": 2, "conditions": [ { "lastTransitionTime": "2022-02-25T15:34:07Z", "message": "Not enabled", 1 "reason": "NotEnabled", "status": "False", "type": "Progressing" } ], "managedPoliciesContent": { "policy1-common-cluster-version-policy": "null", "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]", "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]", "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]" }, "managedPoliciesForUpgrade": [ { "name": "policy1-common-cluster-version-policy", "namespace": "default" }, { "name": "policy2-common-nto-sub-policy", "namespace": "default" }, { "name": "policy3-common-ptp-sub-policy", "namespace": "default" }, { "name": "policy4-common-sriov-sub-policy", "namespace": "default" } ], "managedPoliciesNs": { "policy1-common-cluster-version-policy": "default", "policy2-common-nto-sub-policy": "default", "policy3-common-ptp-sub-policy": "default", "policy4-common-sriov-sub-policy": "default" }, "placementBindings": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "placementRules": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "remediationPlan": [ [ "spoke1", "spoke2" ], [ "spoke5", "spoke6" ] ], "status": {} }
- 1
ClusterGroupUpgrade
CR 中的spec.enable
字段设置为false
。
运行以下命令,将
spec.enable
字段的值更改为true
:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \ --patch '{"spec":{"enable":true}}' --type=merge
验证
运行以下命令检查更新的状态:
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
输出示例
{ "computedMaxConcurrency": 2, "conditions": [ 1 { "lastTransitionTime": "2022-02-25T15:33:07Z", "message": "All selected clusters are valid", "reason": "ClusterSelectionCompleted", "status": "True", "type": "ClustersSelected" }, { "lastTransitionTime": "2022-02-25T15:33:07Z", "message": "Completed validation", "reason": "ValidationCompleted", "status": "True", "type": "Validated" }, { "lastTransitionTime": "2022-02-25T15:34:07Z", "message": "Remediating non-compliant policies", "reason": "InProgress", "status": "True", "type": "Progressing" } ], "managedPoliciesContent": { "policy1-common-cluster-version-policy": "null", "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]", "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]", "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]" }, "managedPoliciesForUpgrade": [ { "name": "policy1-common-cluster-version-policy", "namespace": "default" }, { "name": "policy2-common-nto-sub-policy", "namespace": "default" }, { "name": "policy3-common-ptp-sub-policy", "namespace": "default" }, { "name": "policy4-common-sriov-sub-policy", "namespace": "default" } ], "managedPoliciesNs": { "policy1-common-cluster-version-policy": "default", "policy2-common-nto-sub-policy": "default", "policy3-common-ptp-sub-policy": "default", "policy4-common-sriov-sub-policy": "default" }, "placementBindings": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "placementRules": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "remediationPlan": [ [ "spoke1", "spoke2" ], [ "spoke5", "spoke6" ] ], "status": { "currentBatch": 1, "currentBatchRemediationProgress": { "spoke1": { "policyIndex": 1, "state": "InProgress" }, "spoke2": { "policyIndex": 1, "state": "InProgress" } }, "currentBatchStartedAt": "2022-02-25T15:54:16Z", "startedAt": "2022-02-25T15:54:16Z" } }
- 1
- 反映当前批处理的更新进度。再次运行该命令以接收有关进度的更新信息。
运行以下命令,检查策略的状态:
oc get policies -A
输出示例
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE spoke1 default.policy1-common-cluster-version-policy enforce Compliant 18m spoke1 default.policy2-common-nto-sub-policy enforce NonCompliant 18m spoke2 default.policy1-common-cluster-version-policy enforce Compliant 18m spoke2 default.policy2-common-nto-sub-policy enforce NonCompliant 18m spoke5 default.policy3-common-ptp-sub-policy inform NonCompliant 18m spoke5 default.policy4-common-sriov-sub-policy inform NonCompliant 18m spoke6 default.policy3-common-ptp-sub-policy inform NonCompliant 18m spoke6 default.policy4-common-sriov-sub-policy inform NonCompliant 18m default policy1-common-ptp-sub-policy inform Compliant 18m default policy2-common-sriov-sub-policy inform NonCompliant 18m default policy3-common-ptp-sub-policy inform NonCompliant 18m default policy4-common-sriov-sub-policy inform NonCompliant 18m
-
spec.remediationAction
值会更改为从当前批处理应用到集群的子策略enforce
值。 -
spec.remedationAction
值会针对剩余的集群中的子策略保持inform
。 -
批处理完成后,
spec.remediationAction
值会重新更改为inform
强制执行的子策略。
-
如果策略包含 Operator 订阅,您可以在单节点集群中直接检查安装进度。
运行以下命令,导出用于检查安装的单节点集群的
KUBECONFIG
文件:$ export KUBECONFIG=<cluster_kubeconfig_absolute_path>
运行以下命令,检查单节点集群中存在的所有订阅,并在您要通过 ClusterGroupUpgrade CR 安装的策略中查找您要通过
ClusterGroupUpgrade
CR 安装的订阅:$ oc get subs -A | grep -i <subscription_name>
cluster-logging
策略的输出示例NAMESPACE NAME PACKAGE SOURCE CHANNEL openshift-logging cluster-logging cluster-logging redhat-operators stable
如果其中一个受管策略包含
ClusterVersion
CR,则根据 spoke 集群运行以下命令来检查当前批处理中的平台更新状态:$ oc get clusterversion
输出示例
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.4.16.5 True True 43s Working towards 4.4.16.7: 71 of 735 done (9% complete)
运行以下命令检查 Operator 订阅:
$ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"
运行以下命令,检查与所需订阅关联的单节点集群中是否存在安装计划:
$ oc get installplan -n <subscription_namespace>
cluster-logging
Operator 的输出示例NAMESPACE NAME CSV APPROVAL APPROVED openshift-logging install-6khtw cluster-logging.5.3.3-4 Manual true 1
- 1
- 安装计划在 TALM 批准安装计划后将其
Approval
字段设置为Manual
,其Approved
字段会从false
改为true
。
注意当 TALM 修复包含订阅的策略时,它会自动批准附加到该订阅的任何安装计划。如果需要多个安装计划将 Operator 升级到最新的已知版本,TALM 可能会批准多个安装计划,通过一个或多个中间版本进行升级以进入最终版本。
运行以下命令,检查正在安装
ClusterGroupUpgrade
的策略的 Operator 的集群服务版本是否已进入Succeeded
阶段:$ oc get csv -n <operator_namespace>
OpenShift Logging Operator 的输出示例
NAME DISPLAY VERSION REPLACES PHASE cluster-logging.5.4.2 Red Hat OpenShift Logging 5.4.2 Succeeded