11.6. Updating policies on managed clusters
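The Topology Aware Lifecycle Manager (TALM) remediates a set of inform policies for the clusters specified in the ClusterGroupUpgrade custom resource (CR). TALM remediates inform policies by making enforce copies of the managed RHACM policies. Each copied policy has its own corresponding RHACM placement rule and RHACM placement binding.
TALM adds each cluster from the current batch to the placement rule that corresponds to the applicable managed policy. If a cluster is already compliant with a policy, TALM skips applying that policy on the compliant cluster. TALM then moves on to applying the next policy to the clusters that are not yet compliant. After TALM completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.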
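The batch flow described above can be sketched in a few lines of Python. This is a simplified model, not TALM's implementation: the function name `remediate_batch`, the in-memory `placement` dictionary, and the `compliant` set are all hypothetical stand-ins for the placement rules and compliance data that TALM reads from the hub cluster.

```python
def remediate_batch(batch, policies, compliant):
    """Return the (cluster, policy) pairs that would be enforced, in order.

    batch     -- cluster names in the current batch
    policies  -- managed policy names, in the order they are applied
    compliant -- set of (cluster, policy) pairs that are already compliant
    """
    enforced = []
    # One placement rule per copied (enforce) policy, modelled as a set.
    placement = {policy: set() for policy in policies}
    for policy in policies:
        for cluster in batch:
            if (cluster, policy) in compliant:
                continue  # already compliant: skip this policy for this cluster
            placement[policy].add(cluster)
            enforced.append((cluster, policy))
    # When the batch is done, every cluster leaves the placement rules.
    for clusters in placement.values():
        clusters.clear()
    return enforced
```

For example, with a batch of `spoke1` and `spoke2`, where `spoke1` already complies with the first policy, only `spoke2` is placed under that policy, and both clusters are placed under the second one.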
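If a spoke cluster does not report any compliance state to RHACM, the managed policies on the hub cluster can be missing status information that TALM needs. TALM handles these cases in the following ways:
- If a policy's status.compliant field is missing, TALM ignores the missing field and adds a log entry. Then, TALM continues by looking at the policy's status.status field.
- If a policy's status.status field is missing, TALM produces an error.
- If a cluster's compliance status is missing from the policy's status.status field, TALM considers that cluster to be non-compliant with that policy.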
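The three rules above can be read as a small decision function. The following Python sketch is illustrative only. It assumes that each entry in status.status is a dictionary with `clustername` and `compliant` keys, which is an assumption about the policy status layout rather than something stated in this section, and the function and logger names are hypothetical.

```python
import logging

log = logging.getLogger("talm-sketch")  # hypothetical logger name


def cluster_is_compliant(policy, cluster):
    """Apply the three missing-status rules to one policy and one cluster."""
    status = policy.get("status", {})
    if "compliant" not in status:
        # Rule 1: log the missing field, then fall through to status.status.
        log.info("status.compliant is missing; checking status.status instead")
    per_cluster = status.get("status")
    if per_cluster is None:
        # Rule 2: a missing status.status is an error.
        raise ValueError("policy has no status.status field")
    for entry in per_cluster:
        if entry.get("clustername") == cluster:
            return entry.get("compliant") == "Compliant"
    # Rule 3: no state reported for this cluster means non-compliant.
    return False
```

A cluster that never reported its state is therefore remediated like any other non-compliant cluster, while a policy with no per-cluster status at all stops processing with an error.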
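The batchTimeoutAction field of the ClusterGroupUpgrade CR determines what happens if an upgrade fails for a cluster. You can specify continue to skip the failing cluster and continue to upgrade the other clusters, or specify abort to stop the policy remediation for all clusters. After the timeout elapses, TALM removes all enforce policies to ensure that no further updates are made to the clusters.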
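The difference between the two batchTimeoutAction values can be sketched as follows. This is a toy model under stated assumptions: `on_batch_timeout` and its arguments are hypothetical names, and the sketch only decides which batches remain to be processed after a batch times out.

```python
def on_batch_timeout(action, batch, compliant_clusters, remaining_batches):
    """Return (failed_clusters, batches_still_to_run) after a batch times out."""
    # Clusters in the timed-out batch that did not reach compliance.
    failed = [c for c in batch if c not in compliant_clusters]
    if action == "continue":
        # Skip the failed clusters and carry on with the other batches.
        return failed, list(remaining_batches)
    if action == "abort":
        # Stop policy remediation for all clusters.
        return failed, []
    raise ValueError(f"unsupported batchTimeoutAction: {action!r}")
```

With `continue`, a timeout in the first batch still leaves the second batch to run. With `abort`, nothing further is attempted.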
Example upgrade policy

    apiVersion: policy.open-cluster-management.io/v1
    kind: Policy
    metadata:
      name: ocp-4.4.15.4
      namespace: platform-upgrade
    spec:
      disabled: false
      policy-templates:
      - objectDefinition:
          apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          metadata:
            name: upgrade
          spec:
            namespaceselector:
              exclude:
              - kube-*
              include:
              - '*'
            object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: config.openshift.io/v1
                kind: ClusterVersion
                metadata:
                  name: version
                spec:
                  channel: stable-4.15
                  desiredUpdate:
                    version: 4.4.15.4
                  upstream: https://api.openshift.com/api/upgrades_info/v1/graph
                status:
                  history:
                  - state: Completed
                    version: 4.4.15.4
            remediationAction: inform
            severity: low
      remediationAction: inform
For more information about policies in RHACM, see Policy overview.
Additional resources
For more information about the PolicyGenTemplate CRD, see About the PolicyGenTemplate CRD.
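11.6.1. Configuring Operator subscriptions for managed clusters that you install with TALM
Topology Aware Lifecycle Manager (TALM) can approve the install plan for an Operator only if the Subscription custom resource (CR) of the Operator contains the status.state.AtLatestKnown field.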
Procedure
Add the status.state.AtLatestKnown field to the Subscription CR of the Operator:
Example Subscription CR

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: cluster-logging
      namespace: openshift-logging
      annotations:
        ran.openshift.io/ztp-deploy-wave: "2"
    spec:
      channel: "stable"
      name: cluster-logging
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Manual
    status:
      state: AtLatestKnown  # (1)

- (1) The status.state: AtLatestKnown field is used for the latest Operator version available from the Operator catalog.
Note: When a new version of the Operator is available in the registry, the associated policy becomes non-compliant.
Apply the changed Subscription policy to your managed clusters with a ClusterGroupUpgrade CR.
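To see why a newer Operator version in the registry makes the policy non-compliant, it can help to model the check as a subset comparison between the desired Subscription fields and the live object. The Python sketch below is a deliberately simplified reading of the musthave compliance type used in the policy example earlier: it only handles nested dictionaries and scalar values, `must_have` is a hypothetical helper name, and the live objects are invented sample data.

```python
def must_have(desired, actual):
    """True if every key and value in `desired` also appears in `actual`."""
    if isinstance(desired, dict):
        return isinstance(actual, dict) and all(
            key in actual and must_have(value, actual[key])
            for key, value in desired.items()
        )
    return desired == actual


desired = {"spec": {"channel": "stable"}, "status": {"state": "AtLatestKnown"}}

# Hypothetical live Subscription objects on a managed cluster.
up_to_date = {"spec": {"channel": "stable", "name": "cluster-logging"},
              "status": {"state": "AtLatestKnown"}}
upgrade_waiting = {"spec": {"channel": "stable", "name": "cluster-logging"},
                   "status": {"state": "UpgradePending"}}

print(must_have(desired, up_to_date))
print(must_have(desired, upgrade_waiting))
```

Under this model, the policy only matches while the subscription reports AtLatestKnown, so any other state leaves the cluster non-compliant until the pending install plan is approved and completes.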
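11.6.2. Applying update policies to managed clusters
You can update your managed clusters by applying your policies.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.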
Procedure
Save the contents of the ClusterGroupUpgrade CR in the cgu-1.yaml file.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-1
      namespace: default
    spec:
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-nto-sub-policy
      - policy3-common-ptp-sub-policy
      - policy4-common-sriov-sub-policy
      enable: false
      clusters:
      - spoke1
      - spoke2
      - spoke5
      - spoke6
      remediationStrategy:
        maxConcurrency: 2
        timeout: 240
      batchTimeoutAction:
Create the ClusterGroupUpgrade CR by running the following command:

    $ oc create -f cgu-1.yaml

Check that the ClusterGroupUpgrade CR was created in the hub cluster by running the following command:

    $ oc get cgu --all-namespaces

Example output

    NAMESPACE   NAME    AGE    STATE        DETAILS
    default     cgu-1   8m55   NotEnabled   Not Enabled
Check the status of the update by running the following command:

    $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

Example output

    {
      "computedMaxConcurrency": 2,
      "conditions": [
        {
          "lastTransitionTime": "2022-02-25T15:34:07Z",
          "message": "Not enabled",  (1)
          "reason": "NotEnabled",
          "status": "False",
          "type": "Progressing"
        }
      ],
      "copiedPolicies": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "managedPoliciesContent": {
        "policy1-common-cluster-version-policy": "null",
        "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
        "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
        "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
      },
      "managedPoliciesForUpgrade": [
        { "name": "policy1-common-cluster-version-policy", "namespace": "default" },
        { "name": "policy2-common-nto-sub-policy", "namespace": "default" },
        { "name": "policy3-common-ptp-sub-policy", "namespace": "default" },
        { "name": "policy4-common-sriov-sub-policy", "namespace": "default" }
      ],
      "managedPoliciesNs": {
        "policy1-common-cluster-version-policy": "default",
        "policy2-common-nto-sub-policy": "default",
        "policy3-common-ptp-sub-policy": "default",
        "policy4-common-sriov-sub-policy": "default"
      },
      "placementBindings": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "placementRules": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "precaching": {
        "spec": {}
      },
      "remediationPlan": [
        [ "spoke1", "spoke2" ],
        [ "spoke5", "spoke6" ]
      ],
      "status": {}
    }

- (1) The spec.enable field in the ClusterGroupUpgrade CR is set to false.
Check the status of the policies by running the following command:

    $ oc get policies -A

Example output

    NAMESPACE   NAME                                        REMEDIATION ACTION   COMPLIANCE STATE   AGE
    default     cgu-policy1-common-cluster-version-policy   enforce                                 17m  (1)
    default     cgu-policy2-common-nto-sub-policy           enforce                                 17m
    default     cgu-policy3-common-ptp-sub-policy           enforce                                 17m
    default     cgu-policy4-common-sriov-sub-policy         enforce                                 17m
    default     policy1-common-cluster-version-policy       inform               NonCompliant       15h
    default     policy2-common-nto-sub-policy               inform               NonCompliant       15h
    default     policy3-common-ptp-sub-policy               inform               NonCompliant       18m
    default     policy4-common-sriov-sub-policy             inform               NonCompliant       18m

- (1) The spec.remediationAction field of the policies currently applied on the clusters is set to enforce. The managed policies from the ClusterGroupUpgrade CR that are in inform mode remain in inform mode during the update.
Change the value of the spec.enable field to true by running the following command:

    $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
      --patch '{"spec":{"enable":true}}' --type=merge
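The remediationPlan shown in the status output groups the four clusters from the CR into two batches of two, which is consistent with maxConcurrency: 2. A minimal Python sketch of that grouping follows. It assumes that clusters are batched in the order they are listed, which matches the example output but is not stated explicitly in this section, and `remediation_plan` is a hypothetical helper name.

```python
def remediation_plan(clusters, max_concurrency):
    """Split the cluster list into consecutive batches of at most max_concurrency."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return [
        clusters[start:start + max_concurrency]
        for start in range(0, len(clusters), max_concurrency)
    ]


plan = remediation_plan(["spoke1", "spoke2", "spoke5", "spoke6"], 2)
print(plan)
```

Raising maxConcurrency shortens the plan to fewer, larger batches, at the cost of updating more clusters at the same time.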
Verification
Check the status of the update again by running the following command:

    $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

Example output

    {
      "computedMaxConcurrency": 2,
      "conditions": [  (1)
        {
          "lastTransitionTime": "2022-02-25T15:33:07Z",
          "message": "All selected clusters are valid",
          "reason": "ClusterSelectionCompleted",
          "status": "True",
          "type": "ClustersSelected"
        },
        {
          "lastTransitionTime": "2022-02-25T15:33:07Z",
          "message": "Completed validation",
          "reason": "ValidationCompleted",
          "status": "True",
          "type": "Validated"
        },
        {
          "lastTransitionTime": "2022-02-25T15:34:07Z",
          "message": "Remediating non-compliant policies",
          "reason": "InProgress",
          "status": "True",
          "type": "Progressing"
        }
      ],
      "copiedPolicies": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "managedPoliciesContent": {
        "policy1-common-cluster-version-policy": "null",
        "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
        "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
        "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
      },
      "managedPoliciesForUpgrade": [
        { "name": "policy1-common-cluster-version-policy", "namespace": "default" },
        { "name": "policy2-common-nto-sub-policy", "namespace": "default" },
        { "name": "policy3-common-ptp-sub-policy", "namespace": "default" },
        { "name": "policy4-common-sriov-sub-policy", "namespace": "default" }
      ],
      "managedPoliciesNs": {
        "policy1-common-cluster-version-policy": "default",
        "policy2-common-nto-sub-policy": "default",
        "policy3-common-ptp-sub-policy": "default",
        "policy4-common-sriov-sub-policy": "default"
      },
      "placementBindings": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "placementRules": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "precaching": {
        "spec": {}
      },
      "remediationPlan": [
        [ "spoke1", "spoke2" ],
        [ "spoke5", "spoke6" ]
      ],
      "status": {
        "currentBatch": 1,
        "currentBatchStartedAt": "2022-02-25T15:54:16Z",
        "remediationPlanForBatch": {
          "spoke1": 0,
          "spoke2": 1
        },
        "startedAt": "2022-02-25T15:54:16Z"
      }
    }

- (1) Reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
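When you poll the status repeatedly, it can be convenient to reduce the JSON to the few fields that change between runs. The following Python sketch takes the parsed `.status` object and returns a short summary. It only reads fields that appear in the example output above. The function name `summarize` is hypothetical, and the sketch does not interpret the numeric values in remediationPlanForBatch.

```python
def summarize(status):
    """Reduce a ClusterGroupUpgrade .status object to a short progress summary."""
    progressing = next(
        (c for c in status.get("conditions", []) if c.get("type") == "Progressing"),
        None,
    )
    run = status.get("status", {})
    return {
        "progressing": bool(progressing and progressing.get("status") == "True"),
        "message": progressing.get("message") if progressing else None,
        "current_batch": run.get("currentBatch"),
        "clusters_in_batch": sorted(run.get("remediationPlanForBatch", {})),
    }
```

One way to use it is to load the command output with `json.loads` and pass the resulting dictionary to `summarize`. Before the CR is enabled, `current_batch` is None and `progressing` is False.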
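If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
Export the KUBECONFIG file of the single-node cluster that you want to check the installation progress for by running the following command:

    $ export KUBECONFIG=<cluster_kubeconfig_absolute_path>

Check all the subscriptions present on the single-node cluster and look for the one in the policy that you are trying to install through the ClusterGroupUpgrade CR by running the following command:

    $ oc get subs -A | grep -i <subscription_name>

Example output for the cluster-logging policy

    NAMESPACE           NAME              PACKAGE           SOURCE             CHANNEL
    openshift-logging   cluster-logging   cluster-logging   redhat-operators   stable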
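If one of the managed policies includes a ClusterVersion CR, check the status of the platform update in the current batch by running the following command against the spoke cluster:

    $ oc get clusterversion

Example output

    NAME      VERSION    AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.4.15.5   True        True          43s     Working towards 4.4.15.7: 71 of 735 done (9% complete)

Check the Operator subscription by running the following command:

    $ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"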
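Check the install plans present on the single-node cluster that are associated with the desired subscription by running the following command:

    $ oc get installplan -n <subscription_namespace>

Example output for the cluster-logging Operator

    NAMESPACE           NAME            CSV                       APPROVAL   APPROVED
    openshift-logging   install-6khtw   cluster-logging.5.3.3-4   Manual     true  (1)

- (1) The install plan has its Approval field set to Manual, and its Approved field changes from false to true after TALM approves the install plan.
Note: When TALM remediates a policy that contains a subscription, it automatically approves any install plans attached to that subscription. If multiple install plans are needed to get the Operator to the latest known version, TALM might approve multiple install plans, upgrading through one or more intermediate versions to reach the final version.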
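Because an upgrade can pass through several install plans, it may be useful to list the ones that are still waiting for approval. The Python sketch below works on the parsed output of `oc get installplan -n <subscription_namespace> -o json` and relies on the spec.approval and spec.approved fields of the InstallPlan resource. The function name `pending_install_plans` is hypothetical.

```python
def pending_install_plans(install_plan_list):
    """Return the names of Manual install plans that are not yet approved."""
    pending = []
    for item in install_plan_list.get("items", []):
        spec = item.get("spec", {})
        if spec.get("approval") == "Manual" and not spec.get("approved", False):
            pending.append(item["metadata"]["name"])
    return pending
```

An empty result while the subscription is still not at AtLatestKnown suggests that the next install plan has not been created yet, so the check is worth repeating after a short wait.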
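Check whether the cluster service version for the Operator of the policy that the ClusterGroupUpgrade CR is installing has reached the Succeeded phase by running the following command:

    $ oc get csv -n <operator_namespace>

Example output for the OpenShift Logging Operator

    NAME                    DISPLAY                     VERSION   REPLACES   PHASE
    cluster-logging.5.4.2   Red Hat OpenShift Logging   5.4.2                Succeeded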
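The same check can be scripted against the JSON form of the command output. The sketch below assumes that you pass it the parsed result of `oc get csv -n <operator_namespace> -o json` and that each item carries its phase in status.phase. The function name `csvs_not_succeeded` is hypothetical.

```python
def csvs_not_succeeded(csv_list):
    """Return the names of cluster service versions whose phase is not Succeeded."""
    return [
        item["metadata"]["name"]
        for item in csv_list.get("items", [])
        if item.get("status", {}).get("phase") != "Succeeded"
    ]
```

An empty list means that every cluster service version in the namespace has reached the Succeeded phase.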