12.6. 更新受管集群上的策略


Topology Aware Lifecycle Manager (TALM)修复了在 ClusterGroupUpgrade 自定义资源 (CR) 中指定的集群的 inform 策略。TALM 通过 PlacementBinding CR 中的 bindingOverrides.remediationActionsubFilter 的规格控制 Policy CR 中的 remediationAction 规格来修复 inform 策略。每个策略都有自己的对应的 RHACM 放置规则和 RHACM 放置绑定。

例如,TALM 将每个集群从当前批处理添加到与适用受管策略相对应的放置规则。如果集群已与策略兼容,TALM 会在兼容集群上跳过应用该策略。TALM 然后进入到将下一个策略应用到还没有合规的集群的步骤。TALM 在批处理中完成更新后,所有集群都会从与策略关联的放置规则中删除。然后,下一个批处理的更新会启动。

如果 spoke 集群没有向 RHACM 报告任何合规状态,则 hub 集群上的受管策略可能会缺少 TALM 需要的状态信息。TALM 通过以下方法处理这些情况:

  • 如果缺少策略的 status.compliant 字段,TALM 忽略策略并添加日志条目。然后,TALM 继续查看策略的 status.status 字段。
  • 如果缺少策略的 status.status,TALM 会生成错误。
  • 如果策略的 status.status 字段中缺少集群的合规状态,TALM 会将该集群视为与该策略不兼容。

ClusterGroupUpgrade CR 的 batchTimeoutAction 决定升级失败时是否有什么情况。您可以指定 continue 跳过失败的集群,并继续升级其他集群,或者指定 abort 以停止所有集群的策略补救。超时后,TALM 会删除它创建的所有资源,以确保对集群不进行进一步的更新。

升级策略示例

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: ocp-4.4.16.4
  namespace: platform-upgrade
spec:
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: upgrade
      spec:
        namespaceselector:
          exclude:
          - kube-*
          include:
          - '*'
        object-templates:
        - complianceType: musthave
          objectDefinition:
            apiVersion: config.openshift.io/v1
            kind: ClusterVersion
            metadata:
              name: version
            spec:
              channel: stable-4.16
              desiredUpdate:
                version: 4.4.16.4
              upstream: https://api.openshift.com/api/upgrades_info/v1/graph
            status:
              history:
                - state: Completed
                  version: 4.4.16.4
        remediationAction: inform
        severity: low
  remediationAction: inform

有关 RHACM 策略的更多信息,请参阅策略概述

12.6.1. 使用 TALM 为安装的受管集群配置 Operator 订阅

Topology Aware Lifecycle Manager (TALM) 只能在 Operator 的 Subscription 自定义资源(CR) 包含 status.state.AtLatestKnown 字段时批准 Operator 的安装计划。

流程

  1. status.state.AtLatestKnown 字段添加到 Operator 的 Subscription CR 中:

    Subscription CR 示例

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: cluster-logging
      namespace: openshift-logging
      annotations:
        ran.openshift.io/ztp-deploy-wave: "2"
    spec:
      channel: "stable"
      name: cluster-logging
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Manual
    status:
      state: AtLatestKnown 1

    1
    status.state: AtLatestKnown 字段用于 Operator 目录中可用的最新 Operator 版本。
    注意

    当 registry 中有新版本的 Operator 时,相关的策略将变为不合规。

  2. 使用 ClusterGroupUpgrade CR 将更改的 Subscription 策略应用到受管集群。

12.6.2. 将更新策略应用到受管集群

您可以通过应用策略来更新受管集群。

先决条件

  • 安装 Topology Aware Lifecycle Manager(TALM)。
  • TALM 4.16 需要 RHACM 2.9 或更高版本。
  • 置备一个或多个受管集群。
  • 以具有 cluster-admin 特权的用户身份登录。
  • 在 hub 集群中创建 RHACM 策略。

流程

  1. ClusterGroupUpgrade CR 的内容保存到 cgu-1.yaml 文件中。

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-1
      namespace: default
    spec:
      managedPolicies: 1
        - policy1-common-cluster-version-policy
        - policy2-common-nto-sub-policy
        - policy3-common-ptp-sub-policy
        - policy4-common-sriov-sub-policy
      enable: false
      clusters: 2
      - spoke1
      - spoke2
      - spoke5
      - spoke6
      remediationStrategy:
        maxConcurrency: 2 3
        timeout: 240 4
      batchTimeoutAction: 5
    1
    要应用的策略的名称。
    2
    要更新的集群列表。
    3
    maxConcurrency 字段表示同时更新的集群数量。
    4
    更新超时(以分钟为单位)。
    5
    控制批处理超时时会发生什么。可能的值有 abortcontinue。如果未指定,则默认为 continue
  2. 运行以下命令来创建 ClusterGroupUpgrade CR:

    $ oc create -f cgu-1.yaml
    1. 运行以下命令,检查 hub 集群中是否已创建 ClusterGroupUpgrade CR:

      $ oc get cgu --all-namespaces

      输出示例

      NAMESPACE   NAME  AGE  STATE      DETAILS
      default     cgu-1 8m55 NotEnabled Not Enabled

    2. 运行以下命令检查更新的状态:

      $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

      输出示例

      {
        "computedMaxConcurrency": 2,
        "conditions": [
          {
            "lastTransitionTime": "2022-02-25T15:34:07Z",
            "message": "Not enabled", 1
            "reason": "NotEnabled",
            "status": "False",
            "type": "Progressing"
          }
        ],
        "managedPoliciesContent": {
          "policy1-common-cluster-version-policy": "null",
          "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
          "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
          "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
        },
        "managedPoliciesForUpgrade": [
          {
            "name": "policy1-common-cluster-version-policy",
            "namespace": "default"
          },
          {
            "name": "policy2-common-nto-sub-policy",
            "namespace": "default"
          },
          {
            "name": "policy3-common-ptp-sub-policy",
            "namespace": "default"
          },
          {
            "name": "policy4-common-sriov-sub-policy",
            "namespace": "default"
          }
        ],
        "managedPoliciesNs": {
          "policy1-common-cluster-version-policy": "default",
          "policy2-common-nto-sub-policy": "default",
          "policy3-common-ptp-sub-policy": "default",
          "policy4-common-sriov-sub-policy": "default"
        },
        "placementBindings": [
          "cgu-policy1-common-cluster-version-policy",
          "cgu-policy2-common-nto-sub-policy",
          "cgu-policy3-common-ptp-sub-policy",
          "cgu-policy4-common-sriov-sub-policy"
        ],
        "placementRules": [
          "cgu-policy1-common-cluster-version-policy",
          "cgu-policy2-common-nto-sub-policy",
          "cgu-policy3-common-ptp-sub-policy",
          "cgu-policy4-common-sriov-sub-policy"
        ],
        "remediationPlan": [
          [
            "spoke1",
            "spoke2"
          ],
          [
            "spoke5",
            "spoke6"
          ]
        ],
        "status": {}
      }

      1
      ClusterGroupUpgrade CR 中的 spec.enable 字段设置为 false
  3. 运行以下命令,将 spec.enable 字段的值更改为 true

    $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
    --patch '{"spec":{"enable":true}}' --type=merge

验证

  1. 运行以下命令检查更新的状态:

    $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

    输出示例

    {
      "computedMaxConcurrency": 2,
      "conditions": [ 1
        {
          "lastTransitionTime": "2022-02-25T15:33:07Z",
          "message": "All selected clusters are valid",
          "reason": "ClusterSelectionCompleted",
          "status": "True",
          "type": "ClustersSelected"
        },
        {
          "lastTransitionTime": "2022-02-25T15:33:07Z",
          "message": "Completed validation",
          "reason": "ValidationCompleted",
          "status": "True",
          "type": "Validated"
        },
        {
          "lastTransitionTime": "2022-02-25T15:34:07Z",
          "message": "Remediating non-compliant policies",
          "reason": "InProgress",
          "status": "True",
          "type": "Progressing"
        }
      ],
      "managedPoliciesContent": {
        "policy1-common-cluster-version-policy": "null",
        "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
        "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
        "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
      },
      "managedPoliciesForUpgrade": [
        {
          "name": "policy1-common-cluster-version-policy",
          "namespace": "default"
        },
        {
          "name": "policy2-common-nto-sub-policy",
          "namespace": "default"
        },
        {
          "name": "policy3-common-ptp-sub-policy",
          "namespace": "default"
        },
        {
          "name": "policy4-common-sriov-sub-policy",
          "namespace": "default"
        }
      ],
      "managedPoliciesNs": {
        "policy1-common-cluster-version-policy": "default",
        "policy2-common-nto-sub-policy": "default",
        "policy3-common-ptp-sub-policy": "default",
        "policy4-common-sriov-sub-policy": "default"
      },
      "placementBindings": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "placementRules": [
        "cgu-policy1-common-cluster-version-policy",
        "cgu-policy2-common-nto-sub-policy",
        "cgu-policy3-common-ptp-sub-policy",
        "cgu-policy4-common-sriov-sub-policy"
      ],
      "remediationPlan": [
        [
          "spoke1",
          "spoke2"
        ],
        [
          "spoke5",
          "spoke6"
        ]
      ],
      "status": {
        "currentBatch": 1,
        "currentBatchRemediationProgress": {
           "spoke1": {
              "policyIndex": 1,
              "state": "InProgress"
           },
           "spoke2": {
              "policyIndex": 1,
              "state": "InProgress"
           }
        },
        "currentBatchStartedAt": "2022-02-25T15:54:16Z",
        "startedAt": "2022-02-25T15:54:16Z"
      }
    }

    1
    反映当前批处理的更新进度。再次运行该命令以接收有关进度的更新信息。
  2. 运行以下命令,检查策略的状态:

    oc get policies -A

    输出示例

    NAMESPACE   NAME                                        REMEDIATION ACTION    COMPLIANCE STATE     AGE
    spoke1    default.policy1-common-cluster-version-policy enforce               Compliant            18m
    spoke1    default.policy2-common-nto-sub-policy         enforce               NonCompliant         18m
    spoke2    default.policy1-common-cluster-version-policy enforce               Compliant            18m
    spoke2    default.policy2-common-nto-sub-policy         enforce               NonCompliant         18m
    spoke5    default.policy3-common-ptp-sub-policy         inform                NonCompliant         18m
    spoke5    default.policy4-common-sriov-sub-policy       inform                NonCompliant         18m
    spoke6    default.policy3-common-ptp-sub-policy         inform                NonCompliant         18m
    spoke6    default.policy4-common-sriov-sub-policy       inform                NonCompliant         18m
    default   policy1-common-ptp-sub-policy                 inform                Compliant            18m
    default   policy2-common-sriov-sub-policy               inform                NonCompliant         18m
    default   policy3-common-ptp-sub-policy                 inform                NonCompliant         18m
    default   policy4-common-sriov-sub-policy               inform                NonCompliant         18m

    • spec.remediationAction 值会更改为从当前批处理应用到集群的子策略 enforce 值。
    • spec.remedationAction 值会针对剩余的集群中的子策略保持 inform
    • 批处理完成后,spec.remediationAction 值会重新更改为 inform 强制执行的子策略。
  3. 如果策略包含 Operator 订阅,您可以在单节点集群中直接检查安装进度。

    1. 运行以下命令,导出用于检查安装的单节点集群的 KUBECONFIG 文件:

      $ export KUBECONFIG=<cluster_kubeconfig_absolute_path>
    2. 运行以下命令,检查单节点集群中存在的所有订阅,并在您要通过 ClusterGroupUpgrade CR 安装的策略中查找您要通过 ClusterGroupUpgrade CR 安装的订阅:

      $ oc get subs -A | grep -i <subscription_name>

      cluster-logging 策略的输出示例

      NAMESPACE                              NAME                         PACKAGE                      SOURCE             CHANNEL
      openshift-logging                      cluster-logging              cluster-logging              redhat-operators   stable

  4. 如果其中一个受管策略包含 ClusterVersion CR,则根据 spoke 集群运行以下命令来检查当前批处理中的平台更新状态:

    $ oc get clusterversion

    输出示例

    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.4.16.5     True        True          43s     Working towards 4.4.16.7: 71 of 735 done (9% complete)

  5. 运行以下命令检查 Operator 订阅:

    $ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"
  6. 运行以下命令,检查与所需订阅关联的单节点集群中是否存在安装计划:

    $ oc get installplan -n <subscription_namespace>

    cluster-logging Operator 的输出示例

    NAMESPACE                              NAME            CSV                                 APPROVAL   APPROVED
    openshift-logging                      install-6khtw   cluster-logging.5.3.3-4             Manual     true 1

    1
    安装计划在 TALM 批准安装计划后将其 Approval 字段设置为 Manual,其 Approved 字段会从 false 改为 true
    注意

    当 TALM 修复包含订阅的策略时,它会自动批准附加到该订阅的任何安装计划。如果需要多个安装计划将 Operator 升级到最新的已知版本,TALM 可能会批准多个安装计划,通过一个或多个中间版本进行升级以进入最终版本。

  7. 运行以下命令,检查正在安装 ClusterGroupUpgrade 的策略的 Operator 的集群服务版本是否已进入 Succeeded 阶段:

    $ oc get csv -n <operator_namespace>

    OpenShift Logging Operator 的输出示例

    NAME                    DISPLAY                     VERSION   REPLACES   PHASE
    cluster-logging.5.4.2   Red Hat OpenShift Logging   5.4.2                Succeeded

Red Hat logoGithubRedditYoutubeTwitter

学习

尝试、购买和销售

社区

关于红帽文档

通过我们的产品和服务,以及可以信赖的内容,帮助红帽用户创新并实现他们的目标。

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

© 2024 Red Hat, Inc.