Chapter 16. Topology Aware Lifecycle Manager for cluster updates
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple single-node OpenShift clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.
Topology Aware Lifecycle Manager is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
16.1. About the Topology Aware Lifecycle Manager configuration
The Topology Aware Lifecycle Manager (TALM) manages the deployment of Red Hat Advanced Cluster Management (RHACM) policies for one or more OpenShift Container Platform clusters. Using TALM in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With TALM, you can control the following actions:
- The timing of the update
- The number of RHACM-managed clusters
- The subset of managed clusters to apply the policies to
- The update order of the clusters
- The set of policies remediated to the cluster
- The order of policies remediated to the cluster
TALM supports the orchestration of the OpenShift Container Platform y-stream and z-stream updates, and day-two operations on y-streams and z-streams.
16.2. About managed policies used with Topology Aware Lifecycle Manager
The Topology Aware Lifecycle Manager (TALM) uses RHACM policies for cluster updates.
TALM can be used to manage the rollout of any policy CR where the remediationAction field is set to inform. Supported use cases include the following:
- Manual user creation of policy CRs
- Automatically generated policies from the PolicyGenTemplate custom resource definition (CRD)
For policies that update an Operator subscription with manual approval, TALM provides additional functionality that approves the installation of the updated Operator.
For more information about managed policies, see Policy Overview in the RHACM documentation.
For more information about the PolicyGenTemplate CRD, see the "About the PolicyGenTemplate CRD" section in "Configuring managed clusters with policies and PolicyGenTemplate resources".
16.3. Installing the Topology Aware Lifecycle Manager by using the web console
You can use the OpenShift Container Platform web console to install the Topology Aware Lifecycle Manager.
Prerequisites
- Install the latest version of the RHACM Operator.
- Set up a hub cluster with a disconnected registry.
- Log in as a user with cluster-admin privileges.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
- Search for the Topology Aware Lifecycle Manager from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode ["All namespaces on the cluster (default)"] and Installed Namespace ("openshift-operators") to ensure that the Operator is installed properly.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the All Namespaces namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs in any containers in the cluster-group-upgrades-controller-manager pod that are reporting issues.
16.4. Installing the Topology Aware Lifecycle Manager by using the CLI
You can use the OpenShift CLI (oc) to install the Topology Aware Lifecycle Manager (TALM).
Prerequisites
- Install the OpenShift CLI (oc).
- Install the latest version of the RHACM Operator.
- Set up a hub cluster with a disconnected registry.
- Log in as a user with cluster-admin privileges.
Procedure
- Create a Subscription CR:
  - Define the Subscription CR and save the YAML file, for example, talm-subscription.yaml:
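    The following is a minimal sketch of such a Subscription CR. The channel and the Operator package name are assumptions based on a typical TALM installation; verify them against the catalog in your environment.

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-topology-aware-lifecycle-manager-subscription
        namespace: openshift-operators
      spec:
        channel: "stable"                       # assumed channel name
        name: topology-aware-lifecycle-manager  # assumed package name in the catalog
        source: redhat-operators
        sourceNamespace: openshift-marketplace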
  - Create the Subscription CR by running the following command:

      $ oc create -f talm-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
    $ oc get csv -n openshift-operators

  Example output

    NAME                                                    DISPLAY                            VERSION               REPLACES   PHASE
    topology-aware-lifecycle-manager.4.10.0-202206301927    Topology Aware Lifecycle Manager   4.10.0-202206301927              Succeeded

  Verify that the TALM is up and running:

    $ oc get deploy -n openshift-operators

  Example output

    NAMESPACE             NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
    openshift-operators   cluster-group-upgrades-controller-manager   1/1     1            1           14s
16.5. About the ClusterGroupUpgrade CR
The Topology Aware Lifecycle Manager (TALM) builds the remediation plan from the ClusterGroupUpgrade CR for a group of clusters. You can define the following specifications in a ClusterGroupUpgrade CR:
- Clusters in the group
- Blocking ClusterGroupUpgrade CRs
- Applicable list of managed policies
- Number of concurrent updates
- Applicable canary updates
- Actions to perform before and after the update
- Update timing
As TALM works through remediation of the policies to the specified clusters, the ClusterGroupUpgrade CR can have the following states:
- UpgradeNotStarted
- UpgradeCannotStart
- UpgradeNotCompleted
- UpgradeTimedOut
- UpgradeCompleted
- PrecachingRequired
After TALM completes a cluster update, the cluster does not update again under the control of the same ClusterGroupUpgrade CR. You must create a new ClusterGroupUpgrade CR in the following cases:
- When you need to update the cluster again
- When the cluster changes to non-compliant with the inform policy after being updated
16.5.1. The UpgradeNotStarted state
The initial state of the ClusterGroupUpgrade CR is UpgradeNotStarted.
TALM builds a remediation plan based on the following fields:
- The clusterSelector field specifies the labels of the clusters that you want to update.
- The clusters field specifies a list of clusters to update.
- The canaries field specifies the clusters for canary updates.
- The maxConcurrency field specifies the number of clusters to update in a batch.
You can use the clusters and the clusterSelector fields together to create a combined list of clusters.
The remediation plan starts with the clusters listed in the canaries field. Each canary cluster forms a single-cluster batch.
Any failure during the update of a canary cluster stops the update process.
The ClusterGroupUpgrade CR transitions to the UpgradeNotCompleted state after the remediation plan is successfully created and after the enable field is set to true. At this point, TALM starts to update the non-compliant clusters with the specified managed policies.
You can only make changes to the spec fields if the ClusterGroupUpgrade CR is either in the UpgradeNotStarted or the UpgradeCannotStart state.
Sample ClusterGroupUpgrade CR in the UpgradeNotStarted state
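The following is a minimal sketch of a ClusterGroupUpgrade CR in the UpgradeNotStarted state. The cluster names, policy names, and the status condition message are illustrative assumptions; the numbered comments correspond to the callouts below.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-example
      namespace: default
    spec:
      clusters:                                  # 1
      - spoke1
      - spoke2
      enable: false
      managedPolicies:                           # 2
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      remediationStrategy:                       # 3
        canaries:                                # 4
        - spoke1
        maxConcurrency: 1                        # 5
        timeout: 240
    status:                                      # 6
      conditions:
      - message: The ClusterGroupUpgrade CR is not enabled   # illustrative message
        reason: UpgradeNotStarted
        status: "False"
        type: Ready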
1. Defines the list of clusters to update.
2. Lists the user-defined set of policies to remediate.
3. Defines the specifics of the cluster updates.
4. Defines the clusters for canary updates.
5. Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, excluding the canary clusters, divided by the maxConcurrency value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
6. Displays information about the status of the updates.
16.5.2. The UpgradeCannotStart state
In the UpgradeCannotStart state, the update cannot start because of the following reasons:
- Blocking CRs are missing from the system
- Blocking CRs have not yet finished
16.5.3. The UpgradeNotCompleted state
In the UpgradeNotCompleted state, TALM enforces the policies following the remediation plan defined in the UpgradeNotStarted state.
Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, TALM moves on to the next batch. The timeout value of a batch is the spec.timeout field divided by the number of batches in the remediation plan.
The managed policies apply in the order that they are listed in the managedPolicies field in the ClusterGroupUpgrade CR. One managed policy is applied to the specified clusters at a time. After the specified clusters comply with the current policy, the next managed policy is applied to the next non-compliant cluster.
Sample ClusterGroupUpgrade CR in the UpgradeNotCompleted state
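The following is a minimal sketch of a ClusterGroupUpgrade CR in the UpgradeNotCompleted state. The cluster and policy names are placeholder assumptions, and the status fields, in particular the per-batch progress fields under status.status, are assumptions that can differ between TALM versions. The numbered comments correspond to the callouts below.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-example
      namespace: default
    spec:
      clusters:
      - spoke1
      - spoke2
      enable: true                               # 1
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240
    status:                                      # 2
      conditions:
      - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant   # illustrative message
        reason: UpgradeNotCompleted
        status: "False"
        type: Ready
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy2-common-pao-sub-policy
        namespace: default
      remediationPlan:
      - - spoke1
      - - spoke2
      status:
        currentBatch: 1
        remediationPlanForBatch:                 # 3 (field name assumed)
          spoke1: 0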
1. The update starts when the value of the spec.enable field is true.
2. The status fields change accordingly when the update begins.
3. Lists the clusters in the batch and the index of the policy that is currently being applied to each cluster. The index of the policies starts with 0 and follows the order of the status.managedPoliciesForUpgrade list.
16.5.4. The UpgradeTimedOut state
In the UpgradeTimedOut state, TALM checks every hour if all the policies for the ClusterGroupUpgrade CR are compliant. The checks continue until the ClusterGroupUpgrade CR is deleted or the updates are completed. The periodic checks allow the updates to complete if they get prolonged due to network, CPU, or other issues.
TALM transitions to the UpgradeTimedOut state in two cases:
- When the current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.
- When the clusters do not comply with the managed policies within the timeout value specified in the remediationStrategy field.
If the policies are compliant, TALM transitions to the UpgradeCompleted state.
16.5.5. The UpgradeCompleted state
In the UpgradeCompleted state, the cluster updates are complete.
Sample ClusterGroupUpgrade CR in the UpgradeCompleted state
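The following is a minimal sketch of the status section that you might see in this state. The condition message, timestamps, and field layout are illustrative assumptions only.

    status:
      conditions:
      - message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies
        reason: UpgradeCompleted
        status: "True"
        type: Ready
      remediationPlan:
      - - spoke1
      - - spoke2
      status:
        completedAt: "2022-02-25T10:00:00Z"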
1. The value of the spec.action.afterCompletion.deleteObjects field is true by default. After the update is completed, TALM deletes the underlying RHACM objects that were created during the update. This option prevents the RHACM hub from continuously checking for compliance after a successful update.
2. The status fields show that the updates completed successfully.
3. Displays that all the policies are applied to the cluster.
In the PrecachingRequired state, the clusters need to have images pre-cached before the update can start. For more information about pre-caching, see the "Using the container image pre-cache feature" section.
16.5.6. Blocking ClusterGroupUpgrade CRs
You can create multiple ClusterGroupUpgrade CRs and control their order of application.
For example, if you create ClusterGroupUpgrade CR C that blocks the start of ClusterGroupUpgrade CR A, then ClusterGroupUpgrade CR A cannot start until the status of ClusterGroupUpgrade CR C becomes UpgradeCompleted.
One ClusterGroupUpgrade CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
- Save the content of the ClusterGroupUpgrade CRs in the cgu-a.yaml, cgu-b.yaml, and cgu-c.yaml files.
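  The following is a minimal sketch of cgu-a.yaml. The cluster and policy names are placeholder assumptions; the numbered comment corresponds to the callout below.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-a
      namespace: default
    spec:
      blockingCRs:                               # 1
      - name: cgu-c
        namespace: default
      clusters:
      - spoke1
      - spoke2
      enable: false
      managedPolicies:
      - policy1-common-cluster-version-policy
      remediationStrategy:
        canaries:
        - spoke1
        maxConcurrency: 1
        timeout: 240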
  1. Defines the blocking CRs. The cgu-a update cannot start until cgu-c is complete.
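  A similar sketch of cgu-b.yaml, again with placeholder cluster and policy names:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-b
      namespace: default
    spec:
      blockingCRs:                               # 1
      - name: cgu-a
        namespace: default
      clusters:
      - spoke3
      - spoke4
      enable: false
      managedPolicies:
      - policy2-common-pao-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240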
  1. The cgu-b update cannot start until cgu-a is complete.
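  A sketch of cgu-c.yaml without any blocking CRs, again with placeholder cluster and policy names:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-c
      namespace: default
    spec:
      clusters:
      - spoke5
      - spoke6
      enable: false
      managedPolicies:
      - policy3-common-ptp-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240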
  1. The cgu-c update does not have any blocking CRs. TALM starts the cgu-c update when the enable field is set to true.
- Create the ClusterGroupUpgrade CRs by running the following command for each relevant CR:

    $ oc apply -f <name>.yaml

- Start the update process by running the following command for each relevant CR:

    $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \
        --type merge -p '{"spec":{"enable":true}}'

  The following examples show ClusterGroupUpgrade CRs where the enable field is set to true:

  Example for cgu-a with blocking CRs

  The spec.blockingCRs field shows the list of blocking CRs.

  Example for cgu-b with blocking CRs

  The spec.blockingCRs field shows the list of blocking CRs.

  Example for cgu-c with blocking CRs

  The cgu-c update does not have any blocking CRs.
16.6. Update policies on managed clusters
The Topology Aware Lifecycle Manager (TALM) remediates a set of inform policies for the clusters specified in the ClusterGroupUpgrade CR. TALM remediates inform policies by making enforce copies of the managed RHACM policies. Each copied policy has its own corresponding RHACM placement rule and RHACM placement binding.
One by one, TALM adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, TALM skips applying that policy on the compliant cluster. TALM then moves on to applying the next policy to the non-compliant cluster. After TALM completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.
If a spoke cluster does not report any compliant state to RHACM, the managed policies on the hub cluster can be missing status information that TALM needs. TALM handles these cases in the following ways:
- If a policy’s status.compliant field is missing, TALM ignores the policy and adds a log entry. Then, TALM continues looking at the policy’s status.status field.
- If a policy’s status.status is missing, TALM produces an error.
- If a cluster’s compliance status is missing in the policy’s status.status field, TALM considers that cluster to be non-compliant with that policy.
For more information about RHACM policies, see Policy overview.
16.6.1. Applying update policies to managed clusters
You can update your managed clusters by applying your policies.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
- Save the contents of the ClusterGroupUpgrade CR in the cgu-1.yaml file.
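  The following is a minimal sketch of cgu-1.yaml. The policy names match the example outputs later in this procedure; the cluster names are placeholder assumptions.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-1
      namespace: default
    spec:
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      - policy4-common-sriov-sub-policy
      enable: false
      clusters:
      - spoke1
      - spoke2
      remediationStrategy:
        maxConcurrency: 2
        timeout: 240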
- Create the ClusterGroupUpgrade CR by running the following command:

    $ oc create -f cgu-1.yaml

- Check if the ClusterGroupUpgrade CR was created in the hub cluster by running the following command:

    $ oc get cgu --all-namespaces

  Example output

    NAMESPACE   NAME    AGE
    default     cgu-1   8m55s

- Check the status of the update by running the following command:

    $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

  The update has not started yet because the spec.enable field in the ClusterGroupUpgrade CR is set to false.
- Check the status of the policies by running the following command:

    $ oc get policies -A

  The spec.remediationAction field of policies currently applied on the clusters is set to enforce. The managed policies in inform mode from the ClusterGroupUpgrade CR remain in inform mode during the update.

- Change the value of the spec.enable field to true by running the following command:

    $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
        --patch '{"spec":{"enable":true}}' --type=merge
Verification
Check the status of the update again by running the following command:
    $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

  The output reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
- Export the KUBECONFIG file of the single-node cluster you want to check the installation progress for by running the following command:

    $ export KUBECONFIG=<cluster_kubeconfig_absolute_path>

- Check all the subscriptions present on the single-node cluster and look for the one in the policy you are trying to install through the ClusterGroupUpgrade CR by running the following command:

    $ oc get subs -A | grep -i <subscription_name>

  Example output for cluster-logging policy

    NAMESPACE           NAME              PACKAGE           SOURCE             CHANNEL
    openshift-logging   cluster-logging   cluster-logging   redhat-operators   stable
- If one of the managed policies includes a ClusterVersion CR, check the status of platform updates in the current batch by running the following command against the spoke cluster:

    $ oc get clusterversion

  Example output

    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.5     True        True          43s     Working towards 4.9.7: 71 of 735 done (9% complete)

- Check the Operator subscription by running the following command:
    $ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"

- Check the install plans present on the single-node cluster that is associated with the desired subscription by running the following command:

    $ oc get installplan -n <subscription_namespace>

  Example output for cluster-logging Operator

    NAMESPACE           NAME            CSV                       APPROVAL   APPROVED
    openshift-logging   install-6khtw   cluster-logging.5.3.3-4   Manual     true

  The install plans have their Approval field set to Manual and their Approved field changes from false to true after TALM approves the install plan.
Note

When TALM is remediating a policy containing a subscription, it automatically approves any install plans attached to that subscription. Where multiple install plans are needed to get the operator to the latest known version, TALM might approve multiple install plans, upgrading through one or more intermediate versions to get to the final version.
- Check if the cluster service version for the Operator of the policy that the ClusterGroupUpgrade CR is installing reached the Succeeded phase by running the following command:

    $ oc get csv -n <operator_namespace>

  Example output for OpenShift Logging Operator

    NAME                    DISPLAY                     VERSION   REPLACES   PHASE
    cluster-logging.5.4.2   Red Hat OpenShift Logging   5.4.2                Succeeded
16.7. Using the container image pre-cache feature
Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed.
The time of the update is not set by TALM. You can apply the ClusterGroupUpgrade CR at the beginning of the update by manual application or by external automation.
The container image pre-caching starts when the preCaching field is set to true in the ClusterGroupUpgrade CR. After a successful pre-caching process, you can start remediating policies. The remediation actions start when the enable field is set to true.
The pre-caching process can be in the following statuses:
- PrecacheNotStarted: This is the initial state all clusters are automatically assigned to on the first reconciliation pass of the ClusterGroupUpgrade CR. In this state, TALM deletes any pre-caching namespace and hub view resources of spoke clusters that remain from previous incomplete updates. TALM then creates a new ManagedClusterView resource for the spoke pre-caching namespace to verify its deletion in the PrecachePreparing state.
- PrecachePreparing: Cleaning up any remaining resources from previous incomplete updates is in progress.
- PrecacheStarting: Pre-caching job prerequisites and the job are created.
- PrecacheActive: The job is in "Active" state.
- PrecacheSucceeded: The pre-cache job has succeeded.
- PrecacheTimeout: The artifact pre-caching has been partially done.
- PrecacheUnrecoverableError: The job ends with a non-zero exit code.
16.7.1. Creating a ClusterGroupUpgrade CR with pre-caching
The pre-cache feature allows the required container images to be present on the spoke cluster before the update starts.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
Procedure
- Save the contents of the ClusterGroupUpgrade CR with the preCaching field set to true in the clustergroupupgrades-group-du.yaml file:
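  The following is a minimal sketch of clustergroupupgrades-group-du.yaml. The CR name and namespace match the verification output later in this procedure; the cluster and policy names are placeholder assumptions, and the numbered comment corresponds to the callout below.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: du-upgrade-4918
      namespace: ztp-group-du-sno
    spec:
      preCaching: true                           # 1
      enable: false
      clusters:
      - cnfdb1
      - cnfdb2
      managedPolicies:
      - du-upgrade-platform-upgrade
      remediationStrategy:
        maxConcurrency: 2
        timeout: 240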
  1. The preCaching field is set to true, which enables TALM to pull the container images before starting the update.
- When you want to start the update, apply the ClusterGroupUpgrade CR by running the following command:

    $ oc apply -f clustergroupupgrades-group-du.yaml
Verification
- Check if the ClusterGroupUpgrade CR exists in the hub cluster by running the following command:

    $ oc get cgu -A

  Example output

    NAMESPACE          NAME              AGE
    ztp-group-du-sno   du-upgrade-4918   10s

  The output shows that the CR is created.

- Check the status of the pre-caching task by running the following command:

    $ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'

- Check the status of the pre-caching job by running the following command on the spoke cluster:

    $ oc get jobs,pods -n openshift-talm-pre-cache

  Example output

    NAME                  COMPLETIONS   DURATION   AGE
    job.batch/pre-cache   0/1           3m10s      3m10s

    NAME                     READY   STATUS    RESTARTS   AGE
    pod/pre-cache--1-9bmlr   1/1     Running   0          3m10s

- Check the status of the ClusterGroupUpgrade CR by running the following command:

    $ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'

  The output shows that the pre-cache tasks are done.
16.8. Troubleshooting the Topology Aware Lifecycle Manager
The Topology Aware Lifecycle Manager (TALM) is an OpenShift Container Platform Operator that remediates RHACM policies. When issues occur, use the oc adm must-gather command to gather details and logs and to take steps in debugging the issues.
For more information about related topics, see the following documentation:
- Red Hat Advanced Cluster Management for Kubernetes 2.4 Support Matrix
- Red Hat Advanced Cluster Management Troubleshooting
- The "Troubleshooting Operator issues" section
16.8.1. General troubleshooting
You can determine the cause of the problem by reviewing the following questions:
Is the configuration that you are applying supported?
- Are the RHACM and the OpenShift Container Platform versions compatible?
- Are the TALM and RHACM versions compatible?
Which of the following components is causing the problem?
To ensure that the ClusterGroupUpgrade configuration is functional, you can do the following:
- Create the ClusterGroupUpgrade CR with the spec.enable field set to false.
- Wait for the status to be updated and go through the troubleshooting questions.
- If everything looks as expected, set the spec.enable field to true in the ClusterGroupUpgrade CR.
After you set the spec.enable field to true in the ClusterGroupUpgrade CR, the update procedure starts and you cannot edit the CR’s spec fields anymore.
16.8.2. Cannot modify the ClusterGroupUpgrade CR
- Issue
- You cannot edit the ClusterGroupUpgrade CR after enabling the update.
- Resolution
Restart the procedure by performing the following steps:
- Remove the old ClusterGroupUpgrade CR by running the following command:

    $ oc delete cgu -n <ClusterGroupUpgradeCR_namespace> <ClusterGroupUpgradeCR_name>

- Check and fix the existing issues with the managed clusters and policies.
  - Ensure that all the clusters are managed clusters and available.
  - Ensure that all the policies exist and have the spec.remediationAction field set to inform.
- Create a new ClusterGroupUpgrade CR with the correct configurations by running the following command:

    $ oc apply -f <ClusterGroupUpgradeCR_YAML>
16.8.3. Managed policies
Checking managed policies on the system
- Issue
- You want to check if you have the correct managed policies on the system.
- Resolution
Run the following command:
    $ oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}'

  Example output

    ["group-du-sno-validator-du-validator-policy", "policy2-common-pao-sub-policy", "policy3-common-ptp-sub-policy"]
Checking remediationAction mode
- Issue
- You want to check if the remediationAction field is set to inform in the spec of the managed policies.
- Resolution
Run the following command:
    $ oc get policies --all-namespaces

  Example output

    NAMESPACE   NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
    default     policy1-common-cluster-version-policy   inform               NonCompliant       5d21h
    default     policy2-common-pao-sub-policy           inform               Compliant          5d21h
    default     policy3-common-ptp-sub-policy           inform               NonCompliant       5d21h
    default     policy4-common-sriov-sub-policy         inform               NonCompliant       5d21h
Checking policy compliance state
- Issue
- You want to check the compliance state of policies.
- Resolution
Run the following command:
    $ oc get policies --all-namespaces

  Example output

    NAMESPACE   NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
    default     policy1-common-cluster-version-policy   inform               NonCompliant       5d21h
    default     policy2-common-pao-sub-policy           inform               Compliant          5d21h
    default     policy3-common-ptp-sub-policy           inform               NonCompliant       5d21h
    default     policy4-common-sriov-sub-policy         inform               NonCompliant       5d21h
16.8.4. Clusters
Checking if managed clusters are present
- Issue
- You want to check if the clusters in the ClusterGroupUpgrade CR are managed clusters.
- Resolution
Run the following command:
    $ oc get managedclusters

  Example output

    NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                   JOINED   AVAILABLE   AGE
    local-cluster   true           https://api.hub.example.com:6443      True     Unknown     13d
    spoke1          true           https://api.spoke1.example.com:6443   True     True        13d
    spoke3          true           https://api.spoke3.example.com:6443   True     True        27h

  Alternatively, check the TALM manager logs:
Get the name of the TALM manager by running the following command:
    $ oc get pod -n openshift-operators

  Example output

    NAME                                                          READY   STATUS    RESTARTS   AGE
    cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp   2/2     Running   0          45m

  Check the TALM manager logs by running the following command:

    $ oc logs -n openshift-operators \
        cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager

  Example output

    ERROR   controller-runtime.manager.controller.clustergroupupgrade   Reconciler error   {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

  The error message shows that the cluster is not a managed cluster.
Checking if managed clusters are available
- Issue
- You want to check if the managed clusters specified in the ClusterGroupUpgrade CR are available.
- Resolution
Run the following command:
    $ oc get managedclusters

  Example output

    NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                   JOINED   AVAILABLE   AGE
    local-cluster   true           https://api.hub.testlab.com:6443      True     Unknown     13d
    spoke1          true           https://api.spoke1.testlab.com:6443   True     True        13d
    spoke3          true           https://api.spoke3.testlab.com:6443   True     True        27h
Checking clusterSelector
- Issue
- You want to check if the clusterSelector field is specified in the ClusterGroupUpgrade CR in at least one of the managed clusters.
- Resolution
Run the following command:
    $ oc get managedcluster --selector=upgrade=true

  The label for the clusters you want to update is upgrade: true.

  Example output

    NAME     HUB ACCEPTED   MANAGED CLUSTER URLS                   JOINED   AVAILABLE   AGE
    spoke1   true           https://api.spoke1.testlab.com:6443   True     True        13d
    spoke3   true           https://api.spoke3.testlab.com:6443   True     True        27h
Checking if canary clusters are present
- Issue
- You want to check if the canary clusters are present in the list of clusters.

  Example ClusterGroupUpgrade CR
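  The following is a minimal sketch of such a ClusterGroupUpgrade CR that defines canary clusters. The values are placeholder assumptions chosen to match the example outputs below.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: lab-upgrade
      namespace: default
    spec:
      clusters:
      - spoke1
      - spoke3
      enable: false
      managedPolicies:
      - policy1-common-cluster-version-policy
      remediationStrategy:
        canaries:
        - spoke1
        maxConcurrency: 2
        timeout: 240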
- Resolution

  Run the following commands:
    $ oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}'

  Example output

    ["spoke1", "spoke3"]

  Check if the canary clusters are present in the list of clusters that match the clusterSelector labels by running the following command:

    $ oc get managedcluster --selector=upgrade=true

  Example output

    NAME     HUB ACCEPTED   MANAGED CLUSTER URLS                   JOINED   AVAILABLE   AGE
    spoke1   true           https://api.spoke1.testlab.com:6443   True     True        13d
    spoke3   true           https://api.spoke3.testlab.com:6443   True     True        27h
A cluster can be present in spec.clusters and also be matched by the spec.clusterSelector label.
Checking the pre-caching status on spoke clusters
Check the status of pre-caching by running the following command on the spoke cluster:
    $ oc get jobs,pods -n openshift-talo-pre-cache
16.8.5. Remediation Strategy
Checking if remediationStrategy is present in the ClusterGroupUpgrade CR
- Issue
- You want to check if the remediationStrategy is present in the ClusterGroupUpgrade CR.
- Resolution
Run the following command:
    $ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}'

  Example output

    {"maxConcurrency":2, "timeout":240}
Checking if maxConcurrency is specified in the ClusterGroupUpgrade CR
- Issue
- You want to check if the maxConcurrency is specified in the ClusterGroupUpgrade CR.
- Resolution
Run the following command:
    $ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}'

  Example output

    2
16.8.6. Topology Aware Lifecycle Manager
Checking condition message and status in the ClusterGroupUpgrade CR
- Issue
- You want to check the value of the status.conditions field in the ClusterGroupUpgrade CR.
- Resolution
Run the following command:
    $ oc get cgu lab-upgrade -ojsonpath='{.status.conditions}'

  Example output

    {"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"The ClusterGroupUpgrade CR has managed policies that are missing:[policyThatDoesntExist]", "reason":"UpgradeCannotStart", "status":"False", "type":"Ready"}
Checking corresponding copied policies
- Issue
- You want to check if every policy from status.managedPoliciesForUpgrade has a corresponding policy in status.copiedPolicies.
- Resolution
Run the following command:
    $ oc get cgu lab-upgrade -oyaml
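  In the output, compare the two lists under status. The following is a minimal sketch of the relevant fields, assuming a single managed policy; the naming convention of the copied policy is an assumption.

    status:
      copiedPolicies:
      - lab-upgrade-policy1-common-cluster-version-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default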
Checking if status.remediationPlan was computed
- Issue
- You want to check if status.remediationPlan was computed.
- Resolution
Run the following command:
    $ oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}'

  Example output

    [["spoke2", "spoke3"]]
Errors in the TALM manager container
- Issue
- You want to check the logs of the manager container of TALM.
- Resolution
Run the following command:
    $ oc logs -n openshift-operators \
        cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager

  Example output

    ERROR   controller-runtime.manager.controller.clustergroupupgrade   Reconciler error   {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

  The error message shows that the specified cluster is not a managed cluster.