Chapter 18. Topology Aware Lifecycle Manager for cluster updates
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple single-node OpenShift clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.
Topology Aware Lifecycle Manager is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
18.1. About the Topology Aware Lifecycle Manager configuration
The Topology Aware Lifecycle Manager (TALM) manages the deployment of Red Hat Advanced Cluster Management (RHACM) policies for one or more OpenShift Container Platform clusters. Using TALM in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With TALM, you can control the following actions:
- The timing of the update
- The number of RHACM-managed clusters
- The subset of managed clusters to apply the policies to
- The update order of the clusters
- The set of policies remediated to the cluster
- The order of policies remediated to the cluster
TALM supports the orchestration of the OpenShift Container Platform y-stream and z-stream updates, and day-two operations on y-streams and z-streams.
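With TALM, you express these controls declaratively in a ClusterGroupUpgrade CR. The following minimal sketch shows how they map to fields; the CR, cluster, and policy names are illustrative placeholders rather than values from this chapter:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: example-upgrade            # hypothetical CR name
  namespace: default
spec:
  clusters:                        # subset of managed clusters to update, in order
  - spoke1
  - spoke2
  managedPolicies:                 # set and order of policies to remediate
  - example-platform-upgrade-policy
  remediationStrategy:
    maxConcurrency: 1              # number of clusters updated concurrently
    timeout: 240                   # overall time budget for the update, in minutes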
18.2. About managed policies used with Topology Aware Lifecycle Manager
The Topology Aware Lifecycle Manager (TALM) uses RHACM policies for cluster updates.
TALM can be used to manage the rollout of any policy CR where the remediationAction field is set to inform. Supported managed policies include:
- Manual user creation of policy CRs
- Automatically generated policies from the PolicyGenTemplate custom resource definition (CRD)
For policies that update an Operator subscription with manual approval, TALM provides additional functionality that approves the installation of the updated Operator.
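A policy of this kind typically wraps an Operator Subscription with manual install plan approval, similar to the following sketch; the package and namespace are illustrative (the cluster-logging Operator is used as an example later in this chapter):

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual   # TALM approves the resulting install plans during remediation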
For more information about managed policies, see Policy Overview in the RHACM documentation.
For more information about the PolicyGenTemplate CRD, see About the PolicyGenTemplate CRD.
18.3. Installing the Topology Aware Lifecycle Manager by using the web console
You can use the OpenShift Container Platform web console to install the Topology Aware Lifecycle Manager.
Prerequisites
- Install the latest version of the RHACM Operator.
- Set up a hub cluster with a disconnected registry.
- Log in as a user with cluster-admin privileges.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
- Search for the Topology Aware Lifecycle Manager from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode ["All namespaces on the cluster (default)"] and Installed Namespace ("openshift-operators") to ensure that the Operator is installed properly.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the All Namespaces namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs in any containers in the cluster-group-upgrades-controller-manager pod that are reporting issues.
18.4. Installing the Topology Aware Lifecycle Manager by using the CLI
You can use the OpenShift CLI (oc) to install the Topology Aware Lifecycle Manager.
Prerequisites
- Install the OpenShift CLI (oc).
- Install the latest version of the RHACM Operator.
- Set up a hub cluster with a disconnected registry.
- Log in as a user with cluster-admin privileges.
Procedure
Create a Subscription CR:

Define the Subscription CR and save the YAML file, for example, talm-subscription.yaml:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-topology-aware-lifecycle-manager-subscription
  namespace: openshift-operators
spec:
  channel: "stable"
  name: topology-aware-lifecycle-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace

Create the Subscription CR by running the following command:

$ oc create -f talm-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-operators

Example output

NAME                                      DISPLAY                            VERSION   REPLACES   PHASE
topology-aware-lifecycle-manager.4.11.x   Topology Aware Lifecycle Manager   4.11.x               Succeeded

Verify that the TALM is up and running:

$ oc get deploy -n openshift-operators

Example output

NAMESPACE             NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
openshift-operators   cluster-group-upgrades-controller-manager   1/1     1            1           14s
18.5. About the ClusterGroupUpgrade CR
The Topology Aware Lifecycle Manager (TALM) builds the remediation plan from the ClusterGroupUpgrade CR for a group of clusters. You can define the following specifications in a ClusterGroupUpgrade CR:
- Clusters in the group
- Blocking ClusterGroupUpgrade CRs
- Number of concurrent updates
- Applicable canary updates
- Actions to perform before and after the update
- Update timing
As TALM works through remediation of the policies to the specified clusters, the ClusterGroupUpgrade CR can have the following states:
- UpgradeNotStarted
- UpgradeCannotStart
- UpgradeNotComplete
- UpgradeTimedOut
- UpgradeCompleted
- PrecachingRequired
After TALM completes a cluster update, the cluster does not update again under the control of the same ClusterGroupUpgrade CR. You must create a new ClusterGroupUpgrade CR in the following cases:
- When you need to update the cluster again
- When the cluster changes to non-compliant with the inform policy after being updated
18.5.1. The UpgradeNotStarted state
The initial state of the ClusterGroupUpgrade CR is UpgradeNotStarted.
TALM builds a remediation plan based on the following fields:
- The clusterSelector field specifies the labels of the clusters that you want to update.
- The clusters field specifies a list of clusters to update.
- The canaries field specifies the clusters for canary updates.
- The maxConcurrency field specifies the number of clusters to update in a batch.
You can use the clusters and clusterSelector fields together to create a combined list of clusters.
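The following sketch shows the two fields used together, reusing names that appear elsewhere in this chapter; explicitly listed clusters and clusters matching the label both end up in the combined list:

spec:
  clusters:           # explicitly listed clusters
  - spoke1
  clusterSelector:    # clusters with this label are added to the list
  - upgrade=true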
The remediation plan starts with the clusters listed in the canaries field. Any failure during the update of a canary cluster stops the update process.
The ClusterGroupUpgrade CR transitions to the UpgradeNotCompleted state when the enable field is set to true.
You can only make changes to the spec fields of the ClusterGroupUpgrade CR while it is in the UpgradeNotStarted or UpgradeCannotStart state.
Sample ClusterGroupUpgrade CR in the UpgradeNotStarted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-upgrade-complete
namespace: default
spec:
clusters:
- spoke1
enable: false
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-nto-sub-policy
remediationStrategy:
canaries:
- spoke1
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR is not enabled
reason: UpgradeNotStarted
status: "False"
type: Ready
copiedPolicies:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-nto-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-nto-sub-policy
namespace: default
placementBindings:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-nto-sub-policy
placementRules:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-nto-sub-policy
remediationPlan:
- - spoke1
1. Defines the list of clusters to update.
2. Lists the user-defined set of policies to remediate.
3. Defines the specifics of the cluster updates.
4. Defines the clusters for canary updates.
5. Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the maxConcurrency value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
6. Displays information about the status of the updates.
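As a worked example of the batch calculation, assume a hypothetical CR with five clusters, one canary, and maxConcurrency: 2. The plan is one canary batch followed by (5 - 1) / 2 = 2 further batches:

spec:
  clusters:
  - spoke1
  - spoke2
  - spoke3
  - spoke4
  - spoke5
  remediationStrategy:
    canaries:
    - spoke1
    maxConcurrency: 2
status:
  remediationPlan:
  - - spoke1
  - - spoke2
    - spoke3
  - - spoke4
    - spoke5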
18.5.2. The UpgradeCannotStart state
In the UpgradeCannotStart state, the update cannot start for the following reasons:
- Blocking CRs are missing from the system
- Blocking CRs have not yet finished
18.5.3. The UpgradeNotCompleted state
In the UpgradeNotCompleted state, TALM enforces the policies for the clusters following the remediation plan defined in the UpgradeNotStarted state.
Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, TALM moves on to the next batch. The timeout value of a batch is the spec.timeout field value divided by the number of batches in the remediation plan. For example, with spec.timeout: 240 and a remediation plan of four batches, each batch must become compliant within 60 minutes.
The managed policies apply in the order that they are listed in the managedPolicies field of the ClusterGroupUpgrade CR.
Sample ClusterGroupUpgrade CR in the UpgradeNotCompleted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-upgrade-complete
namespace: default
spec:
clusters:
- spoke1
enable: true
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-nto-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant
reason: UpgradeNotCompleted
status: "False"
type: Ready
copiedPolicies:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-nto-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-nto-sub-policy
namespace: default
placementBindings:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-nto-sub-policy
placementRules:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-nto-sub-policy
remediationPlan:
- - spoke1
status:
currentBatch: 1
remediationPlanForBatch:
spoke1: 0
1. The update starts when the value of the spec.enable field is true.
2. The status fields change accordingly when the update begins.
3. Lists the clusters in the batch and the index of the policy that is currently being applied to each cluster. The index of the policies starts with 0 and follows the order of the status.managedPoliciesForUpgrade list.
18.5.4. The UpgradeTimedOut state
In the UpgradeTimedOut state, TALM checks every hour if all the policies for the ClusterGroupUpgrade CR are compliant. The checks continue until you delete the ClusterGroupUpgrade CR or fix the updates. The periodic checks help if the updates become stuck because of network, CPU, or other issues.

TALM transitions to the UpgradeTimedOut state in two cases:
- When the current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.
- When the clusters do not comply with the managed policies within the timeout value specified in the remediationStrategy field.
If the policies are compliant, TALM transitions to the UpgradeCompleted state.
18.5.5. The UpgradeCompleted state
In the UpgradeCompleted state, the cluster updates are complete.
Sample ClusterGroupUpgrade CR in the UpgradeCompleted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-upgrade-complete
namespace: default
spec:
actions:
afterCompletion:
deleteObjects: true
clusters:
- spoke1
enable: true
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-nto-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies
reason: UpgradeCompleted
status: "True"
type: Ready
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-nto-sub-policy
namespace: default
remediationPlan:
- - spoke1
status:
remediationPlanForBatch:
spoke1: -2
1. The value of the spec.actions.afterCompletion.deleteObjects field is true by default. After the update is completed, TALM deletes the underlying RHACM objects that were created during the update. This option prevents the RHACM hub from continuously checking for compliance after a successful update.
2. The status fields show that the updates completed successfully.
3. Displays that all the policies are applied to the cluster.
In the PrecachingRequired state, the clusters need to have images pre-cached before the update can start.
18.5.6. Blocking ClusterGroupUpgrade CRs
You can create multiple ClusterGroupUpgrade CRs and control their order of application.

For example, if you create ClusterGroupUpgrade CR C that blocks the start of ClusterGroupUpgrade CR A, then ClusterGroupUpgrade CR A cannot start until the status of ClusterGroupUpgrade CR C becomes UpgradeComplete.

One ClusterGroupUpgrade CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
Save the content of the ClusterGroupUpgrade CRs in the cgu-a.yaml, cgu-b.yaml, and cgu-c.yaml files.

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-a
  namespace: default
spec:
  blockingCRs:
  - name: cgu-c
    namespace: default
  clusters:
  - spoke1
  - spoke2
  - spoke3
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  remediationStrategy:
    canaries:
    - spoke1
    maxConcurrency: 2
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  placementBindings:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  placementRules:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  remediationPlan:
  - - spoke1
  - - spoke2

1. Defines the blocking CRs. The cgu-a update cannot start until cgu-c is complete.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-b
  namespace: default
spec:
  blockingCRs:
  - name: cgu-a
    namespace: default
  clusters:
  - spoke4
  - spoke5
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke4
  - - spoke5
  status: {}

1. The cgu-b update cannot start until cgu-a is complete.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-c
  namespace: default
spec:
  clusters:
  - spoke6
  enable: false
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  managedPoliciesCompliantBeforeUpgrade:
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke6
  status: {}

1. The cgu-c update does not have any blocking CRs. TALM starts the cgu-c update when the enable field is set to true.
Create the ClusterGroupUpgrade CRs by running the following command for each relevant CR:

$ oc apply -f <name>.yaml

Start the update process by running the following command for each relevant CR:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \
    --type merge -p '{"spec":{"enable":true}}'

The following examples show ClusterGroupUpgrade CRs where the enable field is set to true:

Example for cgu-a with blocking CRs

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-a
  namespace: default
spec:
  blockingCRs:
  - name: cgu-c
    namespace: default
  clusters:
  - spoke1
  - spoke2
  - spoke3
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  remediationStrategy:
    canaries:
    - spoke1
    maxConcurrency: 2
    timeout: 240
status:
  conditions:
  - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-c]'
    reason: UpgradeCannotStart
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  placementBindings:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  placementRules:
  - cgu-a-policy1-common-cluster-version-policy
  - cgu-a-policy2-common-pao-sub-policy
  - cgu-a-policy3-common-ptp-sub-policy
  remediationPlan:
  - - spoke1
  - - spoke2
  status: {}

1. Shows the list of blocking CRs.
Example for cgu-b with blocking CRs

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-b
  namespace: default
spec:
  blockingCRs:
  - name: cgu-a
    namespace: default
  clusters:
  - spoke4
  - spoke5
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-a]'
    reason: UpgradeCannotStart
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  - name: policy3-common-ptp-sub-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-b-policy1-common-cluster-version-policy
  - cgu-b-policy2-common-pao-sub-policy
  - cgu-b-policy3-common-ptp-sub-policy
  - cgu-b-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke4
  - - spoke5
  status: {}

1. Shows the list of blocking CRs.
Example for cgu-c with blocking CRs

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-c
  namespace: default
spec:
  clusters:
  - spoke6
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  conditions:
  - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant
    reason: UpgradeNotCompleted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  managedPoliciesCompliantBeforeUpgrade:
  - policy2-common-pao-sub-policy
  - policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy4-common-sriov-sub-policy
    namespace: default
  placementBindings:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  placementRules:
  - cgu-c-policy1-common-cluster-version-policy
  - cgu-c-policy4-common-sriov-sub-policy
  remediationPlan:
  - - spoke6
  status:
    currentBatch: 1
    remediationPlanForBatch:
      spoke6: 0

1. The cgu-c update does not have any blocking CRs.
18.6. Update policies on managed clusters
The Topology Aware Lifecycle Manager (TALM) remediates a set of inform policies for the clusters specified in the ClusterGroupUpgrade CR. TALM remediates inform policies by making enforce copies of the managed RHACM policies. Each copied policy has its own corresponding RHACM placement rule and RHACM placement binding.
One by one, TALM adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, TALM skips applying that policy on the compliant cluster. TALM then moves on to applying the next policy to the non-compliant cluster. After TALM completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.
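You can observe this mechanism on the hub while a batch runs; a sketch, assuming the copied policies live in the default namespace as in the examples in this chapter:

$ oc get placementrules.apps.open-cluster-management.io -n default
$ oc get placementbindings.policy.open-cluster-management.io -n default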
If a spoke cluster does not report any compliant state to RHACM, the managed policies on the hub cluster can be missing status information that TALM needs. TALM handles these cases in the following ways:
- If a policy's status.compliant field is missing, TALM ignores the policy and adds a log entry. Then, TALM continues looking at the policy's status.status field.
- If a policy's status.status is missing, TALM produces an error.
- If a cluster's compliance status is missing in the policy's status.status field, TALM considers that cluster to be non-compliant with that policy.
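To see the per-cluster compliance data that TALM reads, you can inspect a policy's status directly; a sketch, assuming the policy1-common-cluster-version-policy policy in the default namespace:

$ oc get policy policy1-common-cluster-version-policy -n default -ojsonpath='{.status.status}'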
For more information about RHACM policies, see Policy overview.
18.6.1. Applying update policies to managed clusters
You can update your managed clusters by applying your policies.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
Save the contents of the ClusterGroupUpgrade CR in the cgu-1.yaml file:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-1
  namespace: default
spec:
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-nto-sub-policy
  - policy3-common-ptp-sub-policy
  - policy4-common-sriov-sub-policy
  enable: false
  clusters:
  - spoke1
  - spoke2
  - spoke5
  - spoke6
  remediationStrategy:
    maxConcurrency: 2
    timeout: 240

Create the ClusterGroupUpgrade CR by running the following command:

$ oc create -f cgu-1.yaml

Check if the ClusterGroupUpgrade CR was created in the hub cluster by running the following command:

$ oc get cgu --all-namespaces

Example output

NAMESPACE   NAME    AGE
default     cgu-1   8m55s

Check the status of the update by running the following command:

$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

Example output
{ "computedMaxConcurrency": 2, "conditions": [ { "lastTransitionTime": "2022-02-25T15:34:07Z", "message": "The ClusterGroupUpgrade CR is not enabled",1 "reason": "UpgradeNotStarted", "status": "False", "type": "Ready" } ], "copiedPolicies": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "managedPoliciesContent": { "policy1-common-cluster-version-policy": "null", "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]", "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]", "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]" }, "managedPoliciesForUpgrade": [ { "name": "policy1-common-cluster-version-policy", "namespace": "default" }, { "name": "policy2-common-nto-sub-policy", "namespace": "default" }, { "name": "policy3-common-ptp-sub-policy", "namespace": "default" }, { "name": "policy4-common-sriov-sub-policy", "namespace": "default" } ], "managedPoliciesNs": { "policy1-common-cluster-version-policy": "default", "policy2-common-nto-sub-policy": "default", "policy3-common-ptp-sub-policy": "default", "policy4-common-sriov-sub-policy": "default" }, "placementBindings": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "placementRules": [ "cgu-policy1-common-cluster-version-policy", "cgu-policy2-common-nto-sub-policy", "cgu-policy3-common-ptp-sub-policy", "cgu-policy4-common-sriov-sub-policy" ], "precaching": { "spec": {} }, "remediationPlan": [ [ "spoke1", "spoke2" ], [ "spoke5", "spoke6" ] ], "status": {} }- 1
- The
spec.enablefield in theClusterGroupUpgradeCR is set tofalse.
Check the status of the policies by running the following command:
$ oc get policies -A

Example output

NAMESPACE   NAME                                        REMEDIATION ACTION   COMPLIANCE STATE   AGE
default     cgu-policy1-common-cluster-version-policy   enforce                                 17m
default     cgu-policy2-common-nto-sub-policy           enforce                                 17m
default     cgu-policy3-common-ptp-sub-policy           enforce                                 17m
default     cgu-policy4-common-sriov-sub-policy         enforce                                 17m
default     policy1-common-cluster-version-policy       inform               NonCompliant       15h
default     policy2-common-nto-sub-policy               inform               NonCompliant       15h
default     policy3-common-ptp-sub-policy               inform               NonCompliant       18m
default     policy4-common-sriov-sub-policy             inform               NonCompliant       18m

1. The spec.remediationAction field of policies currently applied on the clusters is set to enforce. The managed policies in inform mode from the ClusterGroupUpgrade CR remain in inform mode during the update.

Change the value of the spec.enable field to true by running the following command:

$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
    --patch '{"spec":{"enable":true}}' --type=merge
Verification
Check the status of the update again by running the following command:
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq

Example output

{
  "computedMaxConcurrency": 2,
  "conditions": [
    {
      "lastTransitionTime": "2022-02-25T15:34:07Z",
      "message": "The ClusterGroupUpgrade CR has upgrade policies that are still non compliant",
      "reason": "UpgradeNotCompleted",
      "status": "False",
      "type": "Ready"
    }
  ],
  "copiedPolicies": [
    "cgu-policy1-common-cluster-version-policy",
    "cgu-policy2-common-nto-sub-policy",
    "cgu-policy3-common-ptp-sub-policy",
    "cgu-policy4-common-sriov-sub-policy"
  ],
  "managedPoliciesContent": {
    "policy1-common-cluster-version-policy": "null",
    "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
    "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
    "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
  },
  "managedPoliciesForUpgrade": [
    { "name": "policy1-common-cluster-version-policy", "namespace": "default" },
    { "name": "policy2-common-nto-sub-policy", "namespace": "default" },
    { "name": "policy3-common-ptp-sub-policy", "namespace": "default" },
    { "name": "policy4-common-sriov-sub-policy", "namespace": "default" }
  ],
  "managedPoliciesNs": {
    "policy1-common-cluster-version-policy": "default",
    "policy2-common-nto-sub-policy": "default",
    "policy3-common-ptp-sub-policy": "default",
    "policy4-common-sriov-sub-policy": "default"
  },
  "placementBindings": [
    "cgu-policy1-common-cluster-version-policy",
    "cgu-policy2-common-nto-sub-policy",
    "cgu-policy3-common-ptp-sub-policy",
    "cgu-policy4-common-sriov-sub-policy"
  ],
  "placementRules": [
    "cgu-policy1-common-cluster-version-policy",
    "cgu-policy2-common-nto-sub-policy",
    "cgu-policy3-common-ptp-sub-policy",
    "cgu-policy4-common-sriov-sub-policy"
  ],
  "precaching": {
    "spec": {}
  },
  "remediationPlan": [
    [ "spoke1", "spoke2" ],
    [ "spoke5", "spoke6" ]
  ],
  "status": {
    "currentBatch": 1,
    "currentBatchStartedAt": "2022-02-25T15:54:16Z",
    "remediationPlanForBatch": {
      "spoke1": 0,
      "spoke2": 1
    },
    "startedAt": "2022-02-25T15:54:16Z"
  }
}

1. Reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
Export the KUBECONFIG file of the single-node cluster you want to check the installation progress for by running the following command:

$ export KUBECONFIG=<cluster_kubeconfig_absolute_path>

Check all the subscriptions present on the single-node cluster and look for the one in the policy you are trying to install through the ClusterGroupUpgrade CR by running the following command:

$ oc get subs -A | grep -i <subscription_name>

Example output for cluster-logging policy

NAMESPACE           NAME              PACKAGE           SOURCE             CHANNEL
openshift-logging   cluster-logging   cluster-logging   redhat-operators   stable
If one of the managed policies includes a ClusterVersion CR, check the status of platform updates in the current batch by running the following command against the spoke cluster:

$ oc get clusterversion

Example output

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.5     True        True          43s     Working towards 4.9.7: 71 of 735 done (9% complete)

Check the Operator subscription by running the following command:
$ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"

Check the install plans present on the single-node cluster that is associated with the desired subscription by running the following command:

$ oc get installplan -n <subscription_namespace>

Example output for cluster-logging Operator

NAMESPACE           NAME            CSV                       APPROVAL   APPROVED
openshift-logging   install-6khtw   cluster-logging.5.3.3-4   Manual     true

1. The install plans have their Approval field set to Manual and their Approved field changes from false to true after TALM approves the install plan.
Note: When TALM is remediating a policy containing a subscription, it automatically approves any install plans attached to that subscription. Where multiple install plans are needed to get the Operator to the latest known version, TALM might approve multiple install plans, upgrading through one or more intermediate versions to get to the final version.
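If you want to observe the approvals as they happen, you can watch the install plans during remediation; a sketch, assuming the openshift-logging namespace from the example above:

$ oc get installplan -n openshift-logging -w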
Check if the cluster service version for the Operator of the policy that the ClusterGroupUpgrade is installing reached the Succeeded phase by running the following command:

$ oc get csv -n <operator_namespace>

Example output for OpenShift Logging Operator

NAME                    DISPLAY                     VERSION   REPLACES   PHASE
cluster-logging.5.4.2   Red Hat OpenShift Logging   5.4.2                Succeeded
18.7. Creating a backup of cluster resources before upgrade
For single-node OpenShift, the Topology Aware Lifecycle Manager (TALM) can create a backup of a deployment before an upgrade. If the upgrade fails, you can recover the previous version and restore a cluster to a working state without requiring a reprovision of applications.
The backup starts when the backup field is set to true in the ClusterGroupUpgrade CR.
The backup process can be in the following statuses:
- BackupStatePreparingToStart: The first reconciliation pass is in progress. TALM deletes any spoke backup namespace and hub view resources that were created in a failed upgrade attempt.
- BackupStateStarting: The backup prerequisites and backup job are being created.
- BackupStateActive: The backup is in progress.
- BackupStateSucceeded: The backup has succeeded.
- BackupStateTimeout: Artifact backup has been partially done.
- BackupStateError: The backup has ended with a non-zero exit code.
If the backup fails and enters the BackupStateTimeout or BackupStateError state, the cluster upgrade does not proceed.
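You can check where the backup stands for each cluster by querying the CR status; a sketch using the CR name from the procedure in the next section:

$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -ojsonpath='{.status.backup}'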
18.7.1. Creating a ClusterGroupUpgrade CR with backup
For single-node OpenShift, you can create a backup of a deployment before an upgrade. If the upgrade fails, you can use the upgrade-recovery.sh script to return the cluster to its preupgrade state. The backup consists of the following items:
- Cluster backup: A snapshot of etcd and static pod manifests.
- Content backup: Backups of folders, for example, /etc, /usr/local, /var/lib/kubelet.
- Changed files backup: Any files managed by machine-config that have been changed.
- Deployment: A pinned ostree deployment.
- Images (Optional): Any container images that are in use.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Install Red Hat Advanced Cluster Management (RHACM).
It is highly recommended that you create a recovery partition. The following is an example SiteConfig custom resource (CR) for a recovery partition:
nodes:
- hostName: "snonode.sno-worker-0.e2e.bos.redhat.com"
role: "master"
rootDeviceHints:
hctl: "0:2:0:0"
deviceName: /dev/sda
........
........
#Disk /dev/sda: 893.3 GiB, 959119884288 bytes, 1873281024 sectors
diskPartition:
- device: /dev/sda
partitions:
- mount_point: /var/recovery
size: 51200
start: 800000
Procedure
Save the contents of the ClusterGroupUpgrade CR with the backup field set to true in the clustergroupupgrades-group-du.yaml file:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: du-upgrade-4918
  namespace: ztp-group-du-sno
spec:
  preCaching: true
  backup: true
  clusters:
  - cnfdb1
  - cnfdb2
  enable: false
  managedPolicies:
  - du-upgrade-platform-upgrade
  remediationStrategy:
    maxConcurrency: 2
    timeout: 240

To start the update, apply the ClusterGroupUpgrade CR by running the following command:

$ oc apply -f clustergroupupgrades-group-du.yaml
Verification
Check the status of the upgrade in the hub cluster by running the following command:
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'

Example output

{
  "backup": {
    "clusters": [
      "cnfdb2",
      "cnfdb1"
    ],
    "status": {
      "cnfdb1": "Succeeded",
      "cnfdb2": "Succeeded"
    }
  },
  "computedMaxConcurrency": 1,
  "conditions": [
    {
      "lastTransitionTime": "2022-04-05T10:37:19Z",
      "message": "Backup is completed",
      "reason": "BackupCompleted",
      "status": "True",
      "type": "BackupDone"
    }
  ],
  "precaching": {
    "spec": {}
  },
  "status": {}
}
18.7.2. Recovering a cluster after a failed upgrade
If an upgrade of a cluster fails, you can manually log in to the cluster and use the backup to return the cluster to its preupgrade state. There are two stages:
- Rollback
- If the attempted upgrade included a change to the platform OS deployment, you must roll back to the previous version before running the recovery script.
A rollback is only applicable to upgrades from TALM and single-node OpenShift. This process does not apply to rollbacks from any other upgrade type.
- Recovery
- The recovery shuts down containers and uses files from the backup partition to relaunch containers and restore clusters.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Install Red Hat Advanced Cluster Management (RHACM).
- Log in as a user with cluster-admin privileges.
- Run an upgrade that is configured for backup.
Procedure
Delete the previously created ClusterGroupUpgrade custom resource (CR) by running the following command:

$ oc delete cgu/du-upgrade-4918 -n ztp-group-du-sno

- Log in to the cluster that you want to recover.
Check the status of the platform OS deployment by running the following command:
$ ostree admin status

Example outputs
[root@lab-test-spoke2-node-0 core]# ostree admin status
* rhcos c038a8f08458bbed83a77ece033ad3c55597e3f64edad66ea12fda18cbdceaf9.0
    Version: 49.84.202202230006-0
    Pinned: yes
    origin refspec: c038a8f08458bbed83a77ece033ad3c55597e3f64edad66ea12fda18cbdceaf9

1. The current deployment is pinned. A platform OS deployment rollback is not necessary.

[root@lab-test-spoke2-node-0 core]# ostree admin status
* rhcos f750ff26f2d5550930ccbe17af61af47daafc8018cd9944f2a3a6269af26b0fa.0
    Version: 410.84.202204050541-0
    origin refspec: f750ff26f2d5550930ccbe17af61af47daafc8018cd9944f2a3a6269af26b0fa
  rhcos ad8f159f9dc4ea7e773fd9604c9a16be0fe9b266ae800ac8470f63abc39b52ca.0 (rollback)
    Version: 410.84.202203290245-0
    Pinned: yes
    origin refspec: ad8f159f9dc4ea7e773fd9604c9a16be0fe9b266ae800ac8470f63abc39b52ca

To trigger a rollback of the platform OS deployment, run the following command:
$ rpm-ostree rollback -r

The first phase of the recovery shuts down containers and restores files from the backup partition to the targeted directories. To begin the recovery, run the following command:

$ /var/recovery/upgrade-recovery.sh

When prompted, reboot the cluster by running the following command:

$ systemctl reboot

After the reboot, restart the recovery by running the following command:

$ /var/recovery/upgrade-recovery.sh --resume
If the recovery utility fails, you can retry with the --restart option:

$ /var/recovery/upgrade-recovery.sh --restart
Verification
To check the status of the recovery, run the following command:

$ oc get clusterversion,nodes,clusteroperator

Example output

NAME                                         VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
clusterversion.config.openshift.io/version   4.9.23    True        False         86d     Cluster version is 4.9.23

NAME                          STATUS   ROLES           AGE   VERSION
node/lab-test-spoke1-node-0   Ready    master,worker   86d   v1.22.3+b93fd35

NAME                                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
clusteroperator.config.openshift.io/authentication   4.9.23    True        False         False      2d7h
clusteroperator.config.openshift.io/baremetal        4.9.23    True        False         False      86d
..............
18.8. Using the container image pre-cache feature
Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed.
The time of the update is not set by TALM. You can apply the ClusterGroupUpgrade CR at the beginning of the update by manual application or by external automation.
The container image pre-caching starts when the preCaching field is set to true in the ClusterGroupUpgrade CR. After a successful pre-caching process, you can start remediating policies. The remediation actions start when the enable field is set to true.
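Pre-caching and remediation are therefore two separate switches. The following sketch starts remediation once pre-caching has succeeded, using the CR name from the example in the next section:

$ oc --namespace=ztp-group-du-sno patch clustergroupupgrade.ran.openshift.io/du-upgrade-4918 \
    --type merge -p '{"spec":{"enable":true}}'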
The pre-caching process can be in the following statuses:
- PrecacheNotStarted: This is the initial state all clusters are automatically assigned to on the first reconciliation pass of the ClusterGroupUpgrade CR. In this state, TALM deletes any pre-caching namespace and hub view resources of spoke clusters that remain from previous incomplete updates. TALM then creates a new ManagedClusterView resource for the spoke pre-caching namespace to verify its deletion in the PrecachePreparing state.
- PrecachePreparing: Cleaning up any remaining resources from previous incomplete updates is in progress.
- PrecacheStarting: Pre-caching job prerequisites and the job are created.
- PrecacheActive: The job is in "Active" state.
- PrecacheSucceeded: The pre-cache job has succeeded.
- PrecacheTimeout: The artifact pre-caching has been partially done.
- PrecacheUnrecoverableError: The job ends with a non-zero exit code.
18.8.1. Creating a ClusterGroupUpgrade CR with pre-caching
The pre-cache feature allows the required container images to be present on the spoke cluster before the update starts.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
Procedure
Save the contents of the ClusterGroupUpgrade CR with the preCaching field set to true in the clustergroupupgrades-group-du.yaml file:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: du-upgrade-4918
  namespace: ztp-group-du-sno
spec:
  preCaching: true
  clusters:
  - cnfdb1
  - cnfdb2
  enable: false
  managedPolicies:
  - du-upgrade-platform-upgrade
  remediationStrategy:
    maxConcurrency: 2
    timeout: 240

1. The preCaching field is set to true, which enables TALM to pull the container images before starting the update.
When you want to start the update, apply the ClusterGroupUpgrade CR by running the following command:

$ oc apply -f clustergroupupgrades-group-du.yaml
Verification
Check if the ClusterGroupUpgrade CR exists in the hub cluster by running the following command:

$ oc get cgu -A

Example output

NAMESPACE          NAME              AGE
ztp-group-du-sno   du-upgrade-4918   10s

1. The CR is created.
Check the status of the pre-caching task by running the following command:
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'

Example output

{
  "conditions": [
    {
      "lastTransitionTime": "2022-01-27T19:07:24Z",
      "message": "Precaching is not completed (required)",
      "reason": "PrecachingRequired",
      "status": "False",
      "type": "Ready"
    },
    {
      "lastTransitionTime": "2022-01-27T19:07:24Z",
      "message": "Precaching is required and not done",
      "reason": "PrecachingNotDone",
      "status": "False",
      "type": "PrecachingDone"
    },
    {
      "lastTransitionTime": "2022-01-27T19:07:34Z",
      "message": "Pre-caching spec is valid and consistent",
      "reason": "PrecacheSpecIsWellFormed",
      "status": "True",
      "type": "PrecacheSpecValid"
    }
  ],
  "precaching": {
    "clusters": [
      "cnfdb1"
    ],
    "spec": {
      "platformImage": "image.example.io"
    },
    "status": {
      "cnfdb1": "Active"
    }
  }
}

Check the status of the pre-caching job by running the following command on the spoke cluster:
$ oc get jobs,pods -n openshift-talm-pre-cache

Example output

NAME                  COMPLETIONS   DURATION   AGE
job.batch/pre-cache   0/1           3m10s      3m10s

NAME                     READY   STATUS    RESTARTS   AGE
pod/pre-cache--1-9bmlr   1/1     Running   0          3m10s

Check the status of the ClusterGroupUpgrade CR by running the following command:

$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'

Example output

"conditions": [
    {
      "lastTransitionTime": "2022-01-27T19:30:41Z",
      "message": "The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies",
      "reason": "UpgradeCompleted",
      "status": "True",
      "type": "Ready"
    },
    {
      "lastTransitionTime": "2022-01-27T19:28:57Z",
      "message": "Precaching is completed",
      "reason": "PrecachingCompleted",
      "status": "True",
      "type": "PrecachingDone"
    }

1. The pre-cache tasks are done.
18.9. Troubleshooting the Topology Aware Lifecycle Manager
The Topology Aware Lifecycle Manager (TALM) is an OpenShift Container Platform Operator that remediates RHACM policies. When issues occur, use the oc adm must-gather command to gather details and logs, and to take steps in debugging the issues.
For more information about related topics, see the following documentation:
- Red Hat Advanced Cluster Management for Kubernetes 2.4 Support Matrix
- Red Hat Advanced Cluster Management Troubleshooting
- The "Troubleshooting Operator issues" section
18.9.1. General troubleshooting
You can determine the cause of the problem by reviewing the following questions:
Is the configuration that you are applying supported?
- Are the RHACM and the OpenShift Container Platform versions compatible?
- Are the TALM and RHACM versions compatible?
Which of the following components is causing the problem?
To ensure that the ClusterGroupUpgrade configuration is functional, you can do the following:
- Create the ClusterGroupUpgrade CR with the spec.enable field set to false.
- Wait for the status to be updated and go through the troubleshooting questions.
- If everything looks as expected, set the spec.enable field to true in the ClusterGroupUpgrade CR.
After you set the spec.enable field to true in the ClusterUpgradeGroup CR, the update procedure starts and you can no longer edit the CR's spec fields.
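Put together, the flow looks like the following sketch, assuming a CR named lab-upgrade in the default namespace as in the troubleshooting examples below:

$ oc apply -f lab-upgrade.yaml       # CR created with spec.enable: false
$ oc get cgu -n default lab-upgrade -ojsonpath='{.status.conditions}'   # review the status
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/lab-upgrade \
    --type merge -p '{"spec":{"enable":true}}'   # start the update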
18.9.2. Cannot modify the ClusterUpgradeGroup CR
- Issue
- You cannot edit the ClusterUpgradeGroup CR after enabling the update.
- Resolution
Restart the procedure by performing the following steps:
Remove the old ClusterGroupUpgrade CR by running the following command:

$ oc delete cgu -n <ClusterGroupUpgradeCR_namespace> <ClusterGroupUpgradeCR_name>

Check and fix the existing issues with the managed clusters and policies.

- Ensure that all the clusters are managed clusters and available.
- Ensure that all the policies exist and have the spec.remediationAction field set to inform.

Create a new ClusterGroupUpgrade CR with the correct configurations:

$ oc apply -f <ClusterGroupUpgradeCR_YAML>
18.9.3. Managed policies
Checking managed policies on the system
- Issue
- You want to check if you have the correct managed policies on the system.
- Resolution
Run the following command:
$ oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}'

Example output
["group-du-sno-validator-du-validator-policy", "policy2-common-nto-sub-policy", "policy3-common-ptp-sub-policy"]
Checking remediationAction mode
- Issue
- You want to check if the remediationAction field is set to inform in the spec of the managed policies.
- Resolution
Run the following command:
$ oc get policies --all-namespaces

Example output

NAMESPACE   NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
default     policy1-common-cluster-version-policy   inform               NonCompliant       5d21h
default     policy2-common-nto-sub-policy           inform               Compliant          5d21h
default     policy3-common-ptp-sub-policy           inform               NonCompliant       5d21h
default     policy4-common-sriov-sub-policy         inform               NonCompliant       5d21h
Checking policy compliance state
- Issue
- You want to check the compliance state of policies.
- Resolution
Run the following command:
$ oc get policies --all-namespaces

Example output

NAMESPACE   NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
default     policy1-common-cluster-version-policy   inform               NonCompliant       5d21h
default     policy2-common-nto-sub-policy           inform               Compliant          5d21h
default     policy3-common-ptp-sub-policy           inform               NonCompliant       5d21h
default     policy4-common-sriov-sub-policy         inform               NonCompliant       5d21h
18.9.4. Clusters
Checking if managed clusters are present
- Issue
- You want to check if the clusters in the ClusterGroupUpgrade CR are managed clusters.
- Resolution
Run the following command:
$ oc get managedclusters

Example output

NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
local-cluster   true           https://api.hub.example.com:6443      True     Unknown     13d
spoke1          true           https://api.spoke1.example.com:6443   True     True        13d
spoke3          true           https://api.spoke3.example.com:6443   True     True        27h

Alternatively, check the TALM manager logs:
Get the name of the TALM manager by running the following command:
$ oc get pod -n openshift-operators

Example output

NAME                                                         READY   STATUS    RESTARTS   AGE
cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp   2/2     Running   0          45m

Check the TALM manager logs by running the following command:

$ oc logs -n openshift-operators \
    cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager

Example output

ERROR   controller-runtime.manager.controller.clustergroupupgrade   Reconciler error   {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

1. The error message shows that the cluster is not a managed cluster.
Checking if managed clusters are available
- Issue
- You want to check if the managed clusters specified in the ClusterGroupUpgrade CR are available.
- Resolution
Run the following command:
$ oc get managedclustersExample output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE local-cluster true https://api.hub.testlab.com:6443 True Unknown 13d spoke1 true https://api.spoke1.testlab.com:6443 True True 13d1 spoke3 true https://api.spoke3.testlab.com:6443 True True 27h2
Checking clusterSelector
- Issue
- You want to check if the clusterSelector field is specified in the ClusterGroupUpgrade CR in at least one of the managed clusters.
- Resolution
Run the following command:
$ oc get managedcluster --selector=upgrade=true

1. The label for the clusters you want to update is upgrade:true.
Example output
NAME     HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
spoke1   true           https://api.spoke1.testlab.com:6443   True     True        13d
spoke3   true           https://api.spoke3.testlab.com:6443   True     True        27h
Checking if canary clusters are present
- Issue
You want to check if the canary clusters are present in the list of clusters.
Example ClusterGroupUpgrade CR

spec:
  clusters:
  - spoke1
  - spoke3
  clusterSelector:
  - upgrade2=true
  remediationStrategy:
    canaries:
    - spoke3
    maxConcurrency: 2
    timeout: 240

- Resolution
Run the following commands:
$ oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}'

Example output

["spoke1", "spoke3"]

Check if the canary clusters are present in the list of clusters that match clusterSelector labels by running the following command:

$ oc get managedcluster --selector=upgrade=true

Example output

NAME     HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
spoke1   true           https://api.spoke1.testlab.com:6443   True     True        13d
spoke3   true           https://api.spoke3.testlab.com:6443   True     True        27h
A cluster can be present in spec.clusters and also be matched by the spec.clusterSelector label.
Checking the pre-caching status on spoke clusters
Check the status of pre-caching by running the following command on the spoke cluster:
$ oc get jobs,pods -n openshift-talo-pre-cache
18.9.5. Remediation Strategy
Checking if remediationStrategy is present in the ClusterGroupUpgrade CR
- Issue
- You want to check if the remediationStrategy is present in the ClusterGroupUpgrade CR.
- Resolution
Run the following command:
$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}'

Example output
{"maxConcurrency":2, "timeout":240}
Checking if maxConcurrency is specified in the ClusterGroupUpgrade CR
- Issue
- You want to check if the maxConcurrency is specified in the ClusterGroupUpgrade CR.
- Resolution
Run the following command:
$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}'

Example output
2
18.9.6. Topology Aware Lifecycle Manager
Checking condition message and status in the ClusterGroupUpgrade CR
- Issue
- You want to check the value of the status.conditions field in the ClusterGroupUpgrade CR.
- Resolution
Run the following command:
$ oc get cgu lab-upgrade -ojsonpath='{.status.conditions}'

Example output
{"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"The ClusterGroupUpgrade CR has managed policies that are missing:[policyThatDoesntExist]", "reason":"UpgradeCannotStart", "status":"False", "type":"Ready"}
Checking corresponding copied policies
- Issue
- You want to check if every policy from status.managedPoliciesForUpgrade has a corresponding policy in status.copiedPolicies.
- Resolution
Run the following command:
$ oc get cgu lab-upgrade -oyaml

Example output

status:
  …
  copiedPolicies:
  - lab-upgrade-policy3-common-ptp-sub-policy
  managedPoliciesForUpgrade:
  - name: policy3-common-ptp-sub-policy
    namespace: default
Checking if status.remediationPlan was computed
- Issue
- You want to check if status.remediationPlan is computed.
- Resolution
Run the following command:
$ oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}'

Example output
[["spoke2", "spoke3"]]
Errors in the TALM manager container
- Issue
- You want to check the logs of the manager container of TALM.
- Resolution
Run the following command:
$ oc logs -n openshift-operators \
    cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager

Example output

ERROR   controller-runtime.manager.controller.clustergroupupgrade   Reconciler error   {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

1. Displays the error.