Chapter 12. Updating managed clusters with the Topology Aware Lifecycle Manager
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.
			Using RHACM and PolicyGenerator CRs is the recommended approach for managing policies and deploying them to managed clusters. This replaces the use of PolicyGenTemplate CRs for this purpose. For more information about PolicyGenerator resources, see the RHACM Policy Generator documentation.
		
12.1. About the Topology Aware Lifecycle Manager configuration
The Topology Aware Lifecycle Manager (TALM) manages the deployment of Red Hat Advanced Cluster Management (RHACM) policies for one or more OpenShift Container Platform clusters. Using TALM in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With TALM, you can control the following actions:
- The timing of the update
- The number of RHACM-managed clusters
- The subset of managed clusters to apply the policies to
- The update order of the clusters
- The set of policies remediated to the cluster
- The order of policies remediated to the cluster
- The assignment of a canary cluster
For single-node OpenShift, the Topology Aware Lifecycle Manager (TALM) offers pre-caching images for clusters with limited bandwidth.
TALM supports the orchestration of the OpenShift Container Platform y-stream and z-stream updates, and day-two operations on y-streams and z-streams.
12.2. About managed policies used with Topology Aware Lifecycle Manager
The Topology Aware Lifecycle Manager (TALM) uses RHACM policies for cluster updates.
				TALM can be used to manage the rollout of any policy CR where the remediationAction field is set to inform. Supported use cases include the following:
			
- Manual user creation of policy CRs
- Automatically generated policies from the PolicyGenerator or PolicyGenTemplate custom resource definition (CRD)
  Using the PolicyGenerator CRD is the recommended method for automatic policy generation.
For policies that update an Operator subscription with manual approval, TALM provides additional functionality that approves the installation of the updated Operator.
For more information about managed policies, see Policy Overview in the RHACM documentation.
12.3. Installing the Topology Aware Lifecycle Manager by using the web console
You can use the OpenShift Container Platform web console to install the Topology Aware Lifecycle Manager.
Prerequisites
- Install the latest version of the RHACM Operator.
- TALM requires RHACM 2.9 or later.
- Set up a hub cluster with a disconnected registry.
- Log in as a user with cluster-admin privileges.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
- Search for the Topology Aware Lifecycle Manager from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode ["All namespaces on the cluster (default)"] and Installed Namespace ("openshift-operators") to ensure that the Operator is installed properly.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the All Namespaces namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs in any containers in the cluster-group-upgrades-controller-manager pod that are reporting issues.
12.4. Installing the Topology Aware Lifecycle Manager by using the CLI
				You can use the OpenShift CLI (oc) to install the Topology Aware Lifecycle Manager (TALM).
			
Prerequisites
- Install the OpenShift CLI (oc).
- Install the latest version of the RHACM Operator.
- TALM requires RHACM 2.9 or later.
- Set up a hub cluster with a disconnected registry.
- Log in as a user with cluster-admin privileges.
Procedure
- Create a Subscription CR:
  - Define the Subscription CR and save the YAML file, for example, talm-subscription.yaml:
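    A minimal sketch of such a Subscription CR, assuming the stable channel and the redhat-operators catalog source; adjust the channel, source, and metadata names to match your environment:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-topology-aware-lifecycle-manager-subscription
      namespace: openshift-operators
    spec:
      channel: "stable"
      name: topology-aware-lifecycle-manager
      source: redhat-operators          # use your mirrored catalog source in a disconnected environment
      sourceNamespace: openshift-marketplace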
  - Create the Subscription CR by running the following command:
    $ oc create -f talm-subscription.yaml
 
Verification
- Verify that the installation succeeded by inspecting the CSV resource:
  $ oc get csv -n openshift-operators
  Example output
  NAME                                      DISPLAY                            VERSION   REPLACES   PHASE
  topology-aware-lifecycle-manager.4.19.x   Topology Aware Lifecycle Manager   4.19.x               Succeeded
- Verify that TALM is up and running:
  $ oc get deploy -n openshift-operators
  Example output
  NAMESPACE             NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
  openshift-operators   cluster-group-upgrades-controller-manager   1/1     1            1           14s
12.5. About the ClusterGroupUpgrade CR
				The Topology Aware Lifecycle Manager (TALM) builds the remediation plan from the ClusterGroupUpgrade CR for a group of clusters. You can define the following specifications in a ClusterGroupUpgrade CR:
			
- Clusters in the group
- 
						Blocking ClusterGroupUpgradeCRs
- Applicable list of managed policies
- Number of concurrent updates
- Applicable canary updates
- Actions to perform before and after the update
- Update timing
				You can control the start time of an update using the enable field in the ClusterGroupUpgrade CR. For example, if you have a scheduled maintenance window of four hours, you can prepare a ClusterGroupUpgrade CR with the enable field set to false.
			
You can set the timeout by configuring the spec.remediationStrategy.timeout setting as follows:

spec:
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
				You can use the batchTimeoutAction to determine what happens if an update fails for a cluster. You can specify continue to skip the failing cluster and continue to upgrade other clusters, or abort to stop policy remediation for all clusters. Once the timeout elapses, TALM removes all enforce policies to ensure that no further updates are made to clusters.
			
To apply the changes, you set the enable field to true.

For more information, see the "Applying update policies to managed clusters" section.
				As TALM works through remediation of the policies to the specified clusters, the ClusterGroupUpgrade CR can report true or false statuses for a number of conditions.
			
					After TALM completes a cluster update, the cluster does not update again under the control of the same ClusterGroupUpgrade CR. You must create a new ClusterGroupUpgrade CR in the following cases:
				
- When you need to update the cluster again
- When the cluster changes to non-compliant with the inform policy after being updated
12.5.1. Selecting clusters
TALM builds a remediation plan and selects clusters based on the following fields:
- The clusterLabelSelector field specifies the labels of the clusters that you want to update. This consists of a list of the standard label selectors from k8s.io/apimachinery/pkg/apis/meta/v1. Each selector in the list uses either label value pairs or label expressions. Matches from each selector are added to the final list of clusters along with the matches from the clusterSelector field and the cluster field.
- The clusters field specifies a list of clusters to update.
- The canaries field specifies the clusters for canary updates.
- The maxConcurrency field specifies the number of clusters to update in a batch.
- The actions field specifies beforeEnable actions that TALM takes as it begins the update process, and afterCompletion actions that TALM takes as it completes policy remediation for each cluster.
					You can use the clusters, clusterLabelSelector, and clusterSelector fields together to create a combined list of clusters.
				
					The remediation plan starts with the clusters listed in the canaries field. Each canary cluster forms a single-cluster batch.
				
Sample ClusterGroupUpgrade CR with the enable field set to false
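A minimal sketch of such a CR; cluster names, policy names, labels, and status values are illustrative, and the numbered comments correspond to the callouts that follow:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  actions:
    afterCompletion:               # 1
      deleteObjects: true
    beforeEnable:                  # 2
      addClusterLabels:
        upgrade-started: ""        # illustrative label; see the TALM CRD for the supported actions
  clusters:                        # 3
  - spoke1
  - spoke2
  - spoke3
  enable: false                    # 4
  managedPolicies:                 # 5
  - policy1-common-cluster-version-policy
  - policy2-common-nto-sub-policy
  remediationStrategy:             # 6
    canaries:                      # 7
    - spoke1
    maxConcurrency: 2              # 8
    timeout: 240
  clusterLabelSelectors:           # 9 (field name as used in recent TALM versions; confirm against your CRD)
  - matchLabels:
      upgrade: "true"
  batchTimeoutAction: continue     # 10
status:                            # 11
  conditions:
  - message: All selected clusters are valid
    reason: ClusterSelectionCompleted
    status: "True"
    type: ClustersSelected         # 12
  - message: Completed validation
    reason: ValidationCompleted
    status: "True"
    type: Validated                # 13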
- 1
- Specifies the action that TALM takes when it completes policy remediation for each cluster.
- 2
- Specifies the action that TALM takes as it begins the update process.
- 3
- Defines the list of clusters to update.
- 4
- The enable field is set to false.
- 5
- Lists the user-defined set of policies to remediate.
- 6
- Defines the specifics of the cluster updates.
- 7
- Defines the clusters for canary updates.
- 8
- Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the maxConcurrency value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
- 9
- Displays the parameters for selecting clusters.
- 10
- Controls what happens if a batch times out. Possible values are abort or continue. If unspecified, the default is continue.
- 11
- Displays information about the status of the updates.
- 12
- The ClustersSelected condition shows that all selected clusters are valid.
- 13
- The Validated condition shows that all selected clusters have been validated.
Any failure during the update of a canary cluster stops the update process.
When the remediation plan is successfully created, you can set the enable field to true and TALM starts to update the non-compliant clusters with the specified managed policies.
				
						You can only make changes to the spec fields if the enable field of the ClusterGroupUpgrade CR is set to false.
					
12.5.2. Validating
					TALM checks that all specified managed policies are available and correct, and uses the Validated condition to report the status and reasons as follows:
				
- true - Validation is completed.
- false - Policies are missing or invalid, or an invalid platform image has been specified.
12.5.3. Pre-caching
					Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed. On single-node OpenShift clusters, you can use pre-caching to avoid this. The container image pre-caching starts when you create a ClusterGroupUpgrade CR with the preCaching field set to true. TALM compares the available disk space with the estimated OpenShift Container Platform image size to ensure that there is enough space. If a cluster has insufficient space, TALM cancels pre-caching for that cluster and does not remediate policies on it.
				
					TALM uses the PrecacheSpecValid condition to report status information as follows:
				
- true - The pre-caching spec is valid and consistent.
- false - The pre-caching spec is incomplete.
					TALM uses the PrecachingSucceeded condition to report status information as follows:
				
- true - TALM has concluded the pre-caching process. If pre-caching fails for any cluster, the update fails for that cluster but proceeds for all other clusters. A message informs you if pre-caching has failed for any clusters.
- false - Pre-caching is still in progress for one or more clusters or has failed for all clusters.
For more information, see the "Using the container image pre-cache feature" section.
12.5.4. Updating clusters
					TALM enforces the policies following the remediation plan. Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, TALM moves on to the next batch. The timeout value of a batch is the spec.timeout field divided by the number of batches in the remediation plan.
				
					TALM uses the Progressing condition to report the status and reasons as follows:
				
- true - TALM is remediating non-compliant policies.
- false - The update is not in progress. Possible reasons for this are:
  - All clusters are compliant with all the managed policies.
  - The update timed out as policy remediation took too long.
  - Blocking CRs are missing from the system or have not yet completed.
  - The ClusterGroupUpgrade CR is not enabled.
 
						The managed policies apply in the order that they are listed in the managedPolicies field in the ClusterGroupUpgrade CR. One managed policy is applied to the specified clusters at a time. When a cluster complies with the current policy, the next managed policy is applied to it.
					
Sample ClusterGroupUpgrade CR in the Progressing state
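A minimal status excerpt consistent with this state; the message and reason strings are illustrative:

status:
  conditions:
  - message: Remediating non-compliant policies
    reason: InProgress
    status: "True"
    type: Progressing        # 1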
- 1
- The Progressing fields show that TALM is in the process of remediating policies.
12.5.5. Update status
					TALM uses the Succeeded condition to report the status and reasons as follows:
				
- true - All clusters are compliant with the specified managed policies.
- false - Policy remediation failed as there were no clusters available for remediation, or because policy remediation took too long for one of the following reasons:
  - The current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.
  - Clusters did not comply with the managed policies within the timeout value specified in the remediationStrategy field.
 
Sample ClusterGroupUpgrade CR in the Succeeded state
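A minimal status excerpt consistent with this state; cluster names, message strings, and reasons are illustrative:

status:
  clusters:                  # 1
  - name: spoke1
    state: complete
  conditions:
  - message: All clusters are compliant with all the managed policies
    reason: Completed
    status: "False"
    type: Progressing        # 2
  - message: All clusters are compliant with all the managed policies
    reason: Completed
    status: "True"
    type: Succeeded          # 3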
- 2
- In the Progressing fields, the status is false as the update has completed; clusters are compliant with all the managed policies.
- 3
- The Succeeded fields show that the validations completed successfully.
- 1
- The status field includes a list of clusters and their respective statuses. The status of a cluster can be complete or timedout.
Sample ClusterGroupUpgrade CR in the timedout state
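A minimal status excerpt consistent with this state; cluster names, message strings, and reasons are illustrative:

status:
  clusters:
  - name: spoke2
    state: timedout
  conditions:
  - message: Policy remediation took too long
    reason: TimedOut
    status: "False"
    type: Succeeded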
12.5.6. Blocking ClusterGroupUpgrade CRs
					You can create multiple ClusterGroupUpgrade CRs and control their order of application.
				
					For example, if you create ClusterGroupUpgrade CR C that blocks the start of ClusterGroupUpgrade CR A, then ClusterGroupUpgrade CR A cannot start until the status of ClusterGroupUpgrade CR C becomes UpgradeComplete.
				
					One ClusterGroupUpgrade CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.
				
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
- Save the content of the ClusterGroupUpgrade CRs in the cgu-a.yaml, cgu-b.yaml, and cgu-c.yaml files; a condensed sketch follows this list:
  - In cgu-a, the blocking CRs are defined so that the cgu-a update cannot start until cgu-c is complete.
  - In cgu-b, the cgu-b update cannot start until cgu-a is complete.
  - The cgu-c update does not have any blocking CRs. TALM starts the cgu-c update when the enable field is set to true.
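  A condensed sketch of cgu-a, assuming the default namespace and illustrative cluster and policy names; cgu-b follows the same structure with blockingCRs naming cgu-a, and cgu-c omits the blockingCRs field entirely:

  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-a
    namespace: default
  spec:
    blockingCRs:                 # cgu-a cannot start until cgu-c completes
    - name: cgu-c
      namespace: default
    clusters:
    - spoke1
    enable: false
    managedPolicies:
    - policy1-common-cluster-version-policy
    remediationStrategy:
      maxConcurrency: 1
      timeout: 240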
 
- Create the ClusterGroupUpgrade CRs by running the following command for each relevant CR:
  $ oc apply -f <name>.yaml
- Start the update process by running the following command for each relevant CR:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \
    --type merge -p '{"spec":{"enable":true}}'
  The following examples show ClusterGroupUpgrade CRs where the enable field is set to true:
  Example for cgu-a with blocking CRs
  - 1
- Shows the list of blocking CRs.
  Example for cgu-b with blocking CRs
  - 1
- Shows the list of blocking CRs.
  Example for cgu-c with blocking CRs
  - 1
- The cgu-c update does not have any blocking CRs.
 
12.6. Update policies on managed clusters
				The Topology Aware Lifecycle Manager (TALM) remediates a set of inform policies for the clusters specified in the ClusterGroupUpgrade custom resource (CR). TALM remediates inform policies by controlling the remediationAction specification in a Policy CR through the bindingOverrides.remediationAction and subFilter specifications in the PlacementBinding CR. Each policy has its own corresponding RHACM placement rule and RHACM placement binding.
			
One by one, TALM adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, TALM skips applying that policy on the compliant cluster. TALM then moves on to applying the next policy to the non-compliant cluster. After TALM completes the updates in a batch, all clusters are removed from the placement rules associated with the policies. Then, the update of the next batch starts.
If a spoke cluster does not report any compliant state to RHACM, the managed policies on the hub cluster can be missing status information that TALM needs. TALM handles these cases in the following ways:
- If a policy’s status.compliant field is missing, TALM ignores the policy and adds a log entry. Then, TALM continues looking at the policy’s status.status field.
- If a policy’s status.status is missing, TALM produces an error.
- If a cluster’s compliance status is missing in the policy’s status.status field, TALM considers that cluster to be non-compliant with that policy.
				The ClusterGroupUpgrade CR’s batchTimeoutAction determines what happens if an upgrade fails for a cluster. You can specify continue to skip the failing cluster and continue to upgrade other clusters, or specify abort to stop the policy remediation for all clusters. Once the timeout elapses, TALM removes all the resources it created to ensure that no further updates are made to clusters.
			
Example upgrade policy
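A minimal sketch of an inform policy that drives a platform update, assuming the RHACM Policy and ConfigurationPolicy APIs; the policy name, namespace, channel, and target version are illustrative:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: ocp-4.19.4                      # illustrative name
  namespace: platform-upgrade           # illustrative namespace
spec:
  disabled: false
  remediationAction: inform             # TALM controls enforcement during remediation
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: upgrade
      spec:
        remediationAction: inform
        severity: low
        object-templates:
        - complianceType: musthave
          objectDefinition:
            apiVersion: config.openshift.io/v1
            kind: ClusterVersion
            metadata:
              name: version
            spec:
              channel: stable-4.19
              desiredUpdate:
                version: 4.19.4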
For more information about RHACM policies, see Policy overview.
12.6.1. Configuring Operator subscriptions for managed clusters that you install with TALM
					Topology Aware Lifecycle Manager (TALM) can only approve the install plan for an Operator if the Subscription custom resource (CR) of the Operator contains the status.state.AtLatestKnown field.
				
Procedure
- Add the status.state.AtLatestKnown field to the Subscription CR of the Operator:
  Example Subscription CR
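  A minimal sketch, using the cluster-logging subscription that appears later in this section as an illustration; the key element is the status.state: AtLatestKnown field:

  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: cluster-logging
    namespace: openshift-logging
  spec:
    channel: "stable"
    name: cluster-logging
    source: redhat-operators
    sourceNamespace: openshift-marketplace
    installPlanApproval: Manual
  status:
    state: AtLatestKnown    # 1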
- The status.state: AtLatestKnown field is used for the latest Operator version available from the Operator catalog.
 Note- When a new version of the Operator is available in the registry, the associated policy becomes non-compliant. 
- Apply the changed Subscription policy to your managed clusters with a ClusterGroupUpgrade CR.
12.6.2. Applying update policies to managed clusters
You can update your managed clusters by applying your policies.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- TALM requires RHACM 2.9 or later.
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
- Create RHACM policies in the hub cluster.
Procedure
- Save the contents of the ClusterGroupUpgrade CR in the cgu-1.yaml file:
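  A minimal sketch of cgu-1.yaml; the policy and cluster names are illustrative, and the numbered comments correspond to the callouts that follow:

  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-1
    namespace: default
  spec:
    managedPolicies:                 # 1
    - policy1-common-cluster-version-policy
    - policy2-common-nto-sub-policy
    - policy3-common-ptp-sub-policy
    - policy4-common-sriov-sub-policy
    enable: false
    clusters:                        # 2
    - spoke1
    - spoke2
    - spoke5
    - spoke6
    remediationStrategy:
      maxConcurrency: 2              # 3
      timeout: 240                   # 4
    batchTimeoutAction: continue     # 5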
- The name of the policies to apply.
- 2
- The list of clusters to update.
- 3
- The maxConcurrency field signifies the number of clusters updated at the same time.
- 4
- The update timeout in minutes.
- 5
- Controls what happens if a batch times out. Possible values are abort or continue. If unspecified, the default is continue.
 
- Create the ClusterGroupUpgrade CR by running the following command:
  $ oc create -f cgu-1.yaml
  Check if the ClusterGroupUpgrade CR was created in the hub cluster by running the following command:
  $ oc get cgu --all-namespaces
  Example output
  NAMESPACE   NAME    AGE    STATE        DETAILS
  default     cgu-1   8m55   NotEnabled   Not Enabled
- Check the status of the update by running the following command:
  $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
- The spec.enable field in the ClusterGroupUpgrade CR is set to false.
 
 
- Change the value of the spec.enable field to true by running the following command:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
    --patch '{"spec":{"enable":true}}' --type=merge
Verification
- Check the status of the update by running the following command:
  $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
- Reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
 
- Check the status of the policies by running the following command:
  $ oc get policies -A
  - The spec.remediationAction value changes to enforce for the child policies applied to the clusters from the current batch.
  - The spec.remediationAction value remains inform for the child policies in the rest of the clusters.
  - After the batch is complete, the spec.remediationAction value changes back to inform for the enforced child policies.
 
- If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
  - Export the KUBECONFIG file of the single-node cluster you want to check the installation progress for by running the following command:
    $ export KUBECONFIG=<cluster_kubeconfig_absolute_path>
  - Check all the subscriptions present on the single-node cluster and look for the one in the policy you are trying to install through the ClusterGroupUpgrade CR by running the following command:
    $ oc get subs -A | grep -i <subscription_name>
    Example output for cluster-logging policy
    NAMESPACE           NAME              PACKAGE           SOURCE             CHANNEL
    openshift-logging   cluster-logging   cluster-logging   redhat-operators   stable
 
- If one of the managed policies includes a ClusterVersion CR, check the status of platform updates in the current batch by running the following command against the spoke cluster:
  $ oc get clusterversion
  Example output
  NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
  version   4.19.5    True        True          43s     Working towards 4.19.7: 71 of 735 done (9% complete)
- Check the Operator subscription by running the following command:
  $ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"
- Check the install plans present on the single-node cluster that is associated with the desired subscription by running the following command:
  $ oc get installplan -n <subscription_namespace>
  Example output for cluster-logging Operator
  NAMESPACE           NAME            CSV                       APPROVAL   APPROVED
  openshift-logging   install-6khtw   cluster-logging.5.3.3-4   Manual     true
  - 1
- The install plans have their Approval field set to Manual and their Approved field changes from false to true after TALM approves the install plan.
 Note- When TALM is remediating a policy containing a subscription, it automatically approves any install plans attached to that subscription. Where multiple install plans are needed to get the operator to the latest known version, TALM might approve multiple install plans, upgrading through one or more intermediate versions to get to the final version. 
- Check if the cluster service version for the Operator of the policy that the ClusterGroupUpgrade is installing reached the Succeeded phase by running the following command:
  $ oc get csv -n <operator_namespace>
  Example output for OpenShift Logging Operator
  NAME                     DISPLAY                     VERSION   REPLACES   PHASE
  cluster-logging.v6.2.1   Red Hat OpenShift Logging   6.2.1                Succeeded
12.7. Using the container image pre-cache feature
Single-node OpenShift clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed.
					The time of the update is not set by TALM. You can apply the ClusterGroupUpgrade CR at the beginning of the update by manual application or by external automation.
				
				The container image pre-caching starts when the preCaching field is set to true in the ClusterGroupUpgrade CR.
			
				TALM uses the PrecacheSpecValid condition to report status information as follows:
			
- true - The pre-caching spec is valid and consistent.
- false - The pre-caching spec is incomplete.
				TALM uses the PrecachingSucceeded condition to report status information as follows:
			
- true - TALM has concluded the pre-caching process. If pre-caching fails for any cluster, the update fails for that cluster but proceeds for all other clusters. A message informs you if pre-caching has failed for any clusters.
- false - Pre-caching is still in progress for one or more clusters or has failed for all clusters.
				After a successful pre-caching process, you can start remediating policies. The remediation actions start when the enable field is set to true. If there is a pre-caching failure on a cluster, the upgrade fails for that cluster. The upgrade process continues for all other clusters that have a successful pre-cache.
			
The pre-caching process can be in the following statuses:
- NotStarted - This is the initial state that all clusters are automatically assigned to on the first reconciliation pass of the ClusterGroupUpgrade CR. In this state, TALM deletes any pre-caching namespace and hub view resources of spoke clusters that remain from previous incomplete updates. TALM then creates a new ManagedClusterView resource for the spoke pre-caching namespace to verify its deletion in the PrecachePreparing state.
- PreparingToStart - Cleaning up any remaining resources from previous incomplete updates is in progress.
- Starting - Pre-caching job prerequisites and the job are created.
- Active - The job is in "Active" state.
- Succeeded - The pre-cache job succeeded.
- PrecacheTimeout - The artifact pre-caching is partially done.
- UnrecoverableError - The job ends with a non-zero exit code.
12.7.1. Using the container image pre-cache filter
The pre-cache feature typically downloads more images than a cluster needs for an update. You can control which pre-cache images are downloaded to a cluster. This decreases download time, and saves bandwidth and storage.
You can see a list of all images to be downloaded using the following command:
$ oc adm release info <ocp-version>
					The following ConfigMap example shows how you can exclude images using the excludePrecachePatterns field.
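A sketch of such a ConfigMap; the ConfigMap name and the exclusion patterns are illustrative and must match what TALM expects in your environment:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-group-upgrade-overrides
data:
  excludePrecachePatterns: |       # 1: images whose names include any of these patterns are excluded
    azure
    aws
    vsphere
    alibaba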
				
- 1
- TALM excludes all images with names that include any of the patterns listed here.
12.7.2. Creating a ClusterGroupUpgrade CR with pre-caching
For single-node OpenShift, the pre-cache feature allows the required container images to be present on the spoke cluster before the update starts.
						For pre-caching, TALM uses the spec.remediationStrategy.timeout value from the ClusterGroupUpgrade CR. You must set a timeout value that allows sufficient time for the pre-caching job to complete. When you enable the ClusterGroupUpgrade CR after pre-caching has completed, you can change the timeout value to a duration that is appropriate for the update.
					
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Log in as a user with cluster-admin privileges.
Procedure
- Save the contents of the ClusterGroupUpgrade CR with the preCaching field set to true in the clustergroupupgrades-group-du.yaml file:
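  A minimal sketch of clustergroupupgrades-group-du.yaml, reusing the namespace and CR name that appear in the verification steps; cluster and policy names are illustrative:

  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: du-upgrade-4918
    namespace: ztp-group-du-sno
  spec:
    preCaching: true                 # 1
    clusters:
    - cnfdb1
    - cnfdb2
    enable: false
    managedPolicies:
    - du-upgrade-platform-upgrade
    remediationStrategy:
      maxConcurrency: 2
      timeout: 240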
- The preCaching field is set to true, which enables TALM to pull the container images before starting the update.
 
- When you want to start pre-caching, apply the ClusterGroupUpgrade CR by running the following command:
  $ oc apply -f clustergroupupgrades-group-du.yaml
Verification
- Check if the ClusterGroupUpgrade CR exists in the hub cluster by running the following command:
  $ oc get cgu -A
  Example output
  NAMESPACE          NAME              AGE   STATE        DETAILS
  ztp-group-du-sno   du-upgrade-4918   10s   InProgress   Precaching is required and not done
  - 1
- The CR is created.
 
- Check the status of the pre-caching task by running the following command:
  $ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
- Displays the list of identified clusters.
 
- Check the status of the pre-caching job by running the following command on the spoke cluster:
  $ oc get jobs,pods -n openshift-talo-pre-cache
  Example output
  NAME                  COMPLETIONS   DURATION   AGE
  job.batch/pre-cache   0/1           3m10s      3m10s
  NAME                     READY   STATUS    RESTARTS   AGE
  pod/pre-cache--1-9bmlr   1/1     Running   0          3m10s
- Check the status of the ClusterGroupUpgrade CR by running the following command:
  $ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
- The pre-cache tasks are done.
 
12.8. Troubleshooting the Topology Aware Lifecycle Manager
				The Topology Aware Lifecycle Manager (TALM) is an OpenShift Container Platform Operator that remediates RHACM policies. When issues occur, use the oc adm must-gather command to gather details and logs and to take steps in debugging the issues.
			
For more information about related topics, see the following documentation:
- Red Hat Advanced Cluster Management for Kubernetes 2.4 Support Matrix
- Red Hat Advanced Cluster Management Troubleshooting
- The "Troubleshooting Operator issues" section
12.8.1. General troubleshooting
You can determine the cause of the problem by reviewing the following questions:
- Is the configuration that you are applying supported? - Are the RHACM and the OpenShift Container Platform versions compatible?
- Are the TALM and RHACM versions compatible?
 
- Which of the following components is causing the problem? 
					To ensure that the ClusterGroupUpgrade configuration is functional, you can do the following:
				
- Create the ClusterGroupUpgrade CR with the spec.enable field set to false.
- Wait for the status to be updated and go through the troubleshooting questions.
- If everything looks as expected, set the spec.enable field to true in the ClusterGroupUpgrade CR.
After you set the spec.enable field to true in the ClusterGroupUpgrade CR, the update procedure starts and you cannot edit the CR’s spec fields anymore.
					
12.8.2. Cannot modify the ClusterGroupUpgrade CR
- Issue
- You cannot edit the ClusterGroupUpgrade CR after enabling the update.
- Resolution
- Restart the procedure by performing the following steps:
  - Remove the old ClusterGroupUpgrade CR by running the following command:
    $ oc delete cgu -n <ClusterGroupUpgradeCR_namespace> <ClusterGroupUpgradeCR_name>
  - Check and fix the existing issues with the managed clusters and policies:
    - Ensure that all the clusters are managed clusters and available.
    - Ensure that all the policies exist and have the spec.remediationAction field set to inform.
 
  - Create a new ClusterGroupUpgrade CR with the correct configurations:
    $ oc apply -f <ClusterGroupUpgradeCR_YAML>
 
12.8.3. Managed policies
Checking managed policies on the system
- Issue
- You want to check if you have the correct managed policies on the system.
- Resolution
- Run the following command:
  $ oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}'
  Example output
  ["group-du-sno-validator-du-validator-policy", "policy2-common-nto-sub-policy", "policy3-common-ptp-sub-policy"]
Checking remediationAction mode
- Issue
- You want to check if the remediationAction field is set to inform in the spec of the managed policies.
- Resolution
- Run the following command:
  $ oc get policies --all-namespaces
  Example output
  NAMESPACE   NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
  default     policy1-common-cluster-version-policy   inform               NonCompliant       5d21h
  default     policy2-common-nto-sub-policy           inform               Compliant          5d21h
  default     policy3-common-ptp-sub-policy           inform               NonCompliant       5d21h
  default     policy4-common-sriov-sub-policy         inform               NonCompliant       5d21h
Checking policy compliance state
- Issue
- You want to check the compliance state of policies.
- Resolution
- Run the following command:
  $ oc get policies --all-namespaces
  Example output
  NAMESPACE   NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
  default     policy1-common-cluster-version-policy   inform               NonCompliant       5d21h
  default     policy2-common-nto-sub-policy           inform               Compliant          5d21h
  default     policy3-common-ptp-sub-policy           inform               NonCompliant       5d21h
  default     policy4-common-sriov-sub-policy         inform               NonCompliant       5d21h
12.8.4. Clusters
Checking if managed clusters are present
- Issue
- You want to check if the clusters in the ClusterGroupUpgrade CR are managed clusters.
- Resolution
- Run the following command:
  $ oc get managedclusters
  Example output
  NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
  local-cluster   true           https://api.hub.example.com:6443      True     Unknown     13d
  spoke1          true           https://api.spoke1.example.com:6443   True     True        13d
  spoke3          true           https://api.spoke3.example.com:6443   True     True        27h
  Alternatively, check the TALM manager logs:
  - Get the name of the TALM manager by running the following command:
    $ oc get pod -n openshift-operators
    Example output
    NAME                                                          READY   STATUS    RESTARTS   AGE
    cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp   2/2     Running   0          45m
  - Check the TALM manager logs by running the following command:
    $ oc logs -n openshift-operators \
      cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
    Example output
    ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    - 1
- The error message shows that the cluster is not a managed cluster.
 
 
 
Checking if managed clusters are available
- Issue
- You want to check if the managed clusters specified in the ClusterGroupUpgrade CR are available.
- Resolution
- Run the following command:
  $ oc get managedclusters
  Example output
  NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
  local-cluster   true           https://api.hub.testlab.com:6443      True     Unknown     13d
  spoke1          true           https://api.spoke1.testlab.com:6443   True     True        13d
  spoke3          true           https://api.spoke3.testlab.com:6443   True     True        27h
Checking clusterLabelSelector
- Issue
- You want to check if the clusterLabelSelector field specified in the ClusterGroupUpgrade CR matches at least one of the managed clusters.
- Resolution
- Run the following command:
  $ oc get managedcluster --selector=upgrade=true
  - 1
  - The label for the clusters you want to update is upgrade: true.
  Example output
  NAME     HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
  spoke1   true           https://api.spoke1.testlab.com:6443   True     True        13d
  spoke3   true           https://api.spoke3.testlab.com:6443   True     True        27h
Checking if canary clusters are present
- Issue
- You want to check if the canary clusters are present in the list of clusters.
  Example ClusterGroupUpgrade CR
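  A condensed sketch of the relevant spec fields, using the spoke1 and spoke3 clusters shown in the resolution output:

  spec:
    clusters:
    - spoke1
    - spoke3
    remediationStrategy:
      canaries:
      - spoke1
      maxConcurrency: 2
      timeout: 240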
- Resolution
- Run the following commands:
  $ oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}'
  Example output
  ["spoke1", "spoke3"]
  Check if the canary clusters are present in the list of clusters that match clusterLabelSelector labels by running the following command:
  $ oc get managedcluster --selector=upgrade=true
  Example output
  NAME     HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
  spoke1   true           https://api.spoke1.testlab.com:6443   True     True        13d
  spoke3   true           https://api.spoke3.testlab.com:6443   True     True        27h
 
						A cluster can be present in spec.clusters and also be matched by the spec.clusterLabelSelector label.
					
Checking the pre-caching status on spoke clusters
- Check the status of pre-caching by running the following command on the spoke cluster:
  $ oc get jobs,pods -n openshift-talo-pre-cache
12.8.5. Remediation Strategy
Checking if remediationStrategy is present in the ClusterGroupUpgrade CR
- Issue
- You want to check if the remediationStrategy is present in the ClusterGroupUpgrade CR.
- Resolution
- Run the following command:
  $ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}'
  Example output
  {"maxConcurrency":2, "timeout":240}
Checking if maxConcurrency is specified in the ClusterGroupUpgrade CR
- Issue
- You want to check if the maxConcurrency is specified in the ClusterGroupUpgrade CR.
- Resolution
- Run the following command:
  $ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}'
  Example output
  2
12.8.6. Topology Aware Lifecycle Manager
Checking condition message and status in the ClusterGroupUpgrade CR
- Issue
- You want to check the value of the status.conditions field in the ClusterGroupUpgrade CR.
- Resolution
- Run the following command:
  $ oc get cgu lab-upgrade -ojsonpath='{.status.conditions}'
  Example output
  {"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"Missing managed policies:[policyList]", "reason":"NotAllManagedPoliciesExist", "status":"False", "type":"Validated"}
Checking if status.remediationPlan was computed
- Issue
- You want to check if status.remediationPlan is computed.
- Resolution
- Run the following command:
  $ oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}'
  Example output
  [["spoke2", "spoke3"]]
Errors in the TALM manager container
- Issue
- You want to check the logs of the manager container of TALM.
- Resolution
- Run the following command:
  $ oc logs -n openshift-operators \
    cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
  Example output
  ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
  - 1
- Displays the error.
 
Clusters are not compliant with some policies after a ClusterGroupUpgrade CR has completed
- Issue
- The policy compliance status that TALM uses to decide if remediation is needed has not yet fully updated for all clusters. This may be because: - The CGU was run too soon after a policy was created or updated.
- The remediation of a policy affects the compliance of subsequent policies in the ClusterGroupUpgrade CR.
 
- Resolution
- Create and apply a new ClusterGroupUpgrade CR with the same specification.
Auto-created ClusterGroupUpgrade CR in the GitOps ZTP workflow has no managed policies
- Issue
- If there are no policies for the managed cluster when the cluster becomes Ready, a ClusterGroupUpgrade CR with no policies is auto-created. Upon completion of the ClusterGroupUpgrade CR, the managed cluster is labeled as ztp-done. If the PolicyGenerator or PolicyGenTemplate CRs were not pushed to the Git repository within the required time after SiteConfig resources were pushed, this might result in no policies being available for the target cluster when the cluster became Ready.
- Resolution
- Verify that the policies you want to apply are available on the hub cluster, then create a ClusterGroupUpgrade CR with the required policies.
You can either manually create the ClusterGroupUpgrade CR or trigger auto-creation again. To trigger auto-creation of the ClusterGroupUpgrade CR, remove the ztp-done label from the cluster and delete the empty ClusterGroupUpgrade CR that was previously created in the ztp-install namespace.
				
Pre-caching has failed
- Issue
- Pre-caching might fail for one of the following reasons: - There is not enough free space on the node.
- For a disconnected environment, the pre-cache image has not been properly mirrored.
- There was an issue when creating the pod.
 
- Resolution
- To check if pre-caching has failed due to insufficient space, check the log of the pre-caching pod in the node.
  - Find the name of the pod using the following command:
    $ oc get pods -n openshift-talo-pre-cache
  - Check the logs to see if the error is related to insufficient space using the following command:
    $ oc logs -n openshift-talo-pre-cache <pod name>
 
- If there is no log, check the pod status using the following command:
  $ oc describe pod -n openshift-talo-pre-cache <pod name>
- If the pod does not exist, check the job status to see why it could not create a pod using the following command:
  $ oc describe job -n openshift-talo-pre-cache pre-cache
 
Matching policies and ManagedCluster CRs before the managed cluster is available
- Issue
- You want RHACM to match policies and managed clusters before the managed clusters become available.
- Resolution
- To ensure that TALM correctly applies the RHACM policies specified in the spec.managedPolicies field of the ClusterGroupUpgrade (CGU) CR, TALM needs to match these policies to the managed cluster before the managed cluster is available. The RHACM PolicyGenerator uses the generated Placement CR to do this automatically. By default, this Placement CR includes the necessary tolerations to ensure proper TALM behavior.
  The expected spec.tolerations settings in the Placement CR are shown in the sketch that follows.
  If you use a custom Placement CR instead of the one generated by the RHACM PolicyGenerator, include these tolerations in that Placement CR.
  For more information on placements in RHACM, see Placement overview.
  For more information on tolerations in RHACM, see Placing managed clusters by using taints and tolerations.
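  A sketch of the tolerations, assuming the standard RHACM cluster taint keys; the Placement name and namespace are illustrative:

  apiVersion: cluster.open-cluster-management.io/v1beta1
  kind: Placement
  metadata:
    name: placement-policy-talm
    namespace: ztp-common
  spec:
    tolerations:
    - key: cluster.open-cluster-management.io/unreachable
      operator: Exists
    - key: cluster.open-cluster-management.io/unavailable
      operator: Exists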