Troubleshooting
View a list of troubleshooting topics for your cluster. You can also use the must-gather command to collect logs.
Abstract
Chapter 1. Troubleshooting
Before using the Troubleshooting guide, you can run the oc adm must-gather command to gather details and logs, and to take the first steps in debugging issues. For more details, see Running the must-gather command to troubleshoot.
		
Additionally, check your role-based access. See Role-based access control for details.
1.1. Documented troubleshooting
View the list of troubleshooting topics for Red Hat Advanced Cluster Management for Kubernetes:
Installation
To get to the original installing tasks, view Installing.
Cluster management
To get to the original cluster management tasks, view Managing your clusters.
- Troubleshooting an offline cluster
- Troubleshooting cluster with pending import status
- Troubleshooting imported clusters offline after certificate change
- Troubleshooting cluster status changing from offline to available
- Troubleshooting cluster creation on VMware vSphere
- Troubleshooting cluster in console with pending or failed status
- Troubleshooting OpenShift Container Platform version 3.11 cluster import failure
- Troubleshooting Klusterlet with degraded conditions
- Troubleshooting Klusterlet application manager on managed clusters
- Troubleshooting Object storage channel secret
- Troubleshooting managedcluster resource
- Namespace remains after deleting a cluster
- Auto-import-secret-exists error when importing a cluster
Application management
To get to the original application management, view Managing applications.
Governance
To get to the original security guide, view Risk and compliance.
Console observability
Console observability includes Search and the Visual Web Terminal, along with header and navigation function. To get to the original observability guide, view Observability in the console.
1.2. Running the must-gather command to troubleshoot
To get started with troubleshooting, learn about the scenarios in which you run the must-gather command to debug issues, then see the procedures to start using the command.
			
Required access: Cluster administrator
1.2.1. Must-gather scenarios
- Scenario one: Use the Documented troubleshooting section to see if a solution to your problem is documented. The guide is organized by the major functions of the product.
  - With this scenario, you check the guide to see if your solution is in the documentation. For instance, for trouble with creating a cluster, you might find a solution in the Cluster management section.
- Scenario two: If your problem is not documented with steps to resolve, run the must-gather command and use the output to debug the issue.
- Scenario three: If you cannot debug the issue using your output from the must-gather command, then share your output with Red Hat Support.
1.2.2. Must-gather procedure
					See the following procedure to start using the must-gather command:
				
- Learn about the must-gather command and install the prerequisites that you need at Gathering data about your cluster in the Red Hat OpenShift Container Platform documentation.
- Log in to your cluster. For the usual use case, run the must-gather command while you are logged in to your hub cluster.
  Note: If you want to check your managed clusters, find the gather-managed.log file that is located in the cluster-scoped-resources directory:
  <your-directory>/cluster-scoped-resources/gather-managed.log
  Check for managed clusters that are not set to True in the JOINED and AVAILABLE columns. You can run the must-gather command on those clusters that are not connected with True status.
- Add the Red Hat Advanced Cluster Management for Kubernetes image that is used for gathering data and the directory. Run the following command, where you insert the image and the directory for the output:
  oc adm must-gather --image=registry.redhat.io/rhacm2/acm-must-gather-rhel8:v2.3.0 --dest-dir=<directory>
- Go to your specified directory to see your output, which is organized in the following levels:
  - Two peer levels: cluster-scoped-resources and namespace resources.
  - Sub-level for each: API group for the custom resource definitions for both cluster-scoped and namespace-scoped resources.
  - Next level for each: YAML file sorted by kind.
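The check of the JOINED and AVAILABLE columns described in the note above can be sketched as a small filter. This is a minimal sketch, assuming that gather-managed.log contains the usual managed cluster table whose last three columns are JOINED, AVAILABLE, and AGE. The function name list_unhealthy_clusters and the OUT_DIR variable are hypothetical.

```shell
# Print the names of managed clusters whose JOINED and AVAILABLE columns are
# not both "True". Reads the table on stdin.
# Assumption: each data row ends with JOINED, AVAILABLE, AGE, so the two
# status values are the third-to-last and second-to-last fields.
list_unhealthy_clusters() {
  awk '
    NF == 0      { next }              # skip blank lines
    $1 == "NAME" { next }              # skip the header row
    NF < 3       { print $1; next }    # too few columns to be healthy
    !($(NF-1) == "True" && $(NF-2) == "True") { print $1 }
  '
}

# Usage (OUT_DIR is a placeholder for your must-gather output directory):
# list_unhealthy_clusters < "$OUT_DIR/cluster-scoped-resources/gather-managed.log"
```

Rows with empty status columns, such as clusters that never joined, are reported as well, because their trailing fields cannot both be True.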
1.2.3. Must-gather in a disconnected environment
					Complete the following steps to run the must-gather command in a disconnected environment:
				
- In a disconnected environment, mirror the Red Hat operator catalog images into your mirror registry. For more information, see Install on disconnected networks.
- Run the following command to extract logs. The command references the image from your mirror registry:
REGISTRY=registry.example.com:5000
IMAGE=$REGISTRY/rhacm2/acm-must-gather-rhel8@sha256:ff9f37eb400dc1f7d07a9b6f2da9064992934b69847d17f59e385783c071b9d8
oc adm must-gather --image=$IMAGE --dest-dir=./data
1.3. Troubleshooting installation status stuck in installing or pending
				When installing Red Hat Advanced Cluster Management, the MultiClusterHub remains in Installing phase, or multiple pods maintain a Pending status.
			
1.3.1. Symptom: Stuck in Pending status
					More than ten minutes passed since you installed MultiClusterHub and one or more components from the status.components field of the MultiClusterHub resource report ProgressDeadlineExceeded. Resource constraints on the cluster might be the issue.
				
Check the pods in the namespace where MultiClusterHub was installed. You might see pods in Pending with a status similar to the following:
				
reason: Unschedulable
message: '0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/master:
        }, that the pod didn't tolerate.'
In this case, the worker node resources are not sufficient in the cluster to run the product.
1.3.2. Resolving the problem: Adjust worker node sizing
If you have this problem, then your cluster needs to be updated with either larger or more worker nodes. See Sizing your cluster for guidelines on sizing your cluster.
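Before resizing, it can help to confirm which pods are unschedulable and why. The following is a minimal sketch, not a documented procedure. The function names are hypothetical, and the namespace argument stands for the namespace where MultiClusterHub was installed.

```shell
# List pods that are stuck in Pending in the given namespace.
# $1: namespace where MultiClusterHub is installed.
list_pending_pods() {
  oc get pods -n "$1" --field-selector=status.phase=Pending
}

# Show the scheduler events that explain why pods could not be placed,
# for example "Insufficient cpu".
# $1: namespace where MultiClusterHub is installed.
list_scheduling_failures() {
  oc get events -n "$1" --field-selector=reason=FailedScheduling
}

# Usage (the namespace is a placeholder):
# list_pending_pods <namespace>
# list_scheduling_failures <namespace>
```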
1.4. Troubleshooting reinstallation failure
When reinstalling Red Hat Advanced Cluster Management for Kubernetes, the pods do not start.
1.4.1. Symptom: Reinstallation failure
If your pods do not start after you install Red Hat Advanced Cluster Management, it is likely that Red Hat Advanced Cluster Management was previously installed, and not all of the pieces were removed before you attempted this installation.
In this case, the pods do not start after completing the installation process.
1.4.2. Resolving the problem: Reinstallation failure
If you have this problem, complete the following steps:
- Run the uninstallation process to remove the current components by following the steps in Uninstalling.
- Install the Helm CLI binary version 3.2.0, or later, by following the instructions at Installing Helm.
- Ensure that your Red Hat OpenShift Container Platform CLI is configured to run oc commands. See Getting started with the OpenShift CLI in the OpenShift Container Platform documentation for more information about how to configure the oc commands.
- Copy the following script into a file:
  Replace <namespace> in the script with the name of the namespace where Red Hat Advanced Cluster Management was installed. Ensure that you specify the correct namespace, as the namespace is cleaned out and deleted.
- Run the script to remove the artifacts from the previous installation.
- Run the installation. See Installing while connected online.
1.5. Troubleshooting an offline cluster
There are a few common causes for a cluster showing an offline status.
1.5.1. Symptom: Cluster status is offline
					After you complete the procedure for creating a cluster, you cannot access it from the Red Hat Advanced Cluster Management console, and it shows a status of offline.
				
1.5.2. Resolving the problem: Cluster status is offline
- Determine if the managed cluster is available. You can check this in the Clusters area of the Red Hat Advanced Cluster Management console. If it is not available, try restarting the managed cluster.
- If the managed cluster status is still offline, complete the following steps:
  - Run the oc get managedcluster <cluster_name> -o yaml command on the hub cluster. Replace <cluster_name> with the name of your cluster.
  - Find the status.conditions section.
  - Check the messages for type: ManagedClusterConditionAvailable and resolve any problems.
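As an alternative to reading the full YAML output, the condition message can be extracted directly with a JSONPath filter. This is a minimal sketch that assumes the condition type is named exactly as in the steps above. The function name available_condition_message is hypothetical.

```shell
# Print the message of the ManagedClusterConditionAvailable condition.
# Run this while logged in to the hub cluster.
# $1: name of the managed cluster.
available_condition_message() {
  oc get managedcluster "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].message}'
}

# Usage (the cluster name is a placeholder):
# available_condition_message <cluster_name>
```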
1.6. Troubleshooting cluster with pending import status
If you receive Pending import continually on the console of your cluster, follow the procedure to troubleshoot the problem.
1.6.1. Symptom: Cluster with pending import status
After importing a cluster by using the Red Hat Advanced Cluster Management console, the cluster appears in the console with a status of Pending import.
1.6.2. Identifying the problem: Cluster with pending import status
- Run the following command on the managed cluster to view the Kubernetes pod names that are having the issue:
  kubectl get pod -n open-cluster-management-agent | grep klusterlet-registration-agent
- Run the following command on the managed cluster to find the log entry for the error:
  kubectl logs <registration_agent_pod> -n open-cluster-management-agent
  Replace registration_agent_pod with the pod name that you identified in step 1.
- Search the returned results for text that indicates there was a networking connectivity problem. For example: no such host.
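The log search in the last step can be sketched as a filter. The "no such host" pattern comes from the procedure above. The other two patterns are common Go network error strings that are added here as an assumption, not an exhaustive list. The function name find_network_errors is hypothetical.

```shell
# Keep only log lines on stdin that point to a DNS or connectivity problem.
# "no such host" is named in the procedure; the remaining patterns are
# assumptions about other network errors that might appear.
find_network_errors() {
  grep -E 'no such host|connection refused|i/o timeout'
}

# Usage on the managed cluster (the pod name is a placeholder):
# kubectl logs <registration_agent_pod> -n open-cluster-management-agent | find_network_errors
```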
1.6.3. Resolving the problem: Cluster with pending import status
- Retrieve the port number that is having the problem by entering the following command on the hub cluster:
  oc get infrastructure cluster -o yaml | grep apiServerURL
- Ensure that the hostname from the managed cluster can be resolved, and that outbound connectivity to the host and port is occurring.
  If the communication cannot be established by the managed cluster, the cluster import is not complete. The cluster status for the managed cluster is Pending import.
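To check resolution and connectivity, the host and port first need to be separated from the apiServerURL value. The following is a minimal sketch. The function name parse_api_server_url is hypothetical, and the commented checks assume that getent and nc are available on the machine where you run them.

```shell
# Split an API server URL into "host port".
# Defaults to port 443 when the URL has no explicit port.
# IPv6 literal addresses are not handled by this sketch.
parse_api_server_url() {
  hostport="${1#*://}"          # drop the scheme, for example https://
  hostport="${hostport%%/*}"    # drop any path
  host="${hostport%%:*}"
  port=443
  if [ "$hostport" != "$host" ]; then
    port="${hostport##*:}"
  fi
  printf '%s %s\n' "$host" "$port"
}

# Usage on the managed cluster (the URL is a placeholder):
# set -- $(parse_api_server_url "https://api.hub.example.com:6443")
# getent hosts "$1"    # does the hostname resolve?
# nc -vz "$1" "$2"     # is the port reachable?
```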
1.7. Troubleshooting cluster with already exists error
				If you are unable to import an OpenShift Container Platform cluster into Red Hat Advanced Cluster Management MultiClusterHub and receive an AlreadyExists error, follow the procedure to troubleshoot the problem.
			
1.7.1. Symptom: Already exists error log when importing OpenShift Container Platform cluster
					An error log shows up when importing an OpenShift Container Platform cluster into Red Hat Advanced Cluster Management MultiClusterHub:
				
1.7.2. Identifying the problem: Already exists when importing OpenShift Container Platform cluster
Check if there are any Red Hat Advanced Cluster Management-related resources on the cluster that you want to import to the new Red Hat Advanced Cluster Management MultiClusterHub by running the following commands:
				
oc get all -n open-cluster-management-agent
oc get all -n open-cluster-management-agent-addon
1.7.3. Resolving the problem: Already exists when importing OpenShift Container Platform cluster
Run the following commands to remove pre-existing resources:
oc delete namespaces open-cluster-management-agent open-cluster-management-agent-addon --wait=false
oc get crds | grep open-cluster-management.io | awk '{print $1}' | xargs oc delete crds --wait=false
oc get crds | grep open-cluster-management.io | awk '{print $1}' | xargs oc patch crds --type=merge -p '{"metadata":{"finalizers": []}}'
1.8. Troubleshooting cluster creation on VMware vSphere
If you experience a problem when creating a Red Hat OpenShift Container Platform cluster on VMware vSphere, see the following troubleshooting information to determine whether one of the topics addresses your problem.
				Note: Sometimes when the cluster creation process fails on VMware vSphere, the link is not enabled for you to view the logs. If this happens, you can identify the problem by viewing the log of the hive-controllers pod. The hive-controllers log is in the hive namespace.
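When you read the hive-controllers log as described in the note, filtering for error-level entries can shorten the search. This is a minimal sketch that assumes the log uses the level=error key and value format shown in the deployment log examples later in this section, and that the pod belongs to a deployment named hive-controllers. The function name only_error_lines is hypothetical.

```shell
# Keep only error-level entries from a log stream on stdin.
# Assumption: entries carry a literal "level=error" marker.
only_error_lines() {
  grep -F 'level=error'
}

# Usage on the hub cluster (the deployment name is an assumption):
# oc logs deployment/hive-controllers -n hive | only_error_lines
```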
			
1.8.1. Managed cluster creation fails with certificate IP SAN error
1.8.1.1. Symptom: Managed cluster creation fails with certificate IP SAN error
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails with an error message that indicates a certificate IP SAN error.
1.8.1.2. Identifying the problem: Managed cluster creation fails with certificate IP SAN error
The deployment of the managed cluster fails and returns the following errors in the deployment log:
time="2020-08-07T15:27:55Z" level=error msg="Error: error setting up new vSphere SOAP client: Post https://147.1.1.1/sdk: x509: cannot validate certificate for xx.xx.xx.xx because it doesn't contain any IP SANs"
time="2020-08-07T15:27:55Z" level=error
1.8.1.3. Resolving the problem: Managed cluster creation fails with certificate IP SAN error
Use the VMware vCenter server fully-qualified host name instead of the IP address in the credential. You can also update the VMware vCenter CA certificate to contain the IP SAN.
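To see whether a certificate already contains an IP SAN, you can inspect its text form. The following is a minimal sketch that assumes the human-readable output of openssl x509 -noout -text as input. The function name has_ip_san and the host vcenter.example.com are hypothetical.

```shell
# Succeed (exit 0) when the certificate text on stdin lists at least one
# IP address in its Subject Alternative Name extension.
has_ip_san() {
  grep -q 'IP Address:'
}

# Usage (vcenter.example.com is a placeholder host):
# openssl s_client -connect vcenter.example.com:443 </dev/null 2>/dev/null \
#   | openssl x509 -noout -text | has_ip_san && echo "certificate has an IP SAN"
```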
1.8.2. Managed cluster creation fails with unknown certificate authority
1.8.2.1. Symptom: Managed cluster creation fails with unknown certificate authority
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails because the certificate is signed by an unknown authority.
1.8.2.2. Identifying the problem: Managed cluster creation fails with unknown certificate authority
The deployment of the managed cluster fails and returns the following errors in the deployment log:
Error: error setting up new vSphere SOAP client: Post https://vspherehost.com/sdk: x509: certificate signed by unknown authority"
1.8.2.3. Resolving the problem: Managed cluster creation fails with unknown certificate authority
Ensure you entered the correct certificate from the certificate authority when creating the credential.
1.8.3. Managed cluster creation fails with expired certificate
1.8.3.1. Symptom: Managed cluster creation fails with expired certificate
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails because the certificate is expired or is not yet valid.
1.8.3.2. Identifying the problem: Managed cluster creation fails with expired certificate
The deployment of the managed cluster fails and returns the following errors in the deployment log:
x509: certificate has expired or is not yet valid
1.8.3.3. Resolving the problem: Managed cluster creation fails with expired certificate
Ensure that the time on your ESXi hosts is synchronized.
1.8.4. Managed cluster creation fails with insufficient privilege for tagging
1.8.4.1. Symptom: Managed cluster creation fails with insufficient privilege for tagging
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails because there is insufficient privilege to use tagging.
1.8.4.2. Identifying the problem: Managed cluster creation fails with insufficient privilege for tagging
The deployment of the managed cluster fails and returns the following errors in the deployment log:
1.8.4.3. Resolving the problem: Managed cluster creation fails with insufficient privilege for tagging
Ensure that your VMware vCenter required account privileges are correct. See Image registry removed during installation for more information.
1.8.5. Managed cluster creation fails with invalid dnsVIP
1.8.5.1. Symptom: Managed cluster creation fails with invalid dnsVIP
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails because there is an invalid dnsVIP.
1.8.5.2. Identifying the problem: Managed cluster creation fails with invalid dnsVIP
If you see the following message when trying to deploy a new managed cluster with VMware vSphere, it is because you have an older OpenShift Container Platform release image that does not support VMware Installer Provisioned Infrastructure (IPI):
failed to fetch Master Machines: failed to load asset \\\"Install Config\\\": invalid \\\"install-config.yaml\\\" file: platform.vsphere.dnsVIP: Invalid value: \\\"\\\": \\\"\\\" is not a valid IP
1.8.5.3. Resolving the problem: Managed cluster creation fails with invalid dnsVIP
Select a release image from a later version of OpenShift Container Platform that supports VMware Installer Provisioned Infrastructure.
1.8.6. Managed cluster creation fails with incorrect network type
1.8.6.1. Symptom: Managed cluster creation fails with incorrect network type
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails because there is an incorrect network type specified.
1.8.6.2. Identifying the problem: Managed cluster creation fails with incorrect network type
If you see the following message when trying to deploy a new managed cluster with VMware vSphere, it is because you have an older OpenShift Container Platform image that does not support VMware Installer Provisioned Infrastructure (IPI):
1.8.6.3. Resolving the problem: Managed cluster creation fails with incorrect network type
Select a valid VMware vSphere network type for the specified VMware cluster.
1.8.7. Managed cluster creation fails with an error processing disk changes
1.8.7.1. Symptom: Adding the VMware vSphere managed cluster fails due to an error processing disk changes
After creating a new Red Hat OpenShift Container Platform cluster on VMware vSphere, the cluster fails because there is an error when processing disk changes.
1.8.7.2. Identifying the problem: Adding the VMware vSphere managed cluster fails due to an error processing disk changes
A message similar to the following is displayed in the logs:
ERROR
ERROR Error: error reconfiguring virtual machine: error processing disk changes post-clone: disk.0: ServerFaultCode: NoPermission: RESOURCE (vm-71:2000), ACTION (queryAssociatedProfile): RESOURCE (vm-71), ACTION (PolicyIDByVirtualDisk)
1.8.7.3. Resolving the problem: Adding the VMware vSphere managed cluster fails due to an error processing disk changes
Use the VMware vSphere client to give the user All privileges for Profile-driven Storage Privileges.
1.9. Troubleshooting OpenShift Container Platform version 3.11 cluster import failure
1.9.1. Symptom: OpenShift Container Platform version 3.11 cluster import failure
After you attempt to import a Red Hat OpenShift Container Platform version 3.11 cluster, the import fails with a log message that resembles the following content:
1.9.2. Identifying the problem: OpenShift Container Platform version 3.11 cluster import failure
					This often occurs because the installed version of the kubectl command-line tool is 1.11, or earlier. Run the following command to see which version of the kubectl command-line tool you are running:
				
kubectl version
If the returned data lists version 1.11, or earlier, complete one of the fixes in Resolving the problem: OpenShift Container Platform version 3.11 cluster import failure.
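The version comparison described above can be sketched as a small helper. This is a minimal sketch. The function name kubectl_too_old is hypothetical, and you pass it the client version string that kubectl version reports.

```shell
# Succeed (exit 0) when a Kubernetes version string is 1.11 or earlier.
# $1: version such as v1.11.0 or 1.19.3
kubectl_too_old() {
  v="${1#v}"                   # drop a leading "v"
  major="${v%%.*}"
  rest="${v#*.}"
  minor="${rest%%.*}"
  minor="${minor%%[!0-9]*}"    # drop suffixes such as "+"
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -le 11 ]; }
}

# Usage:
# kubectl_too_old v1.11.0 && echo "upgrade kubectl before importing"
```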
1.9.3. Resolving the problem: OpenShift Container Platform version 3.11 cluster import failure
You can resolve this issue by completing one of the following procedures:
- Install the latest version of the kubectl command-line tool.
  - Download the latest version of the kubectl tool from: Install and Set Up kubectl in the Kubernetes documentation.
  - Import the cluster again after upgrading your kubectl tool.
- Run a file that contains the import command.
  - Start the procedure in Importing a managed cluster with the CLI.
  - When you create the command to import your cluster, copy that command into a YAML file named import.yaml.
  - Run the following command to import the cluster again from the file:
    oc apply -f import.yaml
 
1.10. Troubleshooting imported clusters offline after certificate change
				Installing a custom apiserver certificate is supported, but one or more clusters that were imported before you changed the certificate information can have an offline status.
			
1.10.1. Symptom: Clusters offline after certificate change
					After you complete the procedure for updating a certificate secret, one or more of your clusters that were online are now displaying an offline status in the Red Hat Advanced Cluster Management for Kubernetes console.
				
1.10.2. Identifying the problem: Clusters offline after certificate change
					After updating the information for a custom API server certificate, clusters that were imported and running before the new certificate are now in an offline state.
				
					The errors that indicate that the certificate is the problem are found in the logs for the pods in the open-cluster-management-agent namespace of the offline managed cluster. The following examples are similar to the errors that are displayed in the logs:
				
					Log of work-agent:
				
E0917 03:04:05.874759       1 manifestwork_controller.go:179] Reconcile work test-1-klusterlet-addon-workmgr fails with err: Failed to update work status with err Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/namespaces/test-1/manifestworks/test-1-klusterlet-addon-workmgr": x509: certificate signed by unknown authority
E0917 03:04:05.874887       1 base_controller.go:231] "ManifestWorkAgent" controller failed to sync "test-1-klusterlet-addon-workmgr", err: Failed to update work status with err Get "api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/namespaces/test-1/manifestworks/test-1-klusterlet-addon-workmgr": x509: certificate signed by unknown authority
E0917 03:04:37.245859       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1.ManifestWork: failed to list *v1.ManifestWork: Get "api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/namespaces/test-1/manifestworks?resourceVersion=607424": x509: certificate signed by unknown authority
					Log of registration-agent:
				
I0917 02:27:41.525026       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management-agent", Name:"open-cluster-management-agent", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ManagedClusterAvailableConditionUpdated' update managed cluster "test-1" available condition to "True", due to "Managed cluster is available"
E0917 02:58:26.315984       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1beta1.CertificateSigningRequest: Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/managedclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dtest-1&resourceVersion=607408&timeout=9m33s&timeoutSeconds=573&watch=true"": x509: certificate signed by unknown authority
E0917 02:58:26.598343       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1.ManagedCluster: Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/managedclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dtest-1&resourceVersion=607408&timeout=9m33s&timeoutSeconds=573&watch=true": x509: certificate signed by unknown authority
E0917 02:58:27.613963       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1.ManagedCluster: failed to list *v1.ManagedCluster: Get "https://api.aaa-ocp.dev02.location.com:6443/apis/cluster.management.io/v1/managedclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dtest-1&resourceVersion=607408&timeout=9m33s&timeoutSeconds=573&watch=true"": x509: certificate signed by unknown authority
1.10.3. Resolving the problem: Clusters offline after certificate change
To manually restore your clusters after updating your certificate information, complete the following steps for each managed cluster:
- Manually import the cluster again. Red Hat OpenShift Container Platform clusters that were created from Red Hat Advanced Cluster Management resynchronize every 2 hours, so you can skip this step for those clusters.
  - On the hub cluster, display the import command by entering the following command:
    oc get secret -n ${CLUSTER_NAME} ${CLUSTER_NAME}-import -ojsonpath='{.data.import\.yaml}' | base64 --decode > import.yaml
    Replace CLUSTER_NAME with the name of the managed cluster that you are importing.
  - On the managed cluster, apply the import.yaml file:
    oc apply -f import.yaml
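Because the steps are repeated for each managed cluster, the hub-side command can be wrapped in a helper that writes one file per cluster. This is a minimal sketch built on the command from the steps above. The function name save_import_yaml, the per-cluster file name, and the cluster names in the usage note are hypothetical.

```shell
# Write the import manifest for one managed cluster to <cluster>-import.yaml.
# Run this while logged in to the hub cluster.
# $1: name of the managed cluster.
save_import_yaml() {
  oc get secret -n "$1" "$1-import" -ojsonpath='{.data.import\.yaml}' \
    | base64 --decode > "$1-import.yaml"
}

# Usage with hypothetical cluster names:
# for cluster in cluster-a cluster-b; do save_import_yaml "$cluster"; done
```

Each resulting file still has to be applied on its own managed cluster with oc apply -f.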
 
1.11. Namespace remains after deleting a cluster
When you remove a managed cluster, the namespace is normally removed as part of the cluster removal process. In rare cases, the namespace remains with some artifacts in it. In that case, you must manually remove the namespace.
1.11.1. Symptom: Namespace remains after deleting a cluster
After removing a managed cluster, the namespace is not removed.
1.11.2. Resolving the problem: Namespace remains after deleting a cluster
Complete the following steps to remove the namespace manually:
- Run the following command to produce a list of the resources that remain in the <cluster_name> namespace: - oc api-resources --verbs=list --namespaced -o name | grep -E '^secrets|^serviceaccounts|^managedclusteraddons|^roles|^rolebindings|^manifestworks|^leases|^managedclusterinfo|^appliedmanifestworks' | xargs -n 1 oc get --show-kind --ignore-not-found -n <cluster_name> - oc api-resources --verbs=list --namespaced -o name | grep -E '^secrets|^serviceaccounts|^managedclusteraddons|^roles|^rolebindings|^manifestworks|^leases|^managedclusterinfo|^appliedmanifestworks' | xargs -n 1 oc get --show-kind --ignore-not-found -n <cluster_name>- Copy to Clipboard Copied! - Toggle word wrap Toggle overflow - Replace cluster_name with the name of the namespace for the cluster that you attempted to remove. 
- Delete each identified resource on the list that does not have a status of Delete by entering the following command to edit the resource:
    oc edit <resource_kind> <resource_name> -n <namespace>
  Replace resource_kind with the kind of the resource. Replace resource_name with the name of the resource. Replace namespace with the name of the namespace of the resource.
- Locate the finalizer attribute in the metadata.
- Delete the non-Kubernetes finalizers by using the vi editor dd command.
- Save the list and exit the vi editor by entering the :wq command.
- Delete the namespace by entering the following command:
    oc delete ns <cluster-name>
  Replace cluster-name with the name of the namespace that you are trying to delete.
1.12. Auto-import-secret-exists error when importing a cluster
Your cluster import fails with an error message that reads: auto import secret exists.
1.12.1. Symptom: Auto import secret exists error when importing a cluster
					When importing a hive cluster for management, an auto-import-secret already exists error is displayed.
				
1.12.2. Resolving the problem: Auto-import-secret-exists error when importing a cluster
This problem occurs when you attempt to import a cluster that was previously managed by Red Hat Advanced Cluster Management. When this happens, the secrets conflict when you try to reimport the cluster.
To work around this problem, complete the following steps:
- To manually delete the existing auto-import-secret, run the following command on the hub cluster:
    oc delete secret auto-import-secret -n <cluster-namespace>
  Replace cluster-namespace with the namespace of your cluster.
- Import your cluster again using the procedure in Importing a target managed cluster to a hub cluster.
Your cluster is imported.
1.13. Troubleshooting cluster status changing from offline to available
				The status of the managed cluster alternates between offline and available without any manual change to the environment or cluster.
			
1.13.1. Symptom: Cluster status changing from offline to available
					When the network that connects the managed cluster to the hub cluster is unstable, the status of the managed cluster that is reported by the hub cluster cycles between offline and available.
				
1.13.2. Resolving the problem: Cluster status changing from offline to available
To attempt to resolve this issue, complete the following steps:
- Edit your ManagedCluster specification on the hub cluster by entering the following command:
    oc edit managedcluster <cluster-name>
  Replace cluster-name with the name of your managed cluster.
- Increase the value of leaseDurationSeconds in your ManagedCluster specification. The default value is 5 minutes, but that might not be enough time to maintain the connection with the network issues. Specify a greater amount of time for the lease. For example, you can raise the setting to 20 minutes.
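The lease change described above can also be sketched as a non-interactive patch. The following is a minimal sketch, not the documented procedure: it assumes that leaseDurationSeconds sits directly under spec in the ManagedCluster resource, and it swaps the interactive oc edit for oc patch with a merge patch. The function names lease_patch_json and set_lease_minutes are hypothetical helpers introduced only for illustration.

```shell
# Build a JSON merge patch that sets leaseDurationSeconds from a value in minutes.
# Assumption: the field is spec.leaseDurationSeconds on the ManagedCluster resource.
lease_patch_json() {
  minutes="$1"
  seconds=$((minutes * 60))
  printf '{"spec":{"leaseDurationSeconds":%s}}\n' "$seconds"
}

# Hypothetical wrapper: apply the patch on the hub cluster.
# $1 = managed cluster name, $2 = lease duration in minutes.
set_lease_minutes() {
  oc patch managedcluster "$1" --type merge -p "$(lease_patch_json "$2")"
}
```

For example, `set_lease_minutes <cluster-name> 20` would request a 20-minute lease, which matches the example value in the step above.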
1.14. Troubleshooting cluster in console with pending or failed status
If you observe Pending status or Failed status in the console for a cluster you created, follow the procedure to troubleshoot the problem.
1.14.1. Symptom: Cluster in console with pending or failed status
After creating a new cluster by using the Red Hat Advanced Cluster Management for Kubernetes console, the cluster does not progress beyond the status of Pending or displays Failed status.
1.14.2. Identifying the problem: Cluster in console with pending or failed status
If the cluster displays Failed status, navigate to the details page for the cluster and follow the link to the logs provided. If no logs are found or the cluster displays Pending status, continue with the following procedure to check for logs:
- Procedure 1
  - Run the following command on the hub cluster to view the names of the Kubernetes pods that were created in the namespace for the new cluster:
      oc get pod -n <new_cluster_name>
    Replace new_cluster_name with the name of the cluster that you created.
  - If no pod that contains the string provision in the name is listed, continue with Procedure 2. If there is a pod with provision in the name, run the following command on the hub cluster to view the logs of that pod:
      oc logs <new_cluster_name_provision_pod_name> -n <new_cluster_name> -c hive
    Replace new_cluster_name_provision_pod_name with the name of the cluster that you created, followed by the pod name that contains provision.
  - Search for errors in the logs that might explain the cause of the problem.
 
- Procedure 2
  If there is not a pod with provision in its name, the problem occurred earlier in the process. Complete the following procedure to view the logs:
  - Run the following command on the hub cluster:
      oc describe clusterdeployments -n <new_cluster_name>
    Replace new_cluster_name with the name of the cluster that you created. For more information about cluster installation logs, see Gathering installation logs in the Red Hat OpenShift documentation.
  - See if there is additional information about the problem in the Status.Conditions.Message and Status.Conditions.Reason entries of the resource.
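The choice between Procedure 1 and Procedure 2 can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names provision_pods and show_provision_logs are hypothetical, and the sketch assumes that the namespace has the same name as the new cluster, as in the commands above.

```shell
# Read pod names on stdin (one per line) and print those that contain "provision".
# The "|| true" keeps the exit status at zero when nothing matches.
provision_pods() {
  grep provision || true
}

# Hypothetical wrapper: show provision pod logs if such a pod exists (Procedure 1),
# otherwise describe the ClusterDeployment resources (Procedure 2).
# $1 = name of the new cluster, assumed to be the namespace name as well.
show_provision_logs() {
  ns="$1"
  pods="$(oc get pod -n "$ns" -o name | provision_pods)"
  if [ -z "$pods" ]; then
    oc describe clusterdeployments -n "$ns"
    return
  fi
  for pod in $pods; do
    oc logs "$pod" -n "$ns" -c hive
  done
}
```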
 
1.14.3. Resolving the problem: Cluster in console with pending or failed status
After you identify the errors in the logs, determine how to resolve the errors before you destroy the cluster and create it again.
The following example provides a possible log error of selecting an unsupported zone, and the actions that are required to resolve it:
No subnets provided for zones
When you created your cluster, you selected one or more zones within a region that are not supported. Complete one of the following actions when you recreate your cluster to resolve the issue:
- Select a different zone within the region.
- Omit the zone that does not provide the support, if you have other zones listed.
- Select a different region for your cluster.
After determining the issues from the log, destroy the cluster and recreate it.
See Creating a cluster for more information about creating a cluster.
1.15. Troubleshooting application Git server connection
				Logs from the open-cluster-management namespace display failure to clone the Git repository.
			
1.15.1. Symptom: Git server connection
The logs from the subscription controller pod multicluster-operators-hub-subscription-<random-characters> in the open-cluster-management namespace indicate that it fails to clone the Git repository. You receive an x509: certificate signed by unknown authority error, or a BadGateway error.
				
1.15.2. Resolving the problem: Git server connection
Important: Upgrade if you are on a previous version.
- Save apps.open-cluster-management.io_channels_crd.yaml as the same file name.
- On the Red Hat Advanced Cluster Management cluster, run the following command to apply the file:
    oc apply -f apps.open-cluster-management.io_channels_crd.yaml
- In the open-cluster-management namespace, edit the advanced-cluster-management.v2.2.0 CSV. Run the following command and edit:
    oc edit csv advanced-cluster-management.v2.2.0 -n open-cluster-management
  Find the following containers:
  - multicluster-operators-standalone-subscription
  - multicluster-operators-hub-subscription
  Replace the container images with the following:
    quay.io/open-cluster-management/multicluster-operators-subscription:2.2-PR337-91af6cb37d427d22160b2c055589a4418dada4eb
  The update recreates the following pods in the open-cluster-management namespace:
  - multicluster-operators-standalone-subscription-<random-characters>
  - multicluster-operators-hub-subscription-<random-characters>
- Check that the new pods are running with the new docker image. Run the following commands, then find the new docker image:
    oc get pod multicluster-operators-standalone-subscription-<random-characters> -n open-cluster-management -o yaml
    oc get pod multicluster-operators-hub-subscription-<random-characters> -n open-cluster-management -o yaml
- Update the images on managed clusters. On the hub cluster, run the following command by replacing CLUSTER_NAME with the actual managed cluster name:
    oc annotate klusterletaddonconfig -n CLUSTER_NAME CLUSTER_NAME klusterletaddonconfig-pause=true --overwrite=true
- Run the following command, replacing CLUSTER_NAME with the actual managed cluster name:
    oc edit manifestwork -n CLUSTER_NAME CLUSTER_NAME-klusterlet-addon-appmgr
- Find spec.global.imageOverrides.multicluster_operators_subscription and set the value to:
    quay.io/open-cluster-management/multicluster-operators-subscription:2.2-PR337-91af6cb37d427d22160b2c055589a4418dada4eb
  This recreates the klusterlet-addon-appmgr-<random-characters> pod in the open-cluster-management-agent-addon namespace on the managed cluster.
- Check that the new pod is running with the new docker image.
- When you create an application through the console or the CLI, add insecureSkipVerify: true in the channel spec manually.
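As an illustration of where the insecureSkipVerify setting sits in a channel, the following sketch prints a channel manifest from a shell function. It is not the original example from this guide. The function name channel_manifest, the sample channel name, the namespace, and the Git URL are all hypothetical, and the type: Git value is an assumption about how your channel is defined.

```shell
# Print a Channel manifest with insecureSkipVerify enabled.
# $1 = channel name, $2 = channel namespace, $3 = Git repository URL.
# All three values are placeholders supplied by the caller.
channel_manifest() {
  cat <<EOF
apiVersion: apps.open-cluster-management.io/v1
kind: Channel
metadata:
  name: $1
  namespace: $2
spec:
  type: Git
  pathname: $3
  insecureSkipVerify: true
EOF
}
```

You could review the output first and then pipe it to the cluster, for example `channel_manifest sample-channel sample https://git.example.com/org/repo.git | oc apply -f -`.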
1.16. Troubleshooting Grafana
				When you query some time-consuming metrics in the Grafana explorer, you might encounter a Gateway Time-out error.
			
1.16.1. Symptom: Grafana explorer gateway timeout
					If you hit the Gateway Time-out error when you query some time-consuming metrics in the Grafana explorer, it is possible that the timeout is caused by the multicloud-console route in the open-cluster-management namespace.
				
1.16.2. Resolving the problem: Configure the multicloud-console route
If you have this problem, complete the following steps:
- Verify that the default configuration of Grafana has the expected timeout settings:
  - To verify the default timeout setting of Grafana, run the following command:
      oc get secret grafana-config -n open-cluster-management-observability -o jsonpath="{.data.grafana\.ini}" | base64 -d | grep dataproxy -A 4
    The following timeout settings should be displayed:
      [dataproxy]
      timeout = 300
      dial_timeout = 30
      keep_alive_seconds = 300
  - To verify the default data source query timeout for Grafana, run the following command:
      oc get secret/grafana-datasources -n open-cluster-management-observability -o jsonpath="{.data.datasources\.yaml}" | base64 -d | grep queryTimeout
    The following timeout settings should be displayed:
      queryTimeout: 300s
 
- If you verified that the default configuration of Grafana has the expected timeout settings, then you can configure the multicloud-console route in the open-cluster-management namespace by running the following command:
    oc annotate route multicloud-console -n open-cluster-management --overwrite haproxy.router.openshift.io/timeout=300s
					Refresh the Grafana page and try to query the metrics again. The Gateway Time-out error is no longer displayed.
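The verify-then-annotate flow above can be sketched as a script. This is a minimal sketch under stated assumptions: the function names dataproxy_timeout and fix_route_timeout are hypothetical, and the parser assumes that the grafana.ini keys start at the beginning of the line, as in the expected output shown above.

```shell
# Print the timeout value of the [dataproxy] section from grafana.ini text on stdin.
dataproxy_timeout() {
  awk -F' *= *' '
    /^\[dataproxy\]/ { in_section = 1; next }
    /^\[/            { in_section = 0 }
    in_section && $1 == "timeout" { print $2; exit }
  '
}

# Hypothetical wrapper: annotate the route only when Grafana already reports
# the expected 300-second data proxy timeout.
fix_route_timeout() {
  current="$(oc get secret grafana-config -n open-cluster-management-observability \
    -o jsonpath="{.data.grafana\.ini}" | base64 -d | dataproxy_timeout)"
  if [ "$current" = "300" ]; then
    oc annotate route multicloud-console -n open-cluster-management \
      --overwrite haproxy.router.openshift.io/timeout=300s
  else
    echo "unexpected dataproxy timeout: ${current:-not found}" >&2
    return 1
  fi
}
```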
				
1.17. Troubleshooting local cluster not selected with placement rule
The managed clusters are selected with a placement rule, but the local-cluster (hub cluster that is also managed) is not selected. The placement rule user is not granted permission to create deployable resources in the local-cluster namespace.
			
1.17.1. Symptom: Troubleshooting local cluster not selected
					All managed clusters are selected with a placement rule, but the local-cluster is not. The placement rule user is not granted permission to create the deployable resources in the local-cluster namespace.
				
1.17.2. Resolving the problem: Troubleshooting local cluster not selected
					To resolve this issue, you need to grant the deployable administrative permission in the local-cluster namespace. Complete the following steps:
				
- Confirm that the list of managed clusters does include local-cluster, and that the placement rule decisions list does not display the local cluster. Run the following command and view the results:
    % oc get managedclusters
    NAME            HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
    local-cluster   true                                  True     True        56d
    cluster1        true                                  True     True        16h
- Create a Role in your .yaml file to grant the deployable administrative permission in the local-cluster namespace.
- Create a RoleBinding resource to grant the placement rule user access to the local-cluster namespace.
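The Role and RoleBinding from the two preceding steps might resemble the output of the following sketch. It is an illustration, not the original example from this guide. The function name local_cluster_rbac and the role and user names that you pass in are hypothetical, and the rule assumes that deployables belong to the apps.open-cluster-management.io API group.

```shell
# Print a Role and a RoleBinding for the local-cluster namespace.
# $1 = name used for both the Role and the RoleBinding (placeholder)
# $2 = user name of the placement rule user (placeholder)
local_cluster_rbac() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: $1
  namespace: local-cluster
rules:
- apiGroups:
  - apps.open-cluster-management.io
  resources:
  - deployables
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: $1
  namespace: local-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: $1
subjects:
- kind: User
  name: $2
  apiGroup: rbac.authorization.k8s.io
EOF
}
```

After you review the output, you could apply it on the hub cluster, for example with `local_cluster_rbac deployables-admin <user-name> | oc apply -f -`.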
1.18. Troubleshooting application Kubernetes deployment version
				A managed cluster with a deprecated Kubernetes apiVersion might not be supported. See the Kubernetes issue for more details about the deprecated API version.
			
1.18.1. Symptom: Application deployment version
If one or more of your application resources in the Subscription YAML file uses the deprecated API, you might receive an error similar to the following error:
failed to install release: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1"
Or, with a new Kubernetes API version in a YAML file named old.yaml, for instance, you might receive the following error:
				
error: unable to recognize "old.yaml": no matches for kind "Deployment" in version "deployment/v1beta1"
1.18.2. Resolving the problem: Application deployment version
- Update the apiVersion in the resource. For example, if the error displays for Deployment kind in the subscription YAML file, you need to update the apiVersion from extensions/v1beta1 to apps/v1. See the following example:
    apiVersion: apps/v1
    kind: Deployment
- Verify the available versions by running the following command on the managed cluster:
    kubectl explain <resource>
- 
							Check for VERSION.
1.19. Troubleshooting standalone subscription memory
				The multicluster-operators-standalone-subscription pod restarts regularly because of a memory issue.
			
1.19.1. Symptom: Standalone subscription memory
					When Operator Lifecycle Manager (OLM) deploys all operators, not only the multicluster-subscription-operator, the multicluster-operators-standalone-subscription pod restarts because not enough memory is allocated to the standalone subscription container.
				
					The memory limit of the multicluster-operators-standalone-subscription pod was increased to 2GB in the multicluster subscription community operator CSV, but this resource limit setting is ignored by OLM.
				
1.19.2. Resolving the problem: Standalone subscription memory
- After installation, find the operator subscription CR that subscribes the multicluster subscription community operator. Run the following command:
    % oc get sub -n open-cluster-management acm-operator-subscription
- Edit the operator subscription custom resource by appending the spec.config.resources section to the .yaml file to define resource limits.
  Note: Do not create a new operator subscription custom resource that subscribes the same multicluster subscription community operator. Because two operator subscriptions are linked to one operator, the operator pods are "killed" and restarted by the two operator subscription custom resources.
- After the resource is saved, ensure that the standalone subscription pod is restarted with a 2GB memory limit. Run the following command:
    % oc get pods -n open-cluster-management multicluster-operators-standalone-subscription-7c8cbf885f-c94kz -o yaml
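The spec.config.resources change in the steps above can be sketched as a merge patch instead of a manual edit. This is a sketch under stated assumptions, not the documented method: the function name subscription_memory_patch is hypothetical, only the memory limit is set, and the value 2Gi is one way to express the 2GB limit that is mentioned above.

```shell
# Build a JSON merge patch that sets spec.config.resources.limits.memory
# on an operator subscription. $1 = memory limit, for example 2Gi.
subscription_memory_patch() {
  printf '{"spec":{"config":{"resources":{"limits":{"memory":"%s"}}}}}\n' "$1"
}
```

You might apply it to the existing operator subscription with `oc patch sub acm-operator-subscription -n open-cluster-management --type merge -p "$(subscription_memory_patch 2Gi)"`. Patching the existing resource is consistent with the note about not creating a second operator subscription.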
1.20. Troubleshooting Klusterlet with degraded conditions
The Klusterlet degraded conditions can help to diagnose the status of Klusterlet agents on the managed cluster. If a Klusterlet is in the degraded condition, the Klusterlet agents on the managed cluster might have errors that need troubleshooting. See the following information for Klusterlet degraded conditions that are set to True.
			
1.20.1. Symptom: Klusterlet is in the degraded condition
After deploying a Klusterlet on a managed cluster, the KlusterletRegistrationDegraded or KlusterletWorkDegraded condition displays a status of True.
				
1.20.2. Identifying the problem: Klusterlet is in the degraded condition
- Run the following command on the managed cluster to view the Klusterlet status:
    kubectl get klusterlets klusterlet -oyaml
- Check KlusterletRegistrationDegraded or KlusterletWorkDegraded to see if the condition is set to True. Proceed to Resolving the problem for any degraded conditions that are listed.
1.20.3. Resolving the problem: Klusterlet is in the degraded condition
See the following list of degraded statuses and how you can attempt to resolve those issues:
- If the KlusterletRegistrationDegraded condition displays True and the condition reason is BootStrapSecretMissing, you need to create a bootstrap secret in the open-cluster-management-agent namespace.
- If the KlusterletRegistrationDegraded condition displays True and the condition reason is BootstrapSecretError or BootstrapSecretUnauthorized, then the current bootstrap secret is invalid. Delete the current bootstrap secret and recreate a valid bootstrap secret in the open-cluster-management-agent namespace.
- If the KlusterletRegistrationDegraded and KlusterletWorkDegraded conditions display True and the condition reason is HubKubeConfigSecretMissing, delete the Klusterlet and recreate it.
- If the KlusterletRegistrationDegraded and KlusterletWorkDegraded conditions display True and the condition reason is ClusterNameMissing, KubeConfigMissing, HubConfigSecretError, or HubConfigSecretUnauthorized, delete the hub cluster kubeconfig secret from the open-cluster-management-agent namespace. The registration agent will bootstrap again to get a new hub cluster kubeconfig secret.
- If the KlusterletRegistrationDegraded condition displays True and the condition reason is GetRegistrationDeploymentFailed or UnavailableRegistrationPod, you can check the condition message to get the problem details and attempt to resolve.
- If the KlusterletWorkDegraded condition displays True and the condition reason is GetWorkDeploymentFailed or UnavailableWorkPod, you can check the condition message to get the problem details and attempt to resolve.
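The list above is effectively a lookup from condition reason to suggested action, which can be sketched as a shell case statement. The function name klusterlet_action is hypothetical, and the sketch accepts both spellings BootStrapSecretMissing and BootstrapSecretMissing because it is an assumption that your cluster might report either one.

```shell
# Print the suggested action for a Klusterlet degraded condition reason.
# $1 = condition reason, as read from the Klusterlet status.
klusterlet_action() {
  case "$1" in
    BootStrapSecretMissing|BootstrapSecretMissing)
      echo "create a bootstrap secret in open-cluster-management-agent" ;;
    BootstrapSecretError|BootstrapSecretUnauthorized)
      echo "delete and recreate the bootstrap secret in open-cluster-management-agent" ;;
    HubKubeConfigSecretMissing)
      echo "delete the Klusterlet and recreate it" ;;
    ClusterNameMissing|KubeConfigMissing|HubConfigSecretError|HubConfigSecretUnauthorized)
      echo "delete the hub cluster kubeconfig secret in open-cluster-management-agent" ;;
    GetRegistrationDeploymentFailed|UnavailableRegistrationPod|GetWorkDeploymentFailed|UnavailableWorkPod)
      echo "check the condition message for details" ;;
    *)
      echo "unknown reason: $1" ;;
  esac
}
```

For example, `klusterlet_action HubKubeConfigSecretMissing` prints the action from the third item in the list.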
1.21. Troubleshooting Klusterlet application manager on managed clusters
When you upgrade from Red Hat Advanced Cluster Management for Kubernetes, the klusterlet-addon-appmgr pod on Red Hat OpenShift Container Platform managed clusters versions 4.5 and 4.6 is OOMKilled.
			
1.21.1. Symptom: Klusterlet application manager on managed cluster
					You receive an error for the klusterlet-addon-appmgr pod on Red Hat OpenShift Container Platform managed clusters version 4.5 and 4.6: OOMKilled.
				
1.21.2. Resolving the problem: Klusterlet application manager on managed cluster
					For Red Hat Advanced Cluster Management for Kubernetes 2.1.x and 2.2, you need to manually increase the memory limit of the pod to 8Gb. See the following steps.
				
- On your hub cluster, annotate the klusterletaddonconfig to pause replication. See the following command:
    oc annotate klusterletaddonconfig -n ${CLUSTER_NAME} ${CLUSTER_NAME} klusterletaddonconfig-pause=true --overwrite=true
- On your hub cluster, scale down the klusterlet-addon-operator. See the following command:
    oc edit manifestwork ${CLUSTER_NAME}-klusterlet-addon-operator -n ${CLUSTER_NAME}
- Find the klusterlet-addon-operator Deployment and add replicas: 0 to the spec to scale down.
  On the managed cluster, the open-cluster-management-agent-addon/klusterlet-addon-operator pod will be terminated.
- Log in to the managed cluster to manually increase the memory limit in the appmgr pod. Run the following command:
    % oc edit deployments -n open-cluster-management-agent-addon klusterlet-addon-appmgr
  For example, if the limit is 5G, increase the limit to 8G.
    resources:
      limits:
        memory: 2Gi -> 8Gi
      requests:
        memory: 128Mi -> 256Mi
1.22. Troubleshooting Object storage channel secret
				If you change the SecretAccessKey, the subscription of an Object storage channel cannot pick up the updated secret automatically and you receive an error.
			
1.22.1. Symptom: Object storage channel secret
The subscription of an Object storage channel cannot pick up the updated secret automatically. This prevents the subscription operator from reconciling and deploying resources from Object storage to the managed cluster.
1.22.2. Resolving the problem: Object storage channel secret
You need to manually input the credentials to create a secret, then refer to the secret within a channel.
- Annotate the subscription CR in order to generate a reconcile signal to the subscription operator.
- Run oc annotate to test:
    oc annotate appsub -n <subscription-namespace> <subscription-name> test=true
After you run the command, you can go to the Application console to verify that the resource is deployed to the managed cluster. Or you can log in to the managed cluster to see if the application resource is created at the given namespace.
1.23. Troubleshooting observability
				After you install the observability component, the component might be stuck and an Installing status is displayed.
			
1.23.1. Symptom: MultiClusterObservability resource status stuck
					If the observability status is stuck in an Installing status after you install and create the Observability custom resource definition (CRD), it is possible that there is no value defined for the spec:storageConfig:storageClass parameter. Alternatively, the observability component automatically finds the default storageClass, but if there is no value for the storage, the component remains stuck with the Installing status.
				
1.23.2. Resolving the problem: MultiClusterObservability resource status stuck
If you have this problem, complete the following steps:
- Verify that the observability components are installed:
  - To verify that the multicluster-observability-operator is installed, run the following command:
      kubectl get pods -n open-cluster-management|grep observability
  - To verify that the appropriate CRDs are present, run the following command:
      kubectl get crd|grep observ
    The following CRDs must be displayed before you enable the component:
      multiclusterobservabilities.observability.open-cluster-management.io
      observabilityaddons.observability.open-cluster-management.io
      observatoria.core.observatorium.io
 
- If you create your own storageClass for a Bare Metal cluster, see How to create an NFS provisioner in the cluster or out of the cluster.
- To ensure that the observability component can find the default storageClass, update the storageClass parameter in the multicluster-observability-operator CRD. Your parameter might resemble the following value:
    storageclass.kubernetes.io/is-default-class: "true"
The observability component status is updated to a Ready status when the installation is complete. If the installation fails to complete, the Fail status is displayed.
1.24. Troubleshooting OpenShift monitoring service
				Observability service in a managed cluster needs to scrape metrics from the OpenShift Container Platform monitoring stack. The metrics-collector is not installed if the OpenShift Container Platform monitoring stack is not ready.
			
1.24.1. Symptom: OpenShift monitoring service is not ready
					The endpoint-observability-operator-x pod checks if the prometheus-k8s service is available in the openshift-monitoring namespace. If the service is not present in the openshift-monitoring namespace, then the metrics-collector is not deployed. You might receive the following error message: Failed to get prometheus resource.
				
1.24.2. Resolving the problem: OpenShift monitoring service is not ready
If you have this problem, complete the following steps:
- Log in to your OpenShift Container Platform cluster.
- Access the openshift-monitoring namespace to verify that the prometheus-k8s service is available.
- Restart the endpoint-observability-operator-x pod in the open-cluster-management-addon-observability namespace of the managed cluster.
1.25. Undesired label value in managedcluster resource
When you import a managed cluster, the observability components are installed by default. Your placement rule might resemble the following information:
status:
  decisions:
  - clusterName: sample-managed-cluster
    clusterNamespace: sample-managed-cluster
If the managed cluster is not included in the placement rule, the observability components are not installed.
1.25.1. Symptom: Undesired label value in managedcluster resource
If you find that the imported cluster is not included, the observability service for your managed cluster resource might be disabled.
Remember: When you enable the service, the vendor:OpenShift label is added to represent the target managed cluster. The observability service is only supported on OpenShift Container Platform managed clusters.
				
1.25.2. Resolving the problem: Undesired label value in managedcluster resource
					If you have this problem, enable the observability service for the target managed cluster and update labels in the managedcluster resource.
				
Complete the following steps:
- Log in to your Red Hat Advanced Cluster Management cluster.
- Change the observability parameter value to enabled by updating the placement rule. Run the following command:
    oc edit placementrule -n open-cluster-management-observability
- Verify that OpenShift is listed as vendor for the target managed cluster by running the following command:
    oc get managedcluster <CLUSTER NAME> -o yaml
  Update the metadata.labels.vendor parameter value to OpenShift.
1.26. Troubleshooting search aggregator pod status
The search-aggregator fails to run.
			
1.26.1. Symptom 1: Search aggregator pod in Not Ready state
					Search aggregator pods are in a Not Ready state if the redisgraph-user-secret is updated. You might receive the following error:
				
1.26.2. Resolving the problem: Search aggregator pod in Not Ready state
					If you have this problem, delete the search-aggregator and search-api pods to restart the pods. Run the following commands to delete the previously mentioned pods.
				
oc delete pod -n open-cluster-management <search-aggregator>
oc delete pod -n open-cluster-management <search-api>
1.26.3. Symptom 2: Search redisgraph pod in Pending state
The search-redisgraph pod fails to run when it is in Pending state.
				
1.26.4. Resolving the problem: Search redisgraph pod in Pending state
If you have this problem, complete the following steps:
- Check the pod events on the hub cluster namespace with the following command:
    oc describe pod search-redisgraph-0
- If you have created a searchcustomization CR, check if the storage class and storage size are valid, and check if a PVC can be created. List the PVC by running the following command:
    oc get pvc <storageclassname>-search-redisgraph-0
- Make sure the PVC can be bound to the search-redisgraph-0 pod. If the problem is still not resolved, delete the StatefulSet search-redisgraph. The search operator recreates the StatefulSet. Run the following command:
    oc delete statefulset -n open-cluster-management search-redisgraph
1.27. Troubleshooting metrics-collector
				When the observability-client-ca-certificate secret is not refreshed in the managed cluster, you might receive an internal server error.