Chapter 10. Troubleshooting hosted control planes
If you encounter issues with hosted control planes, see the following information to guide you through troubleshooting.
10.1. Gathering information to troubleshoot hosted control planes
When you need to troubleshoot an issue with hosted control plane clusters, you can gather information by running the must-gather command. The command generates output for the management cluster and the hosted cluster.
The output for the management cluster contains the following content:
- Cluster-scoped resources: These resources are node definitions of the management cluster.
- The hypershift-dump compressed file: This file is useful if you need to share the content with other people.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
- Network logs: These logs include the OVN northbound and southbound databases and the status for each one.
- Hosted clusters: This level of output involves all of the resources inside of the hosted cluster.
The output for the hosted cluster contains the following content:
- Cluster-scoped resources: These resources include all of the cluster-wide objects, such as nodes and CRDs.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
Although the output does not contain any secret objects from the cluster, it can contain references to the names of secrets.
Prerequisites
- You must have cluster-admin access to the management cluster.
- You need the name value for the HostedCluster resource and the namespace where the CR is deployed.
- You must have the hcp command line interface installed. For more information, see Installing the hosted control planes command line interface.
- You must have the OpenShift CLI (oc) installed.
- You must ensure that the kubeconfig file is loaded and is pointing to the management cluster.
Procedure
To gather the output for troubleshooting, enter the following command:
$ oc adm must-gather --image=registry.redhat.io/multicluster-engine/must-gather-rhel9:v<mce_version> \
    /usr/bin/gather hosted-cluster-namespace=HOSTEDCLUSTERNAMESPACE hosted-cluster-name=HOSTEDCLUSTERNAME \
    --dest-dir=NAME ; tar -cvzf NAME.tgz NAME
where:
- You replace <mce_version> with the version of multicluster engine Operator that you are using; for example, 2.6.
- The hosted-cluster-namespace=HOSTEDCLUSTERNAMESPACE parameter is optional. If you do not include it, the command runs as though the hosted cluster is in the default namespace, which is clusters.
- The --dest-dir=NAME parameter is optional. Specify that parameter if you want to save the results of the command to a compressed file, replacing NAME with the name of the directory where you want to save the results.
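The following sketch shows one way to assemble the must-gather invocation from variables so that the placeholders are filled in consistently. The variable names, the hosted cluster name example, and the directory name hcp-must-gather are assumptions made for illustration; only the command shape and the clusters default come from the procedure above. The script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: variable names and default values are illustrative assumptions.
set -euo pipefail

MCE_VERSION="${MCE_VERSION:-2.6}"          # multicluster engine Operator version
HC_NAMESPACE="${HC_NAMESPACE:-clusters}"   # default namespace named in the procedure
HC_NAME="${HC_NAME:-example}"              # hypothetical hosted cluster name
DEST_DIR="${DEST_DIR:-hcp-must-gather}"    # hypothetical destination directory

# Print the must-gather command on one line instead of running it.
must_gather_cmd() {
  printf 'oc adm must-gather --image=%s /usr/bin/gather hosted-cluster-namespace=%s hosted-cluster-name=%s --dest-dir=%s\n' \
    "registry.redhat.io/multicluster-engine/must-gather-rhel9:v${MCE_VERSION}" \
    "$HC_NAMESPACE" "$HC_NAME" "$DEST_DIR"
}

# Print the follow-up command that compresses the destination directory.
archive_cmd() {
  printf 'tar -cvzf %s.tgz %s\n' "$DEST_DIR" "$DEST_DIR"
}

must_gather_cmd
archive_cmd
```

After reviewing the printed commands, you can paste them into a shell that is logged in to the management cluster.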
10.2. Restarting hosted control plane components
If you are an administrator for hosted control planes, you can use the hypershift.openshift.io/restart-date annotation to restart all control plane components for a particular HostedCluster resource. For example, you might need to restart control plane components for certificate rotation.
Procedure
To restart a control plane, annotate the HostedCluster resource by entering the following command:
$ oc annotate hostedcluster \
    -n <hosted_cluster_namespace> \
    <hosted_cluster_name> \
    hypershift.openshift.io/restart-date=$(date --iso-8601=seconds)
The control plane is restarted whenever the value of the annotation changes. The date command serves as the source of a unique string. The annotation is treated as a string, not a timestamp.
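Because the annotation value only needs to be a unique string, any reliable source of one works. The sketch below, with hypothetical function names, prints the key=value argument and falls back to a POSIX date format on systems where the GNU --iso-8601 option is unavailable.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are illustrative assumptions.
set -euo pipefail

# Print a unique value. GNU date supports --iso-8601; the fallback uses only
# POSIX conversion specifiers for other date implementations.
restart_value() {
  date --iso-8601=seconds 2>/dev/null || date -u +%Y-%m-%dT%H:%M:%SZ
}

# Print the full key=value argument that is passed to `oc annotate`.
restart_annotation() {
  printf 'hypershift.openshift.io/restart-date=%s\n' "$(restart_value)"
}

restart_annotation
```

If the annotation is already present from an earlier restart, oc annotate generally refuses to change it unless you also pass the --overwrite flag.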
Verification
After you restart a control plane, you might see some additional components restarting as a side effect of changes implemented by the other components. The following hosted control plane components are typically restarted:
- catalog-operator
- certified-operators-catalog
- cluster-api
- cluster-autoscaler
- cluster-policy-controller
- cluster-version-operator
- community-operators-catalog
- control-plane-operator
- hosted-cluster-config-operator
- ignition-server
- ingress-operator
- konnectivity-agent
- konnectivity-server
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- machine-approver
- oauth-openshift
- olm-operator
- openshift-apiserver
- openshift-controller-manager
- openshift-oauth-apiserver
- packageserver
- redhat-marketplace-catalog
- redhat-operators-catalog
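To watch these components come back, you can list the pods in the hosted control plane namespace. The sketch below assumes that this namespace is named <hosted_cluster_namespace>-<hosted_cluster_name>, which is not stated in this chapter, so confirm the name on your management cluster. The function names and the example values are hypothetical, and the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: the namespace naming rule below is an assumption.
set -euo pipefail

# Assumption: the hosted control plane namespace is
# <hosted_cluster_namespace>-<hosted_cluster_name>.
hcp_namespace() {
  printf '%s-%s\n' "$1" "$2"
}

# Print (not run) a command that lists the pods sorted by creation time,
# so that recently restarted pods appear at the end of the list.
list_pods_cmd() {
  printf 'oc get pods -n %s --sort-by=.metadata.creationTimestamp\n' \
    "$(hcp_namespace "$1" "$2")"
}

list_pods_cmd clusters example
```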
10.3. Pausing the reconciliation of a hosted cluster and hosted control plane
If you are a cluster instance administrator, you can pause the reconciliation of a hosted cluster and hosted control plane. You might want to pause reconciliation when you back up and restore an etcd database or when you need to debug problems with a hosted cluster or hosted control plane.
Procedure
To pause reconciliation for a hosted cluster and hosted control plane, populate the pausedUntil field of the HostedCluster resource.
- To pause the reconciliation until a specific time, enter the following command:
  $ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"<timestamp>"}}' --type=merge
  Specify a timestamp in the RFC 3339 format, for example, 2024-03-03T03:28:48Z. The reconciliation is paused until the specified time has passed.
- To pause the reconciliation indefinitely, enter the following command:
  $ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"true"}}' --type=merge
  The reconciliation is paused until you remove the field from the HostedCluster resource.
When the pausedUntil field is populated for the HostedCluster resource, the field is automatically added to the associated HostedControlPlane resource.
To remove the pausedUntil field, enter the following patch command:
$ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":null}}' --type=merge
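Writing the timestamp by hand is error-prone, so it can help to generate it. The sketch below assumes GNU date, which accepts relative times through the -d option, and uses hypothetical function names. It prints a merge-patch body that pauses reconciliation for two hours, which you can pass to the -p option of oc patch.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative; GNU date is assumed.
set -euo pipefail

# Print an RFC 3339 UTC timestamp the given number of hours from now.
pause_until() {
  date -u -d "+$1 hours" +%Y-%m-%dT%H:%M:%SZ
}

# Print the merge-patch body for the given pausedUntil value.
pause_patch() {
  printf '{"spec":{"pausedUntil":"%s"}}\n' "$1"
}

pause_patch "$(pause_until 2)"
```

Calling pause_patch true produces the body for an indefinite pause.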
10.4. Scaling down the data plane to zero
If you are not using the hosted control plane, you can scale down the data plane to zero to save resources and cost.
Ensure that you are prepared to scale down the data plane to zero, because the workload from the worker nodes disappears after scaling down.
Procedure
Set the kubeconfig file to access the hosted cluster by running the following command:
$ export KUBECONFIG=<install_directory>/auth/kubeconfig
Get the name of the NodePool resource associated with your hosted cluster by running the following command:
$ oc get nodepool --namespace <hosted_cluster_namespace>
Optional: To prevent the pods from draining, add the nodeDrainTimeout field in the NodePool resource by running the following command:
$ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>
Note: To allow the node draining process to continue for a certain period of time, you can set the value of the nodeDrainTimeout field accordingly, for example, nodeDrainTimeout: 1m.
Scale down the NodePool resource associated with your hosted cluster by running the following command:
$ oc scale nodepool/<nodepool_name> --namespace <hosted_cluster_namespace> --replicas=0
Note: After scaling down the data plane to zero, some pods in the control plane stay in the Pending status and the hosted control plane stays up and running. If necessary, you can scale up the NodePool resource.
Optional: Scale up the NodePool resource associated with your hosted cluster by running the following command:
$ oc scale nodepool/<nodepool_name> --namespace <hosted_cluster_namespace> --replicas=1
After rescaling the NodePool resource, wait a couple of minutes for the NodePool resource to become available in a Ready state.
Verification
Verify that the value for the nodeDrainTimeout field is greater than 0s by running the following command:
$ oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -ojsonpath='{.spec.nodeDrainTimeout}'
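If you script this verification, the returned string needs to be compared against zero. The sketch below is one possible check with a hypothetical function name. It assumes the value is a duration string such as 30s or 1m0s, and it treats an empty value, which is what the command prints when the field is not set, as not greater than 0s.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and sample values are illustrative.
set -euo pipefail

# Succeed when a duration string such as "1m0s" or "30s" is greater than zero.
drain_timeout_is_positive() {
  case "$1" in
    ''|-*)   return 1 ;;  # field not set, or a negative value
    *[1-9]*) return 0 ;;  # at least one non-zero digit
    *)       return 1 ;;  # only zeros, for example "0s"
  esac
}

# Sample values; in practice, pass the output of the oc get command.
for value in "0s" "1m0s" "30s" ""; do
  if drain_timeout_is_positive "$value"; then
    echo "'$value': greater than 0s"
  else
    echo "'$value': not greater than 0s"
  fi
done
```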