Chapter 10. Troubleshooting hosted control planes
If you encounter issues with hosted control planes, see the following information to guide you through troubleshooting.
10.1. Gathering information to troubleshoot hosted control planes
When you need to troubleshoot an issue with hosted control plane clusters, you can gather information by running the must-gather command. The command generates output for the management cluster and the hosted cluster.
The output for the management cluster contains the following content:
- Cluster-scoped resources: These resources are node definitions of the management cluster.
- The hypershift-dump compressed file: This file is useful if you need to share the content with other people.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
- Network logs: These logs include the OVN northbound and southbound databases and the status for each one.
- Hosted clusters: This level of output involves all of the resources inside of the hosted cluster.
The output for the hosted cluster contains the following content:
- Cluster-scoped resources: These resources include all of the cluster-wide objects, such as nodes and CRDs.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
Although the output does not contain any secret objects from the cluster, it can contain references to the names of secrets.
Prerequisites
- You must have cluster-admin access to the management cluster.
- You need the name value for the HostedCluster resource and the namespace where the CR is deployed.
- You must have the hcp command line interface installed. For more information, see Installing the hosted control planes command line interface.
- You must have the OpenShift CLI (oc) installed.
- You must ensure that the kubeconfig file is loaded and is pointing to the management cluster.
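To confirm that the loaded kubeconfig file points to the management cluster, you can check the current context and API server, for example:
$ oc config current-context
$ oc whoami --show-server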
Procedure
To gather the output for troubleshooting, enter the following command:
$ oc adm must-gather --image=registry.redhat.io/multicluster-engine/must-gather-rhel9:v<mce_version> \
  /usr/bin/gather hosted-cluster-namespace=HOSTEDCLUSTERNAMESPACE hosted-cluster-name=HOSTEDCLUSTERNAME \
  --dest-dir=NAME ; tar -cvzf NAME.tgz NAME
where:
- You replace <mce_version> with the version of multicluster engine Operator that you are using; for example, 2.6.
- The hosted-cluster-namespace=HOSTEDCLUSTERNAMESPACE parameter is optional. If you do not include it, the command runs as though the hosted cluster is in the default namespace, which is clusters.
- The --dest-dir=NAME parameter is optional. Specify that parameter if you want to save the results of the command to a compressed file, replacing NAME with the name of the directory where you want to save the results.
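For example, for a hosted cluster named example-cluster in the default clusters namespace, with multicluster engine Operator version 2.6 and an output directory named hcp-dump, the command might look like the following. The cluster name and directory name are placeholders for illustration only.
$ oc adm must-gather --image=registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.6 \
  /usr/bin/gather hosted-cluster-namespace=clusters hosted-cluster-name=example-cluster \
  --dest-dir=hcp-dump ; tar -cvzf hcp-dump.tgz hcp-dump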
10.2. Restarting hosted control plane components
If you are an administrator for hosted control planes, you can use the hypershift.openshift.io/restart-date annotation to restart all control plane components for a particular HostedCluster resource. For example, you might need to restart control plane components for certificate rotation.
Procedure
To restart a control plane, annotate the HostedCluster resource by entering the following command:
$ oc annotate hostedcluster \
  -n <hosted_cluster_namespace> \
  <hosted_cluster_name> \
  hypershift.openshift.io/restart-date=$(date --iso-8601=seconds)
The control plane is restarted whenever the value of the annotation changes. The date command serves as the source of a unique string. The annotation is treated as a string, not a timestamp.
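To confirm that the annotation was applied, you can read it back from the HostedCluster resource. This is a minimal check; the escaped jsonpath key assumes the annotation name shown above.
$ oc get hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> \
  -o jsonpath='{.metadata.annotations.hypershift\.openshift\.io/restart-date}'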
Verification
After you restart a control plane, the following hosted control plane components are typically restarted:
You might see some additional components restarting as a side effect of changes implemented by the other components.
- catalog-operator
- certified-operators-catalog
- cluster-api
- cluster-autoscaler
- cluster-policy-controller
- cluster-version-operator
- community-operators-catalog
- control-plane-operator
- hosted-cluster-config-operator
- ignition-server
- ingress-operator
- konnectivity-agent
- konnectivity-server
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- machine-approver
- oauth-openshift
- olm-operator
- openshift-apiserver
- openshift-controller-manager
- openshift-oauth-apiserver
- packageserver
- redhat-marketplace-catalog
- redhat-operators-catalog
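One way to watch these components restart is to list the pods in the hosted control plane namespace. The namespace name below assumes the common <hosted_cluster_namespace>-<hosted_cluster_name> naming convention; adjust it to match your environment.
$ oc get pods -n <hosted_cluster_namespace>-<hosted_cluster_name> --watch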
10.3. Pausing the reconciliation of a hosted cluster and hosted control plane
If you are a cluster instance administrator, you can pause the reconciliation of a hosted cluster and hosted control plane. You might want to pause reconciliation when you back up and restore an etcd database or when you need to debug problems with a hosted cluster or hosted control plane.
Procedure
To pause reconciliation for a hosted cluster and hosted control plane, populate the pausedUntil field of the HostedCluster resource.
To pause the reconciliation until a specific time, enter the following command:
$ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"<timestamp>"}}' --type=merge
Specify a timestamp in the RFC 3339 format, for example, 2024-03-03T03:28:48Z. The reconciliation is paused until the specified time has passed.
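For example, to pause reconciliation for the next four hours, you can generate the timestamp with the date command. This sketch assumes GNU date on Linux, and the four-hour window is an arbitrary choice for illustration:
$ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> \
  -p "{\"spec\":{\"pausedUntil\":\"$(date -u -d '+4 hours' +%Y-%m-%dT%H:%M:%SZ)\"}}" --type=merge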
To pause the reconciliation indefinitely, enter the following command:
$ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"true"}}' --type=merge
The reconciliation is paused until you remove the field from the HostedCluster resource.
When the pausedUntil field is populated for the HostedCluster resource, the field is automatically added to the associated HostedControlPlane resource.
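To confirm that the field propagated, you can read pausedUntil from the associated HostedControlPlane resource. This check assumes the common convention that the hosted control plane namespace is named <hosted_cluster_namespace>-<hosted_cluster_name> and that the HostedControlPlane resource shares the hosted cluster name:
$ oc get hostedcontrolplane -n <hosted_cluster_namespace>-<hosted_cluster_name> \
  <hosted_cluster_name> -o jsonpath='{.spec.pausedUntil}'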
To remove the pausedUntil field, enter the following patch command:
$ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":null}}' --type=merge
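As a quick check that the field was removed, the following command prints nothing when pausedUntil is unset:
$ oc get hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> -o jsonpath='{.spec.pausedUntil}'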
10.4. Scaling down the data plane to zero
If you are not using the hosted control plane, you can scale down the data plane to zero to save resources and cost.
Ensure that you are prepared to scale down the data plane to zero, because the workloads on the worker nodes disappear after you scale down.
Procedure
Set the kubeconfig file to access the hosted cluster by running the following command:
$ export KUBECONFIG=<install_directory>/auth/kubeconfig
Get the name of the NodePool resource associated with your hosted cluster by running the following command:
$ oc get nodepool --namespace <HOSTED_CLUSTER_NAMESPACE>
Optional: To prevent the pods from draining, add the nodeDrainTimeout field in the NodePool resource by running the following command:
$ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>
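The following is a minimal sketch of the relevant part of the NodePool resource, not verbatim output. The apiVersion and metadata values are assumptions for illustration; the point is the spec.nodeDrainTimeout field.
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <nodepool_name>
  namespace: <hosted_cluster_namespace>
spec:
  nodeDrainTimeout: 0s   # set a non-zero value, for example 1m, as described in the note that follows
  # other fields omitted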
Note: To allow the node draining process to continue for a certain period of time, you can set the value of the nodeDrainTimeout field accordingly, for example, nodeDrainTimeout: 1m.
Scale down the NodePool resource associated with your hosted cluster by running the following command:
$ oc scale nodepool/<NODEPOOL_NAME> --namespace <HOSTED_CLUSTER_NAMESPACE> --replicas=0
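Because the KUBECONFIG file exported at the start of this procedure accesses the hosted cluster, you can optionally watch the worker nodes disappear from the hosted cluster as the scale-down proceeds. This is only a convenience check:
$ oc get nodes --watch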
Note: After scaling down the data plane to zero, some pods in the control plane stay in the Pending status and the hosted control plane stays up and running. If necessary, you can scale up the NodePool resource.
Optional: Scale up the NodePool resource associated with your hosted cluster by running the following command:
$ oc scale nodepool/<NODEPOOL_NAME> --namespace <HOSTED_CLUSTER_NAMESPACE> --replicas=1
After rescaling the NodePool resource, wait a couple of minutes for the NodePool resource to become available in a Ready state.
Verification
Verify that the value for the nodeDrainTimeout field is greater than 0s by running the following command:
$ oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -ojsonpath='{.spec.nodeDrainTimeout}'