Dieser Inhalt ist in der von Ihnen ausgewählten Sprache nicht verfügbar.

Chapter 3. Restarting the cluster gracefully


This document describes the process to restart your cluster after a graceful shutdown.

Even though the cluster is expected to be functional after the restart, the cluster might not recover due to unexpected conditions, for example:

  • etcd data corruption during shutdown
  • Node failure due to hardware
  • Network connectivity issues

If your cluster fails to recover, follow the steps to restore to a previous cluster state.

3.1. Prerequisites

3.2. Restarting the cluster

You can restart your cluster after it has been shut down gracefully.

Prerequisites

  • You have access to the cluster as a user with the
    cluster-admin
    role.
  • This procedure assumes that you gracefully shut down the cluster.

Procedure

  1. Turn on the control plane nodes.

    • If you are using the

      admin.kubeconfig
      from the cluster installation and the API virtual IP address (VIP) is up, complete the following steps:

      1. Set the
        KUBECONFIG
        environment variable to the
        admin.kubeconfig
        path.
      2. For each control plane node in the cluster, run the following command:

        $ oc adm uncordon <node>
    • If you do not have access to your

      admin.kubeconfig
      credentials, complete the following steps:

      1. Use SSH to connect to a control plane node.
      2. Copy the
        localhost-recovery.kubeconfig
        file to the
        /root
        directory.
      3. Use that file to run the following command for each control plane node in the cluster:

        $ oc adm uncordon <node>
  2. Power on any cluster dependencies, such as external storage or an LDAP server.
  3. Start all cluster machines.

    Use the appropriate method for your cloud environment to start the machines, for example, from your cloud provider’s web console.

    Wait approximately 10 minutes before continuing to check the status of control plane nodes.

  4. Verify that all control plane nodes are ready.

    $ oc get nodes -l node-role.kubernetes.io/master

    The control plane nodes are ready if the status is

    Ready
    , as shown in the following output:

    NAME                           STATUS   ROLES    AGE   VERSION
    ip-10-0-168-251.ec2.internal   Ready    master   75m   v1.25.0
    ip-10-0-170-223.ec2.internal   Ready    master   75m   v1.25.0
    ip-10-0-211-16.ec2.internal    Ready    master   75m   v1.25.0
  5. If the control plane nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.

    1. Get the list of current CSRs:

      $ oc get csr
    2. Review the details of a CSR to verify that it is valid:

      $ oc describe csr <csr_name> 
      1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    3. Approve each valid CSR:

      $ oc adm certificate approve <csr_name>
  6. After the control plane nodes are ready, verify that all worker nodes are ready.

    $ oc get nodes -l node-role.kubernetes.io/worker

    The worker nodes are ready if the status is

    Ready
    , as shown in the following output:

    NAME                           STATUS   ROLES    AGE   VERSION
    ip-10-0-179-95.ec2.internal    Ready    worker   64m   v1.25.0
    ip-10-0-182-134.ec2.internal   Ready    worker   64m   v1.25.0
    ip-10-0-250-100.ec2.internal   Ready    worker   64m   v1.25.0
  7. If the worker nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.

    1. Get the list of current CSRs:

      $ oc get csr
    2. Review the details of a CSR to verify that it is valid:

      $ oc describe csr <csr_name> 
      1
      1
      <csr_name> is the name of a CSR from the list of current CSRs.
    3. Approve each valid CSR:

      $ oc adm certificate approve <csr_name>
  8. Verify that the cluster started properly.

    1. Check that there are no degraded cluster Operators.

      $ oc get clusteroperators

      Check that there are no cluster Operators with the

      DEGRADED
      condition set to
      True
      .

      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
      authentication                             4.12.0    True        False         False      59m
      cloud-credential                           4.12.0    True        False         False      85m
      cluster-autoscaler                         4.12.0    True        False         False      73m
      config-operator                            4.12.0    True        False         False      73m
      console                                    4.12.0    True        False         False      62m
      csi-snapshot-controller                    4.12.0    True        False         False      66m
      dns                                        4.12.0    True        False         False      76m
      etcd                                       4.12.0    True        False         False      76m
      ...
    2. Check that all nodes are in the

      Ready
      state:

      $ oc get nodes

      Check that the status for all nodes is

      Ready
      .

      NAME                           STATUS   ROLES    AGE   VERSION
      ip-10-0-168-251.ec2.internal   Ready    master   82m   v1.25.0
      ip-10-0-170-223.ec2.internal   Ready    master   82m   v1.25.0
      ip-10-0-179-95.ec2.internal    Ready    worker   70m   v1.25.0
      ip-10-0-182-134.ec2.internal   Ready    worker   70m   v1.25.0
      ip-10-0-211-16.ec2.internal    Ready    master   82m   v1.25.0
      ip-10-0-250-100.ec2.internal   Ready    worker   69m   v1.25.0

      If the cluster did not start properly, you might need to restore your cluster using an etcd backup.

  9. After the control plane and worker nodes are ready, mark all the nodes in the cluster as schedulable. Run the following command:

    for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do echo ${node} ; oc adm uncordon ${node} ; done
Red Hat logoGithubredditYoutubeTwitter

Lernen

Testen, kaufen und verkaufen

Communitys

Über Red Hat Dokumentation

Wir helfen Red Hat Benutzern, mit unseren Produkten und Diensten innovativ zu sein und ihre Ziele zu erreichen – mit Inhalten, denen sie vertrauen können. Entdecken Sie unsere neuesten Updates.

Mehr Inklusion in Open Source

Red Hat hat sich verpflichtet, problematische Sprache in unserem Code, unserer Dokumentation und unseren Web-Eigenschaften zu ersetzen. Weitere Einzelheiten finden Sie in Red Hat Blog.

Über Red Hat

Wir liefern gehärtete Lösungen, die es Unternehmen leichter machen, plattform- und umgebungsübergreifend zu arbeiten, vom zentralen Rechenzentrum bis zum Netzwerkrand.

Theme

© 2026 Red Hat
Nach oben