Chapter 3. Restarting the cluster gracefully

This document describes the process to restart your cluster after a graceful shutdown.

Even though the cluster is expected to be functional after the restart, the cluster might not recover due to unexpected conditions, for example:

- etcd data corruption during shutdown
- Node failure due to hardware
- Network connectivity issues

If your cluster fails to recover, follow the steps to restore to a previous cluster state.

3.1. Prerequisites

You have gracefully shut down your cluster.

3.2. Restarting the cluster

You can restart your cluster after it has been shut down gracefully.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
This procedure assumes that you gracefully shut down the cluster.

Procedure

Turn on the control plane nodes.
- If you are using the admin.kubeconfig from the cluster installation and the API virtual IP address (VIP) is up, complete the following steps:
  1. Set the KUBECONFIG environment variable to the admin.kubeconfig path.
  2. For each control plane node in the cluster, run the following command:
    
    $ oc adm uncordon <node>
    
- If you do not have access to your admin.kubeconfig credentials, complete the following steps:
  1. Use SSH to connect to a control plane node.
  2. Copy the localhost-recovery.kubeconfig file to the /root directory.
  3. Use that file to run the following command for each control plane node in the cluster:
    
    $ oc adm uncordon <node>
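
The two options above differ only in which kubeconfig file `oc` reads. As a minimal sketch, the helper below uncordons every control plane node with whichever kubeconfig you pass in. The function name `uncordon_control_plane` is hypothetical, and the sketch assumes that control plane nodes carry the `node-role.kubernetes.io/master` label that this procedure uses in its later verification commands.

```shell
# Hypothetical helper (not part of the product): uncordon every control
# plane node by using the kubeconfig file given as the first argument.
# Assumes control plane nodes carry the node-role.kubernetes.io/master label.
uncordon_control_plane() {
  kubeconfig="$1"
  # Print control plane node names, one per line.
  oc --kubeconfig "$kubeconfig" get nodes \
      -l node-role.kubernetes.io/master \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
  while read -r node; do
    echo "Uncordoning ${node}"
    oc --kubeconfig "$kubeconfig" adm uncordon "$node"
  done
}
```

For the second option, after copying the recovery file to the /root directory, a call might look like `uncordon_control_plane /root/localhost-recovery.kubeconfig`.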
    
Power on any cluster dependencies, such as external storage or an LDAP server.
Start all cluster machines.
Use the appropriate method for your cloud environment to start the machines, for example, from your cloud provider’s web console.
Wait approximately 10 minutes before continuing to check the status of control plane nodes.

Verify that all control plane nodes are ready.

$ oc get nodes -l node-role.kubernetes.io/master

The control plane nodes are ready if the status is Ready, as shown in the following output:

NAME                           STATUS   ROLES                  AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    control-plane,master   75m   v1.30.3
ip-10-0-170-223.ec2.internal   Ready    control-plane,master   75m   v1.30.3
ip-10-0-211-16.ec2.internal    Ready    control-plane,master   75m   v1.30.3
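
On a larger cluster it can be easier to list only the nodes that still need attention. The following is a minimal sketch, assuming the column layout shown in the output above (node name first, STATUS second). The function name `not_ready_nodes` is hypothetical. A status such as `Ready,SchedulingDisabled` is treated as ready, because a cordoned node still reports `Ready`.

```shell
# Hypothetical helper: read `oc get nodes` output on stdin and print the
# name of every node whose STATUS column does not start with "Ready".
# Assumes the default table layout: NAME in column 1, STATUS in column 2.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}
```

A possible use is `oc get nodes -l node-role.kubernetes.io/master | not_ready_nodes`. Empty output means that every listed node reports a status beginning with `Ready`.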

If the control plane nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
1. Get the list of current CSRs:
  $ oc get csr
2. Review the details of a CSR to verify that it is valid:
  $ oc describe csr <csr_name>
  <csr_name> is the name of a CSR from the list of current CSRs.
3. Approve each valid CSR:
  $ oc adm certificate approve <csr_name>
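
When many CSRs are listed, it can help to narrow the review to the ones that are still pending. The following is a minimal sketch, assuming that the default `oc get csr` table ends with a CONDITION column whose value is `Pending` for unapproved requests. The function name `pending_csrs` is hypothetical. The sketch only lists names and deliberately does not approve anything, so each CSR can still be reviewed as described in step 2.

```shell
# Hypothetical helper: read `oc get csr` output on stdin and print the name
# of every CSR whose last column (assumed to be CONDITION) is "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

For example, `oc get csr | pending_csrs | while read -r csr; do oc describe csr "$csr"; done` prints the details of each pending CSR for review before you approve it.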

After the control plane nodes are ready, verify that all worker nodes are ready.

$ oc get nodes -l node-role.kubernetes.io/worker

The worker nodes are ready if the status is Ready, as shown in the following output:

NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-179-95.ec2.internal    Ready    worker   64m   v1.30.3
ip-10-0-182-134.ec2.internal   Ready    worker   64m   v1.30.3
ip-10-0-250-100.ec2.internal   Ready    worker   64m   v1.30.3

If the worker nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
1. Get the list of current CSRs:
  $ oc get csr
2. Review the details of a CSR to verify that it is valid:
  $ oc describe csr <csr_name>
  <csr_name> is the name of a CSR from the list of current CSRs.
3. Approve each valid CSR:
  $ oc adm certificate approve <csr_name>

After the control plane and compute nodes are ready, mark all the nodes in the cluster as schedulable by running the following command:

$ for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do echo ${node} ; oc adm uncordon ${node} ; done
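
To confirm that the loop left no node cordoned, you can filter the node list for the `SchedulingDisabled` marker. The following is a minimal sketch, assuming the default `oc get nodes` table layout in which a cordoned node shows a status such as `Ready,SchedulingDisabled`. The function name `cordoned_nodes` is hypothetical.

```shell
# Hypothetical helper: read `oc get nodes` output on stdin and print the
# name of every node whose STATUS column still includes SchedulingDisabled.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}
```

A possible use is `oc get nodes | cordoned_nodes`. Empty output means that no node in the list is marked unschedulable.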

Verify that the cluster started properly.

Check that there are no degraded cluster Operators.

$ oc get clusteroperators

Check that there are no cluster Operators with the DEGRADED condition set to True.

NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.17.0    True        False         False      59m
cloud-credential                           4.17.0    True        False         False      85m
cluster-autoscaler                         4.17.0    True        False         False      73m
config-operator                            4.17.0    True        False         False      73m
console                                    4.17.0    True        False         False      62m
csi-snapshot-controller                    4.17.0    True        False         False      66m
dns                                        4.17.0    True        False         False      76m
etcd                                       4.17.0    True        False         False      76m
...
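
Because the full list of cluster Operators is long, a small filter can make degraded entries easier to spot. The following is a minimal sketch, assuming the table layout shown above, with a header row that contains a DEGRADED column and no empty cells to the left of it. The function name `degraded_operators` is hypothetical.

```shell
# Hypothetical helper: read `oc get clusteroperators` output on stdin and
# print the name of every Operator whose DEGRADED column is "True".
degraded_operators() {
  awk '
    NR == 1 {
      # Locate the DEGRADED column from the header row.
      for (i = 1; i <= NF; i++) if ($i == "DEGRADED") col = i
      next
    }
    # Print the Operator name when its DEGRADED value is True.
    col && $col == "True" { print $1 }
  '
}
```

A possible use is `oc get clusteroperators | degraded_operators`. Empty output means that no Operator in the list reports `DEGRADED` as `True`.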

Check that all nodes are in the Ready state:

$ oc get nodes

Check that the status for all nodes is Ready.

NAME                           STATUS   ROLES                  AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    control-plane,master   82m   v1.30.3
ip-10-0-170-223.ec2.internal   Ready    control-plane,master   82m   v1.30.3
ip-10-0-179-95.ec2.internal    Ready    worker                 70m   v1.30.3
ip-10-0-182-134.ec2.internal   Ready    worker                 70m   v1.30.3
ip-10-0-211-16.ec2.internal    Ready    control-plane,master   82m   v1.30.3
ip-10-0-250-100.ec2.internal   Ready    worker                 69m   v1.30.3

If the cluster did not start properly, you might need to restore your cluster using an etcd backup. For more information, see "Restoring to a previous cluster state".
