This documentation is for a release that is no longer maintained
See documentation for the latest supported version 3 or the latest supported version 4.
Chapter 3. Restarting the cluster gracefully
This document describes the process to restart your cluster after a graceful shutdown.
Even though the cluster is expected to be functional after the restart, the cluster might not recover due to unexpected conditions, for example:
- etcd data corruption during shutdown
- Node failure due to hardware
- Network connectivity issues
If your cluster fails to recover, follow the steps to restore to a previous cluster state.
3.1. Prerequisites
- You have gracefully shut down your cluster.
3.2. Restarting the cluster
You can restart your cluster after it has been shut down gracefully.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- This procedure assumes that you gracefully shut down the cluster.
Procedure
- Power on any cluster dependencies, such as external storage or an LDAP server.
- Start all cluster machines.
  Use the appropriate method for your cloud environment to start the machines, for example, from your cloud provider’s web console.
  Wait approximately 10 minutes before continuing to check the status of control plane nodes.
- Verify that all control plane nodes are ready.
    $ oc get nodes -l node-role.kubernetes.io/master
  The control plane nodes are ready if the status is Ready, as shown in the following output:
    NAME                           STATUS   ROLES    AGE   VERSION
    ip-10-0-168-251.ec2.internal   Ready    master   75m   v1.23.0
    ip-10-0-170-223.ec2.internal   Ready    master   75m   v1.23.0
    ip-10-0-211-16.ec2.internal    Ready    master   75m   v1.23.0
- If the control plane nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
  - Get the list of current CSRs:
      $ oc get csr
  - Review the details of a CSR to verify that it is valid:
      $ oc describe csr <csr_name>
    <csr_name> is the name of a CSR from the list of current CSRs.
  - Approve each valid CSR:
      $ oc adm certificate approve <csr_name>
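The readiness check on the node list can be scripted so that a node stuck in another state is not overlooked. The following is a minimal sketch, not part of the official procedure: the function name not_ready_nodes is hypothetical, and it assumes the default column layout of `oc get nodes` (NAME, STATUS, ROLES, AGE, VERSION) shown in the example output.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads the tabular output of `oc get nodes` on stdin.
# Assumption: default columns NAME STATUS ROLES AGE VERSION.
# A status such as "Ready,SchedulingDisabled" is also listed, because
# it is not exactly "Ready".
not_ready_nodes() {
  # NR > 1 skips the header row; $2 is the STATUS column.
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Hypothetical usage against a live cluster:
#   oc get nodes -l node-role.kubernetes.io/master | not_ready_nodes
```

If the function prints nothing, every listed node reported Ready. Any name it prints is a candidate for the CSR check described in the procedure.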
 
- After the control plane nodes are ready, verify that all worker nodes are ready.
    $ oc get nodes -l node-role.kubernetes.io/worker
  The worker nodes are ready if the status is Ready, as shown in the following output:
    NAME                           STATUS   ROLES    AGE   VERSION
    ip-10-0-179-95.ec2.internal    Ready    worker   64m   v1.23.0
    ip-10-0-182-134.ec2.internal   Ready    worker   64m   v1.23.0
    ip-10-0-250-100.ec2.internal   Ready    worker   64m   v1.23.0
- If the worker nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.
  - Get the list of current CSRs:
      $ oc get csr
  - Review the details of a CSR to verify that it is valid:
      $ oc describe csr <csr_name>
    <csr_name> is the name of a CSR from the list of current CSRs.
  - Approve each valid CSR:
      $ oc adm certificate approve <csr_name>
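When the CSR list is long, it can help to narrow it to the requests that still need a decision. The sketch below only lists names and deliberately does not approve anything, because each CSR should still be reviewed with `oc describe csr` first. The function name pending_csrs is hypothetical, and the sketch assumes that CONDITION is the last column of the `oc get csr` table and that no column value contains spaces.

```shell
# Print the names of CSRs whose CONDITION is "Pending".
# Reads the tabular output of `oc get csr` on stdin.
# Assumption: CONDITION is the last column, so $NF holds its value.
# Values such as "Approved,Issued" are skipped.
pending_csrs() {
  # NR > 1 skips the header row; $1 is the CSR name.
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Hypothetical usage against a live cluster:
#   oc get csr | pending_csrs
```

Each printed name can then be passed, one at a time, to the review and approval commands from the procedure.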
 
- Verify that the cluster started properly.
  - Check that there are no degraded cluster Operators.
      $ oc get clusteroperators
    Check that there are no cluster Operators with the DEGRADED condition set to True.
  - Check that all nodes are in the Ready state:
      $ oc get nodes
    Check that the status for all nodes is Ready.
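The check for degraded cluster Operators can also be scripted. The following is a sketch under stated assumptions rather than a supported tool: the function name degraded_operators is hypothetical, and it assumes that `oc get clusteroperators` prints a column-aligned table whose header row contains a DEGRADED column. It reads the value at the header's character offset instead of counting fields, so a row with an empty VERSION value is still read correctly.

```shell
# Print the names of cluster Operators whose DEGRADED column is "True".
# Reads the tabular output of `oc get clusteroperators` on stdin.
# Assumption: the table is column-aligned and has a DEGRADED header.
degraded_operators() {
  awk '
    # Header row: remember the character offset of the DEGRADED column.
    NR == 1 { pos = index($0, "DEGRADED"); next }
    # Data rows: take the first word that starts at that offset.
    pos > 0 {
      split(substr($0, pos), field, " ")
      if (field[1] == "True") print $1
    }
  '
}

# Hypothetical usage against a live cluster:
#   oc get clusteroperators | degraded_operators
```

No output means that no Operator reported DEGRADED as True at the time of the call. Operators can take a while to settle after a restart, so a single run is only a snapshot.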
 
If the cluster did not start properly, you might need to restore your cluster using an etcd backup.