Chapter 2. Shutting down the cluster gracefully

This document describes the process to gracefully shut down your cluster. You might need to temporarily shut down your cluster for maintenance reasons, or to save on resource costs.

2.1. Prerequisites

Take an etcd backup prior to shutting down the cluster.
Important
It is important to take an etcd backup before performing this procedure so that your cluster can be restored if you encounter any issues when restarting the cluster.
For example, the following conditions can cause the restarted cluster to malfunction:
- etcd data corruption during shutdown
- Node failure due to hardware
- Network connectivity issues
If your cluster fails to recover, follow the steps to restore to a previous cluster state.

2.2. Shutting down the cluster

You can shut down your cluster in a graceful manner so that it can be restarted at a later date.

Note

You can shut down a cluster for up to a year from the installation date and still expect it to restart gracefully. After a year from the installation date, the cluster certificates expire.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have taken an etcd backup.

Procedure

If you plan to shut down the cluster for an extended period of time, determine the date that cluster certificates expire.
You must restart the cluster prior to the date that certificates expire. As the cluster restarts, the process might require you to manually approve the pending certificate signing requests (CSRs) to recover kubelet certificates.
1. Check the expiration date for the kube-apiserver-to-kubelet-signer CA certificate:
  $ oc -n openshift-kube-apiserver-operator get secret kube-apiserver-to-kubelet-signer -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}{"\n"}'
  Example output
  2023-08-05T14:37:50Z
2. Check the expiration date for the kubelet certificates:
  1. Start a debug session for a control plane node by running the following command:
    
    $ oc debug node/<node_name>
  2. Change your root directory to /host by running the following command:
    
    sh-4.4# chroot /host
  3. Check the kubelet client certificate expiration date by running the following command:
    
    sh-4.4# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
    
    Example output
    
    notAfter=Jun 6 10:50:07 2023 GMT
  4. Check the kubelet server certificate expiration date by running the following command:
    
    sh-4.4# openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate
    
    Example output
    
    notAfter=Jun 6 10:50:07 2023 GMT
  5. Exit the debug session.
  6. Repeat these steps to check certificate expiration dates on all control plane nodes. To ensure that the cluster can restart gracefully, plan to restart it before the earliest certificate expiration date.
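After you have collected the expiration dates from the steps above, you still need to compare them to find the earliest one. The following is a minimal sketch of that comparison, not part of the official procedure. It assumes GNU date (for the -d option), and the function names to_epoch, earliest_expiry, and days_until are hypothetical helpers introduced only for illustration. It accepts both the ISO 8601 value from the signer annotation and the notAfter= lines printed by openssl.

```shell
# Sketch: compare collected certificate expiration dates.
# Assumes GNU date; all function names here are hypothetical.

# Convert a timestamp to seconds since the epoch (UTC).
# Accepts ISO 8601 values and openssl "notAfter=..." lines.
to_epoch() {
  local value="${1#notAfter=}"   # strip the openssl prefix if present
  date -u -d "$value" +%s
}

# Print the earliest of the given timestamps in ISO 8601 format.
earliest_expiry() {
  local earliest="" epoch value
  for value in "$@"; do
    epoch="$(to_epoch "$value")" || return 1
    if [ -z "$earliest" ] || [ "$epoch" -lt "$earliest" ]; then
      earliest="$epoch"
    fi
  done
  date -u -d "@${earliest}" +%Y-%m-%dT%H:%M:%SZ
}

# Print the number of whole days (truncated) from a reference time
# to an expiration time. The reference time defaults to now.
days_until() {
  local expiry reference
  expiry="$(to_epoch "$1")" || return 1
  reference="$(to_epoch "${2:-now}")" || return 1
  echo $(( (expiry - reference) / 86400 ))
}
```

For example, passing the example values shown in this procedure, earliest_expiry "2023-08-05T14:37:50Z" "notAfter=Jun 6 10:50:07 2023 GMT" prints 2023-06-06T10:50:07Z, which is the date to plan the restart around.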
Shut down all of the nodes in the cluster. You can do this from your cloud provider’s web console, or run the following loop:
```
$ for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do oc debug node/${node} -- chroot /host shutdown -h 1; done 
```
The -h 1 option sets the delay, in minutes, before the control plane nodes are shut down. For large-scale clusters with 10 nodes or more, set the delay to 10 minutes or longer so that all the compute nodes have time to shut down first.
Example output
```
Starting pod/ip-10-0-130-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Shutdown scheduled for Mon 2021-09-13 09:36:17 UTC, use 'shutdown -c' to cancel.

Removing debug pod ...
Starting pod/ip-10-0-150-116us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Shutdown scheduled for Mon 2021-09-13 09:36:29 UTC, use 'shutdown -c' to cancel.
```
Shutting down the nodes using one of these methods allows pods to terminate gracefully, which reduces the chance for data corruption.
Note
Adjust the shutdown time to be longer for large-scale clusters:
$ for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do oc debug node/${node} -- chroot /host shutdown -h 10; done
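The choice between a 1-minute and a 10-minute delay can be derived from the node count. The sketch below encodes only the guidance stated in this procedure (10 minutes for clusters with 10 nodes or more, 1 minute otherwise). The helper name shutdown_delay_minutes is hypothetical, and the commented usage lines assume a logged-in oc session with the cluster-admin role.

```shell
# Sketch: pick a shutdown delay, in minutes, from the node count.
# shutdown_delay_minutes is a hypothetical helper, not an oc feature.
shutdown_delay_minutes() {
  local node_count="$1"
  if [ "$node_count" -ge 10 ]; then
    echo 10   # large-scale cluster: give compute nodes more time
  else
    echo 1    # default delay used in the loop above
  fi
}

# Possible usage against a live cluster (not executed here):
#   nodes=$(oc get nodes -o jsonpath='{.items[*].metadata.name}')
#   delay=$(shutdown_delay_minutes "$(echo "$nodes" | wc -w)")
#   for node in $nodes; do
#     oc debug node/${node} -- chroot /host shutdown -h "$delay"
#   done
```

Treat 10 minutes as a lower bound for large clusters; the procedure says "10 minutes or longer", so a larger value might suit your environment.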
Note
It is not necessary to drain control plane nodes of the standard pods that ship with OpenShift Container Platform prior to shutdown.
Cluster administrators are responsible for ensuring a clean restart of their own workloads after the cluster is restarted. If you drained control plane nodes prior to shutdown because of custom workloads, you must mark the control plane nodes as schedulable before the cluster will be functional again after restart.
Shut off any cluster dependencies that are no longer needed, such as external storage or an LDAP server. Be sure to consult your vendor’s documentation before doing so.
Important
If you deployed your cluster on a cloud-provider platform, do not shut down, suspend, or delete the associated cloud resources. If you delete the cloud resources of a suspended virtual machine, OpenShift Container Platform might not restore successfully.
