Chapter 4. Hibernating an OpenShift Container Platform cluster
You can hibernate your OpenShift Container Platform cluster for up to 90 days.
4.1. About cluster hibernation
OpenShift Container Platform clusters can be hibernated to save on cloud hosting costs. You can hibernate your OpenShift Container Platform cluster for up to 90 days and expect it to resume successfully.
You must wait at least 24 hours after cluster installation before hibernating your cluster to allow for the first certificate rotation.
If you must hibernate your cluster before the 24-hour certificate rotation, use the following procedure instead: Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs.
When hibernating a cluster, you must hibernate all cluster nodes. Suspending only some of the nodes is not supported.
After resuming, it can take up to 45 minutes for the cluster to become ready.
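The two time limits above (at least 24 hours since installation, and no more than 90 days in hibernation) come down to date arithmetic. The following is a minimal sketch, not part of the official procedure: the function names are hypothetical, it assumes GNU `date` (for the `-d` option), and it assumes that the creation timestamp of the `ClusterVersion` object is an acceptable stand-in for the installation time.

```shell
# Sketch only: helper names are hypothetical; requires GNU date (-d).

# Print the whole hours elapsed between an ISO 8601 timestamp ($1) and "now".
# $2 optionally overrides "now" with another timestamp, which makes this testable.
hours_since() {
  local start now
  start=$(date -u -d "$1" +%s) || return 2
  now=$(date -u -d "${2:-now}" +%s) || return 2
  echo $(( (now - start) / 3600 ))
}

# Succeeds if at least 24 hours have passed since installation ($1).
old_enough_to_hibernate() {
  [ "$(hours_since "$1" "${2:-now}")" -ge 24 ]
}

# Succeeds if hibernation started ($1) less than 90 days (2160 hours) ago.
within_resume_window() {
  [ "$(hours_since "$1" "${2:-now}")" -lt $((90 * 24)) ]
}

# Example use (assumption: ClusterVersion creation time approximates install time):
#   created=$(oc get clusterversion version -o jsonpath='{.metadata.creationTimestamp}')
#   old_enough_to_hibernate "$created" && echo "cluster is at least 24 hours old"
```

Record the time at which you shut down the virtual machines yourself; nothing in this sketch stores it for you.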
4.2. Prerequisites
Take an etcd backup prior to hibernating the cluster.
Important: Taking an etcd backup before hibernating ensures that your cluster can be restored if you encounter any issues when resuming it.
For example, the following conditions can cause the resumed cluster to malfunction:
- etcd data corruption during hibernation
- Node failure due to hardware
- Network connectivity issues
If your cluster fails to recover, follow the steps to restore to a previous cluster state.
4.3. Hibernating a cluster
You can hibernate a cluster for up to 90 days. The cluster can recover if certificates expire while it is in hibernation.
Prerequisites
- The cluster has been running for at least 24 hours to allow the first certificate rotation to complete.
Important: If you must hibernate your cluster before the 24-hour certificate rotation, use the following procedure instead: Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs.
- You have taken an etcd backup.
- You have access to the cluster as a user with the cluster-admin role.
Procedure
- Confirm that your cluster has been installed for at least 24 hours.
- Ensure that all nodes are in a good state by running the following command:
$ oc get nodes
Example output
NAME                                       STATUS   ROLES                  AGE   VERSION
ci-ln-812tb4k-72292-8bcj7-master-0         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-1         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-2         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-a-zhdvk   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-b-9hrmv   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2   Ready    worker                 19m   v1.31.3
All nodes should show Ready in the STATUS column.
- Ensure that all cluster Operators are in a good state by running the following command:
$ oc get clusteroperators
Example output
NAME                       VERSION    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.18.0-0   True        False         False      51m
baremetal                  4.18.0-0   True        False         False      72m
cloud-controller-manager   4.18.0-0   True        False         False      75m
cloud-credential           4.18.0-0   True        False         False      77m
cluster-api                4.18.0-0   True        False         False      42m
cluster-autoscaler         4.18.0-0   True        False         False      72m
config-operator            4.18.0-0   True        False         False      72m
console                    4.18.0-0   True        False         False      55m
...
All cluster Operators should show AVAILABLE=True, PROGRESSING=False, and DEGRADED=False.
- Ensure that all machine config pools are in a good state by running the following command:
$ oc get mcp
Example output
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-87871f187930e67233c837e1d07f49c7   True      False      False      3              3                   3                     0                      96m
worker   rendered-worker-3c4c459dc5d90017983d7e72928b8aed   True      False      False      3              3                   3                     0                      96m
All machine config pools should show UPDATING=False and DEGRADED=False.
- Stop the cluster virtual machines:
Use the tools native to your cluster’s cloud environment to shut down the cluster’s virtual machines.
Important: If you use a bastion virtual machine, do not shut down this virtual machine.
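The three health checks in this procedure can also be scripted rather than read by eye. The following is a rough sketch under stated assumptions: the function names are hypothetical, each filter expects the output of the corresponding `oc get` command with the `--no-headers` flag on standard input, and the column positions match the example outputs shown above with every column populated.

```shell
# Sketch only: hypothetical helpers that print the names of resources
# that are NOT in the state this procedure expects.

# Nodes: column 2 (STATUS) must be exactly "Ready".
# A status such as "Ready,SchedulingDisabled" is reported as not ready.
unready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Cluster Operators: columns 3-5 must be AVAILABLE=True,
# PROGRESSING=False, DEGRADED=False.
unhealthy_operators() {
  awk '!($3 == "True" && $4 == "False" && $5 == "False") { print $1 }'
}

# Machine config pools: columns 4-5 must be UPDATING=False, DEGRADED=False.
unsettled_pools() {
  awk '!($4 == "False" && $5 == "False") { print $1 }'
}

# Succeeds only if standard input contains no non-empty lines.
all_clear() {
  ! grep -q .
}

# Example use:
#   oc get nodes --no-headers | unready_nodes
#   oc get clusteroperators --no-headers | unhealthy_operators
#   oc get mcp --no-headers | unsettled_pools | all_clear && echo "pools are settled"
```

Empty output from a filter means that the check passed. Note that a failed `oc` call also produces empty output, so confirm that the command itself succeeded before trusting the result.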
4.4. Resuming a hibernated cluster
When you resume a hibernated cluster within 90 days, you might have to approve certificate signing requests (CSRs) for the nodes to become ready.
It can take around 45 minutes for the cluster to resume, depending on the size of your cluster.
Prerequisites
- You hibernated your cluster less than 90 days ago.
- You have access to the cluster as a user with the cluster-admin role.
Procedure
- Within 90 days of cluster hibernation, resume the cluster virtual machines:
Use the tools native to your cluster’s cloud environment to resume the cluster’s virtual machines.
- Wait about 5 minutes, depending on the number of nodes in your cluster.
- Approve CSRs for the nodes:
Check that there is a CSR for each node in the NotReady state:
$ oc get csr
Example output
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-4dwsd   37m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2                        24h                 Pending
csr-4vrbr   49m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              24h                 Pending
csr-4wk5x   51m   kubernetes.io/kubelet-serving                 system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              <none>              Pending
csr-84vb6   51m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
Approve each valid CSR by running the following command:
$ oc adm certificate approve <csr_name>
Verify that all necessary CSRs were approved by running the following command:
$ oc get csr
Example output
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-4dwsd   37m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2                        24h                 Approved,Issued
csr-4vrbr   49m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              24h                 Approved,Issued
csr-4wk5x   51m   kubernetes.io/kubelet-serving                 system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              <none>              Approved,Issued
csr-84vb6   51m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
CSRs should show Approved,Issued in the CONDITION column.
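When many nodes are waiting, picking the pending CSRs out of the listing by hand is error-prone. The following is a small sketch of a filter that prints only the names of CSRs still in the Pending condition. The function name is hypothetical, and it assumes the input is the output of `oc get csr --no-headers` with CONDITION as the last column, as in the example output above.

```shell
# Sketch only: hypothetical helper.
# Print the names of CSRs whose CONDITION (last column) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Example use:
#   oc get csr --no-headers | pending_csrs
#
# After reviewing that each listed request is valid, approve it:
#   oc adm certificate approve <csr_name>
```

The filter only lists candidates. Deciding whether each CSR is valid remains a manual step, as the procedure requires.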
- Verify that all nodes now show as ready by running the following command:
$ oc get nodes
Example output
NAME                                       STATUS   ROLES                  AGE   VERSION
ci-ln-812tb4k-72292-8bcj7-master-0         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-1         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-2         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-a-zhdvk   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-b-9hrmv   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2   Ready    worker                 19m   v1.31.3
All nodes should show Ready in the STATUS column. It might take a few minutes for all nodes to become ready after approving the CSRs.
- Wait for cluster Operators to restart to load the new certificates.
This might take 5 to 10 minutes.
- Verify that all cluster Operators are in a good state by running the following command:
$ oc get clusteroperators
Example output
NAME                       VERSION    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.18.0-0   True        False         False      51m
baremetal                  4.18.0-0   True        False         False      72m
cloud-controller-manager   4.18.0-0   True        False         False      75m
cloud-credential           4.18.0-0   True        False         False      77m
cluster-api                4.18.0-0   True        False         False      42m
cluster-autoscaler         4.18.0-0   True        False         False      72m
config-operator            4.18.0-0   True        False         False      72m
console                    4.18.0-0   True        False         False      55m
...
All cluster Operators should show AVAILABLE=True, PROGRESSING=False, and DEGRADED=False.
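Because the resume procedure involves several waits (about 5 minutes for CSRs to appear, a few minutes for nodes, and up to roughly 45 minutes overall), a polling loop can stand in for repeatedly rerunning the same command. The following is a minimal sketch: `retry_until` and `nodes_ready` are hypothetical names, and the column position for STATUS is assumed to match the example output of `oc get nodes` shown earlier.

```shell
# Sketch only: hypothetical helpers.

# Re-run a command until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <sleep_seconds> <command> [args...]
retry_until() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the final one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds only if `oc get nodes` works, lists at least one node,
# and every node reports exactly "Ready" in column 2 (STATUS).
nodes_ready() {
  local out
  out=$(oc get nodes --no-headers) || return 1
  [ -n "$out" ] && ! printf '%s\n' "$out" | awk '$2 != "Ready"' | grep -q .
}

# Example use: 90 attempts x 30 seconds = 45 minutes at most.
#   retry_until 90 30 nodes_ready && echo "all nodes are Ready"
```

Nodes do not become ready until their CSRs are approved, so a loop like this complements the approval step and does not replace it.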