Chapter 5. Rebooting the overcloud
After you perform a minor Red Hat OpenStack Platform update, reboot your overcloud. The reboot refreshes the nodes with any associated kernel, system-level, and container component updates, which can provide performance and security benefits.
Plan downtime to perform the following reboot procedures.
5.1. Rebooting Controller and composable nodes
Complete the following steps to reboot Controller nodes and standalone nodes based on composable roles, excluding Compute nodes and Ceph Storage nodes.
Procedure
- Log in to the node that you want to reboot.
Optional: If the node uses Pacemaker resources, stop the cluster:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster stop
Reboot the node:
[heat-admin@overcloud-controller-0 ~]$ sudo reboot
- Wait until the node boots.
Check the services. For example:
If the node uses Pacemaker services, check that the node has rejoined the cluster:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status
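If you want to script this check, a minimal sketch is to poll pcs status until the rebooted node appears in the list of online nodes. This assumes the standard pcs status output, which reports online nodes on an Online: line:
until sudo pcs status | grep -q "Online:.*$(hostname -s)"; do
    sleep 10
done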
If the node uses Systemd services, check that all services are enabled:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status
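To surface problems quickly, you can also list only the failed units instead of paging through the full status output, for example:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl list-units --state=failed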
If the node uses containerized services, check that all containers on the node are active:
[heat-admin@overcloud-controller-0 ~]$ sudo podman ps
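To spot containers that exited instead of restarting after the reboot, you can also list the non-running containers, for example:
[heat-admin@overcloud-controller-0 ~]$ sudo podman ps -a --filter 'status=exited'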
5.2. Rebooting a Ceph Storage (OSD) cluster
Complete the following steps to reboot a cluster of Ceph Storage (OSD) nodes.
Prerequisites
On a Ceph Monitor or Controller node that is running the ceph-mon service, check that the Red Hat Ceph Storage cluster status is healthy and the pg status is active+clean:
$ sudo podman exec -it ceph-mon-controller-0 ceph -s
If the Ceph cluster is healthy, it returns a status of HEALTH_OK. If the Ceph cluster status is unhealthy, it returns a status of HEALTH_WARN or HEALTH_ERR. For troubleshooting guidance, see the Red Hat Ceph Storage 4 Troubleshooting Guide.
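If you want to wait for the cluster to become healthy in a script, a minimal sketch is to poll ceph health until it reports HEALTH_OK. This assumes the monitor container name used in the examples in this section:
until sudo podman exec ceph-mon-controller-0 ceph health | grep -q HEALTH_OK; do
    sleep 10
done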
Procedure
Log in to a Ceph Monitor or Controller node that is running the ceph-mon service, and disable Ceph Storage cluster rebalancing temporarily:
$ sudo podman exec -it ceph-mon-controller-0 ceph osd set noout
$ sudo podman exec -it ceph-mon-controller-0 ceph osd set norebalance
Note: If you have a multistack or distributed compute node (DCN) architecture, you must specify the cluster name when you set the noout and norebalance flags. For example:
$ sudo podman exec -it ceph-mon-controller-0 ceph osd set noout --cluster <cluster_name>
- Select the first Ceph Storage node that you want to reboot and log in to the node.
Reboot the node:
$ sudo reboot
- Wait until the node boots.
Log in to the node and check the cluster status:
$ sudo podman exec -it ceph-mon-controller-0 ceph status
Check that the pgmap reports all pgs as normal (active+clean); an optional polling sketch for this check follows the next step.
- Log out of the node, reboot the next node, and check its status. Repeat this process until you have rebooted all Ceph Storage nodes.
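If you prefer to wait in a script, the following minimal sketch polls the pg summary until active+clean pgs are reported. Because the grep only confirms that active+clean appears in the output, inspect the full ceph status output to confirm that no pgs remain in other states:
until sudo podman exec ceph-mon-controller-0 ceph pg stat | grep -q 'active+clean'; do
    sleep 10
done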
When complete, log in to a Ceph Monitor or Controller node that is running the ceph-mon service, and re-enable cluster rebalancing:
$ sudo podman exec -it ceph-mon-controller-0 ceph osd unset noout
$ sudo podman exec -it ceph-mon-controller-0 ceph osd unset norebalance
Note: If you have a multistack or distributed compute node (DCN) architecture, you must specify the cluster name when you unset the noout and norebalance flags. For example:
$ sudo podman exec -it ceph-mon-controller-0 ceph osd unset noout --cluster <cluster_name>
Perform a final status check to verify that the cluster reports HEALTH_OK:
$ sudo podman exec -it ceph-mon-controller-0 ceph status
5.3. Rebooting Compute nodes
Complete the following steps to reboot Compute nodes. To ensure minimal downtime of instances in your Red Hat OpenStack Platform environment, this procedure also includes instructions for migrating instances from the Compute node that you want to reboot. This involves the following workflow:
- Decide whether to migrate instances to another Compute node before rebooting the node.
- Select and disable the Compute node you want to reboot so that it does not provision new instances.
- Migrate the instances to another Compute node.
- Reboot the empty Compute node.
- Enable the empty Compute node.
Prerequisites
Before you reboot the Compute node, you must decide whether to migrate instances to another Compute node while the node is rebooting.
Review the list of migration constraints that you might run into when migrating virtual machine instances between Compute nodes. For more information, see Migration constraints in Configuring the Compute Service for Instance Creation.
If you cannot migrate the instances, you can set the following core template parameters to control the state of the instances after the Compute node reboots:
NovaResumeGuestsStateOnHostBoot
Determines whether to return instances to the same state on the Compute node after reboot. When set to False, the instances remain down and you must start them manually. The default value is False.
NovaResumeGuestsShutdownTimeout
Number of seconds to wait for an instance to shut down before rebooting. It is not recommended to set this value to 0. The default value is 300.
For more information about overcloud parameters and their usage, see Overcloud Parameters.
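For example, you can set these parameters in an environment file and include the file with -e in your next overcloud deployment run. The file path and values below are only an illustration:
$ cat > ~/templates/reboot-instances.yaml <<'EOF'
parameter_defaults:
  # Restart instances automatically after the Compute node reboots
  NovaResumeGuestsStateOnHostBoot: true
  # Wait up to 300 seconds for each instance to shut down before the reboot
  NovaResumeGuestsShutdownTimeout: 300
EOF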
Procedure
- Log in to the undercloud as the stack user.
List all Compute nodes and their UUIDs:
$ source ~/stackrc
(undercloud) $ openstack server list --name compute
Identify the UUID of the Compute node that you want to reboot.
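If the list is long, you can print only the UUID and name columns, for example:
(undercloud) $ openstack server list --name compute -f value -c ID -c Name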
From the undercloud, select the Compute node that you want to reboot and disable it so that it does not provision new instances:
$ source ~/overcloudrc
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service set <hostname> nova-compute --disable
List all instances on the Compute node:
(overcloud) $ openstack server list --host <hostname> --all-projects
- If you decide not to migrate the instances, skip the migration steps and proceed to reboot the node.
If you decide to migrate the instances to another Compute node, use one of the following commands:
Migrate the instance to a different host:
(overcloud) $ openstack server migrate <instance_id> --live <target_host> --wait
Let nova-scheduler automatically select the target host:
(overcloud) $ nova live-migration <instance_id>
Live migrate all instances at once:
$ nova host-evacuate-live <hostname>
Note: The nova command might return deprecation warnings, which are safe to ignore.
- Wait until migration completes.
Confirm that the migration was successful:
(overcloud) $ openstack server list --host <hostname> --all-projects
- Continue to migrate instances until none remain on the chosen Compute node.
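If you prefer to wait in a script rather than rerun the list command manually, the following minimal sketch polls until no instances remain on the node. It assumes that overcloudrc is sourced and that <hostname> is the node that you are draining:
while [ -n "$(openstack server list --host <hostname> --all-projects -f value -c ID)" ]; do
    sleep 30
done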
Log in to the Compute node and reboot the node:
[heat-admin@overcloud-compute-0 ~]$ sudo reboot
- Wait until the node boots.
Re-enable the Compute node:
$ source ~/overcloudrc
(overcloud) $ openstack compute service set <hostname> nova-compute --enable
Check that the Compute node is enabled:
(overcloud) $ openstack compute service list
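To limit the output to the node that you rebooted, you can filter by host and confirm that the nova-compute service reports Status enabled and State up, for example:
(overcloud) $ openstack compute service list --host <hostname>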