Chapter 2. How Instance HA Works
OpenStack uses Instance HA to automate the process of evacuating instances from a Compute node when that node fails. The following procedure describes the sequence of events that are triggered when a Compute node fails.
- At the time of failure, the IPMI agent performs first-layer fencing and physically resets the node to ensure that it is powered off. Evacuating instances from online Compute nodes might result in data corruption or in multiple identical instances running on the overcloud. When the node is powered off, it is considered fenced. After the physical IPMI fencing, the fence-nova agent performs second-layer fencing and marks the fenced node with the “evacuate=yes” cluster per-node attribute. To do this, the agent runs the following command:
  $ attrd_updater -n evacuate -A name="evacuate" host="FAILEDHOST" value="yes"
  Where FAILEDHOST is the hostname of the failed Compute node.
- The nova-evacuate agent continually runs in the background, periodically checking the cluster for nodes with the “evacuate=yes” attribute. When nova-evacuate detects that the fenced node contains this attribute, the agent starts evacuating the node using the process described in Evacuate Instances.
- While the failed node boots up from the IPMI reset, the nova-compute process on that node starts automatically. Because the node was fenced earlier, it does not run any new instances until Pacemaker unfences it.
- When Pacemaker sees that the Compute node is online again, it tries to start the compute-unfence-trigger resource on the node, reverting the force-down API call and setting the node as enabled again.
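The periodic check that nova-evacuate performs can be pictured with a small shell sketch. This is not the agent's actual code, which this chapter does not show. It assumes that per-node attributes are reported one per line in the name/host/value form that appears in the command above, and the function name hosts_marked_for_evacuation is hypothetical.

```shell
# Hypothetical helper, for illustration only.
# Reads attribute lines on stdin, for example:
#   name="evacuate" host="FAILEDHOST" value="yes"
# and prints the host of every line whose value is "yes".
# The input format is an assumption based on the command shown above.
hosts_marked_for_evacuation() {
  sed -n 's/^.*host="\([^"]*\)".*value="yes".*$/\1/p'
}
```

Given two input lines, one with value="yes" and one with any other value, the function prints only the host from the first line. That host is the one an evacuation would be started for.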
2.1. Designating specific instances to be evacuated
By default, all instances are evacuated, but you can also tag images or flavors to designate the instances that use them for evacuation.
To tag an image:
$ openstack image set --tag evacuable ID-OF-THE-IMAGE
To tag a flavor:
$ nova flavor-key ID-OF-THE-FLAVOR set evacuable=true
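When several images or flavors need the tag, the two commands above can be wrapped in small loops. The following is a minimal sketch: the function names tag_images_evacuable and tag_flavors_evacuable are hypothetical, and the sketch assumes that the openstack and nova clients are installed and that credentials are already loaded in the shell.

```shell
# Hypothetical helpers that repeat the tagging commands shown above.

# Tag every image ID passed as an argument; stop at the first failure.
tag_images_evacuable() {
  for image_id in "$@"; do
    openstack image set --tag evacuable "$image_id" || return 1
  done
}

# Set evacuable=true on every flavor ID passed as an argument.
tag_flavors_evacuable() {
  for flavor_id in "$@"; do
    nova flavor-key "$flavor_id" set evacuable=true || return 1
  done
}
```

For example, tag_images_evacuable ID-OF-IMAGE-1 ID-OF-IMAGE-2 runs the image command once for each ID. Returning on the first failure keeps a mistyped ID from being missed in a long list.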