High Availability for Compute Instances
Configure High Availability for Compute Instances
Abstract
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Chapter 1. Overview
This guide describes how to manage Instance High Availability (Instance HA). Instance HA allows Red Hat OpenStack Platform to automatically evacuate and re-spawn instances on a different Compute node when their host Compute node fails.
The evacuation process that is triggered by Instance HA is similar to what users can do manually, as described in Evacuate Instances.
Instance HA works on shared storage or local storage environments, which means that evacuated instances maintain the same network configuration (static IP, floating IP, and so on) and the same characteristics inside the new host, even if they are spawned from scratch.
Instance HA is managed by the following resource agents:
| Agent name | Name inside cluster | Role |
|---|---|---|
| fence_compute | fence-nova | Marks a Compute node for evacuation when the node becomes unavailable. |
| NovaEvacuate | nova-evacuate | Evacuates instances from failed nodes. This agent runs on one of the Controller nodes. |
| Dummy | compute-unfence-trigger | Releases a fenced node and enables the node to run instances again. |
Chapter 2. How Instance HA Works
OpenStack uses Instance HA to automate the process of evacuating instances from a Compute node when that node fails. The following procedure describes the sequence of events that are triggered when a Compute node fails.
- At the time of failure, the IPMI agent performs first-layer fencing and physically resets the node to ensure that it is powered off. Evacuating instances from online Compute nodes might result in data corruption or in multiple identical instances running on the overcloud. When the node is powered off, it is considered fenced. After the physical IPMI fencing, the fence-nova agent performs second-layer fencing and marks the fenced node with the "evacuate=yes" cluster per-node attribute. To do this, the agent runs the following command:
  $ attrd_updater -n evacuate -A name="evacuate" host="FAILEDHOST" value="yes"
  Where FAILEDHOST is the hostname of the failed Compute node.
- The nova-evacuate agent runs continually in the background, periodically checking the cluster for nodes with the "evacuate=yes" attribute. When nova-evacuate detects that the fenced node contains this attribute, the agent starts evacuating the node using the process described in Evacuate Instances.
- While the failed node boots up from the IPMI reset, the nova-compute process on that node starts automatically. Because the node was fenced earlier, it does not run any new instances until Pacemaker unfences it.
- When Pacemaker sees that the Compute node is online again, it tries to start the compute-unfence-trigger resource on the node, reverting the force-down API call and setting the node as enabled again.
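The sequence above can be modeled as a small state sketch. This is only an illustration of the ordering described in this chapter, not the agents' real implementation: the ComputeNode class and the fence, evacuate, and unfence functions are hypothetical names, and the choice to set the force-down flag during second-layer fencing is an assumption inferred from the fact that the unfence step reverts it.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical stand-in for a Compute node tracked by the cluster."""
    name: str
    powered_on: bool = True
    forced_down: bool = False
    attributes: dict = field(default_factory=dict)
    instances: list = field(default_factory=list)


def fence(node):
    """Apply both fencing layers to a failed node."""
    # First layer (IPMI): power the node off so it cannot touch instance data.
    node.powered_on = False
    # Second layer (fence-nova): mark the node with evacuate=yes.
    # Assumption: the force-down flag is also set at this point.
    node.forced_down = True
    node.attributes["evacuate"] = "yes"


def evacuate(failed, target):
    """Move instances off a node, but only if it carries evacuate=yes."""
    if failed.attributes.get("evacuate") != "yes":
        return []
    moved = list(failed.instances)
    target.instances.extend(moved)
    failed.instances.clear()
    return moved


def unfence(node):
    """Model compute-unfence-trigger: the node is back, so re-enable it."""
    node.powered_on = True
    node.forced_down = False
```

In this sketch, calling evacuate on a node that has not been fenced returns an empty list, which mirrors the point that instances must never be evacuated from a Compute node that is still online.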
2.1. Designating specific instances to be evacuated
By default, all instances are evacuated. To evacuate only specific instances, tag the images or flavors that those instances use.
To tag an image:
$ openstack image set --tag evacuable ID-OF-THE-IMAGE
To tag a flavor:
$ nova flavor-key ID-OF-THE-FLAVOR set evacuable=true
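The effect of tagging can be sketched as a selection rule. The following is a hedged illustration rather than the agent's actual code: select_evacuable is a hypothetical helper, and it assumes that once any image or flavor is tagged, only instances built from a tagged image or flavor are evacuated, while an untagged environment falls back to the default of evacuating everything.

```python
def select_evacuable(instances, tagged_images, tagged_flavors):
    """Return the names of the instances that would be evacuated.

    instances: iterable of (name, image_id, flavor_id) tuples.
    tagged_images / tagged_flavors: sets of IDs tagged as evacuable.
    """
    instances = list(instances)
    # Default behavior: nothing is tagged, so every instance is evacuated.
    if not tagged_images and not tagged_flavors:
        return [name for name, _, _ in instances]
    # Assumption: with tags present, only tagged instances are evacuated.
    return [
        name
        for name, image_id, flavor_id in instances
        if image_id in tagged_images or flavor_id in tagged_flavors
    ]
```

For example, with one tagged image and two instances of which only one uses that image, the helper returns only the instance that uses the tagged image.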
Chapter 3. Installing and configuring Instance HA
Instance HA is deployed and configured with the director. However, there are a few additional steps that you need to perform to prepare for the deployment.
This section includes all the steps needed to configure a new Instance HA deployment on a new overcloud with the goal of enabling Instance HA on a subset of Compute nodes with a custom role.
- If you want to enable instance HA in a different environment, such as an existing overcloud using either standard or custom roles, follow only the procedures that are relevant to your deployment and adapt your templates accordingly.
- Compute node host names and Pacemaker remote resource names must comply with the W3C naming conventions. For more information, see Declaring Namespaces and Names and Tokens in the W3C documentation.
- Instance HA requires Compute nodes to use the same internal_api network that Controller nodes use. Therefore, deploying Instance HA in a Spine-Leaf environment is not supported, because that deployment requires a separate network for each leaf.
- Disabling Instance HA with the director after installation is not supported. For a workaround to manually remove Instance HA components from your deployment, see the article How can I remove Instance HA components from the controller nodes? This workaround is provided as-is. You must verify the procedure in a test environment before implementing in production.
For general information on deploying the overcloud, see the Director Installation and Usage guide. For information on custom roles, see Composable Services and Custom Roles.
Configure the Instance HA role, flavor, and profile
Add the ComputeInstanceHA role to your roles data file and regenerate the file. For example:
$ openstack overcloud roles generate -o ~/my_roles_data.yaml Controller Compute ComputeInstanceHA
The ComputeInstanceHA role includes all the services in the default Compute role as well as the ComputeInstanceHA and the PacemakerRemote services. For general information about custom roles and about the roles-data.yaml, see the Roles section in the Advanced Overcloud Customization guide.
Create the compute-instance-ha flavor to tag Compute nodes that you want to designate for Instance HA. For example:
$ source ~/stackrc
$ openstack flavor create --id auto --ram 6144 --disk 40 --vcpus 4 compute-instance-ha
$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-instance-ha" compute-instance-ha
Tag each Compute node that you want to designate for Instance HA with the compute-instance-ha profile.
$ openstack baremetal node set --property capabilities='profile:compute-instance-ha,boot_option:local' <NODE UUID>
Map the ComputeInstanceHA role to the compute-instance-ha flavor by creating an environment file with the following content:
parameter_defaults:
  OvercloudComputeInstanceHAFlavor: compute-instance-ha
Enable fencing
Enable fencing on all Controller and Compute nodes in the overcloud by creating an environment file with fencing information. Make sure to create the environment file in an accessible location, such as ~/templates.
For more information about fencing and STONITH configuration, see the Fencing Controller Nodes with STONITH section of the High Availability Deployment and Usage guide.
Instance HA uses shared storage by default. If shared storage is not configured for your Compute instance, then add the following parameter to the environment file that you created in the previous step:
parameter_defaults:
  ExtraConfig:
    tripleo::instanceha::no_shared_storage: true
See Section 3.1, “Considerations for Shared Storage” for details on how to boot from an OpenStack Block Storage (cinder) volume rather than configuring shared storage for the disk image of instances.
Deploy the overcloud
Run the openstack overcloud deploy command with the -e option for each environment file that you created, as well as the compute-instanceha.yaml environment file. For example:
$ openstack overcloud deploy --templates \
-e <FLAVOR_ENV_FILE> \
-e <FENCING_ENV_FILE> \
-e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml
- Do not modify the compute-instanceha.yaml environment file.
- Make sure to include the path to each environment file that you want to include in the overcloud deployment.
- You can configure Instance HA for your overcloud at any time after creating the undercloud. If you already deployed the overcloud, you need to rerun the overcloud deploy command with the new Instance HA files.
After the deployment is complete, each Compute node will include a STONITH device and a GuestNode service.
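Because the deploy command must list every environment file, and must always include compute-instanceha.yaml, it can help to assemble the command programmatically. The following is a minimal sketch under stated assumptions: build_deploy_command is a hypothetical helper, the two file paths in the usage lines are placeholders for the flavor and fencing environment files that you created, and the sketch only prints the command instead of running it.

```python
import shlex

# Path taken from the deploy example in this chapter.
INSTANCE_HA_ENV = (
    "/usr/share/openstack-tripleo-heat-templates/"
    "environments/compute-instanceha.yaml"
)


def build_deploy_command(env_files):
    """Return the deploy command as an argument list.

    env_files: paths to your own environment files, in the order
    that they should be passed with -e.
    """
    args = ["openstack", "overcloud", "deploy", "--templates"]
    for path in env_files:
        args += ["-e", path]
    # Always include the Instance HA environment file exactly once.
    if INSTANCE_HA_ENV not in env_files:
        args += ["-e", INSTANCE_HA_ENV]
    return args


# Usage with placeholder paths (assumptions, not files from this guide).
command = build_deploy_command([
    "/home/stack/templates/instance-ha-flavor.yaml",
    "/home/stack/templates/fencing.yaml",
])
print(shlex.join(command))
```

Returning a list rather than a single string keeps each path as one argument, so the result can be passed to a process launcher without further quoting.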
Chapter 4. Testing Evacuation with Instance HA
The following procedure involves deliberately crashing a Compute node. Doing this forces the automated evacuation of instances through Instance HA.
Boot one or more instances on the overcloud before crashing the Compute node that hosts the instances to test.
stack@director $ . overcloudrc
stack@director $ nova boot --image cirros --flavor 2 test-failover
stack@director $ nova list --fields name,status,host
Log in to the Compute node that hosts the instances and change to the root user. Replace compute-n with the name of the Compute node:
stack@director $ . stackrc
stack@director $ ssh -l heat-admin compute-n
heat-admin@compute-n $ su -
Crash the Compute node.
root@compute-n $ echo c > /proc/sysrq-trigger
Wait a few minutes and then verify that these instances re-spawned on another Compute node.
stack@director $ nova list --fields name,status,host
stack@director $ nova service-list
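The verification step amounts to comparing the name, status, and host fields from before and after the crash. The sketch below shows one way to express that comparison. It is an illustration only: check_evacuation is a hypothetical helper, the dictionaries must be filled in from your own nova list output (parsing is not shown), and treating ACTIVE as the expected status of a re-spawned instance is an assumption.

```python
def check_evacuation(before, after, failed_host):
    """Report instances that were not evacuated cleanly.

    before / after: dicts mapping instance name -> (status, host),
    recorded before the crash and after the wait period.
    Returns a list of problem descriptions; empty means success.
    """
    problems = []
    for name, (_, old_host) in before.items():
        # Only instances that lived on the crashed node matter here.
        if old_host != failed_host:
            continue
        if name not in after:
            problems.append(f"{name}: missing after failover")
            continue
        status, new_host = after[name]
        if new_host == failed_host:
            problems.append(f"{name}: still on {failed_host}")
        elif status != "ACTIVE":
            # Assumption: a healthy re-spawned instance reports ACTIVE.
            problems.append(f"{name}: status is {status}")
    return problems
```

An empty result means that every instance from the crashed node now reports a different host.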
Chapter 5. Disabling Instance HA from earlier versions
If you upgrade to Red Hat OpenStack Platform 13 from earlier versions, you must manually disable Instance HA before you upgrade. This includes major and minor upgrades, as well as fast-forward upgrades.
From Red Hat OpenStack Platform 13 and later, Instance HA is installed and upgraded with the director. No manual rollback is required if you upgrade from version 13 to a later version.
Disable Instance HA without STONITH devices
To disable Instance HA that was deployed without STONITH devices, run the following command as the stack user on the undercloud:
stack@director $ ansible-playbook /home/stack/ansible-instanceha/playbooks/overcloud-instance-ha.yml \
-e release="[rhos-NN]" -e instance_ha_action="uninstall"
Replace the value of the "[rhos-NN]" field with the actual version of OpenStack Platform. For example: "rhos-10"
Disable Instance HA with STONITH devices
If you deployed Instance HA with the stonith_devices option, you need to specify this option when you disable Instance HA. For example, if your Instance HA configuration excludes STONITH devices, use the following command syntax:
stack@director $ ansible-playbook /home/stack/ansible-instanceha/playbooks/overcloud-instance-ha.yml \
-e release="[rhos-NN]" -e instance_ha_action="uninstall" -e stonith_devices="none"
Replace the value of the "[rhos-NN]" field with the actual version of OpenStack Platform. For example: "rhos-10"