Chapter 4. Managing high availability services with Pacemaker
The Pacemaker service manages core container and active-passive services, such as Galera, RabbitMQ, Redis, and HAProxy. You use Pacemaker to view and manage general information about the managed services, virtual IP addresses, power management, and fencing.
Pacemaker manages Red Hat OpenStack Platform (RHOSP) services as Bundle Set resources, or bundles. Most of these services are active-active services that start in the same way and always run on each Controller node.
Pacemaker manages the following resource types:
Bundle
A bundle resource configures and replicates the same container on all Controller nodes, maps the necessary storage paths to the container directories, and sets specific attributes related to the resource itself.
Container
A container can run different kinds of resources, from simple systemd services like HAProxy to complex services like Galera, which requires specific resource agents that control and set the state of the service on the different nodes.
Important
You cannot use podman or systemctl commands to manage bundles or containers. You can use these commands to check the status of the services, but you must use Pacemaker to perform actions on them.
Podman containers that Pacemaker controls have a RestartPolicy set to no by Podman. This is to ensure that Pacemaker, and not Podman, controls the container start and stop actions.
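You can check this policy from a Controller node with podman inspect. The following session is illustrative; the container name depends on your deployment:

```console
$ sudo podman inspect --format '{{.HostConfig.RestartPolicy.Name}}' haproxy-bundle-podman-0
no
```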
Simple Bundle Set resources (simple bundles)
A simple Bundle Set resource, or simple bundle, is a set of containers that each include the same Pacemaker services that you want to deploy across the Controller nodes.
The following example shows a list of simple bundles from the output of the pcs status command:
Podman container set: haproxy-bundle [192.168.24.1:8787/rhosp-rhel8/openstack-haproxy:pcmklatest]
haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0
haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started overcloud-controller-1
haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started overcloud-controller-2
For each bundle, you can see the following details:
The name that Pacemaker assigns to the service.
The reference to the container that is associated with the bundle.
The list and status of replicas that are running on the different Controller nodes.
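Replica lines in the pcs status output follow a regular pattern, so you can parse them in a script. The following minimal Python sketch is illustrative only and is not part of the product tooling:

```python
import re

# Matches replica lines such as:
#   haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0
REPLICA_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)\s+(?P<node>\S+)"
)

def parse_replica(line):
    """Return (name, agent, state, node) for one pcs status replica line."""
    m = REPLICA_RE.match(line)
    if m is None:
        raise ValueError("not a replica line: %r" % line)
    return m.group("name"), m.group("agent"), m.group("state"), m.group("node")
```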
The following example shows the settings for the haproxy-bundle simple bundle:
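The excerpt below is representative of `sudo pcs resource show haproxy-bundle` output; the registry address, container options, and storage mappings vary by deployment:

```text
Bundle: haproxy-bundle
 Podman: image=192.168.24.1:8787/rhosp-rhel8/openstack-haproxy:pcmklatest network=host options="--user=root --log-driver=journald" replicas=3 run-command="/bin/bash /usr/local/bin/kolla_start"
 Storage Mapping:
  options=ro source-dir=/var/lib/kolla/config_files/haproxy.json target-dir=/var/lib/kolla/config_files/config.json (haproxy-cfg-files)
  options=ro source-dir=/var/lib/config-data/puppet-generated/haproxy/ target-dir=/var/lib/kolla/config_files/src (haproxy-cfg-data)
```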
The example shows the following information about the containers in the bundle:
image: Image used by the container, which refers to the local registry of the undercloud.
network: Container network type, which is "host" in the example.
options: Specific options for the container.
replicas: Indicates how many copies of the container must run in the cluster. Each bundle includes three containers, one for each Controller node.
run-command: System command used to spawn the container.
Storage Mapping: Mapping of the local path on each host to the container. To check the haproxy configuration from the host, open the /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg file instead of the /etc/haproxy/haproxy.cfg file.
Note
Although HAProxy provides high availability by load balancing traffic to selected services, HAProxy itself is made highly available by being managed as a Pacemaker bundle service.
Complex Bundle Set resources (complex bundles)
Complex Bundle Set resources, or complex bundles, are Pacemaker services that specify a resource configuration in addition to the basic container configuration that is included in simple bundles.
This configuration is needed to manage Multi-State resources, which are services that can have different states depending on the Controller node they run on.
This example shows a list of complex bundles from the output of the pcs status command:
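A representative excerpt follows; image tags and node placement vary by deployment:

```text
Podman container set: rabbitmq-bundle [192.168.24.1:8787/rhosp-rhel8/openstack-rabbitmq:pcmklatest]
  rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0
  rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1
  rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2
Podman container set: galera-bundle [192.168.24.1:8787/rhosp-rhel8/openstack-mariadb:pcmklatest]
  galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
  galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
  galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
Podman container set: redis-bundle [192.168.24.1:8787/rhosp-rhel8/openstack-redis:pcmklatest]
  redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0
  redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1
  redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2
```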
This output shows the following information about each complex bundle:
RabbitMQ: All three Controller nodes run a standalone instance of the service, similar to a simple bundle.
Galera: All three Controller nodes are running as Galera masters under the same constraints.
Redis: The overcloud-controller-0 container is running as the master, while the other two Controller nodes are running as slaves. Each container type might run under different constraints.
The following example shows the settings for the galera-bundle complex bundle:
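The excerpt below is representative of `sudo pcs resource show galera-bundle` output; attribute values, addresses, and operation timeouts are illustrative and vary by deployment:

```text
Bundle: galera-bundle
 Podman: image=192.168.24.1:8787/rhosp-rhel8/openstack-mariadb:pcmklatest masters=3 network=host options="--user=root --log-driver=journald" replicas=3 run-command="/bin/bash /usr/local/bin/kolla_start"
 Network: control-port=3123
 Storage Mapping:
  options=ro source-dir=/var/lib/kolla/config_files/mysql.json target-dir=/var/lib/kolla/config_files/config.json (mysql-cfg-files)
 Resource: galera (class=ocf provider=heartbeat type=galera)
  Attributes: enable_creation=true wsrep_cluster_address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
  Operations: promote interval=0s timeout=300s (galera-promote-interval-0s)
              monitor interval=20s timeout=30s (galera-monitor-interval-20s)
```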
This output shows that, unlike in a simple bundle, the galera-bundle resource includes explicit resource configuration that determines all aspects of the multi-state resource.
Note
Although a service can run on multiple Controller nodes at the same time, the Controller node itself might not be listening at the IP address that is required to reach those services. For information about how to check the IP address of a service, see Section 4.4, “Viewing virtual IP addresses”.
To view general Pacemaker information, use the pcs status command.
Procedure
Log in to any Controller node as the heat-admin user.
$ ssh heat-admin@overcloud-controller-0
Run the pcs status command:
[heat-admin@overcloud-controller-0 ~] $ sudo pcs status
Example output:
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-1 (version 2.0.1-4.el8-0eb7991564) - partition with quorum
Last updated: Thu Feb 8 14:29:21 2018
Last change: Sat Feb 3 11:37:17 2018 by root via cibadmin on overcloud-controller-2
12 nodes configured
37 resources configured
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ]
Full list of resources:
[...]
The main sections of the output show the following information about the cluster:
Cluster name: Name of the cluster.
[NUM] nodes configured: Number of nodes that are configured for the cluster.
[NUM] resources configured: Number of resources that are configured for the cluster.
Online: Names of the Controller nodes that are currently online.
GuestOnline: Names of the guest nodes that are currently online. Each guest node consists of a complex Bundle Set resource. For more information about bundle sets, see Section 4.1, “Resource bundles and containers”.
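The GuestOnline entries use a fixed guest@host format. As an illustrative sketch, not part of any product tooling, a few lines of Python can map each guest node to the Controller node that hosts it:

```python
def guest_hosts(guest_online_line):
    """Map each guest node in a 'GuestOnline: [ ... ]' line to its Controller host."""
    # Extract the space-separated entries between the brackets.
    inner = guest_online_line.split("[", 1)[1].rsplit("]", 1)[0]
    mapping = {}
    for entry in inner.split():
        guest, _, host = entry.partition("@")
        mapping[guest] = host
    return mapping
```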
Each IPaddr2 resource sets a virtual IP address that clients use to request access to a service. If the Controller node with that IP address fails, the IPaddr2 resource reassigns the IP address to a different Controller node.
Show all virtual IP addresses
Run the pcs resource show command with the --full option to display all resources that use the IPaddr2 resource agent:
$ sudo pcs resource show --full
The following example output shows each Controller node that is currently set to listen to a particular virtual IP address:
ip-10.200.0.6 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-192.168.1.150 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.16.0.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.16.0.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.18.0.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
ip-172.19.0.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
Each IP address is initially attached to a specific Controller node. For example, 192.168.1.150 is started on overcloud-controller-0. However, if that Controller node fails, the IP address is reassigned to other Controller nodes in the cluster.
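To confirm which node currently binds a virtual IP address, log in to that node and inspect its interfaces. The following session is illustrative; the interface name vlan100 and the addresses shown depend on your network configuration:

```console
[heat-admin@overcloud-controller-0 ~]$ ip addr show vlan100
9: vlan100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether be:ab:aa:37:34:e7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.151/24 brd 192.168.1.255 scope global vlan100
    inet 192.168.1.150/32 brd 192.168.1.255 scope global vlan100
```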
The following table describes the IP addresses in the example output and shows the original allocation of each IP address.
Table 4.1. IP address description and allocation source

192.168.1.150: Public IP address. Allocated from the ExternalAllocationPools attribute in the network-environment.yaml file.
10.200.0.6: Controller virtual IP address. Part of the dhcp_start and dhcp_end range set to 10.200.0.5-10.200.0.24 in the undercloud.conf file.
172.16.0.10: Provides access to OpenStack API services on a Controller node. Allocated from the InternalApiAllocationPools attribute in the network-environment.yaml file.
172.18.0.10: Storage virtual IP address that provides access to the Glance API and to Swift Proxy services. Allocated from the StorageAllocationPools attribute in the network-environment.yaml file.
172.16.0.11: Provides access to the Redis service on a Controller node. Allocated from the InternalApiAllocationPools attribute in the network-environment.yaml file.
172.19.0.10: Provides access to storage management. Allocated from the StorageMgmtAllocationPools attribute in the network-environment.yaml file.
View a specific IP address
Run the pcs resource show command.
$ sudo pcs resource show ip-192.168.1.150
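Representative output for this command follows; attribute values and operation timeouts vary by deployment:

```text
 Resource: ip-192.168.1.150 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.1.150 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.1.150-start-interval-0s)
              stop interval=0s timeout=20s (ip-192.168.1.150-stop-interval-0s)
              monitor interval=10s timeout=20s (ip-192.168.1.150-monitor-interval-10s)
```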
Note
Processes that listen on all local addresses, such as 0.0.0.0, are also reachable through 192.168.1.150. These processes include sshd, mysqld, dhclient, and ntpd.
View port number assignments
Open the /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg file to see default port number assignments.
The following example shows the services and the port numbers that they listen on:
TCP port 6080: nova_novncproxy
TCP port 9696: neutron
TCP port 8000: heat_cfn
TCP port 80: horizon
TCP port 8776: cinder
In this example, most services that are defined in the haproxy.cfg file listen on the 192.168.1.150 IP address on all three Controller nodes. However, only the controller-0 node binds the 192.168.1.150 IP address externally.
Therefore, if the controller-0 node fails, the cluster only needs to reassign 192.168.1.150 to another Controller node, where all other services are already running.
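For illustration, a listen section in haproxy.cfg typically pairs the VIP front ends with a back-end entry for each Controller node. The addresses, ports, and check options below are representative, not taken from a real deployment:

```text
listen nova_novncproxy
  bind 192.168.1.150:6080 transparent
  bind 172.16.0.10:6080 transparent
  mode http
  balance roundrobin
  server overcloud-controller-0 172.16.0.13:6080 check fall 5 inter 2000 rise 2
  server overcloud-controller-1 172.16.0.14:6080 check fall 5 inter 2000 rise 2
  server overcloud-controller-2 172.16.0.15:6080 check fall 5 inter 2000 rise 2
```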
The last sections of the pcs status output show information about your power management fencing, such as IPMI, and the status of the Pacemaker service itself:
openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0
my-ipmilan-for-controller-0 (stonith:fence_ipmilan): Started my-ipmilan-for-controller-0
my-ipmilan-for-controller-1 (stonith:fence_ipmilan): Started my-ipmilan-for-controller-1
my-ipmilan-for-controller-2 (stonith:fence_ipmilan): Started my-ipmilan-for-controller-2
PCSD Status:
overcloud-controller-0: Online
overcloud-controller-1: Online
overcloud-controller-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The my-ipmilan-for-controller settings show the type of fencing for each Controller node (stonith:fence_ipmilan) and whether the IPMI service is stopped or running. The PCSD Status shows that all three Controller nodes are currently online. The Pacemaker service consists of three daemons: corosync, pacemaker, and pcsd. In the example, all three daemons are active and enabled.
If one of the Pacemaker resources fails, you can view the Failed Actions section of the pcs status output. In the following example, the openstack-cinder-volume service stopped working on controller-0:
Failed Actions:
* openstack-cinder-volume_monitor_60000 on overcloud-controller-0 'not running' (7): call=74, status=complete, exitreason='none',
last-rc-change='Wed Dec 14 08:33:14 2016', queued=0ms, exec=0ms
In this case, you must enable the systemd service openstack-cinder-volume. In other cases, you might need to locate and fix the problem and then clean up the resources. For more information about troubleshooting resource problems, see Chapter 8, Troubleshooting resource problems.
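After you fix the underlying problem, you can clear the recorded failure so that pcs status stops reporting it. pcs resource cleanup is a standard Pacemaker command; the resource name here matches the example above:

```console
$ sudo pcs resource cleanup openstack-cinder-volume
```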