Chapter 14. Configuring Multi-Site Clusters with Pacemaker
When a cluster spans more than one site, issues with network connectivity between the sites can lead to split-brain situations. When connectivity drops, there is no way for a node on one site to determine whether a node on another site has failed or is still functioning with a failed site interlink. In addition, it can be problematic to provide high availability services across two sites which are too far apart to keep synchronous.
To address these issues, Red Hat Enterprise Linux release 7.4 provides full support for the ability to configure high availability clusters that span multiple sites through the use of a Booth cluster ticket manager. The Booth ticket manager is a distributed service that is meant to be run on a different physical network than the networks that connect the cluster nodes at particular sites. It yields another, loose cluster, a Booth formation, that sits on top of the regular clusters at the sites. This aggregated communication layer facilitates consensus-based decision processes for individual Booth tickets.
A Booth ticket is a singleton in the Booth formation and represents a time-sensitive, movable unit of authorization. Resources can be configured to require a certain ticket to run. This ensures that those resources run at only one site at a time: the site to which the required ticket or tickets have been granted.
You can think of a Booth formation as an overlay cluster consisting of clusters running at different sites, where all the original clusters are independent of each other. It is the Booth service which communicates to the clusters whether they have been granted a ticket, and it is Pacemaker that determines whether to run resources in a cluster based on a Pacemaker ticket constraint. This means that when using the ticket manager, each of the clusters can run its own resources as well as shared resources. For example, there can be resources A, B, and C running only in one cluster, resources D, E, and F running only in the other cluster, and resources G and H running in either of the two clusters as determined by a ticket. It is also possible to have an additional resource J that could run in either of the two clusters as determined by a separate ticket.
The following procedure provides an outline of the steps you follow to configure a multi-site configuration that uses the Booth ticket manager.
These example commands use the following arrangement:
- Cluster 1 consists of the nodes cluster1-node1 and cluster1-node2
- Cluster 1 has a floating IP address assigned to it of 192.168.11.100
- Cluster 2 consists of cluster2-node1 and cluster2-node2
- Cluster 2 has a floating IP address assigned to it of 192.168.22.100
- The arbitrator node is arbitrator-node with an IP address of 192.168.99.100
- The name of the Booth ticket that this configuration uses is apacheticket
These example commands assume that the cluster resources for an Apache service have been configured as part of the resource group apachegroup for each cluster. It is not required that the resources and resource groups be the same on each cluster to configure a ticket constraint for those resources, since the Pacemaker instance for each cluster is independent, but that is a common failover scenario.
For a full cluster configuration procedure that configures an Apache service in a cluster, see the example in High Availability Add-On Administration.
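The procedure below requires that the sites and the arbitrator be specified as IP addresses rather than host names. As a minimal sketch, the example arrangement can be captured in shell variables and checked before you run any pcs commands. The variable names and the is_ipv4 helper are illustrative assumptions, not part of pcs or Booth, and the check covers IPv4 literals only.

```shell
#!/bin/sh
# Example arrangement from this chapter; replace with your own values.
CLUSTER1_IP=192.168.11.100     # floating IP address of cluster 1
CLUSTER2_IP=192.168.22.100     # floating IP address of cluster 2
ARBITRATOR_IP=192.168.99.100   # IP address of the arbitrator node
TICKET=apacheticket            # Booth ticket name
GROUP=apachegroup              # resource group constrained by the ticket

# is_ipv4 (hypothetical helper): succeeds only for a dotted-quad IPv4 literal.
# The body runs in a subshell, so the IFS change does not leak to the caller.
is_ipv4() (
    case $1 in
        ''|*[!0-9.]*|.*|*.|*..*) exit 1 ;;   # empty, non-numeric, or stray dots
    esac
    IFS=.
    set -- $1                                # split the address into octets
    [ $# -eq 4 ] || exit 1
    for octet in "$@"; do
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || exit 1
    done
    exit 0
)

for addr in "$CLUSTER1_IP" "$CLUSTER2_IP" "$ARBITRATOR_IP"; do
    if is_ipv4 "$addr"; then
        echo "ok: $addr"
    else
        echo "not an IPv4 address: $addr" >&2
    fi
done
```

A host name such as cluster1-node1 fails this check, which is the case the procedure warns about.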
Note that at any time in the configuration procedure you can enter the pcs booth config command to display the booth configuration for the current node or cluster or the pcs booth status command to display the current status of booth on the local node.
- Install the booth-site Booth ticket manager package on each node of both clusters.
[root@cluster1-node1 ~]# yum install -y booth-site
[root@cluster1-node2 ~]# yum install -y booth-site
[root@cluster2-node1 ~]# yum install -y booth-site
[root@cluster2-node2 ~]# yum install -y booth-site
- Install the pcs, booth-core, and booth-arbitrator packages on the arbitrator node.
[root@arbitrator-node ~]# yum install -y pcs booth-core booth-arbitrator
- Ensure that ports 9929/tcp and 9929/udp are open on all cluster nodes and on the arbitrator node.
For example, running the following commands on all nodes in both clusters as well as on the arbitrator node allows access to ports 9929/tcp and 9929/udp on those nodes.
# firewall-cmd --add-port=9929/udp
# firewall-cmd --add-port=9929/tcp
# firewall-cmd --add-port=9929/udp --permanent
# firewall-cmd --add-port=9929/tcp --permanent
Note that this procedure in itself allows any machine anywhere to access port 9929 on the nodes. You should ensure that, at your site, port 9929 on these nodes is open only to the nodes that require access.
- Create a Booth configuration on one node of one cluster. The addresses you specify for each cluster and for the arbitrator must be IP addresses. For each cluster, you specify a floating IP address.
[cluster1-node1 ~] # pcs booth setup sites 192.168.11.100 192.168.22.100 arbitrators 192.168.99.100
This command creates the configuration files /etc/booth/booth.conf and /etc/booth/booth.key on the node from which it is run.
- Create a ticket for the Booth configuration. This is the ticket that you will use to define the resource constraint that will allow resources to run only when this ticket has been granted to the cluster.
This basic failover configuration procedure uses only one ticket, but you can create additional tickets for more complicated scenarios where each ticket is associated with a different resource or resources.
[cluster1-node1 ~] # pcs booth ticket add apacheticket
- Synchronize the Booth configuration to all nodes in the current cluster.
[cluster1-node1 ~] # pcs booth sync
- From the arbitrator node, pull the Booth configuration to the arbitrator. If you have not previously done so, you must first authenticate pcs to the node from which you are pulling the configuration.
[arbitrator-node ~] # pcs cluster auth cluster1-node1
[arbitrator-node ~] # pcs booth pull cluster1-node1
- Pull the Booth configuration to the other cluster and synchronize to all the nodes of that cluster. As with the arbitrator node, if you have not previously done so, you must first authenticate pcs to the node from which you are pulling the configuration.
[cluster2-node1 ~] # pcs cluster auth cluster1-node1
[cluster2-node1 ~] # pcs booth pull cluster1-node1
[cluster2-node1 ~] # pcs booth sync
- Start and enable Booth on the arbitrator.
Note
You must not manually start or enable Booth on any of the nodes of the clusters since Booth runs as a Pacemaker resource in those clusters.
[arbitrator-node ~] # pcs booth start
[arbitrator-node ~] # pcs booth enable
- Configure Booth to run as a cluster resource on both cluster sites. This creates a resource group with booth-ip and booth-service as members of that group.
[cluster1-node1 ~] # pcs booth create ip 192.168.11.100
[cluster2-node1 ~] # pcs booth create ip 192.168.22.100
- Add a ticket constraint to the resource group you have defined for each cluster.
[cluster1-node1 ~] # pcs constraint ticket add apacheticket apachegroup
[cluster2-node1 ~] # pcs constraint ticket add apacheticket apachegroup
You can enter the following command to display the currently configured ticket constraints.
pcs constraint ticket [show]
- Grant the ticket you created for this setup to the first cluster.
Note that it is not necessary to have defined ticket constraints before granting a ticket. Once you have initially granted a ticket to a cluster, then Booth takes over ticket management unless you override this manually with the pcs booth ticket revoke command. For information on the pcs booth administration commands, see the PCS help screen for the pcs booth command.
[cluster1-node1 ~] # pcs booth ticket grant apacheticket
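Because the procedure moves between several hosts, it can help to see every command paired with the host on which it runs. The following dry-run sketch prints that plan in procedure order and executes nothing. The step and print_plan helpers are hypothetical names introduced for this illustration; the commands, host names, and addresses are the example values from this chapter.

```shell
#!/bin/sh
# Dry run: prints each command next to its host; nothing is executed.

# step (hypothetical helper): $1 is the host, the rest is the command.
step() {
    host=$1
    shift
    printf '[root@%s ~]# %s\n' "$host" "$*"
}

# print_plan (hypothetical helper): the whole procedure, in order.
print_plan() {
    for n in cluster1-node1 cluster1-node2 cluster2-node1 cluster2-node2; do
        step "$n" yum install -y booth-site
    done
    step arbitrator-node yum install -y pcs booth-core booth-arbitrator

    # Open port 9929 on every cluster node and on the arbitrator.
    for n in cluster1-node1 cluster1-node2 cluster2-node1 cluster2-node2 arbitrator-node; do
        for proto in udp tcp; do
            step "$n" firewall-cmd --add-port=9929/$proto
            step "$n" firewall-cmd --add-port=9929/$proto --permanent
        done
    done

    step cluster1-node1 pcs booth setup sites 192.168.11.100 192.168.22.100 arbitrators 192.168.99.100
    step cluster1-node1 pcs booth ticket add apacheticket
    step cluster1-node1 pcs booth sync
    step arbitrator-node pcs cluster auth cluster1-node1
    step arbitrator-node pcs booth pull cluster1-node1
    step cluster2-node1 pcs cluster auth cluster1-node1
    step cluster2-node1 pcs booth pull cluster1-node1
    step cluster2-node1 pcs booth sync
    step arbitrator-node pcs booth start
    step arbitrator-node pcs booth enable
    step cluster1-node1 pcs booth create ip 192.168.11.100
    step cluster2-node1 pcs booth create ip 192.168.22.100
    step cluster1-node1 pcs constraint ticket add apacheticket apachegroup
    step cluster2-node1 pcs constraint ticket add apacheticket apachegroup
    step cluster1-node1 pcs booth ticket grant apacheticket
}

print_plan
```

Printing the plan rather than running it keeps the sketch safe to try on any machine. Note that pcs booth start and pcs booth enable appear only for the arbitrator, since Booth runs as a Pacemaker resource on the cluster nodes.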
It is possible to add or remove tickets at any time, even after completing this procedure. After adding or removing a ticket, however, you must synchronize the configuration files to the other nodes and clusters as well as to the arbitrator and grant the ticket as is shown in this procedure.
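As a sketch of what this re-synchronization might look like when you add a ticket later, the following dry run prints the commands in order without executing them. It reuses the add, sync, pull, and grant steps of the procedure. It is an assumption that repeating those steps unchanged is sufficient in your environment. The ticket_update_plan helper and the ticket name dbticket are hypothetical.

```shell
#!/bin/sh
# Dry run only: prints the commands, with their hosts, and executes nothing.

# ticket_update_plan (hypothetical helper): $1 is the name of the new ticket.
ticket_update_plan() {
    ticket=$1
    # Add the ticket on the node that holds the Booth configuration.
    printf '[cluster1-node1 ~] # pcs booth ticket add %s\n' "$ticket"
    # Push the changed configuration to the other nodes of that cluster.
    printf '[cluster1-node1 ~] # pcs booth sync\n'
    # Pull the changed configuration to the arbitrator.
    printf '[arbitrator-node ~] # pcs booth pull cluster1-node1\n'
    # Pull it to the other cluster and synchronize that cluster's nodes.
    printf '[cluster2-node1 ~] # pcs booth pull cluster1-node1\n'
    printf '[cluster2-node1 ~] # pcs booth sync\n'
    # Grant the new ticket to the first cluster.
    printf '[cluster1-node1 ~] # pcs booth ticket grant %s\n' "$ticket"
}

ticket_update_plan dbticket   # dbticket is a made-up example name
```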
For information on additional Booth administration commands that you can use for cleaning up and removing Booth configuration files, tickets, and resources, see the PCS help screen for the pcs booth command.