Chapter 17. High Availability
AMQ Broker enables you to connect brokers as master and slave, where each master broker can have one or more slave brokers. Normally, only master brokers actively serve client requests. However, if a master broker and its clients are no longer able to communicate, for example, due to a power outage or similar failure on the master broker, a slave broker takes over for the master. This is referred to as failover. When a failover occurs, the slave broker is activated, and clients reconnect to the slave broker, which resumes handling their requests.
You can choose from two main high-availability (HA) policies. Each has different benefits and trade-offs:
- Replication continuously synchronizes the data from the master broker to the slave broker over the network.
- Shared-store uses a shared file system to store journal data for both master and slave brokers.
You can colocate master and slave brokers within the same Java runtime when using either the replication or shared-store policy.
There is also a third policy, called HA live-only, that provides a limited amount of HA when scaling down master brokers. This policy is useful during controlled shutdowns.
Only persistent message data survives a failover. Non-persistent message data is not available after failover. See Persisting Messages for more information about how to persist your messages.
17.1. Replication and High Availability
When using replication as the HA policy for your cluster, all data synchronization occurs over the network. A slave broker first needs to synchronize all existing data from the master broker before becoming capable of replacing it. The time it takes for this to happen depends on the amount of data to be synchronized and the connection speed. In general, synchronization occurs in parallel with current network traffic, so this does not cause any blocking on clients. However, there is a critical moment at the end of this process where the master broker must complete the synchronization and ensure the slave broker acknowledges the completion. This exchange blocks journal-related operations. The maximum length of time that the exchange blocks journal-related operations is controlled by the initial-replication-sync-timeout configuration element. After the initial synchronization phase is complete and the slave broker has all the current data from the master, the master synchronizes data across the network as it receives it, to ensure that no data is lost if the master broker fails.
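As a minimal sketch of where this timeout might be set, the fragment below places initial-replication-sync-timeout under the master element of a replication policy. Both the placement and the value of 30000 (assumed here to be milliseconds) are assumptions for illustration; check the table of configuration elements in the appendix before relying on them.

```xml
<configuration>
  <core>
    <ha-policy>
      <replication>
        <master>
          <!-- Assumed placement and illustrative value: the longest time the
               final synchronization exchange may block journal operations. -->
          <initial-replication-sync-timeout>30000</initial-replication-sync-timeout>
        </master>
      </replication>
    </ha-policy>
  </core>
</configuration>
```

A larger value gives a slow network more time to complete the final acknowledgment, at the cost of a longer possible pause in journal-related operations.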
Figure 17.1. Replicated Store for High Availability
The advantage of using replication is that it doesn’t require a shared file system that is accessible by both the master and slave broker. The disadvantage is that performance can be reduced while data is replicated between a master broker and a slave broker.
The replicating master and slave pair must be part of a cluster. The cluster connection also defines how a slave broker identifies a master broker to which it can connect. See Clusters for details on how to configure a cluster connection.
Using replication across data centers is not currently recommended. In addition, your cluster should contain three or more master-slave pairs if you are using replication. This is to avoid the scenario sometimes referred to as "split-brain."
Within a cluster using data replication, there are two ways that a slave broker locates a master broker:
- Connect to a group. A slave broker can be configured to connect only to a master broker that shares the same broker group-name.
- Connect to any live. The behavior if group-name is not configured. Slave brokers are free to connect to any master broker.
The slave searches for any master broker that it is configured to connect to. It then tries to replicate with each master broker in turn until it finds a master broker that has no current slave configured. If no master broker is available, it waits until the cluster topology changes and repeats the process.
The slave broker does not know whether any data it might have is up to date, so it cannot decide to activate automatically. To activate a replicating slave broker using the data it has, the administrator must change its configuration to make it a master broker.
When the master broker stops or crashes, its replicating slave becomes active and takes over its duties. Specifically, the slave becomes active when it loses connection to its master broker. This can be problematic because a connection loss might be due to a temporary network problem. To address this issue, the slave tries to determine whether it can connect to the other brokers in the cluster. If it can connect to more than half the brokers, it becomes active. If it can connect to fewer than half, the slave does not become active but tries to reconnect with the master. This avoids a split-brain situation.
17.1.1. Configuring Replication
You configure brokers for replication by editing the broker.xml configuration file. The default configuration values cover most use cases, making it easy to start using replication. You can also supply your own configuration for these values when needed. The appendix includes a table of the configuration elements you can add to broker.xml when using replication HA.
Prerequisites
Master and slave brokers must form a cluster and use a cluster-connection to communicate. See Clustering for more information on cluster connections.
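To make the prerequisite concrete, the fragment below is a rough sketch of a cluster connection that uses a static list of connectors. The connector names, cluster name, and host names are hypothetical placeholders that do not come from this chapter; the Clustering chapter remains the reference for the full set of options.

```xml
<configuration>
  <core>
    <connectors>
      <!-- Hypothetical connector describing how other brokers reach this broker. -->
      <connector name="this-broker">tcp://broker1.example.com:61616</connector>
      <!-- Hypothetical connector pointing at another broker in the cluster. -->
      <connector name="other-broker">tcp://broker2.example.com:61616</connector>
    </connectors>
    <cluster-connections>
      <cluster-connection name="my-cluster">
        <connector-ref>this-broker</connector-ref>
        <static-connectors>
          <connector-ref>other-broker</connector-ref>
        </static-connectors>
      </cluster-connection>
    </cluster-connections>
  </core>
</configuration>
```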
Procedure
Configure a cluster of brokers to use the replication HA policy by modifying the main configuration file, BROKER_INSTANCE_DIR/etc/broker.xml.
Configure the master broker to use replication for its HA policy.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <master/>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>
Configure the slave brokers in the same way, but use the slave element instead of master to denote their role in the cluster.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <slave/>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>
Related Information
- See the appendix for a table of the configuration elements available when configuring master and slave brokers for replication.
- For working examples demonstrating replication HA, see the example Maven projects located under INSTALL_DIR/examples/features/ha.
17.1.2. Failing Back to the Master Broker
After a master broker has failed and a slave has taken over its duties, you might want to restart the master broker and have clients fail back to it.
In replication HA mode, you can configure a master broker so that at startup it searches the cluster for another broker using the same cluster node ID. If it finds one, the master attempts to synchronize its data with it. Once the data is synchronized, the master requests that the other broker shut down. The master broker then resumes its active role within the cluster.
Prerequisites
Configuring a master broker to fail back as described above requires the replication HA policy.
Procedure
To configure brokers to fail back to the original master, edit the BROKER_INSTANCE_DIR/etc/broker.xml configuration file for the master and slave brokers as follows.
Add the check-for-live-server element and set its value to true to tell this broker to check if a slave has assumed the role of master.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <master>
          <check-for-live-server>true</check-for-live-server>
          ...
        </master>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>
Add the allow-failback element to the slave broker(s) and set its value to true so that the slave fails back to the original master.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <slave>
          <allow-failback>true</allow-failback>
          ...
        </slave>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>
Be aware that if you restart a master broker after failover has occurred, the value for check-for-live-server must be set to true. Otherwise, the master broker restarts and processes the same messages that the slave has already handled, causing duplicates.
17.1.3. Grouping Master and Slave Brokers
You can specify a group of master brokers that a slave broker can connect to. This is done by adding a group-name configuration element to BROKER_INSTANCE_DIR/etc/broker.xml. A slave broker connects only to a master broker that shares the same group-name.
As an example of using group-name, suppose you have five master brokers and six slave brokers. You could divide the brokers into two groups.
- Master brokers master1, master2, and master3 use a group-name of fish, while master4 and master5 use bird.
- Slave brokers slave1, slave2, slave3, and slave4 use fish for their group-name, and slave5 and slave6 use bird.
After joining the cluster, each slave with a group-name of fish searches for a master broker also assigned to fish. Because the fish group has four slaves but only three masters, one spare slave remains unpaired. Meanwhile, each slave assigned to bird pairs with one of the master brokers in its group, master4 or master5.
Prerequisites
Grouping brokers into HA groups requires that you configure the brokers to use the replication HA policy.
Configuring a Broker Cluster to Use Groups
Configure a cluster of brokers to form groups of master and slave brokers by modifying the main configuration file, BROKER_INSTANCE_DIR/etc/broker.xml.
Procedure
Configure the master broker to use the chosen group-name by adding it beneath the master configuration element. In the example below, the master broker is assigned the group name fish.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <master>
          <group-name>fish</group-name>
          ...
        </master>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>
Configure the slave broker(s) in the same way, by adding the group-name element under slave.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <slave>
          <group-name>fish</group-name>
          ...
        </slave>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>
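For completeness, the second group from the fish and bird example earlier in this section could be sketched the same way. The two fragments below are illustrations only: the first is assumed to live in the broker.xml of a master such as master4, the second in the broker.xml of a slave such as slave5.

```xml
<!-- Sketch: broker.xml on a master in the bird group (for example, master4). -->
<configuration>
  <core>
    <ha-policy>
      <replication>
        <master>
          <group-name>bird</group-name>
        </master>
      </replication>
    </ha-policy>
  </core>
</configuration>
```

```xml
<!-- Sketch: broker.xml on a slave in the bird group (for example, slave5). -->
<configuration>
  <core>
    <ha-policy>
      <replication>
        <slave>
          <group-name>bird</group-name>
        </slave>
      </replication>
    </ha-policy>
  </core>
</configuration>
```

Because a slave connects only to a master with a matching group-name, a slave configured like the second fragment would never pair with a master in the fish group.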
17.3. Colocating Slave Brokers
It is also possible to colocate slave brokers in the same JVM as a master broker. A master broker can be configured to request another master to start a slave broker that resides in its Java Virtual Machine. You can colocate slave brokers using either shared-store or replication as your HA policy. The new slave broker inherits its configuration from the master broker creating it. The name of the slave is set to colocated_backup_n, where n is the number of backups the master broker has created.
The slave inherits configuration for its connectors and acceptors from the master broker creating it. However, AMQ Broker applies a default port offset of 100 for each. For example, if the master contains configuration for a connection that uses port 61616, the first slave created uses port 61716, the second uses 61816, and so on.
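The port arithmetic described above can be illustrated with a small fragment. The acceptor name and bind address are hypothetical; only the port numbers are taken from the example in the text, and the comment restates them for the default offset of 100.

```xml
<configuration>
  <core>
    <acceptors>
      <!-- Hypothetical acceptor defined on the master broker. -->
      <acceptor name="artemis">tcp://0.0.0.0:61616</acceptor>
    </acceptors>
    <!-- Ports the inherited acceptor would use with the default offset of 100:
           first colocated slave:  61616 + 100 = 61716
           second colocated slave: 61616 + 200 = 61816 -->
  </core>
</configuration>
```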
For In-VM connectors and acceptors, the ID has colocated_backup_n appended, where n is the slave broker number.
Directories for the journal, large messages, and paging are set according to the HA strategy you choose. If you choose shared-store, the requesting broker notifies the target broker which directories to use. If replication is chosen, directories are inherited from the creating broker and have the new backup’s name appended to them.
Figure 17.3. Co-located Master and Slave Brokers
17.3.1. Configuring Colocated Slaves
You can also configure whether a master broker accepts requests for backups and how many backups it can start. This way you can evenly distribute backups around the cluster. This is configured under the ha-policy element in the BROKER_INSTANCE_DIR/etc/broker.xml file.
Prerequisites
You must configure a master broker to use either replication or shared-store as its HA policy.
Procedure
After choosing an HA policy, add configuration for the colocation of master and slave brokers.
The example below uses each of the configuration options available and gives a description for each after the example. Some elements have a default value and therefore do not need to be explicitly added to the configuration unless you want to use your own value. Note that this example uses replication, but you can use shared-store for your ha-policy as well.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <colocated> <!-- 1 -->
          <request-backup>true</request-backup> <!-- 2 -->
          <max-backups>1</max-backups> <!-- 3 -->
          <backup-request-retries>-1</backup-request-retries> <!-- 4 -->
          <backup-request-retry-interval>5000</backup-request-retry-interval> <!-- 5 -->
          <backup-port-offset>150</backup-port-offset> <!-- 6 -->
          <master>
            ... <!-- 7 -->
          </master>
          <slave>
            ... <!-- 8 -->
          </slave>
        </colocated>
      </replication>
    </ha-policy>
  </core>
</configuration>
- 1: You add the colocated element directly underneath the choice of ha-policy. In the example above, colocated appears directly under replication. The rest of the configuration falls under this element.
- 2: Use request-backup to determine whether this broker requests a slave on another broker in the cluster. The default is false.
- 3: Use max-backups to determine how many backups a master broker can create. Set to 0 to stop this master broker from accepting backup requests from other master brokers. The default is 1.
- 4: Setting backup-request-retries defines how many times the master broker tries to request a slave. The default is -1, which means unlimited tries.
- 5: The broker waits this long in milliseconds before retrying a request for a slave broker. The default value for backup-request-retry-interval is 5000, or 5 seconds.
- 6: Use backup-port-offset to set the port offset for the connectors and acceptors of a new slave broker. The default is 100.
- 7: The master broker is configured according to the ha-policy you chose, replication or shared-store.
- 8: Like the master, the slave broker adheres to the configuration of the chosen ha-policy.
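As a contrast to the full example, the fragment below is a minimal sketch that relies on the defaults described above and uses shared-store instead of replication. It assumes that colocated accepts the same child elements under shared-store as under replication, which the text states is possible but does not show; treat the exact nesting as an assumption to verify against the appendix.

```xml
<configuration>
  <core>
    <ha-policy>
      <shared-store>
        <colocated>
          <!-- Only the non-default setting is given: ask another broker to host a slave. -->
          <request-backup>true</request-backup>
          <!-- max-backups, the retry settings, and backup-port-offset keep their
               documented defaults (1, -1, 5000, and 100). -->
          <master/>
          <slave/>
        </colocated>
      </shared-store>
    </ha-policy>
  </core>
</configuration>
```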
Related Information
- For working examples that demonstrate colocation, see the colocation example Maven projects located under INSTALL_DIR/examples/features/ha.
17.3.2. Excluding Connectors
Some of the connectors you configure might be for external brokers and should be excluded from the offset. For instance, you might have a connector used by the cluster connection to do quorum voting for a replicated slave broker. Use the excludes element to identify connectors that you do not want offset.
Prerequisites
You must configure a broker for colocation before modifying the configuration to exclude connectors.
Procedure
Modify BROKER_INSTANCE_DIR/etc/broker.xml by adding the excludes configuration element, as in the example below.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <colocated>
          <excludes>
          </excludes>
          ...
        </colocated>
      </replication>
    </ha-policy>
  </core>
</configuration>
Add a connector-ref element for each connector you want to exclude. In the example below, the connector with the name remote-connector is excluded from the connectors inherited by the slave.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <colocated>
          <excludes>
            <connector-ref>remote-connector</connector-ref>
          </excludes>
          ...
        </colocated>
      </replication>
    </ha-policy>
  </core>
</configuration>
17.4. Using a live-only Policy for Scaling Down Brokers
You can configure brokers to scale down as an alternative to using a replication or shared-store HA policy. When configured for scale down, a master broker copies its messages and transaction state to another master broker before shutting down. The advantage of scale down is that you do not need full backups to provide some form of HA. However, scaling down handles only cases where a broker stops gracefully. It is not designed to handle an unexpected failure.
Another disadvantage is that it is possible to lose message ordering when scaling down. This happens because the messages in the broker that is scaling down are appended to the end of the queues of the other broker. For example, suppose two master brokers have ten messages distributed evenly between them, with Broker 1 holding messages 1, 3, 5, 7, 9 and Broker 2 holding messages 2, 4, 6, 8, 10. If one of the brokers scales down, the messages sent to the other broker are added to the queue after the ones already there. Consequently, after Broker 2 scales down, the order of the messages in Broker 1 would be 1, 3, 5, 7, 9, 2, 4, 6, 8, 10.
When a broker is preparing to scale down, it sends a message to its clients before they are disconnected informing them which new broker is ready to process their messages. However, clients should reconnect to the new broker only after their initial broker has finished scaling down. This ensures that any state, such as queues or transactions, is available on the other broker when the client reconnects. The normal reconnect settings apply when the client is reconnecting, so set them high enough to allow for the time needed to scale down.
Figure 17.4. Scaling Down Master Brokers
17.4.1. Using a Specific Connector when Scaling Down
You can configure a broker to use a specific connector to scale down. If a connector is not specified, the broker uses the first In-VM connector appearing in the configuration.
Prerequisites
Using a static list of brokers during scale down requires that you configure a connector to the broker that receives the state of the broker scaling down. See About Connectors for more information.
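A connector of the kind this prerequisite describes might look like the sketch below. The connector name matches the server1-connector reference used in the procedure that follows, but the host name and port are hypothetical placeholders for the broker that receives the state.

```xml
<configuration>
  <core>
    <connectors>
      <!-- Hypothetical address of the broker that receives messages and
           transaction state during scale down. -->
      <connector name="server1-connector">tcp://server1.example.com:61616</connector>
    </connectors>
  </core>
</configuration>
```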
Procedure
Configure scale down to a specific broker by adding a connector-ref element under the configuration for scale-down in BROKER_INSTANCE_DIR/etc/broker.xml, as in the example below.
<configuration>
  <core>
    ...
    <ha-policy>
      <live-only>
        <scale-down>
          <connectors>
            <connector-ref>server1-connector</connector-ref>
          </connectors>
        </scale-down>
      </live-only>
    </ha-policy>
    ...
  </core>
</configuration>
Related Information
- For a working example of scaling down using a static connector, see the scale-down example Maven project located under INSTALL_DIR/examples/features/ha.
17.4.2. Using Dynamic Discovery
You can use dynamic discovery when configuring the cluster for scale down. Instead of scaling down to a specific broker by using a connector, brokers use a discovery group to find another broker dynamically.
Prerequisites
Using dynamic discovery during scale down requires that you configure a discovery-group. See About Discovery Groups for more information.
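As a rough sketch of this prerequisite, a UDP-based discovery group might be declared as shown below. The group name matches the my-discovery-group reference used in the procedure that follows; the multicast address, port, and refresh timeout are illustrative assumptions, not values taken from this chapter.

```xml
<configuration>
  <core>
    <discovery-groups>
      <discovery-group name="my-discovery-group">
        <!-- Illustrative multicast address and port; use the values your
             brokers actually broadcast on. -->
        <group-address>231.7.7.7</group-address>
        <group-port>9876</group-port>
        <!-- Illustrative value, assumed to be milliseconds. -->
        <refresh-timeout>10000</refresh-timeout>
      </discovery-group>
    </discovery-groups>
  </core>
</configuration>
```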
Procedure
Configure scale down to use a discovery group by adding a discovery-group-ref element under the configuration for scale-down in BROKER_INSTANCE_DIR/etc/broker.xml, as in the example below. Note that discovery-group-ref uses the attribute discovery-group-name to hold the name of the discovery group to use.
<configuration>
  <core>
    ...
    <ha-policy>
      <live-only>
        <scale-down>
          <discovery-group-ref discovery-group-name="my-discovery-group"/>
        </scale-down>
      </live-only>
    </ha-policy>
    ...
  </core>
</configuration>
17.4.3. Using Broker Groups
It is also possible to configure brokers to scale down only to brokers that are configured with the same group.
Procedure
Configure scale down for a group of brokers by adding a group-name element, and a value for the desired group name, in BROKER_INSTANCE_DIR/etc/broker.xml. In the example below, the broker scales down only to brokers that belong to the group my-group-name.
<configuration>
  <core>
    ...
    <ha-policy>
      <live-only>
        <scale-down>
          <group-name>my-group-name</group-name>
        </scale-down>
      </live-only>
    </ha-policy>
    ...
  </core>
</configuration>
17.4.4. Using Slave Brokers
You can mix scale down with HA and use master and slave brokers. In such a configuration, a slave immediately scales down to another master broker instead of becoming active itself.
Procedure
Edit the master’s broker.xml to colocate a slave broker that is configured for scale down. A configuration that uses replication as its HA policy looks like the example below.
<configuration>
  <core>
    ...
    <ha-policy>
      <replication>
        <colocated>
          <backup-request-retries>44</backup-request-retries>
          <backup-request-retry-interval>33</backup-request-retry-interval>
          <max-backups>3</max-backups>
          <request-backup>false</request-backup>
          <backup-port-offset>33</backup-port-offset>
          <master>
            <group-name>purple</group-name>
            <check-for-live-server>true</check-for-live-server>
            <cluster-name>abcdefg</cluster-name>
          </master>
          <slave>
            <group-name>tiddles</group-name>
            <max-saved-replicated-journals-size>22</max-saved-replicated-journals-size>
            <cluster-name>33rrrrr</cluster-name>
            <restart-backup>false</restart-backup>
            <scale-down>
              <!--a grouping of servers that can be scaled down to-->
              <group-name>boo!</group-name>
              <!--either a discovery group-->
              <discovery-group-ref discovery-group-name="wahey"/>
            </scale-down>
          </slave>
        </colocated>
      </replication>
    </ha-policy>
    ...
  </core>
</configuration>