
Chapter 15. Implementing high availability


You can implement a high availability solution for AMQ Broker by using either of the following methods:

  • Group brokers that are in a cluster into primary-backup groups.

    In a primary-backup group, a primary broker is linked to a backup broker. The primary broker serves client requests, while the backup brokers wait in passive mode. If the primary broker fails, a backup broker replaces the primary broker as the active broker. AMQ Broker provides several different strategies for failover (called HA policies) within a primary-backup group.

  • Create a leader-follower configuration for brokers that are not in a cluster.

    In a leader-follower configuration, non-clustered brokers use a shared message store, which can be a JDBC database or a journal file. Both brokers compete as peers to acquire a lock to the message store. The broker that acquires a lock becomes the leader broker, which serves client requests. The broker that is unable to acquire a lock becomes a follower. A follower is a passive broker that continuously tries to obtain a lock so it can become the active broker if the current leader is unavailable.

In a primary-backup group, you configure one broker as a primary and another as a backup. In a leader-follower configuration, you specify the same configuration for both brokers.
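
For example, the following condensed broker.xml fragments illustrate this difference for the shared store policy. They are only an orientation here; the complete examples appear later in this chapter. In a primary-backup group, the primary broker is configured with:

<ha-policy>
    <shared-store>
        <primary>
            <failover-on-shutdown>true</failover-on-shutdown>
        </primary>
    </shared-store>
</ha-policy>

The backup broker is configured with:

<ha-policy>
    <shared-store>
        <backup>
            <failover-on-shutdown>true</failover-on-shutdown>
        </backup>
    </shared-store>
</ha-policy>

In a leader-follower configuration, both brokers use the identical policy:

<ha-policy>
    <shared-store>
        <primary/>
    </shared-store>
</ha-policy>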

15.1. Configuring primary-backup groups for high availability

You can employ several different strategies, called HA policies, to configure failover within a primary-backup group.

15.1.1. High availability policies

A high availability (HA) policy defines how failover happens in a primary-backup group. AMQ Broker provides several different HA policies:

Shared store (recommended)

The primary and backup brokers store their messaging data in a common directory on a shared file system; typically a Storage Area Network (SAN) or Network File System (NFS) server. You can also store broker data in a specified database if you have configured JDBC-based persistence. With shared store, if a primary broker fails, the backup broker loads the message data from the shared store and takes over as the active broker.

In most cases, you should use shared store instead of replication. Because shared store does not replicate data over the network, it typically provides better performance than replication. Shared store also avoids network isolation (also called "split brain") issues in which a primary broker and its backup become active at the same time.

Figure 15.1. Shared store high availability

Note

You can implement an alternative high availability solution that uses a shared store by configuring your brokers in a leader-follower configuration. In a leader-follower configuration, the brokers are not clustered and compete as peers to determine which broker becomes active and serves client requests and which remains passive until the active broker is no longer available. For more information, see Section 15.2, “Configuring leader-follower brokers for high availability”.

Replication

The primary and backup brokers continuously synchronize their messaging data over the network. If the primary broker fails, the backup broker loads the synchronized data and takes over as the active broker.

Data synchronization between the primary and backup brokers ensures that no messaging data is lost if the primary broker fails. When the primary and backup brokers initially join together, the primary broker replicates all of its existing data to the backup broker over the network. Once this initial phase is complete, the primary broker replicates persistent data to the backup broker as the primary broker receives it. This means that if the primary broker drops off the network, the backup broker has all of the persistent data that the primary broker has received up to that point.

Because replication synchronizes data over the network, network failures can result in network isolation in which a primary broker and its backup become active at the same time.

Figure 15.2. Replication high availability

Primary-only (limited HA)

When a primary broker is stopped gracefully, it copies its messages and transaction state to another active broker and then shuts down. Clients can then reconnect to the other broker to continue sending and receiving messages.

Figure 15.3. Primary-only high availability


15.1.2. Replication policy limitations

When you use replication to provide high availability, a risk exists that both primary and backup brokers can become active at the same time, which is referred to as "split brain".

Split brain can happen if a primary broker and its backup lose their connection. In this situation, both a primary broker and its backup can become active at the same time. Because there is no message replication between the brokers in this situation, they each serve clients and process messages without the other knowing it. In this case, each broker has a completely different journal. Recovering from this situation can be very difficult and in some cases, not possible.

  • To eliminate any possibility of split brain, use the shared store HA policy.
  • If you do use the replication HA policy, take the following steps to reduce the risk of split brain occurring.

    If you want the brokers to use the ZooKeeper Coordination Service to coordinate brokers, deploy ZooKeeper on at least three nodes. If the brokers lose connection to one ZooKeeper node, using at least three nodes ensures that a majority of nodes are available to coordinate the brokers when a primary-backup broker pair experiences a replication interruption.

    If you want to use the embedded broker coordination, which uses the other available brokers in the cluster to provide a quorum vote, you can reduce (but not eliminate) the chance of encountering split brain by using at least three primary-backup pairs. Using at least three primary-backup pairs ensures that a majority result can be achieved in any quorum vote that takes place when a primary-backup broker pair experiences a replication interruption.

Some additional considerations when you use the replication HA policy are described below:

  • When a primary broker fails and the backup broker becomes active, no further replication takes place until a new backup broker is attached to the active broker, or failback to the original primary broker occurs.
  • If the backup broker in a primary-backup group fails, the primary broker continues to serve messages. However, messages are not replicated until another broker is added as a backup, or the original backup broker is restarted. During that time, messages are persisted only to the primary broker.
  • If the brokers use the embedded broker coordination and both brokers in a primary-backup pair are shut down, to avoid message loss, you must restart the most recently active broker first. If the most recently active broker was the backup broker, you need to manually reconfigure this broker as a primary broker to enable it to be restarted first, as shown in the example below.
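
For example, if the broker that was most recently active is configured as a replication backup, you would change its ha-policy so that the backup element is replaced by the primary element before you restart it. The following is a minimal sketch only; review your other policy settings, such as group-name, when you make this change.

<ha-policy>
    <replication>
        <primary>
            <group-name>my-group-1</group-name>
        </primary>
    </replication>
</ha-policy>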

15.1.3. Configuring shared store high availability

You can use the shared store high availability (HA) policy to implement HA in a broker cluster. With shared store, both primary and backup brokers access a common directory on a shared file system; typically a Storage Area Network (SAN) or Network File System (NFS) server. You can also store broker data in a specified database if you have configured JDBC-based persistence. With shared store, if a primary broker fails, the backup broker loads the message data from the shared store and takes over as the active broker.

In general, a SAN offers better performance (for example, higher speed) than an NFS server and is the recommended option, if available. If you need to use an NFS server, see Red Hat AMQ 7 Supported Configurations for more information about network file systems that AMQ Broker supports.

In most cases, you should use shared store HA instead of replication. Because shared store does not replicate data over the network, it typically provides better performance than replication. Shared store also avoids network isolation (also called "split brain") issues in which a primary broker and its backup become active at the same time.

Note

When using shared store, the startup time for the backup broker depends on the size of the message journal. When the backup broker takes over for a failed primary broker, it loads the journal from the shared store. This process can be time consuming if the journal contains a lot of data.

15.1.3.1. Configuring an NFS shared store

When using shared store high availability, you must configure both the primary and backup brokers to use a common directory on a shared file system. Typically, you use a Storage Area Network (SAN) or Network File System (NFS) server.

Listed below are some recommended configuration options when mounting an exported directory from an NFS server on each of your broker machine instances.

sync
Specifies that all changes are immediately flushed to disk.
intr
Allows NFS requests to be interrupted if the server is shut down or cannot be reached.
noac
Disables attribute caching. This behavior is needed to achieve attribute cache coherence among multiple clients.
soft
Specifies that if the NFS server is unavailable, the error should be reported rather than waiting for the server to come back online.
lookupcache=none
Disables lookup caching.
timeo=n
The time, in deciseconds (tenths of a second), that the NFS client (that is, the broker) waits for a response from the NFS server before it retries a request. For NFS over TCP, the default timeo value is 600 (60 seconds). For NFS over UDP, the client uses an adaptive algorithm to estimate an appropriate timeout value for frequently used request types, such as read and write requests.
retrans=n
The number of times that the NFS client retries a request before it attempts further recovery action. If the retrans option is not specified, the NFS client tries each request three times.
Important

It is important to use reasonable values when you configure the timeo and retrans options. A default timeo wait time of 600 deciseconds (60 seconds) combined with a retrans value of 5 retries can result in a five-minute wait for AMQ Broker to detect an NFS disconnection.
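
For example, an /etc/fstab entry on a broker machine that combines the options listed above might look like the following. The server name, export path, mount point, and the timeo and retrans values are placeholders only; choose values that are appropriate for your environment.

nfs-server.example.com:/exports/amq-shared-store  /var/lib/amq/sharedstore  nfs  sync,intr,noac,soft,lookupcache=none,timeo=50,retrans=2  0 0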


15.1.3.2. Configuring shared store high availability

This procedure shows how to configure shared store high availability for a broker cluster.

Prerequisites

  • A shared storage system must be accessible to the primary and backup brokers.

Procedure

  1. Group the brokers in your cluster into primary-backup groups.

    In most cases, a primary-backup group should consist of two brokers: a primary broker and a backup broker. If you have six brokers in your cluster, you would need three primary-backup groups.

  2. Create the first primary-backup group consisting of one primary broker and one backup broker.

    1. Open the primary broker’s <broker_instance_dir>/etc/broker.xml configuration file.
    2. If you are using:

      1. A network file system to provide the shared store, verify that the primary broker’s paging, bindings, journal, and large messages directories point to a shared location that the backup broker can also access.

        <configuration>
            <core>
                ...
                <paging-directory>../sharedstore/data/paging</paging-directory>
                <bindings-directory>../sharedstore/data/bindings</bindings-directory>
                <journal-directory>../sharedstore/data/journal</journal-directory>
                <large-messages-directory>../sharedstore/data/large-messages</large-messages-directory>
                ...
            </core>
        </configuration>
      2. A database to provide the shared store, ensure that both the primary and backup broker can connect to the same database and have the same configuration specified in the database-store element of the broker.xml configuration file. An example configuration is shown below.

        <configuration>
          <core>
            <store>
               <database-store>
                  <jdbc-connection-url>jdbc:oracle:data/oracle/database-store;create=true</jdbc-connection-url>
                  <jdbc-user>ENC(5493dd76567ee5ec269d11823973462f)</jdbc-user>
                  <jdbc-password>ENC(56a0db3b71043054269d11823973462f)</jdbc-password>
                  <bindings-table-name>BIND_TABLE</bindings-table-name>
                  <message-table-name>MSG_TABLE</message-table-name>
                  <large-message-table-name>LGE_TABLE</large-message-table-name>
                  <page-store-table-name>PAGE_TABLE</page-store-table-name>
                  <node-manager-store-table-name>NODE_TABLE</node-manager-store-table-name>
                  <jdbc-driver-class-name>oracle.jdbc.driver.OracleDriver</jdbc-driver-class-name>
                  <jdbc-network-timeout>10000</jdbc-network-timeout>
                  <jdbc-lock-renew-period>2000</jdbc-lock-renew-period>
                  <jdbc-lock-expiration>15000</jdbc-lock-expiration>
                  <jdbc-journal-sync-period>5</jdbc-journal-sync-period>
               </database-store>
            </store>
          </core>
        </configuration>
    3. Configure the primary broker to use shared store for its HA policy.

      <configuration>
          <core>
              ...
              <ha-policy>
                  <shared-store>
                      <primary>
                          <failover-on-shutdown>true</failover-on-shutdown>
                      </primary>
                  </shared-store>
              </ha-policy>
              ...
          </core>
      </configuration>
      failover-on-shutdown
      If this broker is stopped normally, this property controls whether the backup broker should become active and take over.
    4. Open the backup broker’s <broker_instance_dir>/etc/broker.xml configuration file.
    5. If you are using:

      1. A network file system to provide the shared store, verify that the backup broker’s paging, bindings, journal, and large messages directories point to the same shared location as the primary broker.

        <configuration>
            <core>
                ...
                <paging-directory>../sharedstore/data/paging</paging-directory>
                <bindings-directory>../sharedstore/data/bindings</bindings-directory>
                <journal-directory>../sharedstore/data/journal</journal-directory>
                <large-messages-directory>../sharedstore/data/large-messages</large-messages-directory>
                ...
            </core>
        </configuration>
      2. A database to provide the shared store, ensure that both the primary and backup brokers can connect to the same database and have the same configuration specified in the database-store element of the broker.xml configuration file.
    6. Configure the backup broker to use shared store for its HA policy.

      <configuration>
          <core>
              ...
              <ha-policy>
                  <shared-store>
                      <backup>
                          <failover-on-shutdown>true</failover-on-shutdown>
                          <allow-failback>true</allow-failback>
                          <restart-backup>true</restart-backup>
                      </backup>
                  </shared-store>
              </ha-policy>
              ...
          </core>
      </configuration>
      failover-on-shutdown
      If this broker has become active and then is stopped normally, this property controls whether the backup broker (the original primary broker) should become active and take over.
      allow-failback

      If failover has occurred and the backup broker has taken over for the primary broker, this property controls whether the backup broker should fail back to the original primary broker when it restarts and reconnects to the cluster.

      Note

      Failback is intended for a primary-backup pair (one primary broker paired with a single backup broker). If the primary broker is configured with multiple backups, then failback will not occur. Instead, if a failover event occurs, the backup broker will become active, and the next backup will become its backup. When the original primary broker comes back online, it will not be able to initiate failback, because the broker that is now active already has a backup.

      restart-backup
      This property controls whether the backup broker automatically restarts after it fails back to the primary broker. The default value of this property is true.
  3. Repeat Step 2 for each remaining primary-backup group in the cluster.

15.1.4. Configuring replication high availability

You can use the replication high availability (HA) policy to implement HA in a broker cluster. With replication, persistent data is synchronized between the primary and backup brokers. If a primary broker encounters a failure, message data is synchronized to the backup broker and it takes over for the failed primary broker.

You should use replication as an alternative to shared store, if you do not have a shared file system. However, replication can result in a scenario in which a primary broker and its backup become active at the same time.

Note

Because the primary and backup brokers must synchronize their messaging data over the network, replication adds a performance overhead. This synchronization process blocks journal operations, but it does not block clients. You can configure the maximum amount of time that journal operations can be blocked for data synchronization.
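
For example, the following sketch shows how you might set this limit on the primary broker, assuming the initial-replication-sync-timeout element that is among the additional replication HA configuration elements; the value shown is illustrative only:

<ha-policy>
    <replication>
        <primary>
            <initial-replication-sync-timeout>30000</initial-replication-sync-timeout>
        </primary>
    </replication>
</ha-policy>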

If the replication connection between the primary-backup broker pair is interrupted, the brokers require a way to coordinate to determine if the primary broker is still active or if it is unavailable and a failover to the backup broker is required. To provide this coordination, you can configure the brokers to use either of the following coordination methods.

  • The Apache ZooKeeper coordination service.
  • The embedded broker coordination, which uses other brokers in the cluster to provide a quorum vote.

15.1.4.1. Choosing a coordination method

Red Hat recommends that you use the Apache ZooKeeper coordination service to coordinate broker activation. When choosing a coordination method, it is useful to understand the differences in infrastructure requirements and the management of data consistency between both coordination methods.

Infrastructure requirements

  • If you use the ZooKeeper coordination service, you can operate with a single primary-backup broker pair. However, you must connect the brokers to at least 3 Apache ZooKeeper nodes to ensure that brokers can continue to function if they lose connection to one node. To provide a coordination service to brokers, you can share existing ZooKeeper nodes that are used by other applications. For more information on setting up Apache ZooKeeper, see the Apache ZooKeeper documentation.
  • If you want to use the embedded broker coordination, which uses the other available brokers in the cluster to provide a quorum vote, you must have at least three primary-backup broker pairs. Using at least three primary-backup pairs ensures that a majority result can be achieved in any quorum vote that occurs when a primary-backup broker pair experiences a replication interruption.

Data consistency

  • If you use the Apache ZooKeeper coordination service, ZooKeeper tracks the version of the data on each broker so only the broker that has the most up-to-date journal data can become active, irrespective of whether the broker is configured as a primary or backup broker for replication purposes. Version tracking eliminates the possibility that a broker can activate with an out-of-date journal and start serving clients.
  • If you use the embedded broker coordination, no mechanism exists to track the version of the data on each broker to ensure that only the broker that has the most up-to-date journal can become active. Therefore, it is possible for a broker that has an out-of-date journal to become active and start serving clients, which causes a divergence in the journal.

15.1.4.2. How brokers coordinate after a replication interruption

This section explains how both coordination methods work after a replication connection is interrupted.

Using the ZooKeeper coordination service

If you use the ZooKeeper coordination service to manage replication interruptions, both brokers must be connected to multiple Apache ZooKeeper nodes.

  • If, at any time, the active broker loses connection to a majority of the ZooKeeper nodes, it shuts down to avoid the risk of "split brain" occurring.
  • If, at any time, the backup broker loses connection to a majority of the ZooKeeper nodes, it stops receiving replication data and waits until it can connect to a majority of the ZooKeeper nodes before it acts as a backup broker again. When the connection is restored to a majority of the ZooKeeper nodes, the backup broker uses ZooKeeper to determine if it needs to discard its data and search for an active broker from which to replicate, or if it can become the active broker with its current data.

ZooKeeper uses the following control mechanisms to manage the failover process:

  • A shared lease lock that can be owned only by a single active broker at any time.
  • An activation sequence counter that tracks the latest version of the broker data. Each broker tracks the version of its journal data in a local counter stored in its server lock file, along with its NodeID. The active broker also shares its version in a coordinated activation sequence counter on ZooKeeper.

If the replication connection between the active broker and the backup broker is lost, the active broker increases both its local activation sequence counter value and the coordinated activation sequence counter value on ZooKeeper by 1 to advertise that it has the most up-to-date data. The backup broker’s data is now considered stale and the broker cannot become the active broker until the replication connection is restored and the up-to-date data is synchronized.

After the replication connection is lost, the backup broker checks if the ZooKeeper lock is owned by the active broker and if the coordinated activation sequence counter on ZooKeeper matches its local counter value.

  • If the lock is owned by the active broker, the backup broker detects that the activation sequence counter on ZooKeeper was updated by the active broker when the replication connection was lost. This indicates that the active broker is running so the backup broker does not try to failover.
  • If the lock is not owned by the active broker, the active broker is not alive. If the value of the activation sequence counter on the backup broker is the same as the coordinated activation sequence counter value on ZooKeeper, which indicates that the backup broker has up-to-date data, the backup broker fails over.
  • If the lock is not owned by the active broker but the value of the activation sequence counter on the backup broker is less than the counter value on ZooKeeper, the data on the backup broker is not up-to-date and the backup broker cannot fail over.

Using the embedded broker coordination

If a primary-backup broker pair uses the embedded broker coordination to coordinate a replication interruption, the following two types of quorum votes can be initiated.

Table 15.1. Quorum voting

Passive vote

  • Description: If a passive broker loses its replication connection to the active broker, the passive broker decides whether or not to start based on the result of this vote.
  • Initiator: Passive broker
  • Required configuration: None. A passive vote happens automatically when a passive broker loses connection to its replication partner. However, you can control the properties of a passive vote by specifying custom values for the quorum-vote-wait, vote-retries, and vote-retry-wait parameters.
  • Participants: Other active brokers in the cluster
  • Action based on vote result: The passive broker starts if it receives a majority (that is, a quorum) vote from the other active brokers in the cluster, indicating that its replication partner is no longer available.

Active vote

  • Description: If an active broker loses connection to its replication partner, the active broker decides whether to continue running based on this vote.
  • Initiator: Active broker
  • Required configuration: A vote happens when an active broker, which could be a broker configured as a backup that has failed over, loses connection to its replication partner and vote-on-replication-failure is set to true.
  • Participants: Other active brokers in the cluster
  • Action based on vote result: The active broker shuts down if it does not receive a majority vote from the other active brokers in the cluster, indicating that its cluster connection is still active.

Important

Listed below are some important things to note about how the configuration of your broker cluster affects the behavior of quorum voting.

  • For a quorum vote to succeed, the size of your cluster must allow a majority result to be achieved. Therefore, your cluster should have at least three primary-backup broker pairs.
  • The more primary-backup broker pairs that you add to your cluster, the more you increase the overall fault tolerance of the cluster. For example, suppose you have three primary-backup pairs. If you lose a complete primary-backup pair, the two remaining primary-backup pairs cannot achieve a majority result in any subsequent quorum vote. This situation means that any further replication interruption in the cluster might cause a primary broker to shut down, and prevent its backup broker from starting up. By configuring your cluster with, say, five broker pairs, the cluster can experience at least two failures, while still ensuring a majority result from any quorum vote.
  • If you intentionally reduce the number of primary-backup broker pairs in your cluster, the previously established threshold for a majority vote does not automatically decrease. During this time, any quorum vote triggered by a lost replication connection cannot succeed, making your cluster more vulnerable to split brain. To make your cluster recalculate the majority threshold for a quorum vote, first shut down the primary-backup pairs that you are removing from your cluster. Then, restart the remaining primary-backup pairs in the cluster. When all of the remaining brokers have been restarted, the cluster recalculates the quorum vote threshold.

15.1.4.3. Configuring replication high availability using the ZooKeeper coordination service

You must specify the same replication configuration for both brokers in a pair that uses the Apache ZooKeeper coordination service. The brokers then coordinate to determine which broker is the active broker and which is the passive backup broker.

Prerequisites

  • At least 3 Apache ZooKeeper nodes to ensure that brokers can continue to operate if they lose the connection to one node.
  • The broker machines have a similar hardware specification, that is, you do not have a preference for which machine runs the active broker and which runs the passive backup broker at any point in time.
  • ZooKeeper must have sufficient resources to ensure that pause times are significantly less than the ZooKeeper server tick time. Depending on the expected load of the broker, consider carefully if the broker and ZooKeeper node can share the same node. For more information, see https://zookeeper.apache.org/.

Procedure

  1. Open the <broker_instance_dir>/etc/broker.xml configuration file for both brokers in the pair.
  2. Configure the same replication configuration for both brokers in the pair. For example:

    <configuration>
        <core>
            ...
            <ha-policy>
               <replication>
                  <primary>
                    <coordination-id>production-001</coordination-id>
                    <manager>
                       <properties>
                         <property key="connect-string" value="192.168.1.10:6666,192.168.2.10:6667,192.168.3.10:6668"/>
                       </properties>
                    </manager>
                  </primary>
               </replication>
            </ha-policy>
            ...
        </core>
    </configuration>
    primary
    Configure the replication type as primary to indicate that either broker can be the primary broker depending on the result of the broker coordination.
    coordination-id
    Specify a common string value for both brokers. Brokers with the same coordination-id string coordinate activation together. During the coordination process, both brokers use the coordination-id string as the node ID and attempt to obtain a lock in ZooKeeper. The first broker that obtains a lock and has up-to-date data starts as an active broker and the other broker becomes a passive backup.
    properties

    Specify a property element within which you can specify a set of key-value pairs to provide the connection details for the ZooKeeper nodes:

    Table 15.2. ZooKeeper connection details

    connect-string

    Specify a comma-separated list of the IP addresses and port numbers of the ZooKeeper nodes. For example, value="192.168.1.10:6666,192.168.2.10:6667,192.168.3.10:6668".

    session-ms

    The duration that the broker waits before it shuts down after losing connection to a majority of the ZooKeeper nodes. The default value is 18000 ms. A valid value is between 2 times and 20 times the ZooKeeper server tick time.

    Note

    The ZooKeeper pause time for garbage collection must be less than 0.33 of the value of the session-ms property in order to allow the ZooKeeper heartbeat to function reliably. If it is not possible to ensure that pause times are less than this limit, increase the value of the session-ms property for each broker and accept a slower failover.

    Important

    Broker replication partners automatically exchange "ping" packets every 2 seconds to confirm that the partner broker is available. When a backup broker does not receive a response from the active broker, the backup waits for a response until the broker’s connection time-to-live (ttl) expires. The default connection-ttl is 60000 ms, which means that a backup broker attempts to fail over after 60 seconds. It is recommended that you set the connection-ttl value to a similar value to the session-ms property value to allow a faster failover. To set a new connection-ttl, configure the connection-ttl-override property, as shown in the example after this table.

    namespace (optional)

    If the brokers share the ZooKeeper nodes with other applications, you can create a ZooKeeper namespace to store the files that provide a coordination service to brokers. You must specify the same namespace for both brokers.
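
    For example, the session-ms and namespace keys are specified in the same properties element as the connect-string key. The following is a minimal sketch; the values shown are placeholders only:

    <manager>
       <properties>
         <property key="connect-string" value="192.168.1.10:6666,192.168.2.10:6667,192.168.3.10:6668"/>
         <property key="session-ms" value="20000"/>
         <property key="namespace" value="amq-broker-ha"/>
       </properties>
    </manager>

    To allow the faster failover recommended in the Important admonition above, you can also set the connection-ttl-override element directly under the core element in broker.xml. This is a sketch under that assumption; choose a value similar to the session-ms value that you configure:

    <core>
        ...
        <connection-ttl-override>20000</connection-ttl-override>
        ...
    </core>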

  3. Configure any additional HA properties for the brokers.

    These additional HA properties have default values that are suitable for most common use cases. Therefore, you only need to configure these properties if you do not want the default behavior. For more information, see Appendix F, Additional Replication High Availability Configuration Elements.

  4. Repeat steps 1 to 3 to configure each additional broker pair in the cluster.


15.1.4.4. Configuring replication high availability using the embedded broker coordination

Replication using the embedded broker coordination requires at least three primary-backup pairs to lessen (but not eliminate) the risk of "split brain".

The following procedure describes how to configure replication high-availability (HA) for a six-broker cluster. In this topology, the six brokers are grouped into three primary-backup pairs: each of the three primary brokers is paired with a dedicated backup broker.

Prerequisites

  • You must have a broker cluster with at least six brokers.

    The six brokers are configured into three primary-backup pairs. For more information about adding brokers to a cluster, see Chapter 14, Setting up a broker cluster.

Procedure

  1. Group the brokers in your cluster into primary-backup groups.

    In most cases, a primary-backup group should consist of two brokers: a primary broker and a backup broker. If you have six brokers in your cluster, you need three primary-backup groups.

  2. Create the first primary-backup group consisting of one primary broker and one backup broker.

    1. Open the primary broker’s <broker_instance_dir>/etc/broker.xml configuration file.
    2. Configure the primary broker to use replication for its HA policy.

      <configuration>
          <core>
              ...
              <ha-policy>
                  <replication>
                      <primary>
                          <check-for-active-server>true</check-for-active-server>
                          <group-name>my-group-1</group-name>
                          <vote-on-replication-failure>true</vote-on-replication-failure>
                          ...
                      </primary>
                  </replication>
              </ha-policy>
              ...
          </core>
      </configuration>
      check-for-active-server

      If the primary broker fails, this property controls whether clients should fail back to it when it restarts.

      If you set this property to true, when the primary broker restarts after a previous failover, it searches for another broker in the cluster with the same node ID. If the primary broker finds another broker with the same node ID, this indicates that a backup broker successfully started upon failure of the primary broker. In this case, the primary broker synchronizes its data with the backup broker. The primary broker then requests the backup broker to shut down. If the backup broker is configured for failback, as shown below, it shuts down. The primary broker then resumes its active role, and clients reconnect to it.

      Warning

      If you do not set check-for-active-server to true on the primary broker, you might experience duplicate message handling when you restart the primary broker after a previous failover. Specifically, if you restart a primary broker with this property set to false, the primary broker does not synchronize data with its backup broker. In this case, the primary broker might process the same messages that the backup broker has already handled, causing duplicates.

      group-name
      A name for this primary-backup group (optional). To form a primary-backup group, the primary and backup brokers must be configured with the same group name. If you don’t specify a group-name, a backup broker can replicate with any primary broker.
      vote-on-replication-failure

      This property controls whether a primary broker initiates a quorum vote called a primary vote in the event of an interrupted replication connection.

      A primary vote is a way for a primary broker to determine whether it or its partner is the cause of the interrupted replication connection. Based on the result of the vote, the primary broker either stays running or shuts down.

      Important

      For a quorum vote to succeed, the size of your cluster must allow a majority result to be achieved. Therefore, when you use the replication HA policy, your cluster should have at least three primary-backup broker pairs.

      The more broker pairs you configure in your cluster, the more you increase the overall fault tolerance of the cluster. For example, suppose you have three primary-backup broker pairs. If you lose connection to a complete primary-backup pair, the two remaining primary-backup pairs can no longer achieve a majority result in a quorum vote. This situation means that any subsequent replication interruption might cause a primary broker to shut down, and prevent its backup broker from starting up. By configuring your cluster with, say, five broker pairs, the cluster can experience at least two failures, while still ensuring a majority result from any quorum vote.

    3. Configure any additional HA properties for the primary broker.

      These additional HA properties have default values that are suitable for most common use cases. Therefore, you only need to configure these properties if you do not want the default behavior. For more information, see Appendix F, Additional Replication High Availability Configuration Elements.

    4. Open the backup broker’s <broker_instance_dir>/etc/broker.xml configuration file.
    5. Configure the backup broker to use replication for its HA policy.

      <configuration>
          <core>
              ...
              <ha-policy>
                  <replication>
                      <backup>
                          <allow-failback>true</allow-failback>
                          <group-name>my-group-1</group-name>
                          <vote-on-replication-failure>true</vote-on-replication-failure>
                          ...
                      </backup>
                  </replication>
              </ha-policy>
              ...
          </core>
      </configuration>
      allow-failback

      If failover has occurred and the backup broker has taken over for the primary broker, this property controls whether the backup broker should fail back to the original primary broker when it restarts and reconnects to the cluster.

      Note

      Failback is intended for a primary-backup pair (one primary broker paired with a single backup broker). If the primary broker is configured with multiple backups, then failback will not occur. Instead, if a failover event occurs, the backup broker will become active, and the next backup will become its backup. When the primary broker comes back online, it will not be able to initiate failback, because the broker that is now active already has a backup.

      group-name
      A name for this primary-backup group (optional). To form a primary-backup group, the primary and backup brokers must be configured with the same group name. If you don’t specify a group-name, a backup broker can replicate with any primary broker.
      vote-on-replication-failure

      This property controls whether a backup broker that has become active can initiate a quorum vote called a primary vote in the event of an interrupted replication connection.

      A primary vote is a way for a primary broker to determine whether it or its partner is the cause of the interrupted replication connection. Based on the result of the vote, the primary broker either stays running or shuts down.

    6. (Optional) Configure properties of the quorum votes that the backup broker initiates.

      <configuration>
          <core>
              ...
              <ha-policy>
                  <replication>
                      <backup>
                      ...
                          <vote-retries>12</vote-retries>
                          <vote-retry-wait>5000</vote-retry-wait>
                      ...
                      </backup>
                  </replication>
              </ha-policy>
              ...
          </core>
      </configuration>
      vote-retries
      This property controls how many times the backup broker retries the quorum vote in order to receive a majority result that allows the backup broker to start up.
      vote-retry-wait
      This property controls how long, in milliseconds, that the backup broker waits between each retry of the quorum vote.
    7. Configure any additional HA properties for the backup broker.

      These additional HA properties have default values that are suitable for most common use cases. Therefore, you only need to configure these properties if you do not want the default behavior. For more information, see Appendix F, Additional Replication High Availability Configuration Elements.

  3. Repeat step 2 for each additional primary-backup group in the cluster.

    If there are six brokers in the cluster, repeat this procedure two more times; once for each remaining primary-backup group.


15.1.5. Configuring limited high availability with primary-only

The primary-only HA policy enables you to shut down a broker in a cluster without losing any messages. With primary-only, when an active broker is stopped gracefully, it copies its messages and transaction state to another active broker and then shuts down. Clients can then reconnect to the other broker to continue sending and receiving messages.

The primary-only HA policy only handles cases when the broker is stopped gracefully. It does not handle unexpected broker failures.

While primary-only HA prevents message loss, it may not preserve message order. If a broker configured with primary-only HA is stopped, its messages will be appended to the ends of the queues of another broker.

Note

When a broker is preparing to scale down, it sends a message to its clients before they are disconnected informing them which new broker is ready to process their messages. However, clients should reconnect to the new broker only after their initial broker has finished scaling down. This ensures that any state, such as queues or transactions, is available on the other broker when the client reconnects. The normal reconnect settings apply when the client is reconnecting, so you should set these high enough to deal with the time needed to scale down.

This procedure describes how to configure each broker in the cluster to scale down. After completing this procedure, whenever a broker is stopped gracefully, it will copy its messages and transaction state to another broker in the cluster.

Procedure

  1. Open the first broker’s <broker_instance_dir>/etc/broker.xml configuration file.
  2. Configure the broker to use the primary-only HA policy.

    <configuration>
        <core>
            ...
            <ha-policy>
                <primary-only>
                </primary-only>
            </ha-policy>
            ...
        </core>
    </configuration>
  3. Configure a method for scaling down the broker cluster.

    Specify the broker or group of brokers to which this broker should scale down.

    Table 15.3. Methods for scaling down a broker cluster

    A specific broker in the cluster

    Specify the connector of the broker to which you want to scale down.

    <primary-only>
        <scale-down>
            <connectors>
                <connector-ref>broker1-connector</connector-ref>
            </connectors>
        </scale-down>
    </primary-only>

    Any broker in the cluster

    Specify the broker cluster’s discovery group.

    <primary-only>
        <scale-down>
            <discovery-group-ref discovery-group-name="my-discovery-group"/>
        </scale-down>
    </primary-only>

    A broker in a particular broker group

    Specify a broker group.

    <primary-only>
        <scale-down>
            <group-name>my-group-name</group-name>
        </scale-down>
    </primary-only>
  4. Repeat this procedure for each remaining broker in the cluster.

Additional resources

  • For an example of a broker cluster that uses primary-only to scale down the cluster, see the scale-down example.

15.1.6. Configuring high availability with colocated backups

Rather than configure primary-backup groups, you can colocate backup brokers in the same JVM as another primary broker. In this configuration, each primary broker is configured to request another primary broker to create and start a backup broker in its JVM.

Figure 15.4. Colocated primary and backup brokers

You can use colocation with either shared store or replication as the high availability (HA) policy. The new backup broker inherits its configuration from the primary broker that creates it. The name of the backup is set to colocated_backup_n where n is the number of backups the primary broker has created.

In addition, the backup broker inherits the configuration for its connectors and acceptors from the primary broker that creates it. By default, a port offset of 100 is applied to each. For example, if the primary broker has an acceptor for port 61616, the first backup broker created will use port 61716, the second backup will use 61816, and so on.

Directories for the journal, large messages, and paging are set according to the HA policy you choose. If you choose shared store, the requesting broker notifies the target broker which directories to use. If replication is chosen, directories are inherited from the creating broker and have the new backup’s name appended to them.

This procedure configures each broker in the cluster to use shared store HA, and to request a backup to be created and colocated with another broker in the cluster.

Procedure

  1. Open the first broker’s <broker_instance_dir>/etc/broker.xml configuration file.
  2. Configure the broker to use an HA policy and colocation.

    In this example, the broker is configured with shared store HA and colocation.

    <configuration>
        <core>
            ...
            <ha-policy>
                <shared-store>
                    <colocated>
                        <request-backup>true</request-backup>
                        <max-backups>1</max-backups>
                        <backup-request-retries>-1</backup-request-retries>
                        <backup-request-retry-interval>5000</backup-request-retry-interval>
                        <backup-port-offset>150</backup-port-offset>
                        <excludes>
                            <connector-ref>remote-connector</connector-ref>
                        </excludes>
                        <primary>
                            <failover-on-shutdown>true</failover-on-shutdown>
                        </primary>
                        <backup>
                            <failover-on-shutdown>true</failover-on-shutdown>
                            <allow-failback>true</allow-failback>
                            <restart-backup>true</restart-backup>
                        </backup>
                    </colocated>
                </shared-store>
            </ha-policy>
            ...
        </core>
    </configuration>
    request-backup
    By setting this property to true, this broker will request a backup broker to be created by another primary broker in the cluster.
    max-backups
    The number of backup brokers that this broker can create. If you set this property to 0, this broker will not accept backup requests from other brokers in the cluster.
    backup-request-retries
    The number of times this broker should try to request a backup broker to be created. The default is -1, which means unlimited tries.
    backup-request-retry-interval
    The amount of time in milliseconds that the broker should wait before retrying a request to create a backup broker. The default is 5000, or 5 seconds.
    backup-port-offset
    The port offset to use for the acceptors and connectors for a new backup broker. If this broker receives a request to create a backup for another broker in the cluster, it will create the backup broker with the ports offset by this amount. The default is 100.
    excludes (optional)
    Excludes connectors from the backup port offset. If you have configured any connectors for external brokers that should be excluded from the backup port offset, add a <connector-ref> for each of the connectors.
    primary
    The shared store or replication failover configuration for this broker.
    backup
    The shared store or replication failover configuration for this broker’s backup.
  3. Repeat this procedure for each remaining broker in the cluster.

Additional resources

  • For examples of broker clusters that use colocated backups, see the HA examples.

15.2. Configuring leader-follower brokers for high availability

A leader-follower configuration provides high availability for broker instances that are not clustered. Each instance must be configured to persist messages to the same message store, which can be a JDBC database or a journal on a shared volume.

In a leader-follower configuration, you specify the same high availability configuration for both brokers.

15.2.1. Configuring leader-follower brokers that use a JDBC database

You can configure leader-follower broker instances that persist messages to a shared JDBC database.

Procedure

  1. Create two broker instances. For more information, see Creating a broker instance in Getting Started with AMQ Broker.
  2. Open the <broker_instance_dir>/etc/broker.xml configuration file for the first broker instance.
  3. Add the appropriate JDBC client libraries to the broker runtime. To do this, add the .jar file for the database to the <broker_instance_dir>/lib directory.
  4. Configure the first broker instance to use a JDBC database.

    1. Within the core element, add a ha-policy element that contains a shared-store element. Within the shared-store element, specify a value of primary.

      <configuration>
          <core>
              ...
              <ha-policy>
                 <shared-store>
                    <primary/>
                 </shared-store>
              </ha-policy>
              ...
          </core>
      </configuration>
      Note

      Within the shared-store element, you specify a value of primary in the configuration file for both broker instances.

    2. Within the core element, add a store element that contains a database-store element.

      <configuration>
        <core>
          ...
          <store>
             <database-store>
             </database-store>
          </store>
          ...
        </core>
      </configuration>
    3. Within the database-store element, add configuration parameters for JDBC persistence and specify values. For example:

      <configuration>
          <core>
              ...
              <store>
                 <database-store>
                    <jdbc-driver-class-name>oracle.jdbc.driver.OracleDriver</jdbc-driver-class-name>
                    <jdbc-connection-url>jdbc:oracle:data/oracle/database-store;create=true</jdbc-connection-url>
                    <bindings-table-name>BINDINGS_TABLE</bindings-table-name>
                    <message-table-name>MESSAGE_TABLE</message-table-name>
                    <page-store-table-name>PAGE_STORE_TABLE</page-store-table-name>
                    <large-message-table-name>LARGE_MESSAGES_TABLE</large-message-table-name>
                    <node-manager-store-table-name>NODE_MANAGER_TABLE</node-manager-store-table-name>
                 </database-store>
              </store>
              ...
          </core>
      </configuration>

      For more information on the JDBC configuration parameters, see Section 6.2.1, “Configuring JDBC persistence”.

  5. Open the <broker_instance_dir>/etc/broker.xml configuration file for the second broker instance.
  6. Repeat step 4 to configure the second broker instance to use the same HA policy and JDBC database.

15.2.2. Configuring leader-follower brokers that use a shared journal

You can configure leader-follower brokers that persist messages to a journal on a shared file system, such as a Storage Area Network (SAN) or Network File System (NFS) server.

Prerequisites

  • A shared file system that can be accessed by both broker instances.

Procedure

  1. Create two broker instances. For more information, see Creating a broker instance in Getting Started with AMQ Broker.
  2. Open the <broker_instance_dir>/etc/broker.xml configuration file for the first broker instance:
  3. Configure the first broker instance to use a shared store HA policy.

    1. Within the core element, add a ha-policy element that contains a shared-store element. Within the shared-store element, specify a value of primary.

      <configuration>
          <core>
              ...
              <ha-policy>
                 <shared-store>
                    <primary/>
                 </shared-store>
              </ha-policy>
              ...
          </core>
      </configuration>
      Note

      Within the shared-store element, you specify a value of primary in the configuration file for both broker instances.

    2. Change the default location of the paging, bindings, journal, and large messages directories to a path on the shared filesystem. For example:

      <configuration>
          <core>
              ...
              <paging-directory>../sharedstore/data/paging</paging-directory>
              <bindings-directory>../sharedstore/data/bindings</bindings-directory>
              <journal-directory>../sharedstore/data/journal</journal-directory>
              <large-messages-directory>../sharedstore/data/large-messages</large-messages-directory>
              ...
          </core>
      </configuration>

      For more information on the shared store parameters, see Section 6.1.2, “Configuring journal-based persistence”.

  4. Open the <broker_instance_dir>/etc/broker.xml configuration file for the second broker instance.
  5. Repeat step 3 to configure the same HA policy and shared store directories for the second broker instance.

15.3. Configuring clients to fail over

After configuring high availability in a broker cluster, you configure your clients to fail over. Client failover ensures that if a broker fails, the clients connected to it can reconnect to another broker in the cluster with minimal downtime.

Note

In the event of transient network problems, AMQ Broker automatically reattaches connections to the same broker. This is similar to failover, except that the client reconnects to the same broker.

You can configure two different types of client failover:

Automatic client failover
If you implement HA by using primary-backup groups, the client receives information about the broker cluster when it first connects. If the broker to which it is connected fails, the client automatically reconnects to the broker’s backup, and the backup broker re-creates any sessions and consumers that existed on each connection before failover. An example connection configuration is shown after this list.
Application-level client failover
As an alternative to automatic client failover, you can instead code your client applications with your own custom reconnection logic in a failure handler.
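
For example, with the AMQ Core Protocol JMS client, automatic failover is typically enabled by listing the primary and backup broker addresses in the connection URL and turning on HA reconnection. The following JNDI properties entry is a sketch only; the host names, port, and parameter values are placeholders, and the reconnection settings you need depend on your environment:

connectionFactory.ConnectionFactory=(tcp://primary-host:61616,tcp://backup-host:61616)?ha=true&reconnectAttempts=3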

