Chapter 18. Client Failover
AMQ Broker 7.2 defines two types of client failover, each of which is covered in its own section later in this chapter: automatic client failover and application-level client failover. The broker also provides 100% transparent automatic reattachment of connections to the same broker, as in the case of transient network problems, for example. This is similar to failover, except the client is reconnecting to the same broker.
During failover, if the client has consumers on any nonpersistent or temporary queues, those queues are automatically re-created on the slave broker, because the slave broker has no knowledge of nonpersistent queues.
18.1. Automatic Client Failover
A client can receive information about all master and slave brokers, so that in the event of a connection failure, it can reconnect to the slave broker. The slave broker then automatically re-creates any sessions and consumers that existed on each connection before failover. This feature saves you from having to hand-code manual reconnection logic in your applications.
When a session is re-created on the slave, it does not have any knowledge of messages already sent or acknowledged. Any in-flight sends or acknowledgements at the time of failover might also be lost. However, even without 100% transparent failover, it is simple to guarantee once and only once delivery, even in the case of failure, by using a combination of duplicate detection and retrying of transactions.
Clients detect connection failure when they have not received packets from the broker within a configurable period of time. See Detecting Dead Connections for more information.
There are several ways to configure clients to receive information about master and slave brokers. One option is to configure clients to connect to a specific broker and then receive information about the other brokers in the cluster. See Configuring a Client to Use Static Discovery for more information. The most common way, however, is to use broker discovery. For details on how to configure broker discovery, see Configuring a Client to Use Dynamic Discovery.
Also, you can configure the client by adding parameters to the query string of the URL used to connect to the broker, as in the example below.
connectionFactory.ConnectionFactory=tcp://localhost:61616?ha=true&reconnectAttempts=3
Procedure
To configure your clients for failover through the use of a query string, ensure the following components of the URL are set properly.
- The host:port portion of the URL should point to a master broker that is properly configured with a backup. This host and port is used only for the initial connection. The host:port value has nothing to do with the actual connection failover between a live and a backup server. In the example above, localhost:61616 is used for the host:port.
  (Optional) To use more than one broker as a possible initial connection, group the host:port entries as in the following example:
  connectionFactory.ConnectionFactory=(tcp://host1:port,tcp://host2:port)?ha=true&reconnectAttempts=3
- Include the name-value pair ha=true as part of the query string to ensure the client receives information about each master and slave broker in the cluster.
- Include the name-value pair reconnectAttempts=n, where n is an integer greater than 0. This parameter sets the number of times the client attempts to reconnect to a broker.
Failover occurs only if ha=true and reconnectAttempts is greater than 0. Also, the client must make an initial connection to the master broker in order to receive information about other brokers. If the initial connection fails, the client can only retry to establish it. See Failing Over During the Initial Connection for more information.
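The URL rules in this procedure can be sketched as a small helper. This is only an illustration that uses plain string handling: the class name FailoverUrls and its build method are hypothetical and are not part of the AMQ client API.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles a failover-enabled connection URL. */
final class FailoverUrls {
    private FailoverUrls() {}

    /**
     * Builds a URL with ha=true and the given reconnectAttempts.
     * Each entry in hostPorts is a "host:port" string used only for the initial connection.
     */
    static String build(List<String> hostPorts, int reconnectAttempts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host:port is required");
        }
        if (reconnectAttempts <= 0) {
            // Failover occurs only if reconnectAttempts is greater than 0.
            throw new IllegalArgumentException("reconnectAttempts must be greater than 0");
        }
        String brokers = hostPorts.stream()
                .map(hostPort -> "tcp://" + hostPort)
                .collect(Collectors.joining(","));
        // Several possible initial brokers are grouped in parentheses.
        if (hostPorts.size() > 1) {
            brokers = "(" + brokers + ")";
        }
        return brokers + "?ha=true&reconnectAttempts=" + reconnectAttempts;
    }
}
```

The returned string is the value that would be assigned to connectionFactory.ConnectionFactory in the examples above.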
18.1.1. Failing Over During the Initial Connection
Because the client does not receive information about every broker until after the first connection to the HA cluster, there is a window of time where the client can connect only to the broker included in the connection URL. Therefore, if a failure happens during this initial connection, the client cannot fail over to other master brokers, but can only try to re-establish the initial connection. Clients can be configured for a set number of reconnection attempts. Once that number of attempts has been made, an exception is thrown.
Setting the Number of Reconnection Attempts
Procedure
The example below shows how to set the number of reconnection attempts to 3 using the AMQ JMS client. The default value is 0, that is, try only once.
Set the number of reconnection attempts by passing a value to ServerLocator.setInitialConnectAttempts().
ActiveMQConnectionFactory cf = ActiveMQJMSClient.createConnectionFactory(...);
cf.setInitialConnectAttempts(3);
Setting a Global Number of Reconnection Attempts
Alternatively, you can apply a global value for the maximum number of reconnection attempts within the broker’s configuration. The maximum is applied to all client connections.
Procedure
Edit BROKER_INSTANCE_DIR/etc/broker.xml by adding the initial-connect-attempts configuration element and providing a value for the maximum number of attempts, as in the example below.
<configuration>
  <core>
    ...
    <initial-connect-attempts>3</initial-connect-attempts>
    ...
  </core>
</configuration>
In this example, all clients connecting to the broker are allowed a maximum of three attempts to reconnect. The default is -1, which allows clients unlimited attempts.
18.1.2. Handling Blocking Calls During Failover
When failover occurs and the client is waiting for a response from the broker to continue its execution, the newly created session does not have any knowledge of the call that was in progress. The initial call might otherwise hang forever, waiting for a response that never comes. To prevent this, the broker is designed to unblock any blocking calls that were in progress at the time of failover by making them throw an exception. Client code can catch these exceptions and retry any operations if desired.
When using AMQ JMS clients, if the unblocked method is a call to commit() or prepare(), the transaction is automatically rolled back and the broker throws an exception.
18.1.3. Handling Failover with Transactions
When using AMQ JMS clients, if the session is transactional and messages have already been sent or acknowledged in the current transaction, the broker cannot be sure whether those messages or their acknowledgements were lost during the failover. Consequently, the transaction is marked for rollback only. Any subsequent attempt to commit it throws a javax.jms.TransactionRolledBackException.
The caveat to this rule is when XA is used. If a two-phase commit is used and prepare() has already been called, rolling back could cause a HeuristicMixedException. Because of this, the commit throws an XAException.XA_RETRY exception, which informs the transaction manager that it should retry the commit at some later point. If the original commit has not occurred, the transaction still exists and can be committed. If the transaction does not exist, it is assumed to have been committed, although the transaction manager might log a warning. A side effect of this exception is that any nonpersistent messages are lost. To avoid such losses, always use persistent messages when using XA. This is not an issue with acknowledgements, since they are flushed to the broker before prepare() is called.
The AMQ JMS client code must catch the exception and perform any necessary client-side rollback. There is no need to roll back the session, however, because it was already rolled back. The client can then retry the transactional operations on the same session.
If failover occurs when a commit call is being executed, the broker unblocks the call to prevent the AMQ JMS client from waiting indefinitely for a response. Consequently, the client cannot determine whether the transaction commit was actually processed on the master broker before failure occurred.
To remedy this, the AMQ JMS client can enable duplicate detection in the transaction, and retry the transaction operations after the call is unblocked. If the transaction was successfully committed on the master broker before failover, duplicate detection ensures that any durable messages present in the transaction when it is retried are ignored on the broker side. This prevents messages from being sent more than once.
If the session is nontransactional, messages or acknowledgements can be lost in case of failover. If you want to provide once-and-only-once delivery guarantees for nontransacted sessions, enable duplicate detection and catch unblock exceptions.
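The interaction between an unblocked commit and duplicate detection can be illustrated with a toy, in-memory simulation. Everything in this sketch is hypothetical (ToyBroker, Msg, sendWithRetry) and none of it is the AMQ client API. A single ToyBroker object stands in for the master and its slave, on the assumption that the slave knows which duplicate IDs the master has already accepted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy in-memory model of commit retry with duplicate detection; not the AMQ API. */
final class DuplicateDetectionSketch {

    /** A message carrying a client-assigned duplicate ID (hypothetical type). */
    static final class Msg {
        final String duplicateId;
        final String body;

        Msg(String duplicateId, String body) {
            this.duplicateId = duplicateId;
            this.body = body;
        }
    }

    /** Simulated broker that ignores messages whose duplicate ID it has already seen. */
    static final class ToyBroker {
        private final Set<String> seenIds = new HashSet<>();
        final List<String> queue = new ArrayList<>();

        void commit(List<Msg> transaction) {
            for (Msg message : transaction) {
                // Set.add returns false for an ID seen before, so the duplicate is dropped.
                if (seenIds.add(message.duplicateId)) {
                    queue.add(message.body);
                }
            }
        }
    }

    /**
     * Simulates a commit whose response is lost: the broker applies the transaction,
     * but the client only sees an exception, as with an unblocked commit() call.
     */
    static void commitWithLostResponse(ToyBroker broker, List<Msg> transaction) {
        broker.commit(transaction);
        throw new IllegalStateException("simulated unblock: commit outcome unknown");
    }

    /** Client side: after an unblocked commit, resend the same messages with the same IDs. */
    static void sendWithRetry(ToyBroker broker, List<Msg> transaction) {
        try {
            commitWithLostResponse(broker, transaction);
        } catch (IllegalStateException unblocked) {
            broker.commit(transaction); // retry; already-committed messages are ignored
        }
    }
}
```

Because the retry reuses the same duplicate IDs, the simulated queue holds each message once even though the transaction was committed twice.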
18.1.4. Getting Notified of Connection Failure
JMS provides a standard mechanism for getting notified asynchronously of connection failure: javax.jms.ExceptionListener.
Any ExceptionListener or SessionFailureListener instance is always called by the broker if a connection failure occurs, whether the connection was successfully failed over, reconnected, or reattached. You can find out if a reconnect or a reattach has happened by examining the failedOver flag passed to the connectionFailed method of SessionFailureListener. Alternatively, you can inspect the error code of the javax.jms.JMSException, which can be one of the following:
Error code | Description |
---|---|
FAILOVER | Failover has occurred and the broker has successfully reattached or reconnected |
DISCONNECT | No failover has occurred and the broker is disconnected |
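As a minimal sketch of how a listener might act on these codes, the helper below maps an error code string to an outcome. The class FailureCodes and the enum Outcome are hypothetical names. In a real ExceptionListener, the string would presumably come from JMSException.getErrorCode().

```java
/** Hypothetical helper that interprets the error codes listed in the table above. */
final class FailureCodes {
    private FailureCodes() {}

    enum Outcome { FAILED_OVER, DISCONNECTED, UNKNOWN }

    /** Maps an error code string to an outcome; null or unlisted codes map to UNKNOWN. */
    static Outcome classify(String errorCode) {
        if (errorCode == null) {
            return Outcome.UNKNOWN;
        }
        switch (errorCode) {
            case "FAILOVER":
                return Outcome.FAILED_OVER;   // reattached or reconnected successfully
            case "DISCONNECT":
                return Outcome.DISCONNECTED;  // no failover occurred
            default:
                return Outcome.UNKNOWN;
        }
    }
}
```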
18.2. Application-Level Failover
In some cases you might not want automatic client failover, but prefer to code your own reconnection logic in a failure handler instead. This is known as application-level failover, since the failover is handled at the application level.
To implement application-level failover when using JMS, set an ExceptionListener class on the JMS connection. The ExceptionListener is called by the broker in the event that a connection failure is detected. In your ExceptionListener, you should close your old JMS connections. You might also want to look up new connection factory instances from JNDI and create new connections.
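The close-then-reconnect pattern described above might be structured as in the following sketch. It deliberately avoids the JMS types so that it stays self-contained: AutoCloseable stands in for the JMS connection, the Supplier stands in for the JNDI lookup plus connection creation, and the class name ReconnectingHolder is hypothetical. In an application, onConnectionFailure() would be invoked from the ExceptionListener.

```java
import java.util.function.Supplier;

/** Hypothetical holder that replaces a failed connection with a freshly created one. */
final class ReconnectingHolder<C extends AutoCloseable> {
    private final Supplier<C> factory;
    private C current;

    ReconnectingHolder(Supplier<C> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Returns the connection currently in use. */
    synchronized C current() {
        return current;
    }

    /** Call from the failure handler: close the old connection, then create a new one. */
    synchronized void onConnectionFailure() {
        try {
            current.close();
        } catch (Exception closeFailure) {
            // The old connection is already broken, so closing it is best effort.
        }
        current = factory.get();
    }
}
```

This sketch makes a single reconnection attempt. A production handler would likely also need retry limits and a delay between attempts, which are left out here.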