Chapter 37. High Availability and Fail-over
High availability is the ability of the system to continue functioning after the failure of one or more servers.
A part of high availability is fail-over, which is the ability of client connections to migrate from one server to another when a server fails, so that client applications can continue to operate.
Warning
HornetQ requires a stable, reliable connection to the file system where its journal is located. If connectivity between HornetQ and the journal is lost and later re-established, the messaging system will raise an I/O error. This error is considered a "major event" and requires manual intervention to recover (that is, the messaging system must be restarted). If this occurs on a cluster node, other nodes will take on the load of the failed node, provided they have been configured to do so.
37.1. Live - Backup Pairs
HornetQ allows pairs of servers to be linked together as live - backup pairs. In this release there is a single backup server for each live server. A backup server is owned by only one live server. Backup servers are not operational until fail-over occurs.
Before fail-over, only the live server serves the HornetQ clients, while the backup servers remain passive or wait to become a backup server. When a live server crashes, or is brought down in the correct mode, the backup server currently in passive mode becomes live and another backup server becomes passive. If a live server restarts after a fail-over, it has priority and will be the next server to become live when the current live server goes down. If the current live server is configured to allow automatic fail back, it will detect the original live server coming back up and stop automatically.
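The transitions described above can be sketched as a small state model. This is a toy illustration only, not HornetQ code: the class FailoverSketch, its State values, and the allowFailback flag are hypothetical names chosen for this example, and the model assumes a single live server with a single backup.

```java
// Toy model of a live - backup pair. Every name here is hypothetical;
// none of it is part of the HornetQ API.
final class FailoverSketch {
    enum State { LIVE, PASSIVE, STOPPED }

    static final class Server {
        final String name;
        final boolean allowFailback; // models "configured to allow automatic fail back"
        State state;

        Server(String name, State state, boolean allowFailback) {
            this.name = name;
            this.state = state;
            this.allowFailback = allowFailback;
        }
    }

    /** The live server goes down; a passive backup, if there is one, becomes live. */
    static void failOver(Server live, Server backup) {
        live.state = State.STOPPED;
        if (backup.state == State.PASSIVE) {
            backup.state = State.LIVE;
        }
    }

    /** The original live server restarts after a fail-over. */
    static void restartOriginal(Server original, Server currentLive) {
        if (currentLive.state != State.LIVE) {
            // Nobody is serving clients, so the original goes live directly.
            original.state = State.LIVE;
        } else if (currentLive.allowFailback) {
            // Fail back: the current live server stops and the original takes over.
            currentLive.state = State.STOPPED;
            original.state = State.LIVE;
        } else {
            // No fail back: the original waits, next in line to become live.
            original.state = State.PASSIVE;
        }
    }

    public static void main(String[] args) {
        Server a = new Server("A", State.LIVE, false);
        Server b = new Server("B", State.PASSIVE, true);

        failOver(a, b);
        System.out.println(a.name + "=" + a.state + ", " + b.name + "=" + b.state);
        // prints "A=STOPPED, B=LIVE"

        restartOriginal(a, b);
        System.out.println(a.name + "=" + a.state + ", " + b.name + "=" + b.state);
        // prints "A=LIVE, B=STOPPED"
    }
}
```

If server B were created with allowFailback set to false, the second step would instead leave B live and A passive, which corresponds to the restarted server waiting with priority for the next fail-over.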
37.1.1. HA modes
In this release, HornetQ provides only the shared store HA mode.
Note
Only persistent message data will survive fail-over. Non-persistent message data is lost after fail-over occurs.
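To make the note concrete, the following toy model shows why only persistent messages are visible to a backup after fail-over when the two servers share a store. It is a sketch under simplifying assumptions, not HornetQ code: PersistenceSketch, SharedJournal, and the accept and activate methods are hypothetical names, and the journal is reduced to an in-memory list that both servers can reach.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the shared store. Every name here is hypothetical;
// none of it is part of the HornetQ API.
final class PersistenceSketch {
    static final class Message {
        final String body;
        final boolean persistent;

        Message(String body, boolean persistent) {
            this.body = body;
            this.persistent = persistent;
        }
    }

    /** Stands in for the journal on the file system shared by live and backup. */
    static final class SharedJournal {
        final List<Message> records = new ArrayList<>();
    }

    static final class Server {
        final SharedJournal journal;
        final List<Message> memory = new ArrayList<>(); // lost when this server fails

        Server(SharedJournal journal) {
            this.journal = journal;
        }

        /** Every message is held in memory; only persistent ones reach the journal. */
        void accept(Message m) {
            memory.add(m);
            if (m.persistent) {
                journal.records.add(m);
            }
        }

        /** A backup going live rebuilds its state from the shared journal alone. */
        void activate() {
            memory.clear();
            memory.addAll(journal.records);
        }

        List<String> bodies() {
            List<String> out = new ArrayList<>();
            for (Message m : memory) {
                out.add(m.body);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SharedJournal journal = new SharedJournal();
        Server live = new Server(journal);
        Server backup = new Server(journal);

        live.accept(new Message("order-1", true));  // persistent
        live.accept(new Message("tick-1", false));  // non-persistent

        // The live server fails here; its in-memory state is gone.
        backup.activate();
        System.out.println(backup.bodies()); // prints "[order-1]"
    }
}
```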