
9.3.3. Total Cluster Failure


The cluster guarantees availability as long as one active primary broker or ready backup broker is left alive. If all brokers fail simultaneously, the cluster fails and non-persistent data is lost.
Each broker is in one of six states:
  1. standalone: not part of an HA cluster.
  2. joining: newly started backup, not yet joined to the cluster.
  3. catch-up: backup has connected to the primary and is downloading queues, messages, etc.
  4. ready: backup is connected and actively replicating from the primary; it is ready to take over.
  5. recovering: newly promoted to primary, waiting for backups to catch up before serving clients. Only a single primary broker can be recovering at a time.
  6. active: serving clients. Only a single primary broker can be active at a time.
While there is an active primary broker, clients can get service. If the active primary fails, one of the "ready" backup brokers takes over, recovers and becomes active. A backup can be promoted to primary only if it is in the "ready" state. The one exception is the first primary in a new cluster, where all brokers are in the "joining" state.
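The state list and promotion rule above can be captured in a small model. This is a sketch for illustration only, not rgmanager's or qpidd's actual logic: the `new_cluster` flag is a hypothetical stand-in for however the real cluster tells a brand-new cluster apart from one that has lost its primary, and the broker names are taken from the example addresses in Table 9.3.

```python
from enum import Enum


class BrokerState(Enum):
    """The six HA broker states listed above."""
    STANDALONE = "standalone"
    JOINING = "joining"
    CATCH_UP = "catch-up"
    READY = "ready"
    RECOVERING = "recovering"
    ACTIVE = "active"


def promotion_candidates(states: dict[str, BrokerState],
                         new_cluster: bool = False) -> list[str]:
    """Return the brokers that may be promoted to primary.

    Normally only "ready" backups qualify. The exception is the very first
    primary of a new cluster, when every broker is still "joining".
    `new_cluster` is a hypothetical flag, not a real qpidd setting.
    """
    ready = [name for name, s in states.items() if s is BrokerState.READY]
    if ready:
        return ready
    if new_cluster and states and all(
            s is BrokerState.JOINING for s in states.values()):
        return list(states)
    # No candidate at all: compare the "cluster hangs" failure mode.
    return []


# Hypothetical snapshot: one backup still downloading, one ready.
snapshot = {
    "20.0.10.34": BrokerState.CATCH_UP,
    "20.0.10.35": BrokerState.READY,
}
print(promotion_candidates(snapshot))  # ['20.0.10.35']
```

A cluster whose brokers are all stuck in "joining" after a failure is not a new cluster, so with `new_cluster=False` the sketch returns no candidates, which matches the hang described in this section.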
Given a stable cluster of N brokers with one active primary and N-1 ready backups, the system can sustain N-1 failures in rapid succession. The surviving broker is promoted to active and continues to serve clients.
However, at this point the system cannot sustain a failure of the surviving broker until at least one of the other brokers recovers, catches up and becomes a ready backup. If the surviving broker fails before that, the cluster fails in one of two modes, depending on the exact timing of the failures.
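The N-1 argument and the window of vulnerability that follows it can be illustrated with a simplified simulation. It assumes that failures hit only the current primary, that the remaining ready backups stay ready after each promotion, and that the "recovering" phase can be ignored; all three are simplifications of this model, and the broker names b1 to b3 are hypothetical.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    CATCH_UP = "catch-up"
    READY = "ready"
    ACTIVE = "active"


def fail_primary(cluster: dict[str, State]) -> Optional[str]:
    """Kill the active primary and promote a ready backup if there is one.

    Returns the new primary's name, or None if no ready backup exists and
    the cluster therefore fails. The "recovering" phase is not modelled.
    """
    primary = next(name for name, s in cluster.items() if s is State.ACTIVE)
    del cluster[primary]  # the failed broker drops out of the cluster
    for name, state in cluster.items():
        if state is State.READY:
            cluster[name] = State.ACTIVE  # promote the first ready backup
            return name
    return None


# N = 3: one active primary and N-1 = 2 ready backups.
cluster = {"b1": State.ACTIVE, "b2": State.READY, "b3": State.READY}
print(fail_primary(cluster))    # b2
print(fail_primary(cluster))    # b3: two failures survived
cluster["b1"] = State.CATCH_UP  # b1 restarts but has not caught up yet
print(fail_primary(cluster))    # None: b3 failed too early, cluster fails
```

The third failure is fatal because the only other broker is still in catch-up, which is exactly the gap the paragraph above describes.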
1. The cluster hangs

All brokers are in joining or catch-up mode. rgmanager tries to promote a new primary, cannot find any candidate, and gives up. clustat shows the qpidd services as started but the qpidd-primary service as stopped, for example:

Table 9.3.
Service Name                     Owner (Last)    State
service:mrg33-qpidd-service      20.0.10.33      started
service:mrg34-qpidd-service      20.0.10.34      started
service:mrg35-qpidd-service      20.0.10.35      started
service:qpidd-primary-service    (20.0.10.33)    stopped
Eventually all brokers become stuck in "joining" mode, as shown by qpid-ha status --all.
At this point you need to restart the cluster in one of the following ways:
Restart the entire cluster

  • In luci:<your-cluster>:Nodes, click reboot to restart the entire cluster.
  • or stop and restart the cluster with ccs --stopall; ccs --startall

Restart just the Qpid services

  • In luci:<your-cluster>:Service Groups:
    • select all the qpidd (non-primary) services and click restart.
    • then select the qpidd-primary service and click restart.
  • or stop the primary and qpidd services with clusvcadm, then start them again, starting the primary last.
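The clusvcadm route can be scripted. The sketch below builds the command sequence and only prints it unless `dry_run=False` is passed. It uses `clusvcadm -d` (disable, i.e. stop) and `clusvcadm -e` (enable, i.e. start). The service names are copied from the clustat output in Table 9.3 and must be replaced with your own. Stopping the primary service before the qpidd services is an assumption of this sketch; the text above only requires that the primary be started last.

```python
import subprocess

# Service names as shown by clustat in Table 9.3; replace with your own.
QPIDD_SERVICES = [
    "service:mrg33-qpidd-service",
    "service:mrg34-qpidd-service",
    "service:mrg35-qpidd-service",
]
PRIMARY_SERVICE = "service:qpidd-primary-service"


def restart_plan(qpidd_services: list[str], primary: str) -> list[list[str]]:
    """Stop the primary and all qpidd services, then start them, primary last.

    Stopping the primary first is an assumption of this sketch.
    """
    stop = [["clusvcadm", "-d", primary]]
    stop += [["clusvcadm", "-d", svc] for svc in qpidd_services]
    start = [["clusvcadm", "-e", svc] for svc in qpidd_services]
    start += [["clusvcadm", "-e", primary]]  # primary is started last
    return stop + start


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(restart_plan(QPIDD_SERVICES, PRIMARY_SERVICE))
```

Run it once in dry-run mode and check the printed order against your cluster before executing anything.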

2. The cluster reboots

A new primary is promoted and the cluster is functional again, but all non-persistent data from before the failure is lost.
