4.2. Failover behavior for IdM servers and services
The SSSD failover mechanism treats an IdM server and its services independently. If hostname resolution for a server succeeds, SSSD considers the machine online and tries to connect to the required service on that machine. If the connection to the service fails, SSSD considers only that specific service offline, not the entire machine or the other services on it.
If hostname resolution fails, SSSD considers the entire machine offline and does not attempt to connect to any services on that machine.
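The distinction between an offline machine and an offline service can be illustrated with a minimal Python sketch. This is not SSSD code: the function `probe_server` and the `resolve` and `connect` callables are hypothetical names introduced only for illustration, and the status strings are assumptions.

```python
from typing import Callable, Dict, Iterable


def probe_server(
    hostname: str,
    services: Iterable[str],
    resolve: Callable[[str], bool],
    connect: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Return a status per service for one server (illustrative model only).

    resolve(hostname) reports whether name resolution succeeded.
    connect(hostname, service) reports whether the service accepted a connection.
    """
    services = list(services)
    if not resolve(hostname):
        # Resolution failed: the whole machine is treated as offline,
        # and no connection to any of its services is attempted.
        return {service: "machine offline" for service in services}

    # Resolution succeeded: each service is judged on its own connection result.
    return {
        service: "online" if connect(hostname, service) else "service offline"
        for service in services
    }


# Example with stand-in callables: the host resolves, but only LDAP answers.
status = probe_server(
    "ipa1.example.com",
    ["ldap", "kerberos"],
    resolve=lambda host: True,
    connect=lambda host, service: service == "ldap",
)
```

In this example, `status` marks `ldap` as online and `kerberos` as offline, while the machine itself still counts as reachable.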
When all primary servers are unavailable, SSSD attempts to connect to a configured backup server. While connected to a backup server, SSSD periodically attempts to reconnect to one of the primary servers and switches back as soon as a primary server becomes available. The interval between these attempts is controlled by the failover_primary_timeout option, which defaults to 31 seconds.
If all IdM servers become unreachable, SSSD switches to offline mode. In this state, SSSD retries connections every 30 seconds until a server becomes available.
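The order of preference described above (primary servers, then backup servers, then offline mode) can be summarized in a short Python sketch. It is a simplified model under stated assumptions, not the SSSD implementation: `select_server`, `retry_interval`, and the `is_reachable` callable are hypothetical names, and the intervals are the values quoted in this section.

```python
from typing import Callable, Optional, Sequence, Tuple


def select_server(
    primaries: Sequence[str],
    backups: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Pick a server: any primary first, then any backup, else offline."""
    for server in primaries:
        if is_reachable(server):
            return ("primary", server)
    for server in backups:
        if is_reachable(server):
            return ("backup", server)
    return ("offline", None)


def retry_interval(
    state: str,
    failover_primary_timeout: int = 31,  # default quoted in this section
    offline_retry: int = 30,             # offline-mode retry period quoted in this section
) -> Optional[int]:
    """Seconds until the next reconnection attempt for a given state."""
    if state == "backup":
        # On a backup server: periodically try to return to a primary.
        return failover_primary_timeout
    if state == "offline":
        # Offline mode: keep retrying until some server is available.
        return offline_retry
    # Connected to a primary: no failback attempt is needed.
    return None


# Only the backup server answers, so the model lands on the backup.
state, server = select_server(
    ["ipa1.example.com", "ipa2.example.com"],
    ["ipa3.example.com"],
    is_reachable=lambda host: host == "ipa3.example.com",
)
# state == "backup", server == "ipa3.example.com", retry_interval(state) == 31
```

Calling `select_server` again after each interval models the periodic attempt to return to a primary server or to leave offline mode.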
You can achieve load balancing and high availability in IdM by installing multiple IdM replicas:
- If you have a geographically dispersed network, you can shorten the path between IdM clients and the nearest accessible server by configuring multiple IdM replicas per data center.
- Red Hat supports environments with up to 60 replicas.
- The IdM replication mechanism provides active/active service availability: services at all IdM replicas are readily available at the same time.
Red Hat recommends against combining IdM with other load-balancing or high-availability (HA) software.
Many third-party high-availability solutions assume active/passive scenarios and cause unnecessary interruptions to IdM services. Other solutions use virtual IPs or a single hostname per clustered service. None of these methods typically works well with the type of service availability provided by the IdM solution. They also integrate very poorly with Kerberos, decreasing the overall security and stability of the deployment.