Chapter 20. Troubleshooting replication-related problems
This section lists frequent error messages in replication environments, explains possible causes, and offers remedies.
20.1. Configuring Directory Server to log replication-related errors
To log replication-related errors, enable replication debugging. The nsslapd-errorlog-level parameter is additive. This means that, to enable multiple logging features, you have to add the values of each logging feature, and set the sum in nsslapd-errorlog-level.
Procedure
Display the current error log level:
# dsconf <instance_name> config get nsslapd-errorlog-level
nsslapd-errorlog-level: 16384
The value to enable replication debugging is 8192.
Set the nsslapd-errorlog-level parameter to 24576 (8192 + the previous value 16384) to enable replication debugging in addition to the currently enabled error logging features:
# dsconf <instance_name> config replace nsslapd-errorlog-level=24576
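Because the log levels are additive flags, adding 8192 to a value that already includes replication debugging would produce a wrong level. The following is a minimal sketch, assuming the levels behave as distinct power-of-two flags, that combines values with a bitwise OR instead of a plain sum. The function names combine_log_levels and has_log_level are hypothetical helpers and not part of dsconf.

```shell
#!/bin/sh
# Combine error log level flags with bitwise OR so that a flag that is
# already set is not counted a second time.
combine_log_levels() (
    result=0
    for level in "$@"; do
        result=$(( result | level ))
    done
    echo "$result"
)

# Succeed if every bit of the flag ($2) is set in the current level ($1).
has_log_level() {
    [ $(( $1 & $2 )) -eq "$2" ]
}

current=16384    # value reported by "config get" in the example above
repl_debug=8192  # replication debugging

if has_log_level "$current" "$repl_debug"; then
    echo "replication debugging is already enabled"
else
    echo "new level: $(combine_log_levels "$current" "$repl_debug")"
fi
```

With the example values, the script prints "new level: 24576", which is the value passed to the config replace command above.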
20.2. Tracing origin of connections in Directory Server log files using sid
You can identify the specific supplier server and replication agreement responsible for initiating a given operation in a consumer’s Directory Server log files in a replication topology.
A replication topology usually contains several suppliers that replicate updates to multiple consumers. Consequently, the access log of a consumer contains a mix of requests from several suppliers. For debugging and diagnostic purposes, you can use the session tracking identifier (sid) to match replication agreement actions on a supplier with the related operations received on a consumer. The sid uniquely identifies each connection that a replication agreement creates to send updates from a supplier to a consumer.
On the supplier side, you must enable the replication debug level (8192) so that Directory Server records the sid in the error log when a replication agreement initiates a replication session on a consumer.
On the consumer side, Directory Server records the operation’s result from all suppliers along with the sid to the access log, even with the default logging level.
The following procedure traces the origin of connections or operations in the Directory Server log files.
Prerequisites
- You enabled the replication debugging level 8192 for the error log on a supplier. For details, see Configuring Directory Server to log replication-related errors.
Procedure
Identify the sid of an operation you want to trace in the access log file on a consumer. For example, you want to trace the origin of the operation that modified the uid=example_user,ou=people,dc=example,dc=com entry:
Retrieve the connection and operation numbers of the modify operation:
# grep "uid=example_user,ou=people,dc=example,dc=com" /var/log/dirsrv/slapd-<consumer_instance>/access
[03/Nov/2025:04:35:48.308889287 -0500] conn=1 op=5 MOD dn="uid=example_user,ou=people,dc=example,dc=com"
Search the access log for the result entry that contains the previously retrieved connection and operation numbers:
# grep "conn=1 op=5" /var/log/dirsrv/slapd-<consumer_instance>/access
[03/Nov/2025:04:35:48.350994382 -0500] conn=1 op=5 RESULT err=0 tag=103 nentries=0 wtime=0.000157697 optime=0.042113370 etime=0.042269379 csn=69087772000000010000 sid="u5aM64k2pxZ 3"
The modify operation has the session tracking identifier u5aM64k2pxZ 3.
Search for the session tracking identifier u5aM64k2pxZ 3 in the error log file on each supplier:
# grep "u5aM64k2pxZ 3" /var/log/dirsrv/slapd-<supplier_instance>/errors
The error log file on the supplier_3 server contains the required sid value, which means that supplier_3 initiated the modify operation:
[03/Nov/2025:04:35:48.045087151 -0500] - DEBUG - NSMMReplicationPlugin - repl5_inc_run - sid="u5aM64k2pxZ 3" - agmt="cn=002" (supplier_3:39002): State: backoff -> sending_updates
[03/Nov/2025:04:35:48.094419605 -0500] - DEBUG - NSMMReplicationPlugin - replay_update - sid="u5aM64k2pxZ 3" - agmt="cn=002" (supplier_3:39002): Sending modify operation (dn="uid=demo_user,ou=people,dc=example,dc=com" csn=69087772000000010000)
[03/Nov/2025:04:35:48.097990796 -0500] - DEBUG - NSMMReplicationPlugin - replay_update - sid="u5aM64k2pxZ 3" - agmt="cn=002" (supplier_3:39002): Receiver successfully sent operation with csn 69087772000000010000
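The lookup on the consumer side can be scripted. The following is a minimal sketch, assuming that RESULT lines in the access log have the layout shown in the example above. The extract_sid function is a hypothetical helper, and the sample line is copied from this procedure.

```shell
#!/bin/sh
# Print the sid value from the RESULT line of a given connection and
# operation number. Reads access log lines on standard input.
extract_sid() {
    grep -F " conn=$1 op=$2 RESULT " \
        | sed -n 's/.* sid="\([^"]*\)".*/\1/p'
}

# Sample RESULT line copied from the procedure above.
sample='[03/Nov/2025:04:35:48.350994382 -0500] conn=1 op=5 RESULT err=0 tag=103 nentries=0 wtime=0.000157697 optime=0.042113370 etime=0.042269379 csn=69087772000000010000 sid="u5aM64k2pxZ 3"'

sid=$(printf '%s\n' "$sample" | extract_sid 1 5)
echo "sid: $sid"

# On a real consumer, feed the access log to the function instead:
#   extract_sid 1 5 < /var/log/dirsrv/slapd-<consumer_instance>/access
# Then search for the printed value in the error log of each supplier:
#   grep -F "sid=\"$sid\"" /var/log/dirsrv/slapd-<supplier_instance>/errors
```

The spaces around the search pattern keep conn=1 from also matching conn=11 or op=5 from matching op=51.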
20.3. Overview of replication-related errors, causes, and possible solutions
The following is an overview of replication-related errors and possible solutions:
agmt=<agreement_name> (<host_name>:<port>) Replica has a different generation ID than the local data
- Reason: The consumer specified in parentheses in the message has not been successfully initialized yet, or it was initialized from a different root supplier.
- Impact: The local supplier will not replicate any data to the consumer.
- Solution: Ignore this message if it occurs before the consumer is initialized. Otherwise, reinitialize the consumer if the message is persistent. In a multi-supplier environment, all servers need to be initialized only once from a root supplier, directly or indirectly. For example, server S1 initializes S2 and S4, S2 then initializes S3, and so on. Note that S2 must not start initializing S3 until the initialization of S2 is done. To verify this, check the total update status from the web console on S1 or in the error log of S1 or S2. Also, S2 should not initialize S1 back.
Warning: data for replica’s was reloaded, and it no longer matches the data in the changelog. Recreating the changelog file. This could affect replication with replica’s consumers, in which case the consumers should be reinitialized.
- Reason: This message can appear only when you restart a supplier. It indicates that the supplier was unable to write the changelog or did not flush out its replica update vector (RUV) at its last shutdown. The former case usually happens because of a disk-space problem, and the latter case because a server crashed or was ungracefully shut down.
- Impact: The server is not able to send the changes to a consumer if the consumer’s maxcsn value no longer exists in the server’s changelog.
- Remedy: Check the disk space and look for possible core files under the server’s logs directory. If this is a single-supplier replication, reinitialize the consumers. Otherwise, if the server later complains that it cannot locate change sequence numbers (CSN) for a consumer, verify whether the consumer can receive the CSN from other suppliers. If not, reinitialize the consumer.
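The first checks in this remedy can be run from a shell. The following is a minimal sketch using the standard df and find utilities. The helper names are hypothetical, and the assumption that core files have names starting with "core" depends on the core dump configuration of the host. On systems that hand core dumps to a separate service, the files might be stored elsewhere.

```shell
#!/bin/sh
# Show free space on the file system that holds the given directory.
show_disk_space() {
    df -h "$1"
}

# List regular files whose names start with "core" below the given directory.
# The "core*" naming is an assumption, not a Directory Server guarantee.
find_core_files() {
    find "$1" -type f -name 'core*' 2>/dev/null
}

# Example with the placeholder log path used in this chapter:
#   show_disk_space /var/log/dirsrv/slapd-<instance_name>
#   find_core_files /var/log/dirsrv/slapd-<instance_name>
```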
Too much time skew
- Reason: The system clocks on the host machines are extremely out of sync.
- Impact: Directory Server uses the system clock to generate a part of the CSN. In order to reflect the change sequence among multiple suppliers, suppliers would forward-adjust their local clocks based on the remote clocks of the other suppliers. Because the adjustment is limited to a certain amount, any difference that exceeds the permitted limit will cause the replication session to be aborted.
- Remedy: Synchronize the system clocks on the Directory Server host machines, for example, by configuring the chronyd service.
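To see how the system clock enters a CSN, you can split a CSN into its fields. The following is a minimal sketch, assuming the 20-character hexadecimal layout commonly used by Directory Server: 8 digits for the timestamp in seconds since the epoch, then 4 digits each for the sequence number, the replica ID, and the sub-sequence number. This layout is not stated in this chapter, and decode_csn is a hypothetical helper.

```shell
#!/bin/sh
# Split a CSN into its assumed fields and print them as decimal numbers.
# Assumed layout: timestamp (8 hex digits), sequence number (4),
# replica ID (4), sub-sequence number (4).
decode_csn() (
    csn=$1
    case $csn in
        ''|*[!0-9a-fA-F]*) echo "invalid CSN: $csn" >&2; exit 1 ;;
    esac
    [ "${#csn}" -eq 20 ] || { echo "invalid CSN length: $csn" >&2; exit 1; }
    ts=$(printf '%s\n' "$csn" | cut -c1-8)
    seq=$(printf '%s\n' "$csn" | cut -c9-12)
    rid=$(printf '%s\n' "$csn" | cut -c13-16)
    sub=$(printf '%s\n' "$csn" | cut -c17-20)
    printf 'timestamp=%d seq=%d rid=%d subseq=%d\n' \
        "0x$ts" "0x$seq" "0x$rid" "0x$sub"
)

# CSN taken from the log examples earlier in this chapter.
decode_csn 69087772000000010000
```

For this CSN, the timestamp field decodes to 1762162546 seconds since the epoch. Comparing such a value with the output of date +%s on each host gives a rough impression of how far the clocks have drifted apart.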
agmt=<agreement_name> (<host_name>:<port>): Warning: Unable to send endReplication extended operation (error_message)
- Reason: The consumer is not responding.
- Impact: If the consumer recovers without being restarted, there is a chance that the replica on the consumer will be locked forever if it did not receive the release lock message from the supplier.
- Remedy: Watch if the consumer can receive any new change from any of its suppliers, or start the replication monitor, and see if all the suppliers of this consumer warn that the replica is busy. If the replica appears to be locked forever and no supplier can get in, restart the consumer.
Changelog is getting too big
- Reason: Either changelog purge is turned off, which is the default setting, or changelog purge is turned on, but some consumers are way behind the supplier.
- Remedy: By default, changelog purge is turned off. To turn it on from the command line, enter:
# dsconf <instance_name> replication set-changelog --max-age 1d --suffix "dc=example,dc=com"
1d means 1 day. Other valid time units are s for seconds, m for minutes, h for hours, and w for weeks. A value of 0 turns off the purge.
With changelog purge turned on, a purge thread that wakes up every five minutes removes a change if its age is greater than the value you set and if it has been replayed to all the direct consumers of this supplier or hub.
If it appears that the changelog is not purged when the purge threshold is reached, check the maximum time lag from the replication monitor among all the consumers. Irrespective of what the purge threshold is, no change will be purged before it is replayed by all the consumers.
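The purge rule described above has two conditions that must both hold. The following is a minimal sketch of that rule as a toy model, not the server's implementation. The function names are hypothetical, and the unit letters are the ones listed for the --max-age option.

```shell
#!/bin/sh
# Convert a max-age value such as 1d or 12h to seconds.
# A plain 0 means that purging is turned off.
max_age_to_seconds() (
    value=$1
    if [ "$value" = "0" ]; then echo 0; exit 0; fi
    number=${value%?}        # everything except the last character
    unit=${value#"$number"}  # the last character
    case $number in
        ''|*[!0-9]*) echo "invalid value: $value" >&2; exit 1 ;;
    esac
    case $unit in
        s) factor=1 ;;
        m) factor=60 ;;
        h) factor=3600 ;;
        d) factor=86400 ;;
        w) factor=604800 ;;
        *) echo "invalid unit: $unit" >&2; exit 1 ;;
    esac
    echo $(( number * factor ))
)

# Succeed only if purging is enabled, the change is older than the limit,
# and the change was replayed to all direct consumers ("yes" or "no").
can_purge() {
    [ "$2" -gt 0 ] && [ "$1" -gt "$2" ] && [ "$3" = "yes" ]
}

limit=$(max_age_to_seconds 1d)
# A change that is 90000 seconds old but not yet replayed everywhere is kept.
if can_purge 90000 "$limit" no; then echo "purge"; else echo "keep"; fi
```

The example prints "keep", which mirrors the statement that no change is purged before all consumers have replayed it, regardless of the purge threshold.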