Chapter 20. Troubleshooting replication-related problems

This section lists frequent error messages in replication environments, explains possible causes, and offers remedies.

20.1. Configuring Directory Server to log replication-related errors

To log replication-related errors, enable replication debugging. The nsslapd-errorlog-level parameter is additive: to enable multiple logging features, add the values of the individual features and set nsslapd-errorlog-level to the sum.

Procedure

Display the current error log level:

# dsconf <instance_name> config get nsslapd-errorlog-level
nsslapd-errorlog-level: 16384

The value to enable replication debugging is 8192. Set the nsslapd-errorlog-level parameter to 24576 (8192 + the previous value 16384) to enable replication debugging in addition to the currently enabled error logging features:
```
# dsconf <instance_name> config replace nsslapd-errorlog-level=24576
```
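The sum can also be computed with a bitwise OR, which gives the same result as addition when the flag is not yet set and leaves the value unchanged when it is. The following is a minimal sketch, assuming that each logging feature maps to a distinct power of two, as 8192 and 16384 do. The `add_log_level` function is a hypothetical helper, not a Directory Server command.

```shell
#!/bin/sh
# Combine an existing nsslapd-errorlog-level value with one more level.
# add_log_level is a hypothetical helper, not part of Directory Server.
add_log_level() {
    current=$1
    extra=$2
    # Bitwise OR equals addition when the flag is not yet set,
    # and changes nothing when the flag is already set.
    echo $(( current | extra ))
}

# Current level 16384 plus replication debugging (8192).
add_log_level 16384 8192
```

You can then pass the printed value to the `dsconf <instance_name> config replace` command shown above.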

20.2. Tracing origin of connections in Directory Server log files using sid

In a replication topology, you can identify the supplier server and the replication agreement that initiated a given operation recorded in a consumer’s Directory Server log files.

A replication topology usually contains several suppliers that replicate updates to multiple consumers. Consequently, the access log of a consumer contains a mix of requests from several suppliers. For debugging and diagnostic purposes, you can use the session tracking identifier (sid) to match the actions of a replication agreement on a supplier with the corresponding operations received by the consumer. The sid uniquely identifies each connection that a replication agreement creates to send updates from a supplier to a consumer.

On the supplier side, you must enable the replication debug level (8192) so that the server records the sid in the error log when a replication agreement initiates a replication session on a consumer.

On the consumer side, Directory Server records the result of each operation from all suppliers, along with the sid, in the access log, even with the default logging level.

The following procedure traces the origin of connections or operations in the Directory Server log files.

Prerequisites

You enabled the replication debugging level 8192 for the error log on a supplier. For details, see Configuring Directory Server to log replication-related errors.

Procedure

Identify the sid of the operation you want to trace in the access log file on a consumer. For example, to trace the origin of the operation that modified the uid=example_user,ou=people,dc=example,dc=com entry:

Retrieve the connection and operation numbers of the modify operation:

# grep "uid=example_user,ou=people,dc=example,dc=com" /var/log/dirsrv/slapd-<consumer_instance>/access

[03/Nov/2025:04:35:48.308889287 -0500] conn=1 op=5 MOD dn="uid=example_user,ou=people,dc=example,dc=com"

Search the access log for the result entry that contains the previously retrieved connection and operation numbers:

# grep "conn=1 op=5" /var/log/dirsrv/slapd-<consumer_instance>/access

[03/Nov/2025:04:35:48.350994382 -0500] conn=1 op=5 RESULT err=0 tag=103 nentries=0 wtime=0.000157697 optime=0.042113370 etime=0.042269379 csn=69087772000000010000 sid="u5aM64k2pxZ 3"

The modify operation has the session tracking identifier u5aM64k2pxZ 3.

Search for the session tracking identifier u5aM64k2pxZ 3 in the error log file on each supplier:

# grep "u5aM64k2pxZ 3" /var/log/dirsrv/slapd-<supplier_instance>/errors

The error log file on the supplier_3 server contains the required sid value, which means that supplier_3 initiated the modify operation:

[03/Nov/2025:04:35:48.045087151 -0500] - DEBUG - NSMMReplicationPlugin - repl5_inc_run - sid="u5aM64k2pxZ 3" - agmt="cn=002" (supplier_3:39002): State: backoff -> sending_updates
[03/Nov/2025:04:35:48.094419605 -0500] - DEBUG - NSMMReplicationPlugin - replay_update - sid="u5aM64k2pxZ 3" - agmt="cn=002" (supplier_3:39002): Sending modify operation (dn="uid=example_user,ou=people,dc=example,dc=com" csn=69087772000000010000)
[03/Nov/2025:04:35:48.097990796 -0500] - DEBUG - NSMMReplicationPlugin - replay_update - sid="u5aM64k2pxZ 3" - agmt="cn=002" (supplier_3:39002): Receiver successfully sent operation with csn 69087772000000010000
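The manual grep steps of this procedure can be chained in a small script. The following is a minimal sketch, assuming the access and error log formats shown in the examples above. The function names `find_conn_op` and `find_sid` and the script name `trace_sid.sh` are hypothetical and not part of Directory Server.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the manual grep steps of this procedure.

# Print "conn=N op=M" for the first MOD of the given DN in an access log.
find_conn_op() {
    dn=$1; access_log=$2
    grep -F "MOD dn=\"$dn\"" "$access_log" | head -n 1 |
        sed -n 's/.*\(conn=[0-9][0-9]* op=[0-9][0-9]*\) .*/\1/p'
}

# Print the sid recorded on the RESULT line of that connection and operation.
find_sid() {
    conn_op=$1; access_log=$2
    grep -F "$conn_op RESULT" "$access_log" | head -n 1 |
        sed -n 's/.*sid="\([^"]*\)".*/\1/p'
}

# Usage: sh trace_sid.sh ENTRY_DN CONSUMER_ACCESS_LOG SUPPLIER_ERROR_LOG...
if [ "$#" -ge 3 ]; then
    dn=$1; access_log=$2; shift 2
    conn_op=$(find_conn_op "$dn" "$access_log")
    if [ -z "$conn_op" ]; then
        echo "No MOD operation found for $dn" >&2
        exit 1
    fi
    sid=$(find_sid "$conn_op" "$access_log")
    echo "sid: $sid"
    # -l lists only the supplier error logs that mention this sid.
    grep -lF "sid=\"$sid\"" "$@"
fi
```

Matching on `RESULT` after the connection and operation numbers avoids false hits such as `op=50` when you search for `op=5`.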

20.3. Overview of replication-related errors, causes, and possible solutions

The following is an overview of replication-related errors and possible solutions:

agmt=<agreement_name> (<host_name>:<port>) Replica has a different generation ID than the local data

Reason: The consumer specified in parentheses in the message has not been successfully initialized yet, or it was initialized from a different root supplier.
Impact: The local supplier will not replicate any data to the consumer.
Remedy: Ignore this message if it occurs before the consumer is initialized. Otherwise, reinitialize the consumer if the message persists. In a multi-supplier environment, all servers need to be initialized only once from a root supplier, directly or indirectly. For example, server S1 initializes S2 and S4, S2 then initializes S3, and so on. Note that S2 must not start initializing S3 until the initialization of S2 is complete. To verify this, check the total update status in the web console on S1 or in the error log of S1 or S2. Also, S2 should not initialize S1 back.

Warning: data for replica’s was reloaded, and it no longer matches the data in the changelog. Recreating the changelog file. This could affect replication with replica’s consumers, in which case the consumers should be reinitialized.

Reason: This message can appear only when you restart a supplier. It indicates that the supplier was unable to write the changelog or did not flush out its replica update vector (RUV) at its last shutdown. The former case usually happens because of a disk-space problem, and the latter case because a server crashed or was ungracefully shut down.
Impact: The server is not able to send the changes to a consumer if the consumer’s maxcsn value no longer exists in the server’s changelog.
Remedy: Check the disk space and look for possible core files under the server’s logs directory. If this is a single-supplier replication, reinitialize the consumers. Otherwise, if the server later complains that it cannot locate change sequence numbers (CSN) for a consumer, verify whether the consumer can receive the CSN from other suppliers. If not, reinitialize the consumer.

Too much time skew

Reason: The system clocks on the host machines are extremely out of sync.
Impact: Directory Server uses the system clock to generate a part of the CSN. To reflect the change sequence among multiple suppliers, suppliers forward-adjust their local clocks based on the remote clocks of the other suppliers. Because the adjustment is limited to a certain amount, any difference that exceeds the permitted limit causes the replication session to be aborted.
Remedy: Synchronize the system clocks on the Directory Server host machines, for example, by configuring the chronyd service.
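Because part of each CSN is derived from the system clock, decoding the timestamp in a CSN can hint at which host's clock is off. The following is a minimal sketch, assuming the common CSN string layout: 8 hexadecimal digits for the timestamp in seconds since the epoch, followed by 4 digits each for the sequence number, the replica ID, and the subsequence number. The `csn_time` and `csn_rid` functions are hypothetical helpers, not Directory Server commands.

```shell
#!/bin/sh
# Print the epoch seconds encoded in the first 8 hex digits of a CSN.
# Assumes the layout: 8 digits timestamp, 4 sequence, 4 replica ID, 4 subsequence.
csn_time() {
    hex=$(printf '%s' "$1" | cut -c1-8)
    printf '%d\n' "0x$hex"
}

# Print the replica ID encoded in hex digits 13-16 of a CSN.
csn_rid() {
    hex=$(printf '%s' "$1" | cut -c13-16)
    printf '%d\n' "0x$hex"
}

# CSN taken from the log examples in this chapter.
csn=69087772000000010000
echo "CSN time: $(csn_time "$csn"), local time: $(date +%s), replica ID: $(csn_rid "$csn")"
```

A large gap between the CSN time of a recent change and the local time on the receiving host suggests that the clock of the host that generated the CSN needs checking.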

agmt=<agreement_name> (<host_name>:<port>): Warning: Unable to send endReplication extended operation (error_message)

Reason: The consumer is not responding.
Impact: If the consumer recovers without being restarted, the replica on the consumer can remain locked indefinitely if it did not receive the release lock message from the supplier.
Remedy: Check whether the consumer can receive new changes from any of its suppliers, or start the replication monitor and check whether all the suppliers of this consumer warn that the replica is busy. If the replica appears to be locked indefinitely and no supplier can connect, restart the consumer.

Changelog is getting too big

Reason: Either changelog purge is turned off, which is the default setting, or changelog purge is turned on, but some consumers are far behind the supplier.
Remedy: To turn on changelog purge from the command line, enter:
```
# dsconf <instance_name> replication set-changelog --max-age 1d --suffix "dc=example,dc=com"
```
1d means 1 day. Other valid time units are s for seconds, m for minutes, h for hours, and w for weeks. A value of 0 turns off the purge.
With changelog purge turned on, a purge thread that wakes up every five minutes removes a change if its age is greater than the value you set and if it has been replayed to all the direct consumers of this supplier or hub.
If it appears that the changelog is not purged when the purge threshold is reached, check the maximum time lag from the replication monitor among all the consumers. Irrespective of what the purge threshold is, no change will be purged before it is replayed by all the consumers.
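The purge rule described above can be sketched as follows. This only illustrates the logic and does not show how Directory Server implements it. The `max_age_seconds` and `can_purge` functions are hypothetical, and treating a value without a unit as seconds is an assumption.

```shell
#!/bin/sh
# Convert a --max-age style value such as 1d or 12h to seconds.
# max_age_seconds is a hypothetical helper for illustration only.
max_age_seconds() {
    value=$1
    number=${value%[smhdw]}   # strip a trailing unit letter, if any
    unit=${value#"$number"}   # whatever remains is the unit
    case $unit in
        s|'') echo "$number" ;;              # assumption: no unit means seconds
        m) echo $(( number * 60 )) ;;
        h) echo $(( number * 3600 )) ;;
        d) echo $(( number * 86400 )) ;;
        w) echo $(( number * 604800 )) ;;
    esac
}

# Succeed only if purging is enabled (limit > 0), the change is older than
# the limit, and every direct consumer has already replayed it.
can_purge() {
    age=$1; limit=$2; replayed_by_all=$3   # replayed_by_all: yes or no
    [ "$limit" -gt 0 ] && [ "$age" -gt "$limit" ] && [ "$replayed_by_all" = yes ]
}

# A change that is 90000 seconds old, with a 1-day limit, replayed everywhere.
can_purge 90000 "$(max_age_seconds 1d)" yes && echo "purgeable"
```

The third argument reflects the last condition in the text: a change that a consumer has not yet replayed is kept regardless of its age.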
