Chapter 21. Solving common replication problems
Multi-supplier replication uses an eventual-consistency replication model. This means that the same entries can be changed on different servers. When replication occurs between these servers, Directory Server needs to resolve the conflicting changes. In most cases, resolution occurs automatically, based on the timestamp associated with the change on each server. The most recent change has priority. However, some conflicts require manual intervention to reach a resolution.
21.1. Identifying and solving naming conflicts
When several supplier servers receive a request to create an entry with the same distinguished name (DN), each server creates the entry with this DN and a different entry unique identifier (entry ID). The entry ID is stored in the nsuniqueid operational attribute.
For example, Server A and Server B each receive a request to create the uid=user_name,ou=people,dc=example,dc=com user entry. As a result, each server has its own entry:
On Server A, the entry has:
- uid=user_name,ou=people,dc=example,dc=com
- nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b
On Server B, the entry has:
- uid=user_name,ou=people,dc=example,dc=com
- nsuniqueid=643a461e-b61311e1-b23be826-4afeed5f
During replication, Server A replicates the newly created entry uid=user_name,ou=people,dc=example,dc=com to Server B, and Server B replicates its newly created entry to Server A, so a naming conflict occurs on each server. By comparing change sequence numbers (CSNs), each server determines which entry was created earlier. In this example, the entry on Server B was created earlier.
The automatic conflict resolution procedure changes the most recently created entry (the entry on Server A) in the following way:
- Adds the nsuniqueid value to the non-unique DN.
- Adds the nsds5replconflict attribute, which describes the operation that caused the conflict.
- Adds the ldapsubentry object class.
Now the following entries exist on both servers:
The valid entry with:
- uid=user_name,ou=people,dc=example,dc=com
- nsuniqueid=643a461e-b61311e1-b23be826-4afeed5f
The conflict entry with:
- nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b+uid=user_name,ou=people,dc=example,dc=com
- nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b
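The renaming pattern shown in the example can be sketched in a few lines of Python. This is only an illustration of how the conflict DN is composed from the original DN and the nsuniqueid value. The function name conflict_dn is hypothetical, and the sketch assumes a simple DN without escaped commas or multi-valued RDNs.

```python
def conflict_dn(dn: str, nsuniqueid: str) -> str:
    """Compose the DN of a naming-conflict entry from the original DN.

    Assumption: the DN has no escaped commas and a single-valued first RDN.
    A real tool should use a proper DN parser instead of str.partition.
    """
    # Split off the leftmost RDN, for example "uid=user_name".
    rdn, sep, parent = dn.partition(",")
    # Prefix the original RDN with the unique ID, joined by "+".
    new_rdn = f"nsuniqueid={nsuniqueid}+{rdn}"
    return f"{new_rdn}{sep}{parent}"


print(conflict_dn(
    "uid=user_name,ou=people,dc=example,dc=com",
    "a7f1758b-512211ec-b115e2e9-7dc2d46b",
))
```

With the values from the example, the function returns the same conflict DN as listed above.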
To solve the naming conflict manually, use the following procedure on each server.
Procedure
List the conflict entries:
If conflict entries exist, decide how to proceed:
- To keep only the valid entry (uid=user_name,ou=people,dc=example,dc=com) and delete the conflict entry, enter:
# dsconf <instance_name> repl-conflict delete nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b+uid=user_name,ou=People,dc=example,dc=com
- To keep only the conflict entry (nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b+uid=user_name,ou=People,dc=example,dc=com) and delete the valid entry, enter:
# dsconf <instance_name> repl-conflict swap nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b+uid=user_name,ou=People,dc=example,dc=com
- To keep both entries, specify a new relative distinguished name (RDN) to rename the conflict entry:
# dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict convert --new-rdn=uid=user_name_NEW nsuniqueid=a7f1758b-512211ec-b115e2e9-7dc2d46b+uid=user_name,ou=people,dc=example,dc=com
This command renames the conflict entry to uid=user_name_NEW,ou=people,dc=example,dc=com.
Directory Server replicates LDAP operations performed on a conflict entry. Usually, replicated operations target the entry by using the nsuniqueid of the original operation entry rather than the operation dn. However, in cases with conflict entries, the behavior might differ.
21.2. Identifying and solving orphan entry conflicts
When Directory Server replicates a delete operation and the consumer server finds that the entry to be deleted has child entries, the conflict resolution procedure creates a glue entry to avoid having orphaned entries in the directory.
In the same way, when Directory Server replicates an add operation and the consumer server cannot find the parent entry, the conflict resolution procedure creates a glue entry for the parent.
Glue entries are temporary entries that include the object classes glue and extensibleObject. Glue entries can be created in several ways:
- If the conflict resolution procedure finds a deleted entry with a matching unique identifier, the glue entry has the same attributes as the deleted entry, but with the added glue object class and the nsds5ReplConflict attribute. In such cases, either modify the glue entry to remove the glue object class and the nsds5ReplConflict attribute to keep the entry as a normal entry, or delete the glue entry and its child entries.
- The server creates an entry with the glue and extensibleObject object classes.
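The manual modification described in the first case (removing the glue object class and the nsds5ReplConflict attribute) could be expressed as an LDIF modify record. The following Python sketch only builds the LDIF text; it does not contact a server. The function name glue_to_regular_ldif is hypothetical, and whether this change alone is sufficient on a given Directory Server version is an assumption.

```python
def glue_to_regular_ldif(dn: str) -> str:
    """Build an LDIF modify record that strips the glue markers from an entry.

    Assumption: removing the "glue" object class value and all values of
    nsds5ReplConflict is enough to turn the glue entry into a normal entry.
    """
    lines = [
        f"dn: {dn}",
        "changetype: modify",
        "delete: objectClass",      # remove only the "glue" value
        "objectClass: glue",
        "-",
        "delete: nsds5ReplConflict",  # no value listed: remove all values
        "-",
    ]
    return "\n".join(lines) + "\n"


print(glue_to_regular_ldif("ou=parent,dc=example,dc=com"))
```

The resulting text could be reviewed and then applied with a standard LDAP client. The dsconf convert-glue command in the procedure below is the documented way to perform the conversion.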
Procedure
List the orphan entry conflicts:
If orphan entry conflicts exist, decide how to proceed:
- To delete a glue entry and its child entries, enter:
- To convert a glue entry into a regular entry, enter:
# dsconf <instance_name> repl-conflict convert-glue "ou=parent,dc=example,dc=com"
21.3. Identifying and solving errors about obsolete or missing suppliers
Directory Server stores information about the replication topology, such as all suppliers that send updates to other replicas, in a set of metadata called the replica update vector (RUV). An RUV contains information about the supplier, such as its ID and URL, the last change sequence number (CSN) on the local server, and the CSN of the first change. Both suppliers and consumers store RUV information, and they use it to control replication updates.
When you remove a supplier from the replication topology, information about it can remain in another replica's RUV. You can use a cleanallruv task to remove the RUV entry from all suppliers in the topology.
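To make the CSN values that appear in RUV records and log messages easier to read, the following Python sketch splits a CSN string into fields. It assumes the commonly used 20-character hexadecimal layout (8 digits of timestamp in seconds, 4 digits of sequence number, 4 digits of replica ID, 4 digits of sub-sequence number). This layout is an assumption not stated in this chapter, and the names CSN and parse_csn are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CSN:
    timestamp: int   # seconds since the epoch
    seq: int         # sequence number within that second
    replica_id: int  # ID of the replica that originated the change
    subseq: int      # sub-sequence number

    @property
    def time(self) -> datetime:
        """Timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four assumed fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seq=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseq=int(text[16:20], 16),
    )


first = parse_csn("4aac3e59000000080000")
last = parse_csn("4c6f2a02000000080000")
print(first.replica_id, first.time.isoformat(), last.time.isoformat())
```

Under this assumed layout, both sample values decode to replica ID 8, and the first CSN carries an earlier timestamp than the second.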
Prerequisites
- Replication is enabled.
Procedure
Monitor the /var/log/dirsrv/slapd-<instance_name>/errors log file and search for entries similar to the following:
[22/Jan/2021:17:16:01 -0500] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 8 ldap://server2.example.com:389} 4aac3e59000000080000 4c6f2a02000000080000] which is present in RUV [database RUV]
...
[22/Jan/2021:17:16:01 -0500] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: for replica dc=example,dc=com there were some differences between the changelog max RUV and the database RUV. If there are obsolete elements in the database RUV, you should remove them using the CLEANALLRUV task. If they are not obsolete, you should check their status to see why there are no changes from those servers in the changelog.
In this case, the replica ID 8 causes this error.
Display all RUV records and replica IDs, both valid and invalid:
Note the list of returned replica IDs: 1, 2, and 8.
Run the cleanup task for replica ID 8:
# dsconf <server1_instance_name> repl-tasks cleanallruv --suffix="dc=example,dc=com" --replica-id=8
Note that Directory Server replicates RUV cleanup tasks. Therefore, you need to start the task on only one supplier.
If one of the replicas cannot be reached, for example because it is down, you can use the --force-cleaning option to clean up the RUV immediately.
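The log check in the first step of this procedure can be automated. The following Python sketch scans log lines for the ruv_compare_ruv message and collects the replica IDs it names. The regular expression is derived only from the sample message shown in the procedure, so other message variants might not match, and the function name suspect_replica_ids is hypothetical.

```python
import re

# Matches the "does not contain element [{replica <id> ..." part of the
# sample ruv_compare_ruv message and captures the replica ID.
RUV_MISMATCH = re.compile(
    r"ruv_compare_ruv: .*does not contain element \[\{replica (\d+) "
)


def suspect_replica_ids(lines):
    """Return the sorted replica IDs named in ruv_compare_ruv mismatch lines."""
    ids = set()
    for line in lines:
        match = RUV_MISMATCH.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


sample = [
    "[22/Jan/2021:17:16:01 -0500] NSMMReplicationPlugin - ruv_compare_ruv: "
    "RUV [changelog max RUV] does not contain element "
    "[{replica 8 ldap://server2.example.com:389} "
    "4aac3e59000000080000 4c6f2a02000000080000] "
    "which is present in RUV [database RUV]",
]
print(suspect_replica_ids(sample))
```

To scan a real log, pass an open file object for the errors log instead of the sample list. An ID reported this way is only a candidate: confirm that the supplier was actually removed from the topology before running a cleanup task for it.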
Verification
Display the RUV records and replica IDs:
The command no longer returns RUV entries for replica ID 8.
21.4. Stopping cleanallruv task on a supplier
For performance or maintenance purposes, it is possible to stop the cleanallruv task if the task runs for a long time. You can use the dsconf utility to stop the task.
Prerequisites
- Replication is enabled.
Procedure
Display all cleanallruv tasks on a supplier:
The example shows that the cleanallruv task cannot be completed because the replica became unresponsive. In some cases, this can negatively impact server performance.
Stop the cleanallruv task:
# dsconf <instance_name> repl-tasks abort-cleanallruv --suffix "dc=example,dc=com" --replica-id 12
Additionally, you can use the --certify option to force Directory Server to stop the cleanallruv task on all replicas.
Verification
Display all cleanallruv tasks on the supplier: