Issue promoting domain controllers since intra-forest migration of user

  • Last Post 26 November 2015
n3ilb3 posted this 24 November 2015

Hi All,

I've hit an issue recently that has me a bit stumped. (sorry for the really long post!!)

We're in the middle of an intra-forest migration project which is going fine, we're migrating batches of users on a nightly basis and haven't had any issues to date.

This past weekend, however, I was doing unrelated lifecycle work to promote some new domain controllers.  The promotion was going fine until replication started complaining that a specific user object was not consistent in the local copy of the database, at which point replication of one read-only partition failed to complete.

So we're migrating users from domains child1, child2, child3 etc. into parent.local.  The user object that shows up in the replication error was migrated out of child1.parent.local into parent.local around 30 days ago, and that went without a hitch.  The DCs being promoted are in parent.local, all DCs in the forest are GCs, and the only partition in the forest that won't replicate in to the new DC is child1.parent.local.  When the event occurs it retries replication, and after a while the KCC kicks in and creates additional replication links out to other DCs.  These succeed in replicating in all partitions except child1.parent.local.

I can't find any remnants of the user object in the child1 domain.  I can see that an infrastructureUpdate object was created within child1's Infrastructure container at the point of migration (which I think is what it's meant to do), and none of the established DCs have any replication issues.

On promotion, the object in question replicates into the new DC fine from its partners in parent.local and I can update the object on the new DC or an existing parent1 DC and the changes replicate fine.

I've attempted three promotions and all three showed the issue.  I demoted the first and logged a call with MS; their initial suggestion was to move the object in question in parent.local to another OU.  So this past weekend I promoted another two DCs.  The first hit the issue, so I moved the object to another OU on another parent.local DC and the issue resolved itself: replication completed on the new DC, I moved the object back to its original OU, and the DC appears fine and is acting as a GC.  The second newly promoted DC (promoted 24 hours later) hit the issue again.  I moved the object as before, but unfortunately the replication issue continued.  I tried moving it both on the newly promoted DC and on other DCs in parent.local, and although the moves were fine, the same issue continued to be reported (though with the DN of the object updated to reflect the move).  In the end I demoted the server.

Has anyone come across this before?  Any pointers?

Thanks all for any suggestions in advance!
Neil

The event text is listed below:-

1/8/2015  11:10:37 AM  Error  newdc.parent.local  1084  Microsoft-Windows-ActiveDirectory_DomainService  Replication  NT AUTHORITY\ANONYMOUS LOGON

Internal event: Active Directory Domain Services could not update the following object with changes received from the following source directory service. This is because an error occurred during the application of the changes to Active Directory Domain Services on the directory service.

Object: CN=user444,OU=OurUsers,DC=parent,DC=local
Object GUID: d4454402-8844-4d44-9550-060b14451699
Source directory service: 923d123a-4daa-4505-f66e-f28042bef29d._msdcs.parent.local

Synchronization of the directory service with the source directory service is blocked until this update problem is corrected.

This operation will be tried again at the next scheduled replication.

User Action
Restart the local computer if this condition appears to be related to low system resources (for example, low physical or virtual memory).

Additional Data
Error value: 8443 The replication operation encountered a database inconsistency.

2108  Microsoft-Windows-ActiveDirectory_DomainService  Replication  NT AUTHORITY\ANONYMOUS LOGON

This event contains REPAIR PROCEDURES for the 1084 event which has previously been logged. This message indicates a specific issue with the consistency of the Active Directory Domain Services database on this replication destination. A database error occurred while applying replicated changes to the following object. The database had unexpected contents, preventing the change from being made.

Object: CN=user444,OU=OurUsers,DC=parent,DC=local
Object GUID: d4454402-8844-4d44-9550-060b14451699
Source directory service: 923d123a-4daa-4505-f66e-f28042bef29d._msdcs.parent.local
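
The fields that matter for follow-up searches are the object DN, the object GUID, the source DSA GUID and the error value. As a minimal sketch (not part of the original troubleshooting), they could be pulled out of the event message with a few regular expressions in Python. The function name `parse_replication_event` and the sample string are illustrative assumptions; the sample is abridged from the 1084 event text above.

```python
import re

# Abridged copy of the 1084 event message quoted in this post.
EVENT_TEXT = (
    "Object: CN=user444,OU=OurUsers,DC=parent,DC=local  "
    "Object GUID: d4454402-8844-4d44-9550-060b14451699  "
    "Source directory service: "
    "923d123a-4daa-4505-f66e-f28042bef29d._msdcs.parent.local    "
    "Additional Data  Error value: 8443 The replication operation "
    "encountered a database inconsistency."
)


def parse_replication_event(text):
    """Extract the object DN, object GUID, source DSA GUID and error value.

    Hypothetical helper for illustration only. The DN pattern stops at the
    first whitespace, so it would truncate DNs that contain spaces.
    """
    patterns = {
        "object_dn": r"Object:\s+(\S+)",
        "object_guid": r"Object GUID:\s+([0-9a-fA-F-]{36})",
        "source_dsa_guid": (
            r"Source directory service:\s+([0-9a-fA-F-]{36})\._msdcs\."
        ),
        "error_value": r"Error value:\s+(\d+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        # Missing fields are reported as None rather than raising.
        result[key] = match.group(1) if match else None
    return result


info = parse_replication_event(EVENT_TEXT)
for key, value in info.items():
    print(key, "=", value)
```

The extracted object GUID is the value to search for on the GCs, and the source DSA GUID identifies which partner the failing inbound replication came from.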

ZJORZ posted this 25 November 2015

Have you checked that the object no longer exists on the GCs in the parent domain?

Kind regards,
Jorge de Almeida Pinto
E-Mail: JorgeDeAlmeidaPinto@xxxxxxxxxxxxxxxx
Tel.: +31 (0)6 26.26.62.80

n3ilb3 posted this 26 November 2015

Hi,
Yes - doing a GC search against the child1 domain (base DN) on another GC (child2) doesn't find it.  I've tried searching on GUID, SID and sAMAccountName, against multiple GCs that the newly promoted DC was unable to replicate the child1 partition from at promotion.  I've also searched on the child1 DCs and not found it, but I can find the new parent.local copy in their read-only copy of parent.local.  I was using ldifde with -x so that the search includes deleted objects.
My current thinking is that replicating in a read-only copy of child1 shouldn't involve any update to the object in parent.local.  Given the event text (an attempt to update an object with that GUID), there must therefore be something in child1 with a GUID that matches the object in parent.local.  Does the newly promoted DC see the change coming in for that GUID and report in the 1084 event that it doesn't match its local copy?
Do you know of any way to see the actual change that is being replicated in at the point the 1084 event is logged?
Thanks,
Neil

ZJORZ posted this 26 November 2015

How about doing the search against the parent GCs?

Kind regards,
Jorge de Almeida Pinto

n3ilb3 posted this 26 November 2015

Thanks - results below, hopefully I'm running the command correctly to cover the search you want.
ldifde -s parentDC3.local -f foundUser.ldf -x -d DC=child1,DC=parent,DC=local -t 3268 -r "(objectGUID=\02\75.....)" - No entries found
ldifde -s parentDC3.local -f foundUser.ldf -x -d DC=parent,DC=local -t 3268 -r "(objectGUID=\02\75.....)" - 1 entry found (matches the object in parent.local)
ldifde -s child2DC1.parent.local -f foundUser.ldf -x -d DC=child1,DC=parent,DC=local -t 3268 -r "(objectGUID=\02\75.....)" - No entries found
ldifde -s child1DC1.parent.local -f foundUser.ldf -x -d DC=child1,DC=parent,DC=local -t 3268 -r "(objectGUID=\02\75.....)" - No entries found
ldifde -s child1DC1.parent.local -f foundUser.ldf -x -d DC=parent,DC=local -t 3268 -r "(objectGUID=\02\75.....)" - 1 entry found (matches the object in parent.local)
Thanks,
Neil
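
The `-r` filters above use the escaped binary form of the objectGUID, which is elided in the posts after the first two bytes. As a hedged sketch, assuming the usual Active Directory layout in which the first three GUID fields are stored little-endian, the escaped form could be derived in Python from a GUID string such as the one in the 1084 event. The helper name `guid_to_ldap_filter` is hypothetical.

```python
import uuid


def guid_to_ldap_filter(guid_string, attribute="objectGUID"):
    """Build an LDAP filter that matches a GUID stored in binary form."""
    # bytes_le returns the first three GUID fields in little-endian order,
    # followed by the last eight bytes unchanged.
    raw = uuid.UUID(guid_string).bytes_le
    # Each byte becomes a backslash followed by two hex digits.
    escaped = "".join("\\{:02x}".format(byte) for byte in raw)
    return "({}={})".format(attribute, escaped)


# GUID taken from the 1084 event text earlier in the thread.
print(guid_to_ldap_filter("d4454402-8844-4d44-9550-060b14451699"))
# prints (objectGUID=\02\44\45\d4\44\88\44\4d\95\50\06\0b\14\45\16\99)
```

The resulting string is what would go inside the quotes after `-r` in an ldifde search like the ones above.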


idarryl posted this 26 November 2015

Neil,
Take my advice with caution, as I've never needed to do this myself, but as you said "only partition in the forest that won't replicate in to the new DC is child1.parent.local", have you tried dropping that partition from the DC '923d123a-4daa-4505-f66e-f28042bef29d._msdcs.parent.local' using repadmin /rehost?
My thought is that '923d123a-4daa-4505-f66e-f28042bef29d._msdcs.parent.local' has a bad copy of the child1.parent.local partition, and the new DC is failing when copying that partition from that DC.
~
Darryl

n3ilb3 posted this 26 November 2015

Hi Darryl,
Thanks, I've not tried that yet.  The thing is, when the promotion wouldn't complete, the KCC kept creating new links to try to replicate the partition in.  I ended up with 28 replication links, none of which were successful (Last success @ never).  Would that suggest they all have a bad copy?
During the promotion, is it an option to unhost the partition that won't replicate and then initiate replication from one of the DCs for that domain?  Should I run an integrity check of the DIT on a child1 DC before doing that?
Thanks,
Neil


idarryl posted this 26 November 2015

Neil,
I'm one of the more junior members on this list, but my thinking is this: if it can't replicate that child partition from any replication partner in the parent domain, then that tells you that all copies of the partition are at least consistent (if only consistently incorrect), so dropping the partition from the existing DC will not help.  You could run an integrity check on the DIT, but I would first look for lingering objects in the child domain (event ID 1388 or 1988).  Also, have you run repadmin /replsummary to ensure that's clean?
~
Darryl

n3ilb3 posted this 26 November 2015

Hi Darryl,
I appreciate any troubleshooting suggestions :)
replsummary comes back as clean.  The only time I'm hitting the issue is with promotion of new DCs, otherwise replication is working 100%.
I've found one lingering object, details below.  I ran repadmin against multiple DCs using different source DC GUIDs and it consistently came back with the same object.
Active Directory Domain Services has identified the following lingering object on the local domain controller in advisory mode. The object had been deleted and garbage collected on the following source domain controller yet still exists on the local domain controller.
Object: CN=QTCounter\0ACNF:8db5cffb-28467-4f45-b2e4-1eeaa3185734,CN=VolumeTable,CN=FileLinks,CN=System,DC=child1,DC=parent,DC=local
Object GUID: 8db5cffb-28467-4f45-b2e4-1eeaa3185734
Source domain controller: ba816176-fde1-4c1e-8e21-d45230017f8c._msdcs.parent.local
Exporting the object using ldifde, I can see it was created in 2002 and last modified in 2014.  I can't see any reference in it to the object that is reported as inconsistent in parent.local during the promotion, and the two don't have the same GUID.
Thanks,
Neil
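
The lingering object's RDN carries a `\0ACNF:` marker followed by a GUID, which is how a name looks after conflict mangling (an escaped newline, then `CNF:` and the object's GUID). A rough Python sketch for splitting such a DN and comparing the embedded GUID with the one from the 1084 event follows. The helper name `parse_mangled_dn` is hypothetical, and the GUID pattern is deliberately loose because the value is matched exactly as it was posted.

```python
import re

# First RDN of a mangled DN: "<name>\0A<CNF|DEL>:<guid>,<rest of DN>".
MANGLED_RDN = re.compile(
    r"^CN=(?P<name>.+?)\\0A(?P<kind>CNF|DEL):(?P<guid>[0-9a-fA-F-]+),"
)


def parse_mangled_dn(dn):
    """Return (original name, marker, guid) or None if the DN is not mangled."""
    match = MANGLED_RDN.match(dn)
    if match is None:
        return None
    return match.group("name"), match.group("kind"), match.group("guid")


# DN and GUIDs copied from the events quoted in this thread.
lingering_dn = (
    r"CN=QTCounter\0ACNF:8db5cffb-28467-4f45-b2e4-1eeaa3185734,"
    "CN=VolumeTable,CN=FileLinks,CN=System,DC=child1,DC=parent,DC=local"
)
event_1084_guid = "d4454402-8844-4d44-9550-060b14451699"

name, kind, guid = parse_mangled_dn(lingering_dn)
print(name, kind, guid)
print("same object as event 1084:", guid.lower() == event_1084_guid)
```

For an ordinary DN such as the user object's, the function returns None, which makes it easy to filter an ldifde export down to conflict or deleted entries only.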

