Oct 29 2018 09:16 AM
Additional information from MS Support.
Helpful for small DFS-R instances, but does not apply to large file sets:
Elaborate on ConflictAndDeleted and Pre-Existing:
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc773238(v=ws.... - DFS Replication: Frequently Asked Questions (FAQ).
What happens if the primary member suffers a database loss during initial replication?
During initial replication, the primary member's files will always take precedence in the conflict resolution that occurs if the receiving members have different versions of files on the primary member. The primary member designation is stored in Active Directory Domain Services, and the designation is cleared after the primary member is ready to replicate, but before all members of the replication group replicate.
What happens when two users simultaneously update the same file on different servers?
When DFS Replication detects a conflict, it uses the version of the file that was saved last. It moves the other file into the DfsrPrivate\ConflictandDeleted folder (under the local path of the replicated folder on the computer that resolved the conflict). It remains there until Conflict and Deleted folder cleanup, which occurs when the Conflict and Deleted folder exceeds the configured size or DFS Replication encounters an Out of disk space error. The Conflict and Deleted folder is not replicated, and this method of conflict resolution avoids the problem of morphed directories that was possible in FRS.
When a conflict occurs, DFS Replication logs an informational event to the DFS Replication event log. This event does not require user action for the following reasons:
- It is not visible to users (it is visible only to server administrators).
- DFS Replication treats the Conflict and Deleted folder as a cache. When a quota threshold is reached, it cleans out some of those files. There is no guarantee that conflicting files will be saved.
- The conflict could reside on a server different from the origin of the conflict.
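The behavior described above (the version saved last wins, and the losing version goes into a size-limited cache) can be illustrated with a toy model. This is only a sketch and not how DFSR is implemented: the names `FileVersion`, `resolve_conflict`, and `ConflictCache` are hypothetical, and the oldest-first eviction order is an assumption made for the example.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    server: str      # hypothetical server name
    saved_at: float  # time this version was saved (epoch seconds)
    size: int        # size in bytes


def resolve_conflict(a, b):
    """Return (winner, loser): the version that was saved last wins."""
    return (a, b) if a.saved_at >= b.saved_at else (b, a)


class ConflictCache:
    """Toy stand-in for the ConflictAndDeleted folder, treated as a cache."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.entries = OrderedDict()  # path -> FileVersion, oldest first

    def add(self, path, version):
        self.entries[path] = version
        # Assumption: evict the oldest entries until the cache fits the quota.
        while sum(v.size for v in self.entries.values()) > self.quota_bytes:
            self.entries.popitem(last=False)


cache = ConflictCache(quota_bytes=100)
winner, loser = resolve_conflict(
    FileVersion("SRV-A", saved_at=10.0, size=60),
    FileVersion("SRV-B", saved_at=12.0, size=60),
)
cache.add("report.docx", loser)  # the losing copy is cached, not guaranteed
```

Adding a second 60-byte loser to this cache would push it over the 100-byte quota and evict `report.docx`, which is why conflicting files are not guaranteed to be recoverable.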
https://blogs.technet.microsoft.com/askds/2010/01/05/understanding-dfsr-conflict-algorithms-and-doin... - Understanding DFSR conflict algorithms (and doing something about conflicts).
Troubleshooting steps for events 4304 & 5002:
Troubleshooting DFSR Event 4304:
https://blogs.technet.microsoft.com/askds/2007/10/05/top-10-common-causes-of-slow-replication-with-d... - Top 10 Common Causes of Slow Replication with DFSR.
Many applications can create a large number of spurious sharing violations, because they create temporary files that shouldn’t be replicated. If they have a predictable extension, you can prevent DFSR from trying to replicate them by setting an exception in DFSMGMT.MSC. The default file filter excludes files matching ~*, *.bak, and *.tmp, so for example the Microsoft Office temporary files (~*) are excluded by default.
There are two kinds of DFSR events for sharing violations: Event ID 4302 and Event ID 4304. The DFSR Diagnostics combines both kinds of events and reports them only as "Event ID 4302". The following information explains more about these two kinds of events:
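To check which file names a filter would exclude before changing it in DFSMGMT.MSC, the patterns can be approximated with Python's `fnmatch`. This is a rough sketch: DFSR's own wildcard matching may differ from `fnmatch` in edge cases, and the function name `is_excluded` and the extra `*.lck` pattern are made up for the example.

```python
from fnmatch import fnmatch

# Default DFSR file filter as quoted above.
DEFAULT_FILE_FILTER = ["~*", "*.bak", "*.tmp"]


def is_excluded(filename, filters=DEFAULT_FILE_FILTER):
    """Return True if the file name matches any filter pattern.

    Both sides are lowercased so the check is case-insensitive,
    which is how Windows file names are normally compared.
    """
    name = filename.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in filters)


# Hypothetical extra pattern for an application's lock files.
custom_filter = DEFAULT_FILE_FILTER + ["*.lck"]

for candidate in ["~$report.docx", "data.TMP", "report.docx", "app.lck"]:
    print(candidate, is_excluded(candidate, custom_filter))
```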
Event ID 4302: A local sharing violation occurs when the service cannot receive an updated file because the local file is being used. This occurs on the "receive" side of the file change. The file is already replicated. However, it cannot be moved from the installing directory to the final destination.
Event ID 4304: The service cannot stage a file for replication because of a sharing violation. This occurs on the "send" side of the file change. DFSR wants to stage or copy the file for replication. However, an exclusive lock prevents this. (In our case)
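When scanning exported events, it can help to label each sharing violation with the side of the file change it occurred on. The lookup below is a minimal sketch built only from the two descriptions above; the names `SHARING_VIOLATION_EVENTS` and `describe_sharing_violation` are hypothetical.

```python
# Event ID -> (side of the file change, short reason), per the text above.
SHARING_VIOLATION_EVENTS = {
    4302: ("receive", "the updated file cannot be moved into place "
                      "because the local file is in use"),
    4304: ("send", "the file cannot be staged for replication "
                   "because of a sharing violation"),
}


def describe_sharing_violation(event_id):
    """Return a one-line summary for a DFSR sharing-violation event ID."""
    side, reason = SHARING_VIOLATION_EVENTS[event_id]
    return f"Event ID {event_id} ({side} side): {reason}"


print(describe_sharing_violation(4304))
```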
https://blogs.technet.microsoft.com/filecab/2006/05/15/troubleshooting-erroneous-sharing-violations-... - Troubleshooting erroneous sharing violations in the DFS Replication health report.
https://blogs.technet.microsoft.com/askds/2009/02/20/understanding-the-lack-of-distributed-file-lock... - Understanding (the Lack of) Distributed File Locking in DFSR.
Troubleshooting DFSR Event 5002:
Event ID 5002 is a very common DFSR warning event that is logged when connection failures occur. There are different root causes of this event, and in each case the event must be evaluated individually. There are some common root causes and resolutions listed below which should be considered only after the error codes within the events are understood.
Common errors returned with Event ID 5002
This section lists the error portion of the "Additional Information" section for the Event ID 5002 with known solutions. The majority of the events fall into two main categories: Remote Procedure Call (RPC) failures and errors returned by the service itself.
Remote Procedure Call (RPC) errors
- Error: 1723 (The RPC server is too busy to complete this operation)
- Error: 1726 (The remote procedure call failed)
- Error: 1727 (The remote procedure call failed and did not execute)
- Error: 1753 (There are no more endpoints available from the endpoint mapper.) (In our case)
Common Solutions for RPC errors:
- Make sure all DFSR servers are patched with the latest DFSR releases.
Note: When troubleshooting DFSR, always confirm that all servers are up to date with the latest DFSR hotfixes:
List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2003 and in Windows Server 2003 R2: http://support.microsoft.com/kb/958802
List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2008 and in Windows Server 2008 R2: http://support.microsoft.com/kb/968429
Updates to DFSR are released as needed, and all the servers using DFSR should be maintained as part of regular patching and maintenance schedules.
- Disable Task Offloading on all members of the Replication Group - http://technet.microsoft.com/en-us/library/cc959732.aspx
- Windows 2003 R2 (Reboot necessary) - http://support.microsoft.com/default.aspx?scid=kb;EN-US;904946
- Windows 2008 and 2008 R2 (No reboot necessary):
i. Run this command from an elevated command prompt:
netsh int ip set global taskoffload=disabled
ii. Disable and then re-enable the network interface card.
iii. To confirm the command completed successfully, run:
netsh int ip show offload
- Check for WAN accelerators. Exclude DFSR traffic either by IP address or by UUID (897e2e5f-93f3-4376-9c9c-fd2277495c27, Frs2 Service).
- Check firewall rules. Make sure DFSR traffic is not being blocked.
- Check max MTU size on your network and adjust accordingly so that servers have a common max MTU size. (Link KB)
- Update Network Card drivers to latest versions.
Service Error codes
These errors are more commonly service-related. They can also be caused by AD replication latency and RPC issues.
Error: 9026 (The connection is invalid)
Error: 9033 (The request was cancelled by a shutdown)
Error: 9027 (A failure was reported by the remote partner)
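Because the error code determines which set of solutions applies, a first triage step is to pull the code out of the event's "Additional Information" text and sort it into one of the two categories. The sketch below uses only the codes listed in this post; the function name `classify_5002_error` is hypothetical, and the assumption that the text contains the code in the form `Error: <number>` is based on how the errors are written above, not on a verified event format.

```python
import re

# Codes and descriptions exactly as listed in this post.
RPC_ERRORS = {
    1723: "The RPC server is too busy to complete this operation",
    1726: "The remote procedure call failed",
    1727: "The remote procedure call failed and did not execute",
    1753: "There are no more endpoints available from the endpoint mapper",
}
SERVICE_ERRORS = {
    9026: "The connection is invalid",
    9033: "The request was cancelled by a shutdown",
    9027: "A failure was reported by the remote partner",
}


def classify_5002_error(additional_info):
    """Return (category, code) for the first 'Error: <number>' in the text.

    category is 'rpc', 'service', or 'unknown'; code is None if no
    error number is found.
    """
    match = re.search(r"Error:\s*(\d+)", additional_info)
    if match is None:
        return ("unknown", None)
    code = int(match.group(1))
    if code in RPC_ERRORS:
        return ("rpc", code)
    if code in SERVICE_ERRORS:
        return ("service", code)
    return ("unknown", code)


# Sample text written in the same form as the list above (not a real log).
sample = "Error: 1753 (There are no more endpoints available from the endpoint mapper.)"
print(classify_5002_error(sample))
```

Codes that fall into neither table are returned as 'unknown' so they can be evaluated individually rather than matched to the wrong set of solutions.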
Describe the data difference:
There could be multiple reasons for the data inconsistency, as mentioned above (ConflictAndDeleted, Pre-Existing, sharing violations, etc.), including those covered in the articles below:
https://social.technet.microsoft.com/wiki/contents/articles/406.dfsr-does-not-replicate-temporary-fi... - DFSR Does Not Replicate Temporary Files.
https://blogs.technet.microsoft.com/askds/2007/09/04/wheres-my-file-root-cause-analysis-of-frs-and-d... - Where’s my file? Root cause analysis of FRS and DFSR data deletion.
Also, as we’re using DFSN (Namespace) in conjunction with DFSR (Replication), we need to consider the article below as well:
https://blogs.technet.microsoft.com/askds/2012/07/24/common-dfsn-configuration-mistakes-and-oversigh... - Common DFSN Configuration Mistakes and Oversights.