While building our new load-balanced hosting solution, we ran into an issue where DFS Replication (DFSR) stopped syncing the folders between servers.
It was logging the following error (Event ID 4004):
The DFS Replication service stopped replication on replicated folder FOLDERNAME at local path FOLDERPATH due to Error ID: 9098 (A tombstoned content set deletion has been scheduled).
After some research (Googling), the same post on Microsoft TechCenter kept coming up again and again:
- Look in Event Viewer and identify all the replication groups/folders that are logging the tombstone error. Once you have identified them, go into the DFS Management GUI and completely delete the replication group associated with each folder. You do not need to delete the DFS Namespace for that folder, only its replication configuration. If other replication groups in your DFSR setup are not getting the 9098 error, you do not need to do this for them.
- Stop the DFSR service (you may need to kill it with the taskkill command if it hangs while stopping).
- Give yourself permissions to the hidden System Volume Information folder. If your account is in the Domain Admins group, you can simply add that security group.
- This folder exists on every server that is a member of the replication group. In my situation, 2 of the 3 servers didn't show the folder even with hidden folders enabled. If this happens to you, the server is lying to you. It is there. Don't listen to it. (One likely reason is that System Volume Information is a protected operating system folder, which Explorer hides unless "Hide protected operating system files" is unchecked.) My suggestion is to download and use the 7-Zip File Manager. It will see the folder, help you set permissions on it, and delete files whose paths exceed the 260-character MAX_PATH limit, which is a problem if you do the next step from the command line.
Note: after you set the permissions, 7-Zip might still tell you that you don't have access to the folder. Just close 7-Zip and open it again. It should then let you into the folder and its subfolders.
- Once you have access to that folder, go ahead and delete the DFSR folder underneath it. Do this on every server that has the DFSR role installed and is a member of any replication group. You can use the rmdir command, but it fails to delete files and folders whose paths exceed the 260-character limit. This is why the 7-Zip File Manager is a better option for deleting the DFSR folder under System Volume Information. However, there are cases where 7-Zip is unable to delete a file or folder. If you run into that, use rmdir in an elevated command prompt. A combination of the two will eventually clear out everything you need to remove.
- Start the DFSR service again. This begins recreating the DFSR hashes and virtual tree that you just deleted.
- Recreate the replication group that you want.
- On the replication groups that you did not delete, you may get the warning: "The DFS Replication service initialized the replicated folder at local path <path> and is waiting to perform initial replication. The replicated folder will remain in this state until it has received replicated data, directly or indirectly, from the designated primary member."
- If you do, use the command line to set one of the DFSR servers as the primary member for that replication group. Once it is set (this is important), go into the DFS Management GUI, click the replication group showing the warning, select the Connections tab, right-click the sending member you just made primary, and choose "Replicate now...". This initializes replication. You only have to do it once, and it will replicate from then on. Choose "Replicate now..." for each receiving member that the sending (primary) member is connected to in that replication group.
- Wait about 5-10 minutes, then run the dfsrdiag backlog command for each replication group to see whether a replication backlog appears. Run it again every 5 to 10 minutes and check whether the backlog file count decreases. If it does, the folders are syncing.
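The command-line parts of the list above (setting a primary member, then watching the backlog) can be sketched in Python. This is a minimal sketch under several assumptions. First, `dfsradmin` (shipped with Windows Server 2008/2012 and deprecated later in favour of the DFSR PowerShell module) and `dfsrdiag` must be on the PATH. Second, the script must run elevated. Third, the "Backlog File Count" / "No Backlog" strings that the parser looks for are assumptions about `dfsrdiag` output. The group, folder and server names in the usage line are hypothetical.

```python
import re
import subprocess
import time
from typing import List, Optional


def primary_member_cmd(rg: str, rf: str, member: str) -> List[str]:
    # Marks `member` as the primary member whose copy wins the initial sync.
    return ["dfsradmin", "membership", "set", f"/rgname:{rg}", f"/rfname:{rf}",
            f"/memname:{member}", "/isprimary:true"]


def backlog_cmd(rg: str, rf: str, sending: str, receiving: str) -> List[str]:
    # Asks how many updates `sending` still has queued for `receiving`.
    return ["dfsrdiag", "backlog", f"/rgname:{rg}", f"/rfname:{rf}",
            f"/smem:{sending}", f"/rmem:{receiving}"]


def parse_backlog(output: str) -> Optional[int]:
    # Assumption: dfsrdiag prints "Backlog File Count: N" or "No Backlog ...".
    match = re.search(r"Backlog File Count:\s*(\d+)", output, re.IGNORECASE)
    if match:
        return int(match.group(1))
    if re.search(r"No Backlog", output, re.IGNORECASE):
        return 0
    return None  # output not recognised


def watch_backlog(rg: str, rf: str, sending: str, receiving: str,
                  checks: int = 6, interval_s: int = 600) -> bool:
    # Poll every `interval_s` seconds (default 10 minutes), as the post suggests.
    previous = None
    for i in range(checks):
        out = subprocess.run(backlog_cmd(rg, rf, sending, receiving),
                             capture_output=True, text=True).stdout
        count = parse_backlog(out)
        print(f"check {i + 1}: backlog = {count}")
        if count == 0:
            return True  # fully in sync
        if count is not None and previous is not None and count < previous:
            print("backlog is shrinking, so replication is progressing")
        previous = count
        if i < checks - 1:
            time.sleep(interval_s)
    return False
```

Hypothetical usage: `subprocess.run(primary_member_cmd("RG01", "Websites", "WEB01"), check=True)`, then `watch_backlog("RG01", "Websites", "WEB01", "WEB02")` once "Replicate now..." has been triggered in the GUI.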
The above instructions didn't solve the issue on our servers. However, after a bit more research (Googling), I found the following post, which did work:
Open an elevated command prompt on both member servers.
On each server, type:
NET STOP DFSR
This will stop the replication service from trying to replicate.
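If NET STOP DFSR hangs, the first post above suggests killing the service with taskkill. Here is a minimal Python sketch of that fallback. It assumes an elevated prompt on the member server, and it uses taskkill's "SERVICES eq DFSR" filter to find the hosting process. The 120-second timeout is an arbitrary choice.

```python
import subprocess

# Assumption: run from an elevated prompt on each member server.
STOP_CMD = ["net", "stop", "DFSR"]
# Force-ends the process hosting the DFSR service (filtered by service name).
KILL_CMD = ["taskkill", "/F", "/FI", "SERVICES eq DFSR"]


def stop_dfsr(timeout_s: int = 120) -> str:
    """Try a clean stop first; fall back to taskkill if it fails or hangs."""
    try:
        if subprocess.run(STOP_CMD, timeout=timeout_s).returncode == 0:
            return "stopped"
    except subprocess.TimeoutExpired:
        pass  # NET STOP hung, so fall through to taskkill
    kill = subprocess.run(KILL_CMD)
    return "killed" if kill.returncode == 0 else "failed"
```

A non-zero exit from NET STOP can also mean the service was already stopped. In that case the taskkill call is harmless but unnecessary.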
Next, go into Explorer on both servers and show hidden files and protected operating system files.
Go to the drives that contain the replicated folders (e.g. the W:\ drive).
Right click on “System Volume Information” and select “Properties” from the context menu.
Go to the “Security” tab and click “Edit…”.
We need to give ourselves access to this folder, so click “Add…”.
Type your administrative user name, or simply use “Domain Admins” if you prefer.
Tick the “Allow” check box for “Full Control” and click “OK”.
Click “OK” again to return to Explorer.
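The same permission change can be made from the elevated prompt with icacls instead of the Security tab. This sketch only builds and runs the icacls command. The drive letter and the "Domain Admins" principal are placeholders, and on some servers you may need the DOMAIN\Domain Admins form. It assumes your account is allowed to change the folder's permissions, which the GUI steps above also require.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def grant_full_control_cmd(drive: str, principal: str) -> List[str]:
    # Assumption: drive is a local drive letter such as "W:".
    target = PureWindowsPath(drive + "\\") / "System Volume Information"
    # (OI)(CI)F = full control, inherited by files (OI) and subfolders (CI).
    return ["icacls", str(target), "/grant", f"{principal}:(OI)(CI)F"]


def grant_full_control(drive: str, principal: str) -> None:
    # Raises CalledProcessError if icacls reports a failure.
    subprocess.run(grant_full_control_cmd(drive, principal), check=True)
```

Hypothetical usage on each member server: `grant_full_control("W:", "Domain Admins")`.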
Next, we need to return to our CMD window and type the following:
rmdir "W:\System Volume Information\DFSR" /s
This removes the DFS Replication database information for this drive, which forces DFSR to regenerate it.
Note: If this command reports errors about file names being too long, you may need to delete the files manually using a file manager that can handle paths longer than the 260-character limit. I used 7-Zip’s File Manager, which is handy for this. In 7-Zip, browse to where the folder is stored and hold SHIFT whilst clicking Delete. The folder should now delete OK.
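As an alternative to 7-Zip for the over-long paths, Python's `shutil.rmtree` can delete the tree if the path is given in the Windows extended-length form (prefixed with `\\?\`), which bypasses the MAX_PATH limit. This is a minimal sketch. It assumes Python 3 is installed on the server, that it runs elevated after the permission change above, and that the path is on a local drive (UNC paths need a different prefix). The read-only handler is a common precaution, not part of the original post.

```python
import os
import shutil
import stat
import sys


def extended_path(path: str) -> str:
    # On Windows, the extended-length prefix lets the wide-character file APIs
    # that Python uses go past the 260-character MAX_PATH limit.
    # Assumption: local drive paths only (UNC shares use a different prefix).
    prefix = "\\\\?\\"
    if os.name == "nt" and not path.startswith(prefix):
        return prefix + os.path.abspath(path)
    return path


def _clear_readonly(func, path, _exc):
    # Read-only files cannot be deleted on Windows: clear the flag and retry.
    os.chmod(path, stat.S_IWRITE)
    func(path)


def remove_tree(path: str) -> None:
    target = extended_path(path)
    if sys.version_info >= (3, 12):
        shutil.rmtree(target, onexc=_clear_readonly)  # onerror deprecated in 3.12
    else:
        shutil.rmtree(target, onerror=_clear_readonly)
```

Hypothetical usage: `remove_tree(r"W:\System Volume Information\DFSR")`, run with the DFSR service stopped.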
Once these folders have been removed from both member servers, we can go ahead and start the DFSR service again. In the CMD window, type:
NET START DFSR
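To confirm that the service actually came back up after NET START DFSR, one option is to poll `sc query DFSR` until it reports RUNNING. This sketch assumes English-language `sc` output with a "STATE : 4  RUNNING" line. The 60-second limit and 5-second poll interval are arbitrary.

```python
import re
import subprocess
import time


def service_state(sc_output: str) -> str:
    # Assumption: sc query prints a line such as "STATE : 4  RUNNING".
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else "UNKNOWN"


def start_dfsr(timeout_s: int = 60) -> bool:
    # NET START returns non-zero if DFSR is already running; state is checked below.
    subprocess.run(["net", "start", "DFSR"])
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(["sc", "query", "DFSR"],
                             capture_output=True, text=True).stdout
        if service_state(out) == "RUNNING":
            return True
        time.sleep(5)
    return False
```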
Hopefully this post can help you solve your DFS sync issues!