Clustering and DFS

I’m currently involved in an NDS to AD migration for around 2,000 users and 2 terabytes or so of data. We are replacing ZEN and eDirectory with SCCM and AD, as well as moving files, folders, permissions etc. with Quest software.

One of the project goals is to make the new file/print solution highly available, so we’ve been looking at DFS and Failover Clustering.

To me each of these technologies has advantages and disadvantages.

Wolfpack has come a long way since NT 4.0, and the good old-fashioned single-copy clustered file server is tried and tested, easy to set up, easy to administer and easy to understand. But it still has that single point of failure: there is a single copy of the live data. There is no site resilience.

DFS provides for multiple copies, but has replication to contend with, and as anybody familiar with troubleshooting DFS replication knows, it can get messy. There is no real conflict resolution (the last copy changed simply wins), and replication has a latency that is affected by the volume of changes to the data. So you can easily end up with lost data (last copy changed wins) or missing data (it’s on the other replica and this one hasn’t caught up yet).
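To make the "last copy changed wins" problem concrete, here is a minimal Python sketch. It is a toy model, not DFS-R itself: the `Version` class, the `modified_at` logical timestamp and the `resolve_conflict` function are all names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Version:
    content: str
    modified_at: int  # hypothetical logical timestamp, not a real DFS-R field


def resolve_conflict(a: Version, b: Version) -> tuple[Version, Version]:
    """Return (winner, loser) under a simple last-writer-wins rule."""
    return (a, b) if a.modified_at >= b.modified_at else (b, a)


# The same file is edited on two replicas before replication catches up.
site_a = Version(content="budget (edited in Site A)", modified_at=100)
site_b = Version(content="budget (edited in Site B)", modified_at=105)

winner, loser = resolve_conflict(site_a, site_b)
print("kept:     ", winner.content)
print("discarded:", loser.content)
```

Nothing is merged in this model. The Site A edit simply loses, which is the kind of "lost data" I mean above.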

I’ve worked with DFS for a while. I like it where we have a single read-write replica with referrals enabled, and one or more read-only replicas with referrals disabled.  So we have a master-slave replication topology, and a little bit of manual work involved to “fail-over” should the read-write replica become unavailable.

The small amount of manual work (making the hidden replica referral enabled, and switching it to be read-write) isn’t a killer for a small organisation, but it lacks that highly available feel. Hence Failover Clustering is still on the table.
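The master and standby arrangement, and the manual "fail-over", can be sketched as a toy model in Python. The server names `FS-MASTER` and `FS-STANDBY`, the `FolderTarget` class and both functions are hypothetical. The model also simplifies how referrals work: it treats an offline target as if it were never offered, whereas a real client gets a list and works through it.

```python
from dataclasses import dataclass


@dataclass
class FolderTarget:
    server: str
    referral_enabled: bool
    read_only: bool
    online: bool = True


def usable_referrals(targets):
    """Servers a client could actually use: referral enabled and online."""
    return [t.server for t in targets if t.referral_enabled and t.online]


def manual_failover(targets, standby):
    """The manual step: enable referrals on the standby and make it writable."""
    for t in targets:
        if t.server == standby:
            t.referral_enabled = True
            t.read_only = False


targets = [
    FolderTarget("FS-MASTER", referral_enabled=True, read_only=False),
    FolderTarget("FS-STANDBY", referral_enabled=False, read_only=True),
]

print(usable_referrals(targets))   # ['FS-MASTER']
targets[0].online = False          # the read-write replica is lost
print(usable_referrals(targets))   # []
manual_failover(targets, "FS-STANDBY")
print(usable_referrals(targets))   # ['FS-STANDBY']
```

The empty list in the middle is the gap that takes away the highly available feel: until somebody does the manual step, clients have nowhere to go.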

The theory

This raised an interesting proposal early on, when somebody suggested we have a single-copy file cluster using Failover Clustering on Server 2008 R2, with a remote DFS-R replica in another location.

It looks like a really nice idea. What we want is a standard Microsoft Failover Cluster File Server using SAN storage, with a remote, read-only, referral-disabled DFS replica in a DR site just in case.

The active/passive nodes of the cluster provide the application and server availability, the SAN is redundant enough in terms of storage (plus we have backups), and the DR site provides that much-needed site availability.

I’m sure we’re not the first to come up with this and try to get it to work. In fact it initially looked hopeful, based on this article that talks about using DFS-R with Failover Clustering.

http://blogs.technet.com/b/filecab/archive/2009/06/29/deploying-dfs-replication-on-a-windows-failover-cluster-part-iii.aspx

One slight but important difference is how DFS-R is used. In the article it’s used to bring branch office content into the Failover Cluster, so that it can be backed up centrally.

The headache

My headache is getting my clients to talk to the replica.

In the article and the diagram above, there is a replicated copy of all of the branch office data in the central HQ location, which is great for backups. But unless I have a way to fairly seamlessly redirect people to this replica, it’s not much use to them on a day-to-day basis.

Take my goal. I have a file cluster and a remote DR replica. If either of the nodes in the cluster goes down I’m covered, but if the entire Site A is offline I need two things:

A copy of all the data in Site B, within reason.

2,000 client workstations to be updated to know that they now have to use the replica in Site B.

On the face of it this looks like a perfect challenge for DFS. Replication to copy the data, and namespace to find it!

DFS isn’t all about replication. You also need to be able to see and get at the data, so it’s partly about namespaces. I probably show my age when I talk about DFS. Now everybody talks about DFS-N and DFS-R, while I still think of these as two components of the same technology.

Crash course: you can have a Standalone namespace, which exists on a single server, or a Domain namespace, which exists on one or more servers and in Active Directory Domain Services.

In fact, the screen you get when you create the namespace holds a clue to the Failover Cluster conversation, but I guess unless you’re thinking in terms of clustering you could easily miss it.

Here’s where some of the confusion comes in.

The following article has this to say about namespaces.

http://technet.microsoft.com/en-us/library/cc753448.aspx

Try to get your head around this.

“Server Hosting Domain-Based Namespaces. The namespace cannot be a clustered resource in a failover cluster. However, you can locate the namespace on a server that also functions as a node in a failover cluster if you configure the namespace to use only local resources on that server.”

“Stand-Alone Namespaces can be hosted by a failover cluster to increase the availability of the namespace”
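To help get my head around it, here is how I read those two statements, written down as a small Python lookup table. It is only a sketch of the quoted text: the labels such as `clustered_resource` are my own, and any combination the quote doesn’t mention is deliberately left out rather than guessed.

```python
# Only the three cases stated in the quoted documentation are encoded.
# The key is (namespace type, where the namespace is hosted).
HOSTING_RULES = {
    ("domain", "clustered_resource"): False,
    ("domain", "cluster_node_local_resources"): True,
    ("standalone", "clustered_resource"): True,
}


def can_host(namespace_type: str, host: str) -> bool:
    """Look up a combination; raise if the quoted text does not cover it."""
    try:
        return HOSTING_RULES[(namespace_type, host)]
    except KeyError:
        raise ValueError(
            f"not covered by the quoted text: {namespace_type} on {host}"
        ) from None


for (ns_type, host), allowed in HOSTING_RULES.items():
    verdict = "allowed" if allowed else "not allowed"
    print(f"{ns_type:10} namespace on {host:28} -> {verdict}")
```

Written out like this, the catch is easier to see: the only namespace type that can be a clustered resource is the standalone one.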

It gets worse. This conversation implies you can point a Domain Namespace at a clustered file resource as a target. The namespace is highly available in AD, and now the files are highly available on the cluster.

http://www.winvistatips.com/re-domain-based-dfs-namespace-clustered-not-t777401.html

“Best practice is to place the DFS Namespace Root in AD and configure the folders as clustered file shares for high availability.”

Further down it says:

“The original e-mail said ‘DFS Namespace’ which equates to DFS-N….that is all you can cluster. You cannot cluster a replicated root using DFS-R”

I’ve seen this statement substantiated elsewhere, and at face value it seems to imply that once you bring clustering into the equation there are limitations on what you can do with both DFS-N and DFS-R!

Domain Namespaces exist in AD and have multiple targets for redundancy. These targets (or links, or leaves) can be cluster file shares. Result! But when you use a file cluster as a leaf in a Domain Namespace, you cannot cluster the replicated root... but what does that mean?

So let’s look at Standalone Namespaces.

We can have a File Server resource in a failover cluster act as the namespace server for a standalone namespace. But if the Failover Cluster is lost, say in the site failure we’re trying to mitigate, the namespace goes with it, because it was standalone and attached to the cluster. That leaves no way to redirect the DFS clients to the replica we now have in the DR site.

At this point there was so much confusion that we took the option off the table, but my curiosity won’t put it down.

So let’s put it to the test. Let’s build it, and break it, and see what actually works and what doesn’t, and try to clear up what Microsoft mean when they describe what you can or can’t do.

I threw together a two-node Server 2008 R2 Failover Cluster using the Microsoft iSCSI Software Target 3.3 from here.

http://www.microsoft.com/en-us/download/details.aspx?id=19867

I used some of the very useful steps from these two excellent articles to get my Failover Cluster running on Hyper-V.

http://blogs.technet.com/b/pfe-ireland/archive/2008/05/16/how-to-create-a-windows-server-2008-cluster-within-hyper-v-using-simulated-iscsi-storage.aspx

and

http://mikemstech.blogspot.ch/2011/10/no-san-failover-clustering-with-single.html

I added a third server in my PoC domain, and installed DFS-N and DFS-R on all three servers: the two cluster nodes and the DR server.

What I’m hoping to prove here is that the various sources of information are misleading, or just not written with each other in mind, and that the clustering implications of DFS-N and the replication implications with clusters just haven’t been explored in the way I’m trying to use them.

Remember, what I want is a remote copy, and some way to redirect to it, so Domain namespace with DFS-R.

Here goes nothing!

I create a clustered file server resource called “POCFPCLFS” (Proof of Concept, File and Print CLuster, File Server; yeah, I know, it’s a mouthful) and a share called “Data”.

I then go to my DR machine in Site B and try to create a Domain Namespace using the Cluster File Server as the namespace server. First question answered: I can’t make it work. Microsoft don’t mean “please don’t” or “it’s not supported”; they mean you just can’t do it, full stop.

Not one to give up easily, I try it the other way around. I create the Domain Namespace on the DR machine, and try to sneakily add the cluster as an additional namespace server.

No luck. The error is vague, but I’m assuming, without doing too much digging, that I know why it didn’t work.

OK, so let’s not panic. I can’t create a Domain Namespace using a clustered resource, but this is what is documented, and all I’ve done so far is prove it and understand what happens if you try. I’m not stuck yet though. I create the Domain Namespace on the DR machine.

In DFS I create a folder target. I then add a second folder target, using the “Data” share on the Cluster.

And guess what, I get prompted for replication to make the content highly available.

Why not!

I create a Replication Group called “DR-SiteB” and add the File Cluster as the primary member. For now I skip the replication topology.

I then create a one-way connection from my file cluster to my DR server, and then tweak the settings so that the DR replica is referral disabled and read-only, while the file cluster is referral enabled and read-write.

And that’s basically it.

In the end it’s very simple, once you get through the complexity of what each piece of the puzzle can and cannot do in certain situations.

All that’s left is to test.

I make both replicas read-write and referral enabled, dump in some content, and go for coffee while the content replicates.

Then I kill the cluster nodes.

Result, my content is still available to my clients!!

A couple of things to note. Firstly, Server 2008 R2 Failover Cluster is clever enough to recognise that the share is being replicated using DFS-R.

However, the cluster isn’t a namespace server, so if the DR server in Site B goes down my namespace is badly affected. I need a Domain Namespace server in the primary site. For example, each of the cluster nodes would work.

No surprise really, because that’s what Microsoft say. “However, you can locate the namespace on a server that also functions as a node in a failover cluster if you configure the namespace to use only local resources on that server. ”
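The dependency is easier to see as a toy availability model in Python. This is only a sketch under my own assumptions: `NODE1`, `NODE2` and `DR` are hypothetical server names, `POCFPCLFS` is the clustered file server from my lab, and the model ignores referral caching on the client. The rule it encodes is that clients need at least one namespace server online and at least one folder target online.

```python
CLUSTER_NODES = {"NODE1", "NODE2"}  # hypothetical node names


def online(name, failed):
    """The clustered file server stays up while at least one node is up."""
    if name == "POCFPCLFS":
        return not CLUSTER_NODES.issubset(failed)
    return name not in failed


def namespace_usable(namespace_servers, folder_targets, failed):
    """Need one namespace server and one folder target online."""
    has_namespace = any(online(s, failed) for s in namespace_servers)
    has_target = any(online(t, failed) for t in folder_targets)
    return has_namespace and has_target


targets = ["POCFPCLFS", "DR"]

# Namespace hosted on the DR server only.
site_a_lost = namespace_usable(["DR"], targets, failed={"NODE1", "NODE2"})
dr_lost = namespace_usable(["DR"], targets, failed={"DR"})

# Namespace also hosted on each cluster node, using local resources.
dr_lost_fixed = namespace_usable(["DR", "NODE1", "NODE2"], targets, failed={"DR"})

print("Site A lost, namespace on DR only:     ", site_a_lost)
print("DR lost, namespace on DR only:         ", dr_lost)
print("DR lost, namespace on DR and the nodes:", dr_lost_fixed)
```

In the model, losing Site A is survivable from the start, but losing the DR server takes the namespace with it until the cluster nodes are added as namespace servers.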

Secondly, the server hosting the target is visible to your clients on the DFS tab of the share. Note that in the screenshot below you can see the domain namespace and the target you are being referred to for content, but you don’t actually see which namespace server has referred you to that particular target.

In my screenshot, I’ve killed the DR server, but the namespace is still alive and well on both cluster nodes, and the content is still safely clustered.

I did have to mess with the client to get it to see that the original folder target was no longer online, and it didn’t help that both targets and all three namespace servers are actually in one site in my lab. But this is just how DFS works, and that is the bottom line for my purposes: it’s not any different just because the cluster is involved!