Backing up Windows computers with dirvish
We use dirvish to back up our Linux servers and workstations. Until now, we have not been able to back up Windows computers the same way. That changes with an open source program we wrote.
For many years we have used dirvish to back up our Linux servers in an efficient way. dirvish presents what appear to be full snapshot backups, but they are incremental in both time to create and disk space to store them. dirvish relies heavily on rsync for efficient transfers and hard links for efficient storage. The backup server connects to the machine to be backed up (the target) over ssh for authentication and privacy, then runs rsync on the target to perform the incremental backup.
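To make the hard-link idea concrete, here is a minimal Python sketch of how a snapshot can look complete while only storing changed files. It is not dirvish's or rsync's code: the function names `snapshot` and `_same` are made up for illustration, and the size-plus-modification-time comparison is a simplified stand-in for rsync's change detection.

```python
import os
import shutil


def _same(a, b):
    """Treat two files as unchanged if size and whole-second mtime match (an assumption)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source, previous, current):
    """Build `current` as a full tree of `source`, hard-linking unchanged files from `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(current, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.normpath(os.path.join(current, rel, name))
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and _same(src, old):
                os.link(old, new)       # unchanged: costs no extra disk space
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy
```

Every snapshot directory can be browsed as if it were a full backup, yet a file that never changes occupies disk space only once.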
Problems with Windows and dirvish
Backing up Windows computers with dirvish has always been problematic, for several reasons:
- Windows has no native ssh server.
- Windows does not come with an rsync program.
- Unlike Linux, Windows does not easily allow you to back up files that are open by another process.
There are commercial ssh servers available for Windows. An excellent one that I've used is VShell, by VanDyke Software. There are no doubt others. However, these get expensive if you have to install one on every target computer you wish to back up. VShell, for example, is $350 with 3 years of updates.
Another option is cygwin. cygwin is a free and open source product that provides many Unix-like utilities, including an ssh server. However, until recently the cygwin ssh server has had a well-known problem where it hangs when using rsync. Fortunately this problem has been fixed. cygwin version 1.7.11-1 does not hang like older versions did.
So now we have a free ssh server we can use.
Since we're already using cygwin's ssh server, cygwin's rsync program is the obvious choice. No problems here.
Backing up open files
The final problem we're left with is what to do with open files on the Windows target computers. For a long time I tried to skip all open files by explicitly listing them in the dirvish configuration file. But some files are never closed: files accessed by services such as databases, and by long-running programs such as email clients, would simply never get backed up. And there's no way to list every file that might be open. What if I'm editing a Word file during a scheduled backup? By default, dirvish treats a failure to read one of these open files as a fatal error and marks the entire backup as unusable. While you can get around this by editing the dirvish source code, that is not a very elegant approach. And you're still left with the fact that these open files are never backed up.
An obvious solution to this is to use the Windows Volume Shadow Copy Service, VSS. VSS allows you to take a read-only snapshot of an entire drive. Once you have that snapshot, it can be made available under another drive letter. And the best part is that every single file in the snapshot can be read. You'll never get an "in use" error when trying to read the files.
So now we have an internally consistent set of readable files to back up.
Putting it all together
We have all of the pieces we need to use dirvish to back up a Windows target computer. But how to put it together?
It would seem like we could use dirvish's pre-client and post-client hooks to create and tear down the VSS snapshot. These are commands that dirvish will run before and after it runs rsync on the target. Unfortunately that won't work, because while the pre-client hook can create the snapshot, it will be inaccessible once the pre-client hook ends and rsync is executed.
So what we need is a program that looks and works just like rsync, but holds a VSS snapshot open for the duration of the rsync run.
I thought of modifying the rsync source, but that would create a permanent maintenance burden.
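The shape of such a wrapper can be sketched in a few lines of Python. This is only an illustration of the lifetime requirement, not how any real tool is implemented: `run_with_snapshot`, `create_snapshot`, and `delete_snapshot` are hypothetical names, and the snapshot operations are passed in as callbacks so the sketch does not pretend to know the VSS API.

```python
import subprocess


def run_with_snapshot(create_snapshot, delete_snapshot, command, runner=subprocess.call):
    """Keep a snapshot alive for exactly as long as `command` runs.

    `create_snapshot` and `delete_snapshot` are caller-supplied callbacks
    (hypothetical placeholders for the real snapshot operations).
    """
    snapshot = create_snapshot()
    try:
        # The snapshot exists for the whole run, unlike one made in a
        # separate hook that has already exited before rsync starts.
        return runner(command)
    finally:
        # Clean up even if the command fails or raises.
        delete_snapshot(snapshot)
```

Because creation, the rsync run, and cleanup all happen inside one process, the snapshot cannot disappear between steps.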
So what I did was write a program to bring together all of the parts: tb-rsync-vss. This is an open source program, licensed under the Apache License, Version 2.0. tb-rsync-vss is a native Windows executable that creates a VSS snapshot, maps it to a drive letter (which is a cygwin rsync requirement), and then calls the real rsync program with modified parameters to actually perform the backup. When rsync is complete, tb-rsync-vss cleans up and exits.
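As a rough illustration of what "modified parameters" can mean, the Python sketch below rewrites cygwin-style paths so they point at the snapshot's drive letter instead of the original drive. It is not taken from tb-rsync-vss, which is a native Windows executable and whose actual argument handling is described in its README; the function name `remap_drive` and the example arguments are assumptions made for this sketch.

```python
import re


def remap_drive(args, source_drive, snapshot_drive):
    """Point /cygdrive/<source_drive>/... arguments at the snapshot drive.

    Only arguments that begin with the source drive's cygwin path are
    changed; options and other paths pass through untouched.
    """
    pattern = re.compile(
        r"^/cygdrive/%s(?=/|$)" % re.escape(source_drive), re.IGNORECASE
    )
    replacement = "/cygdrive/%s" % snapshot_drive
    return [pattern.sub(replacement, arg) for arg in args]
```

For example, `remap_drive(["--server", ".", "/cygdrive/c/Users/"], "c", "x")` leaves the options alone and returns `/cygdrive/x/Users/` as the path, so rsync reads from the snapshot rather than the live drive.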
As far as the dirvish server is concerned, it's just running a custom version of rsync. As far as rsync knows, it's just running against a new drive (maybe drive "x:" instead of drive "c:"). The specifics of the configuration are covered in the README file.
On the tb-rsync-vss Bitbucket page I've provided the source code, a Visual Studio 2010 project file, and pre-built .msi files for the 32- and 64-bit versions of tb-rsync-vss. True Blade is giving these back to the dirvish community as a thank-you for the many years we've used and benefited from dirvish and so many other open source products.