Saturday, August 1, 2009

Backup server file changes with Rsync

.FLYINGHEAD WEBMASTER UPDATE
.TITLE Backup server file changes with Rsync
.AUTHOR David Gewirtz
.SUMMARY What if you want to set up Rsync under Windows, you’re not a Linux command-line wiz, and you don’t want to rebuild everything on your server? That’s what this article’s all about.
.OTHER
Traditional file synchronization programs work by matching the contents of two directories. When used as a backup, the contents of the source directory are mirrored to the destination directory. When used in a synchronization mode, the two directories are compared and files missing from one are copied to the other, and vice versa.

The problem with traditional file synchronization programs is they typically work by moving or copying entire files. This is fine on a local network, especially one working at gigabit speeds. But if you’re moving files from a remote server to a local machine, you might not want to move entire files, especially if the files are particularly large.

Here at ZATZ, for example, we have a number of constantly updated server files that are in excess of 20GB. Some of these are database files we can synchronize using MySQL at the record level, but others are wildly large data files that simply can’t be synchronized based on their internal structure.

And yet, because our server farm is located in Illinois and our development lab is located here in Central Florida, we want to make sure we have up-to-date complete local copies of everything running on the server. There’s just no way we can download multiple 20GB+ files to our local machines on a daily basis. We’d clog the pipes and it’d still take days to download a single file.

The solution, instead, is to only download what’s changed in the files and reconstruct the file at the local side by merging in the changes. We’re not alone in this requirement; in fact a particularly valuable Linux utility exists to do just this.

.H1 Rsync
Rsync was developed back in 1996 by Australians Andrew Tridgell and Paul Mackerras. Tridgell used Rsync as the subject of his Ph.D. thesis (thanks, Wikipedia!). As Tridgell described it, it can take a long time to get data transmitted from the rest of the world to Australia and he set out to find a way to make it go faster.

Rsync is the result of his quest, and it’s long been part of most Linux distros. But what if you want to set up Rsync under Windows, you’re not a Linux command-line wiz, and you don’t want to rebuild everything on your server? That’s what this article’s all about.

.H1 Backing up servers
We’re going to go through a scenario you can use to back up your server. As it turns out, there’s a really slick little utility that’ll get you most of the way.

Called [[http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp|DeltaCopy]], this free utility is really a front-end to the Cygwin version of Rsync. But what makes DeltaCopy nice is that it comes with just rsync.exe and the necessary Cygwin DLLs, and that means you don’t need to download the massive Cygwin install.

The first thing to do is install DeltaCopy on your server. Just double-click the installer and tell DeltaCopy to install Rsync as a service, as shown in Figure A.

.FIGPAIR A Install Rsync as a service.

It turns out there’s a slight problem doing this. DeltaCopy sets itself up as a service under the currently logged-in user, instead of under the Local System account, so you’ll have to open up Services and change the service’s logon account to make it work right.

Next, you’ll need to specify a virtual backup directory, as shown in Figure B. This is a bit annoying, but as you can see, you’re basically mapping a name (like "Backups") to a directory path (like "C:\Backups").

.FIGPAIR B Map your backup directory.
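
If you’re curious what that mapping looks like under the hood, an Rsync daemon normally stores it as a "module" in its configuration file. The fragment below is a minimal sketch of such a module, not a copy of what DeltaCopy actually writes: we’re assuming the setup screen produces something along these lines, and the comment text and read-only setting are purely illustrative.

.BEGIN_CODE
# Hypothetical module entry; the name in brackets is the virtual directory name
[Backups]
# Cygwin-style form of C:\Backups (assumed)
path = /cygdrive/c/Backups
comment = Server backups
# In this article the client only pulls files, so read-only access is enough
read only = true
.END_CODE

The name in square brackets is the name the client asks for, so it pays to remember exactly what you typed in Figure B.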

And that’s it for the server side. Make sure the service is running and you’re basically done.

.H1 Setting up the client
Next, you’re going to want to do the same install process on the machine that’s serving as the destination for the backups. Only this time, don’t install the service. What you’re doing here is merely using the DeltaCopy installer as a quick and dirty way to install rsync.exe on your machine.

DeltaCopy has a slight weirdness about it, in that it likes to think of the server as the destination machine and the client as the source machine. We didn’t like that, and decided to avoid the whole mess by simply writing our own rsync batch files. This is what you should do as well.

To do this, create a folder that you’ll use to keep your batch files and also create a destination directory for your backups. We used "backups" because we’re creative that way. Next, we’ll create the two commands you’ll need to use.

.H2 Getting the permissions right
Because Rsync is really a Linux program, it likes to use Linux permissions. These don’t map correctly (or at least they didn’t for us) when using Rsync on Windows. When we first ran Rsync, we found that none of the files and directories were accessible until we took ownership of them. This can be a pain, and if you do a backup to a local machine and then fling those backed-up files around the local network, as we do for further backup and for development, having ownership-locked files is annoying.

Fortunately, there’s a one-line batch command you can use to fix this. Took us almost half a day to find it, so send us cookies. You owe us:

.BEGIN_CODE
set CYGWIN=nontsec
.END_CODE

Yep, that tells Cygwin to not use NT security, which, paradoxically, makes it work with our Windows-based security, which, after all, is based on NT. Go figure. It works. Type it in.

.H2 Sucking the files down
Next up is the command line for pulling down the files themselves. Obviously, you’ll need to change things to work for you, but here’s our line:

.BEGIN_CODE
"C:\Program Files\Synametrics Technologies\DeltaCopy\rsync.exe" -vv -r -h -z --stats --delete-delay --partial-dir=".rsync-partials" "www.myserver.com::Backups/myDir/" "/cygdrive/D/backups/MyDir/"
.END_CODE

You can see how each parameter works in Table A.

.BEGIN_TAB_TABLE A Rsync command line elements
.TAB_TABLE_WIDTH 50 500
.TAB_TABLE_HEADER Element Description
.TAB_TABLE_ROW "C:\Program Files\Synametrics Technologies\DeltaCopy\rsync.exe" The DeltaCopy installer drops rsync.exe and the Cygwin DLLs into this program file directory and while you can add it to your path, we just found it easier to directly reference the program in its installed directory.
.TAB_TABLE_ROW -vv This is the verbosity flag. The -vv indicates more verbosity. We like verbosity. Leave this out if you don’t want Rsync to talk to you all that much.
.TAB_TABLE_ROW -r Recurse through all lower directories as well.
.TAB_TABLE_ROW -h Show all numbers in human-readable form. Case matters in Rsync. Make sure you use -h instead of -H. -H (capitalized) means "preserve hard links".
.TAB_TABLE_ROW -z Compress the data as it’s transmitted.
.TAB_TABLE_ROW --stats More verbosity. This tells Rsync to print some transfer stats when it’s done. You can leave this out if you want as well, but what fun would that be?
.TAB_TABLE_ROW --delete-delay This causes deletions in the sync to occur after the file transfer pass is complete. We like it because stuff isn’t deleted until we know the file transfer part worked.
.TAB_TABLE_ROW --partial-dir=".rsync-partials" This tells Rsync to keep partial file transfers, storing them in a directory named .rsync-partials. This is really important when transferring 25 gigabyte files for the first time. If the transfer is interrupted, this option tells Rsync to save what’s already been transferred and resume later.
.TAB_TABLE_ROW "www.myserver.com::Backups/myDir/" This is the directory on the server, located under the virtual directory.
.TAB_TABLE_ROW "/cygdrive/D/backups/MyDir/" This is the directory on the local machine. You can’t just use "D:\backups\MyDir". Instead, you need to use a Cygwin/Linux drive naming format, where "/cygdrive/D/" is the functional equivalent of "D:\".
.END_TAB_TABLE
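
Putting the two commands together, a complete batch file might look something like the sketch below. The variable names (RSYNC_EXE, SOURCE, DEST) and the exit-code check at the end are our own additions for illustration. The paths and server name are the same placeholders used above, so change them to match your setup.

.BEGIN_CODE
@echo off
rem Sketch of a backup batch file. Paths and server name are placeholders.
setlocal

rem Tell Cygwin not to apply NT security to the files it writes.
set CYGWIN=nontsec

rem Illustrative variable names; edit the values for your own machines.
set "RSYNC_EXE=C:\Program Files\Synametrics Technologies\DeltaCopy\rsync.exe"
set "SOURCE=www.myserver.com::Backups/myDir/"
set "DEST=/cygdrive/D/backups/MyDir/"

rem Pull the changes down from the server.
"%RSYNC_EXE%" -vv -r -h -z --stats --delete-delay --partial-dir=".rsync-partials" "%SOURCE%" "%DEST%"

rem Rsync exits with 0 on success; any other value means something went wrong.
if errorlevel 1 (
    echo Rsync finished with exit code %ERRORLEVEL%.
    exit /b %ERRORLEVEL%
)
echo Backup complete.
endlocal
.END_CODE

Save it with a .bat extension in the batch file folder you created earlier, and you can run it by hand or on a schedule.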

.H1 Adding more security
There you go. It’s relatively straightforward once you know what you’re doing. This is pretty much what we use here at ZATZ, although we added quite a lot of additional security. We transfer our files over SSH through a VPN and encrypt the transmission along the way. We’ll leave that as an exercise for the reader.

Good luck and now you can avoid that sinking feeling by using rsync.

.BEGIN_SIDEBAR
.H1 Product availability and resources
Download [[http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp|DeltaCopy]].
.END_SIDEBAR

.BIO