May 12, 2009

Test Your Backups

Successful backup and recovery can save an organization from headaches when disaster strikes.

Everyone in IT knows how important backups are. They can save an organization money and time, and possibly even the network admin’s job. But after backing up important files, it’s imperative to make certain you can actually restore them. Suppose a software glitch wipes out the entire accounting database and you’re asked to restore it: if your backups turn out to be corrupt or not functioning properly, that is the worst possible moment to find out. Regular test restores catch those problems while there is still time to fix them. Too many organizations assume their backups are solid and learn otherwise only when disaster hits.

First, let me describe my backup system. I back up all of my data to a server that has twenty 1-terabyte drives in a RAID 6 configuration. If you set one drive aside as a hot spare, the remaining 19 drives give you about 17TB of usable space (RAID 6 uses the equivalent of two drives for parity), and the array can lose any two drives at once without losing data. I use rsync scripts to run daily backups of all of my servers during off-peak hours, storing each backup in a folder named for that day of the week. When that day comes around again the following week, rsync compares the server’s current files with last week’s copy in that folder and transfers only what has changed. If you delete a file from your main file server, rsync (run with its --delete option) removes it from the backup as well. Once the data is on my backup server, I send it offsite in two ways: I rsync it to a hosted server in rack space that I rent, and I copy it to removable media, which goes to my bank vault.
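To make the weekday-folder rotation concrete, here is a minimal Python sketch that only builds the rsync command for a given day; it does not run anything. The function name daily_backup_command, the host name fileserver and all paths are hypothetical placeholders, not the author’s actual scripts. The only rsync options used are the standard -a (archive mode) and --delete.

```python
import datetime
from pathlib import PurePosixPath
from typing import List

# Folder names indexed by date.weekday(): 0 = Monday ... 6 = Sunday.
# A fixed list avoids locale-dependent strftime("%A") output.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def daily_backup_command(source: str, backup_root: str,
                         day: datetime.date) -> List[str]:
    """Build (but do not run) an rsync command that mirrors `source`
    into backup_root/<weekday>/.

    `source` may be local ("/srv/data") or remote ("host:/srv/data").
    Hypothetical helper: names and paths are placeholders.
    """
    dest = PurePosixPath(backup_root) / WEEKDAYS[day.weekday()]
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, links
        "--delete",  # drop files from the backup that are gone from the source
        source.rstrip("/") + "/",  # trailing slash: copy the folder's contents
        str(dest) + "/",
    ]


if __name__ == "__main__":
    cmd = daily_backup_command("fileserver:/srv/data", "/backups/fileserver",
                               datetime.date.today())
    print(" ".join(cmd))
```

A nightly cron job could pass the list to subprocess.run(cmd, check=True). Because each weekday folder is refreshed only once a week, a file deleted from the server survives in the other folders until each of them is next refreshed, which gives roughly a week of history.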

When testing your backups, first set up a rotation schedule that dictates when you are going to attempt a restore for each server. Pick a database or file server and assign it a fixed monthly time for its test restore; I try to do some sort of restore for every system at least once a month. Second, you do not have to restore all 2TB of your main file server. Restore a few folders picked at random from across the file server. Typically, if a backup goes bad, everything inside it goes bad, so a small sample will usually reveal the problem. Third, do not throw away your old, out-of-warranty hardware. That old server is the perfect workhorse on which to test your restores. VMware virtual machines are another option, if you have them available.
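The random spot-check can be sketched in a few lines of Python. This is an illustration under assumptions: the function name monthly_spot_checks, the system names and the folder lists are hypothetical, and in practice the folder list would come from walking the backup copy. The seed is optional and only makes a given run’s picks reproducible.

```python
import random
from typing import Dict, List, Optional, Sequence


def monthly_spot_checks(systems: Dict[str, Sequence[str]], k: int = 3,
                        seed: Optional[int] = None) -> Dict[str, List[str]]:
    """For each system, pick up to k distinct folders to restore this month."""
    rng = random.Random(seed)  # seed=None draws fresh picks on every run
    plan = {}
    for name, folders in systems.items():
        count = min(k, len(folders))  # never ask for more folders than exist
        # sample() draws without replacement, so the picks are distinct
        plan[name] = sorted(rng.sample(list(folders), count))
    return plan


# Hypothetical inventory; replace with folders found on your backup server.
SYSTEMS = {
    "fileserver": ["/Admin", "/Business", "/Faculty", "/Library", "/Students"],
    "accounting-db": ["/dumps"],
}

if __name__ == "__main__":
    for system, picks in monthly_spot_checks(SYSTEMS, k=3).items():
        print(system, picks)
```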

It’s also important to test all of your backups, not just the one you use most often. I use three different forms of backup: one onsite (the backup server) and two offsite (the hosted server and the removable media). If disaster hits and my onsite backup is destroyed, I had better be sure that my offsite backups are verified and working. I always run a test restore from my removable media before I send it to the bank vault.
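One way to run that pre-vault check is to restore a sample from the removable media and compare it, file by file, with the same files on the onsite backup server. The Python sketch below does the comparison with SHA-256 checksums from the standard library; the function names and the choice of SHA-256 are assumptions for illustration, not the author’s procedure. Compare against the backup server’s copy rather than the live server, since live files may have changed since the backup ran.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(reference: Path, restored: Path) -> List[str]:
    """List files under `reference` that are missing or differ in `restored`.

    Extra files that exist only in `restored` are not reported.
    """
    problems = []
    for src in sorted(reference.rglob("*")):
        if not src.is_file():
            continue  # only regular files are compared; empty dirs are skipped
        rel = src.relative_to(reference)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

Called as compare_trees(Path("/backups/fileserver/Friday/Business"), Path("/mnt/restore-test/Business")), with both paths hypothetical, an empty list means every sampled file came back intact.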

I typically keep Linux and Windows base images loaded on my test servers. This lets me test restores of both MySQL and Microsoft SQL Server, along with other OS-specific programs. The hardware really does not matter, because most people back up only their data, not the operating system files. As long as your data is backed up, rebuilding a server from scratch is not a worry: the operating system and application installs are readily available and easy to duplicate.

On my main backup server, a scheduled job runs during the day and copies data to my test servers; which data goes to which server depends on the week of the month. Rsync is my tool of choice for the copy. Once the copy finishes, you can attach the restored database to your test SQL Server or open some of the restored files to confirm they are intact.
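The week-based schedule might look like the following Python sketch, run once a day from cron or Task Scheduler. Everything in the ROTATION table (system names, the Friday source folders, the test-sql and test-linux hosts) is a hypothetical placeholder, as is the helper name restore_copy_command; the sketch only decides which copy to make and builds the rsync command.

```python
import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical rotation: week of month -> (backup folder, test-server target).
# Every name and path here is a placeholder, not the author's setup.
ROTATION: Dict[int, Tuple[str, str]] = {
    1: ("/backups/fileserver/Friday/", "test-linux:/restore/fileserver/"),
    2: ("/backups/accounting-db/Friday/", "test-sql:/restore/accounting-db/"),
    3: ("/backups/web-mysql/Friday/", "test-linux:/restore/web-mysql/"),
    4: ("/backups/student-db/Friday/", "test-sql:/restore/student-db/"),
}


def week_of_month(day: datetime.date) -> int:
    """1 for days 1-7, 2 for days 8-14, ..., 5 for days 29-31."""
    return (day.day - 1) // 7 + 1


def restore_copy_command(day: datetime.date,
                         rotation: Dict[int, Tuple[str, str]] = ROTATION
                         ) -> Optional[List[str]]:
    """Return the rsync command for this week's test copy, or None."""
    week = week_of_month(day)
    if week not in rotation:
        return None  # nothing scheduled, e.g. the partial fifth week
    src, dest = rotation[week]
    # --delete clears leftovers from the previous test restore on the target
    return ["rsync", "-a", "--delete", src, dest]


if __name__ == "__main__":
    cmd = restore_copy_command(datetime.date.today())
    print(" ".join(cmd) if cmd else "No test copy scheduled this week.")
```

When the function returns a command, the job would hand it to subprocess.run(cmd, check=True); on days 29 to 31 it simply skips the copy.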

I cannot stress enough how important it is to test your backups — it’s just as important as the backup itself. If you haven’t been testing your backups, start now. No one wants to think about disaster. But should it happen, a little work now can save a lot of work later — and possibly even save your job.

Justin Dover is network administrator at Harpeth Hall School in Nashville, Tenn.