How do you prepare for the unexpected? Start by creating a disaster recovery plan — then put it aside. Your survival won’t depend on the plan, but rather on the hours you’ve clocked getting ready for that fateful moment.
A disaster recovery plan can be a great resource, but if you’re hunched over it reading during a crisis, you’re done. You have to trust that you’ve organized your systems logically and that you test them regularly. Pay close attention to that last part, because it is the key to a successful disaster recovery program.
I was put to the test in July 2007. I had just taken my position as the IT manager at Oetiker, a clamp manufacturer located in Marlette, Mich. When I tested the company’s backups, I found that they didn’t work. Before I came on, every server, and every partition on every server, was being backed up. Even backups were being backed up. The tapes would fill up, so maybe only a quarter of each backup would actually be saved. We cleaned up shop and got everything running on our Symantec Backup Exec software, and we went from 1.5 terabytes of backed-up data down to 450 gigabytes.
We had just finished moving a couple of the servers over, and with all the problems we were having, I figured a good, old-fashioned reboot of all the servers was called for. When I did that, the RAID controller failed on the ancient server that housed our enterprise resource planning (ERP) system. It was about 5 p.m. on a Sunday, and I was dead in the water. I was just about to decommission an HP DL380 G4 anyway, so I sped that up: I backed up the DL380, which was now about to be recommissioned, and did a clean install on it.
I then installed the ERP system, which took a while because I had to find the disks. I did a restore from our backup tapes, rebooted, and it was up and running before anybody walked in the door Monday morning. I left work at 4 a.m., which gave me an hour and 45 minutes before people began rolling into the office.
Like most ERP systems, ours essentially runs the company. Without it up and running, we would have been stopped in our tracks. So you could say I was forced to prove my worth rather quickly. That initiation made the essentials of disaster recovery clear to me:
- Test your backup systems at least quarterly. Don’t just check that the data is being backed up; check that you can actually get it back if needed.
- Be in the know. Make sure your systems are set up to notify you immediately if anything fails. That way, no matter how bad it is, you’ll have time to process the information. We use ActiveXperts software, which monitors pretty much everything from available hard-drive space to whether network cables are plugged in. We also use Insight Manager to monitor our servers globally, and it notifies me of issues as well.
- Earn your own confidence. Have everything in place so that you can respond to whatever fails. Then try to relax; in a worst-case scenario, it’s just a matter of figuring out the best solution and action plan.
- Trust in the people you’ve surrounded yourself with. Repeatedly put your staff to the test until you’re certain they’ll ace it every time.
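The first point, checking that you can get the data back rather than just that it was backed up, is the one most easily skipped. As a minimal sketch, assuming you restore a sample of files from tape into a scratch directory, a short Python script can compare each restored file against the live copy by SHA-256 checksum. The function names and directory paths here are hypothetical illustrations, not part of Backup Exec or any other product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Compare every file under source_dir with its counterpart under restored_dir.

    Returns a list of problems: files missing from the restore, or files whose
    contents differ. An empty list means the restored sample matches the source.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"mismatch: {rel.as_posix()}")
    return problems
```

For example, after a test restore to a hypothetical scratch folder, `verify_restore("/data/erp", "/restore-test/erp")` should return an empty list. Note that files that change between the backup and the comparison will also show up as mismatches, so in practice you would compare against a quiet snapshot or against checksums recorded at backup time.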