How do you prepare for the unexpected? Start by creating a disaster recovery plan — then put it aside. Your survival won’t depend on the plan, but rather on the hours you’ve clocked getting ready for that fateful moment.
A disaster recovery plan can be a great resource, but if you’re hunched over reading it during a crisis, you’re done. You have to trust that you’ve organized your systems logically and (pay close attention, because this is the key to a successful disaster recovery program) that you test them regularly.
I was put to the test in July 2007. I had just taken my position as the IT manager at Oetiker, a clamp manufacturer located in Marlette, Mich. When I tested the company’s backups, I found that they didn’t work. Before I came on, every server, and every partition on every server, was being backed up. Even backups were being backed up. The tapes would fill up, so maybe a quarter of the backup would be saved. We cleaned up shop and got everything running on our Symantec Backup Exec software, and we went from 1.5 terabytes of backed-up data down to 450 gigabytes.
We had just finished moving a couple of the servers over, and with all the problems we were having, I figured a good, old-fashioned reboot of all the servers was called for. When I did that, the RAID controller on the ancient server that was housing our enterprise resource planning system failed. This was a Sunday evening at about 5 p.m., and I was dead in the water at that point. I was just about to decommission an HP DL380 G4 anyway, so I sped that up, backed up the soon-to-be recommissioned server and did a clean install to the DL380.
I then installed the ERP system, which took a while because I had to find the disks. I did a restore from our backup tapes, rebooted, and it was up and running before anybody walked in the door Monday morning. I left work at 4 a.m., which gave me an hour and 45 minutes before people began rolling into the office.
Like most ERP systems, ours essentially runs the company. We would have been stopped in our tracks without an up-and-running system. So you could say I was forced to prove my worth rather quickly. With that initiation, the essentials of disaster recovery became clear to me: