Natural disasters tore through every continent last year. Hurricanes, floods, earthquakes, wildfires and more challenged companies in ways both big and small. Emergencies happen, and not all of them are weather-driven, but they can’t be allowed to slow down business.
Take Omni Air International, for instance. The privately owned American charter airline offers flights across the globe, and downtime isn’t an option, CIO Art Seabolt says. The Tulsa, Okla.-based company must maintain its ability to monitor aircraft and crews, create and change schedules, and dispatch a ground crew within seconds — no matter what the conditions are outside.
When Omni Air can’t do that, there’s a problem, Seabolt says.
“We could do all those things manually, but within a very short period of time, we’d face huge challenges,” not to mention pushback from clients, he says.
“Bottom line: We really need the ability to instantly fail over to keep our operation running. In our scenario, we have aircraft in the air flying globally at all times. We need to know what the planes are doing and what the crews are doing at all times. We need to stay informed,” Seabolt adds, both for the safety of passengers and crews and for Omni Air’s business needs.
To meet this requirement, the company has maintained a disaster recovery site in a leased data center space, but last year, huge growth from business expansion began testing the limits of its infrastructure.
Something had to be done to make sure Omni Air wouldn’t run out of server space, Seabolt says.
At the end of the year, the company bought new HPE servers that will let it replicate its entire Hyper-V virtualized production environment. The new failover solution will ensure uptime for customers who fly in its 12 jets as well as more than 1,000 employees working in locations worldwide.
Technology and Planning Enable Disaster Recovery
Even for an organization with a less robust infrastructure than Omni Air, the ability to withstand a disaster — particularly an extended weather event — is critical. Consider the nonprofit Jewish Family & Children’s Service of Pittsburgh.
Downtime means staff can’t deliver critical services and care, says Nate Meek, former network and system administrator for JF&CS. The 100-person charity provides food pantries, adoption and immigration services, career counseling and eldercare. If the organization’s only server goes down, clients aren’t served.
“We help people, so if our only server goes down, it affects our clients who are in critical need,” Meek says.
He realized that his disaster recovery setup had to be as bulletproof as possible, so he turned to VMware vSAN, a hyperconverged infrastructure platform, to replace an older, less automated option. That way, JF&CS is prepared if anything happens.
Critical wellness services might not be at stake for Tapestry, a design house of luxury accessories and lifestyle brands in New York City. But a disaster could be fatal to the business, so it takes a similar stance when it comes to DR, making sure to back up IT resources so the company can be ready for any crisis.
Tapestry has servers at sites across the U.S. and overseas, but its DR planning starts the same way every year: with a fresh business impact analysis, says Louis DaCosta, senior manager for business continuity and DR. His team brings together each of the business lines that represent such well-known brands as Coach and Kate Spade, as well as senior management and “busy bees,” the folks working with the technology daily.
During that meeting, they discuss what each team does, who they do it for, what tools they use and their challenges. Once all that information is gathered, it’s easier to determine which applications and data need to be accessible at all times, and which can wait to be restored should disaster strike, DaCosta says.
This process of getting multiple departments involved with IT planning may be a best practice, but few organizations follow that advice, says Jason Buffington, principal analyst at research firm ESG. It’s something that must happen, especially during DR and business continuity planning, he advises.
“Successful projects start with the business needs first and work backward from there,” Buffington says. “With DR planning, you need to ask the question, What kind of agility does my organization require from IT? When you know that, you have a path forward.”
Gather Information About Critical Applications
Omni Air also takes this approach with its prep. Every year, the company sends out a survey before updating its DR plan, asking users for feedback that can be used to make purchasing and policy decisions, Seabolt says.
“We want to know how many people need to be present in a room if there’s an outage. If the system fails over, how large does it need to be to have everyone working together? How many remote users need to have the ability to access content, and what would that content be?” he says. “Users are part of the problem, so they should be part of the solution.”
A business should gather this information in a shareable spreadsheet or a DR tool, Buffington recommends. That way, senior managers and the disaster response team know which tools and apps people rely on, which servers those tools reside on and how a loss of those resources would affect productivity.
When the IT team comes to management armed with this information, it’s easier to get funding, Buffington says. “The punchline is, once you know the needs, you know how much it will cost your company to go down,” he says.
It’s also a good way to find out what really matters. “Everyone says their stuff is important, but when you reveal how much it’s going to cost, people start prioritizing,” he adds.
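The inventory Buffington describes can be roughed out in a few lines of code as easily as in a spreadsheet. The sketch below is only an illustration: the application names, server names, head counts and hourly cost figures are all hypothetical, and a real business impact analysis would weigh far more than lost productivity.

```python
from dataclasses import dataclass


@dataclass
class AppRecord:
    name: str                  # tool or app people rely on (hypothetical)
    server: str                # server the tool resides on (hypothetical)
    users_affected: int        # people who cannot work if it goes down
    cost_per_user_hour: float  # assumed productivity cost, in dollars

    def hourly_downtime_cost(self) -> float:
        # Simplistic model: every affected user loses the full hourly value.
        return self.users_affected * self.cost_per_user_hour


def rank_by_downtime_cost(apps):
    """Return the apps ordered from most to least expensive per hour down."""
    return sorted(apps, key=lambda a: a.hourly_downtime_cost(), reverse=True)


# Made-up inventory, as it might come out of a user survey.
inventory = [
    AppRecord("email", "srv-01", 100, 20.0),
    AppRecord("scheduling", "srv-02", 40, 75.0),
    AppRecord("intranet-wiki", "srv-01", 100, 2.0),
]

for app in rank_by_downtime_cost(inventory):
    print(f"{app.name} on {app.server}: "
          f"${app.hourly_downtime_cost():,.0f} per hour of downtime")
```

Sorting by estimated cost is one simple way to surface the prioritization Buffington mentions: once each line item carries a price, it becomes clearer which systems must always be available and which can wait.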
Set Disaster Recovery Objectives and Test the Plan
Choosing the right technology is also easier when the business understands its weak links and its downtime tolerance. At JF&CS, technology selection came down to finding a way to get the nonprofit’s backup and DR off-premises and away from its four office locations.
“A few years ago, we would do backups every night; then, once a week, we’d take those tapes home,” Meek explains. “Today, we still use tape backup, but we also replicate the backup to one of our offices, so we have survivability in case something happens.”
Meek says he chose vSAN because it let him buy the least amount of new hardware and do the implementation himself. “It was super simple to install,” he says. “Networking wasn’t an issue either. It took me all of three or four days from start to finish.”
Plus, maintenance can be handled remotely, and the system is expandable. “We just got an alert that we are close to the recommended disk capacity,” Meek says. “We can buy more drives and install them without having to take the system down.”
Franklin Fletcher, DR planner for Southern Glazer’s Wine & Spirits, says his company’s choice of DR technology was based on recovery time and recovery point objectives. The business, headquartered in Miami, has offices in 47 states as well as Canada and the Virgin Islands. When its servers go down, its 22,000 employees don’t work, and that’s a problem. Deliveries can’t be scheduled and retailers go without stock, so downtime is not an option.
One of the biggest mistakes IT makes during DR planning is oversimplifying the difference between backup and replication, Buffington says. “How fast is your data going to be usable again? That’s an important element when looking at technology.”
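To make the distinction concrete: a recovery point objective (RPO) caps how much recent data, measured in time, a business can afford to lose, while a recovery time objective (RTO) caps how long it can wait before data is usable again. The sketch below compares two protection methods against a pair of objectives. Every number in it is an assumption chosen for illustration, not a figure from any company in this article, and treating the copy interval as the worst-case data loss is a deliberate simplification.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionMethod:
    name: str
    copy_interval: timedelta  # time between copies; approximates worst-case data loss
    restore_time: timedelta   # assumed time until the data is usable again


def meets_objectives(method, rpo, rto):
    """True only if both the recovery point and recovery time objectives are met."""
    return method.copy_interval <= rpo and method.restore_time <= rto


# Hypothetical objectives for a critical application.
rpo = timedelta(minutes=15)
rto = timedelta(hours=1)

# Hypothetical characteristics of two approaches.
nightly_tape = ProtectionMethod(
    "nightly tape backup", timedelta(hours=24), timedelta(hours=8))
replication = ProtectionMethod(
    "replication to a second site", timedelta(minutes=5), timedelta(minutes=30))

for method in (nightly_tape, replication):
    verdict = "meets" if meets_objectives(method, rpo, rto) else "misses"
    print(f"{method.name} {verdict} the objectives")
```

Under these assumed numbers, a nightly backup protects the data but misses both objectives, which is the gap Buffington warns about when backup and replication are treated as interchangeable.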
In addition, as the adage says, even the best-laid plans can go awry, so once technology is in place, it should be tested — often. “Most businesses are very dynamic, so things are always changing,” says Tapestry’s DaCosta, who counts on virtualized server blades as the foundation of his company’s data center. “You’ve got to test to make sure your DR plan works, and if it doesn’t, a test will show where the gaps are so you can remediate.”
Ensure Continuity of Operations
It’s not enough to make sure that data is protected. The IT team also should implement orchestration and automation, then test everything to make sure it can all be brought back online.
Sandboxing is also a useful tool to ensure the IT team doesn’t inadvertently take apps and data offline when testing. “You might not be able to test all your critical applications at once since the risk of shutting it all down is a problem,” Southern Glazer’s Fletcher says. “Continuity of operations is key.”
Finally, don’t confuse network isolation tests with true system failover, DaCosta says. Without a failover test drive, a business won’t be able to tell if it’s actually protected. “When disaster strikes, you need to know the entire business can continue,” he says. “If you don’t have the right technology in place, you will pay for it in the end.”