Before Hurricane Sandy hit the Northeast, food importer Atalanta closed its main office in Elizabeth, N.J.
Even so, CIO John McLennan says, “We didn’t expect anything as bad as it was.” The next day, driving back to work, “I had to dodge billboards that were blowing on the highway,” he recalls. “It was a sight to see.”
Yet as far as Atalanta’s customers could tell, it was business as usual. A strong disaster recovery plan allowed the company to keep its systems up and deliver its food products to warehouses and customers around the country.
Sandy leveled homes, businesses and even towns. Yet three companies in the heart of the region most damaged by the storm — Atalanta, construction management firm Structure Tone and ConnectOne Bank — relied on multilayered DR and business continuity strategies to stay open even though some of their offices and locations were forced to sit dark and empty for days.
Now, they are factoring in the lessons from Sandy to make their preparedness and continuity practices even stronger.
“Our goal is not to skip a beat next time,” McLennan says. “I’m a big fan of continuous improvement.”
Plan, Then Plan Some More
When Structure Tone brought on Terrence Robbins six years ago, senior management’s first task for the new vice president/CIO was to unify the company’s disparate IT operations as a precursor to implementing a broader set of construction industry business applications.
The senior management team at the global construction services firm wanted to ensure that Structure Tone could offer its clients information about their projects in a timely manner and in a format that worked for their businesses. In making these investments, the executive managers also wanted to ensure corporate systems were highly available and resilient so that the company could provide services to its users and clients regardless of the circumstances.
That request ultimately led to a plan to centralize IT operations in the New York headquarters and begin standardizing the infrastructure, Robbins says.
As a springboard for the effort, IT LANs Director Gregory Ring virtualized the environment. Today, there are 27 HP ProLiant servers running 170 virtual machines.
“This gave us an opportunity to use the VMware virtualization technology to provide production systems without having to buy one-to-one replacements for every physical server across the organization,” Ring says. “It also provided us the ability to grow the environment while allowing flexibility to quickly deploy systems.”
Structure Tone also centralized storage, deploying Clariion CX4-120 storage area networks from EMC at its sites in New York City and Dallas. It added EMC RecoverPoint appliances, which continually sync data stored in the New York SAN to the SAN in the Dallas office, so no file is ever more than 15 minutes out of sync. To orchestrate failover, Structure Tone uses VMware vCenter Site Recovery Manager.
With 12 offices around the United States, plus locations in England, Ireland and Hong Kong — all connected via Multiprotocol Label Switching and a fully redundant mesh network — Structure Tone is able to use its regional offices as backup sites. In the United States, Northeast offices back up to the New York City data center, and three Texas offices back up to a Dallas center. Then Dallas and New York, which are on different power grids, back up to each other.
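The backup relationships described above amount to a small lookup table. The Python sketch below is illustrative only: the site labels, the grid labels and the `failover_chain` helper are hypothetical names, not Structure Tone's actual configuration. Only the hub-and-spoke layout (regional offices to a hub, hubs to each other) comes from the description above.

```python
# Hypothetical site labels; only the hub-and-spoke layout comes from the article.
BACKUP_TARGET = {
    "northeast-office": "new-york",
    "texas-office": "dallas",
    "new-york": "dallas",
    "dallas": "new-york",
}

# Placeholder grid labels: the article says only that the two hubs differ.
POWER_GRID = {"new-york": "grid-a", "dallas": "grid-b"}


def failover_chain(site: str) -> list[str]:
    """Follow backup links from `site`, stopping before any site repeats."""
    chain = []
    seen = {site}
    current = site
    while BACKUP_TARGET[current] not in seen:
        current = BACKUP_TARGET[current]
        seen.add(current)
        chain.append(current)
    return chain


def hubs_on_separate_grids() -> bool:
    """True when the two data center hubs do not share a power grid."""
    return POWER_GRID["new-york"] != POWER_GRID["dallas"]


print(failover_chain("northeast-office"))  # ['new-york', 'dallas']
print(hubs_on_separate_grids())            # True
```

Writing the topology down this way makes it easy to spot a site with no backup target, or two hubs that accidentally share a single point of failure.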
Number of storm-related tweets sent between Oct. 27 and Nov. 1. Businesses used social media to communicate internally, disseminate information to clients and correct misinformation.
SOURCE: “Seven Business Technology Resiliency Lessons Learned from Superstorm Sandy” (Forrester, April 2013)
“The RecoverPoint synchronization ensures that the data in our environments is never more than 15 minutes out of sync,” Robbins notes, which gives the company a “hot” DR environment. “Having the New York data replicated in Dallas allowed us to seamlessly transition our New York servers, using Site Recovery Manager, to our Dallas data center and point the New York servers to the Dallas SAN which had the replicated data already available for the New York servers.”
Additionally, to add resiliency, Structure Tone moved to disk-to-disk backup using EMC’s Avamar deduplication backup system. Before the switch, the company performed daily backups to tape and stored those tapes offsite. If disaster struck, data would be at least a day old, and the IT team would have to retrieve the tapes, take them to a new location, rebuild its systems and then restore data off the tapes.
It would have taken two to three days — “best-case scenario,” Robbins says. Now, data is at most five minutes old and can be restored within a half-hour.
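To put the before-and-after figures side by side, here is a rough worked comparison in Python. The `RecoveryProfile` class and its field names are invented for illustration. The numbers are the ones quoted above, with two assumptions: the tape restore uses the two-day best case, and tape data age uses the one-day minimum, so the real gap could be larger.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryProfile:
    name: str
    data_age: timedelta      # how stale restored data can be (recovery point)
    restore_time: timedelta  # how long it takes to restore (recovery time)


# Figures quoted in the article. Tape values are the most favorable ones
# mentioned (one day old, two days to restore), which is an assumption here.
TAPE = RecoveryProfile("daily tape", timedelta(days=1), timedelta(days=2))
DISK = RecoveryProfile("disk-to-disk", timedelta(minutes=5), timedelta(minutes=30))


def improvement(before: timedelta, after: timedelta) -> float:
    """How many times shorter `after` is than `before`."""
    return before / after


print(improvement(TAPE.data_age, DISK.data_age))          # 288.0
print(improvement(TAPE.restore_time, DISK.restore_time))  # 96.0
```

Under those assumptions, restored data is at least 288 times fresher and the restore itself is roughly 96 times faster.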
The Atalanta team took a slightly different approach, establishing a hot-site colocation facility, where each week it backs up data and business applications to a private cloud, alongside its DR infrastructure of HP servers, a Cisco Systems network and a Citrix virtualization environment.
Like Structure Tone, ConnectOne Bank uses its own facilities as hot sites for one another. The New Jersey company’s headquarters is in Englewood Cliffs, and its Hackensack branch has a mirrored infrastructure: the same operation centers and phone systems on different power grids and phone networks. In an emergency, calls automatically reroute to the other office, and if both locations go down, each branch also has cellphone backup numbers.
“We joke that we have a backup to the backup,” says Elizabeth Magennis, executive vice president and chief lending officer at ConnectOne Bank.
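That "backup to the backup" is an ordered fallback chain: try the primary office, then the mirrored office, then the cellphone numbers. A minimal Python sketch of the idea follows. The route labels and the `pick_route` function are hypothetical and do not describe the bank's actual phone system.

```python
from typing import Optional

# Fallback order as described in the article, for calls to headquarters.
# The labels themselves are made up for this sketch.
ROUTES = ["englewood-cliffs", "hackensack", "cellphone-backup"]


def pick_route(available: set[str]) -> Optional[str]:
    """Return the first route in priority order that is currently reachable."""
    for route in ROUTES:
        if route in available:
            return route
    return None  # nothing reachable: every layer has failed


# Headquarters phones are down, the mirrored branch is up.
print(pick_route({"hackensack", "cellphone-backup"}))  # hackensack
# Both offices are down.
print(pick_route({"cellphone-backup"}))                # cellphone-backup
```

The `None` case is worth planning for explicitly, since it is the scenario no layer of the chain covers.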
Be Ready to Move — Fast
Perhaps the biggest surprise for those affected by Sandy was the length of time they were displaced. That’s what threw off McLennan and Atalanta Help Desk Supervisor Javier Acebedo, who together developed the company’s DR plan five years ago and update it regularly.
“The extended loss of power, networking and telephone circuits really put us to the test,” McLennan says. “We didn’t account for such a widespread disaster in our plan.”
Structure Tone, a few blocks from Washington Square, sent its staff home at 1 p.m. on Monday, Oct. 29, and they didn’t get back into the building until Saturday, Nov. 3. When the office lost electricity and shifted to its uninterruptible power supply Monday night, the IT staff, who were monitoring systems remotely from their homes, knew about the outage immediately. They had 60 minutes to “gracefully” fail over the New York systems to Dallas, Robbins says.
Photo: Matthew Furman
Gregory Ring (left) and Terrence Robbins
With the SAN replication technology in place, the IT team brought up all the New York systems in the Dallas environment within 30 minutes of losing power. “It was seamless for the user,” Robbins says.
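The arithmetic behind that outcome is simple but worth making explicit: a 60-minute UPS window minus a 30-minute failover leaves a 30-minute margin. The sketch below works through it in Python. The function name and the `detection_delay` parameter are assumptions added for illustration. The article says the team knew about the outage immediately and that the 30 minutes was measured from the loss of power, so the delay defaults to zero.

```python
from datetime import timedelta

UPS_RUNTIME = timedelta(minutes=60)        # window Robbins cites
FAILOVER_DURATION = timedelta(minutes=30)  # time to bring systems up in Dallas


def failover_margin(
    ups_runtime: timedelta,
    failover_duration: timedelta,
    detection_delay: timedelta = timedelta(0),  # hypothetical: time before anyone notices
) -> timedelta:
    """Battery time left once failover completes; negative means systems drop first."""
    return ups_runtime - detection_delay - failover_duration


margin = failover_margin(UPS_RUNTIME, FAILOVER_DURATION)
print(margin)  # 0:30:00

# Hypothetical what-if: nobody notices the outage for 45 minutes.
late = failover_margin(UPS_RUNTIME, FAILOVER_DURATION, timedelta(minutes=45))
print(late < timedelta(0))  # True
```

The what-if case shows why remote monitoring mattered: with an unnoticed outage, the same 30-minute failover would not have finished before the batteries ran out.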
At ConnectOne, the phones went down in the Englewood Cliffs headquarters, and the office had problems accessing its core management system, so staff moved temporarily to Hackensack and rerouted calls to the systems there.
All of the bank’s locations, however, were open the day after the storm. Englewood Cliffs lost power, but it was able to come back online using a built-in gas generator. Hackensack also has a gas generator; other offices have portable generators. ConnectOne had a contractor onsite before and during the storm who made sure generators were running properly.
Atalanta’s DR site is about 30 minutes away from headquarters, but it sits at a higher elevation, so it fared well during the storm, McLennan says. When power went out on Monday, Atalanta was able to move operations to the backup site.
Although phone lines were down for weeks, Atalanta routed calls to a sister company in Los Angeles also owned by Gellert Global Group. When another sister company in Paramus, N.J., got power back after a week, Atalanta moved its Voice over IP phone system there.
“The day of the storm was scary,” McLennan recalls. “Power, cellphone and Internet loss was widespread throughout the New York and New Jersey metro region.”
As to advice he would offer businesses revamping their continuity plans in light of Sandy, McLennan doesn’t hesitate: “Think bigger. Mother Nature is pretty powerful.”
Anyone who tried to call a New York City mobile or landline phone during the 9/11 attacks knows how critical communications are to any DR plan. Despite extensive planning following the 2001 disaster, many companies found themselves struggling to stay in contact with employees and customers in the wake of Sandy.
Even Atalanta, ConnectOne and Structure Tone had to reinvent their strategies on the fly.
Get Everyone Back to Work
“We needed to ensure that word got out to the staff that we were up and running and ready for business and to inform them of any procedural changes,” says Atalanta’s Acebedo. “That was a big challenge for us. A lot of people wondered whether they should report to work. And, of course, there were a lot of false rumors running around.”
Atalanta’s IT team sent texts and email to both personal and work accounts, but because so many homes and cell towers were without power, it was hard to reach people. The company rented space from a hotel near the office, but most people were able to work from home. Atalanta has a bring-your-own-device policy, which requires additional day-to-day security and management, but “it really increases our mobility” and helps in a disaster, Acebedo says.
What is an acceptable recovery time objective for your business should a natural or manmade disaster take place?
36% 7 to 24 hours
25% 2 to 7 days
15% 3 to 6 hours
13% We haven’t set an RTO yet
5% 1 to 2 hours
5% Less than 1 hour
1% More than a week
SOURCE: CDW poll of 301 BizTech readers
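The poll shares can be added up to see how many respondents expect recovery within a given window. The short Python tally below uses the percentages listed above. The upper bound in hours assigned to each answer and the `share_within` helper are assumptions made for this illustration.

```python
# (answer, percent of respondents, assumed upper bound in hours)
# None marks answers with no fixed upper bound.
POLL = [
    ("Less than 1 hour", 5, 1),
    ("1 to 2 hours", 5, 2),
    ("3 to 6 hours", 15, 6),
    ("7 to 24 hours", 36, 24),
    ("2 to 7 days", 25, 7 * 24),
    ("More than a week", 1, None),
    ("We haven't set an RTO yet", 13, None),
]


def share_within(hours: int) -> int:
    """Percent of respondents whose acceptable RTO is at most `hours`."""
    return sum(pct for _, pct, bound in POLL if bound is not None and bound <= hours)


print(share_within(6))   # 25
print(share_within(24))  # 61
```

By this tally, 61 percent of respondents consider a recovery time of one day or less acceptable, and a quarter want systems back within six hours.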
Before Sandy hit, Structure Tone’s Bob Yardis, senior vice president of human resources, emailed staff with conference bridge numbers that the executive team and department heads used to communicate every day during and after the storm.
Still, Robbins acknowledges, “we had made some assumptions about the movement of our staff in the light of a disaster, and our assumptions just didn’t hold true. That was a lesson learned.”
The company provides notebook systems to all users, and IT assumed people would use them to work remotely. But many people viewed the storm warnings as overhyped and didn’t bring their notebooks home. “That caused a lot of challenges,” Robbins says.
Those who did have their notebooks should have been able to connect to Structure Tone’s network, because the company has Cisco Systems virtual private network (VPN) concentrators and Internet gateways at four sites, but many people lost Internet access at home. Structure Tone also provides aircards to employees, but so many businesses relied on cellular service during Sandy that usage soared and overwhelmed networks already crippled by the storm.
Getting notebooks to employees also proved challenging because many were dealing with personal crises, and some regional offices weren’t accessible. Structure Tone’s leaders set up a command center at the company’s Lyndhurst, N.J., office, where they triaged IT needs.
Because there was no power below 14th Street, they needed basic telecommunications equipment — smartphones and VPN access — to coordinate and communicate with Structure Tone teams working in lower Manhattan. Many of the company’s clients had severe flooding, so the company helped assess the damage and coordinated teams to help pump water out and to perform any necessary work.
“Above 14th Street, where there was still power, projects needed to function as if nothing happened,” Robbins says. The ability to fail over systems to Dallas meant those projects had no slowdowns.
Structure Tone set up mini command centers at some larger job sites, and Robbins sent his staff to those locations to help distribute notebooks and provide IT support to employees. “Even though we had great recovery of our centralized systems, our real lessons learned were all about getting people connected,” Robbins says.
Meanwhile, ConnectOne initiated a standing policy that in case of disaster, a 10-person DR team would dial in to a conference call each morning to discuss next steps. The team — broken into groups that focus on different aspects of the business, such as IT, retail, compliance, operations and loans — held multiple calls daily during Sandy, Magennis says.
“I think you need to get all the employees involved in the disaster recovery,” because it’s about far more than just technology, she says.