As data volumes continue to rise by double-digit percentages each year, IT managers face a hard reality. Not only must they find room to store all this information, they also must keep it readily available, protected from technical glitches and safe from disasters.
At the end of 2010, the volume of stored data worldwide totaled 1.2 million petabytes (1 petabyte equals 1 million gigabytes), according to the 2010 IDC Digital Universe Study, conducted by technology researcher IDC. That’s a 50 percent rise from the year before.
The way for a business to thrive in an environment of exponentially expanding data stores, along with concerns about business continuity, is to implement a multipronged storage strategy — one that goes beyond spending more money to acquire additional storage capacity.
Analysts and vendors say the solution is to create tiers of storage for primary and backup data, as well as for disaster recovery (DR) and archived data. Here’s what a modern corporate IT infrastructure needs to keep data safe and highly available.
Primary storage systems should manage the most-used information by relying on technologies that deliver the fastest data access rates. Typically, getting the fastest rates calls for deploying solid-state drives, high-performance SCSI disk drives and Fibre Channel arrays. These technologies are speedy and reliable, but are also the most expensive.
Organizations can justify the costs by keeping data on these performance stars only while stakeholders are actively using it for core business needs. As information ages, automated management software can move it to other storage tiers built from less expensive components.
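A minimal sketch of the kind of age-based policy such lifecycle software applies. The tier names and day thresholds here are illustrative assumptions, not any vendor's defaults; in practice IT managers set these rules.

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- real policies are set by IT managers.
TIER_POLICY = [
    (timedelta(days=30), "tier1-primary"),   # hot data: SSD / Fibre Channel
    (timedelta(days=365), "tier2-backup"),   # cooling data: iSCSI / SATA arrays
]
ARCHIVE_TIER = "tier4-archive"               # cold data: tape

def choose_tier(last_accessed, now):
    """Pick a storage tier for a piece of data based on its age."""
    age = now - last_accessed
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER
```

Data that falls past the last threshold lands on the cheapest tier, which is how expensive tier-one capacity stays reserved for active information.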
To get the most out of expensive tier-one hardware, companies can use storage virtualization and thin-provisioning technologies. Virtualization creates an enterprisewide pool of storage capacity by aggregating storage area network (SAN) and network-attached storage (NAS) units, two solutions for creating smaller, localized storage pools. Virtualization also lets firms centrally manage these storage resources and scale up capacity without production downtime that impairs business continuity.
Thin provisioning builds on storage virtualization by letting IT managers dynamically allocate available storage capacity within the enterprise pool. If one array has extra terabytes of capacity and another array is starved for space, managers can quickly divvy up resources between them.
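The core idea of thin provisioning can be sketched as follows: volumes are created with logical sizes that may oversubscribe the pool, and physical capacity is consumed only when data is actually written. This toy model (class and method names are hypothetical) is a simplification under that assumption, not any array's real API.

```python
class ThinPool:
    """Toy model of a thin-provisioned storage pool (sizes in GB)."""

    def __init__(self, physical_capacity):
        self.physical_capacity = physical_capacity
        self.allocated = 0            # physical space actually consumed
        self.volumes = {}             # volume name -> logical size and usage

    def create_volume(self, name, logical_size):
        # Logical sizes may oversubscribe the pool; no physical space
        # is consumed until data is written.
        self.volumes[name] = {"logical": logical_size, "used": 0}

    def write(self, name, gigabytes):
        vol = self.volumes[name]
        if vol["used"] + gigabytes > vol["logical"]:
            raise ValueError("write exceeds volume's logical size")
        if self.allocated + gigabytes > self.physical_capacity:
            raise ValueError("pool out of physical capacity -- add disks")
        vol["used"] += gigabytes
        self.allocated += gigabytes
```

The design choice this illustrates: two volumes can promise more combined capacity than the pool physically holds, because idle promised space costs nothing until it is used.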
Virtualized storage environments also can automatically shift resources based on pre-set conditions. Users crunching numbers or finalizing a quarterly production analysis won’t be affected by these behind-the-scenes activities.
Once all primary data is virtualized, the business can easily move copies to a tier-two backup layer for added protection. Reliable storage hardware is a must for tier two, but high performance isn’t as critical as for the first tier.
Companies can build the backup layer with arrays using less expensive iSCSI and SATA hard drives. To make sure these components are used efficiently, IT managers should also implement data deduplication technologies to avoid backing up redundant information and wasting storage capacity.
“File deduplication boils down to a simple concept,” says Joe Disher, solutions marketing manager for Overland Storage. “There is no need to have the same PowerPoint document or the same Word document stowed in 25 different folders.”
Data dedupe can reduce the amount of stored data by a factor of 10, adds Peter Elliman, senior manager for product marketing in Symantec’s Information Management Group. Beyond eliminating waste, deduplication shrinks backup jobs enough that organizations can copy all their critical data to tier-two systems within allotted, after-hours backup windows.
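Disher's 25-folders example maps directly onto a content-addressed store: content is hashed, and identical content is kept only once no matter how many paths point to it. This sketch assumes whole-file hashing for simplicity (production dedupe typically works on sub-file chunks), and the class is illustrative, not a real product's interface.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.chunks = {}     # SHA-256 digest -> content, stored once
        self.index = {}      # file path -> digest of its content

    def backup(self, path, content):
        digest = hashlib.sha256(content).hexdigest()
        self.chunks.setdefault(digest, content)  # store only if new
        self.index[path] = digest                # every path is tracked

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())
```

Twenty-five copies of the same presentation cost one copy's worth of capacity, while every path remains individually restorable.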
Organizations are now pushing to get even more value out of deduplication. Enterprises traditionally filtered data through dedupe software before it reached target storage devices — those that sat behind the backup server in their backup environments. Now the focus is also on dedupe at the client level, namely the servers where data lives.
“This reduces the data flow further upstream, which means you don’t need bigger pipes to move all the growing data around your network,” Elliman says. The combination of target and client deduplication can reduce the amount of data being moved by as much as 99 percent, he adds.
This third storage tier acts as the insurance policy for when the unthinkable happens: a wide-scale disaster that destroys individual storage arrays or an entire data center, taking important data along with it.
Successful DR planning encompasses both local data replication, to protect against minor outages, and long-distance replication to a disaster recovery site outside the geographic region of the primary data center. This separation helps ensure that a single event won’t take down both resources. IT managers need different strategies for each of these replication requirements.
An effective option for local replication includes storage arrays using SAN units built with Fibre Channel and iSCSI connectivity. Some units come with two controllers, so if one component fails the other immediately takes over to keep operations running. Achieving redundancy and replication among components within a single box makes this an economical option for high availability.
For more reliability, but at a higher price, companies can choose to connect identically configured, single-controller SANs together into so-called active-active clusters. If one unit crashes, the other immediately takes over the additional duties.
“I can have one system completely die, all the disks catch on fire,” Overland Storage’s Disher says. “And it doesn’t matter; I don’t lose any application availability.”
Choices for long-distance replication come down to weighing the pros and cons of synchronous and asynchronous data replication. With synchronous replication, data moves in discrete, acknowledged steps from the primary site to the DR site.
The main site doesn’t send the next chunk until the DR location confirms the earlier transfer has successfully completed. This ensures that no data is lost during replication because of short network outages or other problems. However, distance limitations may be a problem for some organizations.
Synchronous replication requires sites to be less than 100 miles apart (significantly less in some cases, depending on WAN characteristics and other factors). Some companies overcome this limitation with a series of intermediate sites that relay data between the primary and the DR sites.
Asynchronous replication avoids stringent distance limitations and communication delays by sending data continuously without a series of back-and-forth confirmation messages. The trade-off, however, is that asynchronous replication doesn’t offer safeguards against potential data losses during transmission.
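The trade-off between the two modes can be sketched in a few lines. This toy model (all names hypothetical) simulates a DR link that fails partway through a transfer: the synchronous path knows exactly which writes never arrived, while the asynchronous path streams on and loses them silently.

```python
class DRSite:
    """Toy DR target whose link fails after a set number of writes."""

    def __init__(self, drop_after=None):
        self.stored = []
        self.drop_after = drop_after   # simulate an outage mid-transfer

    def receive(self, write):
        if self.drop_after is not None and len(self.stored) >= self.drop_after:
            return False               # link down: write lost, no ack
        self.stored.append(write)
        return True                    # ack: write safely stored

def replicate_sync(writes, site):
    """Synchronous: stop at the first unacknowledged write, so the
    primary knows exactly what still needs to be sent."""
    for i, w in enumerate(writes):
        if not site.receive(w):
            return writes[i:]          # pending writes, nothing silently lost
    return []

def replicate_async(writes, site):
    """Asynchronous: stream everything without waiting for acks;
    writes lost during an outage go unnoticed."""
    for w in writes:
        site.receive(w)                # no confirmation checked
```

The waiting is also why synchronous replication is distance-limited: each acknowledgment round trip adds latency to the primary site's writes.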
How do you decide? Analyze data to determine what’s less critical and what can tolerate some of the risks of asynchronous replication. Protect data that requires the greatest care with more costly synchronous replications.
The technical challenges and WAN costs associated with replication force many businesses to stick with tape storage. While tape has been around for a while, it is still a tried-and-true technology for disaster recovery operations.
With tape, enterprises can safely store data in multiple, geographically dispersed locations and, in some cases, restore it faster than over a network. This is especially valuable when data volumes are large enough to strain networks if a DR site has to upload a full data store to a primary facility.
Data archiving is the process of moving data deemed inactive to a storage device for long-term retention. Low cost and reliability also make the latest tape technologies a prime choice for data archives.
Information in this layer is older and rarely accessed. Still, these files remain valuable enough to fall within corporate or regulatory retention requirements.
Smart archiving is attainable. It begins by understanding what information needs to be archived and how the organization values that information.
To create four storage tiers and subsequently send data to the right layers, IT managers need an assortment of data management tools. These applications help organizations determine how old specific files are and how regularly people access them.
Using guidelines determined by IT managers, the applications can automatically move individual files to the most appropriate tier at every point in the data’s lifecycle.
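A rough sketch of what such a tool does under the hood: walk the file system, read each file's last-access timestamp, and map it to a tier according to IT-defined guidelines. The thresholds and tier names below are illustrative assumptions.

```python
import os
import time

# Illustrative guidelines (days since last access) -- set by IT managers.
RULES = [(30, "tier1-primary"), (365, "tier2-backup")]
DEFAULT_TIER = "tier4-archive"

def classify(root):
    """Walk a directory tree and map each file to a storage tier
    based on how long it has sat unaccessed."""
    now = time.time()
    placement = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            idle_days = (now - os.stat(path).st_atime) / 86400
            tier = next(
                (t for limit, t in RULES if idle_days <= limit), DEFAULT_TIER
            )
            placement[path] = tier
    return placement
```

A real product would then migrate the files and leave stubs or update an index; this sketch stops at the classification step.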
Data management tools cross a number of product categories, from information lifecycle management and automated storage tiering technologies to data deduplication, replication, backup and recovery, and archiving solutions.
Businesses may choose to integrate point products from multiple vendors or rely on a pre-integrated suite of products from a single vendor. Either way, IT managers need to make sure the final solution can manage data from both physical and virtual IT environments.
Clear visibility across both of these environments requires close work with staff members responsible for implementing and configuring servers. Staff should make sure that storage considerations are part of the provisioning process.
To help, some of today’s backup tools can automatically detect and inventory virtual servers and then identify whether the data associated with each of them is being protected adequately.
In addition, some tools help IT managers peer inside virtual machines. This is important because most data recovery requests are for individual files, which can be hard to locate within a range of virtual machines.
Tiered storage vendors include EMC, IBM, Overland Storage, Symantec and others. Symantec recently introduced V-Ray, for its NetBackup and Backup Exec products, which lets companies see into virtual machines to better understand how to protect them.
In the end, high availability and fault tolerance have become essential elements of a sound storage management strategy — no longer separate considerations made after the storage infrastructure is in place.
“The key in this era of high data growth and high demand is to build business continuity in from the start for primary storage, for backup storage and for archival systems,” says Eric Herzog, vice president of product management and marketing for EMC's Unified Storage Division.