Oct 10 2011

Power Up to Prevent Downtime

With data center downtime no longer tolerable, implementation of an effective power management plan is critical.

When power fails, business flails.

Consider these recent examples: A major airline had to ground or delay flights because of a power outage that affected one of its data centers. Millions of customers of a large financial software company were unable to access tax and accounting software for an entire day because of a power failure in the company’s data center. And the data center of the leading customer relationship management platform, which services tens of thousands of businesses, shut down for an hour because of a complete power failure. During that time, customers were unable to access data stored in the data center.

Customer service and business disasters occur much more often than they should — not only at large companies, such as those described above, but at smaller companies as well. That means the focus on business continuity and power management should be top of mind for every IT and data center manager.

The key to maximizing uptime is having a comprehensive strategy for electric power, not only for everyday power and cooling, but also to prevent business continuity problems. With that strategy in place, data center managers avoid guesswork in identifying how much power is needed and how to allocate it. It also can help free up capacity to add more servers to an enclosure or rack without increasing the risk of downtime or disruption.

Business Continuity and Power

Business continuity has an inseparable relationship with power. Without continuous power, a data center is at risk for failure and all of the repercussions that it can bring.

In fact, a 2010 survey from the Data Center Users’ Group, sponsored by Emerson Network Power, found that 47 percent of respondents listed availability as a top facility or network concern, and 23 percent reported experiencing at least one power outage in the previous 12 months.

“Power and cooling are much more critical elements to the delivery of IT services than they were in the past,” says Jeff Carlat, director of HP’s Industry Standard Servers and Software business. “Look at virtualization. Now you’re dropping many applications on a given host, so the power and cooling to the host and the redundancy throughout the components are even more business-critical than they were in the past.”

In addition to virtualization, businesses are utilizing more web-enabled services along with IP telephony, making data center uptime even more critical. From the business continuity standpoint, this requires strategic thinking about the amount of power, quality of power, uninterruptible power supplies and supplemental power, and how each relates to data center uptime.

Power and Cooling Strategies

The best way to ensure continuity through power management is to upgrade and implement the most advanced power technologies available, while employing industry best practices.

First, evaluate current equipment. Anything that is more than about five years old should be replaced. And newer equipment, depending on the situation, may also be a candidate for replacement.

Industry groups like The Green Grid recommend taking additional steps, such as blocking floor cutouts (where cables run) on a raised floor to maintain floor pressure, and creating separate hot and cold aisles for equipment to ensure that it runs more efficiently.

Other steps include installing blanking plates on server racks to ensure that air doesn’t mix between the front and back of the server; implementing variable speed fans; using tunneling (basically, a system of heavy plastic flaps) to avoid mixing air between hot and cold aisles; and installing more intelligent UPSs and power distribution units (PDUs).

On the cooling and airflow front, there is still the potential for excess leakage of chilled air in data centers. It’s not uncommon today to have 50 percent of the air going to places that aren’t targeted for it, says Roger Schmidt, IBM fellow and chief engineer of data center energy efficiency. “Those data centers are wasting a huge amount of energy just by not providing the proper ventilation to server racks and storage, and by not targeting air flow.”

A new recommendation from the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) suggests actually raising the temperature of the data center to about 80 degrees Fahrenheit because new equipment can handle it. Less money is then spent on cooling, which can provide real savings.

Automated Monitoring

Today’s data centers, with their virtualized servers and applications and complex equipment, require a more automated approach. The sheer amount of equipment, along with concern about power and uptime, has led data center managers to focus more of their effort on monitoring and managing power.

Traditional techniques, however, have relied on spreadsheets, generators and UPSs. These methods face two problems in today’s world: They work best in situations where things don’t change quickly, and they are reactive rather than proactive.

“In today’s data center, there are lots of moves, adds and changes, with people putting in new servers every day and moving things around,” says Gary Anderson, business development manager for AC Power at Emerson Network Power. “Not only that, but the power density of equipment has gone up. So now, if you put in a blade chassis and fill it up, you could completely wipe out the power to that rack.”

Clearly, the old way of managing power doesn’t work in today’s complex data centers. Really understanding what’s going on across the complete power infrastructure requires consistent and comprehensive monitoring to pinpoint spikes, capacities and trouble areas. Without that information, it’s easy to overload circuits and cause the entire system to go down.

“What you want is a much more granular level of understanding of the power needed to support the equipment in the data center, which eliminates the guesswork,” Carlat explains. “With the ability to monitor the power usage of the equipment in real time, you can see your cycles and peaks over time, which allows for better overall management.”

There are two ways to approach automated monitoring: modularly or via a comprehensive monitoring system. For smaller data centers, the modular approach may make more sense, says Herman Chan, director of the Power and Management Solutions Business Unit at Raritan. The best way to do that, he says, is by combining the newest generation of intelligent PDUs with energy management software.

An intelligent PDU is a computer packed inside a rack. It monitors the power coming into the rack, down to the circuit breaker and outlet level. That capability, combined with energy management software such as that offered by Raritan and Tripp Lite, allows data center operators to collect data and trend it over time to understand when capacity is being reached or whether there is enough capacity to add another server.

It also allows users to set thresholds and get alerts before a circuit breaker trips. Environmental sensors, along with the software that reads them, can be added as well.
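The threshold-and-alert idea can be sketched in a few lines of Python. This is only an illustration, not any vendor's API: the names `CircuitReading`, `check_circuit` and `headroom_amps`, the 80 percent warning level, the 95 percent critical level and the sample readings are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CircuitReading:
    """One hypothetical reading from a rack PDU circuit."""
    rack: str
    amps: float          # measured current draw
    breaker_amps: float  # breaker rating for the circuit


def check_circuit(reading, warn=0.80, critical=0.95):
    """Classify a reading against assumed fractions of the breaker rating."""
    load = reading.amps / reading.breaker_amps
    if load >= critical:
        return "critical"
    if load >= warn:
        return "warning"
    return "ok"


def headroom_amps(reading, warn=0.80):
    """Current that can still be added before the warning level is reached."""
    return max(0.0, reading.breaker_amps * warn - reading.amps)


# Made-up sample data for three racks on 20-amp circuits.
readings = [
    CircuitReading("rack-01", amps=11.0, breaker_amps=20.0),
    CircuitReading("rack-02", amps=16.5, breaker_amps=20.0),
    CircuitReading("rack-03", amps=19.4, breaker_amps=20.0),
]

for r in readings:
    print(r.rack, check_circuit(r), round(headroom_amps(r), 1))
```

In this sketch, the headroom figure is what answers the "is there room for another server?" question: a new device fits only if its expected draw is smaller than the remaining headroom on the circuit.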

Most providers of network PDU systems that power devices within data center equipment racks have options for remote monitoring of in-rack environmental and electrical conditions and power consumption. Those options allow users to set alarm thresholds for each reported item and receive notifications via text or e-mail.

The goal, says Rich Feldhaus, product manager at Tripp Lite, is to prevent overloads and identify potential failures before they have a chance to turn into more serious problems and downtime. Higher-power-density switched PDUs often include power consumption monitoring at the outlet level so data center managers can tell exactly how much power each connected device is pulling, whether it’s going above allocated usage levels, or whether it’s even being used at all.
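As a rough illustration of what outlet-level data makes possible, the Python sketch below sorts outlets into the three cases just described. The function name `classify_outlet`, the 5-watt idle cutoff, the 400-watt allocations and the readings are hypothetical values, not figures from any PDU vendor.

```python
def classify_outlet(watts, allocated_watts, idle_watts=5.0):
    """Label one outlet from its measured draw and its allocated budget.

    idle_watts is an assumed cutoff below which the device is treated
    as not being used at all.
    """
    if watts <= idle_watts:
        return "unused"
    if watts > allocated_watts:
        return "over allocation"
    return "within allocation"


# Made-up readings: outlet name -> (measured watts, allocated watts)
outlets = {
    "outlet-1": (310.0, 400.0),
    "outlet-2": (455.0, 400.0),
    "outlet-3": (2.0, 400.0),
}

report = {name: classify_outlet(w, a) for name, (w, a) in outlets.items()}
print(report)
```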

For larger or more complex data centers, moving up to a high-end comprehensive power management system might make sense. A comprehensive monitoring system should automate the monitoring of all power-related functions, including temperature, rack conditions, cooling, heating, batteries and fluid leak detection. It should be able to manage power settings for thousands of endpoints regardless of location, connection type or status, from a single console.

Examples of tools that provide these broad capabilities include IBM’s Tivoli Endpoint Manager for Power Management; HP’s Insight Control powered by its Integrated Lights Out (iLO) Power Management Pack; Schneider Electric’s Power & Energy and Monitoring System; Raritan’s Power IQ Energy Management software; and Emerson Network Power’s Aperture Capacity Manager.

Many of these tools use technology that provides thermal imaging of the data center floor, using sensors placed throughout the data center. With thermal imaging of air temperature, airflow and air pressure under the floor, data center managers can fine-tune their power management.

Immediate Action

These are real-time tools that give facility managers the ability to see, for example, an inefficiency in the right corner of the data center. “Maybe it’s too hot or being overcooled. With that information, they can rearrange floor cutouts, vents and return air, or shut off a unit based on the readouts,” explains Albert Pepe, senior program manager for IBM’s Business Continuity Resilience Service Delivery.

These tools can go even further than that. By analyzing past and current metrics, these systems can forecast the consumption of data resources at various points in the future and determine when data center resources will run out across all of an organization’s data centers. They can also oversee overall network switch port availability as well as control rack and floor space availability.
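The forecasting step can be approximated with a simple straight-line trend. The Python sketch below is an assumption about how such a projection might work, not a description of how any of the named products do it; the function names, the monthly peak figures and the 80 kW capacity are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def months_until_full(history_kw, capacity_kw):
    """Months after the last sample until the trend line reaches capacity.

    history_kw holds one peak reading per month, oldest first.
    Returns None when the trend is flat or falling.
    """
    months = list(range(len(history_kw)))
    slope, intercept = fit_line(months, history_kw)
    if slope <= 0:
        return None
    month_at_capacity = (capacity_kw - intercept) / slope
    return max(0.0, month_at_capacity - months[-1])


# Made-up monthly peak load, in kW, growing 2 kW per month.
history = [60.0, 62.0, 64.0, 66.0, 68.0, 70.0]
print(months_until_full(history, capacity_kw=80.0))
```

A straight line is the simplest possible model. Real loads tend to be seasonal and to jump when new equipment arrives, so a projection like this is best read as a rough early warning.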

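These tools also meter power usage effectiveness (PUE), which is the ratio of the total power entering the data center to the power consumed by the IT equipment on the floor. Basically, it’s a metric that tells the data center manager how efficiently the data center is running: The closer the number is to 1.0, the less power is going to cooling, distribution losses and other overhead.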

According to most industry estimates, an optimum PUE is around 1.6, yet most data centers today run at about 2.0. By metering PUE all the way down to the circuit level or the actual server, you can gain a good understanding of how much power you’re drawing within the data center, Pepe says.
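To make those numbers concrete, PUE is total facility power divided by IT equipment power. The short Python sketch below works through the two figures cited above; the 500 kW IT load is an assumed value picked only to keep the arithmetic easy, and the function names are hypothetical.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(pue_value, it_equipment_kw):
    """Power spent on everything other than the IT equipment itself."""
    return (pue_value - 1.0) * it_equipment_kw


it_load_kw = 500.0  # assumed IT load for the example

typical = pue(1000.0, it_load_kw)  # facility draws 1,000 kW in total
better = pue(800.0, it_load_kw)    # facility draws 800 kW in total
print(typical, better)

# Overhead saved by moving from the typical figure to the better one.
saved_kw = overhead_kw(typical, it_load_kw) - overhead_kw(better, it_load_kw)
print(round(saved_kw, 1))
```

With the same IT load, dropping from a PUE of 2.0 to 1.6 in this example cuts the non-IT overhead from 500 kW to roughly 300 kW.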

But just installing a comprehensive power monitoring and management system isn’t enough. It’s far from a “set it and forget it” scenario. Once it’s installed, “find all the points you can monitor and make sure the software system allows you to pick up all of those data points so you can start to really understand what’s going on in the environment,” Emerson’s Anderson says.

In addition, Anderson recommends using the collected data to better balance the loads in the data center to avoid spots of very high density and spots of very low density for both power and cooling.
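One simple way to act on that advice is to compare each rack's draw with the average and flag the outliers. The Python sketch below assumes a 25 percent tolerance band and uses invented rack names and loads; it illustrates the idea and does not represent a method described by Emerson.

```python
def density_outliers(rack_kw, tolerance=0.25):
    """Return (mean load, high-density racks, low-density racks).

    A rack is flagged when its load differs from the mean by more than
    the given fraction. The 25 percent default is an assumption.
    """
    mean_kw = sum(rack_kw.values()) / len(rack_kw)
    high = sorted(r for r, kw in rack_kw.items() if kw > mean_kw * (1 + tolerance))
    low = sorted(r for r, kw in rack_kw.items() if kw < mean_kw * (1 - tolerance))
    return mean_kw, high, low


# Made-up per-rack loads in kW.
rack_kw = {"A1": 9.0, "A2": 2.0, "A3": 5.0, "A4": 4.0}

mean_kw, high, low = density_outliers(rack_kw)
print(mean_kw, high, low)
```

Racks in the high list are candidates to shed load, and racks in the low list are candidates to receive it, subject to the cooling available at each position.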

“It’s about knowing what’s there, looking at the trends and understanding where you are going so you can make decisions ahead of time, and then increase utilization of what you have as much as possible,” Anderson adds.

Good Things Ahead

As intelligent and feature-rich as today’s power management tools are, they are improving at a staggering rate. Emerson Network Power, for example, is working on a system under the name Trellis, which will combine all points of information, from cooling, UPS and floor- and rack-based PDUs all the way down to the server, to create an accurate minute-by-minute reading of where and how power is being used in the data center.

“Right now, if I had to figure out the best spot for a server, I would have to do it manually, but eventually you’ll be able to determine the optimum spot for greatest power efficiency,” Anderson says. “We’re not quite there yet, but we’re going in that direction.”

Greater automation is the key for the future, Carlat agrees. “As automation and intelligence increases, you will have the ability to set up policy, use power capping and be able to migrate and move workflows, along with the energy needed to power and cool the infrastructure running it, without human intervention,” he says.

“It’s about orchestrating and managing the workflows. The idea is to continually optimize power efficiency without exposing the business to risks of downtime or business interruption.”

