As companies put ever more information into the cloud, data centers are going to become an increasingly integral part of their business. As data centers proliferate, so will the cost of operating them, and a critical part of data center operational costs is keeping the servers they house cool.
Recently, Google unit DeepMind Technologies, which focuses on artificial intelligence, announced that by applying its machine learning to Google’s own data centers, it managed to slash the amount of energy used for cooling by up to 40 percent.
Google’s servers and data centers were already very efficient, which made the gains even more impressive. “In any large scale, energy-consuming environment, this would be a huge improvement,” Rich Evans, a research engineer at DeepMind, and Jim Gao, a data center engineer at Google, wrote in a DeepMind blog post. “Given how sophisticated Google’s data centers are already, it’s a phenomenal step forward.”
The implications for running data centers, and the companies that rely on them to power key parts of their business, are significant. Evans and Gao wrote that in addition to improving efficiency at Google’s data centers, the advances “will also help other companies who run on Google’s cloud to improve their own energy efficiency.”
Using AI to Boost Data Center Efficiency
Depending on their size, data centers can use millions of gallons of water per year in the cooling systems that keep servers running. Evans and Gao noted that cooling is “typically accomplished via large industrial equipment such as pumps, chillers and cooling towers.”
Yet, they wrote, data centers are complex environments that are difficult to operate optimally because the equipment, the way it is operated, and the surrounding environment interact in complex, nonlinear ways. “Traditional formula-based engineering and human intuition often do not capture these interactions,” they added. The system also cannot adapt quickly to internal or external changes (like the weather), because engineers cannot develop rules and heuristics for every scenario.
“Each data center has a unique architecture and environment,” they wrote. “A custom-tuned model for one system may not be applicable to another. Therefore, a general intelligence framework is needed to understand the data center’s interactions.”
Google acquired DeepMind in January 2014. Later that year, DeepMind started using machine learning technology to boost data center efficiency. Machine learning technology is a form of AI that learns from, and makes predictions based upon, data it accesses.
In the last few months, according to Evans and Gao, DeepMind researchers “began working with Google’s data center team to significantly improve the system’s utility. Using a system of neural networks trained on different operating scenarios and parameters within our data centers, we created a more efficient and adaptive framework to understand data center dynamics and optimize efficiency.”
The researchers took historical data that had already been collected by thousands of sensors within the data center — measuring temperatures, power, pump speeds, setpoints, and more — and used it to train deep neural networks.
“Since our objective was to improve data center energy efficiency, we trained the neural networks on the average future PUE (power usage effectiveness), which is defined as the ratio of the total building energy usage to the IT energy usage,” the engineers wrote. “We then trained two additional ensembles of deep neural networks to predict the future temperature and pressure of the data center over the next hour. The purpose of these predictions is to simulate the recommended actions from the PUE model, to ensure that we do not go beyond any operating constraints.”
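The constraint check the engineers describe can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepMind's implementation: the `Action` and `Limits` types, the `choose_action` function, the toy predictor formulas, and every number are hypothetical stand-ins for the trained neural networks and the real operating limits.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical candidate setting for the cooling plant."""
    chiller_setpoint_c: float
    pump_speed_pct: float


@dataclass(frozen=True)
class Limits:
    """Hypothetical operating constraints that must not be exceeded."""
    max_temp_c: float
    max_pressure_kpa: float


def choose_action(
    candidates: Iterable[Action],
    predict_pue: Callable[[Action], float],
    predict_temp_c: Callable[[Action], float],
    predict_pressure_kpa: Callable[[Action], float],
    limits: Limits,
) -> Optional[Action]:
    """Return the candidate with the lowest predicted PUE among those whose
    predicted temperature and pressure stay within the limits, or None if
    no candidate is predicted to be safe."""
    safe = [
        a for a in candidates
        if predict_temp_c(a) <= limits.max_temp_c
        and predict_pressure_kpa(a) <= limits.max_pressure_kpa
    ]
    if not safe:
        return None
    return min(safe, key=predict_pue)


# Toy stand-ins for the three model ensembles (assumed formulas, not real models).
def toy_pue(a: Action) -> float:
    return 1.10 + 0.002 * a.pump_speed_pct - 0.003 * a.chiller_setpoint_c


def toy_temp_c(a: Action) -> float:
    return a.chiller_setpoint_c + 15.0 - 0.05 * a.pump_speed_pct


def toy_pressure_kpa(a: Action) -> float:
    return 100.0 + 1.0 * a.pump_speed_pct


candidates = [Action(7.0, 60.0), Action(10.0, 60.0), Action(12.0, 40.0)]
limits = Limits(max_temp_c=24.0, max_pressure_kpa=170.0)
best = choose_action(candidates, toy_pue, toy_temp_c, toy_pressure_kpa, limits)
```

In this toy setup the third candidate has the lowest predicted PUE but is rejected because its predicted temperature exceeds the limit, so the second candidate is chosen. That mirrors the role the article gives the temperature and pressure predictions: they screen the efficiency model's recommendations before anything is acted on.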
The end result was that the machine learning system was able to consistently achieve a 40 percent reduction in the amount of energy used for cooling, “which equates to a 15 percent reduction in overall PUE overhead after accounting for electrical losses and other noncooling inefficiencies.” The system also produced the lowest PUE the site had ever seen.
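To make the PUE figures concrete, here is a small worked example in Python. It assumes that "PUE overhead" means the portion of PUE above 1.0 (the energy spent on everything other than IT equipment), and the baseline energy readings are hypothetical values chosen for round numbers, not Google's data.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total building energy divided by IT energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh


def pue_after_overhead_cut(baseline_pue: float, overhead_reduction: float) -> float:
    """Shrink the overhead (the part of PUE above 1.0) by the given fraction."""
    overhead = baseline_pue - 1.0
    return 1.0 + overhead * (1.0 - overhead_reduction)


# Hypothetical site: 1,120 kWh drawn by the building for 1,000 kWh of IT load.
baseline = pue(1120.0, 1000.0)                      # roughly 1.12
improved = pue_after_overhead_cut(baseline, 0.15)   # roughly 1.102
```

Under these assumed numbers, a 15 percent cut in overhead moves PUE from about 1.12 to about 1.10. The change looks small because an efficient site has little overhead left to remove.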
Striving for Even Greater Gains
“They were pretty astounded,” he said. “We think there might be even more; it depends on how many sensors you put in. Now that we know that works, we can maybe put more things in.”
Evans and Gao noted that “because the algorithm is a general-purpose framework to understand complex dynamics, we plan to apply this to other challenges in the data center environment and beyond in the coming months.”
Other applications of the AI technology could include improving power plant conversion efficiency (or getting more energy from the same unit of input), reducing semiconductor manufacturing energy and water usage, or helping manufacturing facilities increase throughput.