Jul 13 2012

What Happens When an Update in the Cloud Goes Wrong?

Cloud solutions offer increased redundancy, but they’re not immune to disaster.

Given the anytime, anywhere accessibility that cloud-based computing solutions offer, some people might assume that cloud infrastructure or software is immune to the perils of natural disasters. After all, it’s in the cloud; nothing can bring the sky down, right? Wrong.

The Midwest and East Coast were recently battered by what meteorologists call a derecho, a widespread, fast-moving windstorm driven by a line of severe thunderstorms. Guess whose very public and widely used cloud went down after that? Amazon’s. Major sites such as Instagram, Pinterest and Netflix suffered downtime as a result of Amazon’s cloud outage.

But natural disasters aren’t the only things that can bring down the cloud. Jay Heiser, a research vice president at Gartner, recently wrote about how FirstServer, a Japanese cloud vendor, suffered downtime after a live software upgrade went completely wrong. Heiser strikingly compares a live upgrade of a cloud service to an organ transplant on an active, unanesthetized patient.

“Like the Gmail outage early last year, which required 4 days to recover service for what Google described as constituting only .02% of their user base, and an AWS incident about the same time that resulted in some permanent loss of data, the FirstServer incident was the result of a software upgrade.

“It is hardly surprising that live upgrades of clouds sometimes result in failures. Replacing a code module within a running service is the equivalent of transplanting an organ without any anesthetic. Not only does the patient not have any anesthetic, the patient isn’t even lying down. The operation takes place while the patient is hard at work, performing heavy lifting on behalf of thousands of tenants simultaneously.”

These outages shouldn’t dissuade organizations from adopting cloud computing. The cloud offers a variety of benefits that IT departments regularly rely on. But the outages are proof that even the cloud needs a backup plan, because a storm or a botched software upgrade can take a provider offline.