Confidence in cloud services took a severe one-two punch last week with the lengthy downtime of Amazon Web Services' Elastic Compute Cloud and the breach of Sony's PlayStation Network, in which 77 million accounts were compromised. Last year I noted that more than half of IT pros surveyed felt cloud services weren't secure enough to trust. Well, the events of the past week won't help.
A brief recap: Amazon Web Services went down on April 21 and took dozens of sites with it, including social network Foursquare, news aggregator Reddit, and question-and-answer site Quora. The error was eventually traced to a network failure that caused problems with Elastic Block Store (EBS), an Amazon storage service that's similar to a disk array in a standard datacenter. Some sites were offline for days as Amazon struggled with the outage, and some data was permanently lost.
In the PlayStation Network case, Sony discovered two weeks ago that PSN and Qriocity, a streaming service similar to Netflix, had been hacked. For the sake of security, it shut both services down completely until it could rework the network to make it more secure. Both are still down as of this writing.
In both cases, it wasn't the disruption or breach itself that hurt the companies; it was how they handled the problem.
Amazon was painfully slow to alert its users to the outage and repeatedly failed to post timely updates on the status of the service. Cloud management provider RightScale summed up Amazon's failings in its own blog post, and there were many -- starting with no blog updates from Amazon Web Services for four days. What we had there was a failure to communicate.
Sony messed up even worse. It took down PSN and Qriocity on April 20 but didn't admit that personal information, including credit card information, was possibly accessed until April 25. Finally, on April 26, the company said on its blog that credit card info was encrypted, but personal data (name, address) was not.
So a lot of people are canceling credit cards now as a precaution, and the Xbox stands to gain a whole ton of sales.
The lawsuits are going to pile up over these incidents, and Sony could be in real trouble if a 77-million-person class action suit is brought against it. But this is the time for cooler heads to prevail. If there is one thing I detest in what passes for political discourse, it's the habit both sides have of defining their opponents by the worst among them and treating the worst as the rule, not the exception. So let's not do it here.
- Don't abandon Elastic Compute Cloud. Are we forgetting that Amazon started out selling books, yet has done a better job with a cloud service than most of the companies with expertise in this area? Amazon has arguably done a better job than any other cloud service provider, which is reflected in just how many major, important services relied on it and were taken down with it. Amazon was subjected to DDoS attacks for kicking WikiLeaks off the EC2 service, and it withstood those hackers.
- Do diversify. Your 401(k) doesn't consist of one stock or mutual fund, does it? EC2 sites that stayed up, like SmugMug and Twilio, didn't use Elastic Block Store, and Netflix used EBS for only some of its storage, splitting the rest among other providers. Don't put all your eggs with one provider if you can help it. Amazon is not the only infrastructure-as-a-service provider. There's Microsoft, CSC, and more.
- Check that SLA. Amazon's SLA does promise 99.95 percent uptime, and given that AWS has been around since 2006, it's had a pretty good track record. Again, this is one screw-up in five years. That said, make sure any cloud provider has some teeth behind its SLA. CSC likes to note that its SLAs carry penalties if it ever screws up as badly as Amazon did.
- Pick up the phone, send some emails, make some noise. People sat around waiting to hear from Amazon. If the provider messes up, take initiative and contact them.
- As for Sony, I don't know what to say about that company anymore. Its string of failures just grows daily, and it's hard to believe Sony was the model on which Steve Jobs based Apple. Waiting five days to inform people that their credit cards might have been compromised is just inexcusable.
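To put the 99.95 percent figure from the SLA bullet in perspective, here is a rough back-of-the-envelope sketch of how much downtime that promise actually allows. It assumes uptime is measured over a 365-day year, which may not match how a given provider's contract defines it; the function names and the 48-hour outage used for comparison are hypothetical, chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an uptime promise over the period."""
    return period_hours * (1 - uptime_percent / 100)


def years_of_budget_used(outage_hours: float, uptime_percent: float) -> float:
    """How many years' worth of allowed downtime a single outage consumes."""
    return outage_hours / allowed_downtime_hours(uptime_percent)


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.95)
    print(f"99.95% uptime allows about {budget:.2f} hours of downtime a year")

    # Hypothetical 48-hour outage, not a measured figure from any provider.
    used = years_of_budget_used(48, 99.95)
    print(f"A 48-hour outage would use about {used:.1f} years of that budget")
```

Under those assumptions, the allowance works out to a little under four and a half hours a year, which is why a multi-day outage makes the penalty terms in the SLA worth reading.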
We learn from our mistakes. The fact is, Amazon hasn't had that many to learn from. But let's hope it learned well from this outage, and let's learn a few lessons ourselves. The first should be to not depend entirely on one provider. You wouldn't get all of your IT equipment from one vendor, would you? The same should apply to IaaS providers.
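For readers who want to see what "not depending entirely on one provider" might look like in practice, here is a minimal sketch of a write that falls back to a second storage provider when the first one fails. Everything in it is hypothetical: the `store_with_fallback` helper, the provider names, and the in-memory stand-ins are illustrations, not any vendor's actual API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "provider" here is just a name plus a function that stores bytes under a key.
PutFn = Callable[[str, bytes], None]


class AllProvidersFailed(Exception):
    """Raised when every configured provider rejects the write."""


def store_with_fallback(key: str, data: bytes,
                        providers: Sequence[Tuple[str, PutFn]]) -> str:
    """Try each provider in order; return the name of the first that succeeds."""
    errors: List[str] = []
    for name, put in providers:
        try:
            put(key, data)
            return name
        except Exception as exc:  # any failure means: move on to the next one
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    # In-memory stand-ins for two hypothetical storage services.
    backup_store: Dict[str, bytes] = {}

    def primary_put(key: str, data: bytes) -> None:
        raise ConnectionError("primary provider is down")

    def backup_put(key: str, data: bytes) -> None:
        backup_store[key] = data

    used = store_with_fallback(
        "report.txt", b"quarterly numbers",
        [("primary", primary_put), ("backup", backup_put)],
    )
    print(f"Stored with: {used}")
```

A real setup would also have to deal with reads, keeping copies in sync, and cost, so treat this as the shape of the idea rather than a design.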