A partial failure of Amazon Web Services' cloud-computing infrastructure brought down some Internet operations today, including the Web sites of Quora and Reddit.
The outage struck the Elastic Compute Cloud (EC2) service at Amazon's northern Virginia site, which handles AWS operations for the U.S. East Coast. The problems began at 1:41 a.m. PT, according to Amazon's AWS status dashboard, with delays and errors when connecting to servers over a network.
We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS [Elastic Block Store] volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.
We were in the final stages of deciding to move to AWS, and then this happens. We are now working on redundancy to make sure we don't get caught like this. Currently we have identical servers in very secure data centers 100 miles apart that monitor each other constantly with a program called RepliStor. If we can solve the redundancy issue, we estimate we will save 80% or more on current costs, not including the issue of replacing aging servers.
So essentially Amazon has way oversold its capacity and didn't correctly engineer its storage zones to be fault tolerant. Nice.
I interviewed there and it was about the most inhumane and depressing interview I've ever witnessed. They're so proud of their crap that they laugh you out the door if you speak on behalf of your own enterprise tools/SAN/NAS experience.
People won't buy a Pepsi car, wear Crest shirts or live in Kodak homes. Why use Amazon's computer services?
Seriously, eleven years ago I worked for a large regional retailer that did what Amazon is doing: they sold their computer expertise. AND REAL COMPANIES BOUGHT IT. It was amazing. Did our customers get burned? Ohhhh yeah.
The last time I leased major computer power I took a look at Amazon. It was obvious that they were not serious about this business. And, we were building it out to sell. So, we went with a brand name firm. I was worried that our buyers would see Amazon as our provider as a negative.