engine - 10:29 am on Apr 22, 2011 (gmt 0)
Amazon Cloud Outage Affected Major Sites [news.cnet.com]
A partial failure at Amazon Web Services' cloud-computing infrastructure brought down some Internet operations today, including the Web sites of Quora and Reddit.
The outage struck the Elastic Compute Cloud (EC2) service at Amazon's northern Virginia site, which handles AWS operations for the U.S. East Coast. The problems began at 1:41 a.m. PT, according to Amazon's AWS status dashboard, with delays and errors when connecting to servers over a network.
We'd like to provide additional color on what were working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS [Elastic Block Storage] volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.