Abstract: Highly available cloud storage is often implemented with complex, multi-tiered distributed systems built on top of clusters of commodity servers and disk drives. Sophisticated management, load balancing and recovery techniques are needed to achieve high performance and availability amidst an abundance of failure sources that include software, hardware, network connectivity, and power issues. While there is a relative wealth of failure studies of individual components of storage systems, such as disk drives, relatively little has been reported so far on the overall availability behavior of large cloud-based storage services. We characterize the availability properties of cloud storage systems based on an extensive one year study of Google's main storage infrastructure and present statistical models that enable further insight into the impact of multiple design choices, such as data placement and replication strategies. With these models we compare data availability under a variety of system parameters given the real patterns of failures observed in our fleet.
Msg#: 4274626 posted 7:25 pm on Mar 1, 2011 (gmt 0)
Thanks, engine. I've downloaded the paper and will study it with real interest. With so much data moving into the cloud these days (including website objects in many cases), availability is an extremely big deal.
[edited by: tedster at 9:50 am (utc) on Mar 2, 2011]
Msg#: 4274626 posted 9:52 am on Mar 2, 2011 (gmt 0)
Yes, it really is ironic timing. Google has hitched its wagon to cloud computing in a big way. It sounds nice, at least in the abstract. In reality, there is the potential for many new kinds of mayhem, so I'm a bit cautious, even though I do use some cloud services.