tedster - 10:29 pm on Apr 10, 2010 (gmt 0)
For anyone who wants to dig deeper into what is going on technically with Google's Caffeine infrastructure, here's a discussion from August 2009, right when the first Caffeine announcements came out. In fact, at the time of the interview the code name Caffeine had not yet been announced; some third-party sources were calling it "GFS II".
The interview takes place between Kirk McKusick [from the ACM - Association for Computing Machinery] and Sean Quinlan [Google infrastructure engineer].
GFS: Evolution on Fast-forward [queue.acm.org]
From what I can glean, several key factors are involved. First off, the goal is to make Google's entire system more failure-aware, with quicker error recovery.
But even more, Google is moving to a system that uses not only distributed slaves (servers that hold the basic data) but also distributed masters (servers that store the metadata for the basic records). The master servers had become a major pinch point in the old system.
The new slaves will also store much smaller chunks of data. The chunk size will go from 64MB down to 1MB. That alone is a reason why Caffeine and the previous infrastructure are not compatible.
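To make that master/slave split concrete, here's a toy sketch of the idea - my own illustration, not Google's actual code, names, or API. The master holds only metadata (which chunks make up a file, and which slave has each chunk), while the slaves hold the chunk bytes themselves:

```python
# Illustrative only: a toy GFS-style split between metadata and data.
CHUNK_SIZE = 1 * 1024 * 1024  # the new ~1MB chunks (vs. the old 64MB)

class Master:
    """Stores only metadata, never the file contents."""
    def __init__(self):
        self.file_chunks = {}     # filename -> ordered list of chunk ids
        self.chunk_location = {}  # chunk id -> index of the slave holding it

class Slave:
    """Stores the actual chunk data."""
    def __init__(self):
        self.chunks = {}  # chunk id -> bytes

def store(master, slaves, filename, data):
    """Split data into fixed-size chunks, spread them across slaves,
    and record only the bookkeeping on the master."""
    ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk_id = f"{filename}#{i // CHUNK_SIZE}"
        slave_idx = (i // CHUNK_SIZE) % len(slaves)  # naive round-robin placement
        slaves[slave_idx].chunks[chunk_id] = data[i:i + CHUNK_SIZE]
        master.chunk_location[chunk_id] = slave_idx
        ids.append(chunk_id)
    master.file_chunks[filename] = ids

def read(master, slaves, filename):
    """Ask the master where the chunks live, then fetch them from the slaves."""
    return b"".join(
        slaves[master.chunk_location[cid]].chunks[cid]
        for cid in master.file_chunks[filename]
    )
```

With 1MB chunks, a file of a given size produces 64 times as many chunk records as it did with 64MB chunks - which is exactly why the metadata load pushes Google toward distributed masters rather than a single one.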