A very interesting presentation is available from a recent Large Scale Distributed Systems Symposium held at Cornell:
[cs.cornell.edu...]
This gives a lot of detail on how Google manage their data and datacenters. Some of it will be old news if you follow the subject, but some is new, and there are many figures. I'm not sure how up to date those are, and Google may have changed some for competitive reasons, but given the audience the talk was aimed at, I think they won't be too far off.
Spanner is interesting: Google are looking to run somewhere between 1 and 10 million servers around the globe, spread across hundreds or thousands of locations.
I suppose the biggest thing to take from this is the methodology used to access data. We already know that Google has never been keen on "big iron", preferring lots of networked commodity hardware and smart software. It now looks as though they are cemented in that approach and have systems built around serving the needs of the majority. That leads me to believe they are unlikely to be able to access enough data, in a reasonable timeframe, to massively increase the amount of information used for any given web search - they handle too many searches and hold too much data.
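To make that idea concrete, here is a minimal, purely illustrative Python sketch of the "lots of commodity machines plus smart software" pattern - not from the presentation and certainly not Google's actual code. A query is fanned out to many small index shards in parallel, and the answer is assembled from whatever comes back within a fixed latency budget, so the majority of queries are served quickly even when a few machines are slow. The shard count, the 50 ms budget, and all the names are made up for the example.

import concurrent.futures
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

NUM_SHARDS = 20          # stand-in for thousands of commodity machines
DEADLINE_SECONDS = 0.05  # serve whatever has arrived within the budget

def search_shard(shard_id, query):
    """Pretend each shard holds a slice of the index and scores a few documents."""
    time.sleep(random.uniform(0.001, 0.08))  # a few shards will be stragglers
    return [(random.random(), "doc-%d-%d" % (shard_id, i)) for i in range(3)]

def scatter_gather(query):
    """Fan the query out to every shard and merge what returns before the deadline."""
    results = []
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        futures = [pool.submit(search_shard, s, query) for s in range(NUM_SHARDS)]
        try:
            for fut in as_completed(futures, timeout=DEADLINE_SECONDS):
                results.extend(fut.result())
        except concurrent.futures.TimeoutError:
            pass  # drop the stragglers; most queries still get decent coverage
    return sorted(results, reverse=True)[:10]

if __name__ == "__main__":
    for score, doc in scatter_gather("example query"):
        print("%.3f  %s" % (score, doc))

The point of the toy is the trade-off: once you commit to assembling answers from thousands of cheap machines inside a tight deadline, the amount of extra data you can consult per query is bounded by that fan-out and latency budget, which is why I doubt they can massively expand it.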
Is there room for a different approach to web search that serves a niche audience? I think there is.
[webmasterworld.com...]
Google File System v2
A couple of years ago at the first Seattle Conference on Scalability, Google's Jeffrey Dean remarked that the company wanted 100x more scalability. Unsurprising given the rapid growth of the web. But there was more to it than that: GFS - the Google File System - was running out of scalability.
[storagemojo.com...]
Background here:
[storagemojo.com...]
...and also...
The Google File System - Abstract
Google Research Publications
[labs.google.com...]