Why is Google so FAST?


grelmar - 1:56 am on Nov 2, 2007 (gmt 0)


Two Words:

Beowulf Cluster [en.wikipedia.org]

If you read the (rather brief, and not terribly accurate) Wikipedia entry, you will get a sense that such clusters are almost tailor-made for rapid indexing and searching. If properly coded, they are also highly scalable and offer redundancy of process: if one machine (or one hard drive) in a cluster fails, it doesn't matter, because the other machines in the cluster pick up the slack.
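To make the "pick up the slack" idea concrete, here is a toy sketch in Python. It is not how Google or any real Beowulf software does it; the names (Node, cluster_search) and the layout (each slice of the data copied onto two machines) are my own assumptions for illustration.

```python
class Node:
    """One machine in the cluster, holding a copy of one slice (shard) of the data."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs      # {doc_id: text}
        self.alive = True     # flip to False to simulate a dead machine

    def search(self, word):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return [doc_id for doc_id, text in self.docs.items()
                if word in text.lower().split()]


def cluster_search(shards, word):
    """shards is a list of replica lists; every replica in a list holds the same data."""
    hits = []
    for replicas in shards:
        for node in replicas:
            try:
                hits.extend(node.search(word))
                break                      # this shard answered, move to the next one
            except ConnectionError:
                continue                   # node is dead, try its replica
        else:
            raise RuntimeError("every replica of a shard is down")
    return sorted(hits)


# Hypothetical data: two shards, each copied onto two machines.
shard_a = {1: "cheap commodity PCs", 2: "a Linux cluster"}
shard_b = {3: "a cluster of cheap machines"}
shards = [
    [Node("a1", shard_a), Node("a2", shard_a)],
    [Node("b1", shard_b), Node("b2", shard_b)],
]

before_failure = cluster_search(shards, "cluster")
shards[0][0].alive = False                 # machine a1 dies
after_failure = cluster_search(shards, "cluster")
```

The result is the same before and after a1 dies, because a2 holds the same slice. Scaling out is then a matter of adding more shards (more data) or more replicas per shard (more redundancy and more query capacity).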

While the first official "HowTo" was published in 1998, the first cluster was built at NASA years before that (can't remember the exact date), and the techniques and methodology had been floating around the open source community almost from day one.

Given what Google Search is doing, and how fast they are able to do it, I have little doubt that this is the backend architecture they are using. It would allow much of the "index" to be held in RAM. Keep in mind, the "index" is just that: an index. Think cue cards that point to larger stores of data held on hard drives. When you do a search, the index looks up the words you have entered and points to the specific stored data to be retrieved.
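The "cue cards" idea is usually called an inverted index. Here is a minimal Python sketch of it, assuming a tiny made-up document set; build_index and search are hypothetical names of mine, and the documents dict stands in for the data that would really sit on hard drives.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document ids that contain it (the cue cards)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, documents, query):
    """Look up every query word in the in-memory index, then fetch the matching documents."""
    words = query.lower().split()
    if not words:
        return []
    # Only documents that contain all of the query words qualify.
    matching = set.intersection(*(index.get(word, set()) for word in words))
    # The index only told us where to look; the full text comes from the store.
    return [documents[doc_id] for doc_id in sorted(matching)]


# Stand-in for data on disk (made-up example documents).
documents = {
    1: "google runs on cheap commodity pcs",
    2: "a beowulf cluster runs linux",
    3: "google runs a customized linux",
}

index = build_index(documents)
results = search(index, documents, "google linux")
```

The point is that a query never scans the documents themselves. It touches only the small word-to-id table, which is the part that can plausibly live in RAM, and then pulls just the matching documents from storage.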

I've never seen anyone at Google directly reference Beowulf clusters, but they have let slip a few bits that indicate this is a key backend technology. For one, they have repeatedly referred to their use of cheap, commodity PCs for their backbone. They have also referred in the past to their operating system as a customized version of Linux.

Making such large clusters work, and making the indexing process lean and efficient, is still one heckuva feat.

I just wish they'd obey the spirit of the Open Source community and release some of their code optimizations back into the community.


Thread source: http://www.webmasterworld.com/google/3493873.htm