Google: A Behind-the-Scenes Look
In this program, Jeff Dean of Google describes some of these challenges, discusses applications Google has developed, and highlights systems they've built, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. He also shares some interesting observations derived from Google's web data.
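For readers who haven't met the MapReduce idea mentioned in the abstract, here is a minimal single-process sketch of the programming model, using word counting as the example. It is not Google's library and does no parallelization or distribution; the names `map_reduce`, `count_words_mapper` and `sum_reducer` are made up for illustration.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Toy, single-process version of the map/reduce programming model.

    A real system would spread both phases across many machines;
    here everything runs in one loop, purely for illustration.
    """
    # Map phase: each input record yields zero or more (key, value) pairs,
    # which are grouped by key.
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)

    # Reduce phase: combine all values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_words_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    # Total the counts emitted for one word.
    return sum(values)


if __name__ == "__main__":
    docs = ["the web is big", "the index is bigger"]
    print(map_reduce(docs, count_words_mapper, sum_reducer))
```

The point of the model is that the user only writes the mapper and the reducer; the grouping step in the middle is the part a library can distribute over a cluster.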
Google's Linux cluster currently processes over 150 million queries a day, searching a multi-terabyte web index for every query with an average response time of less than a quarter of a second, with near-100% uptime. In this discussion, Google Fellow Urs Hölzle will describe the software and hardware infrastructure that makes this performance possible, as well as provide an overview of the main problems facing a web search, software architecture, servers and compact rack hardware designs.
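To put the 150 million queries a day in perspective, a quick back-of-the-envelope calculation gives the average rate per second. This is only the daily average derived from the figure quoted above; the abstract says nothing about peak load, which would presumably be higher.

```python
# Average query rate implied by the figure quoted in the abstract.
QUERIES_PER_DAY = 150_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

average_qps = QUERIES_PER_DAY / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"about {average_qps:.0f} queries per second on average")
```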
For those with massive bandwidth and low latency (warning: my 300k/sec cable isn't even fast enough), you can try their ultra-high-quality MPEG2 stream via the IBM "VideoCharger" player, which can be found here:
These videos can be saved permanently using HiDownload, WMRecorder or similar - some of the slides are worthy of much closer study ;)
(for example, this video is the first time I have heard of Google "shards" [google.com] but maybe I just haven't been paying attention?)
This would seem to confirm that for ranking purposes related to PR (not anchor text or other criteria), the theme of the crosslinked sites is irrelevant.
That bit is about 12-13 minutes into the show.
.... back to the video ....
WHY are they saving the higher PR shards more often? That confuses me in terms of relevancy...
If I save the high PR shards more often and the lower PR shards less often, everything comes down to PR, which is simply not the case.
Keyword in title, incoming named links, etc. surely outweigh PR very often, so why not save "often searched keyword" shards more often? Or am I thinking too SEO for that?
Or does the keyword density of an "often searched keyword" in a document influence its PR? Surely not in the original formula!
I guess my frustration led me to find archived or downloadable copies of this presentation.
Here goes : [norfolk.cs.washington.edu...]
I spent almost an hour searching for a good streaming media downloader for my Mac. And I've got a wireless dial-up connection that maxes out at 144 kbps.
I was getting frustrated when I saw your URL. It's currently downloading. Can't wait.
Think it's about time I visit the Mac Webmaster forum.
Seriously, one thing makes me wonder. Google prides itself on providing reliable service on unreliable hardware using fault-tolerant software. Kudos to them, but how do the other SEs do it? Don't they need a similar kind of infrastructure? Maybe not, considering that they only get a fraction of the traffic Google gets. What would happen if, say, MSN suddenly got a huge increase in traffic? Would they just die?