| 6:31 pm on Mar 23, 2005 (gmt 0)|
most excellent - thanks.
| 6:39 pm on Mar 23, 2005 (gmt 0)|
Also check out their oldie but goodie from 2002:
|Google's Linux cluster currently processes over 150 million queries a day, searching a multi-terabyte web index for every query with an average response time of less than a quarter of a second, with near-100% uptime. In this discussion, Google Fellow Urs Hölzle will describe the software and hardware infrastructure that makes this performance possible, as well as provide an overview of the main problems facing a web search, software architecture, servers and compact rack hardware designs. |
For those with massive bandwidth and low latency (warning: my 300k/sec cable isn't even fast enough) you can try using their ultra high quality MPEG2 stream via the IBM "VideoCharger" player which can be found here:
These videos can be saved permanently using HiDownload, WMRecorder or similar - some of the slides are worthy of much closer study ;)
(for example in this video it is the first time I have heard of google "shards" [google.com] but maybe I just haven't been paying attention?)
| 7:28 pm on Mar 23, 2005 (gmt 0)|
Ya, shards are not new really.
This is fun too ;-)
| 8:04 pm on Mar 23, 2005 (gmt 0)|
Just watching the video and one of his explanations caught my ear. The Google engineer confirms that PageRank is query independent, meaning that the PageRank contribution to sorting search results doesn't take into account what terms are being searched for.
This would seem to confirm that for ranking purposes related to PR (not anchor text or other criteria), the theme of the crosslinked sites is irrelevant.
That bit is about 12-13 minutes into the show.
.... back to the video ....
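To make the "query independent" point concrete, here is a minimal sketch of PageRank by power iteration. This is not Google's code: the tiny link graph, the 0.85 damping factor, and the function name are all assumptions for illustration. The thing to notice is that no search term appears anywhere in the computation; the score depends only on who links to whom.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "d" has no inbound links, "c" has the most.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Whatever the user types in the search box, these scores stay the same. Anything topic-related would have to come from the other signals (anchor text, on-page text and so on), which is exactly the distinction made in the post above.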
| 8:20 pm on Mar 23, 2005 (gmt 0)|
Very interesting video.
I liked the part where he talks about query clustering. You may say it is related to "thesaurus" or even "LSI". But giving a ranking boost based on high cluster points was the most interesting part.
| 8:45 pm on Mar 23, 2005 (gmt 0)|
>> I am unable to view video. Tried IBM Charger thingy ..
Looks like the server crashed. Can someone sticky me the video ... if they have it?
| 9:43 pm on Mar 23, 2005 (gmt 0)|
I guess my frustration led me to find archived or downloadable copies of this presentation.
Here goes : [norfolk.cs.washington.edu...]
Hope the "powers" don't edit out the URL
| 10:36 pm on Mar 23, 2005 (gmt 0)|
| 11:19 pm on Mar 23, 2005 (gmt 0)|
what I consider interesting:
WHY are they saving the higher PR shards more often? That confuses me in terms of relevancy...
If I save the high PR shards more often and the lower PR shards less often, everything comes down to PR, which is simply not the case.
Keyword in title, incoming named links, etc. are surely outweighing PR very often, so why not save "often searched keyword" shards more often? Or am I thinking too SEO for that?
Or is the keyword density of an "often searched keyword" in a document influencing its PR? Surely not in the original formula!
| 11:28 pm on Mar 23, 2005 (gmt 0)|
Hehe, since I live in Seattle, I've seen that on the UWTV public access channel a couple times now. Slightly more interesting to me than the calculus lectures. Nice of them to put it up on the web ;)
| 11:33 pm on Mar 23, 2005 (gmt 0)|
hehehe, I should first watch to the end! now with the clusters, it makes much more sense :-)
| 12:11 am on Mar 24, 2005 (gmt 0)|
Yah, that's a pretty good talk. :)
| 1:14 am on Mar 24, 2005 (gmt 0)|
GG - think you guys/girls could release the tool used to view the model of clusters? :)
| 2:21 am on Mar 24, 2005 (gmt 0)|
Shhhh....... check out the adwords keyword suggestion tool and you'll find something remarkably similar to the tool you saw in the video, just without the numbers.
I know I shouldn't give ALL the secrets away, but sometimes I can't resist. :)
| 10:28 am on Mar 24, 2005 (gmt 0)|
> I know I shouldn't give ALL the secrets away, but sometimes I can't resist. :)
Yeah, you should be Sharper than that! ;-) But thanks anyway, I'll have to watch it at home. Can't do it at the office. Boss isn't deaf, unfortunately.
Oops, suddenly I'm preferred? Good!
| 3:09 pm on Mar 24, 2005 (gmt 0)|
> That confuses me in terms of relevancy...
But does not in terms of spider frequency.
| 10:48 pm on Mar 24, 2005 (gmt 0)|
|I guess my frustration led me to find archived or downloadable copies of this presentation. |
Here goes : [norfolk.cs.washington.edu...]
I spent almost an hour searching for a good streaming media downloader for my Mac. And I've got a wireless dialup which maxes out at 144kbs.
Was getting frustrated when I saw your URL. It's currently downloading. Can't wait.
Think it's about time I visit the Mac Webmaster forum.
| 10:25 pm on Mar 25, 2005 (gmt 0)|
I liked the joke about the workers ... Didn't seem to work with that kind of audience, though ...
Seriously, one thing makes me wonder. Google pride themselves on how they can provide reliable service on unreliable hardware using fault tolerant software. Kudos to them, but how do the other SE's do it? Don't they need a similar kind of infrastructure? Maybe not, considering that they only get a fraction of the traffic Google gets. What would happen if, say, MSN suddenly got a huge increase in traffic? Would they just die?
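For anyone wondering what "reliable service on unreliable hardware" can look like in software, here is a rough sketch, assuming the index is split into shards and each shard is copied onto several cheap machines. None of the names below come from the talk: the replica functions, the `ReplicaDown` error and the failure rates are all made up for illustration. The idea is simply that a dead machine costs you a retry, not the query.

```python
import random


class ReplicaDown(Exception):
    """Raised when a (simulated) machine fails to answer."""


def make_replica(docs, failure_rate, rng):
    """Build a fake search server holding one shard's documents."""
    def search(term):
        if rng.random() < failure_rate:
            raise ReplicaDown()
        return [doc for doc in docs if term in doc]
    return search


def query_shard(replicas, term):
    """Try each copy of a shard in turn until one answers."""
    for replica in replicas:
        try:
            return replica(term)
        except ReplicaDown:
            continue  # that machine is down, move on to the next copy
    return []  # every copy failed: return partial results rather than an error


def search_all(shards, term):
    """Send the query to every shard and merge the answers."""
    results = []
    for replicas in shards:
        results.extend(query_shard(replicas, term))
    return results


rng = random.Random(0)
shard_docs = [
    ["google cluster", "linux rack"],
    ["cheap linux pc", "search index"],
]
# Two copies of each shard: the first always fails, the second always works.
shards = [
    [make_replica(docs, 1.0, rng), make_replica(docs, 0.0, rng)]
    for docs in shard_docs
]
hits = search_all(shards, "linux")
print(hits)
```

Even with half the simulated machines dead, the query still comes back complete. A real system would also query shards in parallel and balance load across replicas, which this sketch leaves out.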
| 2:44 pm on Mar 26, 2005 (gmt 0)|
So any other little discoveries from this video?
| 6:50 pm on Mar 29, 2005 (gmt 0)|
Does anybody have the downloadable version of this video?
It is the one from November, 2002.
It is impossible for me to see it in streaming mode.
| 7:11 pm on Mar 29, 2005 (gmt 0)|
Here is another Google presentation from the "19th ACM Symposium" (October 2003) [ramp.ucsd.edu] which can actually be downloaded and saved (right click). Here is their little paper from that event: [labs.google.com...]