Google "behind the scenes" video/slideshow

engineer Jeff Dean lectures at Uni. of Washington

     
5:24 pm on Mar 23, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


This is a fascinating video with some details I have never heard elsewhere:
[uwtv.org...]
Produced by: University of Washington, October 21, 2004
Runtime: 00:55:36
Google: A Behind-the-Scenes Look
In this program, Jeff Dean of Google describes some of these challenges, discusses applications Google has developed, and highlights systems they've built, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. He also shares some interesting observations derived from Google's web data.
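For anyone who has not met the MapReduce idea named in that blurb, here is a minimal single-process sketch of the programming model, using word counting as the example. This is not Google's library: the names `map_phase`, `reduce_phase` and `run_mapreduce` are made up for illustration, and the real system distributes the map and reduce steps across many machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Hypothetical map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Hypothetical reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(documents):
    """Run map, group the intermediate pairs by key, then reduce each group.

    Everything happens in one process here; the grouping step stands in for
    the shuffle that a distributed implementation would perform.
    """
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_phase(doc_id, text):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Calling `run_mapreduce({"d1": "a b a", "d2": "b c"})` counts each word across both documents. The point of the model is that the user writes only the two small functions, and the framework is free to run them in parallel.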
6:31 pm on Mar 23, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


most excellent - thanks.
6:39 pm on Mar 23, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


Also check out their oldie but goodie from 2002:
[researchchannel.org...]
Google's Linux cluster currently processes over 150 million queries a day, searching a multi-terabyte web index for every query with an average response time of less than a quarter of a second, with near-100% uptime. In this discussion, Google Fellow Urs Hölzle will describe the software and hardware infrastructure that makes this performance possible, as well as provide an overview of the main problems facing a web search, software architecture, servers and compact rack hardware designs.

For those with massive bandwidth and low latency (warning: my 300k/sec cable isn't even fast enough), you can try their ultra-high-quality MPEG2 stream via the IBM "VideoCharger" player, which can be found here:
[www-306.ibm.com...]

These videos can be saved permanently using HiDownload, WMRecorder or similar - some of the slides are worthy of much closer study ;)

(For example, this video is the first time I have heard of Google "shards" [google.com], but maybe I just haven't been paying attention?)

7:28 pm on Mar 23, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


Ya, shards are not new really.

This is fun too ;-)
[google.com...]

8:04 pm on Mar 23, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 28, 2003
posts:188
votes: 0


Just watching the video and one of his explanations caught my ear. The Google engineer confirms that PageRank is query independent, meaning that the PageRank contribution to sorting search results doesn't take into account what terms are being searched for.

This would seem to confirm that for ranking purposes related to PR (not anchor text or other criteria), the theme of the crosslinked sites is irrelevant.

That bit is about 12-13 minutes into the show.

.... back to the video ....
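
To make the "query independent" point concrete, here is a toy sketch. It assumes each page carries one precomputed PageRank number and that the final ordering adds it to some query-dependent text score. The `Page` class, `text_score`, `rank` and the `pr_weight` parameter are all hypothetical, and the additive formula is only an illustration, not how Google actually combines signals.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    pagerank: float  # assumed: computed offline from the link graph, never from the query
    text: str


def text_score(query, page):
    """Hypothetical query-dependent score: fraction of query terms present in the page text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(page.text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank(query, pages, pr_weight=0.5):
    """Order pages by text score plus a weighted PageRank term.

    Only text_score looks at the query; the PageRank term is the same
    number for a given page no matter what is being searched for.
    """
    def combined(page):
        return text_score(query, page) + pr_weight * page.pagerank

    return sorted(pages, key=combined, reverse=True)
```

With two pages, one high PR and one low PR, the order can still flip between queries, but only because the text score changes. The PR term stays fixed per page.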

8:20 pm on Mar 23, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2003
posts:379
votes: 0


Very interesting video.
I liked the part where he talks about query clustering. You may say it is related to "thesaurus" or even "LSI". But giving a ranking boost based on high cluster points was the most interesting part.
8:45 pm on Mar 23, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 9, 2003
posts:91
votes: 0


Help guys

>> I am unable to view video. Tried IBM Charger thingy ..

Looks like the server crashed. Can someone sticky me the video ... if they have it?

Thanks

Don

9:43 pm on Mar 23, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 9, 2003
posts:91
votes: 0


I guess my frustration led me to find archived or downloadable copies of this presentation.

Here goes : [norfolk.cs.washington.edu...]

Hope the "powers" don't edit out the URL

Don

10:36 pm on Mar 23, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


thanks.
11:19 pm on Mar 23, 2005 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:May 25, 2002
posts:926
votes: 0


What I find interesting:

WHY are they saving the higher PR shards more often? That confuses me in terms of relevancy...

If I save the high PR shards more often and the lower PR shards less often, everything comes down to PR, which is simply not the case.

Keyword in title, incoming named links, etc. surely outweigh PR very often, so why not save "often searched keyword" shards more often? Or am I thinking too SEO for that?

Or is the keyword density of an "often searched keyword" in a document influencing its PR? Surely not in the original formula!

Cheers,
Puzzler

11:28 pm on Mar 23, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Nov 14, 2003
posts:260
votes: 0


Hehe, since I live in Seattle, I've seen that on the UWTV public access channel a couple times now. Slightly more interesting to me than the calculus lectures. Nice of them to put it up on the web ;)
11:33 pm on Mar 23, 2005 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:May 25, 2002
posts:926
votes: 0


hehehe, I should first watch to the end! now with the clusters, it makes much more sense :-)
12:11 am on Mar 24, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Yah, that's a pretty good talk. :)
1:14 am on Mar 24, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 11, 2003
posts:427
votes: 0


GG - think you guys/girls could release the tool used to view the model of clusters? :)
2:21 am on Mar 24, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 28, 2003
posts:188
votes: 0


iblaine,
Shhhh....... check out the adwords keyword suggestion tool and you'll find something remarkably similar to the tool you saw in the video, just without the numbers.

I know I shouldn't give ALL the secrets away, but sometimes I can't resist. :)

10:28 am on Mar 24, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 8, 2003
posts:548
votes: 0


> I know I shouldn't give ALL the secrets away, but sometimes I can't resist. :)

Yeah, you should be Sharper than that! ;-) But thanks anyway, I'll have to watch it at home. Can't do it at the office. Boss isn't deaf, unfortunately.

Oops, suddenly I'm preferred? Good!

3:09 pm on Mar 24, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


> That confuses me in terms of relevancy...

But does not in terms of spider frequency.

10:48 pm on Mar 24, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 24, 2004
posts:95
votes: 0


> I guess my frustration led me to find archived or downloadable copies of this presentation.
>
> Here goes : [norfolk.cs.washington.edu...]

Thanks Don.

I spent almost an hour searching for a good streaming media downloader for my Mac. And I've got a wireless dialup which goes max at 144kbs.

Was getting frustrated when I saw your URL. It's currently downloading. Can't wait.

Think it's about time I visit the Mac Webmaster forum.

10:25 pm on Mar 25, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 8, 2003
posts:548
votes: 0


I liked the joke about the workers ... Didn't seem to work with that kind of audience, though ...

Seriously, one thing makes me wonder. Google pride themselves on how they can provide reliable service on unreliable hardware using fault tolerant software. Kudos to them, but how do the other SE's do it? Don't they need a similar kind of infrastructure? Maybe not, considering that they only get a fraction of the traffic Google gets. What would happen if, say, MSN suddenly got a huge increase in traffic? Would they just die?

2:44 pm on Mar 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


So any other little discoveries from this video?
6:50 pm on Mar 29, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 31, 2005
posts:1651
votes: 0


Does anybody have a downloadable version of this video?
It is the one from November, 2002.

[uwtv.org...]

It is impossible for me to see it in streaming mode.

7:11 pm on Mar 29, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


Here is another Google presentation from the "19th ACM Symposium" (October 2003) [ramp.ucsd.edu] which can actually be downloaded and saved (right click). Here is their little paper from that event: [labs.google.com...]