Google News Archive Forum

The Google Cluster Architecture
Article by Google engineers

 5:37 pm on May 19, 2003 (gmt 0)

I've just read about this article on Aaron Swartz's Google Blog and haven't seen it being mentioned here:


Nothing really new, but nicely summarized and technically detailed information on how Google responds to a query.



 5:45 pm on May 19, 2003 (gmt 0)

Good find Markus...I hadn't seen that.




 5:56 pm on May 19, 2003 (gmt 0)

Great find, Markus. As a system admin, it's very interesting!


 7:32 pm on May 19, 2003 (gmt 0)

If you like that paper, you might also enjoy

It's a partial list of different papers that Googlers have published. There are enough technical papers there to overload most SEOs, but that page is like honey to pull in great engineers. :)


 7:38 pm on May 19, 2003 (gmt 0)

I enjoy.

It's like what I was saying in another post [webmasterworld.com], Google is more than a search engine, it's also a web site.


 8:23 pm on May 19, 2003 (gmt 0)

That is a really interesting article for understanding what the traditional path of a query request to Google was, and might still be.

The question I have always had, and if someone could shed some light: Up until now, were the various data center index servers all querying the same document servers?


 4:44 am on May 20, 2003 (gmt 0)


Hey thanks Googleguy, have I been sleeping or is this a new honey pot?

jeremy goodrich

 4:53 am on May 20, 2003 (gmt 0)

Wait a second, we're being had!

This forum has high PageRank, right? And we all *know* that if you get a link from a related site - that is high in PageRank - it will benefit you in the SERPs, right?

GoogleGuy, I never thought I would see you doing SEO - and in public!

Do you feel bad now or what? If I see that page riding high in the SERPs for 'money keywords' what will happen if I spam report you? lol.

webmasterworld nick: jeremy_goodrich
subject: GoogleGuy link dropping

message: I saw him do it - right out in the open!
trying to inflate the PR of Google.com, as if it's
not high enough!

Expect the action on the SERPs to be found shortly, lol.

Thanks for the link seriously. :) One more thing on the evening 'to do' list.


 12:14 pm on May 20, 2003 (gmt 0)

WOW GoogleGuy ... That list is awesome!

Now I am dreaming about finding a paper breaking down the list of those 100 variables :)

On a more serious note, this massively parallel computing technique using commodity Intel boxes could well be extended beyond the web environment to other compute-intensive but stateless fields like genome mapping, crash test simulation (auto companies use expensive Cray supercomputers for this), SETI-like projects, etc.

But it's very unsuitable for big database applications like finance/payroll (this is a killer money-making area where Sun/HP/IBM servers rule!)
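
To illustrate the "stateless" point above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names (simulate_trial, split, run_shard, run_partitioned) are hypothetical, the workload is a toy, and threads stand in for separate commodity machines. It is not taken from the paper. The idea is only that each unit of work depends on nothing but its own input, so the job can be dealt out to independent workers and the partial results summed at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_trial(seed: int) -> int:
    """A stateless unit of work: the result depends only on the input."""
    # Hypothetical stand-in for a compute-heavy task (toy arithmetic loop).
    x = seed
    for _ in range(1000):
        x = (x * 1103515245 + 12345) % 2**31
    return x % 100


def split(items, n_parts):
    """Deal items round-robin into n_parts independent shards."""
    return [items[i::n_parts] for i in range(n_parts)]


def run_shard(shard):
    """Each worker processes its shard without talking to the others."""
    return sum(simulate_trial(s) for s in shard)


def run_partitioned(items, n_workers=4):
    """Fan the shards out to workers, then combine the partial results."""
    shards = split(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(run_shard, shards))
    return sum(partials)


if __name__ == "__main__":
    print(run_partitioned(list(range(50)), n_workers=4))
```

Because no shard reads or writes anything another shard touches, the partitioned answer matches the sequential one. A payroll-style database presumably does not split this cleanly, since its updates share state that has to stay consistent.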


 3:53 pm on May 20, 2003 (gmt 0)

Thanks for the paper Markus. That'll make for some good reading.

And great new feature Googleguy, thanks!

Looks like there are good bits of info in there that were missed.

Just be sure to pass on to the guys who maintain that page to keep it updated frequently. :)


 4:03 pm on May 20, 2003 (gmt 0)

Too funny, jeremy goodrich. :) vitaplease, it's part of a relatively new honeypot, but for engineers. I'm surprised that WebmasterWorld folks didn't find it already--makes me think people are paying too much attention to SJ. ;)

The part that I personally like the most is this: this page has been up for a little while. At the same time, some article quoted another SE rep saying "Google never publishes any papers now; they squirrel away their knowledge" or something like that. The juxtaposition was a little humorous to me, at least, esp. given this page and the details in the IEEE paper. I'm not aware of any other search engines publishing papers like that lately. :)

Oh well. People knock on Google sometimes. If you just keep doing what you know is right, things seem to work out just fine. :)


 11:13 pm on May 20, 2003 (gmt 0)

Thanks GG. I have some engineer friends from your world that will definitely love this info. I've told one of them who would be a great match for Google about you hiring engineers in NY, but he already made some serious dough off an IPO so I doubt you guys would offer him anything he'd find interesting (unless it's consulting work). He's too busy enjoying his boat ;)

As for knocking Google, rest assured that if you achieve success, people will knock you down or try. Take it as a compliment. No sense letting it bother you. If it *doesn't* happen then it means you're nobody or you're doing something wrong :)

P.S. Have you ever heard of a database called "R"? I always wondered if Google ever used tools like that which are great for processing batch data like PR should be, or if it was 100% custom made.

I was once looking to try out R for a project and searched Google (a couple of years back) but couldn't find it! I had a team of people search for it and finally found it. But while speaking I just did a search on "r" (not even adding the word database) and it was the first SERP! So I guess you folks have made some progress in the intervening time! I mean, how much harder can it be to find a page than using one letter search?


 4:12 am on May 21, 2003 (gmt 0)

That's a pretty hard search. :) I would prod your engineer friend to apply. We've gotten some top-notch engineers lately. I just found out today that we hired a really good person that I was rooting for. :)


 6:23 am on May 27, 2003 (gmt 0)

Finally read it Markus, very nice read.

Related past article on not going for the fastest chip: Forget Moore's Law [redherring.com]



 8:49 am on May 27, 2003 (gmt 0)

Hey GoogleGuy,

I am an engineer by profession, with a Bachelor's rather than a Master's.
Would love to work at Google one day. :)

Sorry moderator if I crossed TOS.

After all, Google is the best-known corporation in the whole world.
I doubt anyone can match your popularity worldwide.

What interested me about the papers section is the Genetic Algorithm and Artificial Intelligence approach. It's truly remarkable that Google has somebody who did some research in this area. This approach coupled with Evolutionary Algorithms becomes the next-century science called Complexity, my favourite area. To learn more about complexity visit www.santafe.edu, the institute opened by Nobel laureates.

My point is that Google has an outstanding resource pool of engineers, going by the papers alone.

I am really happy that Google has such a wide variety of people among its resources.


 9:55 am on May 27, 2003 (gmt 0)

According to Google's SEO rules, each webmaster should keep the number of links under 100 on each page. This page has over 200...

It seems like they are not only looking for great engineers but could also use a new webmaster/SEO specialist as well ;)

Anyway, great resource. This should definitely keep everybody out of the current whining-and-exiting threads about the movement on SJ and other datacenters...


 10:36 am on May 27, 2003 (gmt 0)

>>according to Google's SEO rules each webmaster should keep the number of links under 100 on each page. This page has over 200...

Matt Cutts mentioned at Pubcon that it should probably have been 101 KB instead.
also: [webmasterworld.com...]


 10:40 am on May 27, 2003 (gmt 0)

Hmmm yummy, here goes my sleeping and eating time I'm afraid.

I've seen quite a few of these papers already from the Stanford repositories though. They also publish a lot of Google-related things.

Besides John Koza and more recent papers based on his work, Google-related publications are my favourite fodder :)

Now I can compare my own search engine and database engines to see how close I got to the Google system. I'm a big fan of clustering and am still dreaming of the day I can buy a few dozen old PCs to try out my clustered GP algos :)

What do you do with the previous generation PCs? Give 'em all to unis? Where can I apply for a "hardware grant" hehe...

Seriously, working at Google is a bit like dying and going to heaven... Nothing left after that, after all it seems the place to develop and make real ideas.

Keep up the good work, I'll get back to you when I'm done reading the papers ;)



 2:38 pm on May 27, 2003 (gmt 0)

Hey KillRoy,

John Koza and GP algos :).

If John Holland invented the Genetic Algorithm, then John Koza brought it to life at Stanford.

That's the match I've been looking for for a long time, mate.

>How close to Google System

Same here, I am trying at a smaller scale though.

Do report here about your computation time and results.

>>>Seriously, Working at google is a bit like dying and going to heaven... Nothing left after that, after all it seems the place to develop and make real ideas.

Very Well Said KillRoy. :)



 3:20 pm on May 27, 2003 (gmt 0)

Hi GoogleGuy,

I tried running some of these articles through Google's translation service, and still couldn't understand them. It sure would be handy if it could translate engineerian into plain English. ;)


 1:13 am on May 28, 2003 (gmt 0)

Hmm yuuuummy :) I already got a gazillion ideas. I bet you can build a kick ass automatic text categorizer using genetic programming, much more accurate than the Bayesian networks... hmm hmm hmm... now if I JUST had that mini cluster of a dozen or so PCs...

damn you Google, another sleepless night...


[edit]typo [edit]typo in edit[/edit][/edit]


 10:17 am on May 28, 2003 (gmt 0)

Hmm added word frequency analysis to my search index... thinking of adding statistical phrases, as in mitra97, but the payoff seems to be relatively small.
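
For readers wondering what "word frequency analysis" in a search index can look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the poster's code: the function names (tokenize, build_index, search) and the ranking rule (sum of raw term counts) are hypothetical choices, and it does not attempt the statistical phrases from mitra97.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: occurrences in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Rank documents by the summed frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


if __name__ == "__main__":
    docs = {
        "a": "Google cluster architecture: a cluster of commodity PCs",
        "b": "Google news archive",
    }
    index = build_index(docs)
    print(search(index, "google cluster"))
```

Raw counts favour long documents, so a real index would likely normalise them in some way. That choice is left open here.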

Thanks again for this wellspring of information.



 2:30 pm on May 29, 2003 (gmt 0)

I was trying to get Latent Semantic Indexing using a Genetic Algorithm!
I couldn't get past a certain time-consuming process. The only hitch with GPs seems to be TIME!

>>I bet you can build a kick ass automatic text categorizer using genetic programming, much more accurate than the Bayesian networks.

Perhaps the best search engine, one that evolves on its own with the web, can be built using GA. That's like an Adaptive Search Engine for me.



 3:11 pm on May 29, 2003 (gmt 0)

And what better way to build it than with distributed systems. I've run my own tests with multithreaded colonies of populations and immigration caches. This can be very coarse-grained, and loosely synchronised (or not at all), leading to near 100% efficiency and near zero network lag.
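
The "colonies plus immigration cache" setup described above resembles what is usually called an island-model genetic algorithm. Below is a minimal Python sketch under stated assumptions: the toy problem (OneMax, maximise the number of 1 bits), all names (evolve_island, run_islands, the cache) and all parameters are hypothetical and not the poster's code. Each island evolves on its own thread and only touches the shared cache every few generations, which is the coarse-grained, loosely synchronised part.

```python
import random
import threading
from collections import deque

GENOME_LEN = 32  # toy problem (assumption): maximise the number of 1 bits


def fitness(genome):
    return sum(genome)


def mutate(genome, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve_island(seed, cache, lock, results,
                  generations=60, pop_size=20, migrate_every=10):
    rng = random.Random(seed)  # per-island RNG, no shared random state
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    for gen in range(1, generations + 1):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(rng.choice(elite), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
        if gen % migrate_every == 0:
            # The only synchronised step: a brief exchange via the cache.
            with lock:
                immigrant = list(cache[-1]) if cache else None
                cache.append(list(max(pop, key=fitness)))
            if immigrant is not None:
                worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
                pop[worst] = immigrant  # the immigrant replaces the weakest
    with lock:
        results[seed] = max(fitness(g) for g in pop)


def run_islands(n_islands=4):
    cache = deque(maxlen=8)  # bounded shared "immigration cache"
    lock = threading.Lock()
    results = {}
    threads = [threading.Thread(target=evolve_island,
                                args=(seed, cache, lock, results))
               for seed in range(n_islands)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # island seed -> best fitness found


if __name__ == "__main__":
    print(run_islands())
```

In CPython the threads mostly show the structure rather than a speedup, because of the global interpreter lock. On a real cluster each island would be a separate process or machine, and the cache would be the only traffic between them.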

