Is the original theory behind Google still valid?

Brilliant

10:12 am on May 26, 2003 (gmt 0)

10+ Year Member




I was re-reading the original papers presenting the birth of Google and PageRank last night, written by Sergey and Larry and presented at Stanford.

"The anatomy of a large scale hypertextual web search engine"

and the prequel to this:

"The PageRank Citation Ranking: Bringing Order to the Web."

Has there been any discussion anywhere as to whether the core, fundamental hypothesis behind Google has changed since this was written by arguably the 2 most powerful men on Earth (and a couple of friends)?

'Cos if you filter out the algebra, it's quite obvious and straightforward how the whole thing is supposed to work. It strikes me that the tweaks in the algo are only changes to the emphasis on the core weightings, and everything else is pretty standard. All the discussion around Dominic and son of Dominic just looks to me like the Google fathers trying to adhere even more strongly to their original theories?
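For anyone who hasn't read them, the "algebra" in question boils down to one recursive formula, as given in the anatomy paper: T1 ... Tn are the pages that link to page A, C(T) is the number of links going out of page T, and d is a damping factor that can be set between 0 and 1 (the paper says they usually set it to 0.85).

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```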

Any thoughts on this? Am I being thick/naive? Apols if I'm asking a question that's already been asked.

vitaplease

10:49 am on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Brilliant welcome to WebmasterWorld,

IMO, the original thoughts are still valid and seem to work for many webmasters. Obvious "over-optimisation" effects have been toned down somewhat over the past years.

If you had not seen it before, this might be interesting as well: [webmasterworld.com...]

albert

11:12 am on May 26, 2003 (gmt 0)

10+ Year Member



Good question, Brilliant.

But GoogleGuy has spoken in some of his posts about changes in "algorithm or methods"; see

[webmasterworld.com ]

Whether that means they have really changed something in the core algo, or have only changed the emphasis on the core weightings as you said, is open to speculation so far.

There are some interesting thoughts about themeing around here, though ...

Brilliant

12:44 pm on May 26, 2003 (gmt 0)

10+ Year Member



Thanks vitaplease and albert - I have read these 2 comprehensively. Sorry - I am a returning member after not having posted for a while and my account expiring.

And yes, even theming isn't a new concept: for example, Brett's original documents last year (?) on Theming Pyramids, and also 15k a day etc., are still imho the best two guides on this forum.

Everything else just seems lost in the general hysteria and hyperbole.

But it's still a great forum, although it's in danger of losing its value to everyone because of the tide of panic currently in full flow over Dominic.

Perhaps the subject of a follow-up to Brin and Page's thesis?

"Post Modern Marketing Techniques Give Rise to Premature Aging, Low Birth Rate, Suicide in the Google Age"

vitaplease

12:53 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome back then!

This page might give a bit more than the first two papers you mentioned:

[labs.google.com...]

(By the way, a site-internal Google search on "papers" does not show this page. Dominic?)

seekanddestroy

1:12 pm on May 26, 2003 (gmt 0)



I would liken the original concept for Google to that of the internal combustion engine: a pretty basic theory at the heart of it all, and put into practice it works well, usually.

But you see, you've got your diesel, petrol, 2 stroke, 4 stroke, 16 valve, V8, etc. If you put the timing out, lose a spark here and there, or tweak too much (let's say by adding nitrous!), and you're not careful, you either end up on the side of the road going nowhere or it blows up completely.

Google's algo is the same thing: they tweak and it's good, but then the spammers build a steeper hill and they have to retweak, or the competition is catching up and they feel they need a bit more edge, so they tweak some more.

Regular pit stops could go out in favour of mid race refuels, etc.

Indy/Formula 1 cars and combine harvesters have a great deal in common: the internal combustion engine. But just look at what a hundred years or so of tweaking does!

john316

1:13 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



adhere even more strongly to their original theories? nah

The theories were developed in an academic environment; Google is no longer [google.stanford.edu...], so you need to factor in business considerations.

seekanddestroy

1:23 pm on May 26, 2003 (gmt 0)



Can somebody give the Googlebus a push please, they added 'Dominic' to the fuel recently and now it's blowing smoke and they can't see where they're going at all. I've been sitting behind them for a few weeks now in my Ferrari and I haven't got a clue where I am either!

I hope Googleplex has satellite navigation installed.

8O)

Mohamed_E

1:25 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, vitaplease, that is an interesting list!

Just scanned it, but saw two documents from the early days of UNIX: Kernighan and Pike's The UNIX Programming Environment, and Lex - A Lexical Analyzer Generator, which Eric Schmidt co-authored, so he's an old-time Bell Labs person. I wonder how many other Bell Labs people are at Google now?

vitaplease

1:41 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mohamed,

from: [webmasterworld.com...]

From GoogleGuy's own "post-publication"

Mohamed_E

2:06 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



vitaplease,

Thanks, I missed that thread completely as I was staying away from forum 3 for obvious reasons until the mods and admins restored some sanity to it.

doc_z

2:09 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have to distinguish between two different algorithms: PR calculation and ranking.

The subject of the Dominic discussion (as well as GoogleGuy's statement) is the ranking algorithm. This is independent of the PR algorithm by Page et al. The ingredients of this algorithm are Google's secret, and they change its properties regularly.
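To illustrate the distinction, here is a toy Python sketch in which the PageRank values stay fixed while a hypothetical "emphasis" weight in the ranking step is changed. The anatomy paper does say that the IR score is combined with PageRank to give the final rank, but the functions ir_score and rank, the term-count cap and the pr_weight exponent are all assumptions made up for this example, not Google's actual ranking factors.

```python
# Hypothetical sketch: a query-dependent "IR score" combined with a
# query-independent PageRank value. The scoring and combining rules are
# made up for illustration; Google's real ranking factors are secret.

def ir_score(query: str, text: str, cap: int = 8) -> int:
    """Count query-term occurrences in the text, capping each term's count
    so that repeating a word endlessly stops helping (hypothetical rule)."""
    words = text.lower().split()
    return sum(min(words.count(term), cap) for term in query.lower().split())


def rank(query: str, docs: dict[str, str], pagerank: dict[str, float],
         pr_weight: float = 0.5) -> list[str]:
    """Return matching doc ids, best first. pr_weight is the hypothetical
    'emphasis' knob: 0 ignores PageRank, larger values lean on it more."""
    scored = []
    for doc_id, text in docs.items():
        ir = ir_score(query, text)
        if ir == 0:
            continue  # document doesn't match the query at all
        scored.append((ir * pagerank[doc_id] ** pr_weight, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]


docs = {"a": "widgets widgets", "b": "widgets", "c": "gadgets"}
pagerank = {"a": 1.0, "b": 9.0, "c": 5.0}
print(rank("widgets", docs, pagerank, pr_weight=0.5))  # ['b', 'a']
print(rank("widgets", docs, pagerank, pr_weight=0.0))  # ['a', 'b']
```

Changing pr_weight reorders the results without touching the PR values themselves, which is the sense in which a ranking tweak can be independent of the PR calculation.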

The other question is whether Google is still using the original PR algorithm. (And I'm not only referring to marginal changes such as stopping PR transfer from particular pages/sites.) Here one has to distinguish between the algorithm used to numerically compute PR and the underlying equations. I think Google has changed the algorithm used to compute PR (compared to the simple Jacobi iteration mentioned in the original papers), because there are a number of other techniques that are computationally less expensive (e.g. block algorithms). Whether they also changed the underlying equations cannot be answered clearly. My observation is that they changed not only things like the damping factor, but also some fundamental basics.
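The difference between the equations and the method used to solve them can be illustrated with a toy Python sketch. Both functions solve the same equation from the anatomy paper, PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), with d = 0.85. The first uses plain Jacobi iteration; the second uses Gauss-Seidel, picked only as a simple stand-in for a cheaper alternative (it is not one of the block algorithms, and nobody outside Google knows what they actually run). The function names, tolerance and four-page link graph are made up for the example.

```python
# Toy PageRank sketch using the formula from the original anatomy paper:
#   PR(A) = (1 - d) + d * sum(PR(T) / C(T) for each page T linking to A)
# Compares plain Jacobi iteration with Gauss-Seidel (a stand-in for the
# "cheaper" methods; not Google's actual implementation).

def _prepare(links):
    """Collect all nodes, each node's inbound links, and out-degrees C(T)."""
    nodes = sorted(set(links) | {dst for dsts in links.values() for dst in dsts})
    inbound = {n: [] for n in nodes}
    for src, dsts in links.items():
        for dst in dsts:
            inbound[dst].append(src)
    outdeg = {n: len(links.get(n, [])) for n in nodes}
    return nodes, inbound, outdeg


def pagerank_jacobi(links, d=0.85, tol=1e-10, max_iter=1000):
    """Jacobi: every new value is computed from the previous sweep only."""
    nodes, inbound, outdeg = _prepare(links)
    pr = {n: 1.0 for n in nodes}
    for sweep in range(1, max_iter + 1):
        new = {n: (1 - d) + d * sum(pr[s] / outdeg[s] for s in inbound[n])
               for n in nodes}
        delta = max(abs(new[n] - pr[n]) for n in nodes)
        pr = new
        if delta < tol:
            return pr, sweep
    return pr, max_iter


def pagerank_gauss_seidel(links, d=0.85, tol=1e-10, max_iter=1000):
    """Gauss-Seidel: values are overwritten in place, so later nodes in the
    same sweep already use the freshly updated values of earlier nodes."""
    nodes, inbound, outdeg = _prepare(links)
    pr = {n: 1.0 for n in nodes}
    for sweep in range(1, max_iter + 1):
        delta = 0.0
        for n in nodes:
            old = pr[n]
            pr[n] = (1 - d) + d * sum(pr[s] / outdeg[s] for s in inbound[n])
            delta = max(delta, abs(pr[n] - old))
        if delta < tol:
            return pr, sweep
    return pr, max_iter


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr_j, sweeps_j = pagerank_jacobi(links)
pr_gs, sweeps_gs = pagerank_gauss_seidel(links)
for page in sorted(pr_j):
    print(page, round(pr_j[page], 4), round(pr_gs[page], 4))
print("sweeps:", sweeps_j, sweeps_gs)
```

Both should converge to the same PR values, since they solve the same equations; Gauss-Seidel typically needs fewer sweeps, which is the kind of saving that matters at web scale. Note that with this form of the equation, when every page has at least one outbound link, the values sum to the number of pages rather than to 1.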