Google as a black box

Let's face it: we're all blind guessing at the way it really works.

5:09 pm on Feb 2, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 29, 2002
votes: 0

First of all, please let me spend a few words to thank the senior members for their often brilliant and sometimes passionate contributions. I think this forum really is an invaluable resource for anyone interested in the SE world and its current leader. Or at least, one of the very best Google-oriented discussion boards I've seen so far. So let's keep up the good work.

Now, back on topic.

I've spent a fair amount of time reading previous posts here, and one thing that really struck me is the fact that about 99% of the assertions about Google's ranking mechanism(s) in this forum are nothing more than realistic hypotheses. After all, the only publicly-available official documentation of the PageRank algorithm appears to be the famous recursive formula from the early paper The Anatomy of a Large-Scale Hypertextual Web Search Engine [www-db.stanford.edu]. Just about everything else I've been reading so far about PageRank seems to be mere inference based on personal (=limited) experience, if not plain speculation.
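For reference, the recursive formula as printed in that paper is PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of links going out of T, and d is a damping factor (the paper suggests 0.85). Below is a minimal Python sketch of iterating that formula on a made-up three-page graph. The graph, the function name `pagerank`, and the fixed iteration count are my own assumptions for illustration; this only reflects the published formula, not whatever Google actually runs.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the published formula PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    links maps each page to the list of pages it links to (hypothetical data).
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # arbitrary starting values
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Sum the contribution of every page t that links to p.
            inbound = sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            new[p] = (1 - d) + d * inbound
        pr = new
    return pr


# Toy graph, invented for this example: A links to B and C, B to C, C to A.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this toy graph B ends up lowest, since its only inbound link is half of A's vote. That much can be checked against the paper; anything beyond it is the guesswork this thread is about.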

About PageRank, the best independent analysis that I've been able to find is PageRank Explained [goodlookingcooking.co.uk] by Chris Ridings, who --while illustrating some really interesting theories indeed-- clearly states that "There is, at this point in time, not enough information for us to be 100% certain about anything. I am merely presenting theories, based upon the best information available, which seem to largely hold true".

What's perhaps even more important, from what I've been able to observe from several web site rankings (including my own), PageRank is obviously a single part of a wider, more complicated and partly obscure mechanism. As Chris intelligently argues, "PageRank has its place in the ranking process. That place is not as big as many might imagine. Its significance in the ranking algorithm is less than many other factors [...]".

So, let's be honest: what do we know about what's really going on inside the Google black box? Very little, IMHO. I find it really funny to read about some SEO geek hacking, reverse-engineering, spoofing or otherwise exploiting the Googlebar just to see the actual PR values associated with each URL. Besides, I'm pretty sure the guys at the GooglePlex have lots of fun reading our posts, too. :)

Google's engineers have always been very careful about giving away potential hints (let alone detailed explanations) about the way the SE really works. And they have justified their reserve by saying that such information may seriously harm Google's search quality (and thus its users' experience) if disclosed. I deeply respect and appreciate that position, which I think reveals a very mature and responsible corporate philosophy. One may argue that Google's secrets are mainly aimed at retaining their current leadership in the SE market, but then how many other SEs do you know of which are so constantly focused on providing outstanding quality search results, and so genuinely concerned about spamming, cloaking, and other unethical so-called "SEO" practices?

About self-appointed SEO professionals, I really wish those guys could learn something from the WebSeed incident [webmasterworld.com]. Leaving aside the questionable ethics of behaviours such as setting up a link farm in order to boost a single web site's ranking, and even looking at things from a strictly opportunistic point of view, following Google's tips on how to get a good ranking has proven to be the most effective web page optimization strategy by far. From a fresh interview with Google software engineer Matt Cutts [clickz.com]:

When asked how to gain high rankings, Cutts replied, "The guidelines are pretty simple: Stay away from hidden text, hidden links, cloaking, sneaky redirects, lots of duplicate content on different domains, and doorway pages. [...] The best use of a Webmaster's time is building good content."

So why not just stick to Paul Boutin's wise SEO guidelines [hotwired.lycos.com] instead?

One last thing: I'd like to be proven wrong about the lack of information regarding Google's actual ranking mechanisms, so if anyone knows of any official (or equally reliable) resource about obscure subjects such as theme assessment and the way Google extracts contextual information from web pages and hyperlink structures, please post it here. Although I have hardly any interest in SEO techniques, I am about to write my graduation thesis on Google, so any additional references would be welcome. Thanks.


11:21 am on Feb 7, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
votes: 2

Chris, everything you need to do well in Google is known in terms of optimising rankings, but the methods by which pages, domains, or groups of links are chosen to be flattened are not well known (not by me, anyway).

Recently, my view has become that parts of Google are not very logical. The bizarre choices made when duplicates were encountered in the December crawl weren't logical, but for the most part they were fixed as Google had indicated they would be. The new generation of PR0s is likely to be complex (otherwise it would be widely known by now). Yes, the best course of action is to be careful, but I for one am very unsure about what I should be careful about. It seems that simply running a group of closely related Web services is enough to trigger a penalty, depending on how they are arranged.

I remain a big fan of Google, but when it comes to automatic spam traps I no longer assume that Googlebot will act "in an intelligent and logical manner". I've written often that to please users, a search engine will most likely be willing to throw out a lot of 'innocent' content along with the spam. My strong feeling is that is where we are. I would not be surprised if this is relaxed during this crawl, but just in case...


12:23 pm on Feb 7, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 15, 2001
votes: 16

> John316 Success will eventually breed failure.

Love it John
my version in relation to Google and everything else in life:

"from the top the only way is down"

12:52 pm on Feb 7, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 21, 2001
votes: 0


Buy me a drink and I will fill you in. Just kidding - I don't know about that stuff either. I have been spending the last 30 minutes or so over at WiseNut.

It is getting more and more like Google. It was like looking at Google from three or four months ago with the rankings mixed up a little.

6:46 pm on Feb 7, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 4, 2002
votes: 0


WiseNut certainly is interesting. I do like its potential. But they really need to update the index at least once or twice a year to be taken seriously. :-)

An input of certain keywords yields the same, out of date results that I have been seeing for months, despite submissions to them, emails and even a fax to one of their officials.

It appears to me that WiseNut is simply not taking care of its consumer search business.

I know this is off-topic, and so now we return you to our regularly scheduled thread....

6:48 pm on Feb 7, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 4, 2002
votes: 0

Dang! I'm now a Junior Member! Whoopee!!!

11:02 pm on Feb 7, 2002 (gmt 0)

New User

10+ Year Member

joined:Oct 31, 2001
votes: 0

I will assert that some people who come here not so frequently have a pretty good idea how Google (and other engines) work. Some of those infrequent visitors actually write search engine technology of various types, also.

3:22 am on Feb 8, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Aug 26, 2001
votes: 0

>> we're all blind guessing at the way it really works

Guess what: Bill Gates is blind about the way Windows really works. He knows how he wanted it to work ... and what he told the programmers to do. The programmers thought they made it the way Bill Gates wanted.

Software is complex. Bill thought Windows XP was a secure operating system. We know better ;)

I have often seen people who test software know more about how the software really works than those who wrote it. With that in mind, keep looking for the patterns; they are real.

>> but am concerned about spam penalties because it PR0 would have serious affects

You, me, and a lot of other people also. All I can say at this point is:

1> Massive link popularity sites like Yahoo seem to become untouchable. More links pointing toward your site based on content are a "get out of jail free" card.

2> No link pointing at your site should be able to take down your site. To get in trouble you need to attempt (in the AI mind of Googlebot) to boost the rating of pages that link to you by linking back to them. Nobody seems to have an idea how paranoid Googlebot has become about feedback links. Many such loops have nothing to do with Googlebot.

3> Many natural link structures appear to be unaffected. I have been watching the rating of a site that has many other related sites. That network has a hub: all sites link to the hub and the hub links to all sites. The hub has an honorable position within Google; it's got all the luck. It's a standard Yahoo-type structure across many domains (although it may have enough other external links to hold the get-out-of-jail card).

At this point I would say the best type of links to look for are links that point to your content without you needing to exchange a link.

PS: There may be enough patterns established that some people may be willing to conclude we know the basic parts of the filter.
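To put rough numbers on the hub structure described in point 3, here is a small Python sketch that applies the formula published in the original PageRank paper to a hub with five spoke sites, where every spoke links to the hub and the hub links to every spoke. The site names, the number of spokes, and the `pagerank` helper are all hypothetical. It says nothing about penalties or filters, which nobody here can see; it only shows what the published formula would give such a network with no outside links.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a small link graph."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # arbitrary starting values
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Each page t linking to p passes on PR(t) split over its outlinks.
            inbound = sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            new[p] = (1 - d) + d * inbound
        pr = new
    return pr


# Hypothetical network: one hub, five spokes, reciprocal links only.
spokes = [f"site{i}" for i in range(1, 6)]
network = {"hub": list(spokes)}
network.update({s: ["hub"] for s in spokes})

ranks = pagerank(network)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

Under these assumptions the hub collects far more than any single spoke, which fits the "all the luck" observation, though the real ranking obviously depends on links from outside the network too.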

3:25 am on Feb 8, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Aug 26, 2001
votes: 0

Oh, I am picking on Yahoo because they do link to some bad feedback loops. They should not do that.