Larry and Sergey must have changed their minds - (deprecated) Google News Archive forum at WebmasterWorld - WebmasterWorld

Larry and Sergey must have changed their minds

tedster

7:26 am on Jan 18, 2001 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

In 1997 Google founders Larry Page and Sergey Brin wrote this paper [www7.scu.edu.au] introducing Google.

Here's a quote that sounds rather odd today:

"One of our main goals in designing Google was to set up an environment where other researchers can come in quickly, process large chunks of the web, and produce interesting results that would have been very difficult to produce otherwise."

I guess they were talking about academic research, not commercially useful research. Still, this is a fascinating paper, especially reading it now, over three years later. Here's a quote that I found to be a rather juicy tidbit:

"PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank."

Also extremely interesting is the schematic diagram of the original Google in section 4.1 of the paper.
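
For anyone who wants to play with the "random surfer" idea quoted above, here is a minimal sketch in Python. It is not Google's code. The function name, the toy link graph, the damping value of 0.85, the handling of pages with no outgoing links, and the choice to normalize so that all ranks sum to 1 are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate random-surfer visit probabilities.

    links maps each page to the list of pages it links to (hypothetical data).
    damping is the assumed chance that the surfer follows a link
    instead of getting bored and jumping to a random page.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Bored surfers restart on a page chosen uniformly at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a dead-end page sends the surfer anywhere at random.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: nobody links to "d", while "c" collects links from three pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph "c" comes out on top and "d" at the bottom, which matches the intuition that rank flows along links: a page nobody links to is only ever reached by a random jump.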

BoneHeadicus

7:54 am on Jan 18, 2001 (gmt 0)

10+ Year Member

That's one of the papers I read that got me started on the term vector route. I'm reading two now that Brett pointed out in another thread, and they all seem to point towards relevant linking by keyword more than anything else... in a nutshell, of course.

tedster

8:21 am on Jan 18, 2001 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I follow you, BH. I read this Page/Brin document at least a year ago, but I just stumbled over it again and it sure reads differently to me today. Remembering that this was written before Google went big time, I still find these quotes very interesting:

"We also plan to support user context (like the user's location)"
"There are even numerous companies which specialize in manipulating search engines for profit."
"As for link text, we are experimenting with using text surrounding links in addition to the link text itself."
"Every hitlist includes position, font, and capitalization information."
"There are two types of hits: fancy hits and plain hits. Fancy hits include hits occurring in a URL, title, anchor text, or meta tag. Plain hits include everything else. A plain hit consists of a capitalization bit, font size, and 12 bits of word position in a document (all positions higher than 4095 are labeled 4096). Font size is represented relative to the rest of the document."
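
To make the plain hit description concrete, here is a rough Python sketch of packing one hit into a small integer. The quote only gives one capitalization bit and 12 position bits. The 3 bits for font size, the bit order, and the function names are assumptions made here so that the whole thing fits in 16 bits. Note also that 12 bits can only hold 0 through 4095, so this sketch clamps larger positions to 4095 rather than the 4096 mentioned in the quote.

```python
CAP_BITS, FONT_BITS, POS_BITS = 1, 3, 12   # FONT_BITS = 3 is an assumption
MAX_POSITION = (1 << POS_BITS) - 1         # 4095, the largest 12-bit value
MAX_FONT = (1 << FONT_BITS) - 1


def pack_plain_hit(capitalized, font_size, position):
    """Pack one plain hit into a 16-bit integer (assumed layout: cap | font | pos)."""
    if not 0 <= font_size <= MAX_FONT:
        raise ValueError("font_size does not fit in the assumed 3 bits")
    if position < 0:
        raise ValueError("position must be non-negative")
    position = min(position, MAX_POSITION)  # clamp far-away words to the last slot
    return (int(capitalized) << (FONT_BITS + POS_BITS)) | (font_size << POS_BITS) | position


def unpack_plain_hit(hit):
    """Reverse pack_plain_hit: return (capitalized, font_size, position)."""
    position = hit & MAX_POSITION
    font_size = (hit >> POS_BITS) & MAX_FONT
    capitalized = bool(hit >> (FONT_BITS + POS_BITS))
    return capitalized, font_size, position


packed = pack_plain_hit(True, 3, 42)
print(packed, unpack_plain_hit(packed))
```

The point of such a compact layout would be that an index stores one of these per word occurrence, so every bit saved is multiplied across the whole crawl.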

BoneHeadicus

3:44 pm on Jan 18, 2001 (gmt 0)

10+ Year Member

Well....money is the root of all evil....guess the boys had high ideals and good intentions but the world has a way of beating you down and making things conform to its standards...especially in business.

It occurs to me that Google is easily spammed, and I am led to believe it weights text emphasis pretty heavily. It seems to pull strings of text from a document as well... sometimes seemingly unrelated to the context of the keywords.

Fusioneer

3:19 pm on Jan 19, 2001 (gmt 0)

10+ Year Member

I was about to post this in another thread, but I think it bears sharing here as well.

Regarding Google's recent adjusting of NOFRAMES content:

I have a page on one site that was ranked #1 for a two word search phrase - until Google changed their algo for NOFRAMES a few weeks back and it disappeared from the listing.

It wasn't an important term for the site - just incidental - so I didn't bother re-authoring or tweaking.

This week it has bounced all the way to the #1 slot.

Now the odd thing about this page is that it is a frameset containing a full-screen flash file.

The next oddest thing, when I checked the page, is that the keyword is not in the title, or anywhere in the framed pages.

Oddest of all - this is an absolute first - the keyword does not appear ANYWHERE in the frameset or the NOFRAMES content. In fact there is no NOFRAMES content at all.

So how has this page achieved #1 ranking for a keyword phrase that does not exist ANYWHERE within the frameset or framed pages? No, I am not cloaking.

There is a single link from another page of the site that matches the keyword phrase (2 word) exactly. But that is all folks.

Curiouser and curiouser. But it is a testament to how heavily Google weights inbound links to pages and the text of those links.

Anyone had similar experiences?

- Fusioneer

bigjohnt

11:27 pm on Jan 19, 2001 (gmt 0)

10+ Year Member

Yup. Had similar experiences, before Google booted my domain (I STILL don't know why, maybe it was giving good advice?).
I have seen hundreds of #1 - #10 ranking pages with absolutely no keywords on them (hint: the cache will tell you the answer almost every time).

tedster

11:42 pm on Jan 19, 2001 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Same thing here. One page ended up well ranked for a keyword that I had intentionally kept off the site. And just as you experienced, the kw was in link text from another site.

The title of the linking page also figures in, as does the text surrounding the link.

In the paper I cited above, they say that finding exact ranking formulas for combining on-page and link-page factors is a "black art". Indeed!
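
Here is a toy Python sketch of how link text alone could rank a page for words it never contains, as described in the posts above. This is only an illustration, not Google's formula: the page names, the ANCHOR_WEIGHT of 5, and the simple additive scoring are all made up for the example.

```python
from collections import defaultdict

ANCHOR_WEIGHT = 5  # hypothetical: one anchor-text hit counts like five on-page hits


def build_index(pages):
    """pages maps url -> {"text": str, "links": [(anchor_text, target_url), ...]}."""
    index = defaultdict(lambda: defaultdict(int))  # word -> url -> score
    for url, page in pages.items():
        for word in page["text"].lower().split():
            index[word][url] += 1  # ordinary on-page hit
        for anchor_text, target in page["links"]:
            for word in anchor_text.lower().split():
                # Anchor words are credited to the page being linked TO.
                index[word][target] += ANCHOR_WEIGHT
    return index


def search(index, query):
    """Return urls matching every query word, best score first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    scored = {url: sum(index[word][url] for word in words) for url in matches}
    return sorted(scored, key=lambda url: (-scored[url], url))


toy_pages = {
    "flash.html": {"text": "", "links": []},  # frameset with no indexable words
    "home.html": {"text": "welcome to our site",
                  "links": [("blue widgets", "flash.html")]},
    "other.html": {"text": "blue widgets blue widgets for sale", "links": []},
}
toy_index = build_index(toy_pages)
print(search(toy_index, "blue widgets"))
```

With these made-up weights the empty flash.html outranks other.html, even though other.html repeats the phrase on the page. Change ANCHOR_WEIGHT to 1 and the order flips, which is the "black art" part.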

Fusioneer

1:25 am on Jan 20, 2001 (gmt 0)

10+ Year Member

Some more notes on the black art:
Sometimes top 10 slots for contested terms are cloaked and show no keywords (a bad giveaway, in my opinion). I've been able to penetrate some with a ColdFusion script that sets a different UA, but now I'm thinking some that I thought were cloaked might just have great link text ;)
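
A rough idea of the UA-switching check mentioned above, sketched in Python rather than ColdFusion. The user-agent strings, the 0.5 similarity threshold, and the function names are illustrative assumptions. It also only catches cloaking keyed on the User-Agent header, so a site that cloaks by IP address would pass unnoticed.

```python
import difflib
import urllib.request

# Illustrative user-agent strings, not an authoritative list.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
SPIDER_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def fetch_as(url, user_agent, timeout=10):
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(body_a, body_b):
    """Return a 0..1 score for how alike two page bodies are."""
    return difflib.SequenceMatcher(None, body_a, body_b).ratio()


def looks_cloaked(url, threshold=0.5, fetch=fetch_as):
    """Guess whether a page serves different content to a spider UA.

    threshold is an arbitrary cut-off chosen for this sketch.
    """
    browser_view = fetch(url, BROWSER_UA)
    spider_view = fetch(url, SPIDER_UA)
    return similarity(browser_view, spider_view) < threshold
```

Calling looks_cloaked with a URL of your own would fetch the page twice and compare the two bodies. A low similarity is only a hint, since rotating ads or dynamic content can also make two fetches differ.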

I checked the cached page - just to make sure I didn't have noframes content in there at some point.

Google's text states:
These terms only appear in links pointing to this page: "keyword keyword"

Still seems odd to me how a page like this could beat out many others with page titles including the term, keywords repeated in text etc.

It is not a "popular" page either - no-one external to the site is linking to it.