Here's a quote that sounds rather odd today:
"One of our main goals in designing Google was to set up an environment where other researchers can come in quickly, process large chunks of the web, and produce interesting results that would have been very difficult to produce otherwise."
I guess they were talking about academic research, not commercially useful research. Still, this is a fascinating paper, especially reading it now, over three years later. Here's a quote that I found to be a rather juicy tidbit:
"PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank."
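For anyone who wants to see the "random surfer" idea in action, here is a rough Python sketch. It is not Google's code. It uses the common normalized form of the calculation, with the 0.85 damping factor the paper mentions as typical. The `pagerank` function and the `toy_web` link graph are made up for illustration, and the sketch assumes every linked-to page also appears as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the chance that a random surfer is on each page.

    links maps each page to the list of pages it links to.
    Assumption: every link target also appears as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # surfer starts on a random page

    for _ in range(iterations):
        # With probability (1 - damping) the surfer gets bored and jumps
        # to a page chosen uniformly at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Otherwise the surfer follows one of the page's links at random.
            # A page with no links is treated as linking to every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nothing links to D, three pages link to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C collects links from three pages and ends up with the highest score, while D, which nobody links to, is left with only the "bored surfer" share.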
Also extremely interesting is the thematic diagram of the original Google in section 4.1 of the paper.
We also plan to support user context (like the user's location)...
There are even numerous companies which specialize in manipulating search engines for profit.
As for link text, we are experimenting with using text surrounding links in addition to the link text itself.
Every hitlist includes position, font, and capitalization information.
There are two types of hits: fancy hits and plain hits. Fancy hits include hits occurring in a URL, title, anchor text, or meta tag.
Plain hits include everything else. A plain hit consists of a capitalization bit, font size, and 12 bits of word position in a document (all positions higher than 4095 are labeled 4096). Font size is represented relative to the rest of the document.
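To make the plain-hit layout concrete, here is a hedged Python sketch that packs one hit into 16 bits. The quote above only gives the capitalization bit and the 12 position bits. The 3-bit font field is taken from my reading of the paper, so treat it as an assumption, and the function names are invented. Since 12 bits cannot actually hold 4096, this sketch clamps large positions to 4095.

```python
POSITION_BITS = 12
FONT_BITS = 3  # assumption: not stated in the quoted text
MAX_POSITION = (1 << POSITION_BITS) - 1  # 4095, the largest 12-bit value


def pack_plain_hit(capitalized, font_size, position):
    """Pack a plain hit as [1 cap bit][3 font bits][12 position bits]."""
    if not 0 <= font_size < (1 << FONT_BITS):
        raise ValueError("font_size must fit in 3 bits")
    if position < 0:
        raise ValueError("position must be non-negative")
    position = min(position, MAX_POSITION)  # clamp far-away words
    cap_bit = 1 if capitalized else 0
    return (cap_bit << 15) | (font_size << POSITION_BITS) | position


def unpack_plain_hit(hit):
    """Return (capitalized, font_size, position) from a packed hit."""
    capitalized = bool((hit >> 15) & 1)
    font_size = (hit >> POSITION_BITS) & ((1 << FONT_BITS) - 1)
    position = hit & MAX_POSITION
    return capitalized, font_size, position


hit = pack_plain_hit(capitalized=True, font_size=3, position=100)
print(hex(hit), unpack_plain_hit(hit))
```

The point of the exercise is that each occurrence of a word costs only two bytes, which is why font size and capitalization can be stored for every hit.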
It occurs to me that Google is easily spammed, and I am led to believe it weights text emphasis pretty heavily. It also seems to pull strings of text from a document, sometimes seemingly unrelated to the context of the keywords.
Regarding Google's recent adjusting of NOFRAMES content:
I have a page on one site that was ranked #1 for a two word search phrase - until Google changed their algo for NOFRAMES a few weeks back and it disappeared from the listing.
It wasn't an important term for the site - just incidental - so I didn't bother re-authoring or tweaking.
This week it has bounced all the way to the #1 slot.
Now the odd thing about this page is that it is a frameset containing a full-screen flash file.
The next oddest thing, when I checked the page, is that the keyword is not in the title, or anywhere in the framed pages.
Oddest of all - this is an absolute first - the keyword does not appear ANYWHERE in the frameset or the NOFRAMES content. In fact there is no NOFRAMES content at all.
So how has this page achieved #1 ranking for a keyword phrase that does not exist ANYWHERE within the frameset or framed pages? No, I am not cloaking.
There is a single link from another page of the site that matches the keyword phrase (2 word) exactly. But that is all folks.
Curiouser and curiouser. But it is testament to Google's weighting of inbound links to pages and their text content.
Anyone had similar experiences?
- Fusioneer
The title of the linking page also figures in, as does the text surrounding the link.
In the paper I cited above, they say that finding exact ranking formulas for combining on-page and link-page factors is a "black art". Indeed!
I checked the cached page - just to make sure I didn't have noframes content in there at some point.
Google's text states:
These terms only appear in links pointing to this page: "keyword keyword"
Still seems odd to me how a page like this could beat out many others with page titles including the term, keywords repeated in text etc.
It is not a "popular" page either - no-one external to the site is linking to it.
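A page matching on terms that "only appear in links pointing to this page" is consistent with the paper's statement that anchor text is associated with the page the link points to. Here is a minimal Python sketch of that idea, not Google's actual index. The `build_index` and `search` functions, the page structure, and the example URLs are all hypothetical, and there is no ranking here, only matching.

```python
from collections import defaultdict


def build_index(pages):
    """Map word -> url -> set of places the word was seen for that url.

    pages maps url -> {"text": str, "links": [(anchor_text, target_url)]}.
    """
    index = defaultdict(lambda: defaultdict(set))
    for url, page in pages.items():
        for word in page["text"].lower().split():
            index[word][url].add("body")
        for anchor_text, target in page["links"]:
            # Anchor words are credited to the link's target,
            # not to the page that contains the link.
            for word in anchor_text.lower().split():
                index[word][target].add("anchor")
    return index


def search(index, query):
    """Return url -> set of sources for pages matching every query word."""
    words = query.lower().split()
    if not words:
        return {}
    matching = set.intersection(*(set(index.get(w, {})) for w in words))
    return {
        url: set().union(*(index[w][url] for w in words))
        for url in matching
    }


# Hypothetical site: a Flash-only page with no text, linked from the home page.
pages = {
    "example.com/flash": {"text": "", "links": []},
    "example.com/home": {
        "text": "welcome to our site",
        "links": [("blue widgets", "example.com/flash")],
    },
}
results = search(build_index(pages), "blue widgets")
print(results)
```

Here the Flash page is returned for "blue widgets" even though the phrase never appears on it, because the only evidence is the anchor text of the inbound link.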