Some of you are missing the point - (deprecated) SEM Research Topics forum at WebmasterWorld - WebmasterWorld


Some of you are missing the point

Textual optimization is only part of the story

ArmchairExecutive

1:27 pm on Dec 2, 2000 (gmt 0)

Hi everybody - this is my first ever post here - thought I could offer some guidance.

I've read posts from some people who think that highly ranked sites which do not contain the search term in question must be spam. Often this is not the case at all. It has a lot to do with the latest algorithms Altavista and others have been using.

A few months ago I read a research paper, written by groups of researchers including Altavista's research team, about how they use link popularity in assessing rankings. Basically, this is what I could make of how they rank. I don't guarantee this is precisely what they use, but I have a strong inkling based on observation that they use some similar variation.

1. Given a particular search term, search the entire database of crawled sites for a subset of sites with high textual relevance to the term. Let's call this set H.
2. Collect all of the sites that are linked to (above a linking code relevance threshold) from any site in H. Let's call this set A.
3. "Hubs" are sites in H which have lots of relevant links to "authorities" in A. "Authorities" are sites in A which are relevantly linked to by the best "hubs". Sound circular? It is - but some neat linear algebra called eigen-analysis makes sense of it and assigns a quality measure to each hub and each authority.
4. The rankings for the chosen search term are the highest ranking authorities selected as per above. (Hubs seem to be rewarded highly as well - that's why linking to relevant sites can often be beneficial.)

So, as you can see, a high ranking authority for a particular search term need not have that particular search term anywhere on the page PROVIDED it is linked to by a lot of hubs that do.
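
A minimal sketch of the circular hub/authority scoring in step 3, assuming the link graph has already been collected. This is not Altavista's actual code: the `hits` and `normalize` names, the page names and the fixed iteration count are all made up for illustration, and the relevance thresholds from steps 1 and 2 are left out. Repeating the two updates and rescaling each time is a simple power iteration, which is one way to carry out the eigen-analysis.

```python
import math


def normalize(scores):
    """Scale a score dict so the sum of squares is 1 (keeps values bounded)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a link graph.

    links maps each page to the list of pages it links to.
    The iteration count is an arbitrary choice for this sketch.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical graph: three hubs, two candidate authorities.
links = {
    "hub1": ["big_authority", "small_authority"],
    "hub2": ["big_authority"],
    "hub3": ["big_authority", "small_authority"],
}
hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(page, round(auth[page], 3))
```

Note that nothing in the scoring looks at the text of the authority pages themselves, only at who links to them.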

Without an appreciation of the above, I really don't see how anybody could possibly still have their sanity intact ;)

redzone

3:01 pm on Dec 2, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

armchairexecutive:
welcome to WMW...
Some do complain about how specific URLs are ranking well for a specific KWP without it appearing in the title, and sometimes not in the body text either.

Most of us are well aware of the link popularity weighting in AV's algorithm, and there has been some discussion over the past year about Clever (IBM)'s new link popularity, authority/hub model...

While AV has been the most visible, as is Google, there are many others (iWon's Inktomi-tuned algo) that are employing this methodology...

Again, welcome to WMW...

ArmchairExecutive

8:50 pm on Dec 2, 2000 (gmt 0)

redzone,

Thanks for the welcome.

I know that many understand the importance of link popularity. I have read much of this forum and was a participant in Brett's Buddylink program. However without an understanding of the specifics of the actual algorithm, link popularity alone is not enough to explain this lack of need for textual relevance.

The impression I get of many people's understanding of the role of link popularity is that a site that isn't textually relevant to the term cannot be ranked highly even with great link popularity. This would seem to make sense because naturally a lot of sites would link to places like search engines, and sponsors which are not specifically relevant to the term. So, the thinking goes, to overcome the problem that such sites will rank highly for any term, the page should also be optimized textually for the term.

However, the way I believe Altavista gets around this is by ensuring that the linking text is relevant. I don't think Altavista always does a great job of this, because you will occasionally find a site on a more general topic, or even an unrelated topic, ranked highly for a term.

rencke

11:06 pm on Dec 2, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Welcome to WmW, ArmchairExecutive.

That research paper you mentioned, would that be the one presented at the Amsterdam conference in May, where Altavista, Google and Compaq researchers describe the concept of a "term vector database"?

If so, do you, redzone, or anyone else know if that is in production yet or still in the labs?

Brett_Tabke

1:38 pm on Dec 3, 2000 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Welcome ArmChairExec.

Rencke, we went through it in this thread on alta research:

[moved to seo research forum, under the title Alta Research]

The paper was published at the w9 conference.

rencke

3:46 pm on Dec 3, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Yes I know Brett. I have followed that thread. What I wondered is simply if we know for a fact whether or not the technique described is actually in use yet. The discussion doesn't state so categorically and Seth_Wilde wrote: "There is no evidence that these method are currently used or ever will be used."

ArmchairExecutive

8:21 pm on Dec 3, 2000 (gmt 0)

I don't think the actual paper describing the algorithm I was referring to was at this conference. However, I noticed a reference to it in the papers at the URL Brett posted - [Kleinberg 98]

The paper was called "Authoritative Sources in a Hyperlinked Environment" by Jon M Kleinberg. I don't recall where I got it - probably through an academic paper search - but I will email it to anyone who wants it. There was also an extension to this paper by Altavista (or DEC or Compaq) researchers discussing how to improve it using link relevance. I am trying to locate this paper - it may have been at this conference.

As to whether it has been implemented, search result evidence says yes, but apart from this I vaguely recall them talking about these techniques in a "we-just-implemented-them" way but I will try to confirm this.

Woz

12:14 am on Dec 4, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Jon Kleinberg's site is here [cs.cornell.edu]. Lots of info including the aforementioned paper!

Onya
Woz

ArmchairExecutive

1:36 am on Dec 4, 2000 (gmt 0)

Thanks Woz.

Looks like the Altavista research paper describing the extension to Kleinberg's paper wasn't presented at the conference discussed at the URL Brett posted either. Still looking for it....

My apologies for the sometimes vague recollection - I read a ton of stuff and file away the useful stuff somewhere in my brain and bookmarks (in between disk crashes). I'm going to do a braindump despite possible inaccuracies because I believe there may be some good stuff filed away there somewhere. :)

One thing I recall from either the paper I was referring to or another one along the same lines was another method to determine if two sites that are linked are "relevant".
This is where the term vector database paper presented at the conference Brett mentioned is relevant to this algorithm, as I recall -
[www9.org...]
Under this method, two sites are considered "relatively-relevant" if the dot product of their normalized term vectors is above a certain threshold.

In plain language, what this means is that an inward link is considered relevant if there is ANYTHING (of threshold substance) textually in common between the two pages. So if your site talks about famous historians, Southpark jokes and train-spotter quotes, and your site links to another site which is entirely about Southpark jokes, then the two sites are considered relevant (under this scheme) in link popularity calculations even for the term "famous historians".
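
To make the dot-product test concrete, here is a rough sketch assuming the simplest possible term vectors: raw word counts, normalized to unit length. The `term_vector` and `cosine` helpers, the sample texts and the 0.3 threshold are all assumptions for illustration; the term vector database paper may well weight terms differently.

```python
import math
from collections import Counter


def term_vector(text):
    """Very crude term vector: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Dot product of two term vectors after normalizing each to unit length."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


THRESHOLD = 0.3  # assumed value, purely for illustration

mixed = term_vector("famous historians southpark jokes trainspotter quotes")
southpark = term_vector("southpark jokes southpark jokes")
cooking = term_vector("pasta recipes")

# The mixed site shares enough terms with the Southpark site to pass the test,
# even though "famous historians" never appears on the Southpark site.
print(cosine(mixed, southpark) > THRESHOLD)
print(cosine(mixed, cooking) > THRESHOLD)
```

The point to notice is that the query term never enters this calculation; only the two pages are compared with each other.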

Not entirely sure which method is implemented in Altavista at this time - it couldn't be too hard to determine this empirically though.

georged

10:00 am on Dec 4, 2000 (gmt 0)

10+ Year Member

>>So if your site talks about famous historians, Southpark jokes and train-spotter quotes and your site links to another site which is entirely about Southpark jokes, then the two sites are considered relevant (under this scheme) in link popularity calculations even for the term "famous historians".

ArmchairExec, this is extremely interesting. But do you mean that there would have to be several of these 'south park, famous historian' sites pointing to the 'south park' site, in order that the 'south park only' site also be relevant for 'famous historians'? (Yeucch! Subjunctive!) Because we are talking about link pop. above a certain threshold. Or is it enough that the 'south park, famous historians' site is an authority hub, and that's all you need?

boss101

2:22 am on Dec 5, 2000 (gmt 0)

Hi,
this is my first post here. I have seen the whole term vector thing being discussed here in Michael Campbell's vault pages and secret reports. The problem is his cell west pages rank well in Google but nowhere in Altavista for "cell phones" or "cellular phones". I think anything is worth a try on AV if there is some evidence of success in top 10 placement for competitive terms.

redzone

3:32 am on Dec 5, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Who is Michael Campbell? :)

We thought we knew everyone that's a Who's Who in the SEO game.... But then again, we thought they all hung out here...

ArmchairExecutive

4:56 am on Dec 5, 2000 (gmt 0)

George,

> But do you mean that there would have to
> be several of these 'south park, famous
> historian' sites pointing to the 'south
> park' site, in order that the 'south park
> only' site also be relevant for 'famous
> historians'? .....

> Or is it enough that the 'south park,
> famous historians' site is an authority
> hub, and that's all you need?

According to the algorithm I described, it depends on the "quality" of the hub in your second scenario. The eigen-analysis will give you a spectrum of values for the quality of each hub and authority. Generally, the higher the quality of a hub, the more beneficial the links from it are. However, to be a highly ranked authority would require a "quality package" of hubs (which may have just one or a few elements) pointing to it.

(Remember I am not sure which method of linking relevance they use. I am just telling you what I read in various papers - some written by Altavista researchers)

If this algorithm is indeed what is being used, it's pretty clear what the way to get good placement would be. Contact the highest ranking sites for your terms and ask them for a link AND/OR link to all those sites (using relevant text)!

seth_wilde

6:19 am on Dec 5, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

ArmchairExecutive-

The closest paper that I can think of would be Computing Web Page Reputations [www9.org], which talks about improving link popularity through hubs and authorities (this is the paper I was referring to in rencke's quote). It's from the department of computer science at the University of Toronto.

"Who is Michael Campbell?"

He wrote "Nothing But Net", has a pay-for "SEO secrets" site, and an auto submission software that's promoted on Planet Ocean. (Although I've never purchased any of his stuff, so I can't give a critique.)

cfhoney

4:52 pm on Dec 5, 2000 (gmt 0)

As an aside:

I majored in Mathematics, but no one could ever really explain to me how it would come in handy. I guess now I should be glad that I took Linear Algebra and kept the book!

DrCool

5:42 pm on Dec 5, 2000 (gmt 0)

10+ Year Member

I have seen Michael Campbell's book and there is nothing special in it. There is more information to be found here in these forums than in his book. It is a decent place to start from but nothing earth shattering.

ArmchairExecutive

12:59 am on Dec 8, 2000 (gmt 0)

Ok I have located a paper written by people at DEC which cites the Kleinberg algorithm I described. You can find it here -

ftp://ftp.digital.com/pub/DEC/SRC/publications/monika/sigir98.pdf

Note that this paper does NOT confirm what I said earlier about non-query-specific relevance. This paper takes a normalized dot product between the query term vector and the page term vector. Relevance defined here is very query-specific. (So yes, textual optimization is still very important if this is what is being used at AV)
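
A loose sketch of what query-specific relevance could look like in practice, not the paper's exact procedure: score each page against the query with a normalized dot product, then drop pages (and the links to them) that fall below a cutoff before any hub/authority scoring is done. The `relevance` and `prune` names, the sample pages and the 0.5 cutoff are hypothetical.

```python
import math
from collections import Counter


def relevance(query, text):
    """Normalized dot product between query and page term vectors (word counts)."""
    q = Counter(query.lower().split())
    d = Counter(text.lower().split())
    dot = sum(q[t] * d[t] for t in q)  # a Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def prune(links, texts, query, threshold):
    """Keep only pages relevant to the query, and only links between kept pages."""
    keep = {p for p, text in texts.items() if relevance(query, text) >= threshold}
    return {
        src: [dst for dst in targets if dst in keep]
        for src, targets in links.items()
        if src in keep
    }


# Hypothetical pages and links.
texts = {
    "a": "cell phones and cellular phones reviews",
    "b": "cell phones plans",
    "c": "southpark jokes",
}
links = {"a": ["b", "c"], "c": ["a"]}

print(prune(links, texts, "cell phones", threshold=0.5))
```

Under this kind of scheme the off-topic page "c" contributes nothing for the query "cell phones", whether as a hub or as an authority.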

I'm pretty sure this isn't the paper I was thinking of, but it's more good evidence that a version of the Kleinberg algorithm is used in Altavista.

cirelle

3:59 pm on Dec 8, 2000 (gmt 0)

Welcome ArmchairExecutive

Thanks for the reference doc. Improved Alg for topic dist etc...

While reading the paper I was continually reminded of the old Dijkstra algo (taught in DB Engineering, 1st or 2nd year), which was initially designed as a search for the shortest distance between 2 points and is used today in the airline reservation industry.

My impression of the paper was that it focused mainly on the retrieval of info based on a criterion. But reading data from the database can be time consuming, and therefore there appear to be several index layers between the search interface and the result set.

The information generated by spiders is (1) classified and indexed on a main index, which in turn (2) references a hub index, which eventually leads to (3) a target data set.

I wish I had a document that identified the classification algo by which the data is indexed. With that knowledge, I believe consistent top rankings could be achieved.

my 2cents

Again Welcome

c

tedster

9:21 pm on Dec 8, 2000 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>I wish I had a document that identified the classification algo by which the data is indexed.

Amen. It would probably need to be a daily newsletter.

cirelle

10:14 pm on Dec 8, 2000 (gmt 0)

got that right tedster

Brett_Tabke

6:22 am on Jan 17, 2001 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

I think the most important paper to date on link relevance and link context is the one titled:

Automatic resource compilation by analyzing hyperlink structure and associated text [decweb.ethz.ch]. Same group of names that worked on IBM's Clever project. That is the research I've heard Inktomi based their context directory engine on.