welcome to WMW...
Some complain about how specific URLs are ranking well for a specific KWP without it appearing in the title, and sometimes not in the body text either.
Most of us are well aware of the link popularity weighting in AV's algorithm, and there has been some discussion over the past year about Clever (IBM)'s new link popularity, authority hub model...
While AV has been the most visible, as has Google, there are many others (iWon's Inktomi-tuned algo) that are employing this methodology...
Again, welcome to WMW...
Thanks for the welcome.
I know that many understand the importance of link popularity. I have read much of this forum and was a participant in Brett's Buddylink program. However, without an understanding of the specifics of the actual algorithm, link popularity alone is not enough to explain this lack of need for textual relevance.
The impression I get of many people's understanding of the role of link popularity is that a site that isn't textually relevant to the term cannot be ranked highly even with great link popularity. This would seem to make sense because naturally a lot of sites would link to places like search engines, and sponsors which are not specifically relevant to the term. So, the thinking goes, to overcome the problem that such sites will rank highly for any term, the page should also be optimized textually for the term.
However, the way I believe Altavista gets around this is by ensuring that the linking text is relevant. I don't think Altavista does a great job of this sometimes, because you will occasionally find a site on a more general topic, or even an unrelated topic, ranked highly for a term.
Welcome to WmW ArmchairExcecutive.
That research paper you mentioned, would that be the one presented at the Amsterdam conference in May, where Altavista, Google and Compaq researchers describe the concept of a "Term vector database"?
If so, do you, redzone, or anyone else know if that is in production yet or still in the labs?
Rencke, we went through it in this thread on alta research:
[moved to seo research forum, under the title Alta Research]
The paper was published at the w9 conference.
Yes I know Brett. I have followed that thread. What I wondered is simply if we know for a fact whether or not the technique described is actually in use yet. The discussion doesn't state so categorically and Seth_Wilde wrote: "There is no evidence that these method are currently used or ever will be used."
I don't think the actual paper describing the algorithm I was referring to was at this conference. However I noticed a reference to it on the papers on the URL Brett posted - [Kleinberg 98]
The paper was called "Authoritative Sources in a Hyperlinked Environment" by Jon M Kleinberg. I don't recall where I got this, probably through an academic paper search, but I will email it to anyone who wants it. However, there was an extension to this paper by Altavista (or DEC or Compaq) researchers discussing how to improve this using link relevance. I am trying to locate this paper - it may have been at this conference.
As to whether it has been implemented, search result evidence says yes, but apart from this I vaguely recall them talking about these techniques in a "we-just-implemented-them" way but I will try to confirm this.
Jon Kleinberg's site is here [cs.cornell.edu]. Lots of info including the aforementioned paper!
Looks like the Altavista research paper describing the extension to Kleinberg's paper wasn't presented at the conference discussed on the URL Brett posted either. Still looking for it....
My apologies for the sometimes vague recollection - I read a ton of stuff and file away the useful stuff somewhere in my brain and bookmarks (in between disk crashes). I'm going to do a braindump despite possible inaccuracies because I believe there may be some good stuff filed away there somewhere. :)
One thing I recall from either the paper I was referring to or another one along the same lines was another method to determine if two sites that are linked are "relevant".
This is where the term vector database paper from the conference Brett mentioned is relevant to this algorithm, as I recall -
Under this method, two sites are considered "relatively-relevant" if the dot product of their normalized term vectors is above a certain threshold.
In plain language, what this means is, an inward link is considered relevant if there is ANYTHING (of threshold substance) textually in common between the two pages. So if your site talks about famous historians, Southpark jokes and train-spotter quotes and your site links to another site which is entirely about Southpark jokes, then the two sites are considered relevant (under this scheme) in link popularity calculations even for the term "famous historians".
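To make the dot product idea concrete, here is a minimal sketch in Python. It assumes plain word counts as the term weights and a made-up threshold of 0.2; neither comes from any paper or from Altavista, and the function names and page texts are invented for illustration.

```python
import math
from collections import Counter

THRESHOLD = 0.2  # hypothetical cutoff, chosen only for this illustration


def term_vector(text):
    """Word counts scaled to unit length (an assumed, very crude weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def similarity(a, b):
    """Dot product of two unit-length term vectors (cosine similarity)."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def link_is_relevant(a, b, threshold=THRESHOLD):
    """A link counts if the two pages share enough text overall."""
    return similarity(a, b) >= threshold


# Invented page texts mirroring the example above.
mixed_page = term_vector("famous historians southpark jokes trainspotter quotes")
jokes_page = term_vector("southpark jokes southpark jokes more southpark jokes")
cooking_page = term_vector("bread recipes and baking tips")

print(link_is_relevant(mixed_page, jokes_page))    # shared "southpark jokes" terms
print(link_is_relevant(mixed_page, cooking_page))  # nothing in common
```

Note that the test never looks at the search term itself, which is why the link would still count for "famous historians" under this scheme.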
Not entirely sure which method is implemented in Altavista at this time. It couldn't be too hard to determine this empirically though.
>>So if your site talks about famous historians, Southpark jokes and train-spotter quotes and your site links to another site which is entirely about Southpark jokes, then the two sites are considered relevant (under this scheme) in link popularity calculations even for the term "famous historians".
ArmchairExec, this is extremely interesting. But do you mean that there would have to be several of these 'south park, famous historian' sites pointing to the 'south park' site, in order that the 'south park only' site also be relevant for 'famous historians'? (Yeucch! Subjunctive!) Because we are talking about link pop. above a certain threshold. Or is it enough that the 'south park, famous historians' site is an authority hub, and that's all you need?
This is my first post here. I have seen the whole term vector thing being discussed here in Michael Campbell's vault pages and secret reports. The problem is his cell west pages rank well in Google but nowhere in Altavista for "cell phones" or "cellular phones". I think anything is worth a try on AV if there is some evidence of success in top 10 placement for competitive terms.
Who is Michael Campbell? :)
We thought we knew everyone that's a Who's Who in the SEO game.... But then again, we thought they all hung out here...
> But do you mean that there would have to
> be several of these 'south park, famous
> historian' sites pointing to the 'south
> park' site, in order that the 'south park
> only' site also be relevant for 'famous
> historians'? .....
> Or is it enough that the 'south park,
> famous historians' site is an authority
> hub, and that's all you need?
According to the algorithm I described, it depends on the "quality" of the hub in your second scenario. The eigen-analysis will give you a spectrum of values for the quality of each hub and authority. Generally, the higher the quality of the hub, the more beneficial the links from it are. However, to be a highly ranked authority would require a "quality package" (which may have just one or a few elements) of hubs pointing to it.
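For anyone who wants to see the hub and authority scoring in action, here is a rough sketch of the iteration from Kleinberg's paper in Python. It is the plain, unweighted version with no link relevance applied, and the tiny link graph and page names are invented for illustration. Repeating the two update steps and renormalizing is a simple way of approximating the eigen-analysis mentioned above.

```python
import math


def hits(links, iterations=50):
    """Hub/authority scores for a link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, auth), two dicts of scores scaled to unit length.
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Rescale both score sets so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Invented graph: two hubs agree on site_x and site_y; site_z has one lone link.
example_links = {
    "hub_a": ["site_x", "site_y"],
    "hub_b": ["site_x", "site_y"],
    "lone_page": ["site_z"],
}
hub, auth = hits(example_links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(page, round(auth[page], 3), round(hub[page], 3))
```

In this toy graph the sites linked from both hubs end up with far higher authority than the site with a single isolated inbound link, and the two hubs reinforce each other in turn.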
(Remember I am not sure which method of linking relevance they use. I am just telling you what I read in various papers - some written by Altavista researchers)
If this algorithm is indeed what is being used, it's pretty clear what the way to get good placement would be. Contact the highest ranking sites for your terms and ask them for a link AND/OR link to all those sites (using relevant text)!
The closest paper that I can think of would be Computing Web Page Reputations [www9.org], which talks about improving link popularity through hubs and authorities. (This is the paper I was referring to in Rencke's quote.) It's from the Department of Computer Science at the University of Toronto.
"Who is Michael Campbell?"
He wrote "Nothing But Net", has a pay-for "SEO secrets" site, and an auto-submission software that's promoted on Planet Ocean. (Although I've never purchased any of his stuff, so I can't give a critique.)
As an aside:
I majored in Mathematics, but no one could ever really explain to me how it would come in handy. I guess now I should be glad that I took Linear Algebra and kept the book!
I have seen Michael Campbell's book and there is nothing special in it. There is more information to be found here in these forums than in his book. It is a decent place to start from but nothing earth shattering.
Ok I have located a paper written by people at DEC which cites the Kleinberg algorithm I described. You can find it here -
Note that this paper does NOT confirm what I said earlier about non-query-specific relevance. This paper takes a normalized dot product between the query term vector and the page term vector. Relevance defined here is very query-specific. (So yes, textual optimization is still very important if this is what is being used at AV)
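A small sketch of what query-specific relevance looks like, again in Python. Plain word counts stand in for whatever term weighting the paper actually uses, and the function names and page texts are invented, so treat it as an illustration of the idea only.

```python
import math
from collections import Counter


def unit_vector(text):
    """Word counts scaled to unit length (assumed weighting, for illustration)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def query_relevance(query, page_text):
    """Normalized dot product between the query vector and the page vector."""
    q = unit_vector(query)
    p = unit_vector(page_text)
    return sum(weight * p.get(term, 0.0) for term, weight in q.items())


# Invented page texts for the "famous historians" example.
example_pages = {
    "history_page": "famous historians of ancient rome",
    "mixed_page": "famous historians southpark jokes trainspotter quotes",
    "jokes_page": "southpark jokes southpark jokes more southpark jokes",
}
for name, text in example_pages.items():
    print(name, round(query_relevance("famous historians", text), 3))
```

Under this definition a page that is entirely about Southpark jokes scores zero for "famous historians" no matter who links to it, which is the point about textual optimization still mattering.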
I'm pretty sure this isn't the paper I was thinking of, but it's more good evidence that a version of the Kleinberg algorithm is used in Altavista.
Thanks for the reference doc. Improved Alg for topic dist etc...
While reading the paper I was continually reminded of the old Dijkstra algo (taught in DB Engineering, 1st or 2nd year), which was initially designed as a search for the shortest distance between 2 points and is used today in the airline reservation industry.
My impression of the paper was that it focused mainly on the retrieval of info based on a criterion. But reading data from the database can be time consuming, and therefore there appear to be several index layers between the search interface and the result set.
The information generated by spiders is (1) classified and indexed on a main index, which in turn (2) references a hub index, which eventually leads to (3) a target data set.
I wish I had a document that identified the classification algo by which the data is indexed. With that knowledge, I believe consistent top rankings could be achieved.
>I wish I had a document that identified the classification algo by which the data is indexed.
Amen. It would probably need to be a daily newsletter.
got that right tedster
I think the most important paper to date on link relevance and link context is this one:
Automatic resource compilation by analyzing hyperlink structure and associated text [decweb.ethz.ch]. Same group of names that worked on IBM's clever project. That is the research I've heard Inktomi based their context directory engine on.