Forum Moderators: Robert Charlton & goodroi

Latent Semantic Indexing and Backlinks

         

IsItUPYet

1:03 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



There's been much talk of latent semantic indexing of late, and there still seems to be a degree of uncertainty regarding how much of a factor, if any, it plays in G SERPS.

In recent years backlinks have slowly gained more and more authority to the point where they have made all other factors almost obsolete. However, the problem with backlinks is that they reward popularity, rather than relevance, which is to my mind the main purpose of a search engine.

The theory of LSI would surely make body content less manipulable and reduce the dependence upon backlinks and PR; but is this likely to happen?

Does anyone believe that LSI now is, or will become in the near future, a pivotal factor in deciding rankings?

I'd appreciate any thoughts.

ThomasB

2:30 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can certainly see LSI being implemented into G's algo to some degree. But there again I still can't see it being so important that you can't rank without having a certain percentage of semantically related links. In general I seriously doubt that SEs will be able to implement LSI completely within the next 6-12 months and still have a great index without making it a minor ranking factor.

IsItUPYet

2:45 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



"In general I seriously doubt that SEs will be able to implement LSI completely within the next 6-12 months"

I'd certainly agree with that. However, I'd imagine with something as technologically challenging as LSI that it would be phased in and gradually improve in terms of weight and intelligence; much in the same way that backlinks have.

This would mean that we could see the effects of LSI, even if they are initially minor, sooner than might be expected, no?

selomelo

5:24 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



I have some personal experience supporting the LSI hypothesis: Recently, I noticed something bizarre when checking logs. My site receives visitors (a few) from Google with a search query that would be impossible without a semantic interpretation of backlinks.

The search query in question is a three word expression: kw1 + lyrics + kw2

My site has nothing to do with lyrics, and the word "lyrics" is not mentioned at all. When researching how this could be, I observed that a ring page with a link to my site also includes a link to another site with a description that contains the word "lyrics."
google: site:mydomain.com lyrics = 0 results
google: mydomain (a unique name) lyrics = 5 results (all from the same website linking to my site)

This means that Google must be taking the whole page content into account when evaluating the SERP position of a link for a certain keyphrase:
kw1 (included in my anchor text) + lyrics (included in another link's description) + kw2 (included in my anchor text) -> a serp position for my site for the kw1 + non-kw + kw2
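
To make the hypothesis concrete, here is a toy sketch in Python. It is not Google's algorithm, only an illustration of the difference between crediting a link target with its own anchor text and crediting it with every word on the linking page. The page data, the example domains, and the function names `index_targets` and `search` are all made up for the illustration.

```python
from collections import defaultdict

# Hypothetical ring page: (target URL, anchor text, link description).
RING_PAGE = [
    ("mydomain.example", "kw1 kw2", ""),
    ("othersite.example", "other site", "song lyrics archive"),
]


def index_targets(page, use_page_context):
    """Map each word to the set of link targets that get credit for it."""
    index = defaultdict(set)
    # Every word that appears anywhere in the page's anchors or descriptions.
    page_words = {
        word
        for _, anchor, description in page
        for word in (anchor + " " + description).split()
    }
    for url, anchor, _ in page:
        # Assumption under test: either only the link's own anchor text counts,
        # or the whole linking page counts as context for every link on it.
        words = page_words if use_page_context else set(anchor.split())
        for word in words:
            index[word].add(url)
    return index


def search(index, query):
    """Return the targets credited with every word in the query (boolean AND)."""
    sets = [index.get(word, set()) for word in query.split()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    query = "kw1 lyrics kw2"
    print(search(index_targets(RING_PAGE, use_page_context=False), query))
    print(search(index_targets(RING_PAGE, use_page_context=True), query))
```

With anchor-only indexing the three-word query finds nothing. Once the whole ring page counts as context, mydomain.example matches, which is the behaviour described above.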

jd01

12:42 am on Nov 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



selomelo -- Interesting...

I think it will be some time before full LSI can be implemented. The comments I have heard/read on this say the processing power necessary to apply LSI to the full G index is too great to make it feasible currently.

It would not surprise me though if the guys at G were working with a type of 'hybrid' that reduces the processing necessary to apply this sort of algo (heuristic) on a large scale. (Not sure how they would go about it, but they are 'kinda smart' and could probably come up with something workable if this is the direction they are going.)

Justin

martinibuster

12:46 am on Nov 12, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Anybody notice that Disney isn't number one anymore for the term EXIT in the new index viewable on 66.102.9.104?

selomelo

1:31 am on Nov 12, 2005 (gmt 0)

10+ Year Member



I agree with Justin that some kind of heuristics might be involved in LSI. Otherwise, an exhaustive processing would require tremendous resources.

I am a Psychology major, and heuristics were one of my favorite interests. From another field in which Google also has some ambitions (machine translation), I know how difficult it is to implement heuristics in software for language processing. Yet I have some other observations that would support the LSI hypothesis, however rudimentary:

My site is in English, but received a few visitors from Google with a keyword combination consisting of some words in English and some words in a different language. Usually, backlinks to my site have a three-word anchor text in English. In a single instance, I have a backlink with the same three-word anchor text, but this time in another language.

The interesting thing I observed is this: Sometimes surfers type unusual search strings when searching the internet. They even use some words in one language and others in another. My site received some visitors last September for the keyword combination: kw1 (in a different language) + kw2 (in a different language) + kw3 (in English).

It was very interesting, and I began to monitor it daily. I also checked Yahoo and MSN for a similar occurrence, but it was isolated to Google: for two weeks, Google delivered the same SERP for the keyword combination, with my site at #1. However, by the end of September, it disappeared. I think they tested it for a while, and perhaps left it aside for further development.

But the “lyrics” combination I mentioned previously still works (my site as #3 in SERPs) in live datacenters. However, in 66.102.9.104 (aka J3), it has gone, since the ring page in question deleted the link to my site! This might be considered as a further confirmation.

annej

7:42 am on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Fascinating stuff. Thanks for starting this thread.

neuron

8:56 am on Nov 13, 2005 (gmt 0)

10+ Year Member



Do you guys remember when Google announced Stemming? This was like two years ago. Duh! Stemming is the major artifact/indicator of LSI.

Google announced Stemming as if it was a feature, but it is a bug of LSI. LSI was implemented by Google a long time ago. If you want to know the degree of implementation, the best way to measure it would be to measure the current saturation/extent of stemming.

While the papers I've read on LSI apply to on-page content, and that is what I believe happened two years ago (Google implemented LSI for on-page factors; this may have been prior to the Florida update, which to me is the "hilltop" update, but it was close to it), it is also possible that LSI can be applied to the linking content of the web as well.

I believe the Allegra update of February this year was in large part an implementation of LSI on linking content.

I'm not gonna look all this up for you fellows, there have been posts about this here before, but I can't currently google them. Perhaps something to do with the server change recently and archives.

In addition, the original LSI work delineated an enormous real-time computational load, but I think that was for the purpose of clarity, depth and breadth of explanation in a scholarly format, not because the process could not be modularized into a much less computationally intensive real-time application, and it was certainly not the end of a good idea that could be easily applied.
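
For anyone who has not looked at the mechanics: textbook LSI builds a term-document matrix and keeps only its largest singular values, and that factorization is where the heavy computation comes from. The sketch below is a minimal pure-Python illustration on a made-up four-document corpus. It uses power iteration with deflation in place of a library SVD routine, and the corpus and all function names are assumptions for the example. It says nothing about what Google actually runs.

```python
import math

# Hypothetical toy corpus: two car documents and two music documents.
DOCS = [
    "car engine",
    "automobile engine",
    "lyrics song",
    "lyrics song song",
]


def term_document_matrix(docs):
    """Return (sorted vocabulary, raw-count matrix with one row per term)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    matrix = [[doc.split().count(term) for doc in docs] for term in vocab]
    return vocab, matrix


def top_singular_triplets(a, k, iterations=200):
    """Approximate the k largest singular triplets (sigma, u, v) of `a`.

    Power iteration with deflation on A^T A: fine for a toy matrix,
    far too naive for anything the size of a web index.
    """
    n_terms, n_docs = len(a), len(a[0])
    gram = [[sum(a[t][i] * a[t][j] for t in range(n_terms))
             for j in range(n_docs)] for i in range(n_docs)]
    triplets = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n_docs)] * n_docs
        for _step in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n_docs))
                 for i in range(n_docs)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(n_docs))
              for i in range(n_docs)]
        eigenvalue = sum(x * y for x, y in zip(v, gv))
        sigma = math.sqrt(eigenvalue)
        # Left singular vector: u = A v / sigma.
        u = [sum(a[t][j] * v[j] for j in range(n_docs)) / sigma
             for t in range(n_terms)]
        triplets.append((sigma, u, v))
        # Deflate so the next pass converges to the next singular direction.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j]
                 for j in range(n_docs)] for i in range(n_docs)]
    return triplets


def cosine(x, y):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def rank_documents(query, docs, k=2):
    """Score each document against the query in the k-dimensional latent space."""
    vocab, matrix = term_document_matrix(docs)
    triplets = top_singular_triplets(matrix, k)
    counts = [query.split().count(term) for term in vocab]
    # Fold the query into the latent space: q_hat = Sigma^-1 U^T q.
    q_hat = [sum(c * x for c, x in zip(counts, u)) / sigma
             for sigma, u, _ in triplets]
    doc_vectors = [[v[j] for _, _, v in triplets] for j in range(len(docs))]
    return [cosine(q_hat, d) for d in doc_vectors]


if __name__ == "__main__":
    for doc, score in zip(DOCS, rank_documents("car", DOCS)):
        print(f"{score:+.2f}  {doc}")
```

The query "car" scores close to 1 against "automobile engine" even though the two share no word, because both co-occur with "engine". The music documents score close to 0.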

jd01

9:59 am on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Do you guys remember when Google announced Stemming? This was like two years ago. Duh! Stemming is the major artifact/indicator of LSI."

OK -- You're Right -- We're Wrong -- I haven't read any of the papers or posts on LSI.
...or maybe I have and I know there are some papers here:
[pandia.com...]

and there are posts on it here:
[webmasterworld.com...]

and some more papers here:
[cs.utk.edu...]

Justin

Marcia

10:30 am on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It doesn't have to totally be LSI as we understand it from what's out there - and besides, LSI as such is actually a proprietary technology as well as resource intensive in its original implementation.

Nothing much is new under the sun, things just carry different weights and different names and metamorphose into different incarnations, but check this OLD thread out for a touch on the effect of links with Google 5 years ago:

[webmasterworld.com...]

Yep, memory like an elephant. :)

tantalus

2:44 pm on Nov 13, 2005 (gmt 0)

10+ Year Member



That's a fascinating serp martinibuster.

For what it is worth I always look at the indented listing for my site name (kw - unrelated word) for clues as to what google may be doing with LSI.

When I first asked the question "Now why did google choose that page?" (just after Florida) I nearly posted a thread saying "OMG google has learned to read". The page in question was a minor page, low PR and no incoming links, and there were plenty of other pages to choose from. I just couldn't understand the choice until it dawned on me that the snippet was displaying synonyms, in perfect proximity, for that unrelated word. I could not have chosen a better page for that unrelated word myself and I wrote it!

Today, my indented listing is a brand.htm and yes it is synonymous with the keyword in my site title. The only way google could know this though, is through linkage data.

(The brand name site btw is top ten for that competitive kw)

Just my 2 cents with the normal imo disclaimer.