LSI and Google

         

MrFewkes

11:50 pm on Jul 1, 2011 (gmt 0)



All,

I have read here and there that the old LSI appears to be incorporated into the ranking algo.

Can anyone in the know start off a thread about this? Is it true, and what can we do to improve our LSI score, if there is one?

Is it truly LSI or some cheaper version?

Any bidders?

Thanks

tedster

2:38 am on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google's semantics processing is definitely NOT LSI. LSI is 1) patented, 2) old and 3) too computationally intensive to scale to the size of today's web.

Google's version is more powerful than LSI, and far from a "cheaper" version - except that it's computationally cheaper.

Google's venture into this territory goes back at least to their purchase of Applied Semantics in 2003 [newsbreaks.infotoday.com]. A giant step came in 2006 with the phrase-based indexing patents from Google's Anna Lynn Patterson [webmasterworld.com]. And that was five years ago.

It seems clear to me that Google has now improved far beyond their 2006 level. I see evidence of this every day in Google Suggest and Instant Search results. And no, I don't think there's something simple we can apply to our own sites - except to take off rigid restrictions about the vocabulary we are willing to use on our pages.

MrFewkes

11:08 pm on Jul 2, 2011 (gmt 0)



Tedster, if we remove our rigid restrictions and incorporate more colours - let's say we landed a match with the kind of words Google wanted to see alongside our target phrase - would we rank higher for our target phrase?

tedster

2:47 am on Jul 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure what you mean by "more colours" but yes - this has been my experience. A natural breadth of vocabulary in the text, as opposed to rigid "SEO content writing", can provide a ranking benefit today, even for the core term itself.

MrFewkes

5:39 pm on Jul 3, 2011 (gmt 0)



Tedster.

OK, the question of colours.

Let's say I want to rank for the word "Rainbows".

Then - if I have two pages as follows:

Page 1 looks exactly like this
Rainbows are red yellow and brown.

Page 2 looks exactly like this
Rainbows are red yellow and blue.

In the above example, page 1 mentions 2 words which are associated with rainbows, and one word (brown) which is not.

The second page on the other hand mentions 3 great words - all of which are associated with rainbows.

Which page will rank the highest for rainbows?

I think that people are generally implying now that page 2 would outrank page 1.

What say you Tedster?

tedster

5:50 pm on Jul 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I say if you want to rank for rainbows, then content about "the law of refraction", "angle of sunlight" or even "a pot of gold" can help.

MrFewkes

6:08 pm on Jul 3, 2011 (gmt 0)



OK - that's fine - I am with that - and I also agree with it in terms of a ranking signal.

So now I'm at a point where I ask myself these questions.

1. Given that Google engineers don't know everything about everything - they must know that rainbows are about pots of gold from somewhere which is not inside their heads. What is that somewhere?

2. Is this a strong signal - i.e. on par with, say, a PR 5 backlink? Or is it a weak signal - like an alt tag?

This is possibly the most important thread I've ever been in - or is it? Is it my missing piece of the jigsaw? I'm not even sure!

tedster

6:31 pm on Jul 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's the kind of signal that can disambiguate a page. The more clearly a page is disambiguated semantically, the more easily it can be assigned a proper place in the taxonomy of document types and then assigned appropriately to the right taxonomy of query types.

"...from somewhere which is not inside their heads. What is that somewhere?"

A huge pile of data, calculated automatically on a regular (but not daily) basis, based on actual language use, and distributed around Google's 1 million plus servers. The phrase-based indexing patents go into a lot of detail about how the related computations work.
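To make "based on actual language use" a bit more concrete, here is a toy sketch in Python of deriving related terms from co-occurrence counts. It is only an illustration of the general idea, not Google's method and not what the patents specify: the tiny corpus, the function names and the `min_count` cutoff are all made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of words."""
    pair_counts = Counter()
    for text in documents:
        # Use a sorted set so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts


def related_terms(term, pair_counts, min_count=2):
    """Return words that share at least min_count documents with term."""
    related = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count and term in (a, b):
            other = b if a == term else a
            related[other] = n
    return related


# Hypothetical four-document "web" for illustration only.
corpus = [
    "rainbows need sunlight and refraction",
    "refraction bends sunlight into rainbows",
    "rainbows end at a pot of gold",
    "bread rolls need flour",
]

counts = cooccurrence_counts(corpus)
print(related_terms("rainbows", counts))
```

On this toy corpus only "refraction" and "sunlight" clear the cutoff for "rainbows". A real system would work on phrases rather than single words and on vastly more text, which is exactly why nobody outside Google can reproduce the data.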

This is one area where Google has really taken some giant steps. Earlier in the process I saw some rather funny growing pains, like when a page about a "Rolls" (the car) ranked well for a query about "bread".

There have only been a handful of SEOs following this development... a small handful. And as I've said before, it's not the kind of thing you can manipulate directly, nor should you try. But it can really free up your content to be more human and more directly serve your market.

Forget about the "is it as strong as a PR 5 backlink" thinking. That's too linear to give you an appropriate mental model. I've seen one situation where an SEO tried to get every related phrase he could find into his page - and the rankings took a nosedive. The patents explain why... the number of related phrases on the page went too far beyond the statistical norm. And unless you are analyzing all the text on the web that Google does, you'll never be able to compute that norm.
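As a rough picture of what "too far beyond the statistical norm" could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the phrase list, the sample counts, the two-standard-deviation cutoff and the function names are hypothetical, and the real norm would come from data only Google has.

```python
from statistics import mean, pstdev


def count_related_phrases(text, related_phrases):
    """Count how many of the given phrases occur in the page text."""
    lowered = text.lower()
    return sum(1 for phrase in related_phrases if phrase in lowered)


def exceeds_norm(page_count, sample_counts, max_deviations=2.0):
    """True if page_count sits more than max_deviations standard
    deviations above the mean of sample_counts (an assumed cutoff)."""
    mu = mean(sample_counts)
    sigma = pstdev(sample_counts)
    if sigma == 0:
        # No spread in the sample: anything above the mean stands out.
        return page_count > mu
    return (page_count - mu) / sigma > max_deviations


# Hypothetical related phrases for "rainbows".
related = ["law of refraction", "angle of sunlight", "pot of gold"]
page = "Rainbows follow the law of refraction and end at a pot of gold."

# Made-up related-phrase counts for typical pages on the topic.
typical_counts = [2, 3, 3, 4, 3]

print(count_related_phrases(page, related))  # phrases found on this page
print(exceeds_norm(3, typical_counts))       # a page in the normal range
print(exceeds_norm(9, typical_counts))       # a page stuffed with phrases
```

The point of the sketch is the shape of the check, not the numbers: a page with a natural number of related phrases passes, while one stuffed well past what similar pages contain stands out.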

MrFewkes

10:18 am on Jul 4, 2011 (gmt 0)



Tedster,

OK - I am with it. My page has already clearly been classified correctly, as it is number 4 in the SERP for word1 word2.

In line with what you are saying, though, does this mean that there would be no further improvement in my SERP position if I added more - shall we say - "related terms", for want of a better word?

To explain myself better, let's say I am where I am now - and my page is my page.
Then I somehow manage to add the perfect number of the perfect extra words to my page.

Now then - I haven't tripped the filter you describe by overdoing the number of related words/phrases. Let's assume I take my page and it's now perfect for its keywords. Its taxonomy is 100% nailed.

Will it now move up from position 4 in the SERP, all else being equal?

As I described - I am perfectly well categorised - I have been for over 2 years - it's just that I can't afford to spend 2 months learning about something which won't help me rank higher.

The thing is - I am interested in this sort of thing, as you know - but I just can't risk the time going into depth without being certain of a benefit at the moment.

I have GOT to rank, tedster - otherwise I am out of business (rightly or wrongly relying on SEs - but it's the only route for this product).