Forum Moderators: phranque

LSI & Google

where is the link

11:37 pm on Jun 8, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 20, 2003
posts:197
votes: 0


There has been talk over the past few months about Google using latent semantic indexing (LSI) in their algo, and I've recently come to agree.
The part of the LSI process that I'm most interested in is the calculation of word co-occurrence. LSI websites don't get into this much, and I can't find any papers on it by Googlers.
Can anyone point me in the direction of a paper or a recent tech company acquisition that connects Google to LSI ideas?
11:57 pm on June 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


Here's a few

[javelina.cet.middlebury.edu...]

[lsi.research.telcordia.com...]

[cs.utk.edu...]

[www-psych.nmsu.edu...]

Latent Semantic Analysis (again Telcordia)
[lsi.research.telcordia.com...]

Microsoft also
[research.microsoft.com...]

2:25 am on June 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


Word co-occurrence - can you elaborate a little more? There may be something else out there that isn't called the same thing.

Added:

Here's one

[cs.cornell.edu...]

This looks really good. Here's the HTML in the cache, but it's a PDF

Higher Precision for Two Word Queries

[66.102.7.104...]

Constructing and Examining Personalized Cooccurrence-based Thesauri on Web Pages

[www2003.org...]

That last one looks like hitting paydirt.

5:19 pm on June 12, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 16, 2003
posts:77
votes: 0


Is there any proof that the main search engines are using lsi?

I did some tests but from a brief overview, I do not think so.

5:23 pm on June 12, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 15, 2003
posts:7246
votes: 0


I did some tests but from a brief overview, I do not think so.

Google certainly is.

Not sure about the others.

TJ

5:59 pm on June 12, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 16, 2003
posts:77
votes: 0


Do you have any data to prove this? I think that LSI really works when your document base is relatively small and not SEO'd.

When you run a search on billions of docs, keywords will take precedence. Considering the way documents are SEO'd today, LSI will not be of much use.

Searches on library books, articles/news, and newsgroups are somewhat different, though, since the authors did not alter the keyword frequency or introduce semantically related words on purpose.

If there are 1000 documents SEO'd on the word printer, how can Google decide to put a document related to ink or hp in its SERPs when the other 1000 documents are more important to the person who is doing the search?

Can someone explain this to me?

6:15 pm on June 12, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 15, 2003
posts:7246
votes: 0


ink or hp

According to Google, "inkjet" is semantically linked to "Epson" (12,500,000 pages).

[google.com...]

Interesting, I didn't realise that semantics could extend to brand names. Has anyone else seen something similar in Google?

"Discount" also shows up, which doesn't really surprise me, but it's not a good example of effective semantic indexing.

TJ

3:39 pm on June 13, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 16, 2003
posts:77
votes: 0


~ Does the tilde instruct Google to do a semantic search?

Searching on phone with a tilde brings Nokia up in 1st position. Without the tilde, the results are different.

3:49 pm on June 13, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 15, 2003
posts:7246
votes: 0


Yes, the tilde shows semantic keywords.

Another one - phone¦nokia

[webmasterworld.com...]

Very very interesting...

TJ

8:42 am on June 15, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 20, 2002
posts:933
votes: 0


Soda gets pepsi, cola and coke but not the coca in coca-cola.
8:45 am on June 15, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 20, 2002
posts:933
votes: 0


~mike ~tyson gets boxing. He's sort of a brand right?

I was really hoping to find Don King. lol.

8:00 am on June 23, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 20, 2003
posts:197
votes: 0


When I mentioned word co-occurrence, I meant that I am interested in the statistical threshold that results in two words being tagged as similar.

In other words, how many times does a pair of words have to occur together on the web for Google to deem the pair "related"? From merely anecdotal observations, it appears that the level is relatively low and depends on the number of times the pair occurs in relation to the number of times the separate words occur without each other.

Concrete knowledge of how Google determines whether or not two words are related would give the ability to manipulate "related word" recognition, adding a whole new level to Google SEO.
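
To make the ratio I'm describing concrete, here is a minimal sketch in Python. It is only an illustration of one common co-occurrence measure (the Jaccard coefficient: documents containing both words divided by documents containing either word), not anything Google is known to use. The function names, the sample documents, and the 0.6 threshold are all my own assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(documents):
    """Return {(word_a, word_b): score} for every word pair that shares a document.

    score = docs containing both words / docs containing either word
    (Jaccard coefficient; an assumed measure, chosen for illustration only).
    """
    doc_freq = Counter()   # number of documents containing each word
    pair_freq = Counter()  # number of documents containing both words of a pair
    for text in documents:
        # Count each word once per document; sorting keeps pair keys consistent.
        words = sorted(set(text.lower().split()))
        doc_freq.update(words)
        pair_freq.update(combinations(words, 2))

    scores = {}
    for (a, b), both in pair_freq.items():
        either = doc_freq[a] + doc_freq[b] - both
        scores[(a, b)] = both / either
    return scores


def related_pairs(documents, threshold=0.6):
    """Pairs whose score meets a hypothetical 'related' threshold."""
    return {pair for pair, score in cooccurrence_scores(documents).items()
            if score >= threshold}


# Toy corpus (made up for this example).
docs = [
    "inkjet epson printer",
    "inkjet epson cartridge",
    "inkjet discount",
    "soda cola",
]
scores = cooccurrence_scores(docs)
related = related_pairs(docs, threshold=0.6)
```

In this toy corpus "epson" and "inkjet" appear together in two documents and apart in one, so they clear the threshold, while "discount" and "inkjet" share only one of three documents and do not. Where the real cut-off sits, and whether a ratio like this is involved at all, is exactly the open question.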