

Google SEO News and Discussion Forum

    
Google synonyms and the tilde operator
member22




msg:4526019
 10:49 pm on Dec 7, 2012 (gmt 0)

Are the synonyms that Google shows me when I use the ~ (tilde) operator the exact list of synonyms?

 

tedster




msg:4526198
 11:14 pm on Dec 8, 2012 (gmt 0)

No, it's not the exact list at all. It's just the top few synonyms that have the strongest statistical correlation. In fact, Google has a huge pile of data about semantic relationships and sometimes the SERPs themselves can be a better tip-off than the tilde operator!

The key discussion and patent to understand is Phrase-based indexing [webmasterworld.com]. Next to that (and possibly BEFORE wrapping your brain around that) is Phrase Based Multiple Indexing and Keyword Co-Occurrence [webmasterworld.com].

Note, this all started back in 2006, and by now it has become quite mature and advanced.
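To make "strongest statistical correlation" a bit more concrete, here is a minimal sketch of one way a short list of related terms could be pulled from co-occurrence data. It assumes a simple lift score (how often two terms share a document, divided by what independence would predict) over a tiny hypothetical corpus. `related_terms`, the corpus, and the `min_count` cutoff are illustrative inventions, not Google's actual scoring and not anything taken from the patents above.

```python
from collections import Counter


def related_terms(docs, query, top_k=3, min_count=2):
    """Rank terms by co-occurrence lift with `query`; return only the top_k.

    Assumed scoring (illustrative, not Google's):
    lift = P(term and query in same doc) / (P(term) * P(query)),
    estimated from document counts.
    """
    n = len(docs)
    doc_sets = [set(doc.lower().split()) for doc in docs]
    doc_freq = Counter(term for terms in doc_sets for term in terms)
    joint = Counter()  # number of documents containing both the query and the term
    for terms in doc_sets:
        if query in terms:
            joint.update(terms - {query})
    scores = {
        term: count * n / (doc_freq[term] * doc_freq[query])
        for term, count in joint.items()
        if count >= min_count  # ignore pairs seen too rarely to trust
    }
    # Sort by descending lift, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy corpus: each string stands in for one document.
docs = [
    "bread bakery flour",
    "bread bakery oven",
    "bread flour yeast",
    "bread yeast oven",
    "bakery coffee",
    "flour sugar cake",
    "car dealer luxury",
    "car engine oil",
]
print(related_terms(docs, "bread"))
```

In this toy run "oven" and "yeast" come out on top, "car" never shares a document with "bread" so it is never scored, and "flour" is scored but cut off by `top_k`. The visible list is a truncated view of a larger set of scored relationships, which is the point about the tilde results only showing the top few.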

ergophobe




msg:4526224
 2:33 am on Dec 9, 2012 (gmt 0)

>> huge pile of data about semantic relationships

Have you played much with Google Correlate? Some interesting things shake out from there if you experiment - some surprising connections showing the promise and limitations of using statistical correlation.

ergophobe




msg:4526225
 2:39 am on Dec 9, 2012 (gmt 0)

Maybe I should give an example of one experiment I ran. If you grab a table of US obesity rates by state and plug them into Correlate, you'll notice that the terms that match that pattern most closely, according to Google Correlate, mostly relate to rap music.

A friend who is a professional statistician and I were discussing this, and his guess at what's happening is:

1. You're only starting with 51 data points, so there's a lot of noise in the signal.

2. To save cycles, Google does a first-pass approximation and then a second pass (I think - check this). If it finds something promising on the first pass, it explores that avenue some more.

3. So if you get some semi-random connection because of limited data or the random occurrence of two curves that match for no good reason, Google will look at curves for related terms to see which of those match, which means it has a self-reinforcing aspect.

So the end result is that when Google tries to build statistical correlations between a data set and a search, it can get pretty wacky. It can also be used to accurately predict flu outbreaks in the US in advance of official CDC warnings.

In short, it can be useful, but one must exercise caution. As the old saw goes, "correlation is not causation".

tedster




msg:4526230
 3:51 am on Dec 9, 2012 (gmt 0)

>> limitations of using statistical correlation

In the early days of the tilde operator I saw the [~bread] search give results about the Rolls-Royce luxury car. Yes, I suppose it does take a lot of "bread" to own and operate one, but it was still a pretty humorous result.

ergophobe




msg:4526399
 10:11 pm on Dec 9, 2012 (gmt 0)

I expect that's a "real" relationship, Ted, unlike the "rap music" relationship I found on Correlate, which was related only in terms of frequency. Granted, Correlate is not looking at co-occurrence on pages, which is a much simpler problem. It's mapping far more complex problems with far less data.
