Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Google synonyms and the tilde operator

 10:49 pm on Dec 7, 2012 (gmt 0)

Are the synonyms that Google shows me when I do ~command the exact list of synonyms?



 11:14 pm on Dec 8, 2012 (gmt 0)

No, it's not the exact list at all. It's just the top few synonyms that have the strongest statistical correlation. In fact, Google has a huge pile of data about semantic relationships and sometimes the SERPs themselves can be a better tip-off than the tilde operator!

The key discussion and patent to understand is Phrase-based indexing [webmasterworld.com]. Next to that (and possibly BEFORE wrapping your brain around that) is Phrase Based Multiple Indexing and Keyword Co-Occurrence [webmasterworld.com].

Note, this all started back in 2006, and by now it has become quite mature and advanced.


 2:33 am on Dec 9, 2012 (gmt 0)

>> huge pile of data about semantic relationships

Have you played much with Google Correlate? Some interesting things shake out from there if you experiment - some surprising connections showing the promise and limitations of using statistical correlation.


 2:39 am on Dec 9, 2012 (gmt 0)

Maybe I should give an example of one experiment I ran. If you grab a table of US obesity rates by state and plug them into Correlate, you'll notice that the terms that match that pattern most closely, according to Google Correlate, mostly relate to rap music.

A friend who is a professional statistician and I were discussing this, and his guess at what's happening is:

1. You're only starting with 51 data points, so there's a lot of noise in the signal.

2. To save cycles, Google does a first pass approximation and then a second pass (I think - check this). If it finds something promising on the first pass, it explores that avenue some more.

3. So if you get some semi-random connection because of limited data or the random occurrence of two curves that match for no good reason, Google will look at curves for related terms to see which of those match, which means it has a self-reinforcing aspect.

So the end result is that when Google tries to build statistical correlations between a data set and a search, it can get pretty whacky. It can also be used to accurately predict flu outbreaks in the US in advance of official CDC warnings.

In short, it can be useful, but one must exercise caution. As the old saw goes, "correlation is not causation".
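To make point 1 concrete, here is a minimal Python sketch of how easily a short series picks up a strong-looking match by chance. It is not how Google Correlate works internally. The 5000 candidate series, the Gaussian noise, and the function names are all assumptions made for illustration. Only the 51 data points come from the experiment above.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_match(n_points=51, n_candidates=5000, seed=0):
    """Strongest |r| between one random target and many unrelated random series.

    Every series is independent noise, so any high correlation found
    here is a coincidence by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print(f"strongest |r| among unrelated series: {best_spurious_match():.2f}")
```

With only 51 points, the best match among a few thousand unrelated series typically looks like a moderate to strong correlation, even though nothing connects the data. A real query log holds far more than a few thousand candidate terms, so coincidental matches get even easier to find.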


 3:51 am on Dec 9, 2012 (gmt 0)

>> limitations of using statistical correlation

In the early days of the tilde operator I saw the [~bread] search give results about the Rolls Royce luxury car. Yes, I suppose it does take a lot of "bread" to own and operate one, but it was still a pretty humorous result.


 10:11 pm on Dec 9, 2012 (gmt 0)

I expect that's a "real" relationship, Ted, unlike the "rap music" relationship I found on Correlate, which was related only in terms of frequency. Granted, Correlate is not looking at co-occurrence on pages, which is a much simpler problem. It's mapping far more complex problems with far less data.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved