|Google Uses Sentiment Analysis for news and blogs but not search|
This information is a small part of a Google announcement about algo changes to prevent abusive merchants from ranking well based on complaint links - such as the recent "DecorMyEyes" debacle [nytimes.com]
The new (to me) tidbit that sort of sneaks into the article is this:
|As it turns out, Google has a world-class sentiment analysis system (Large-Scale Sentiment Analysis for News and Blogs [icwsm.org]). But if we demoted web pages that have negative comments against them, you might not be able to find information about many elected officials, not to mention a lot of important but controversial concepts. So far we have not found an effective way to significantly improve search using sentiment analysis. Of course, we will continue trying. |
Sentiment analysis has been an area of interest for me for many years. It's notoriously challenging because computers are very poor at understanding irony, subtext, implicit context and all kinds of other linguistic constructs that a human reader easily comprehends.
So I'm surprised that Google is using it at all - even in a narrow context. I'm surprised because there are many social media listening platforms that try to use sentiment analysis, but they end up with a high amount of "no sentiment detected" or even worse, incorrectly assigned sentiment.
As far as I know, accuracy in sentiment analysis still requires a lot of human data scrubbing. So I will be studying the paper very closely.
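To make the failure modes above concrete, here is a minimal sketch of a word-list sentiment scorer. It is a toy under stated assumptions: the word lists, the `score` function and the sample sentences are all made up for illustration, and it is not the approach in the Google paper. It only shows how irony gets mislabeled and how a real complaint can come back as "no sentiment detected".

```python
import re

# Hypothetical mini-lexicons, invented for this example only.
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"terrible", "rude", "scam", "broken", "awful"}

def score(text):
    """Return (label, net), where net = positive hits minus negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    net = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if net > 0:
        return "positive", net
    if net < 0:
        return "negative", net
    return "no sentiment detected", net

samples = [
    # Plain complaint with lexicon words: labeled correctly.
    "Rude staff and the glasses arrived broken.",
    # Sarcasm: a human reads this as a complaint, the word count does not.
    "Oh great, another week waiting for a refund. Love it.",
    # Real complaint with no lexicon words at all.
    "My order never showed up and nobody answers the phone.",
]

for s in samples:
    print(score(s), "<-", s)
```

The second and third samples are the two problems described above: incorrectly assigned sentiment, and a complaint that produces no signal at all. Fixing either one by hand is the kind of human data scrubbing that a simple word count cannot replace.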
In many ways I never really want to take a look at these high-level systems as oftentimes the stuff I see kind of scares me.
I guess the muckety-mucks that come up with them are comfortable enough with data that, to me, looks rather poor. But then they're probably thinking about huge data sets where everything all washes out in the end.
For example, the daily sentiment example [textmap.com] under "Place" includes terms such as Dvd, Heisman, Cup, Nissan, Bears and HD DVD. Interesting places. And the Title section for the most part simply includes scraps, bits and pieces of words.
Again, to me it would be scary if this is actually being used live in any way. But then again, again, what the heck do I know.
|Instead, in the last few days we developed an algorithmic solution which detects the merchant from the Times article along with hundreds of other merchants that, in our opinion, provide an extremely poor user experience. The algorithm we incorporated into our search rankings represents an initial solution to this issue, and Google users are now getting a better experience as a result. |
So now to bring the competitor down, all you have to do is publish many fake bad reviews?
aakk9999, but they are already answering you through their comments in the lines that follow...
they do seem to be saying their magic formula (which they cannot reveal) can detect the "fake" bad reviews from the "genuine" bad reviews...
Yeah, right, and they also have an algorithm that tells them when poker players are bluffing.
how about this from the article?
|A crucial factor in Google search results, the spokesman explained, is the number of links from respected and substantial Web sites. The more links that a site has from big and well-regarded sites, the better its chances of turning up high in a search |
So spam, social engineering and other shady methods may be used to promote a site.
The more they rely on "reviews", "feedbacks", "links" and other factors which are basically "uncharted territory", the higher the probability for these areas to be abused. Seems that search engines are still not getting it.
- Age of domain with same content, whois
- Original/Authority Content
- Internal site usage and structure
Because once you start digging in areas you cannot analyze or monitor, chances are you will do more wrong than right. CC disputes, refunds, imitated items and the like can be handled by banks, police, gov, etc. once a customer files a report. It's not possible for search engines to know the details in these cases.
And yes, it seems like fake reviews can be posted over time using botnets to weigh the reputation of a company either way.
BTW this is not only with Google. Social networks and other search engines have the same pitfalls.
|they do seem to be saying their magic formula (which they cannot reveal) can detect the "fake" bad reviews from the "genuine" bad reviews... |
There's not much magic to it. If YOU write the bad review your browser gives you up in numerous ways. Perhaps the page with the comment had AdSense, perhaps you use a toolbar (Google, Alexa, makes no difference), perhaps you revisit the comment by clicking on its link in your web history or perhaps you bookmark the page.
Point is, there are countless web beacons that record data, and Google knows about them all and collects from them wherever remotely possible.
"Ohh, look... these negative comments all have you as the first to ever view them, and we know you're a competitor because you are also the first to view your own article webpages".
It's not rocket science. In fact, I have no doubt this very post can be linked to me, in several ways, because I'll be the first to ever load it.
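The "first to view" idea in this post can be sketched in a few lines. This is only an illustration of the poster's conjecture, not anything Google has confirmed doing. The view log, the visitor IDs, the URLs and both function names are hypothetical.

```python
# Hypothetical view log: (page_url, visitor_id, timestamp).
views = [
    ("reviews.example/complaint-1", "visitor_A", 100),
    ("reviews.example/complaint-1", "visitor_B", 250),
    ("reviews.example/complaint-2", "visitor_A", 300),
    ("shop-a.example/new-article", "visitor_A", 50),
    ("shop-a.example/new-article", "visitor_C", 400),
    ("reviews.example/complaint-3", "visitor_D", 500),
]

def first_viewers(views):
    """Map each page to the visitor with the earliest timestamp."""
    earliest = {}
    for page, visitor, ts in views:
        if page not in earliest or ts < earliest[page][1]:
            earliest[page] = (visitor, ts)
    return {page: visitor for page, (visitor, _) in earliest.items()}

def suspicious_reviewers(views, review_pages, merchant_pages):
    """Visitors who were first on a review page AND first on a merchant page."""
    first = first_viewers(views)
    on_reviews = {first[p] for p in review_pages if p in first}
    on_merchant = {first[p] for p in merchant_pages if p in first}
    return on_reviews & on_merchant

review_pages = [
    "reviews.example/complaint-1",
    "reviews.example/complaint-2",
    "reviews.example/complaint-3",
]
merchant_pages = ["shop-a.example/new-article"]

print(suspicious_reviewers(views, review_pages, merchant_pages))
```

Here `visitor_A` is flagged for being first on two complaints and on a merchant's own new page, while `visitor_D` is not. A real system would need far more than this one signal, since a genuine customer can also be the first to load a review they wrote.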
So, if I really, really, hate my competition and post my sentiments all over the Internet, my competition will be wiped off of google?
Funny how the New York Times is partially culpable here in maybe more ways than 1.
According to Danny Sullivan, who says this is 2 stories - about a jerk ecommerce operator and a site that ranks well through spammy backlinks -
I think it is also about 1 more story - the fact that the NYT seems to get the analysis wrong.
I wonder if they will print a retraction?
The concept made me wonder right away whether a significant number of negative reviews on Google itself would yield a downranking of "Google" in search results for, say, "search engines".
|...all kinds of other linguistic constructs that a human reader easily comprehends. |
That really hasn't been my experience. Many a time I've watched people stare blankly in the face of sarcasm or irony.
So I think Google can be forgiven if their 'sentiment analysis' is not yet sophisticated enough to be effective.
Final chapter in this saga?
Vitaly Borker arrested - [searchengineland.com...]
According to the original NYT article he had already been arrested on State charges back in October. This is the Feds now getting into the act.
Now, back to the topic.
Is there anybody out there that can give me a teaching moment on my post above? Is it that the data set used for the sample pages was small and that any of the "dirty data" I noted would go largely unnoticed in a much larger set?
One more question: I don't use Google News that often as I find it virtually useless. Has anybody seen any evidence that SA is used in either news or blog search? I've poked around a bit and can't see anything myself. Maybe I'm simply having a problem formulating queries that might trigger it.
Or (Heaven forfend!), maybe tedster fell into the trap of interpreting Google instead of reading Google: "As it turns out, Google *has* a world-class sentiment analysis system..." It doesn't say that it's actually used.
Just my sideways opinion, but with the recent Consumer Reports article about AT&T being the worst US-based mobile carrier, shouldn't Google make it harder for people to do business with them?
Consumer Reports is a big name, people are being hurt by them... seems like the right thing to do.
Very sly comment ;) And you've also highlighted why sentiment analysis isn't a good fit for organic search.
This is offtopic. I came across this NYTimes article only recently. More than the Google algorithm that still appears to be on the fence on sentiment analysis, I am surprised that the search query mentioned in the article still returns the culprit's website as number one in the search results.
Google had a quick turnaround when it came to the Michelle Obama racist picture issue ( [outsidethebeltway.com...] ). But I am surprised that in this case, where there is commerce involved, no action has been taken even a month and a half later.