The link below is to the white paper on this at MSN Research.
"Automatic and Systematic Discovery of Search Spammers through Non-Content Analysis"
"A common approach to detecting spam web pages is through content analysis based on classification heuristics [2,3]. In this report, we propose an orthogonal context-based approach that uses URL-redirection analysis. Our work was primarily motivated by two key observations:"
And according to News.com it is now in use.
[news.com.com...]
The rest of your argument addresses a marketing question, not a technical one. You claim that customers never want to see a blog result. Is this simply your personal preference, or do you have a study in mind that supports your claim?
We're sensitive to the fact that blog spam is a serious problem; it's bad that keywords you care about are drowned in it. We like to think we've made progress on this in the past month, though, and we expect to improve further in the months to come, so I hope the problem you're seeing will go away soon enough.
But simply trying to eliminate all blogs entirely smacks of giving up and admitting defeat. Where's the fun in that? :-)
Mr. Scoble
Glenn
Cory
Ann
Andy
Ann
Om
the other Ann
Andrew and Danny
Jeremy
and of course, Dave (Dave's not here man.)
Lots of others would vanish as well.
I keep telling whoever looks like they will listen "It's just another content management system."
... but ... they are not listening.
I just looked up 'low cost health insurance'. Positions one, three and six are blogspot spam blogs. It's getting out of control.
The least they could do is make any link from a blog subdomain count as "0". On MSN, not only do spam blogs rank high, but probably 60% of the websites in the first 250 results rank because of links from such blogs.
Obviously MSN does not want to, or more likely does not have the capability to, deal with this problem, so we will all suffer from it.
It seems like an easy decision - links coming from subdomains of blogspot, myspace, mymsn, etc. should not count as links at all. This way, if any blogs rank, at least they will be worth it.
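For anyone wondering what "count as 0" might look like in practice, here is a rough sketch. It is only an illustration of the idea in the post above, not how MSN or any other engine actually scores links. The host list, the function names and the flat 1.0 weight for every other link are all assumptions made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical list of free-hosting domains, chosen only for illustration.
FREE_HOSTS = ("blogspot.com", "myspace.com")


def link_weight(source_url: str) -> float:
    """Return 0.0 for a link from a subdomain of a listed free host, else 1.0."""
    host = (urlsplit(source_url).hostname or "").lower()
    for base in FREE_HOSTS:
        # Match "something.blogspot.com" but not the bare "blogspot.com".
        if host.endswith("." + base):
            return 0.0
    return 1.0


def inbound_score(source_urls) -> float:
    """Sum the weights of all inbound links (a toy stand-in for link popularity)."""
    return sum(link_weight(url) for url in source_urls)


links = [
    "http://cheap-insurance.blogspot.com/2006/01/post.html",
    "http://www.example.com/resources.html",
]
print(inbound_score(links))
```

Note that the weight bottoms out at zero and never goes negative, so links from these hosts are simply ignored rather than counted against the page they point to.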
Just to be on the safe side MSNdude, I hope that when you guys end up seeing a ton of blog/guestbook spam coming in to a specific domain that you'll err on the side of simply discounting those links, rather than potentially penalizing the domain -- I work in a few hypercompetitive industries, where the temptations for a competitor to simply do a bit of blog/gb spamming on my behalf might be a bit too much.
When I read through the discussion of Strider Search Defender, it sounded like you'd be paying attention to how those links compare with other links received, which is good, so I think you're on the right path. I still just hope that the possibility of third-party activity continues to weigh heavily in the algorithm as you roll this out.
On a side note, I noticed that the overall quality of results for one of my search phrases improved today, with no more blogs holding spots in the top 10. Granted, my index page went from #8 to not in the top 25 pages, but I'm sure it'll come back once respidered.
Keep innovating.
Cygnus