| This 36 message thread spans 2 pages: < < 36 ( 1  ) || |
|Microsoft: MSN - Strider Search Defender|
MSN publishing anti-spam techniques.
| 10:04 pm on Jul 13, 2006 (gmt 0)|
MSN - Strider Search Defender
The link below is to the white paper on this at MSN Research.
"Automatic and Systematic Discovery of Search Spammers through Non-Content Analysis"
"A common approach to detecting spam web pages is through content analysis based on classification heuristics [2,3]. In this report, we propose an orthogonal context-based approach that uses URL-redirection analysis. Our work was primarily motivated by two key observations:"
And according to News.com it is now in use.
| 3:55 am on Jul 21, 2006 (gmt 0)|
altrus: A dedicated search that only searches blogs that supply feeds to it would certainly be easy to implement, but it's hard to believe that would allow us to cease offering blog results in the main engine.
The rest of your argument addresses a marketing question, not a technical one. You claim that customers never want to see a blog result. Is this simply your personal preference, or do you have a study in mind that supports your claim?
We're sensitive to the fact that blog spam is a serious problem; it's bad that keywords you care about are drowned in it. We like to think we've made progress on this in the past month, though, and we expect to improve further in the months to come, so I hope the problem you're seeing will go away soon enough.
But simply trying to eliminate all blogs entirely smacks of giving up and admitting defeat. Where's the fun in that? :-)
| 4:36 am on Jul 21, 2006 (gmt 0)|
Just think of the people who would no longer be accessible via search if blogs were removed from the results.
the other Ann
Andrew and Danny
and of course, Dave (Dave's not here man.)
Lots of others would vanish as well.
I keep telling whoever looks like they will listen "It's just another content management system."
... but ... they are not listening.
| 4:34 pm on Jul 22, 2006 (gmt 0)|
I don't think eliminating blogs would be admitting defeat. A targeted move like eliminating .blogspot domains from the first 20 results would clean up your search results immensely, with little or no negative effect.
I just looked up 'low cost health insurance'. Positions one, three and six are blogspot spam blogs. It's getting out of control.
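For illustration only, here is a minimal sketch of what "keep .blogspot out of the first 20 results" could look like as a re-ranking step. The function name `demote_hosted_blogs`, the `protected` parameter, and the idea of operating on a plain list of result URLs are all assumptions made for this example; nothing here reflects how MSN actually ranks results.

```python
from urllib.parse import urlparse


def is_hosted_blog(url, suffix=".blogspot.com"):
    """Return True if the URL's host is a subdomain under `suffix`."""
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)


def demote_hosted_blogs(results, suffix=".blogspot.com", protected=20):
    """Fill the first `protected` slots with non-blog results only.

    Relative order is preserved within each group. If there are fewer
    than `protected` non-blog results, the blogs simply follow them.
    """
    head, rest = [], []
    for url in results:
        if len(head) < protected and not is_hosted_blog(url, suffix):
            head.append(url)
        else:
            rest.append(url)
    return head + rest


if __name__ == "__main__":
    ranked = [
        "http://spam1.blogspot.com/",
        "http://example.com/a",
        "http://spam2.blogspot.com/x",
        "http://example.org/b",
    ]
    for url in demote_hosted_blogs(ranked, protected=2):
        print(url)
```

Note that this demotes every blog on the host, good or bad, which is exactly the trade-off being argued about in this thread.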
| 6:00 pm on Jul 22, 2006 (gmt 0)|
I see MSNDude meant eliminating ALL blogs is a sign of defeat, and I agree there... but the more targeted move in my last post seems reasonable.
Or at least, if the spider could detect JavaScript redirects or whatever they use to link to the commercial search pages, that would make a big difference...
| 7:35 pm on Jul 22, 2006 (gmt 0)|
If MSN thinks that there is anything valuable in the millions of computer-generated blog subdomains - well, so be it.
The least they could do is make any link from a blog subdomain count as "0". On MSN not only do spam blogs rank high, but probably 60% of the websites in the first 250 results rank because of links from such blogs.
Obviously MSN does not want to, or more likely does not have the capability to deal with this problem, so we all will suffer from it.
It seems like an easy decision - every link coming from subdomains of blogspot, myspace, mymsn, etc. should not count as a link. This way, if any blogs rank, at least they will have earned it.
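As a rough sketch of the "count as 0" idea, assuming a toy model where every inbound link normally contributes one point: the host list, the function names `link_weight` and `inbound_score`, and the flat scoring are all hypothetical and chosen only for illustration. Real link analysis is far more involved, and the suffixes below are examples, not a vetted list.

```python
from urllib.parse import urlparse

# Assumed, illustrative list of free-hosting suffixes whose subdomains
# contribute nothing to a link score.
ZERO_WEIGHT_SUFFIXES = (".blogspot.com", ".myspace.com")


def link_weight(source_url, suffixes=ZERO_WEIGHT_SUFFIXES):
    """Return 0.0 for links from listed subdomains, 1.0 otherwise."""
    host = urlparse(source_url).hostname or ""
    return 0.0 if host.endswith(suffixes) else 1.0


def inbound_score(inbound_links, suffixes=ZERO_WEIGHT_SUFFIXES):
    """Sum the weights of all inbound links to a page."""
    return sum(link_weight(url, suffixes) for url in inbound_links)


if __name__ == "__main__":
    links = [
        "http://a.blogspot.com/post",
        "http://example.com/",
        "http://x.myspace.com/",
    ]
    print(inbound_score(links))
```

Because the weight is zero rather than negative, a flood of such links neither helps nor hurts the target page.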
| 2:11 am on Jul 24, 2006 (gmt 0)|
I do believe that the key to controlling blog spam will be closer scrutiny of the sources of inbound links.
Just to be on the safe side MSNdude, I hope that when you guys end up seeing a ton of blog/guestbook spam coming in to a specific domain that you'll err on the side of simply discounting those links, rather than potentially penalizing the domain -- I work in a few hypercompetitive industries, where the temptations for a competitor to simply do a bit of blog/gb spamming on my behalf might be a bit too much.
When I read through the discussion of Strider Search Defender, it sounded like you'd be paying attention to how those links compare with other links received, which is good, so I think you're on the right path. I still just hope that the possibility of third-party activity continues to weigh heavily in the algorithm as you roll this out.
On a side note, I noticed that the overall quality of results for one of my search phrases improved today, with no more blogs holding spots in the top 10. Granted, my index page went from #8 to not in the top 25 pages, but I'm sure it'll come back once respidered.