Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

This 36 message thread spans 2 pages.
Microsoft: MSN - Strider Search Defender
MSN publishing anti-spam techniques.
unreviewed




msg:3006198
 10:04 pm on Jul 13, 2006 (gmt 0)

MSN - Strider Search Defender

The link below is to the white paper on this at MSN Research.

"Automatic and Systematic Discovery of Search Spammers through Non-Content Analysis"

"A common approach to detecting spam web pages is through content analysis based on classification heuristics [2,3]. In this report, we propose an orthogonal context-based approach that uses URL-redirection analysis. Our work was primarily motivated by two key observations:"

[research.microsoft.com...]

And according to News.com it is now in use.
[news.com.com...]

 

jimh009




msg:3006219
 10:40 pm on Jul 13, 2006 (gmt 0)

Now that is an interesting idea on MSN's part. One thing they need to keep in mind, though, is that if it's not done properly it would be all too easy to get a competitor's site kicked out of the index simply by spamming their URL in various spammy places. While I'm sure MSN will try to come up with something to prevent that, I can't really see them implementing something that is 100% foolproof - after all, this is Microsoft. And should people find out how to rid the web of their competition, the index in MSN could really become a true mess.

unreviewed




msg:3006315
 11:34 pm on Jul 13, 2006 (gmt 0)

>>it would be all too easy to get a competitor's site kicked out of the index simply by spamming their URL in various spammy places.

Agree, but on the other hand, doesn't an engine have more to gain vs. that type of collateral damage? MSN could take the view that Pepsi would not and could not take out Coke ... sort of thing ... In other words, trusted and important web sites could have a free pass. Engines are going to take the gloves off, they have little choice.

msndude




msg:3006329
 11:43 pm on Jul 13, 2006 (gmt 0)

Remember how about a month ago many queries produced nothing but "splog" results? I was kind of disappointed that no one noticed that most of them went away fairly suddenly. Following my rule that "I won't talk about if you don't notice it" was very frustrating this time!

Now you also know why I've been persistently asking if anyone thought their site had been wrongly blacklisted, and why I've been checking those out personally. I'm happy to report, by the way, that all the mistakes I found during this period were generated by the old system; so far I haven't seen a single false positive from the new system -- at least, not affecting anyone on Webmaster World.

Finally, it's certainly true that someone could do this to try to blacklist a competitor (or even the New York Times), but, as you say, we have a plan for that too -- a bit more sophisticated than "give big sites a free pass." :-)

plumsauce




msg:3006338
 11:53 pm on Jul 13, 2006 (gmt 0)


While I'm sure MSN will try to come up with something to prevent that, I can't really see them implementing something that is 100% foolproof - after all, this is Microsoft.

As compared to who exactly? The almighty Google? Kings of the Perpetual Beta and Bad Data Push.

A great point from the whitepaper:


Similarly, advertisement syndicators can detect potential spammers by monitoring those customers who serve ads on a huge number of different URLs through a single account because it is highly unlikely that anyone can generate quality content at that scale.

TAILOR MADE FOR COMBATING MFA (MADE FOR ADSENSE) PUBLISHERS.

And get this:


The ranked Top Domain list is then used to prioritize manual investigation.

Being actually willing to use human processes instead of depending on automated algos. Kudos to MS for being willing to spend money to make money. Unlike those other skinflints.
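
For anyone wondering what a "ranked Top Domain list" might look like in practice, here is a rough sketch. It is not the paper's actual method, only one plausible reading of the quote: count how many distinct doorway URLs redirect to each target domain and sort by that count. The function name `rank_redirect_targets` and the `(doorway_url, final_url)` input format are assumptions made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def rank_redirect_targets(observed_redirects):
    """Rank target domains by how many distinct doorway URLs redirect to them.

    observed_redirects: iterable of (doorway_url, final_url) pairs, for example
    gathered by a crawler that follows redirects (hypothetical input format).
    """
    doorways_by_domain = defaultdict(set)
    for doorway_url, final_url in observed_redirects:
        source_host = urlsplit(doorway_url).hostname
        target_host = urlsplit(final_url).hostname
        # Skip unparsable URLs and redirects that stay on the same host.
        if not source_host or not target_host or source_host == target_host:
            continue
        doorways_by_domain[target_host].add(doorway_url)

    # Most-redirected-to domains first; ties broken alphabetically.
    ranked = sorted(
        doorways_by_domain.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )
    return [(domain, len(urls)) for domain, urls in ranked]
```

The top of the returned list would be the domains to hand to a human reviewer first, which matches the "prioritize manual investigation" wording in the quote.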

cabbie




msg:3006384
 12:29 am on Jul 14, 2006 (gmt 0)

>>so far I haven't seen a single false positive from the new system -- at least, not affecting anyone on Webmaster World.>>
I would love you to look at 2 I know that got burnt.
I doubt you'd be able to fault these (unless it's because they don't have enough blog links), and these were enjoying good positions with MSN till just now.
And I still see sites that no longer exist with decent placements in your SERPs. How are you missing those?

[edited by: cabbie at 12:39 am (utc) on July 14, 2006]

egurr




msg:3006388
 12:36 am on Jul 14, 2006 (gmt 0)

{ Remember how about a month ago many queries produced nothing but "splog" results? I was kind of disappointed that no one noticed that most of them went away fairly suddenly. Following my rule that "I won't talk about if you don't notice it" was very frustrating this time!}

I noticed it MSNdude. I run a research site that showed MSN.com beating the snot out of the other search engines over the last couple of months (in terms of relevance). We knew something was up. It's amazing how fast this worked. In a matter of months the SERPs changed drastically for local search. Well done.

cabbie




msg:3006400
 12:46 am on Jul 14, 2006 (gmt 0)

If you want to test something to beat spam, let it loose in the spammy areas like adult for instance.
I still see so many spammy, keyword-rich doorway pages with JS redirects.
Getting rid of splogs is one thing. Spam is something else.

msndude




msg:3006456
 1:44 am on Jul 14, 2006 (gmt 0)

Yes, it's only one piece of the puzzle; there is much work still to be done. But this was a very important piece.

And summer is far from over.

arbitrary




msg:3006465
 1:48 am on Jul 14, 2006 (gmt 0)

msnbot is like a stone that skips over water, it does not go deep.

if it went deep, spidered properly and kept those pages in its index, maybe there would be less spam.

crobb305




msg:3006485
 2:21 am on Jul 14, 2006 (gmt 0)

MSNdude,

You said
Following my rule that "I won't talk about if you don't notice it" was very frustrating this time!

Trust me, I noticed the enormous improvement in my sector. I imagine there are some folks out there who took advantage of the old system that are very frustrated now that their spam pages are gone.

Furthermore, it is amazing to me that you have been so responsive to comments, concerns, and some very harsh criticism (at times). But, I think that receptiveness/attitude is consistent with the business philosophy that helped make Microsoft what it is today. You guys are not only extremely determined and intelligent, but very humble; it is the humbleness that will continue to make MSN search a big player in my opinion. Thanks for listening, and taking such quick action.

It is very smart of you guys to take advantage of all this free labor (i.e., the hundreds, maybe thousands, of eyes out here watching every sector for you and sending in reports). I can't say as much for some of your competitors who won't so much as return an email.

What more can I say? I know this has been off topic, but the expeditious improvements I have seen deserve recognition. Good work.

Chris

Praxus




msg:3006535
 3:18 am on Jul 14, 2006 (gmt 0)

I agree as well; the results are much cleaner than they used to be. I still see lots of blogspot sites redirecting to spam sites, but it's still a substantial improvement from before the end of May as far as spam is concerned.

tictoc




msg:3006602
 4:00 am on Jul 14, 2006 (gmt 0)

I was kind of disappointed that no one noticed that most of them went away fairly suddenly. Following my rule that "I won't talk about if you don't notice it" was very frustrating this time!

I do see improved results in MSN Search lately when it comes to redirects. I agree that people do not praise MSN Search enough for the accomplishments they have made. However, I do still see a lot of subdomain sites in the SERPs (blogspot.com). I am glad that the Strider Search Defender Team is focused on improving these. We could end up with far better results in MSN than we see with Google and Yahoo.

I agree that MSNDude has been very humble and helpful to all of us here in the Webmaster World forums and we are very glad you are here. MSNDude has been here to get feedback like I have never seen before. Maybe this feedback will help MSN become the #1 search engine based on quality.

Yahoo is also having difficulty with these blogsite -> redirect problems. I guess we could open up a whole new forum just on these splog sites alone.

Similarly, advertisement syndicators can detect potential spammers by monitoring those customers who serve ads on a huge number of different URLs

I am not understanding what they are saying here. I doubt it has anything to do with AdSense.

whoisgregg




msg:3007788
 4:35 pm on Jul 14, 2006 (gmt 0)

It's interesting that the Microsoft paper uses examples of how Google's results are spammed.
[research.microsoft.com...]

I thought that "side-by-side" that Microsoft pulled at the Vegas Pubcon last year didn't go over so well with this crowd... I guess it was considered a success by the Microsoft team.

Added: Ahh, I see now that MSN is also referenced in the paper as having the same issue. Good on ya.

msndude




msg:3008072
 7:57 pm on Jul 14, 2006 (gmt 0)

Gregg: Thank you for noticing that.

Marcia




msg:3009029
 7:50 am on Jul 15, 2006 (gmt 0)

>>Similarly, advertisement syndicators can detect potential spammers by monitoring those customers who serve ads on a huge number of different URLs

>>I am not understanding what they are saying here. I doubt it has anything to do with AdSense.

Sure, tracking down common site ownership and control can be done using the AdSense account number, so it does have something to do with AdSense. There are people who crank out a boat-load of domains (URLs) - MFA sites - and run AdSense on them, with all having the same AdSense account number. How hard could it be for a search engine to track down a ton of domains/URLs all using the same account number?
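
To make that concrete, here is a minimal sketch of the grouping step. It assumes the publisher ID appears in page source as `google_ad_client = "pub-..."`, and that you already have a mapping of domain to HTML from a crawl; the function names, the input format, and the threshold are all hypothetical, and this is not a description of what any engine or ad network actually does.

```python
import re
from collections import defaultdict

# Assumed shape of the publisher ID inside the ad snippet on a page.
PUBLISHER_ID = re.compile(r'google_ad_client\s*=\s*["\'](?:ca-)?(pub-\d+)["\']')


def group_domains_by_publisher(pages):
    """Map each publisher ID to the set of domains whose HTML contains it.

    pages: dict of domain -> HTML source (hypothetical crawl output).
    """
    domains_by_publisher = defaultdict(set)
    for domain, html in pages.items():
        for publisher_id in PUBLISHER_ID.findall(html):
            domains_by_publisher[publisher_id].add(domain)
    return domains_by_publisher


def flag_large_accounts(pages, threshold=3):
    """Return publisher IDs seen on at least `threshold` distinct domains."""
    groups = group_domains_by_publisher(pages)
    return {
        publisher_id: sorted(domains)
        for publisher_id, domains in groups.items()
        if len(domains) >= threshold
    }
```

A large count on its own proves nothing (a legitimate network of sites would also share one account), so the output is better treated as a list for review than as a verdict.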

tictoc




msg:3009322
 12:04 pm on Jul 15, 2006 (gmt 0)

Good point Marcia. I am glad you explained that.

Spiekerooger




msg:3011073
 4:31 am on Jul 17, 2006 (gmt 0)

This looks a good idea.

I've got two things coming to my mind, the first just a trifle:

The way I see it, your report strengthens the said spam domains by linking to them without even using "rel=nofollow" - ok, most SEs should have banned the said domains by now, but searching for fendi handbags on several SEs still brings up the spam domains - with more link power from Microsoft...

Second:

I'm using framebusters written in JavaScript to prevent my domain from being framed by anyone - therefore a user may see a different site (just my domain) than MSN, which sees my site framed in another domain ... is your technique able to differentiate between those framebusters and cloaking?

Please excuse any language mistakes - this is not my mother tongue ;-)

Greetings,

Chris

UnitedRigo




msg:3011104
 5:11 am on Jul 17, 2006 (gmt 0)

My site includes a message board. We allow users to post messages on the board in our site. Often we get a lot of spammers posting links, etc. We make our best efforts to remove the spam, but it is impossible to keep the message board clean 100% of the time since so much spam is placed in it. Will this hurt my site on MSN? My site was removed from MSN in January and recently was allowed back into MSN search. If the spam on my message board is picked up by MSN, then we may be labeled as a spam site. Should I remove the message board? Does anyone have a similar experience?

msndude




msg:3011765
 4:11 pm on Jul 17, 2006 (gmt 0)

Do you use "NOFOLLOW" in your message board? If so, we won't blame you for anything spammers post there.

Check out our instructions here:

[search.msn.com...]
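
For message-board owners who are not sure what that means in practice: the usual approach is to add `rel="nofollow"` to every link generated from user-submitted content. A minimal sketch, assuming your board builds anchors from a URL and link text supplied by the poster (the helper name `render_user_link` is made up for illustration, and real board software will have its own templating):

```python
from html import escape


def render_user_link(url, text):
    """Return an anchor tag for a user-submitted link, marked rel="nofollow".

    Both the URL and the visible text are HTML-escaped so that posted
    content cannot break out of the attribute or inject markup.
    """
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True),
        escape(text),
    )
```

This only covers the markup side; it does not validate the URL scheme or replace moderation of the posts themselves.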

UnitedRigo




msg:3012278
 10:28 pm on Jul 17, 2006 (gmt 0)

Thank you for the tip. I will check and make sure that we are using the NO FOLLOW command.

atlrus




msg:3013012
 1:16 pm on Jul 18, 2006 (gmt 0)

You know what would reduce spam by 50% - get rid of the blog subdomains; completely. Make a new search feature that will search only the blogs, similar to "News", so if people really want to read blogs - to go there.

Spammers use them as free domain names, so even if they get banned - they lose nothing, and this right now hurts you more than anything else I've seen.

crobb305




msg:3014890
 7:04 pm on Jul 19, 2006 (gmt 0)

atlrus,

That is a good idea, in my opinion. "Free" domains have always been frowned upon. Why should "Free" subdomains be viewed any differently? The subdomains seem to rank well because of the "trust" passed on to it from the parent domain.

msndude




msg:3015276
 12:45 am on Jul 20, 2006 (gmt 0)

Sometimes, though, a blog really does have the best answer for a query. The challenge is to negate the effect of the spammers but without destroying the value in the legitimate blogs. This is a lot harder than just throwing out all blogs, but it's a better long-term strategy.

The reason should be easy to see: if you choose to discard entire categories, your errors are cumulative; that is, every good site you lose is lost for good. Do this too many times and you won't have anything left.

atlrus




msg:3015726
 12:42 pm on Jul 20, 2006 (gmt 0)

MSNDUDE,
I beg to differ. Here is how it's done - a person creates about 5000 blogs on, let's say, blogspot, throws some auto-generated text on them, inter-links them - and there you have it - 5000 blogs link to 20 related blogs, and 10 of them get into the top 20 results on MSN.

It's so simple, even Google was able to "get it", and I have to congratulate them on their "blog search" feature. And they have almost no blogs in the search results I monitor, definitely none in the top 100 results.

And if we have to be real about it - "Joe's blog" would never be able to compete against a normal website, as the website would get far more links than the blog. So all the blogs that rank on top spots on MSN are ONLY spam blogs.

And I am not asking to "discriminate" against blogs, just ask your engineers to spend a day and create a separate category just for blogs. I don't think it will be all that hard to do, and it will greatly clean your results.

atlrus




msg:3015755
 1:06 pm on Jul 20, 2006 (gmt 0)

Just for the sake of the argument, I went and counted the blogspot results for one of the keywords I monitor.
Please, note, these are ONLY blogspot results, I have not added the mymsn, myspace etc. results to this number:

*Number of blogspot SPAM subdomains: 23 out of 250 results
*Number of blogspot SPAM subdomains in the first 50 returned results: 12
*Number of blogspot subdomains made by human: 0/250
*Number of blogspot subdomains which DO NOT redirect: 0/250

I see no quality being lost from removing blog subdomains from the "Web Search" result, on the contrary...

P.S. Forgot to mention that my website ranks in the top 10 on MSN, so I am not just b*chin' ;)

whoisgregg




msg:3015880
 2:48 pm on Jul 20, 2006 (gmt 0)

just ask your engineers to spend a day and create a separate category just for blogs

I'm sure it would take weeks of work by a few engineers (and the interference of dozens of management types) to add a "blog search" (or any "niche" search) to a major search engine.

Toss in the personality of the top management people and you might be into months. Not digging on Bill here... Larry, Sergey, and Terry also bring in overhead (along with their unique positive influences).

The scale of the effort is belied by the apparent simplicity of the final product.

msndude




msg:3016671
 12:41 am on Jul 21, 2006 (gmt 0)

whoisgregg: I can't speak for our competitors, but that's not how it works in MSN Search. We're fairly nimble and we have a lot of autonomy. It helps make this a fun team to work on.

The first problem that would have to be solved to accomplish what's proposed here would be to reliably identify blogs. After all, if you offer a specialized "blog search," people will be unhappy if it finds things that aren't blogs or misses things that are. This could end up just introducing a new source of error which would be added to any current errors.

The second problem is "who is going to use this feature?" If the whole point of it was to avoid dealing with blog spam, that suggests that this new blog search is going to be pretty poor -- or that we'd STILL have to fix the blog spam problem while maintaining a more complex system.

The last problem is "exactly how are we going to expose the UI for this?" If you have a great feature but no one can figure out how to access it, then it didn't really help the customers very much. The UI can be just as important as the AI. In this case, I don't quite see how the UI is supposed to work.

If someone seriously proposed the feature described here, all three of those arguments would have to be answered inside the two feature teams who would own it. Given agreement there, it'd be presented to the management of MSN Search itself. If the teams had good answers to the three questions above, I expect it would be rubber-stamped and would ship within a month of the time it was ready. (Probably not on a Friday or a weekend.) :-)

No one higher up the organization than the General Manager for MSN Search would be involved at all, although if we were really proud of the feature, we'd definitely show it off to them.

So when it takes time to fix things, it is definitely not due to bureaucracy; we have very little of that. Instead, things usually take longer than you'd think because a) the solutions aren't obvious, b) it takes time to test the solutions, c) the tests show that the "solution" creates more problems than it solves, or d) no one is available to work on it -- they're working on higher-priority things.

atlrus




msg:3016724
 2:22 am on Jul 21, 2006 (gmt 0)

Well,
Then you guys should bow down your heads and go to blogsearch(dot)google(dot)com and get the answers to your questions.

You need to understand that most people find no use in conventional blogs - that is the blog Joe created to describe what he did the day before - completely useless information, nobody would ever search for it, let his friends go and read it.

The blogs which are worth reading - they have their own domain names (and I don't look at them as "blogs" anyway).

Google figured it out, and instead of completely taking blogs out of their results (i.e. the subdomains from blogspot etc.) they just moved them away from what's more important - the web search results.

You don't have to worry about sacrificing the quality of your search results, as if you look at the numbers I gave you - none of the blogspot results were created by a human, thus - no quality to be lost to begin with. And I have never seen a blogspot subdomain with real sentences ranking on MSN EVER. Well, all you have to do is go to your search engine and type in a competitive word and see for yourself.

Or at least, for the love of God, make the links from blog subdomains count for nothing when your algo is at work.

If you guys clear all the blog subdomains from your results you will probably have some of the best results. And this is what I want, as my website ranks great on MSN, but MSN just doesn't generate enough traffic - maybe if you show some clean results more people will move from Google to you, and the monopoly will disappear...I have a dream...

spander




msg:3016725
 2:24 am on Jul 21, 2006 (gmt 0)

The biggest problem that I see is that MSN doesn't index all of the pages - in fact, just a small percentage - so they don't have the wealth of all the pages to draw on.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved