
Bing Search Engine News Forum

    
MSNBot intentionally requesting bogus file names.
MSNBot is making requests for manufactured URIs apparently to trigger 404s
KenB




msg:4048910
 1:52 pm on Dec 24, 2009 (gmt 0)

In what is apparently a rather old bad behavior, msnbot regularly requests totally manufactured URIs that appear to be designed to trigger 404 errors. Here are sample log entries showing the two styles of bogus URIs msnbot requests:
'65.55.207.126'¦Tue, 15 Dec 2009 20:39:49 -0500¦'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'¦'*/*'¦'/ADBF3C7AB534E8356F30D8AC05291640_00000.temp019f.html'¦''

'65.55.207.28'¦Wed, 16 Dec 2009 05:46:22 -0500¦'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'¦'*/*'¦'/000166709_00001.temp00be.html'¦''

The requests ALWAYS take one of the formats above, starting with either a 32-character hexadecimal GUID or a nine-digit integer.
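For anyone who wants to spot these requests, here is a rough Python sketch of a pattern for the two formats. It is generalized from the two sample entries only, so the suffix shape (an underscore, five digits, `.temp`, four hex characters, `.html`) is an assumption, and the name `is_probe_uri` is made up for illustration.

```python
import re

# Assumed shape, inferred from just the two sample requests above:
#   /<32 hex characters OR 9 digits>_<5 digits>.temp<4 hex characters>.html
BOGUS_URI = re.compile(
    r"/(?:[0-9A-Fa-f]{32}|[0-9]{9})_[0-9]{5}\.temp[0-9A-Fa-f]{4}\.html"
)


def is_probe_uri(path: str) -> bool:
    """Return True if the whole request path matches the assumed probe format."""
    return BOGUS_URI.fullmatch(path) is not None


if __name__ == "__main__":
    # The two paths quoted in the log entries above.
    for path in (
        "/ADBF3C7AB534E8356F30D8AC05291640_00000.temp019f.html",
        "/000166709_00001.temp00be.html",
    ):
        print(path, is_probe_uri(path))
```

If msnbot uses other suffix lengths, the quantifiers would need loosening.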

Now in a way, testing how a website responds to a request that should cause a 404 error makes sense when trying to detect SERP spam sites. However, such a test only needs to be run once or twice a week to confirm that a site is responding with 404s properly. Unfortunately, msnbot is making these requests dozens of times a day, wasting server resources and cluttering up server logs with useless noise. The server log clutter in turn makes it much harder to filter out other 404 errors to find URIs that might need to be redirected to the proper destination.
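One way to cut the noise is to separate the probe requests from the rest of the 404 log before reviewing it. The sketch below assumes the log layout seen in the sample entries (six fields separated by a broken-bar character, each wrapped in single quotes, in the order IP, date, user agent, accept header, path, referrer). The function names and the URI pattern are illustrative assumptions, not anything published by Microsoft.

```python
import re

DELIM = "\u00a6"  # the broken-bar character separating fields in the samples

# Assumed probe format, inferred from the two sample requests only.
PROBE = re.compile(
    r"/(?:[0-9A-Fa-f]{32}|[0-9]{9})_[0-9]{5}\.temp[0-9A-Fa-f]{4}\.html"
)


def parse_entry(line):
    """Split one log line into its fields, dropping the surrounding quotes."""
    return [field.strip().strip("'") for field in line.split(DELIM)]


def split_404s(lines):
    """Return (probe_paths, other_paths) for an iterable of 404 log lines."""
    probes, others = [], []
    for line in lines:
        fields = parse_entry(line)
        if len(fields) != 6:
            continue  # blank or malformed line: skip it
        ip, date, agent, accept, path, referrer = fields
        if "msnbot" in agent.lower() and PROBE.fullmatch(path):
            probes.append(path)
        else:
            others.append(path)
    return probes, others
```

The `others` list is then what is worth checking for URIs that need a redirect.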

If the goal is to detect SERP spammers, the bogus requests shouldn't use a format that is so easy to detect with a regular expression. A spammer could make sure to respond with a 404 for those requests while responding with 200 for any other request.

There has been some discussion on this matter over on the Bing forums, with one webmaster reporting that he has been observing this behavior for about 10 years. You can read that thread at: [bing.com...]

Regardless of MSFT's self-justified reason for doing this, intentionally requesting bogus files like this many times a day is not cool and represents yet another bad behavior by msnbot that MSFT should stop.

 

Brett_Tabke




msg:4050440
 7:36 pm on Dec 28, 2009 (gmt 0)

Nothing specific to add, but here are some related threads:

[webmasterworld.com...]
[webmasterworld.com...]

Lord Majestic




msg:4060203
 9:11 pm on Jan 13, 2010 (gmt 0)

Regardless of MSFT's self-justified reason for doing this, intentionally requesting bogus files like this many times a day is not cool and represents yet another bad behavior by msnbot that MSFT should stop.

Microsoft (or any other crawler for that matter) can't possibly know whether a URL exists or not on a website before crawling it.

Big search engines follow links they find on the Net, and often those can actually be bogus (but search engines don't know it until they crawl them). This is a normal situation in any large-scale crawling, and there is a special error code, 404, designed just for that.

Now if real visitors try to visit your site and get a 404, then that's really an issue worth looking into.

TheMadScientist




msg:4060264
 11:18 pm on Jan 13, 2010 (gmt 0)

Regardless of MSFT's self-justified reason for doing this, intentionally requesting bogus files like this many times a day is not cool and represents yet another bad behavior by msnbot that MSFT should stop.

Many times in a day might be a bit much, but they're not the only ones reported to be doing it... [webmasterworld.com...] I think you'll probably find more threads like this if you look. Google seems to do it with query_strings more than pages, but IMO it's essentially the same, and them knowing how your server handles errors is probably as good for you as it is for them. Whether they get a 302, 200, 404, 410, etc. from your specific site/server is a fairly important question for them to have an answer to, IMO.
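If you want to see what your own server answers for a page that doesn't exist, something like the Python sketch below would do it. It is only an illustration of the idea, not how any search engine actually tests. The function names and the wording of the labels are made up here, and the probe path is just a random hex string that should not exist on a normal site.

```python
import http.client
import uuid


def probe_status(host, port=80, timeout=10.0):
    """Request a random path that should not exist; return (path, status).

    http.client does not follow redirects, so a 301/302 is reported as-is.
    """
    path = "/" + uuid.uuid4().hex + ".html"
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    finally:
        conn.close()
    return path, status


def describe(status):
    """Give a rough, illustrative label for how a missing page was handled."""
    if status in (404, 410):
        return "proper not-found response"
    if status == 200:
        return "soft 404: missing page served as success"
    if 300 <= status < 400:
        return "redirect: missing page sent elsewhere"
    return "other response"
```

Calling `describe(probe_status("www.example.com")[1])` against your own hostname (the one shown is a placeholder) tells you which of those cases your error handling falls into.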

KenB




msg:4060608
 1:45 pm on Jan 14, 2010 (gmt 0)

Microsoft (or any other crawler for that matter) can't possibly know whether a URL exists or not on a website before crawling it.

Big search engines follow links they find on the Net, and often those can actually be bogus (but search engines don't know it until they crawl them). This is a normal situation in any large-scale crawling, and there is a special error code, 404, designed just for that.

No, this isn't a normal situation, and no, msnbot did not find these URLs on the Internet. I'm 100% positive that msnbot is intentionally manufacturing bogus URLs to test how the server responds to them and whether or not 404 errors are properly issued. As TheMadScientist pointed out, this would be a valuable thing for both the search engine and the webmaster to know. HOWEVER, they don't need to be testing for 404 responses multiple times each day.

Heck, the bogus URL format is predictable enough that any half-decent spammer could create a regular expression to make sure proper 404 errors were provided for these bogus URLs while still creating SERP spam by feeding 200 codes and fake pages for everything else.
