Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

MSN bot hitting hard without a referrer

 1:02 pm on Aug 29, 2011 (gmt 0)

I just don't get it. Why do they hit all domains with requests like this: - - [28/Aug/2011:18:31:59 -0400] "GET /.... .html HTTP/1.1" 200 10869 "-" "-"

Google Analytics will show these as direct visits, but that is not true; it is just a bot hitting the site.

Usually the bingbot user-agent string is:

"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
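To see how many hits of this kind a log contains, something like the following might help. It is only a rough sketch: it assumes the server writes Apache's "combined" log format, and the IP address and path in the usage lines are made-up placeholders (the log line quoted above has both stripped out).

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_anonymous_hit(line):
    """Return True when both referer and user-agent are empty or logged as "-"."""
    m = COMBINED.match(line.strip())
    if m is None:
        return False  # not a combined-format line; leave it alone
    return m.group("referer") in ("", "-") and m.group("agent") in ("", "-")

# Placeholder lines: 192.0.2.10 and /page.html are hypothetical values.
blank = '192.0.2.10 - - [28/Aug/2011:18:31:59 -0400] "GET /page.html HTTP/1.1" 200 10869 "-" "-"'
named = ('192.0.2.10 - - [28/Aug/2011:18:31:59 -0400] "GET /page.html HTTP/1.1" 200 10869 "-" '
         '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"')
```

Here `is_anonymous_hit(blank)` is true and `is_anonymous_hit(named)` is false, since the second line has a blank referer but a proper user-agent.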



 7:24 am on Aug 30, 2011 (gmt 0)

Block them:

# Return 403 Forbidden when the User-Agent header is empty or just "-"
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]

Anything that refuses to identify itself properly doesn't gain access to my sites. Bingbot still comes around and the sites are all indexed normally.

Plus Google Analytics doesn't count 403s.
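For anyone unsure what the `^-?$` pattern in that RewriteCond actually catches, here is a small Python check of the same expression. It is an illustration only: it assumes Python's `re` module and Apache's regex engine treat this simple pattern the same way, and `would_be_blocked` is a name made up for the example.

```python
import re

# Same expression as in the RewriteCond: start of string, an optional "-", end of string.
EMPTY_UA = re.compile(r'^-?$')

def would_be_blocked(user_agent):
    """True for an empty user-agent or a lone "-", False for anything longer."""
    return EMPTY_UA.search(user_agent) is not None

bingbot = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
results = {ua: would_be_blocked(ua) for ua in ("", "-", "--", bingbot)}
```

Only the empty string and the single dash match, so a crawler that sends its normal user-agent is not affected by the rule.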

Have you read this thread? It is about the same issue, and more:



 7:36 am on Aug 30, 2011 (gmt 0)

Bot visits actually are direct requests, so it's normal for the referer to be blank (in fact, it's suspicious if it's not blank), but you're right that robots normally provide a user-agent string, and it's unusual that one would not do so.


It's my understanding that it's possible to send a completely bogus (fake) IP address with a request, so these might not be from the MSN bot at all.

The real sender won't get a reply back from your server, but maybe they don't want one.

In fact, since your server's replies will go to the MSN IP address, maybe it's part of a DDoS attack on MSN?


 9:17 pm on Aug 30, 2011 (gmt 0)

Can't say I've ever seen hits from MSN or any other SE that had really suspicious IPs. Whenever I've seen what appears to be an incorrect IP it's either because I've missed a genuine bot rDNS or because the SE has not yet provided a proper rDNS entry (discussed extensively elsewhere on this forum and made known to MSN).

I frequently see "valid" bot UAs (usually google) that come from "bad" IPs, generally either known bad server farms or through botnets or, occasionally, some SEO bod trying it on.


 9:29 pm on Aug 30, 2011 (gmt 0)

I see a variety of accesses from agents that claim to be Googlebot, but which come from various non-Google IP addresses. It's not so easy to identify this for Bing, because they don't publish a list of valid IP addresses.


 9:38 pm on Aug 30, 2011 (gmt 0)

I think the likelihood of the MSN IP being a forgery (or an attack pass-thru) is very slim. Conversely, and as countless threads in this forum attest, the major SEs habitually bot-run any number of 'unofficial' oddities.


 7:17 pm on Aug 31, 2011 (gmt 0)

Neither SE publishes its bot IP ranges. They do have valid rDNS for their bots, though. Except for those that don't, which we've asked bingdude to follow up on for bing/msnbot. Googlebot seems not to run on "bad" IPs but they do run a load of other junk on both bot and non-bot IPs.
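Since rDNS is the only handle available, the usual check is a forward-confirmed reverse lookup: resolve the IP to a host name, confirm the name belongs to the engine, then resolve the name back and confirm it returns the same IP. A minimal sketch follows. The suffix list is my assumption and should be checked against each engine's own documentation, and the `reverse` and `forward` parameters exist only so the function can be exercised without touching live DNS.

```python
import socket

# Assumed host-name suffixes; verify against each engine's documentation.
BOT_SUFFIXES = (".googlebot.com", ".search.msn.com")

def verify_bot_ip(ip, suffixes=BOT_SUFFIXES,
                  reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no rDNS entry at all
    if not host.endswith(suffixes):
        return False  # rDNS exists but is not a crawler host name
    try:
        addresses = forward(host)[2]
    except OSError:
        return False  # host name does not resolve forward
    return ip in addresses  # the name must map back to the same IP
```

In normal use it is called as `verify_bot_ip(remote_addr)`. An IP with no rDNS, like the ranges mentioned in this thread, simply fails the first step.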

I've run DNS lookups looking for both googlebot and bing/msnbot (and others). Bing/msnbot is terribly disorganised, returning at least 140 "valid" rDNS ranges, many only a handful of IPs within a /24. Apart from that there are several ranges that bing/msnbot runs valid UAs on but with no rDNS; these get banned here.

Googlebot seems to concentrate all its IPs within a /23 and a /21-ish range, which makes it easier to keep track of true bots but I still reject those that have a non-googlebot UA.
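If you keep your own list of ranges, membership can be checked with Python's standard `ipaddress` module. The networks below are placeholders from private address space, chosen only to match the /23 and /21 sizes mentioned; they are not real crawler ranges, and `in_known_range` is a made-up helper name.

```python
import ipaddress

# Placeholder networks, NOT real crawler ranges.
KNOWN_BOT_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/23"),  # 512 addresses
    ipaddress.ip_network("10.0.8.0/21"),  # 2048 addresses
]

def in_known_range(ip, networks=KNOWN_BOT_NETWORKS):
    """True when the address falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# Number of addresses in each listed network.
sizes = [net.num_addresses for net in KNOWN_BOT_NETWORKS]
```

For scale, a /24 holds 256 addresses, so "a handful of IPs within a /24" means only a small fraction of each block is in use.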


 1:22 pm on Sep 2, 2011 (gmt 0)

FWIW: msnbot Host without msnbot UA --

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)

robots.txt? NO [projecthoneypot.org...]


 8:57 pm on Sep 26, 2011 (gmt 0)

Ditto again today, no robots.txt, no msn-anything UA --

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)

robots.txt? NO [http://www.projecthoneypot.org/ip_207.46.204.214]


 12:00 am on Sep 29, 2011 (gmt 0)

Same day, same msnbot-in-Host name, same no-robots.txt conduct (but requesting .html), and with another no msn-anything UA:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

robots.txt? NO
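Hits like these, where the rDNS host name says crawler but the user-agent says browser, can be flagged automatically. This is a rough sketch under my own assumptions: the suffix-to-token table is a guess at what each engine's crawler normally sends, and the host name in the usage lines is a hypothetical placeholder, not one taken from the logs above.

```python
# Assumed pairing of rDNS host suffix to tokens expected in the user-agent.
EXPECTED_TOKENS = {
    ".search.msn.com": ("msnbot", "bingbot"),
    ".googlebot.com": ("googlebot",),
}

def ua_mismatch(rdns_host, user_agent):
    """True when the host name looks like a crawler but the UA names no crawler."""
    host = rdns_host.rstrip(".").lower()
    ua = user_agent.lower()
    for suffix, tokens in EXPECTED_TOKENS.items():
        if host.endswith(suffix):
            return not any(token in ua for token in tokens)
    return False  # not a crawler host name, so nothing to compare

# Hypothetical host name paired with a browser UA reported in this thread.
host = "msnbot-192-0-2-1.search.msn.com"
msie = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)"
flagged = ua_mismatch(host, msie)
```

With these inputs `flagged` is true, while the same host with a normal bingbot user-agent would not be flagged.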

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved