
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


MSN bot hitting hard without a referrer

65.52.104.27

     
1:02 pm on Aug 29, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 6, 2002
posts:1831
votes: 22


I just don't get it. Why do they hit all domains with a request like this:

65.52.104.28 - - [28/Aug/2011:18:31:59 -0400] "GET /.... .html HTTP/1.1" 200 10869 "-" "-"

Google Analytics will show this as a direct visit, but that is not true; it is just a bot hitting the site.
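A minimal Python sketch for picking hits like this out of a raw access log, assuming the log uses Apache's "combined" format shown in the line above (host, ident, user, time, request, status, size, referer, user-agent). The helper names `parse_line` and `is_anonymous_hit` are hypothetical, not part of any existing tool.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line is not in combined format."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def is_anonymous_hit(fields):
    """True when both the referer and the user-agent were logged as '-' (or empty)."""
    return fields["referer"] in ("", "-") and fields["agent"] in ("", "-")


line = ('65.52.104.28 - - [28/Aug/2011:18:31:59 -0400] '
        '"GET /.... .html HTTP/1.1" 200 10869 "-" "-"')
fields = parse_line(line)
print(fields["host"], is_anonymous_hit(fields))
```

Running the same check over a whole log file gives a rough count of how many "direct" hits are really anonymous requests of this kind.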


Usually bingbot contains:

"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
7:24 am on Aug 30, 2011 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 21, 2005
posts: 379
votes: 0


Block them:

RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]

Anything that refuses to identify itself properly, doesn't gain access to my sites. Bingbot still comes around and the sites are all indexed normally.
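In the rule above, `RewriteCond` tests the User-Agent header against `^-?$` (an empty string or a single hyphen), and `RewriteRule ^ - [F]` then answers any matching request with 403 Forbidden without rewriting the URL. The Python sketch below applies the same regular expression so you can see which user-agent values it catches. `would_be_forbidden` is a hypothetical helper, and treating a missing header as an empty string is an assumption about how the value reaches the rule.

```python
import re

# Same pattern as the RewriteCond: an empty string, or a lone "-".
BLANK_UA = re.compile(r"^-?$")


def would_be_forbidden(user_agent):
    """Mimic 'RewriteCond %{HTTP_USER_AGENT} ^-?$' + 'RewriteRule ^ - [F]'.

    A missing header (None) is treated as an empty string (assumption).
    """
    return bool(BLANK_UA.match(user_agent or ""))


for ua in ["", "-", "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"]:
    print(repr(ua), would_be_forbidden(ua))
```

Note that the pattern only catches a blank or hyphen-only UA; a value such as "--" or a single space gets through.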

Plus Google Analytics doesn't count 403s.

Have you read this thread? It is about the same issue, and more:

[webmasterworld.com...]
7:36 am on Aug 30, 2011 (gmt 0)

Preferred Member

10+ Year Member

joined:July 25, 2006
posts: 460
votes: 0


Bot visits actually are direct requests, so it's normal for the referer to be blank (in fact, it's suspicious if it's not blank), but you're right that robots normally provide a user-agent string, and it's unusual that one would not do so.

----

It's my understanding that it's possible to send a completely bogus (fake) IP address with a request, so these might not be from the MSN bot at all.

The real sender won't get a reply back from your server, but maybe they don't want one.

In fact, since your server's replies will go to the MSN IP address, maybe it's part of a DDoS attack on MSN?
9:17 pm on Aug 30, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3145
votes: 4


Can't say I've ever seen hits from MSN or any other SE that had really suspicious IPs. Whenever I've seen what appears to be an incorrect IP it's either because I've missed a genuine bot rDNS or because the SE has not yet provided a proper rDNS entry (discussed extensively elsewhere on this forum and made known to MSN).

I frequently see "valid" bot UAs (usually Google) that come from "bad" IPs, generally either known bad server farms or botnets or, occasionally, some SEO bod trying it on.
9:29 pm on Aug 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I see a variety of accesses from agents that claim to be Googlebot, but which come from various non-Google IP addresses. It's not so easy to identify this for Bing, because they don't publish a list of valid IP addresses.
9:38 pm on Aug 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


I think the likelihood of the MSN IP being a forgery (or an attack pass-thru) is verrrrrrrrry slim. Conversely, and as countless threads in this forum attest, the major SEs habitually bot-run any number of 'unofficial' oddities.
7:17 pm on Aug 31, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3145
votes: 4


Neither SE publishes their bot IP ranges. They do have valid rDNS for their bots, though. Except for those that don't, which we've asked bingdude to follow up for bing/msnbot. Googlebot seems not to run on "bad" IPs but they do run a load of other junk on both bot and non-bot IPs.

I've run DNS lookups looking for both googlebot and bing/msnbot (and others). Bing/msnbot is terribly disorganised, returning at least 140 "valid" rDNS ranges, many only a handful of IPs within a /24. Apart from that there are several ranges that bing/msnbot runs valid UAs on but with no rDNS; these get banned here.
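The rDNS checking described above is usually done as a forward-confirmed reverse DNS (FCrDNS) lookup: reverse-resolve the IP, require the name to sit under the engine's domain, then forward-resolve that name and require it to lead back to the same IP. A minimal Python sketch, assuming suffixes such as ".search.msn.com" (the domain in the msnbot host names quoted in this thread) or ".googlebot.com"; check each engine's own documentation for the authoritative list. `verify_bot_ip` is a hypothetical helper, and the demo uses made-up DNS data in the 192.0.2.0/24 documentation range so it runs offline.

```python
import socket


def verify_bot_ip(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    Returns (verified, host_name). `reverse` and `forward` default to the
    socket module's resolvers and can be swapped for stubs in tests.
    """
    if reverse is None:
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward is None:
        forward = lambda name: socket.gethostbyname_ex(name)[2]
    # Step 1: reverse (PTR) lookup of the connecting IP.
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False, None  # no rDNS at all
    # Step 2: the PTR name must sit under one of the engine's domains.
    if not host.endswith(tuple(allowed_suffixes)):
        return False, host
    # Step 3: the forward lookup of that name must include the original IP.
    try:
        addresses = forward(host)
    except OSError:
        return False, host
    return ip in addresses, host


# Offline demo: this DNS data is MADE UP and only exercises the branches above.
FAKE_PTR = {
    "192.0.2.7": "msnbot-192-0-2-7.search.msn.com",
    "192.0.2.8": "msnbot-192-0-2-8.search.msn.com",  # PTR claims msn, A record disagrees
}
FAKE_A = {
    "msnbot-192-0-2-7.search.msn.com": ["192.0.2.7"],
    "msnbot-192-0-2-8.search.msn.com": ["198.51.100.1"],
}


def stub_reverse(addr):
    if addr not in FAKE_PTR:
        raise OSError("no PTR record")
    return FAKE_PTR[addr]


def stub_forward(name):
    if name not in FAKE_A:
        raise OSError("no A record")
    return FAKE_A[name]


for ip in ["192.0.2.7", "192.0.2.8", "192.0.2.9"]:
    print(ip, verify_bot_ip(ip, [".search.msn.com"], stub_reverse, stub_forward))
```

An IP with no rDNS at all fails at step 1, which matches the policy of banning ranges that carry bot UAs but no rDNS. Passing the check only proves the host belongs to the engine, not that the request comes from its official crawler.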

Googlebot seems to concentrate all its IPs within a /23 and a /21-ish range, which makes it easier to keep track of true bots but I still reject those that have a non-googlebot UA.
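The policy in this post (accept a Googlebot UA only from the crawler's own ranges, and still reject other UAs arriving from those ranges) can be sketched with Python's `ipaddress` module. The post does not name the ranges, so the /23 and /21 networks below are HYPOTHETICAL private placeholders; substitute ranges you have verified yourself. `classify` and its return labels are likewise hypothetical.

```python
import ipaddress

# HYPOTHETICAL placeholders for "a /23 and a /21-ish range"; not real Googlebot ranges.
CRAWLER_NETS = [
    ipaddress.ip_network("10.20.0.0/23"),
    ipaddress.ip_network("10.40.0.0/21"),
]


def classify(ip, user_agent):
    """Label a request by whether its IP and its UA agree about being a crawler."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in CRAWLER_NETS)
    claims_bot = "googlebot" in user_agent.lower()
    if in_range and claims_bot:
        return "allow"     # crawler UA from a crawler range
    if claims_bot:
        return "impostor"  # crawler UA from anywhere else
    if in_range:
        return "reject"    # non-crawler UA from a crawler range
    return "normal"        # ordinary visitor, no claim either way


print(classify("10.20.1.5", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

A range list like this is a cache of what FCrDNS lookups have shown; it has to be maintained by hand whenever the engine moves its crawlers.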
1:22 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


FWIW: msnbot Host without msnbot UA --

msnbot-157-55-112-210.search.msn.com
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)

robots.txt? NO

157.55.112.210 [projecthoneypot.org...]
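The host names quoted in this thread encode the IP in their first label (msnbot-157-55-112-210 for 157.55.112.210). A small Python sketch that records the oddities noted here: a host name that does not match its IP, a UA without msnbot/bingbot, and no robots.txt request. The naming pattern is an assumption drawn only from the examples in this thread, not from Microsoft documentation; `embedded_ip` and `flag_msn_host` are hypothetical helpers, and `fetched_robots` would have to come from your own log analysis.

```python
import re

# Pattern inferred from the host names quoted in this thread (assumption).
MSNBOT_HOST = re.compile(
    r"^msnbot-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.search\.msn\.com$"
)


def embedded_ip(host):
    """Return the dotted IP encoded in an msnbot-a-b-c-d.search.msn.com name, else None."""
    m = MSNBOT_HOST.match(host.lower().rstrip("."))
    return ".".join(m.groups()) if m else None


def flag_msn_host(ip, host, user_agent, fetched_robots):
    """List the ways a request from an msnbot-named host departs from normal bot conduct."""
    notes = []
    if embedded_ip(host) != ip:
        notes.append("host name does not encode this IP")
    ua = user_agent.lower()
    if "bingbot" not in ua and "msnbot" not in ua:
        notes.append("no msnbot/bingbot in UA")
    if not fetched_robots:
        notes.append("no robots.txt request")
    return notes


print(flag_msn_host(
    "157.55.112.210",
    "msnbot-157-55-112-210.search.msn.com",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)",
    False,
))
```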
8:57 pm on Sept 26, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Ditto again today, no robots.txt, no msn-anything UA --

msnbot-207-46-204-214.search.msn.com
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)

robots.txt? NO

207.46.204.214 [http://www.projecthoneypot.org/ip_207.46.204.214]
12:00 am on Sept 29, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Same day, same msnbot-in-Host name, same no-robots.txt conduct (but requesting .html), and with another no msn-anything UA:

msnbot-65-54-247-141.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

robots.txt? NO
 
