homepage Welcome to WebmasterWorld Guest from
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Microsoft bot 157 ranges updated
DNS for IP ranges 157.55 and 157.56 updated

 11:05 pm on Nov 7, 2011 (gmt 0)

I've been keeping a desultory eye on the MS 157 ranges for some time. Not sure when it actually happened - probably within the past few weeks - but MS seem to have taken notice of complaints - or just finally got their act together.

I had a handful of IPs tagged as msnbot in my database from a previous DNS scan some time back. The new ranges/sub-ranges (about 100) extend this range but also, in some cases, extend or reduce previously logged ranges. The list I now have, taken from DNS scans of the MS DNS servers, is included below. I think the list is accurate but I cannot guarantee it.

I may have missed a few breaks in the actual bot list IPs (eg there may be a gap in a sequence I missed) but it is essentially correct.

------------------------------------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
------------------------------------------ - - - - - - - - - - - - - - - - - - - - - -



 11:58 pm on Nov 7, 2011 (gmt 0)

MS seem to have taken notice of complaints

Not sure if this is related but I've been complaining to Bing, msnbot, et al about errors these bots create at my server; sometimes as many as a couple hundred per day. I've exchanged numerous emails with them over the last few months.

Yesterday all those errors stopped.


 1:47 am on Nov 8, 2011 (gmt 0)

Could be it was that recent. I did a single check on a repeatedly failed IP on Saturday and that showed the update.

I complained to bingdude some time back, in another forum on WebmasterWorld, about the IP and UA problems and nudged again when he popped up there a few days ago so maybe that did some good. Who knows? :(


 1:48 pm on Nov 8, 2011 (gmt 0)

I'm curious: What do you do with the 157. info? Do you only allow msnbot/bingbot from those ranges? Or keep track for your own interest? Or--?


 5:16 pm on Nov 8, 2011 (gmt 0)

here's dstiles 156 & 157 Class C's (the lines are too many IMO to separate by Class D's) to regex:




 11:39 pm on Nov 8, 2011 (gmt 0)

Pfui - I hold the ranges in a MySQL database, wherein are all other blockable ranges, short-term auto-blocks etc. When I detect a bingbot UA I check it against the IP. If there is no match, no access. Which is why I'd run up a massive block list of 157 IPs until this list got added.

Reason for the method is historical as much as anything - my security system is a decade or more old, built when IIS had no proper regex let alone a decent htaccess capability.


 12:12 am on Nov 9, 2011 (gmt 0)

Thanks, dstiles. I, too, must juggle expediency with capability (the server's; my own:) Thus when it comes to the 157s et al, things get a bit-ham-fisted: msnbot and bingbot are only okay from:

RewriteCond %{REMOTE_HOST} !\.(bing|live|msn)\.com$
RewriteCond %{REMOTE_ADDR} !^65\.(54|55)\.
RewriteCond %{REMOTE_ADDR} !^157\.55\.
RewriteCond %{REMOTE_ADDR} !^207\.46\.

(The jury's still out on 157.56.)


 8:12 pm on Nov 13, 2011 (gmt 0)

You might want to change:
RewriteCond %{REMOTE_ADDR} !^65\.(54|55)\.

RewriteCond %{REMOTE_ADDR} !^65\.(52|54|55)\.

I've seen them come in through that range and it maps back to them. :)


 10:18 pm on Nov 14, 2011 (gmt 0)

I had an attempted drive by scraping from those ranges this morning.

Got hit requesting over 500 pages by about 45 different MS IPs, each asking for around 15-20 page each, all using this UA:

"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)"

What in the hell is going on over there?

... and it's still ongoing, more page requests being made as I type this...


 12:46 am on Nov 15, 2011 (gmt 0)

1.) Are all the IPs bare/no rDNS, all .search.msn.com, or a mix?

2.) Are all the hits to public files? I ask because of:

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

21:45:31 /robots.txt
21:46:21 /access_logs/

That robots.txt-ignoring, literally-waaay-outta-line, never-public URI is why neither 65.52. nor bingbot have carte blanche access. (Took me a while to recall why, @motorhaven:)


 3:54 am on Nov 15, 2011 (gmt 0)

Everything I've gotten (thousand of hits, its a forum with a very high page count) shows the 65.52 is legit for the crawlers.

Out of thousands of hits from the 65.62.#*$!.#*$! range the vast majority identify as a Microsoft crawler. These all fetch robots.txt and obey it.

The IPs below come in looking like either a user, or a stealth checker, but are rare in nature, hit a few pages and then leave. No search engine UA with these.
These do not fetch robots.txt but do obey it.


 5:39 am on Nov 15, 2011 (gmt 0)

The IPs hitting my site all rDNS to .search.msn.com with 891 hits so far today and counting

Also noticed a couple of different UAs from the same ranges, valid rDNS, also asking for hundreds of pages.

OK, going to do an MS lock down, screw this...


 2:26 pm on Nov 15, 2011 (gmt 0)

@motorhaven: For more about the 65.52s and 207.46s, see: MSN's Stealth Missions [webmasterworld.com...]


 4:10 pm on Nov 15, 2011 (gmt 0)

Thanks. I let in their stealth bots. There's nothing to hide on my site, the search engine crawlers get the same thing the users get.


 10:14 am on Nov 18, 2011 (gmt 0)

Since about Nov 8th I'm getting hit 50,000 - 80,000 times per day by a bot with the useragent

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)

Selection of IP addresses:

Not identified as a bot. And it is executing Javascript, which is using up a lot of resource and buggering up our stats.

Are other users just blocking via user-agent?


 10:31 am on Nov 18, 2011 (gmt 0)

try this

# ends with Gecko and from IP ranges
RewriteCond %{HTTP_USER_AGENT} Gecko)$
RewriteCond %{REMOTE_ADDR} ^207\.46\.(12|204)\.
RewriteCond %{REMOTE_ADDR} ^157\.55\.20[38]\.
RewriteRule .* - [F]


 11:57 am on Nov 18, 2011 (gmt 0)

Actually wilderness, ya left out a C range:

RewriteCond %{HTTP_USER_AGENT} Gecko)$
RewriteCond %{REMOTE_ADDR} ^207\.46\.(12|204)\.
RewriteCond %{REMOTE_ADDR} ^157\.55\.112\.20[38]\.
RewriteRule .* - [F]

But if blocking with this method, IMO it would be more pro-effective to use complete ranges: - -

RewriteCond %{HTTP_USER_AGENT} Gecko)$
RewriteCond %{REMOTE_ADDR} ^207\.46\.
RewriteCond %{REMOTE_ADDR} ^157\.[56][0-9]\.
RewriteRule .* - [F]


 1:41 pm on Nov 18, 2011 (gmt 0)

Thanks for the suggestions guys.


 1:45 pm on Nov 18, 2011 (gmt 0)

Syntax error! The code doesn't do what you want.

^157\.55\.112\.20[38]\. blocks and

Actually, it doesn't even do that because of the trailing period.


 3:32 pm on Nov 18, 2011 (gmt 0)

Whatever this is --

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)

-- that's running amok on IncrediBill's and HenryUK's sites from MSN, also has an atypical + in the "Kit/" version spot in addition to missing the post-"Gecko)" version number. So the preceding example --

RewriteCond %{HTTP_USER_AGENT} Gecko\)$ [NC]
(Note: I escaped the closing paren)

-- or a variation on a theme --

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Gecko\)$ [NC]

-- should suffice, imo, regardless of REMOTE_ADDR. I don't care where it comes from, it's not getting in.


 6:03 pm on Nov 18, 2011 (gmt 0)

That's what I get for being awake in the wee hours.

Line Should read:

RewriteCond %{REMOTE_ADDR} ^157\.55\.112\.20[38]$

Global Options:
 top home search open messages active posts  

Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved