keyplyr

msg:4384633 | 11:58 pm on Nov 7, 2011 (gmt 0) |
| MS seem to have taken notice of complaints |
| Not sure if this is related but I've been complaining to Bing, msnbot, et al about errors these bots create at my server; sometimes as many as a couple hundred per day. I've exchanged numerous emails with them over the last few months. Yesterday all those errors stopped.
|
dstiles

msg:4384662 | 1:47 am on Nov 8, 2011 (gmt 0) |
Could be it was that recent. I did a single check on a repeatedly failed IP on Saturday and that showed the update. I complained to bingdude some time back, in another forum on WebmasterWorld, about the IP and UA problems and nudged again when he popped up there a few days ago so maybe that did some good. Who knows? :(
|
Pfui

msg:4384865 | 1:48 pm on Nov 8, 2011 (gmt 0) |
I'm curious: What do you do with the 157. info? Do you only allow msnbot/bingbot from those ranges? Or keep track for your own interest? Or--?
|
wilderness

msg:4384954 | 5:16 pm on Nov 8, 2011 (gmt 0) |
here's dstiles 156 & 157 Class C's (the lines are too many IMO to separate by Class D's) to regex:
157\.55\.([27]|1[0136-9]|2[0-3]|3[6-9]|48|50|9[89]|10[0236-9]|11[012468]|154)\.
157\.56\.([0-5]|1[678]|80)\.
|
dstiles

msg:4385090 | 11:39 pm on Nov 8, 2011 (gmt 0) |
Pfui - I hold the ranges in a MySQL database, wherein are all other blockable ranges, short-term auto-blocks etc. When I detect a bingbot UA I check it against the IP. If there is no match, no access. Which is why I'd run up a massive block list of 157 IPs until this list got added. Reason for the method is historical as much as anything - my security system is a decade or more old, built when IIS had no proper regex let alone a decent htaccess capability.
|
Pfui

msg:4385097 | 12:12 am on Nov 9, 2011 (gmt 0) |
Thanks, dstiles. I, too, must juggle expediency with capability (the server's; my own:) Thus when it comes to the 157s et al, things get a bit ham-fisted: msnbot and bingbot are only okay from:
RewriteCond %{REMOTE_HOST} !\.(bing|live|msn)\.com$
RewriteCond %{REMOTE_ADDR} !^65\.(54|55)\.
RewriteCond %{REMOTE_ADDR} !^157\.55\.
RewriteCond %{REMOTE_ADDR} !^207\.46\.
(The jury's still out on 157.56.)
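Those conditions only take effect when paired with a user-agent test and a rule, neither of which appears in the post. A minimal sketch of how the pieces might fit together, assuming a user-agent condition and a forbidding RewriteRule, and assuming the server resolves hostnames so that REMOTE_HOST actually holds one:

```apache
RewriteEngine On
# Assumption: this user-agent test is not shown in the post.
RewriteCond %{HTTP_USER_AGENT} (msnbot|bingbot) [NC]
# Each negated condition must hold, so the request is refused only
# when it comes from none of the listed hosts or ranges.
RewriteCond %{REMOTE_HOST} !\.(bing|live|msn)\.com$
RewriteCond %{REMOTE_ADDR} !^65\.(54|55)\.
RewriteCond %{REMOTE_ADDR} !^157\.55\.
RewriteCond %{REMOTE_ADDR} !^207\.46\.
# Assumption: a 403 response; the post does not say what happens on failure.
RewriteRule .* - [F]
```

Consecutive RewriteCond lines are ANDed by default, which is what a "none of these" allow-list needs.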
|
motorhaven

msg:4386526 | 8:12 pm on Nov 13, 2011 (gmt 0) |
You might want to change:
RewriteCond %{REMOTE_ADDR} !^65\.(54|55)\.
to:
RewriteCond %{REMOTE_ADDR} !^65\.(52|54|55)\.
I've seen them come in through that range and it maps back to them. :)
|
incrediBILL

msg:4386919 | 10:18 pm on Nov 14, 2011 (gmt 0) |
I had an attempted drive-by scraping from those ranges this morning. Got hit with requests for over 500 pages by about 45 different MS IPs, each asking for around 15-20 pages, all using this UA: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)" What in the hell is going on over there? ... and it's still ongoing, more page requests being made as I type this...
|
Pfui

msg:4386966 | 12:46 am on Nov 15, 2011 (gmt 0) |
1.) Are all the IPs bare/no rDNS, all .search.msn.com, or a mix?
2.) Are all the hits to public files?
I ask because of:
msnbot-65-52-109-66.search.msn.com
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
21:45:31 /robots.txt
21:46:21 /access_logs/
That robots.txt-ignoring, literally-waaay-outta-line, never-public URI is why neither 65.52. nor bingbot have carte blanche access. (Took me a while to recall why, @motorhaven:)
|
motorhaven

msg:4387006 | 3:54 am on Nov 15, 2011 (gmt 0) |
Everything I've gotten (thousands of hits; it's a forum with a very high page count) shows the 65.52 range is legit for the crawlers. Out of thousands of hits from the 65.52.xxx.xxx range the vast majority identify as a Microsoft crawler. These all fetch robots.txt and obey it. The IPs below come in looking like either a user or a stealth checker, but are rare, hit a few pages and then leave. No search engine UA with these: 65.52.6.105 65.52.7.30 65.52.7.177 65.52.21.72 65.52.33.140 65.52.33.130 These do not fetch robots.txt but do obey it.
|
incrediBILL

msg:4387030 | 5:39 am on Nov 15, 2011 (gmt 0) |
The IPs hitting my site all rDNS to .search.msn.com, with 891 hits so far today and counting. Also noticed a couple of different UAs from the same ranges, valid rDNS, also asking for hundreds of pages. OK, going to do an MS lock down, screw this...
|
Pfui

msg:4387155 | 2:26 pm on Nov 15, 2011 (gmt 0) |
@motorhaven: For more about the 65.52s and 207.46s, see: MSN's Stealth Missions [webmasterworld.com...]
|
motorhaven

msg:4387193 | 4:10 pm on Nov 15, 2011 (gmt 0) |
Thanks. I let in their stealth bots. There's nothing to hide on my site, the search engine crawlers get the same thing the users get.
|
HenryUK

msg:4388483 | 10:14 am on Nov 18, 2011 (gmt 0) |
Since about Nov 8th I'm getting hit 50,000 - 80,000 times per day by a bot with the useragent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) Selection of IP addresses: 207.46.12.75 207.46.204.162 207.46.204.164 207.46.12.73 157.55.112.203 157.55.112.208 Not identified as a bot. And it is executing Javascript, which is using up a lot of resource and buggering up our stats. Are other users just blocking via user-agent?
|
wilderness

msg:4388487 | 10:31 am on Nov 18, 2011 (gmt 0) |
try this
# ends with Gecko and from IP ranges
RewriteCond %{HTTP_USER_AGENT} Gecko)$
RewriteCond %{REMOTE_ADDR} ^207\.46\.(12|204)\.
RewriteCond %{REMOTE_ADDR} ^157\.55\.20[38]\.
RewriteRule .* - [F]
|
keyplyr

msg:4388515 | 11:57 am on Nov 18, 2011 (gmt 0) |
Actually wilderness, ya left out a C range:
RewriteCond %{HTTP_USER_AGENT} Gecko)$
RewriteCond %{REMOTE_ADDR} ^207\.46\.(12|204)\.
RewriteCond %{REMOTE_ADDR} ^157\.55\.112\.20[38]\.
RewriteRule .* - [F]
But if blocking with this method, IMO it would be more pro-effective to use complete ranges:
207.46.0.0 - 207.46.255.255
157.54.0.0 - 157.60.255.255
RewriteCond %{HTTP_USER_AGENT} Gecko)$
RewriteCond %{REMOTE_ADDR} ^207\.46\.
RewriteCond %{REMOTE_ADDR} ^157\.[56][0-9]\.
RewriteRule .* - [F]
|
HenryUK

msg:4388539 | 1:41 pm on Nov 18, 2011 (gmt 0) |
Thanks for the suggestions guys.
|
g1smd

msg:4388540 | 1:45 pm on Nov 18, 2011 (gmt 0) |
Syntax error! The code doesn't do what you want.
^157\.55\.112\.20[38]\. is meant to block 157.55.112.203 and 157.55.112.208. Actually, it doesn't even do that, because the trailing \. demands another period after the last octet, and an IP address never has one.
|
Pfui

msg:4388569 | 3:32 pm on Nov 18, 2011 (gmt 0) |
Whatever this is -- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) -- that's running amok on IncrediBill's and HenryUK's sites from MSN, also has an atypical + in the "Kit/" version spot in addition to missing the post-"Gecko)" version number. So the preceding example --
RewriteCond %{HTTP_USER_AGENT} Gecko\)$ [NC]
(Note: I escaped the closing paren) -- or a variation on a theme --
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Gecko\)$ [NC]
-- should suffice, imo, regardless of REMOTE_ADDR. I don't care where it comes from, it's not getting in.
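Spelled out as a complete rule, a sketch might look like this. The RewriteEngine line and the RewriteRule are assumptions added to make the fragment self-contained; the condition is the one from the post:

```apache
RewriteEngine On
# Matches user agents that start with "Mozilla" and end at "Gecko)",
# i.e. with nothing after the closing paren.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Gecko\)$ [NC]
# Assumption: refuse with 403 regardless of the source address.
RewriteRule .* - [F]
```

The end anchor is what does the work here: WebKit-based browsers normally continue past "Gecko)" with version tokens, so they should not match.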
|
wilderness

msg:4388609 | 6:03 pm on Nov 18, 2011 (gmt 0) |
That's what I get for being awake in the wee hours. The line should read:
RewriteCond %{REMOTE_ADDR} ^157\.55\.112\.20[38]$
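Pulling the corrections from this thread together, a sketch of the whole block might look like the following. The [OR] flag is an addition not made by any poster: without it, consecutive RewriteCond lines are ANDed, and a single REMOTE_ADDR cannot match both address patterns at once.

```apache
RewriteEngine On
# Closing paren escaped so the pattern is a valid regex.
RewriteCond %{HTTP_USER_AGENT} Gecko\)$
# [OR] joins the two address tests; the user-agent test above
# must still match as well.
RewriteCond %{REMOTE_ADDR} ^207\.46\.(12|204)\. [OR]
RewriteCond %{REMOTE_ADDR} ^157\.55\.112\.20[38]$
RewriteRule .* - [F]
```

If the complete ranges suggested earlier are preferred instead, the address tests could be `^207\.46\.` and `^157\.(5[4-9]|60)\.`, which covers 157.54 through 157.60. The `[56][0-9]` form is wider than that stated range, since it also matches 157.50-157.53 and 157.61-157.69.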
|