What am I looking at? Why is it grabbing only 40,000 bytes of images that are between 150 and 200K in size?
www.example.com 65.55.108.nnn - - [15/Jul/2009:15:09:03 -0500] "GET /my-photos/photo-06.jpg HTTP/1.1" 200 40000 "-" "WinHttp"
www.example.com 65.55.220.nnn - - [15/Jul/2009:15:13:24 -0500] "GET /my-photos/photo-04.jpg HTTP/1.1" 200 40000 "-" "WinHttp"
www.example.com 65.55.220.nnn - - [15/Jul/2009:15:22:20 -0500] "GET /example-photos/example-04.jpg HTTP/1.1" 200 34240 "-" "WinHttp"
www.example.com 65.55.220.nnn - - [15/Jul/2009:15:24:13 -0500] "GET /my-photos/photo-03.jpg HTTP/1.1" 200 40000 "-" "WinHttp"
www.example.com 65.55.220.nnn - - [15/Jul/2009:15:25:34 -0500] "GET /example-photos/example-24.jpg HTTP/1.1" 200 40000 "-" "WinHttp"
www.example.com 65.55.220.nnn - - [15/Jul/2009:15:29:13 -0500] "GET /example-photos/example-29.jpg HTTP/1.1" 200 41233 "-" "WinHttp"
www.example.com 65.55.108.nnn - - [15/Jul/2009:15:36:24 -0500] "GET /example-photos/example-16.jpg HTTP/1.1" 200 41233 "-" "WinHttp"
This did not impress me either:
www.example.com 65.55.231.nnn - - [16/Jul/2009:02:02:50 -0500] "GET /robots.txt HTTP/1.1" 200 554 "-" "Mozilla/4.0"
www.example.com 65.55.108.nnn - - [16/Jul/2009:04:28:42 -0500] "GET /robots.txt HTTP/1.1" 200 554 "-" "Mozilla/4.0"
www.example.com 65.55.231.nnn - - [16/Jul/2009:05:14:05 -0500] "GET /robots.txt HTTP/1.1" 200 554 "-" "Mozilla/4.0"
www.example.com 65.55.231.nnn - - [17/Jul/2009:20:54:18 -0500] "GET /robots.txt HTTP/1.1" 200 554 "-" "Mozilla/4.0"
www.example.com 65.55.231.nnn - - [17/Jul/2009:20:54:18 -0500] "GET /articles/my-article HTTP/1.1" 200 10045 "-" "Mozilla/4.0"
www.example-one.com 65.55.217.nnn - - [17/Jul/2009:21:08:01 -0500] "GET /robots.txt HTTP/1.1" 200 465 "-" "Mozilla/4.0"
www.example-one.com 65.55.217.nnn - - [17/Jul/2009:21:08:01 -0500] "GET /content-that-may-return HTTP/1.1" 302 - "-" "Mozilla/4.0"
www.example-one.com 65.55.217.nnn - - [17/Jul/2009:21:08:01 -0500] "GET / HTTP/1.1" 200 3711 "-" "Mozilla/4.0"
www.example-one.com 65.55.230.nnn - - [17/Jul/2009:22:34:16 -0500] "GET /robots.txt HTTP/1.1" 200 465 "-" "Mozilla/4.0"
www.example-two.com 65.55.230.nnn - - [17/Jul/2009:22:44:05 -0500] "GET /robots.txt HTTP/1.1" 200 522 "-" "Mozilla/4.0"
www.example-two.com 65.55.230.nnn - - [17/Jul/2009:22:44:05 -0500] "GET /my-page.html HTTP/1.1" 200 2985 "-" "Mozilla/4.0"
As for winhttp, ditto - it's been blocked for years and always generates an IP block. If they play with amateur tools what can they expect? They're supposed to be professionals... no, sorry, I can't write that with a straight face. :)
This should solve these issues, while reducing the MSN bot ranges to a reasonable and/or long-used Class C range:
RewriteCond %{REMOTE_ADDR} ^65\.55\.(1[0-9][0-9]|2[0-5][0-9])\.
RewriteCond %{HTTP_USER_AGENT} !msnbot
RewriteRule .* - [F]
Please note: this also takes out the MSN/Bing Translator, which uses your own browser UA. It does use a translator referrer page (along with your own IP) in the interpreted page, while it views both versions of the page side by side.
It may take out some other MSN tools as well (didn't check).
Thanks for the "WinHttp".
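For anyone who wants to block the bare "WinHttp" user agent on its own, regardless of IP range, here is a minimal sketch. It assumes mod_rewrite is enabled and that the UA string is exactly the one shown in the log lines earlier in the thread. It is not necessarily the method the earlier poster uses for their IP block.

```apache
# Sketch only: assumes mod_rewrite is available in this context.
RewriteEngine On

# Assumption: match the whole UA string "WinHttp" as seen in the logs,
# case-insensitively, so longer UAs that merely contain it are not caught.
RewriteCond %{HTTP_USER_AGENT} ^WinHttp$ [NC]

# Return 403 Forbidden for any request that matches the condition above.
RewriteRule .* - [F]
```

Anchoring the pattern with ^ and $ keeps the rule narrow. Drop the anchors only if you also want to catch user agents that embed the string.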