wilderness

msg:4503044 | 3:53 pm on Oct 2, 2012 (gmt 0) |
This is sufficient, and will trap some other random bots as well.
RewriteEngine on
# UA "contains"
RewriteCond %{HTTP_USER_AGENT} Crawler [NC]
RewriteRule .* - [F]
There are many bots that derive from Amazon. It's best to deny the Amazon IPs as well. See this thread [webmasterworld.com]
|
roblaw

msg:4503047 | 4:07 pm on Oct 2, 2012 (gmt 0) |
Wilderness, thanks for the quick reply. Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site? Seems like a lot of the stuff running on AWS is completely unwanted. However, I have to agree with some of the posters who mentioned the potential that you are blocking the "next big thing" to come along. boblaw
|
phranque

msg:4503051 | 4:10 pm on Oct 2, 2012 (gmt 0) |
get rid of the '\*$' in the pattern as it will prevent the user agent string from matching.
|
wilderness

msg:4503138 | 7:04 pm on Oct 2, 2012 (gmt 0) |
| Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site? |
| Only if the term "crawler" is contained in the User-Agent. BTW, you may also add "spider" and trap a few more pests. Change this line to:
RewriteCond %{HTTP_USER_AGENT} (Crawler|spider) [NC]
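For reference, a minimal sketch of how the complete block might look with both terms in place. It assumes the goal is to refuse matching requests with a 403, hence the [F] flag, and that the rules live in a per-directory .htaccess file. Adjust to your own setup.

```apache
RewriteEngine On

# Refuse any request whose User-Agent contains "crawler" or "spider".
# [NC] makes the match case-insensitive, so "Crawler", "crawler" and
# "SPIDER" are all caught.
RewriteCond %{HTTP_USER_AGENT} (crawler|spider) [NC]

# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule ^ - [F]
```

Because the pattern is unanchored, it matches anywhere in the User-Agent string, which is what makes this a "contains" test.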
|
lucy24

msg:4503191 | 9:34 pm on Oct 2, 2012 (gmt 0) |
Score another one for case sensitivity. I took a quick stroll through recent logs. "Crawler", capitalized, seems to be the domain of low-budget robots. But "crawler", lower-case, will also lock out any robot whose UA string includes informational URLs such as ../crawler/ or ../crawlerinfo.html. I'd prefer to take a closer look at those. (Does not apply, of course, if you're a strict whitelister.) To be safe, I'd leave out the [NC].
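A sketch of the case-sensitive variant described above, again assuming a 403 via [F] is the intended response. Without [NC], only the capitalized form matches, so a UA that merely carries a URL such as ../crawler/ or ../crawlerinfo.html is let through for closer inspection.

```apache
RewriteEngine On

# No [NC] flag: the comparison is case-sensitive, so this matches
# "Crawler" but not "crawler" or "CRAWLER".
RewriteCond %{HTTP_USER_AGENT} Crawler

# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule ^ - [F]
```

The trade-off is that a pest calling itself "crawler" in lower case also gets through, so check your own logs before picking one form over the other.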
|
wilderness

msg:4503208 | 10:29 pm on Oct 2, 2012 (gmt 0) |
| To be safe, I'd leave out the [NC]. |
| And I would NOT; however, lucy is certainly entitled to her own preferences.
|