Forum Moderators: open
x.x.x.x - - - [26/Mar/2018:21:45:01 +0200] "GET /robots.txt HTTP/1.1" 302 154 "-" "crawler (crawler.feedback@gmail.com)" "-"PORT:80 0.000 - . "GZIP:-"
x.x.x.x - - - [26/Mar/2018:21:46:31 +0200] "GET / HTTP/1.1" 302 154 "-" "crawler (crawler.feedback@gmail.com)" "-"PORT:80 0.000 - . "GZIP:-"
SetEnvIfNoCase User-Agent

Sure, it's easy to physically block them. (Psst! That's what the BrowserMatch and BrowserMatchNoCase directives are for.) I don't need to, because I use header-based access controls, so almost everything is blocked by default. Except for robots.txt, every Crawler request I found in recent days' logs received a 403. I haven't bothered to check how many specific access-control rules they violated (or rather, failed to meet).
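For anyone who does want the physical block: a minimal sketch using the BrowserMatchNoCase directive mentioned above, for Apache 2.4. The "crawler" pattern and the bad_bot env-var name are assumptions for illustration, not taken from the logs above.

```apache
# Set an env var when the User-Agent contains "crawler" (case-insensitive)
BrowserMatchNoCase "crawler" bad_bot

# Allow everyone except requests flagged above
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

The same match could be written with SetEnvIfNoCase against the User-Agent header; BrowserMatchNoCase is just shorthand for that.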
Disallowing "Crawler" does not disallow "Crawler1"

My understanding is that robots are supposed to interpret the User-agent line as broadly as possible: when in doubt, read it as "This means you" rather than "Oh, I had no idea they meant 'Crawler' when they said 'crawler'".
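For what it's worth, Python's standard-library robots.txt parser takes the broad reading: it matches the User-agent token as a case-insensitive substring of the robot's name, so a record for "Crawler" also applies to "Crawler1". The robots.txt content and bot names here are made up for the demonstration.

```python
from urllib import robotparser

# Hypothetical robots.txt: one record disallowing everything for "Crawler"
rules = """\
User-agent: Crawler
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Substring, case-insensitive matching: "Crawler1" falls under the
# "Crawler" record, so it is disallowed too (under this parser).
print(rp.can_fetch("Crawler", "/page"))    # False
print(rp.can_fetch("Crawler1", "/page"))   # False
print(rp.can_fetch("OtherBot", "/page"))   # True (no matching record)
```

Whether every real-world bot is this generous is another question, of course.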