Forum Moderators: goodroi
Therefore the best way to deal with this problem is as follows:
1) Disallow the bot in robots.txt, and allow anyone to fetch that file
2) Ban requests to all other URLs from those bots that should have obeyed robots.txt
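For step 1, a minimal robots.txt along these lines should do (the bot names are the ones handled in the code below; this is a sketch, not a complete file):

```
# Ask these crawlers to stay away entirely;
# only well-behaved bots will honor this.
User-agent: larbin
Disallow: /

User-agent: psycheclone
Disallow: /

User-agent: Leacher
Disallow: /
```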
[i]SetEnvIf Request_URI "^/(403\.html|robots\.txt)$" allow-it[/i]
SetEnvIf User-Agent "larbin" bad-bot
SetEnvIf User-Agent "psycheclone" bad-bot
SetEnvIf User-Agent "Leacher" bad-bot
#
<Files *>
Order Deny,Allow
[i]Allow from env=allow-it[/i]
Deny from env=bad-bot
Deny from 38.0.0.0/8
</Files>
[i]RewriteCond %{REQUEST_URI} !^(403\.html|robots\.txt)$[/i]
RewriteCond %{HTTP_USER_AGENT} larbin [OR]
RewriteCond %{HTTP_USER_AGENT} psycheclone [OR]
RewriteCond %{HTTP_USER_AGENT} Leacher
RewriteRule .* - [F]
Note: this forum's software converts solid pipe characters into broken-bar "¦" characters when posting; make sure the alternation characters in the regular expressions above are solid pipes ("|") before use.
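For readers who want to check their understanding, here is a rough Python sketch of the decision that both the <Files> block and the mod_rewrite version implement. The bot substrings, the banned network, and the always-allowed URLs come from the code above; the function itself is illustrative, not part of Apache:

```python
import ipaddress

# Substrings matched by the SetEnvIf / RewriteCond lines above
BAD_BOT_PATTERNS = ["larbin", "psycheclone", "Leacher"]
# Network blocked by the "Deny from 38.0.0.0/8" line
BANNED_NETWORKS = [ipaddress.ip_network("38.0.0.0/8")]
# URLs that stay reachable, so bots can still fetch robots.txt
# and receive the 403 error page instead of a loop of errors
ALWAYS_ALLOWED = {"/403.html", "/robots.txt"}

def is_blocked(path: str, user_agent: str, client_ip: str) -> bool:
    """Return True if the request would be refused by the rules above."""
    if path in ALWAYS_ALLOWED:
        return False
    if any(pattern in user_agent for pattern in BAD_BOT_PATTERNS):
        return True
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in BANNED_NETWORKS)
```

Note that, because Order Deny,Allow lets Allow directives win, even a client in the banned network can still fetch robots.txt and the 403 page, which the sketch mirrors by checking ALWAYS_ALLOWED first.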
Jim
Correction: the RewriteCond pattern needs a leading slash, since %{REQUEST_URI} always begins with one:
RewriteCond %{REQUEST_URI} [b]!^/([/b]403\.html|robots\.txt)$
Jim
That lets you specify the page on your site and the user agent you wish to test.
Try this with both robots.txt and your index.html page and see what happens!
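If you would rather test from your own command line, curl can impersonate a user agent (-A sets the User-Agent header, -I requests headers only; the hostname below is a placeholder for your own site):

```shell
# robots.txt should come back 200 OK even with a banned UA string...
curl -I -A "larbin" http://www.example.com/robots.txt
# ...while any other page should come back 403 Forbidden.
curl -I -A "larbin" http://www.example.com/index.html
```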
My error documents in htaccess look like this:
ErrorDocument 403 /errors/403.htm
ErrorDocument 404 /errors/404.htm
ErrorDocument 500 /errors/500.htm
ErrorDocument 410 /errors/404.htm
(and I did change 'html' to 'htm' in Jim's code).