There is one particular research bot (X1) that redundantly requests my robots.txt hundreds of times a day. I've decided to ban it until its developers (we have spoken) rewrite it to behave. I have also decided that letting it load any file other than a default 403 serves no productive purpose. However, I wish to offer other offenders (X2, X3) my robots.txt or custom403 page (respectively).
Is this an appropriate way to do the above?
SetEnvIf Remote_Addr ^XXX\.XX\.XX\.X1$ ban
<Files *>
Order Deny,Allow
Deny from env=ban
</Files>
SetEnvIf Referer ^XXX\.XX\.XX\.X2$ ban
SetEnvIf Remote_Addr ^XXX\.XX\.XX\.X3$ ban
SetEnvIf Request_URI ^/(robots\.txt|custom403\.html)$ allowit
<Files *>
Order Deny,Allow
Deny from env=ban
Allow from env=allowit
</Files>
Just combine all of it:
SetEnvIf Remote_Addr ^XXX\.XX\.XX\.X1$ ban
SetEnvIf Referer ^XXX\.XX\.XX\.X2$ ban
SetEnvIf Remote_Addr ^XXX\.XX\.XX\.X3$ ban
SetEnvIf Request_URI ^/(robots\.txt|custom403\.html)$ allowit
<Files *>
Order Deny,Allow
Deny from env=ban
Allow from env=allowit
</Files>
Jim
I guess I wasn't very clear, Jim.
While I want even those UAs I ban from my site (X2, X3) to get my custom403 page, or the robots.txt if they request it, I do not wish to let X1 access any file. It is an experimental bot, poorly written by some science lab students at a university, which requests robots.txt redundantly dozens of times per visit. I want it to go away - LOL
I would avoid having multiple Order statements in one .htaccess file - that seems to cause the second group of denies to be ignored (you can try it, though... I could use a second test case.)
So that leaves either a stand-alone Deny from in a <Files> container (with no Order or Allow from directives in that container), or you could use a mod_rewrite deny to whack that 'bot separately:
RewriteCond %{REMOTE_ADDR} ^x\.x\.x\.x$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^bad_bot1$
RewriteRule .* - [F]
Jim
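For anyone following along, here is a rough sketch of how the two approaches discussed above might sit together in one .htaccess: a mod_rewrite rule that forbids X1 outright, plus a single Order/Deny/Allow group for X2 and X3. The RewriteEngine line and the leading slash in the Request_URI pattern are my own assumptions, not something posted in the thread, and the XXX addresses are the same placeholders used above. It assumes mod_rewrite and mod_setenvif are available and that the older Order/Deny/Allow access directives are in use. Test it on your own server before relying on it.

```apache
# X1: forbid every request, including robots.txt and the custom 403 page.
# (RewriteEngine On is assumed; it may already be set elsewhere.)
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^XXX\.XX\.XX\.X1$
RewriteRule .* - [F]

# X2 and X3: flag them as banned, but flag requests for
# robots.txt and custom403.html as allowed.
SetEnvIf Referer ^XXX\.XX\.XX\.X2$ ban
SetEnvIf Remote_Addr ^XXX\.XX\.XX\.X3$ ban
SetEnvIf Request_URI ^/(robots\.txt|custom403\.html)$ allowit

# Only one Order group in the file. With Order Deny,Allow the
# Allow directive is evaluated last, so allowit overrides ban.
<Files *>
Order Deny,Allow
Deny from env=ban
Allow from env=allowit
</Files>
```

Because the rewrite rule never sets the ban variable, X1 is handled entirely by mod_rewrite and the access-control block only ever applies to X2 and X3.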
ErrorDocument 403 /forbidden.html
What could I add to this code to stop that?
RewriteCond %{REMOTE_ADDR} ^x\.x\.x\.x$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^bad_bot1$
RewriteRule .* - [F]