Forum Moderators: phranque
It looks to me like the part of the picture I need to change is contained in the .htaccess file. Specifically:
SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>

Some of you will know that the above snippet is amended by the PHP (or Perl) program so that a list of IPs is built above these statements. When one of those IPs revisits, the env variable "getout" is set for it, and therefore a 403 is sent back to the requester.
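For illustration, the amended file might look something like this -- a sketch assuming the script appends one SetEnvIf line per trapped IP above the block, as described (the addresses are placeholders):

```apache
# Lines like these are appended by the banning script, one per trapped IP
SetEnvIf Remote_Addr ^203\.0\.113\.45$ getout
SetEnvIf Remote_Addr ^198\.51\.100\.7$ getout

# Always let everyone reach the custom 403 page(s) and robots.txt
SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome

<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
```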
Rather than having a 403, I wanted to redirect to a specific page with a blurb explaining what has happened (and with some contact details - obviously not a link to the site).
My reasoning for this is that I do not want to ban ALL users of a particular IP just because one has been naughty.
I thought this might be particularly relevant for websites directed at students - some of whom would happily share IPs with other students innocently playing around with robots.
I guess the "deny from env=" is what causes the 403?
Do I need to completely rethink this part of the puzzle in order to do this redirect?
Rather than having a 403, I wanted to redirect to a specific page with a blurb explaining what has happened
1: Create a custom 403 page that briefly explains why visitors may be denied access (keep to about 2 kb)
2: Create a custom 403 page that says "Access Denied!" and "Go here to read about our access control policies"
Provide a link on "Go here" to a "403b" page that explains your policies and what circumstances will get a visitor's IP or User Agent banned. Provide a link on the second page to a form they can use to request removal from the blocklist. Add both the 403b and removal request path/page to the "allowsome" list.
3: Add text to the Banning script that explains why the script was tripped and include a link to request-removal form. Add that form and path to the "allowsome" ENV group.
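As a sketch of how option 2 might be wired up in .htaccess -- the page names here are placeholders, not from the thread:

```apache
# Serve a short custom page instead of the stock 403
ErrorDocument 403 /403.html

# Let even banned IPs reach the 403 page, the policy page, and the removal form
SetEnvIf Request_URI "^/(403\.html|403b\.html|removal-request\.html)$" allowsome

<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
```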
I use the second 403(b) solution, with a third page for removal requests, and allow these pages to be accessed by IPs flagged with ENV=getout. In the year and a half since I implemented this system, not one banned or 403'd visitor has ever filled out the removal request form, although many have followed the links to it and landed there.
You can also get fancy and create a special RewriteRule that redirects visitors who trip your ban script to a special banned explanation page, which has a removal request link or form. This way they never see the 403 page, but I think this is a waste of time. Adding a link from the short 403 page to a second explanation page works better for me.
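If you did want that fancier approach, a minimal mod_rewrite sketch might look like this (single banned IP shown; the banned-page name is a placeholder). The second RewriteCond exempts the explanation page itself so the redirect cannot loop:

```apache
RewriteEngine on
# Send the banned IP to the explanation page instead of a 403
RewriteCond %{REMOTE_ADDR} ^99\.99\.99\.99$
# ...but not when they are already requesting that page
RewriteCond %{REQUEST_URI} !^/banned\.html$
RewriteRule .* /banned.html [R=302,L]
```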
Wiz
Add text to the Banning script that explains why the script was tripped and include a link to request-removal form. Add that form and path to the "allowsome" ENV group.
In my case the person who is banned does not see the page generated by the Banning script.... and I take your point that it's unlikely that anyone will ever find themselves in this position - I guess I just like to cover my bases (good practice to handle otherwise irrecoverable exceptions as I see it - obviously got too much time on my hands at the moment :)).
Anyway, I'm assuming the <files *> technique is not used in this case.
Do you have an example of some htaccess code that tests the env variable (or sets and tests a custom env variable). Is it something as simple as:
SetEnvIf Remote_Addr ^99.99.99.99$ getout
RewriteEngine on
RewriteCond env=getout
RewriteRule ^.*$ my403bPage.php
The inevitable has happened and they are converging (and I tried so hard to do the right thing!)
Sorry (slink slink)
There is no need to mix Apache modules. Most of the time you can, but there are module load order dependencies that can trip you up (now or later if you change hosts):
Mod_rewrite Method:
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^99\.99\.99\.99$
RewriteCond %{REQUEST_URI} !^/(my403bPage\.php|robots\.txt|tos\.html)$
RewriteRule .* /my403bPage.php [L]
Mod_access Method:
SetEnvIf Request_URI "(my403bPage\.php|robots\.txt|tos\.html)$" allowit
<Files *>
Order Deny,Allow
Deny from env=getout
Deny from 99.99.99.99
Allow from env=allowit
</Files>
Jim
In my case the person who is banned does not see the page generated by the Banning script
You can add text printouts to the trap script that tells them whatever you want to. Some here feel that once a trap has been sprung there is no point wasting bandwidth informing the trapped party about your security measures. The reasoning is that this may make them work at hacking your website to get even, or to try to find a workaround.
Here is an example of html output (to screen) that is added to a Perl bot trap:
print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>Access Denied</title>\n";
print "<meta name=\"robots\" content=\"noindex,nofollow\">\n";
print "</head>\n";
print "<body>\n";
print "<center><h1>Access Denied</h1></center>\n";
print "<p>To find out what may have caused you to be denied access to our website click here ([i]link to another page with explanations about your access control policies[/i])</p>\n";
print "</body>\n";
print "</html>\n";
SetEnvIf Remote_Addr ^99\.99\.99\.99$ getout (only written after the first violation)
SetEnvIf Request_URI "^(/my403bPage\.php|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
ErrorDocument 403 /my403bPage.php
Options +FollowSymLinks
RewriteEngine on
etc etc etc
Is that "bad" htaccessing?
I don't have a tos yet but it's not a bad idea.
PS. I'm not sure (really) why the robots.txt is there but I've raised this in a thread about the spider trap technique generally [webmasterworld.com] not the micro-htaccess part of it.
I answered your question about robots.txt in the other thread.
Jim
All works fine on the site I put this on some weeks ago but I get an unexpected error on another.
The basic banning works fine but if I then try to get into the site having been banned I get:
You don't have permission to access /getout.php on this server.
Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.
I have identical .htaccess code (above the ErrorDocument 403).
The sites are on different servers. I don't have access to the main apache modules (not that I'm aware of anyway).
Any clues as to what may be set differently on each server?
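One common cause of the "403 Forbidden error was encountered while trying to use an ErrorDocument" message is that the error page itself is caught by the deny rules, so it may be worth checking that the allowsome pattern on the second server actually matches the ErrorDocument path exactly. A sketch, assuming /getout.php is the error document on that server:

```apache
# The ErrorDocument target must itself be allowed through; if this pattern
# does not match the path exactly, Apache 403s the error page too and
# falls back to the double-error message
SetEnvIf Request_URI "^(/getout\.php|/robots\.txt)$" allowsome
ErrorDocument 403 /getout.php
```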