Custom and default 403s

Friendly 403s for visitors, short ones for bots?

         

Hedgehog_UK

3:21 pm on Aug 9, 2010 (gmt 0)




Is it possible to serve a zero-length (or very short) 403 page from .htaccess when a custom 403 page is also set up?

I run a website for kids, so there are custom versions of all the relevant error pages. The problem is bad bots: where appropriate, I'd prefer to give them a zero-length (or very short) 403 instead.

I did try

RewriteRule ^.*$ onebyte.txt [R=403,L]

but that returned the custom 403 page. I have seen suggestions to use [G] (gone) instead, but I already have a child-friendly version of that page running.

I could use PHP, but I'd prefer to keep server load down and use .htaccess directly if possible.

Suggestions, anyone?

jdMorgan

5:11 pm on Aug 9, 2010 (gmt 0)




You could use server-side includes, and simply surround the HTML body of the page with a conditional such as <!--#if expr="$badbot != true" --> ... <!--#endif --> so that nothing in the body is output for bad bots.

The variable could be the server's HTTP_USER_AGENT or you could pass a variable from .htaccess or server config file using the [E=] flag in the same RewriteRule that invokes the 403-handling.

Alternatively, make the 403 ErrorDocument directive point to a Perl or PHP script, and generate whatever you want.
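As a rough sketch of the [E=] approach, assuming an Apache 2.2-style .htaccess in the document root with mod_rewrite available: the user-agent pattern ExampleBadBot, the variable name badbot and the path /403.shtml are placeholders for illustration, not anything taken from this thread. The second condition is a precaution so that the error document itself is not forbidden when Apache fetches it internally.

```apache
# Hypothetical .htaccess fragment; names and paths are illustrative only
ErrorDocument 403 /403.shtml

RewriteEngine On
# Match an unwanted user-agent (placeholder pattern)
RewriteCond %{HTTP_USER_AGENT} ExampleBadBot [NC]
# Do not forbid the error document itself, or Apache cannot serve it
RewriteCond %{REQUEST_URI} !^/403\.shtml$
# Set the variable and return 403 in the same rule
RewriteRule .* - [E=badbot:true,F]
```

Because the error document is reached through an internal redirect, the variable is normally visible there as REDIRECT_badbot rather than badbot.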

Jim

Hedgehog_UK

7:05 pm on Aug 9, 2010 (gmt 0)




Hello Jim,
I'd heard of server-side includes but had never really bothered to find out what they can do or how to use them.

A quick read-through of the SSI tutorial on the Apache site (not sure if I'm allowed to include a link) suggests it could be ideal for this kind of thing. Just enough processing for a task like this.

Building on your suggestion, it seems sensible to use the <!--#else --> branch of the conditional to handle the full (custom) version of the page, possibly using the XBitHack on directive if it's available.

I'll let you know how it goes.
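For anyone curious about the XBitHack route, a minimal sketch might look like the following. It assumes the host allows Options overrides in .htaccess; nothing here is taken from the configuration actually used in this thread.

```apache
# Hypothetical alternative to mapping an extension to the INCLUDES filter
# Permit SSI, but without the exec element
Options +IncludesNOEXEC
# Parse any text/html file whose user-execute bit is set
XBitHack on
```

With this in place, only text/html files that have the user-execute bit set are parsed for SSI, so an error page could keep its .html name.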

Many thanks Jim.

Hedgehog_UK

11:02 am on Aug 11, 2010 (gmt 0)




Hello again Jim,
My coding could probably be optimised a bit, but for the benefit of anyone else who might be interested in letting Apache handle this directly instead of using Perl or PHP etc.:

For testing, I set up

# htaccess
# <IfModule mod_include.c>
# commented out for testing so that all errors show up
#
# add .shtml as a text/html mime type
AddType text/html .shtml
# filter .shtml files through SSI's INCLUDES filter before sending to client
AddOutputFilter INCLUDES .shtml
# allow SSI (without exec) for files in the directory with this htaccess file
Options +IncludesNOEXEC
# </IfModule>
# commented out for the reason given above
#
# completely new to SSI, so apply the test rule without [F] to start with
RewriteRule test\.shtml test.shtml [E=badbot:true]
# end

I then had

# test.shtml
<!--#if expr="$badbot != true" -->
<html><body>
<b>IF NOT</b>
<br>badbot = <!--#echo var="badbot" -->
</body></html>
<!--#else -->
<html><body>
<b>ELSE</b>
<br>badbot = <!--#echo var="badbot" -->
</body></html>
<!--#endif -->

It worked! The special case would be badbot, and the default would be the custom 403. Just what I wanted. After stripping out the <!--#else --> condition, the log revealed a 200 status return with just the header. Perfect. I transferred the SSI to the 403.shtml page, and included

RewriteCond %{REQUEST_URI} test\.shtml [NC]
RewriteRule .* - [E=badbot:true,F]

in htaccess for testing purposes. When I did that, the custom 403 came back every time, no trace of the special condition. To find out what was going on, I included

<PRE>
<!--#printenv -->
</PRE>

in the 403.shtml page. Apache's internal variables were there but badbot was missing. In its place, I spotted REDIRECT_badbot = true. Of course... while trying to figure this out, I'd seen that variables set before [F] reach the error document with a REDIRECT_ prefix, because the 403 page is served through an internal redirect. I edited the 403.shtml page to

<!--#if expr="$REDIRECT_badbot != true" -->

and tried again. The site's log revealed a 403 status return - with just a couple of bytes. I don't know where they come from, but I'll see if I can get rid of them.
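To make this kind of check easier to repeat, a dedicated log format showing only the status and body size can help. The sketch below assumes access to the main server configuration (CustomLog is not allowed in .htaccess); the format nickname sizecheck and the log path are made up for illustration.

```apache
# Hypothetical server-config fragment (not valid in .htaccess)
# %>s = final status, %B = response body size in bytes, excluding headers
LogFormat "%h %t \"%r\" %>s %B \"%{User-Agent}i\"" sizecheck
CustomLog logs/sizecheck.log sizecheck
```

One possible source of a few stray bytes, not confirmed in this thread, is whitespace such as a trailing newline sitting outside the SSI conditional in the error page.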

Even after developing scripts for websites and a log analyser, I found this initiation into SSI a steep (but interesting) learning curve. Now then... what else could I use SSI for?

Many thanks Jim.