mod_rewrite blocks 403 pages

isitreal

12:38 am on Sep 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm running the standard WebmasterWorld bad-bot list:

RewriteEngine On
#bad bots block
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
...
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
RewriteRule ^.* - [F,L]

This correctly generates the 403, but the following code, earlier in the .htaccess, is not letting the custom 403 page display. Instead I get the stock Apache 403 page, plus a message that the custom 403 page could not be displayed.

SetEnvIf Request_URI "^(/site/403\.html|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>

I've tried moving the various components around and removing the [L] from the rewrite rule, but the best I can get is the stock Apache 403 page.

> Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.

I also tried running the bot-blocking .htaccess one level above the site root folder, but that didn't work at all. Something about the .htaccess cascade completely escapes me. I thought that each set of rewrite rules etc. was added to the one above it, something like the way CSS works, but I've been running into consistent failures based on that understanding, so I know I'm missing something.

However, for now I'd be happy to get the bad bots sent to the custom 403 page, and worry about understanding .htaccess cascading some other time.

jdMorgan

12:51 am on Sep 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just exclude the 403 page from the rule:

RewriteEngine On
#bad bots block
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
...
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
RewriteRule !^403\.html$ - [F,L]

Alternatively, you can add a rule at the beginning of your mod_rewrite code:

RewriteRule ^403\.html$ - [L]

This will skip all of the following RewriteRules if the custom 403 page "403.html" is requested.
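
One way to read the failure: the internal request Apache makes for the ErrorDocument still carries the bad bot's User-Agent, so a blanket rule forbids the error page too. A minimal sketch that combines the pieces above, assuming the custom page is served at /site/403.html (the path in the SetEnvIf line) and is wired up with an ErrorDocument directive, which the thread does not show. It tests %{REQUEST_URI} rather than the rule pattern so that it does not depend on which directory the .htaccess lives in:

```apache
# Assumption: the custom error page is served at /site/403.html
ErrorDocument 403 /site/403.html

RewriteEngine On

# Never forbid the error page itself or robots.txt
RewriteCond %{REQUEST_URI} !^/(site/403\.html|robots\.txt)$
# Bad-bot list: the first condition must hold AND any one of the [OR] group
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
# ... rest of the bad-bot list, elided as in the original post
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
RewriteRule .* - [F]
```

The [F] flag ends rule processing for the request on its own, so no [L] is needed on that rule.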

Note that each module is activated in turn and parses your .htaccess file for the directives that it understands and can handle. Therefore, mod_rewrite and mod_access code will run separately, each without any awareness of the other. This is why it does not matter what order you put mod_access code blocks and mod_rewrite code blocks in your .htaccess file: the server will execute each module in the order (actually, reverse order) specified by the module load list in httpd.conf, not in the order you put the code blocks in your .htaccess file. So the order in which your module-specific code blocks take effect is controlled by which module runs first, not by the code order.

[added].htaccess at a higher level may override .htaccess at a lower level, specifically if the higher-level .htaccess does a rewrite before the lower-level .htaccess is ever reached. In effect, the "cascade" of .htaccess is the opposite of that of CSS: in CSS, the "closest" specification for handling an element applies, whereas in .htaccess, the highest-level .htaccess will pre-empt lower-level files (the highest file in the directory structure wins by running first). [/added]

Jim

[edited by: jdMorgan at 12:55 am (utc) on Sep. 21, 2004]

isitreal

12:55 am on Sep 21, 2004 (gmt 0)

How annoying, I was thinking of doing that, but staring at this stuff for a few hours made me go braindead. As always, thanks a lot.

That explains a bunch of things. I'm going to order the O'Reilly Apache book; I'm sick of not understanding this stuff. It's the only thing I simply cannot figure out on a consistent basis.

jdMorgan

12:57 am on Sep 21, 2004 (gmt 0)

I didn't know there was a book... I thought everyone just trashed their server two or three thousand times, and thereby became an expert! ;)

Jim

isitreal

1:03 am on Sep 21, 2004 (gmt 0)

> the highest file in the directory structure wins by running first

Oh, that explains it, now I see. I was assuming the request cascaded downwards, but this is exactly the problem I had: I had a search-engine-friendly .htaccess rewrite, but then a bad bot started hitting the site folder that had that .htaccess, and the sitewide bad-bot block didn't function.

If I don't have the [L] flag on the rules in the .htaccess deeper in the site structure, since that runs first, will that then allow the higher level one, above the document root, to run? That really clears it up though. I should have realized that the cascade ran backwards, since that's what I was seeing after all, but I thought all document requests in Apache worked their way down, not up: /usr first, /www second, /yourfolder third, /yoursite fourth, /folder in your site last, and so on. Now it all makes sense if it's the opposite. This has been driving me crazy for a few months now.

jdMorgan

1:34 am on Sep 21, 2004 (gmt 0)

> the higher level one, above the document root

The higher-level what, though? An .htaccess file above the document root won't run at all. If you have rules in a <Directory> container in your httpd.conf, they will run, but .htaccess runs in a per-directory context only. It therefore has to be in a directory that will be accessed as the server traverses the directory structure from the root directory down to where the file that represents the requested resource is located.
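
As a rough illustration of the httpd.conf alternative mentioned above, here is a minimal sketch of the same bot block inside a <Directory> container. The filesystem path /www/yourfolder and the error page URL /site/403.html are placeholders borrowed from earlier in the thread, not known values, and the bad-bot list is elided as in the original post:

```apache
# httpd.conf (server configuration, not .htaccess)
# Hypothetical path: substitute the directory that holds the site
<Directory "/www/yourfolder">
    RewriteEngine On
    # Leave the custom error page reachable (assumed URL)
    RewriteCond %{REQUEST_URI} !^/site/403\.html$
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    # ... rest of the bad-bot list, elided as in the original post
    RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
    RewriteRule .* - [F]
</Directory>
```

Changes to httpd.conf only take effect after the server is restarted or reloaded, unlike .htaccess, which is read on each request.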

Apache does work from the highest level down. It's just that it executes each .htaccess in that same order, so any rule that redirects or limits access at a higher level cannot be overridden or reversed by different code at a lower level. The code is executed as the server encounters it, piecewise.

Again, going back to the CSS comparison, CSS works differently, in that the entire set of external, internal, and in-line styles is evaluated completely before any action is taken, and the closest specification to the element being styled wins; on-page code in the <head> section can override external stylesheets, and in-line styles can override styles in the file's <head> section. But Apache executes the code as it finds it, so if a higher-level .htaccess file invokes a redirect that bypasses a lower-level file, then the lower-level file is never even evaluated.
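
One mod_rewrite-specific caveat, offered as a hedged aside and worth checking against the documentation for the Apache version in use: by default, an .htaccess in a subdirectory that contains its own mod_rewrite directives does not inherit the rewrite rules of the parent directory, so a site-wide bot block may not apply inside such a folder (which resembles the symptom described earlier in the thread). The documented RewriteOptions directive can request inheritance. A minimal sketch, assuming a hypothetical subfolder .htaccess with a single search-engine-friendly rewrite; the item.php script and the URL scheme are illustrative only:

```apache
# .htaccess in a subfolder that has its own rewrite rules
RewriteEngine On
# Also apply the conditions and rules from the parent directory's configuration
RewriteOptions Inherit

# Hypothetical search-engine-friendly rewrite: item/42 -> item.php?id=42
RewriteRule ^item/([0-9]+)$ item.php?id=$1 [L]
```

In the Apache documentation, rules inherited this way are applied after the rules of the child scope, so the order of evaluation is worth testing before relying on it.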

Hopefully, this clarifies things rather than confusing them!

Jim

isitreal

1:47 am on Sep 21, 2004 (gmt 0)

Yes, it clarifies it enormously, thanks a lot for that explanation. Now I see why various things I've been trying haven't been working. Not a confusing explanation at all, though I wish I knew where I got the idea that I could put an .htaccess file above the document root directory. Maybe I confused the httpd.conf stuff with the .htaccess stuff; I think I assumed you could do pretty much the same in either. On my test server, of course, stuff was working in the httpd.conf files, so I assumed it would also work in the .htaccess files. The light brightens slightly before it dims again next time... :-) At least this time I didn't generate any 500 errors while testing on the live server. Thanks again for sharing your understanding of this stuff, I really appreciate it.

Maybe now I can get some of the stuff working that was broken.