RewriteEngine On
#bad bots block
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
...
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
RewriteRule ^.* - [F,L]
This correctly produces the 403, but this code earlier in the .htaccess is preventing the custom 403 page from displaying; instead I get the default Apache 403, plus a message that the custom 403 page could not be displayed:
SetEnvIf Request_URI "^(/site/403\.html|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
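One way to make those pieces fit together, so that flagged bots can still fetch the error page, is to set the bad-bot flag with mod_setenvif as well and add an ErrorDocument. This is only a sketch; the SetEnvIf lines that set "getout" aren't shown in the post, so the User-Agent patterns and the /site/403.html location below are assumptions carried over from the other snippets:

```apache
# Flag known bad bots (UA list abbreviated; patterns assumed from the rewrite block)
SetEnvIfNoCase User-Agent "^BlackWidow" getout
SetEnvIfNoCase User-Agent "^ZyBorg" getout

# Always allow the custom 403 page and robots.txt, even for flagged
# clients, so the ErrorDocument subrequest itself is not refused
SetEnvIf Request_URI "^(/site/403\.html|/robots\.txt)$" allowsome

# Serve the custom page for 403 responses
ErrorDocument 403 /site/403.html

<Files *>
  Order Deny,Allow
  Deny from env=getout
  Allow from env=allowsome
</Files>
```

The key point of the sketch is the Allow line: without an exception for the error page, the denial also blocks the internal request for the ErrorDocument, which is what produces the "additional 403" message.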
I've tried moving the various components around and removing the [L] flag from the rewrite rule, but the best I can get is the stock Apache 403 page:
"Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request."
I also tried running the bot-blocking .htaccess one level above the site root folder, but that didn't work at all. Something in the .htaccess cascade completely escapes me. I thought that each set of rewrite rules was added to the one above it, something like the way CSS works, but I've been running into consistent failures based on that understanding, so I know I'm missing something.
However, for now I'd be happy to get the badbots sent to the custom 403 page, and worry about understanding .htaccess cascading some other time.
RewriteEngine On
#bad bots block
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
...
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
RewriteRule !^403\.html$ - [F,L]
RewriteRule ^403\.html$ - [L]
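Combined with an ErrorDocument, the whole block might look like the following sketch. It assumes 403.html sits in the same directory as this .htaccess (per-directory rewrites strip the directory prefix, so the pattern is relative); also, [F] implies [L], so the extra flag can be dropped:

```apache
# Serve the custom page for 403 responses (location is an assumption)
ErrorDocument 403 /403.html

RewriteEngine On
# Bad-bot block (UA list abbreviated)
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
# Forbid everything except the 403 page itself,
# so the error-page subrequest is not blocked in turn
RewriteRule !^403\.html$ - [F]
```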
Note that each module is activated in turn and parses your .htaccess file for directives that it understands and can handle. Therefore, mod_rewrite and mod_access code run separately, each without any awareness of the other. This is why it does not matter in what order you put mod_access code blocks and mod_rewrite code blocks in your .htaccess file; the server will execute each module in the order (actually, reverse order) specified by the module load list in httpd.conf, not in the order the code blocks appear in your .htaccess file. So the order in which your module-specific code blocks take effect is controlled by which module runs first, not by the code order.
[added] .htaccess at a higher level may override .htaccess at a lower level, specifically if the higher-level .htaccess does a rewrite before the lower-level .htaccess is ever reached. In effect, the "cascade" of .htaccess is the opposite of that of CSS; in CSS, the "closest" specification for handling an element applies, whereas in .htaccess, the highest-level .htaccess will pre-empt lower-level files (the highest file in the directory structure wins by running first). [/added]
Jim
[edited by: jdMorgan at 12:55 am (utc) on Sep. 21, 2004]
That explains a bunch of things. I'm going to order the O'Reilly Apache book; I'm sick of not understanding this stuff. This is the only stuff I simply cannot figure out on a consistent basis.
the highest file in the directory structure wins by running first
Oh, that explains it; now I see. I was assuming the request cascaded downwards, but this is exactly the problem I had: I had a search-engine-friendly .htaccess rewrite, but then a bad bot started hitting the site folder that had that .htaccess, and the site-wide bad-bot block didn't function.
If I don't have the [L] flag on the .htaccess deeper in the site structure, since that runs first, will that then allow the higher-level one, above the document root, to run? That really clears it up, though. I should have realized that the cascade ran backwards; that's what I was seeing after all. I thought all document requests worked their way down, not up, in Apache: /usr first, /www second, /yourfolder third, /yoursite fourth, /folder-in-your-site last, and so on. But now it all makes sense if it's the opposite; this has been driving me crazy for a few months now.
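For what it's worth, the specific failure described above (a sub-folder .htaccess disabling a site-wide bad-bot block) also matches a documented mod_rewrite quirk: per-directory rewrite rules are not inherited by default, so a child .htaccess that turns the rewrite engine on starts with an empty rule set of its own. If that is what's happening, the child file can explicitly ask to inherit the parent's rules. A sketch with a hypothetical path:

```apache
# /docroot/subfolder/.htaccess  (path is hypothetical)
RewriteEngine On
# Pull in the parent directory's rewrite rules; without this,
# only the rules written in this file apply to this directory
RewriteOptions Inherit

RewriteRule ^old-page\.html$ new-page.html [R=301,L]
```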
The higher-level what, though? An .htaccess file above the document root won't run at all. If you have rules in a <Directory> container in your httpd.conf, they will run, but .htaccess runs in a per-directory context only, and therefore it has to be in a directory that is accessed as the server traverses the directory structure from the root directory down to where the file representing the requested resource is located.
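If you do have access to httpd.conf, the same bad-bot rules can live in a <Directory> container there instead of in an .htaccess file above the document root. A sketch, with the filesystem path assumed for illustration:

```apache
# httpd.conf  (the path below is an assumption; use your real document root)
<Directory "/var/www/yoursite">
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
    # Forbid flagged bots everywhere except the custom 403 page
    RewriteRule !^403\.html$ - [F]
</Directory>
```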
Apache does work from the highest-level down. It's just that it executes each .htaccess in that same order, so therefore any rule which redirects or limits access at a higher level cannot be overridden or reversed by different code at a lower level. The code is executed as the server encounters it, piecewise.
Again, going back to the CSS comparison: CSS works differently, in that the entire set of external, internal, and in-line styles is evaluated completely before any action is taken, and the specification closest to the element being styled wins; on-page code in the <head> section can override external stylesheets, and in-line styles can override styles in the file's <head> section. But Apache executes the code as it finds it, so if a higher-level .htaccess file invokes a redirect that bypasses a lower-level file, then the lower-level file is never even evaluated.
Hopefully, this clarifies things rather than confusing them!
Jim
Maybe now I can get some of the stuff working that was broken.