Coordinating robots.txt and .htaccess


jdMorgan - 5:14 pm on Nov 3, 2002 (gmt 0)


Busynut,

...if I use another htaccess file in a lower directory (for blocking image hotlinking primarily)... does it completely cancel out the directives in the one in the higher directory...?

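No. Apache applies .htaccess files in order, from the top of your directory hierarchy on down, so a lower-level file adds to the directives above it and overrides them only where they conflict. One caveat: mod_rewrite directives are not inherited by default, so if the lower-level file contains its own rewrite directives, the RewriteRules from the higher-level file are not applied in that subdirectory unless the lower file sets RewriteOptions inherit. If this issue is worrisome, simply do your image hot-link block in your top-level directory - it's more efficient doing it there anyway.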
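
As a rough sketch of a hot-link block placed in the top-level .htaccess (this assumes mod_rewrite is available, and example.com is only a stand-in for your own domain; the list of image extensions is likewise just an illustration):

RewriteEngine On
# Allow requests with a blank referrer (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from your own site (example.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
# Everything else asking for an image gets a 403
RewriteRule \.(gif|jpe?g|png)$ - [NC,F]

Because this lives in the top-level file, it covers images in every subdirectory without needing a second .htaccess file.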

403 issue:

One approach you might consider is to serve up a very small generic 403 ErrorDocument regardless of whether the User-agent is known-bad or just possibly-bad. On this page, put a link and a meta-refresh redirect to an "explanatory" page for innocent visitors who get caught in your 403 trap. Generally, bad-bots will not follow the link or the meta-refresh redirect, and the small initial 403 page will save you bandwidth on the bad-bots that are too stupid to quit trying to get in.

With this approach, I believe you can accomplish what you want to do using just:

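ErrorDocument 403 /403.htm
# Enable mod_rewrite if it is not already on in this file
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} <list of bad bots>
# Forbid everything except robots.txt and the 403*.htm pages
RewriteRule !^(403.*\.htm|robots\.txt)$ - [F,L]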

Note that robots.txt and any document which starts with "403" and ends with ".htm" can now be served to any User-agent, so name your 403-explanatory-page-for-innocent-victims 403info.htm, or something like that. This is how I've done it, and it works well.
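
To make the bad-bot list placeholder more concrete, here is one way it might be filled in. The three User-agent names below are only illustrative picks, not a recommended list; substitute whatever turns up in your own logs.

RewriteEngine On
# Each condition is ORed with the next; the last one has no OR flag
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
# Forbid everything except robots.txt and the 403*.htm pages
RewriteRule !^(403.*\.htm|robots\.txt)$ - [F,L]

A RewriteCond applies only to the RewriteRule that immediately follows it, so keep the whole list directly above the rule.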

You could also do this 403 stuff in two steps to more closely approximate what you originally intended: Start with ErrorDocument 403 /403.htm. Then add another layer of mod_rewrite internal redirection below that to discriminate between known-bad and possibly-bad User-agents. In other words, internally rewrite 403.htm to a different URL depending on the User-agent. The single-step approach described first is simpler, easier on your server in case the 'bot won't give up, and avoids any kind of User-agent cloaking.
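
A sketch of what the two-step variant might look like, untested and only under these assumptions: the angle-bracket placeholders stand for your own User-agent patterns, and 403-bad.htm is a hypothetical file name chosen so that it still matches the 403*.htm exemption.

ErrorDocument 403 /403.htm
RewriteEngine On
# Step 1: forbid both known-bad and possibly-bad agents
RewriteCond %{HTTP_USER_AGENT} <list of known-bad and possibly-bad bots>
RewriteRule !^(403.*\.htm|robots\.txt)$ - [F,L]
# Step 2: known-bad agents get a different 403 page (hypothetical name)
RewriteCond %{HTTP_USER_AGENT} <list of known-bad bots>
RewriteRule ^403\.htm$ /403-bad.htm [L]

Possibly-bad agents fall through step 2 and receive the ordinary 403.htm with its link to the explanatory page.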

HTH,
Jim


Thread source: http://www.webmasterworld.com/apache/262.htm