jdMorgan - 1:05 pm on Apr 4, 2010 (gmt 0)
[edited by: jdMorgan at 1:28 pm (utc) on Apr 4, 2010]
tangor,
I don't want to divert this thread too far off track, but there are problems in your file. Some of your 'code blocks' have no "Order" specified within their scope, and there are several HTTP methods that aren't subject to any access controls at all. Further, if you attempt to use a custom 403 page, access to that page will itself be blocked, resulting in an 'infinite loop' of 403 response attempts. I'd suggest:
SetEnvIf Request_URI "(robots\.txt|custom-403-page\.html)$" pass
#
Order Deny,Allow
#
<FilesMatch "\.(htaccess|htpasswd)$">
Deny from all
</FilesMatch>
#
<LimitExcept GET POST>
Deny from all
</LimitExcept>
#
<Limit GET POST>
Deny from 174.129
Deny from env=ban
Allow from env=pass
</Limit>
The SetEnvIf and "Allow from env=pass" directives create an override that allows all requestors to fetch robots.txt and your custom 403 error page. This prevents problems with user-agents that interpret any failure to fetch robots.txt as carte blanche to spider your site (likely resulting in a ton of 403s), and it also prevents the previously-mentioned 'infinite loop' on custom 403 page access.
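For reference, here is a minimal sketch of the directives that would sit alongside the code above. It assumes the custom error page lives at the site root, and the "BadBot" user-agent string is purely a hypothetical example of how the "ban" variable might get set; your own file presumably sets it already in its own way.

```apache
# Assumption: the custom 403 page is at the web root; adjust the path as needed
ErrorDocument 403 /custom-403-page.html
#
# Hypothetical example: flag a request for denial by setting the "ban" variable
SetEnvIfNoCase User-Agent "BadBot" ban
```

The filename in ErrorDocument needs to match the one in the SetEnvIf pattern, otherwise the "pass" override won't apply to the error page.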
With "Order Deny,Allow", all Deny directives are processed first, and Allow directives can then override them. Any access not explicitly denied will be allowed. This is the most useful configuration, and makes the robots.txt and custom 403 page exclusions possible. Despite the added functionality, the code is now simpler, with three 'blocks' of code instead of four. Also, as documented, "GET" includes "HEAD" in both <Limit> and <LimitExcept>, and therefore no explicit provisions need be made for HEAD requests.
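As a rough illustration of how that evaluation plays out, the comments below trace a few sample requests through the suggested code. The IP address and the page names other than robots.txt are made-up examples, and the trace assumes nothing else in the file sets "ban" or "pass" for these requests.

```apache
# Order Deny,Allow: Deny directives are checked first, then Allow; default is allow.
#
# GET /robots.txt from 174.129.0.1 (example address)
#   "Deny from 174.129" matches, then "Allow from env=pass" overrides -> allowed
# GET /index.html from 174.129.0.1
#   "Deny from 174.129" matches, "pass" is not set -> 403
# GET /index.html from an address that matches no Deny, with "ban" unset
#   no Deny matches -> allowed by default
# HEAD /index.html
#   treated as GET, so it is handled by <Limit GET POST>
# PUT /index.html from any address
#   falls under <LimitExcept GET POST> -> 403
```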
Jim