[Edited by not2easy at 6:18 pm (UTC) on Sep 11, 2018. Edit reason: exemplified mail server.]
#========
# periodic server alive test...
SetEnvIfNoCase User-Agent "urlwatch" dontlog
# only allow mozilla, urlwatch and only https
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_USER_AGENT} ^urlwatch\ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(stuff..more stuff).*$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^ht[tm][lpr] [NC]
RewriteRule . - [F,L]
RewriteCond %{HTTP_REFERER} !^https://mail\.example\.net/.*$ [NC]
You might want to make this anonymous.
“only allow mozilla”
At this point, the vast majority of robots have “Mozilla” at the beginning of their UA string, so a check for this element is no longer as useful as it was 10 years ago.
“lucy provided the order of Rules so many times previously that she had a working bookmark”
Hiya, Don, long time no see. I don't know about bookmarks, but I know I've got a few slabs of boilerplate saved. Here's a bit about ordering of RewriteRules, extracted from the middle of a longer document about all kinds of htaccess cleanup:
At the beginning is the single line
RewriteEngine on
A RewriteBase is almost never needed; get rid of any lines that mention it. Instead, make sure every target begins with either protocol-plus-domain or a slash / for the root.
Sort RewriteRules twice.
First group them by severity. Access-control rules (flag [F]) go first. Then any 410s (flag [G]). Not all sites will have these. Then external redirects (flag [R=301,L] unless there is a specific reason to say something different). Then simple rewrites (flag [L] alone). Finally, there may be a few rules without an [L] flag, such as ones that set cookies or environment variables.
Function overrides flag. If your redirects are so complicated that they've been exiled to a separate .php file, the RewriteRule will have only an [L] flag. But group it with the external redirects. If certain users are forcibly redirected to an "I don't like your face" page, the RewriteRule will have an R flag. But group it with the access-control [F] rules.
Then, within each functional group, list rules from most specific to most general. In most htaccess files, the second-to-last external redirect will take care of "index.html" requests. The very last one will fix the domain name, such as with/without www.
Leave a blank line after each RewriteRule, and put a # comment before each ruleset (Rule plus any preceding Conditions). A group of closely related rulesets can share an explanation.
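Put together, an htaccess file following that order might look roughly like the sketch below. Every specific in it is an assumption for illustration only: the hostname www.example.com, the /oldstuff/ directory, the widget.php script and the is_pdf variable are hypothetical, and the empty-user-agent block is a stand-in for whatever access-control rules a site actually needs.

```apache
RewriteEngine on

# Access control [F]: refuse requests with no user-agent (stand-in example)
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]

# Gone [G]: hypothetical directory that was removed for good
RewriteRule ^oldstuff/ - [G]

# External redirects, most specific first: strip "index.html" from requests.
# THE_REQUEST holds the original request line, so the server's own
# DirectoryIndex lookup of index.html does not trigger a redirect loop.
RewriteCond %{THE_REQUEST} \s/([^?\s]*/)?index\.html[?\s]
RewriteRule ^(.*/)?index\.html$ https://www.example.com/$1 [R=301,L]

# Last external redirect: fix protocol and hostname (hypothetical canonical host)
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]

# Simple internal rewrite [L]: hypothetical pretty URL
RewriteRule ^widgets/(\d+)$ /widget.php?id=$1 [L]

# No [L]: set a hypothetical environment variable and keep going
RewriteRule \.pdf$ - [E=is_pdf:1]
```

The [F] and [G] flags imply [L] on their own, which is why those two groups carry no separate L flag here.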
SetEnvIf User-Agent "curl\/" keep_out
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_USER_AGENT} ^urlwatch\ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(stuff..stuff).*$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^ht[tm][lpr] [NC]
RewriteRule . - [F,L]
'ornext|OR' (or next condition) [httpd.apache.org...]
Use this to combine rule conditions with a local OR instead of the implicit AND. Typical example:
RewriteCond %{HTTP_USER_AGENT} ^(urlwatch\.|Mozilla/5\.0)
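In practice, mod_rewrite treats a run of conditions joined by [OR] as one unit (the run ends with the first condition that has no [OR] flag) and ANDs that unit with whatever condition follows. So the quoted ruleset is read as “(https off or urlwatch or Mozilla or stuff) and ^ht[tm][lpr]”, which means it only ever blocks requests whose user-agent also starts with something like “http” or “html”. A minimal sketch of one way to untangle it, assuming the intent stated in the comment (https only; only user-agents beginning with urlwatch or Mozilla/5.0). The strings badbot and evilcrawler in the last ruleset are hypothetical placeholders.

```apache
# Ruleset 1: refuse anything that did not arrive over https
RewriteCond %{HTTPS} off
RewriteRule ^ - [F]

# Ruleset 2: "not A and not B" via the implicit AND:
# refuse user-agents that start with neither urlwatch nor Mozilla/5.0
RewriteCond %{HTTP_USER_AGENT} !^urlwatch
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.0
RewriteRule ^ - [F]

# Ruleset 3: "A and (B or C)" in one ruleset. The [OR] pair is evaluated
# as one unit and ANDed with the unflagged condition above it.
# Refuses Mozilla-claiming agents that also contain either placeholder string.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0
RewriteCond %{HTTP_USER_AGENT} badbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} evilcrawler [NC]
RewriteRule ^ - [F]
```

If what you actually mean is “(A and B) or C”, the least error-prone way to say it is two separate rulesets ending in the same RewriteRule.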
Mixing [AND] (implicit) and [OR] in the same ruleset can lead to grief. Not because the server gives a hoot, but because you then need to pay extra-close attention to what goes in what order, so you don't say “(A and B) or C” when you meant to say “A and (B or C)”.
SetEnvIf User-Agent "curl/" keep_out
“This worked for me to ban all curls, until I found out that Drupal.org uses this in their bot, which I need”
Option B, in that case, would be to switch off the variable again for that one bot (note incidentally that you don't need to escape / slashes in mod_setenvif).
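A minimal sketch of switching the variable off again, assuming Apache 2.4 and that keep_out is acted on with Require. "Drupal" below is a hypothetical placeholder: check your logs for the exact user-agent string Drupal.org's bot sends and match on something specific to it. A ! in front of the variable name tells SetEnvIf to unset it, and since SetEnvIf lines are applied in order, the exemption has to come after the line that sets keep_out.

```apache
# Flag every client whose user-agent contains "curl/" (no need to escape the slash)
SetEnvIf User-Agent "curl/" keep_out

# Hypothetical exemption: clear the flag again for the bot you need.
# "Drupal" is a placeholder; use a substring from the bot's real user-agent.
SetEnvIf User-Agent "Drupal" !keep_out

# Apache 2.4: let everyone in except requests still carrying keep_out
<RequireAll>
    Require all granted
    Require not env keep_out
</RequireAll>
```

On Apache 2.2 the last block would instead be the older Order Allow,Deny / Allow from all / Deny from env=keep_out trio.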
“I have a note in my htaccess file that CustomLog is not allowed in htaccess”
Yup, everything to do with logging can only be said in the config file: either lying loose for the whole server, or in a vhost envelope. (Apache docs always list these two categories separately, even though things that can be done in config, but can't be done in vhost, can pretty well be counted on the fingers of one hand.) The same applies to LogLevel directives, which apply specifically to the Error Log*, and to RewriteLog, which is What It Says On The Box.
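For illustration, here is roughly where those directives live. The vhost is abridged (SSL settings omitted), and the hostname and log paths are assumptions. Note that RewriteLog only exists up to Apache 2.2; on 2.4, mod_rewrite's tracing is switched on through LogLevel and written to the error log.

```apache
# Server config or <VirtualHost> only: none of these are allowed in .htaccess
<VirtualHost *:443>
    ServerName www.example.com
    ErrorLog logs/example-error.log

    # Apache 2.4: mod_rewrite tracing goes through LogLevel, into the error log
    LogLevel warn rewrite:trace3

    # Apache 2.2 equivalent (these directives were removed in 2.4):
    # RewriteLog logs/example-rewrite.log
    # RewriteLogLevel 3
</VirtualHost>
```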
“Do undeclared targets get lost anyway?”
Do you mean, does the server recognize the “dontlog” environment variable even if you haven't told it what it means? I shouldn't think so. Do you have some independent way of knowing that the requests are in fact coming in, just not getting logged? (You don't have a firewall, do you? If requests are blocked before they even reach the server, then logging preferences wouldn't apply.)
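What gives “dontlog” its meaning is a CustomLog line in the config file that is told to skip requests carrying the variable; without one, setting the variable does nothing. A minimal sketch, assuming the combined LogFormat nickname from the stock httpd.conf and a hypothetical log path. The SetEnvIfNoCase line is the one from the original post and should be able to stay in .htaccess, since the variable is still attached to the request when it is logged.

```apache
# In .htaccess (or config): mark the periodic urlwatch checks
SetEnvIfNoCase User-Agent "urlwatch" dontlog

# Server config or <VirtualHost> only: log every request that does NOT carry dontlog
CustomLog logs/example-access.log combined env=!dontlog
```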