But even with the majority of troublemakers and bad boys blocked from the get-go, my .htaccess tends to teeter on the too-big size (~135-140k, for my server). There's no perceptible slow-down but I know it's getting too big when, for example, a UA at the bottom of the file doesn't get caught, but the same UA does when I move it up top. So-o-o 'tis time to consolidate (again). Help, please?
Are the following acceptable uses of pipes (which appear as broken lines on this board) --
---
1.) SetEnv
Turn these...
SetEnvIfNoCase User-Agent "^AnsearchBot" keep_out
SetEnvIfNoCase User-Agent "^\ AnsearchBot" keep_out
SetEnvIfNoCase User-Agent "^Bot" keep_out
SetEnvIfNoCase User-Agent "^\ Bot" keep_out
SetEnvIfNoCase User-Agent "^cfetch" keep_out
SetEnvIfNoCase User-Agent "^\ cfetch" keep_out
...into this one-liner?
SetEnvIfNoCase User-Agent "^(Agent¦\ Agent¦AnsearchBot¦\ AnsearchBot¦Bot¦\ Bot¦cfetch¦\ cfetch)" keep_out
...Or this couplet?
SetEnvIfNoCase User-Agent "^(Agent¦AnsearchBot¦Bot¦cfetch)" keep_out
SetEnvIfNoCase User-Agent "\ (Agent¦AnsearchBot¦Bot¦cfetch)" keep_out
(Note: The escaped leading spaces match UAs as sighted in my logs. Likewise, the upper/lower case is as logged, though the matching itself is case-insensitive.)
---
2.) HTTP_USER_AGENT
Turn these...
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Seekbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Septera [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*ServerGear [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Sextant [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*SezamFile [OR]
...into this?
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(Seekbot¦Septera¦ServerGear¦Sextant¦SezamFile) [NC,OR]
---
3.) REMOTE_HOST
Turn these...
RewriteCond %{REMOTE_HOST} \.exabot\.com$ [OR]
RewriteCond %{REMOTE_HOST} \.exava\.com$ [OR]
RewriteCond %{REMOTE_HOST} \.fast\.no$ [OR]
RewriteCond %{REMOTE_HOST} \.fastclick\.net$ [OR]
RewriteCond %{REMOTE_HOST} \.fastsearch\.net$ [OR]
...into this?
RewriteCond %{REMOTE_HOST} \.(exabot\.com¦exava\.com¦fast\.no¦fastclick\.net¦fastsearch\.net)$ [127.0.0.1...] [R,L]
(Note: I use good, old 127.0.0.1 instead of [F] so I can log/track accesses.)
---
4.) RewriteRule
Turn these (& I know they look wacky but they work)...
RewriteRule ^FormMail\.cgi(.*) [127.0.0.1...] [R,L]
RewriteRule ^(.*)/FormMail\.cgi(.*) [127.0.0.1...] [R,L]
RewriteRule ^formmail\.cgi(.*) [127.0.0.1...] [R,L]
RewriteRule ^(.*)/formmail\.cgi(.*) [127.0.0.1...] [R,L]
...into this:
RewriteRule ^(FormMail\.cgi\(.*\)¦\(.*\)/FormMail\.cgi\(.*\)) [127.0.0.1...] [NC,R,L]
(Note: That last one makes my head spin:)
Thanks in advance for your help!
[edited by: Pfui at 8:56 pm (utc) on April 17, 2006]
> I know it's getting too big when, for example, a UA at the bottom of the file doesn't get caught, but the same UA does when I move it up top.
I'd look carefully at your code for missing (or extra) [OR] flags, and other coding or logical errors. There's no reason Apache should refuse to run the whole file.
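As a rough sketch of what to check, assuming your blocks follow the usual pattern of a RewriteCond chain feeding a single RewriteRule (the `- [F]` action below is only a placeholder for whatever your rule really does): every condition in an OR-chain except the last carries [OR]. A missing [OR] in the middle silently turns that pair of conditions into an AND, and a stray [OR] on the final condition is a classic cause of a rule matching requests it shouldn't.

```apache
# Every condition except the last one carries [OR].
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Seekbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Septera [OR]
# Last condition in the chain: no [OR] here.
RewriteCond %{REMOTE_HOST} \.exabot\.com$
# Placeholder action; substitute your own rule.
RewriteRule .* - [F]
```

Consolidating patterns with alternation, typed with real pipe characters rather than the board's broken bars, also leaves fewer [OR] flags to get wrong.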
To speed things up, you can put things like specific-page rewrites or redirects at the bottom, and IP and user-agent blocking at the top. Try to handle the most-frequent cases first, and use the [L] flag.
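A minimal sketch of that ordering, assuming a per-directory .htaccess with mod_rewrite enabled. The page names are hypothetical, and the blocking action shown is just one option:

```apache
RewriteEngine On

# 1. Blocking first: unwanted requests are rejected before any other rule is tried.
#    [F] returns 403 and stops rule processing for the request.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(Seekbot|Septera|ServerGear) [NC]
RewriteRule .* - [F]

# 2. Specific-page redirects last.
#    [L] stops rule processing for this pass once the rule matches.
#    old-page.html and /new-page.html are hypothetical paths.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```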
> SetEnvIfNoCase User-Agent "^(Agent¦\ Agent¦AnsearchBot¦\ AnsearchBot¦Bot¦\ Bot¦cfetch¦\ cfetch)" keep_out
It's not clear why you'd have both "^Agent" and "^\ Agent" in there, with the same kind of repeat for AnsearchBot. If these user-agents don't always start with the specified character, then leave off the ^ start anchor. To be specific:
^Agent = Must start with "Agent" (may be followed by any number of characters)
Agent$ = Must end with "Agent" (may be preceded by any number of characters)
^Agent$ = Must exactly match "Agent" (no additional characters allowed for a match)
Agent = Must contain "Agent" (may be preceded or followed by any number of characters)
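Applied to the one-liner in question, the anchoring rules above suggest two possible forms. This is only a sketch, written with real pipe characters, and it assumes the keep_out variable is acted on elsewhere in the file:

```apache
# Anchored: the UA must begin with one of these names, optionally preceded
# by a single space ("\ ?" means zero or one literal space).
SetEnvIfNoCase User-Agent "^\ ?(AnsearchBot|Bot|cfetch)" keep_out

# Unanchored: the name may appear anywhere in the UA string.
# "Bot" is deliberately left out here: matched case-insensitively with no
# anchor, it would also catch every UA containing "bot", such as Googlebot.
SetEnvIfNoCase User-Agent "(AnsearchBot|cfetch)" keep_out
```

Either form replaces the paired "with space" and "without space" entries, which is where most of the size savings would come from.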
> RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(Seekbot¦Septera¦ServerGear¦Sextant¦SezamFile) [NC,OR]
> RewriteCond %{REMOTE_HOST} \.exabot\.com$ [OR]
> RewriteCond %{REMOTE_HOST} \.exava\.com$ [OR]
> RewriteCond %{REMOTE_HOST} \.fast\.no$ [OR]
> RewriteCond %{REMOTE_HOST} \.fastclick\.net$ [OR]
> RewriteCond %{REMOTE_HOST} \.fastsearch\.net$ [OR]
> ...into this?
> RewriteCond %{REMOTE_HOST} \.(exabot\.com¦exava\.com¦fast\.no¦fastclick\.net¦fastsearch\.net)$ [127.0.0.1...] [R,L]
> (Note: I use good, old 127.0.0.1 instead of [F] so I can log/track accesses.)
I don't understand that at all -- 403s are logged, just as redirects are. Most "nasties" don't follow redirects, and with a 301 or 302 you are not telling them they are unwelcome, so they may never go away.
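For comparison, here is a sketch of the consolidated host list answered with a 403 instead of a redirect. It is written with real pipe characters and assumes the condition feeds a catch-all RewriteRule, which may not match how your file is laid out:

```apache
# One condition covers all five hosts; with no following condition, no [OR] is needed.
RewriteCond %{REMOTE_HOST} \.(exabot\.com|exava\.com|fast\.no|fastclick\.net|fastsearch\.net)$
# "-" means no substitution; [F] sends 403 Forbidden.
# The request still appears in the access log, with status 403.
RewriteRule .* - [F]
```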
Jim