How to make .htaccess smaller, leaner

Honey, I shrunk the code?

         

Pfui

8:53 pm on Apr 17, 2006 (gmt 0)

I'm a long-time convert to whitelisting -- I automatically deny all agents not including "Mozilla" in their UA strings, then manually block SEs and IPs (etc.) spoofing Mozilla. Then I allow everyone else. (More ways to whitelist here [webmasterworld.com].)

But even with the majority of troublemakers and bad boys blocked from the get-go, my .htaccess tends to teeter on the too-big size (~135-140k, for my server). There's no perceptible slow-down but I know it's getting too big when, for example, a UA at the bottom of the file doesn't get caught, but the same UA does when I move it up top. So-o-o 'tis time to consolidate (again). Help, please?

Are the following acceptable uses of pipes (which appear as broken lines on this board) --

---
1.) SetEnv

Turn these...

SetEnvIfNoCase User-Agent "^AnsearchBot" keep_out
SetEnvIfNoCase User-Agent "^\ AnsearchBot" keep_out
SetEnvIfNoCase User-Agent "^Bot" keep_out
SetEnvIfNoCase User-Agent "^\ Bot" keep_out
SetEnvIfNoCase User-Agent "^cfetch" keep_out
SetEnvIfNoCase User-Agent "^\ cfetch" keep_out

...into this one-liner?

SetEnvIfNoCase User-Agent "^(Agent¦\ Agent¦AnsearchBot¦\ AnsearchBot¦Bot¦\ Bot¦cfetch¦\ cfetch)" keep_out

...Or this couplet?

SetEnvIfNoCase User-Agent "^(Agent¦AnsearchBot¦Bot¦cfetch)" keep_out
SetEnvIfNoCase User-Agent "\ (Agent¦AnsearchBot¦Bot¦cfetch)" keep_out

(Note: The escaped leading spaces match UAs as actually sighted. The upper/lower case is shown as logged too, but the matching itself is case-insensitive.)

---
2.) HTTP_USER_AGENT

Turn these...

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Seekbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Septera [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*ServerGear [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Sextant [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*SezamFile [OR]

...into this?

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(Seekbot¦Septera¦ServerGear¦Sextant¦SezamFile) [NC,OR]

---
3.) REMOTE_HOST

Turn these...

RewriteCond %{REMOTE_HOST} \.exabot\.com$ [OR]
RewriteCond %{REMOTE_HOST} \.exava\.com$ [OR]
RewriteCond %{REMOTE_HOST} \.fast\.no$ [OR]
RewriteCond %{REMOTE_HOST} \.fastclick\.net$ [OR]
RewriteCond %{REMOTE_HOST} \.fastsearch\.net$ [OR]

...into this?

RewriteCond %{REMOTE_HOST} \.(exabot\.com¦exava\.com¦fast\.no¦fastclick\.net¦fastsearch\.net)$ [127.0.0.1...] [R,L]

(Note: I use good, old 127.0.0.1 instead of [F] so I can log/track accesses.)

---
4.) RewriteRule

Turn these (& I know they look wacky but they work)...

RewriteRule ^FormMail\.cgi(.*) [127.0.0.1...] [R,L]
RewriteRule ^(.*)/FormMail\.cgi(.*) [127.0.0.1...] [R,L]
RewriteRule ^formmail\.cgi(.*) [127.0.0.1...] [R,L]
RewriteRule ^(.*)/formmail\.cgi(.*) [127.0.0.1...] [R,L]

...into this:

RewriteRule ^(FormMail\.cgi\(.*\)¦\(.*\)/FormMail\.cgi\(.*\)) [127.0.0.1...] [NC,R,L]

(Note: That last one makes my head spin:)

Thanks in advance for your help!

[edited by: Pfui at 8:56 pm (utc) on April 17, 2006]

The Contractor

8:55 pm on Apr 17, 2006 (gmt 0)

Yep, I use those methods. For other readers: the ¦ should be a solid pipe character (|), but the forum software breaks it.

jdMorgan

11:34 pm on Apr 17, 2006 (gmt 0)

A few comments:

> I know it's getting too big when, for example, a UA at the bottom of the file doesn't get caught, but the same UA does when I move it up top.

I'd look carefully at your code for missing (or extra) [OR] flags, and other coding or logical errors. There's no reason Apache should refuse to run the whole file.
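To illustrate the kind of [OR] slip meant here, a minimal sketch (the three UA names are borrowed from the question; the [F] rule is only a stand-in for whatever action the real file takes, and the pipes elsewhere in this thread would be real "|" characters in a live file). Conditions without [OR] are ANDed, so one missing flag quietly changes the logic. As far as I know, an extra [OR] on the last condition before the rule is the opposite problem: the rule can then fire even when none of the conditions matched.

```apache
RewriteEngine On

# BROKEN: the first condition has no [OR], so it is ANDed with the rest.
# This only fires when the UA contains "Seekbot" AND one of the other two.
RewriteCond %{HTTP_USER_AGENT} Seekbot
RewriteCond %{HTTP_USER_AGENT} Septera [OR]
RewriteCond %{HTTP_USER_AGENT} Sextant
RewriteRule .* - [F]

# INTENDED: every condition except the last carries [OR].
RewriteCond %{HTTP_USER_AGENT} Seekbot [OR]
RewriteCond %{HTTP_USER_AGENT} Septera [OR]
RewriteCond %{HTTP_USER_AGENT} Sextant
RewriteRule .* - [F]
```

In a long file, the last RewriteCond of each block is the first place to check.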

To speed things up, you can put things like specific-page rewrites or redirects at the bottom, and IP and user-agent blocking at the top. Try to handle the most-frequent cases first, and use the [L] flag.
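A rough sketch of that ordering, assuming a hypothetical page redirect (old-page.html and new-page.html are made-up names, not from the original post):

```apache
RewriteEngine On

# 1. Blocking first: unwanted requests are rejected before any other
#    rule is evaluated. [F] stops rule processing for the request.
RewriteCond %{HTTP_USER_AGENT} (Seekbot|Septera) [NC]
RewriteRule .* - [F]

# 2. Page-specific redirects last. [L] ends processing once one matches.
#    (Hypothetical file names, for illustration only.)
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```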

> SetEnvIfNoCase User-Agent "^(Agent¦\ Agent¦AnsearchBot¦\ AnsearchBot¦Bot¦\ Bot¦cfetch¦\ cfetch)" keep_out

It's not clear why you'd have both "^Agent" and "^\ Agent" in there, and the same kind of repeat for AnsearchBot. If these user-agents don't always start with the specified string, then leave off the ^ start anchor. To be specific:

^Agent = Must start with "Agent" (may be followed by any number of characters)
Agent$ = Must end with "Agent" (may be preceded by any number of characters)
^Agent$ = Must exactly match "Agent" (no additional characters allowed for a match)
Agent = Must contain "Agent" (may be preceded or followed by any number of characters)
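Applied to the SetEnvIfNoCase line from the question, those anchoring rules suggest two possible consolidations. This is a sketch, not a tested drop-in: keep_out is the variable name from the original post, and the pipes are written as real "|" characters. One caution with the unanchored form: since the match is case-insensitive, a bare "Bot" would also catch any UA that merely contains "bot", well-behaved crawlers included.

```apache
# Option A: keep the start anchor and make the leading space optional.
# "\ ?" means zero or one space, so both "cfetch" and " cfetch" match.
SetEnvIfNoCase User-Agent "^\ ?(Agent|AnsearchBot|Bot|cfetch)" keep_out

# Option B: drop the anchor for names that are safe to match anywhere
# in the UA string. Short, generic names like "Bot" are left out here
# on purpose because they would match far too much.
SetEnvIfNoCase User-Agent "(AnsearchBot|cfetch)" keep_out
```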

> RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(Seekbot¦Septera¦ServerGear¦Sextant¦SezamFile) [NC,OR]

Unless there is a case where you want to *allow* (for example) "Seekbot" when it is *not* preceded by "Mozilla", you might as well just leave "^Mozilla.*" off the pattern and use the parenthesized part only.
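With the prefix dropped, the condition might look like the sketch below. It assumes this is the last condition in its block (so no [OR]) and uses [F] as a placeholder action; in a live file the pipes are real "|" characters.

```apache
# Matches any UA containing one of these names, with or without a
# "Mozilla" prefix. [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (Seekbot|Septera|ServerGear|Sextant|SezamFile) [NC]
RewriteRule .* - [F]
```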


> RewriteCond %{REMOTE_HOST} \.exabot\.com$ [OR]
> RewriteCond %{REMOTE_HOST} \.exava\.com$ [OR]
> RewriteCond %{REMOTE_HOST} \.fast\.no$ [OR]
> RewriteCond %{REMOTE_HOST} \.fastclick\.net$ [OR]
> RewriteCond %{REMOTE_HOST} \.fastsearch\.net$ [OR]
>
> ...into this?
>
> RewriteCond %{REMOTE_HOST} \.(exabot\.com¦exava\.com¦fast\.no¦fastclick\.net¦fastsearch\.net)$ [127.0.0.1...] [R,L]


If you have a lot of these, you could further group them by TLD... That is, all .coms, .orgs, .nets, etc., and combine them to save on matching multiple TLD strings in each sub-pattern. I don't think I'd bother with the leading "\."
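Grouped by TLD, the five hosts from the question could be written roughly as follows. This is a sketch with a placeholder [F] rule, since the RewriteCond itself takes no target or [R,L] flags; those belong on the RewriteRule that follows it. The leading "\." is kept in this version so that "fast.no" does not also match a host such as "breakfast.no"; drop it if that risk does not matter for your list.

```apache
# Each TLD appears once; the host names sharing it are grouped inside.
RewriteCond %{REMOTE_HOST} \.((exabot|exava)\.com|fast\.no|fast(click|search)\.net)$ [NC]
RewriteRule .* - [F]
```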

> (Note: I use good, old 127.0.0.1 instead of [F] so I can log/track accesses.)

I don't understand that at all -- 403's are logged, just as redirects would be. Most "nasties" don't follow redirects, and with a 301 or 302 you are not telling them they are unwelcome, so they may never go away.
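A sketch of the [F] approach, using the FormMail rules from the question as the example. One rule with [NC] and an optional directory prefix should cover all four originals; the refused requests still show up in the access log with status 403.

```apache
# "(^|/)" matches the script at the top level or in any subdirectory.
# [NC] covers FormMail.cgi, formmail.cgi and other case variants.
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule (^|/)formmail\.cgi - [NC,F]
```

Note that the parentheses are left unescaped here: "\(" would match a literal parenthesis in the URL rather than open a group.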

Jim