What to do when your htaccess file grows too large?

I'm not there yet....but....

         

Terabytes

7:09 pm on May 13, 2009 (gmt 0)

10+ Year Member



I use, like most people, a list of denied IPs and bots within my htaccess file....and a few redirects... and a few rewrites... etc.

The current file is only about 64K at this point, but I see it growing larger as I add future items. Eventually it will begin to degrade the site access...right?

Is there a way to thin out the file once it becomes bloated? Or hope for the best...?

Thanks for your time!
Tera

Samizdata

8:20 pm on May 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> like most people

I would say we were a small minority.

I wouldn't claim to be an expert (particularly in this forum) but offer the following thoughts:

File size isn't everything - a proportion of my .htaccess content consists of commented lines describing what the code is doing. As far as I am aware this has no real impact on performance and I find it essential as a reminder of what the code actually does.

Careful use of pattern matching and regular expressions can also optimize the amount of code used.

I generally use very few file redirects ("cool URLs don't change") so most of my code deals with bot control - I look for common factors (when I have time) and reduce my code accordingly.

I don't think that any answer to "how large is too large" has ever been given in this forum, as much depends on processing power and speed, but your .htaccess is twice the size of mine.

Keep it lean and keep it mean.

...

jdMorgan

10:10 pm on May 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of mine is lean, clean, and mean, and is twice that size (121kB).

My suggestions:

Look for opportunities to "split" the file. For example, if you keep all of your images in a subdirectory, then you can move your anti-hotlinking rule(s) to an .htaccess file in that subdirectory.
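
As a rough illustration of that kind of split, here is a minimal sketch of an anti-hotlinking file for an images subdirectory. The /images/ location, the example.com hostname, and the list of extensions are all placeholders of mine, not taken from anyone's real configuration:

```apache
# /images/.htaccess (hypothetical location; assumes all images live under /images/)
RewriteEngine On
# Let through requests with a blank Referer (direct requests, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by our own pages (example.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Refuse any other request for an image file
RewriteRule \.(gif|jpe?g|png)$ - [F]
```

One thing to check before splitting: by default, once a subdirectory's .htaccess contains its own mod_rewrite directives, the parent directory's rewrite rules are no longer applied to requests in that subdirectory unless you ask for inheritance with RewriteOptions Inherit. So make sure nothing in the main file still needs to run for those requests.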

As you probably know, I go on and on about not using ".*" patterns, except where it is strictly necessary. One advantage of more-specific patterns is this: if a rule's pattern requires the filetype to be ".html", then for any request for a resource other than an .html page, processing of that rule stops at the pattern and none of its RewriteConds are evaluated (see the mod_rewrite documentation).

So, for example, instead of checking for a certain query string appended to *any* URL, consider whether it would make more sense to check for it only on .php and/or .pl or .cgi URLs.
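
A minimal sketch of the difference, assuming a made-up query parameter named "badparam" that you want to refuse (the name and the choice of a 403 response are only for illustration):

```apache
# Broad form: the RewriteCond is evaluated for every request,
# including images, CSS and static pages
# RewriteCond %{QUERY_STRING} (^|&)badparam= [NC]
# RewriteRule .* - [F]

# Narrower form: the rule pattern is tested first, so the RewriteCond
# is only evaluated for .php, .pl and .cgi requests
RewriteCond %{QUERY_STRING} (^|&)badparam= [NC]
RewriteRule \.(php|pl|cgi)$ - [F]
```

This works because mod_rewrite tests the RewriteRule pattern before it looks at any of the rule's RewriteConds, even though the conditions appear first in the file.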

I have a lot of access restrictions on certain files near the end of my code. But they are all "pages" and not image or multimedia or CSS files. So right before that restriction code, I have a rule that says, "If the URL-path extension is not blank, .html, or .php, then leave the URL-path alone and quit right here":


RewriteRule !^([^/]+/)*(([^.]*\.)+(html|php))?$ - [L]

So, split the code into subdirectories where applicable, use RewriteConds or very-specific rule patterns to prevent unnecessary evaluation and execution of RewriteRules and their RewriteConds which are not relevant to the resource being requested, and look for opportunities in the structure of your code to quit early.

If you have a long list of 'banned' IP addresses (e.g. using mod_access "Deny from" directives), consider combining several smaller ranges into one larger range, or removing one or more of the bans if you get at least a little traffic from that range that might make the accompanying abuse tolerable.
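
For instance, here is a sketch of four adjacent small ranges collapsed into one. The addresses come from a block reserved for documentation, so they are placeholders rather than real offenders:

```apache
Order Allow,Deny
Allow from all

# Before: four adjacent /26 blocks listed one by one
# Deny from 198.51.100.0/26
# Deny from 198.51.100.64/26
# Deny from 198.51.100.128/26
# Deny from 198.51.100.192/26

# After: the same addresses covered by a single /24
Deny from 198.51.100.0/24
```

Ranges only merge cleanly like this when they are adjacent and aligned on the boundary of the larger block. If they are not, the wider range will also ban addresses that were not on your list, which is the trade-off to weigh. On Apache 2.4 these directives are handled by mod_access_compat, and the native equivalent uses Require directives instead.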

Also make full use of regular expressions and local variables. For example:


RewriteCond %{REQUEST_URI} ^/a\.html$ [OR]
RewriteCond %{REQUEST_URI} ^/b\.html$ [OR]
RewriteCond %{REQUEST_URI} ^/c\.html$ [OR]
RewriteCond %{REQUEST_URI} ^/d\.php$ [OR]
RewriteCond %{REQUEST_URI} ^/e\.php$ [OR]
RewriteCond %{REQUEST_URI} ^/f\.php$ [OR]
RewriteCond %{REQUEST_URI} ^/old/
RewriteRule ^(.*)$ /subdir/$1 [L]

Can be rewritten as

RewriteCond $1 ^(a|b|c)\.html$ [OR]
RewriteCond $1 ^(d|e|f)\.php$ [OR]
RewriteCond $1 ^old/
RewriteRule ^(.*)$ /subdir/$1 [L]

or even as just

RewriteRule ^((a|b|c)\.html|(d|e|f)\.php|old/.*)$ /subdir/$1 [L]

And our famous-but-now-practically-obsolete "close to perfect .htaccess ban list" could be re-coded using a similar technique, moving the %{HTTP_USER_AGENT} value into a local (and shorter-named) variable:


RewriteCond %{HTTP_USER_AGENT} ^(.+)$
RewriteCond %1 ^(BlackWidow|Crescent|Disco.?|ExtractorPro|HTML.?Works|Franklin.?Locator) [NC,OR]
RewriteCond %1 ^(Green\ Research|Harvest|HLoader|http.?generic|Industry.?Program) [NC,OR]
RewriteCond %1 ^(IUPUI.?Research.?Bot|Mac.?Finder|NetZIP|NICErsPRO|NPBot|PlantyNet_WebRobot) [NC,OR]
( ... etc.)

If you have a big "pile" of RewriteConds that must be applied to several rules, consider evaluating them and storing the result in an environment variable. Then you can just check that environment variable as a single condition in the following rules, instead of repeating the long list of RewriteConds.

RewriteCond blah-blah
RewriteCond boo-boo
RewriteCond fee-fee
RewriteCond foo-foo
RewriteCond mee-mee
RewriteCond moo-moo
RewriteRule ^ - [E=BigPile:Yes]
#
RewriteCond %{ENV:BigPile} =Yes
RewriteRule ^sumpath /sumnewpath [L]
#
RewriteCond %{ENV:BigPile} =Yes
RewriteRule ^sumuvvapath /sumcoolpath [L]

Just a few tricks to think about...

Jim

Terabytes

1:14 am on May 15, 2009 (gmt 0)

10+ Year Member



Just wanted to say thanks for the incredibly great advice...

I wouldn't even be asking these questions if it weren't for the awesome contributors that have helped so many people here. (myself included)

If it wasn't for your help I'd still be asking "...what's an .htaccess file?"

thanks again!
Tera

jdMorgan

3:31 am on May 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If it wasn't for your help I'd still be asking "...what's an .htaccess file?"

Yes, and as we've all learned (or will learn), that's a complicated question...

I hang out here because it's a good place to learn new things and new techniques. I have found inspired bits of logic and/or code in even the most badly-broken examples posted here. Many of the questions are about problems I've never faced myself, and don't know the answer to... until after we've all worked through it together, that is.

So thanks for the kind remarks about this forum, but as for myself, I'm here to learn too. ;)

Jim