.htaccess file size
How big is too big?
balinor
1:56 pm on Feb 26, 2004 (gmt 0)

I use .htaccess to block IP addresses of known scammers, hackers, harvesters, etc. The list grows daily, and I am worried that my .htaccess file will grow large enough to slow load times for 'regular' visitors. Is this fear justified? Will requests just 'hang' while the server scans through the .htaccess file? How large is too large? Any input would be greatly appreciated. Thanks!
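For reference, the file is just a growing pile of entries like these (the addresses below are documentation placeholders, not my actual list):

# Per-IP bans using the Apache 1.3/2.0 mod_access syntax.
# "Order Allow,Deny" plus "Allow from all" lets everyone in
# except the addresses named in the Deny lines.
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 198.51.100.
Deny from 203.0.113.0/24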

 

jdMorgan
4:56 pm on Feb 26, 2004 (gmt 0)

balinor,

This depends on how many visitors you get, how fast your server's processor is, how much cache it has, and how many other sites are hosted on it (and how busy they are).

Bear in mind that internet connection latency and disk access times are measured in milliseconds, while code execution times are measured in microseconds. Therefore, it takes a very large .htaccess file to slow things down noticeably.

Taking a quick look at several of my sites, the .htaccess files run from 10 KB to 30 KB, and I notice no difference in access time.
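If you want numbers for your own server instead of my anecdote, ApacheBench (the ab tool that ships with Apache) gives a quick comparison; the URL is a placeholder:

# 500 requests, 10 at a time, against a page governed by the big .htaccess.
# Run it again with a trimmed copy of the file in place and compare the
# "Time per request" lines in the two reports.
ab -n 500 -c 10 http://www.example.com/index.html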

That said, try to classify your "undesirable" visitors into groups: the ones that hit only one page and never come back, the ones that try to download only your HTML pages (or their scripted equivalents), the ones that try to download your whole site, and the ones that come back day after day after day. Each group represents a different nuisance or threat level. Classify the user-agents you are blocking the same way.
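To make that concrete, here is one way to lay out a user-agent section so each class is easy to prune later. The agent names are only examples of the kinds of strings that turn up on ban lists, and mod_setenvif must be available on your server:

# Class: whole-site downloaders
SetEnvIfNoCase User-Agent "WebCopier" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
# Class: e-mail harvesters
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot

# One shared deny rule covers every class marked above.
Order Allow,Deny
Allow from all
Deny from env=bad_bot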

The point is to not waste .htaccess resources (and your time) on low-level nuisances. Accept that there will always be some minimum level of background noise, and spend your time and resources taking care of the really serious problems, not some one-time dial-up connection looking for e-mail addresses that you've long since hidden away in contact forms.

Some sites offer content such as proprietary images, Web page design templates, etc., that represents a significant loss if downloaded without payment or authorization. Those sites should use much more sophisticated content-distribution controls than .htaccess policies can offer.

Another thing to consider is whether your "ban list" is static or dynamically updated, for example, using key_master's bad-bots script [webmasterworld.com] and/or xlcus's runaway bot catcher [webmasterworld.com]. If it is dynamic, you may not need an exhaustive list of denied user-agents, such as the one posted in the close to perfect .htaccess ban list [webmasterworld.com]. Instead, you can remove from the static list the user-agents that are rarely seen but consistently fall into the trap, and rely on the script to catch the occasional invader and repel him.
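To sketch the dynamic idea without reproducing those scripts: the heart of a bot trap is a URL that no polite visitor or robot ever requests (reachable only via a hidden link and disallowed in robots.txt). The path below is made up, and this bare version just refuses the request on the spot; the linked scripts go further and record the offender for future denial. Requires mod_rewrite:

RewriteEngine On
# Anything asking for the trap URL is, by construction, misbehaving.
RewriteRule ^bot-trap/ - [F,L]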

The approach you select depends heavily on what kind of site(s) you administer, the content of those sites, and just how much "trouble" you see in your logs. As such, each webmaster must choose the "right" balance between control and performance.

Just a few thoughts,
Jim
