
.htaccess file size

How big is too big?

   
1:56 pm on Feb 26, 2004 (gmt 0)

I use .htaccess to block IP addresses belonging to known scammers, hackers, harvesters, etc. The list grows daily, and I am worried that my .htaccess file will grow large enough to slow the load time for 'regular' visitors. Is this fear justified? Will requests just 'hang' while the server scans through the .htaccess file? How large is too large? Any input would be greatly appreciated. Thanks!
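
For reference, the block list is just a growing stack of Deny directives along these lines (the addresses below are placeholders, and this assumes Apache's standard Order/Allow/Deny syntax):

    Order Allow,Deny
    Allow from all
    # known scrapers/harvesters (placeholder addresses only)
    Deny from 192.0.2.15
    Deny from 198.51.100.0/24

Apache re-reads and evaluates these lines on every request to the directory, which is where my worry about file size comes from.
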
4:56 pm on Feb 26, 2004 (gmt 0)

jdmorgan

balinor,

This depends on how many visitors you get, how fast your server's processor is, how much cache it has, and how many other sites are hosted on it and their visitor load.

Bear in mind that internet connection speeds and disk access times are measured in milliseconds, while code execution times are measured in microseconds. Therefore, it takes a pretty large .htaccess file to noticeably slow things down.

Taking a quick look at several of my sites, the .htaccess files run from 10k to 30k, and I notice no difference in access time.

That said, try to classify your "undesirable" visitors into groups: the ones that hit only one page and never come back, the ones that try to download all of your HTML pages (or their scripted equivalents), the ones that try to download your whole site, and the ones that come back day after day. Each group represents a different nuisance or threat level. Classify the user-agents you are blocking in the same way.

The point is not to waste .htaccess resources (and your time) on low-level nuisances. Accept that there will always be some minimum level of background noise, and spend your time and resources taking care of the really serious problems, not some one-time dial-up connection looking for e-mail addresses that you've long since hidden away behind contact forms.

Some sites offer content such as proprietary images, Web page design templates, etc., that represents a significant loss if downloaded without payment or authorization. Those sites should use much more sophisticated content-distribution controls than .htaccess policies can offer.

Another thing to consider is whether your "ban list" is static or dynamically updated, for example, using key_master's bad-bots script [webmasterworld.com] and/or xlcus's runaway bot catcher [webmasterworld.com]. If it is dynamic, you may not need an exhaustive list of denied user-agents, such as the one posted in the close to perfect .htaccess ban list [webmasterworld.com]. Instead, you can remove from the static list the user-agents that are rarely seen but consistently fall into the trap, and rely on the script to catch the occasional invader and repel it.
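
To give a rough idea, a trimmed-down static user-agent block can come down to a handful of lines like these (a sketch assuming mod_setenvif and mod_access are available; the agent strings are only examples, not a recommended list):

    # flag a few frequently-seen offenders; the dynamic script handles the stragglers
    SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "WebZIP" bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot

The shorter the static list, the less work Apache does on every request, and the dynamic trap picks up whatever slips through.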

The approach you select depends heavily on what kind of site(s) you administer, the content of those sites, and just how much "trouble" you see in your logs. As such, each webmaster must choose the "right" balance between control and performance.

Just a few thoughts,
Jim

 
