.htaccess short circuit

         

braxtonperry

5:08 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



Hello everyone, I am new here.

My background is databases and robotics.

I have been reading about all of the wonderful blocking .htaccess rules.

Does .htaccess have something like Pascal's short-circuit logic, where once the result is determinable, evaluation falls through to the correct clause without computing the rest of the boolean expression?

It seems that either it should, or these long OR-combined statements would be highly inefficient.

If they are inefficient, wouldn't one break them up into smaller groups so they can fail earlier and get rid of the pests?

Then one would move the most offensive bots to the top of the list, so the new kid on the block that the hackers are trying out gets kicked quickly and doesn't run through the rest of the rules.

Older bots go to the end because they are rarer.

Ideas? Comments?

jdMorgan

8:07 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's the [S] (skip) flag and the [L] (last rule) flag. But other than that, yes, in relative terms it's more inefficient to run dozens of rules than just one or two. On the other hand, we're letting the computers do the work and not worrying about it too much, because that's what computers are for.
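A minimal sketch of both flags, with a made-up bot name and a placeholder domain:

    RewriteEngine On

    # [F] implies [L]: a matching request is refused (403) right here
    # and never reaches the rules below. "EvilScraper" is made up.
    RewriteCond %{HTTP_USER_AGENT} EvilScraper [NC]
    RewriteRule .* - [F]

    # [S=1]: if the request is NOT for an image, skip the next rule,
    # so the hotlink check below only ever runs for image requests.
    RewriteRule !\.(gif|jpe?g|png)$ - [S=1]
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
    RewriteRule .* - [F]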

I recommend putting the 'high runners' first in the list, but I often trade that off against the ease-of-maintenance advantage of keeping the list in alphanumeric order.

You can have several hundred lines of code in .htaccess, with each rule being processed for each and every HTTP request, and never notice a performance hit until you start getting tens or hundreds of thousands of unique visitors per day. I've seen people fret about a few dozen rewrite rules, and then totally ignore the time required to instantiate a Perl or PHP interpreter to process a dynamic URL request...

Jim

braxtonperry

2:26 am on Nov 9, 2005 (gmt 0)

10+ Year Member



Thank you Jim,

Well, being an optimist, I want to design for 5,000.
I also thought about cannonizing the variable to lowercase so I wouldn't need [NC] flags.

And for the top five, I would give them their own fail condition to kick them out quickly.
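Something like this sketch, with placeholder names standing in for the real top five:

    RewriteEngine On

    # The worst offenders get their own early rule; [F] implies [L],
    # so they are refused before the longer, rarer lists below are
    # ever evaluated. The bot names here are placeholders.
    RewriteCond %{HTTP_USER_AGENT} (BotOne|BotTwo|BotThree|BotFour|BotFive) [NC]
    RewriteRule .* - [F]

    # ...the longer, alphabetized lists of rarer patterns follow...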

jdMorgan

2:55 am on Nov 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In most cases, the problem is not the 'rule matches' case anyway. The common application is to build rewrite rules that check for requests that should be denied, rather than for those that are allowed. So, in effect, the only way a request will be served is if it passes through all of the denial tests. There's really no way to 'exit quickly,' since a successful request is one that passes all tests (rules).
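In other words, the common pattern is a chain of [OR]'d conditions feeding a single deny rule (the user-agent strings below are just examples); a request is served only after it fails to match every condition, so there is no early 'pass' exit:

    RewriteEngine On

    # Deny if ANY condition matches; note the last condition carries
    # no [OR] flag. A request must fail all of them to be served.
    RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC]
    RewriteRule .* - [F]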

It is possible to specify what kinds of requests you'll accept, but it is a maintenance nightmare, since new user-agents and new versions of user-agents appear daily, and because it is hard to build a list of 'acceptable' client IP addresses, etc.
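For illustration only, such an allow-list might look like this sketch (the accepted prefixes are examples, and this is exactly the list that turns into a maintenance nightmare):

    # Refuse anything whose User-Agent doesn't begin with one of the
    # accepted prefixes. Every new legitimate client breaks this.
    RewriteCond %{HTTP_USER_AGENT} !^(Mozilla|Opera|Googlebot) [NC]
    RewriteRule .* - [F]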

You can save some CPU by --for example-- putting all of your images in a separate subdirectory. Then you need only run code to check for hotlinking in that subdirectory, and not for every request to your server. This also allows you to add cache-control headers specific to those images, etc. in the .htaccess file in the image subdirectory itself.
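A sketch of what an /images/.htaccess might contain (example.com is a placeholder, and the Expires lines assume mod_expires is available):

    RewriteEngine On

    # Hotlink check: this runs only for requests under /images/.
    # Empty referers (direct requests, privacy proxies) are allowed.
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
    RewriteRule \.(gif|jpe?g|png)$ - [F]

    # Cache-control headers specific to these images.
    ExpiresActive On
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"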

cannonize = canonicalize? Good idea if it is a new site and you haven't got any non-canonical links out on the Web already pointing to your pages. I don't use, publish, or accept URLs with any uppercase characters in them. Because of this, anyone linking to my sites with an incorrect-case URL will get an immediate 404-Not Found response, and hopefully, they'll go back and check their URL before publishing their Web page. Basically, if your server allows uppercase or mixed-case links, then you will get people using them to link to your site. If you don't allow them, then you won't get many of them.
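On a case-sensitive filesystem that 404 happens naturally, because the mixed-case file simply doesn't exist; a rule like this sketch would enforce the same policy explicitly (the [R=404] form assumes a reasonably modern Apache 2.x):

    # Refuse any URL-path containing an uppercase letter.
    RewriteRule [A-Z] - [R=404]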

Jim