My background is databases and robotics.
I have been reading about all of the wonderful .htaccess blocking codes.
Does .htaccess have something like Pascal's short-circuit logic, where once the result is determinable, it falls through to the correct clause without computing the rest of the boolean expression?
It seems that either it should, or these long or combined statements would be highly inefficient.
If they are inefficient, should one break them up into smaller groups so they can fail earlier and get rid of the pests?
Then one would move the most offensive bot to the top of the list, so the new kid on the block that the hackers are trying gets kicked out quickly and doesn't run the rest of the code.
Older bots go to the end because they are rarer.
Ideas, comments?
I recommend putting the 'high runners' first in the list, but often trade that off with the ease-of-maintenance advantages of keeping the list in alphanumeric order.
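As a rough sketch of what "high runners first" looks like with mod_rewrite: the user-agent names below are made up for illustration, so substitute whatever shows up in your own logs. As far as I know, mod_rewrite does short-circuit here. In a chain of [OR] conditions, it skips the remaining ones once one matches, and in the default AND case it gives up on the rule at the first condition that fails.

```apache
RewriteEngine On

# Hypothetical user-agent names, most frequent offenders first.
# Once one OR'd condition matches, the rest of the chain is skipped.
RewriteCond %{HTTP_USER_AGENT} ^BadBotOne [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBotTwo [NC,OR]
# Older, rarer bots go last. The final condition carries no OR flag.
RewriteCond %{HTTP_USER_AGENT} ^OldRareBot [NC]

# Return 403 Forbidden; F also stops further rule processing for this request.
RewriteRule ^ - [F]
```

Note that the RewriteRule pattern is tested before its conditions, so a pattern that rarely matches keeps the whole condition list from running at all.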
You can have several hundred lines of code in .htaccess, with each rule being processed for each and every HTTP request, and never notice a performance hit until you start getting tens or hundreds of thousands of unique visitors per day. I've seen people fret about a few dozen rewrite rules, and then totally ignore the time required to instantiate a Perl or PHP interpreter to process a dynamic URL request...
Jim
It is possible to specify what kinds of requests you'll accept, but it is a maintenance nightmare, since new user-agents and new versions of user-agents appear daily, and because it is hard to build a list of 'acceptable' client IP addresses, etc.
You can save some CPU by, for example, putting all of your images in a separate subdirectory. Then you need only run the hotlink-checking code for requests to that subdirectory, and not for every request to your server. This also allows you to add cache-control headers specific to those images, etc. in the .htaccess file in the image subdirectory itself.
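A minimal sketch of such a per-directory file, assuming the images live in a directory like /images/ and the site is example.com (both are placeholders, as is the 30-day cache lifetime). It also assumes mod_rewrite and mod_headers are available and that your AllowOverride settings permit them in .htaccess.

```apache
# /images/.htaccess : only consulted for requests under /images/
RewriteEngine On

# Allow blank referrers (direct requests, privacy proxies)...
RewriteCond %{HTTP_REFERER} !^$
# ...and referrals from our own site (example.com is a placeholder).
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Anything else asking for an image gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]

# Cache headers that apply only to files in this directory.
<IfModule mod_headers.c>
    Header set Cache-Control "max-age=2592000, public"
</IfModule>
```

Because both conditions are ANDed, a request with a blank referrer fails the first test and the second is never evaluated.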
cannonize = canonicalize? Good idea if it is a new site and you haven't got any non-canonical links out on the Web already pointing to your pages. I don't use, publish, or accept URLs with any uppercase characters in them. Because of this, anyone linking to my sites with an incorrect-case URL will get an immediate 404-Not Found response, and hopefully, they'll go back and check their URL before publishing their Web page. Basically, if your server allows uppercase or mixed-case links, then you will get people using them to link to your site. If you don't allow them, then you won't get many of them.
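The post above doesn't say how the 404 is produced. On a case-sensitive filesystem with all-lowercase filenames it happens by itself. Where the filesystem is not case-sensitive, one possible way to get the same effect is a rule like the following. This is only a sketch of the idea, not the setup described above, and it assumes Apache 2.2 or later and that none of your real URL paths contain uppercase letters.

```apache
RewriteEngine On

# Any uppercase letter in the requested path: answer 404 Not Found.
# With a non-3xx status in R, the substitution is dropped and rewriting stops.
RewriteRule [A-Z] - [R=404]
```

The pattern is tested against the URL path only, so uppercase characters in a query string would not trigger it.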
Jim