-- Search Engine Spider and User Agent Identification
---- .htaccess BadBot Blocker
yaimapitu - 1:43 am on Aug 4, 2013 (gmt 0)
This looks like a good opportunity to review a few things and revise the ".htaccess" files, based on a better understanding. :)
Any given robot has one correctly cased form of its name. Only that form should be given a pass.
Makes sense... this is on the "to do" list now...
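A minimal sketch of the "one correctly cased form" idea, assuming mod_rewrite and a hypothetical Googlebot check (RewriteCond patterns are case-sensitive unless [NC] is given, which is what makes this work):

```apache
# Match "googlebot" in any casing, but exempt the one correct form.
# A UA claiming "GoogleBot" or "GOOGLEBOT" fails the second condition
# and gets a 403; the genuine "Googlebot" passes untouched.
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
RewriteCond %{HTTP_USER_AGENT} !Googlebot
RewriteRule (\.html|/|^)$ - [F]
```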
The [G] flag carries an implied [L].
Good to know. Putting an [L] in strategic places is one of the habits I picked up by looking at how others did it (I've learned everything that way, never read the documentation, sorry). Just yesterday I learned that [F] also implies [L]. This is now on the "to do" list for the next update.
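To illustrate the implied flags: since [G] (gone) and [F] (forbidden) both carry an implied [L], these two rules behave identically, and the explicit [L] can simply be dropped:

```apache
# Redundant: [L] is already implied by [G]
RewriteRule ^old-page\.html$ - [G,L]

# Equivalent, cleaner form
RewriteRule ^old-page\.html$ - [G]
```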
I prefer to constrain my access-control RewriteRules to requests in the form (\.html|/|^)$
Isn't .? the most efficient form?
Cases of robots walking in off the street and making "cold" requests for non-page files when they haven't already got the page are so rare that it isn't worth making the server stop and evaluate every single request.
Hm, I get tons of requests from shady bots for files associated with assumed blogs. But any changes I might make depend on the answer to the last question.
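The constrained pattern mentioned above can be sketched like this (the UA names are placeholders): the rule's pattern fires only on page requests, i.e. URLs ending in ".html", a directory slash, or the bare site root, so requests for images, CSS, and scripts skip the condition evaluation entirely.

```apache
# Evaluate the UA condition only for page requests, not for every
# supporting file. "(\.html|/|^)$" matches: ends in .html, ends in /,
# or is the empty per-directory path (the site root).
RewriteCond %{HTTP_USER_AGENT} (wget|libwww) [NC]
RewriteRule (\.html|/|^)$ - [F]
```

By contrast, a blanket pattern like `.?` matches every request, so the server stops to evaluate the conditions on each and every file.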
Just start your RewriteRules with an all-encompassing
Sounds like a good idea - there's much patchwork in these ".htaccess" files that has accumulated over the years - maybe doing a complete rewrite from scratch is not a bad idea...
But in practice I hardly ever use mod_rewrite for access control. Flies-with-an-elephant-rifle sort of thing. Instead it's mod_authz-thingummy alone for IP-based blocks; mod_setenvif leading to "Deny from" for simple UA checks.
Which access control method is the most efficient (in terms of server resources and time)? Knowing this would help me determine in which order to apply certain rules and which ".htaccess" file to put on which level in the subdirectory structure.
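A sketch of the lighter-weight alternatives named above, in Apache 2.2 "Order/Allow/Deny" syntax (the IP range and UA substring are illustrative only):

```apache
# mod_authz_host: plain IP-based block.
# With "Order Allow,Deny", a matching Deny wins over the blanket Allow.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24

# mod_setenvif: simple case-insensitive UA check feeding "Deny from env=".
# No regex engine pass per request beyond this one substring match.
SetEnvIfNoCase User-Agent "badbot" bad_bot
Deny from env=bad_bot
```

The general reasoning: mod_authz and mod_setenvif checks are cheaper than spinning up mod_rewrite's condition machinery, which is why the poster reserves mod_rewrite for cases that genuinely need it.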
I need to mention here that my sites use a subdirectory structure with several ".htaccess" files, each on a different level. Access control via "Allow/Deny" (domains and IP blocks) happens on one level and "RewriteRule" on another level.