Forum Moderators: phranque

Rewrite Rules

         

Ken_S

2:09 am on Oct 9, 2012 (gmt 0)

10+ Year Member



Greetings,
Been lurking around here a long time now and have learned a lot. I appreciate the help from the many posts that I have read.

I do a lot of denying but don't understand near enough about Mod_Rewrite yet, still very much a newbie.

(Q) My question is: is it better to have a few RewriteRules, or one every few lines of blocking? Say, one rule for each of the following, or 2 or 3 combined under one RewriteRule? Or does it not make a big difference either way?

##########
IPs
Method
The Request
Request URI
Referrer
User_Agent
###########
I am on shared hosting - so I use only an htaccess file.
Appreciate any guidance,
Ken

phranque

2:28 am on Oct 9, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, Ken!

you should use no fewer and no more rules than are necessary for your site based on its current requirements and past history.
=8)

there are some general guidelines.
g1smd describes this brilliantly and most succinctly here:
http://www.webmasterworld.com/apache/4503261.htm#msg4503607

are you having a specific problem or just trying to understand what your .htaccess file directives do?

wilderness

2:50 am on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



(Q) My question is: is it better to have a few RewriteRules, or one every few lines of blocking? Say, one rule for each of the following, or 2 or 3 combined under one RewriteRule? Or does it not make a big difference either way?


phranque has provided a good link to rule order explanation.

As far as a general answer to your question, it depends entirely on your objective.
You might have a rule with multiple conditions to lessen the chance of blocking innocents.

I have a dozen or so that are focused upon both UA and IP and are custom.
You could also add header checks, page request and more, all the while expanding upon the number of conditions.

Each webmaster must decide what is beneficial or detrimental to their own site(s); there's no one-size-fits-all rule.
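
A rough sketch of what such a multi-condition rule might look like. The bot name, the 192.0.2.x range (a documentation-only address block) and the contact path are made up for illustration and are not taken from anyone's real file:

```apache
# All three conditions must match (RewriteConds are ANDed unless flagged [OR]),
# which narrows the block and lowers the chance of catching innocents.
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteCond %{HTTP_ACCEPT} ^$
RewriteRule ^contact - [F]
```

Each extra condition makes the rule more selective, so the trade-off is between catching more bad requests and risking fewer false positives.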

Ken_S

3:07 am on Oct 9, 2012 (gmt 0)

10+ Year Member



Hi phranque,
Thank you for the reply & welcome..

At the moment I am trying to get my htaccess file running as smoothly as I can. My understanding is that at every RewriteRule the processing stops to make a check, so my thought is that fewer stops may be better. But then again, a single RewriteRule may have too many conditions to go back and check before proceeding to the next rule.
#
I am working on another area also, trying to set blocks for bots that are not from a certain IP range - I keep 403ing BingBot, MsnBot, Google, Slurp, Yandex. I have the IP ranges correct except maybe the 157.*.*.* of Bing/Msn. I have now placed those ranges at the end of my Mod_Rewrite Module and at the moment haven't had a Bot hit to verify they are making it through - before, I had them placed at the top of my Module before the denying of IP's, etc.
#
This is basically how they are set up..
RewriteCond %{HTTP_USER_AGENT} Bot-Name [NC]
RewriteCond %{REMOTE_ADDR} !^000\.000\.(1[2-9])\.
RewriteRule .* - [F,L]

Thanks again,
Ken

wilderness

3:17 am on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am working on another area also, trying to set blocks for bots that are not from a certain IP range - I keep 403ing BingBot, MsnBot, Google, Slurp, Yandex. I have the IP ranges correct except maybe the 157.*.*.* of Bing/Msn. I have now placed those ranges at the end of my Mod_Rewrite Module and at the moment haven't had a Bot hit to verify they are making it through - before, I had them placed at the top of my Module before the denying of IP's, etc.


There are numerous examples of this going back some years in the SSID forum.

try a search on fake+google

Ken_S

3:29 am on Oct 9, 2012 (gmt 0)

10+ Year Member



Hey Wilderness,

Thanks for the reply. I just checked my log file, Googlebot made it through, so perhaps placing the allows after all the denies was the right approach - I've never been sure which way was the proper way and again I suppose it could depend on how a server is set up. Can't find that info from my Host and I have searched here at Webmaster for months now and could not find a concrete solution, so that causes me to do a lot of trial and error checking, which I don't mind.

Guess I'll head off to bed now, check the log in the morn to see if the other Bots have made it through as well.

Ken

lucy24

5:00 am on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My understanding is that at every RewriteRule the processing stops to make a check, so my thought is that fewer stops may be better. But then again, a single RewriteRule may have too many conditions to go back and check before proceeding to the next rule.

I don't think there is a practical limit to the number of RewriteConds that a single RewriteRule can support. But there is definitely a limit to how many your brain can keep track of before you start misplacing [OR]s or leaving out a crucial ! or, well, forget what the rule was meant to do ;)

You could, in theory, have an htaccess file whose access rules were based entirely on mod_rewrite. But some things really are cleaner and simpler by other means, notably the unconditional IP block

Deny from 12.34.56.67

For simple User-Agent elements I like to use mod_setenvif:

BrowserMatch "El Stinko" keep_out
BrowserMatch uglymug keep_out
BrowserMatch WorldsDumbestRobot keep_out
and of course
BrowserMatch "^-?$" keep_out

leading to an all-encompassing

Deny from env=keep_out

Save mod_rewrite for the, ahem, interesting stuff. But if you block by more than one method, like Deny from... and RewriteRule...[F], make sure that each method has a corresponding escape clause that lets your server show your custom 403 page if you've got one. Same for robots.txt and probably a few others.
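
A minimal sketch of such escape clauses, assuming Apache 2.2-style access directives and a custom error page at /403.html. The filename and the ExampleBot name are placeholders, and RewriteEngine On is assumed to appear earlier in the file:

```apache
# Custom error page (hypothetical filename)
ErrorDocument 403 /403.html

# Deny side (mod_setenvif + mod_authz_host)
BrowserMatch WorldsDumbestRobot keep_out
Deny from 12.34.56.67
Deny from env=keep_out

# Escape clause for the Deny side: let everyone fetch these two files
<FilesMatch "^(403\.html|robots\.txt)$">
Order Allow,Deny
Allow from all
</FilesMatch>

# mod_rewrite side, with its own escape clause as the first condition
RewriteCond %{REQUEST_URI} !^/(403\.html|robots\.txt)$
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule ^ - [F]
```

Without the exemptions, a blocked visitor's request for the 403 page is itself refused, and the server falls back to its plain built-in error message.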

g1smd

8:07 am on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can have one RewriteCond per RewriteRule or dozens or hundreds of RewriteConds per RewriteRule. Each group is a 'ruleset'.

Break the code up in a way that is logical. I'll have one ruleset that deals with looking at various parameters, another ruleset for various user agents, another for various paths, and so on.

Add a blank line after every RewriteRule to make the code easier to read. Add a comment before each ruleset explaining what it does.
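
As an illustration of that layout, here is a sketch with one commented ruleset per concern and a blank line after each RewriteRule. The parameter, bot and path names are invented for the example:

```apache
RewriteEngine On

# Block requests carrying a suspicious query parameter (hypothetical name)
RewriteCond %{QUERY_STRING} (^|&)badparam= [NC]
RewriteRule ^ - [F]

# Block unwanted user agents (made-up names)
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AnotherBot [NC]
RewriteRule ^ - [F]

# Block probes for paths that do not exist on this site
RewriteRule ^(admin|setup)/ - [F]
```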

phranque

10:19 am on Oct 9, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



At the moment I am trying to get my htaccess file running as smoothly as I can. My understanding is that at every RewriteRule the processing stops to make a check, so my thought is that fewer stops may be better. But then again, a single RewriteRule may have too many conditions to go back and check before proceeding to the next rule.


order your rulesets from most specific to most general and try to delay inefficient rulesets, such as file/directory existence checks.

within each ruleset, the RewriteRule pattern is matched first, so no RewriteConds will be tested in that group if the pattern doesn't match, and the RewriteCond list processing is stopped after the first one that fails.
making your pattern as specific as possible and ordering your RewriteConds for efficiency can help.

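once a RewriteRule with an L flag is executed, that's the end of the current pass through the rules, so the rules that follow it don't run on that pass. (in .htaccess, if the URL was rewritten, the new URL gets a fresh pass from the top; with an F flag the request simply ends with a 403.)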
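
a small sketch of that ordering, assuming RewriteEngine On is already set. the bot name, the 192.0.2.x range and the index.php front controller are hypothetical:

```apache
# The pattern is checked first: only .php requests ever reach the conditions.
# The first condition that fails stops evaluation of the rest.
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.
RewriteRule \.php$ - [F]

# Costlier filesystem existence checks are delayed to the end of the file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```

the blocking ruleset comes first and uses a narrow pattern, so most requests skip its conditions entirely, and the file/directory checks only run for requests that nothing earlier has dealt with.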

Ken_S

12:04 pm on Oct 9, 2012 (gmt 0)

10+ Year Member



Lucy

Thank you for your response, I have enjoyed reading your comments. For a couple of years I used mod_setenvif, and all went well, then a server upgrade from 1.3 to 2.? Apache came along and mod_setenvif no longer worked. I finally got a support person to hint that it was best to start using mod_rewrite, so the process of trying to wrap my brain around this module has been a challenge to say the least - at one point I was able to, in part, use mod_security, but that was nixed along with mod_setenvif.

G1smd
Add a blank line after every RewriteRule to make the code easier to read. Add a comment before each ruleset explaining what it does.


Thank you for your help, I have learned to make those comments and spaces.

Phranque

Thank you for your help also. GoogleBot has made it thru; however, BingBot got whacked on its 157 range, and other Bots haven't been by as yet. Something odd, "to me", about that 157 Bing range; I'll have to make some tests and see if I am missing something there.

Wilderness

There are numerous examples of this going back some years in the SSID forum.

try a search on fake+google


Thank you, I will check it out.

wilderness

1:42 pm on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For a couple of years I used mod_setenvif, and all went well, then a server upgrade from 1.3 to 2.? Apache came along and mod_setenvif no longer worked.


If you're using a shared host on which SetEnvIf does not function?

Locate a NEW host immediately.

I've never had a host in which that module did not function.

Ken_S

2:07 pm on Oct 9, 2012 (gmt 0)

10+ Year Member



If you're using a shared host on which SetEnvIf does not function?

Locate a NEW host immediately.


Wilderness, that has been weighing heavy on my mind of late - there have been several server changes made in the past 3 or so months, believe I will test and find out if SetEnvIf is available & working again.


Htaccess Efficiency

To date I have the following 3 modules in my htaccess file ...

<IfModule mod_deflate.c>

<IfModule mod_headers.c>

<IfModule mod_rewrite.c>

The above is the order in which they are at present, would there be any positive gain in rearranging the order?

I do realize that you folks are busy. I have a very small site and at times it just about swamps me, so I really do appreciate you taking the time to give me a pointer or two. I do enjoy digging, but at times I need help in pointing to the right place to dig.

Ken

wilderness

2:21 pm on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ken,
One or more of the others will be along later to provide an explanation of NOT using module containers in htaccess, especially on shared hosting.

You either have the functioning module (made available by your host) or you don't.
In most instances, there is NOT any need to use the containers in a shared host htaccess.

Don

lucy24

8:05 pm on Oct 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<IfModule mod_deflate.c>

Anything in this form is boilerplate, "one size fits all". It's for pre-installed htaccess files that have to work anywhere, at any time, even if the site owner never looks at them. But the moment you do start digging into your own htaccess, the first thing you do is dump the envelopes. As wilderness says, either you've got the mod or you don't.

In the case of mod_rewrite: The very idea of a site that didn't have-- or didn't allow access to-- mod_rewrite is enough to make your blood run cold.
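
To make the difference concrete, here is a sketch of the same ruleset with and without the envelope. The rule itself is a made-up placeholder:

```apache
# Boilerplate form: if mod_rewrite is missing, the block is silently skipped
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule ^ - [F]
</IfModule>

# Envelope removed: identical behaviour when the module is present
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule ^ - [F]
```

With the envelope gone, a missing module shows up at once as a 500 error instead of as blocking rules that quietly do nothing.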

at one point I was able to, in part, use mod_security, but that was nixed along with mod_setenvif

Don't know about 1.3 but apparently mod_security is currently a third-party mod, so they may not have bothered to upgrade it :( I know my host has it, because once in a blue moon it slaps a 500-class error on some especially atrocious request. I've never dared try to mess with it myself.

It's possible that mod_setenvif is executing at a slightly different point than before, so you have to fiddle with your rules. This is easy to test even if your host is being coy about it. For example:

Redirect foobar.html /widget.html
RewriteRule foobar widget.html [R]
SetEnvIf Request_URI foobar keep_out

If the door gets slammed in your face (assuming, ahem, a "Deny from env=keep_out" line) you know that setenvif executed before either of the other two got a chance to change your request. If you land on the 404 page, your address bar will tell you which redirect got to you first. Modules execute in a fixed order, no matter how you group the rules. The only thing you can control is ordering within each mod. So it's important to keep each package together.

Ken_S

1:02 am on Oct 10, 2012 (gmt 0)

10+ Year Member



Hi Lucy

Ran a test today and it showed that SetEnvIf & Mod Security2 are enabled on my Apache 2.? server. I tried SetEnvIf and it worked like it did before all the upgrading took place; however, when I tested Mod Security2 I got the 500 - checked the log file and it said it wasn't allowed. At one point last year it was said that my host was having problems with letting folks use Mod Security - so apparently after the new servers were in place it was decided not to let it be used as it had been before. I am getting used to Mod Rewrite now and do enjoy it, even though at times I "cross the I and dot the T", which makes me scratch my head.
* Also, I believe I have my original dilemma with the Bots getting 403's figured out - in trying to streamline my htaccess I had added a Bot to my user agent that didn't belong there.
So I believe I can quit my head scratchin on today's trials.
I would like to thank everyone again for their help today..
Ken

lucy24

7:55 am on Oct 10, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops, er, what I meant to say was

Redirect /foobar.html /widget.html
RewriteRule foobar something-other-than-widget.html [R]
SetEnvIf Request_URI foobar keep_out


I looked up the config file for MAMP. I haven't laid a finger on it, so it's pretty generic. (It's even got the IfModule envelopes, which seems slightly ridiculous for a config file that's packaged with a specific distribution.) Picking out only the ones that ::cough-cough:: I've kinda vaguely ever heard of:

LoadModule authn_file_module modules/mod_authn_file.so
<etc, etc, 87 more variations on this theme>
LoadModule authz_host_module modules/mod_authz_host.so
<snip>
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule dir_module modules/mod_dir.so
LoadModule speling_module modules/mod_speling.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so

... and then php at the end. They execute in reverse order, so in this version, mod_rewrite happens before mod_alias --which is potentially calamitous, and is why g1 is always warning people not to use both-- which happens before mod_dir (directory slash and index.html) which comes before mod_setenvif et cetera, until you get to the whole slew of auth-whatsit mods at the very top. The ones that I always think of as "core" although technically it's just another mod.

Ken_S

7:39 pm on Oct 11, 2012 (gmt 0)

10+ Year Member



Wilderness

One or more of the others will be along later to provide an explanation of NOT using module containers in htaccess, especially on shared hosting.


After some late-nite candle burning I believe I see a little light on the logic of not using those containers - I have made necessary adjustments in my htaccess & all is well.
Thanks,
Ken