
Apache Web Server Forum

    
RewriteRules and .htaccess
Beginner advice needed
mdsww




msg:4353047
 12:24 pm on Aug 18, 2011 (gmt 0)

I am new to .htaccess and the RewriteEngine. Here is the .htaccess file I have put together:

RewriteEngine On

# RewriteBase /

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www.)?mydomain.com/.*$ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]

RewriteCond %{HTTP_HOST} !^www.mydomain.com$
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.mydomain.com/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^test/(.*)$ test/rewrite.php?url=$1 [L]
RewriteRule ^(.*)$ rewrite.php?url=$1 [L]


From my understanding, a RewriteCond tests a condition, and a rule is executed when that condition is met.

What I don't understand is the flow or order in which the rules occur.

1) Specifically, I am trying to understand whether the canonical RewriteRule is performed and then a new request is made to the server with the "www" prefix added
- or -
whether the canonical RewriteRule is performed and the following rules are also applied in the same process?

2) In the final condition, there are 2 rules which I would like to perform:
a) If the address is http://www.mydomain.com/test/myfile.php, I would like to redirect to the path test/rewrite.php.
b) Otherwise, all other requests that do not have the test/ prefix should be redirected to the root file rewrite.php.
What is happening is that ALL requests are being redirected to the root file rewrite.php.

Please help me understand how the RewriteEngine can achieve what I am trying to do.

Thanks for any assistance.

 

lucy24




msg:4353165
 4:52 pm on Aug 18, 2011 (gmt 0)

Later on you will get a detailed explanation from g1.

Short version: Every time mod_rewrite encounters an [L], it stops what it's doing. Other than that-- barring icky complicated messes like Chain or Skip-- it works through .htaccess from top to bottom, hitting each rule that applies to it. The [F] is a super-L, meaning in effect "drop dead".

Incidentally, it is safer to say

RewriteCond %{HTTP_REFERER} !^-?$

because most places express blanks as a single - (I have seen a genuine blank referer "" very rarely, and a blank user-agent "" never).

"In the final condition, there are 2 rules which I would like to perform:"

Uh-oh, potentially fatal error. Rules come before Conditions in processing, although they are physically printed after them. When mod_rewrite meets a rule, it then-and-only-then looks at any preceding conditions: that is, the conditions that come after the previous rule but before this one. Yes, that means you are continuously moving 2 steps forward and one step back. Each new Rule requires its own set of Conditions.
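To make the "rule first, then its own conditions" order concrete, here is a toy Python model of one top-to-bottom pass. It is a sketch, not mod_rewrite itself: conditions are plain Python predicates standing in for RewriteCond tests, the set of existing files is hypothetical, Python's re stands in for Apache's regex engine, and only the [L] flag is modelled.

```python
import re
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    conds: List[Callable[[str], bool]]  # the RewriteConds written directly above this rule
    pattern: str                        # the RewriteRule pattern
    target: str                         # substitution, with $1-style backreferences
    last: bool                          # True when the rule carries [L]


def one_pass(rules: List[Rule], path: str) -> str:
    """One top-to-bottom pass; returns the (possibly rewritten) URL."""
    query = ""
    for rule in rules:
        m = re.search(rule.pattern, path)
        if m is None:
            continue  # pattern failed: this rule's conditions are never even looked at
        if not all(cond(path) for cond in rule.conds):
            continue  # pattern matched, but the conditions attached to THIS rule failed
        template = re.sub(r"\$(\d)", r"\\g<\1>", rule.target)  # turn $1 into \g<1>
        path, _, query = m.expand(template).partition("?")
        if rule.last:
            break  # [L] ends the pass, but only because this rule actually fired
    return f"{path}?{query}" if query else path


# Hypothetical directory contents: only index.php exists.
EXISTING = {"index.php"}


def missing(p: str) -> bool:
    """Stands in for both '!-f' and '!-d'."""
    return p not in EXISTING


# The rewrite block from the first post: conditions, then TWO rules.
rules = [
    Rule([missing], r"^test/(.*)$", "test/rewrite.php?url=$1", True),
    Rule([], r"^(.*)$", "rewrite.php?url=$1", True),
]

print(one_pass(rules, "test/file.php"))  # test/rewrite.php?url=file.php
print(one_pass(rules, "index.php"))      # rewrite.php?url=index.php
```

The second call shows the trap: the conditions belong only to the test/ rule, so the unconditioned catch-all rule rewrites even a file that exists.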

g1smd




msg:4353220
 7:42 pm on Aug 18, 2011 (gmt 0)

Redirects cause the browser to make a new request and mod_rewrite processing starts again from the top.

Your non-www to www redirect must be listed AFTER the index redirect otherwise non-www index requests will cause an unwanted two-step redirection chain.

You should escape the periods in the !^www.example.com$ pattern, and also stop this rule from causing an infinite redirect loop for HTTP/1.0 requests that send no Host header. Use: !^(www\.example\.com)?$.
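A quick way to see both problems is to run the two HTTP_HOST patterns through Python's re module (an assumption: it behaves like Apache's PCRE for patterns this simple). The helper needs_redirect is hypothetical; it mirrors the negated RewriteCond, so it returns True when the redirect would fire.

```python
import re

LOOSE = r"^www.example.com$"        # unescaped dots, no allowance for a blank host
STRICT = r"^(www\.example\.com)?$"  # escaped dots, blank host treated as acceptable


def needs_redirect(pattern: str, host: str) -> bool:
    """Mirror 'RewriteCond %{HTTP_HOST} !pattern': fire when the pattern does NOT match."""
    return re.search(pattern, host) is None


for host in ["www.example.com", "example.com", "", "wwwXexampleYcom"]:
    print(repr(host), needs_redirect(LOOSE, host), needs_redirect(STRICT, host))
# 'www.example.com' False False
# 'example.com' True True
# '' True False
# 'wwwXexampleYcom' False True
```

With the loose pattern, a request with a blank Host is redirected every time it comes back (the loop), while a bogus host like wwwXexampleYcom slips through because each unescaped "." matches any character.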

You should add a plain text comment before each block of code explaining what it does. Add a blank line after each RewriteRule.


Use example.com in the forum to stop URL auto-linking.


"Every time mod_rewrite encounters an [L], it stops what it's doing."
It only stops if all the patterns for this rule (the RewriteRule pattern and its RewriteConds) actually matched.
lucy24




msg:4353233
 8:06 pm on Aug 18, 2011 (gmt 0)

Yes, OK, I'll add it to the boilerplate :P

"If it encounters an [L] at the end of a rule that it has actually executed".

mdsww




msg:4353302
 11:46 pm on Aug 18, 2011 (gmt 0)

Thank you for your explanations on this so far. I can understand that the [L] will result in the current rule being applied as the LAST.

I also now understand that some requests may cause an unwanted multi-step redirection.

Based on the suggestions you have provided here is an updated version of my .htaccess file:

RewriteEngine On

# RewriteBase /

##### Protect images from hotlinking
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.com/.*$ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]

##### Resolve index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

##### Resolve canonical domains
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301]

##### Rewrite function
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^test/(.*)$ test/rewrite.php?url=$1 [L]
RewriteRule ^(.*)$ rewrite.php?url=$1 [L]


I am still unclear why the "##### Rewrite function" block is applying the 2nd rule, even though the 1st rule should be matched.

For example, if I type http://www.example.com/test/file.php into a browser, the request is mapped to the rewrite.php?url=$1 file. From my understanding, I would have expected the 1st rewrite rule to rewrite it to test/rewrite.php?url=$1 instead. Are there 2 rewrites going on here?
g1smd




msg:4353310
 12:17 am on Aug 19, 2011 (gmt 0)

In the hotlinking rule, the .*$ part of the pattern is redundant and can simply be omitted. You should also escape all literal periods in that pattern (2 to do).
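Here is the hotlinking block with those two fixes applied, expressed as a Python predicate so the patterns can be tried against sample referers. It is a sketch assuming Python's re matches like Apache's PCRE here; is_forbidden and the sample referers are hypothetical. The rule's own pattern is checked first, then its two conditions.

```python
import re

BLANK = r"^-?$"                              # RewriteCond %{HTTP_REFERER} !^-?$
OWN_SITE = r"^http://(www\.)?example\.com/"  # periods escaped, trailing .*$ dropped
IMAGE = r"\.(gif|jpe?g|png)$"                # RewriteRule pattern


def is_forbidden(referer: str, path: str) -> bool:
    """True when the [F] rule would fire for this referer/path pair."""
    if re.search(IMAGE, path) is None:
        return False  # rule pattern failed, so its conditions never matter
    if re.search(BLANK, referer):
        return False  # blank or "-" referer: allowed through
    return re.search(OWN_SITE, referer, re.IGNORECASE) is None  # [NC]


print(is_forbidden("http://other.example.net/page.html", "img/logo.png"))  # True
print(is_forbidden("http://www.example.com/page.html", "img/logo.png"))    # False
print(is_forbidden("http://wwwxexample.com/", "img/logo.png"))             # True
```

The last case is why the periods matter: with the unescaped (www.)?example.com/ pattern, "wwwx" would satisfy "www." and that referer would be treated as your own site.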

In the index rule, you might consider changing index\.html to index\.html? (2 to do) so that requests for .htm as well as for .html are both redirected.

In the non-www/www rule you also need the [L] flag, not just the R=301 flag. Additionally, you have missed the ( )? construct I mentioned above. This means that HTTP/1.0 requests, which do not send a HOST header, will be stuck in an infinite redirect loop.

RewriteConds apply only to the very next RewriteRule that follows. Your first rewrite has two preceding conditions. The second rewrite has NO preceding conditions. You MUST duplicate those conditions onto the second rewrite so that the same two conditions appear before each rule.

With no conditions before the second rewrite, requests that have previously been internally rewritten are rewritten again.

It's also very easy to end up with an infinite rewrite loop for these types of request. Add those conditions to the second rewrite.
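The "rewritten again" effect answers the test/file.php question. Below is a toy Python model, assuming (hypothetically) that rewrite.php and test/rewrite.php exist and test/file.php does not: after an internal rewrite the new path goes through the whole rule list again, until a pass leaves it unchanged. Query strings are dropped, and max_passes is only a crude stand-in for Apache's own recursion limit (LimitInternalRecursion).

```python
import re
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    conds: List[Callable[[str], bool]]  # conditions written directly above the rule
    pattern: str
    target: str                         # every rule here carries [L]


def rewrite(rules: List[Rule], path: str, max_passes: int = 10) -> str:
    """Return the final URL-path after repeated passes (query strings ignored)."""
    for _ in range(max_passes):
        new_path = path
        for rule in rules:
            m = re.search(rule.pattern, path)
            if m and all(cond(path) for cond in rule.conds):
                template = re.sub(r"\$(\d)", r"\\g<\1>", rule.target)
                new_path = m.expand(template).partition("?")[0]
                break  # [L]: this pass is over
        if new_path == path:
            return path  # nothing changed, or the path rewrote to itself: stop
        path = new_path  # internally rewritten: run the whole file again
    return path


EXISTING = {"rewrite.php", "test/rewrite.php"}  # hypothetical files on disk


def missing(p: str) -> bool:
    """Stands in for '!-f' plus '!-d'."""
    return p not in EXISTING


as_posted = [
    Rule([missing], r"^test/(.*)$", "test/rewrite.php?url=$1"),
    Rule([], r"^(.*)$", "rewrite.php?url=$1"),
]
with_conditions = [
    Rule([missing], r"^test/(.*)$", "test/rewrite.php?url=$1"),
    Rule([missing], r"^(.*)$", "rewrite.php?url=$1"),
]

print(rewrite(as_posted, "test/file.php"))        # rewrite.php
print(rewrite(with_conditions, "test/file.php"))  # test/rewrite.php
```

In the first call, pass 1 produces test/rewrite.php; on pass 2 that file exists, so the test/ rule's conditions fail, but the unconditioned catch-all still fires and the request ends at rewrite.php. With the conditions repeated, pass 2 changes nothing.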

Add a blank line after each RewriteRule.

mdsww




msg:4353320
 12:50 am on Aug 19, 2011 (gmt 0)

Thank you for all of your recommendations and suggestions. I am starting to understand, but will just make sure I am on the right track.

Once again here is the updated .htaccess with all alterations:

RewriteEngine On

# RewriteBase /

##### Protect images from hotlinking
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.com/.*$ [NC]

RewriteRule \.(gif|jpe?g|png)$ - [F]

##### Resolve index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

##### Resolve canonical domains
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

##### Rewrite function 1
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^test/(.*)$ test/rewrite.php?url=$1 [L]

##### Rewrite function 2
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ rewrite.php?url=$1 [L]


I have split the Rewrite function into 2 separate rules, each with its own conditions, modified the HTTP_HOST condition to use the suggested construct, and added the L flag to the canonical rule.

I am aware that I can further optimise each expression, however for the sake of this thread I am more concerned about structure and flow.

Is there anything wrong with the above code snippet now?
Much appreciated.

lucy24




msg:4353332
 2:13 am on Aug 19, 2011 (gmt 0)

Last two rules, as translated into English:

#1: If a request comes in for any nonexistent file anywhere within the /test/ directory, send them to example.com/test/rewrite.php, making "url" + the original request into a new query string (and deleting any previous query).

#2: If a request comes in for any nonexistent file anywhere, send them to example.com/rewrite.php, with the same query-string business.
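The "deleting any previous query" part can be illustrated with a small Python model of the catch-all rule. The function apply_rule and the sample paths are hypothetical. It models only two behaviours: by default a substitution containing "?" replaces the original query string, while the [QSA] flag appends the original query instead.

```python
import re


def apply_rule(path: str, query: str, qsa: bool = False) -> str:
    """Mirror 'RewriteRule ^(.*)$ rewrite.php?url=$1 [L]' (add [QSA] with qsa=True)."""
    m = re.search(r"^(.*)$", path)
    new_path, _, new_query = f"rewrite.php?url={m.group(1)}".partition("?")
    if qsa and query:
        new_query = f"{new_query}&{query}"  # [QSA]: original query appended
    return f"{new_path}?{new_query}"        # default: original query discarded


print(apply_rule("blog/post-1", "page=2"))            # rewrite.php?url=blog/post-1
print(apply_rule("blog/post-1", "page=2", qsa=True))  # rewrite.php?url=blog/post-1&page=2
```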

Whether this works or not will depend entirely on whether the /rewrite.php and /test/rewrite.php files actually exist and whether they do what they are intended to do, presumably ending with a true redirect.

You don't want users to type in any old garbage, or click on a hopelessly mangled link, and end up in the right place. (You may think you do, because it's a human-friendly approach, but go look up the phrase Duplicate Content.) Hence the final redirect.

If both types of queries are being shipped off to a rewrite function accompanied by their original request in the form of a query string, why do there need to be two separate rewrite.php files? Can't the same file deal with both? If you're trying to avoid / in the query string, you need to take a few more steps to make sure the request doesn't include one. It isn't enough to say that your site simply doesn't have deeper nests of directories-- because these rewrites are by definition only applied to files that don't exist.

mdsww




msg:4353335
 2:32 am on Aug 19, 2011 (gmt 0)

"RewriteConds apply only to the very next RewriteRule that follows."


Take the following sample from an old Joomla .htaccess file:

########## Begin - Joomla! core SEF Section
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
#
########## End - Joomla! core SEF Section


I notice there are 2 rules for the conditions. Can you please explain what happens here?

lucy24




msg:4353350
 3:22 am on Aug 19, 2011 (gmt 0)

Ordinarily, the first thing that happens is you get a lecture about the utterly superfluous (.*) since it means "there could be anything or nothing here and it doesn't matter because we're not going to do anything about it".

The second thing that happens is a prolonged shudder as I look more closely. OK, sit tight:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d


These two often come in a package, and mean "If the user has asked for a page or directory that doesn't exist".

RewriteCond %{REQUEST_URI} !^/index.php

AND the user has not asked for example.com/index.php (this kind of exclusion is crucial to avoid an infinite loop if the rule involves redirecting to some specific file that a user might also have requested in the first place)

RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ [NC]

AND the user has asked for something that ends with anything in the pipe-separated list (case-insensitive):

/ (meaning any correctly entered directory)
.php, .html, .htm, .feed, .pdf, .raw (any marginally competent writer would collapse .html and .htm into html?)
/ followed by zero or more non-periods (making the first item redundant, since / alone obviously counts as / followed by zero-or-more of anything)
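Both simplifications can be checked by comparing the original Joomla pattern with a tidied version (dropping the lone "/" and collapsing .html/.htm into html?) on a handful of sample URIs. The SIMPLIFIED pattern and the samples are hypothetical, and Python's re with IGNORECASE stands in for Apache's matching with [NC].

```python
import re

ORIGINAL = r"(/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$"
SIMPLIFIED = r"(\.php|\.html?|\.feed|\.pdf|\.raw|/[^.]*)$"


def matches(pattern: str, uri: str) -> bool:
    return re.search(pattern, uri, re.IGNORECASE) is not None  # [NC]


samples = ["/", "/about/", "/news/item", "/Index.PHP", "/page.htm",
           "/doc.pdf", "/v1.2/notes", "/logo.png", "/style.css"]
for uri in samples:
    print(uri, matches(ORIGINAL, uri), matches(SIMPLIFIED, uri))
```

Every sample gives the same answer under both patterns: "/" on its own is already covered by /[^.]* with zero characters, and only the image and stylesheet samples fail to match.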

RewriteRule (.*) index.php

RULE: If the user has requested anything whatsoever-- or even nothing-- with any of the ordinary extensions, so long as it isn't specifically index.php, then send them to index.php ... except that we began by saying that this rule only applies if the user has asked for a file that doesn't exist, so if index.php does exist, then...

Help! g1!

... then continue, because there was no [L] or anything else after the preceding rule, so it's more of a "stop along the way and do this before you continue"

RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

Make no changes to the request as it currently exists, but set the environment variable HTTP_AUTHORIZATION to the value of %{HTTP:Authorization} (the Authorization request header), and then stop.

Note that this second rule applies whether or not the first rule applied.

mdsww




msg:4353354
 4:07 am on Aug 19, 2011 (gmt 0)

I think you have cleared something up for me Lucy.

Going back to my very first post I pasted the following snippet:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^test/(.*)$ test/rewrite.php?url=$1 [L]
RewriteRule ^(.*)$ rewrite.php?url=$1 [L]


The 2nd rule does not have a condition and therefore will always execute when it is reached in the process. Am I correct in understanding it this way?

The confusing thing here is: if the conditions are met and the path is http://www.example.com/test/myfile.php, why do I observe that the page finally served is the 2nd rule's target (rewrite.php)?

Shouldn't the first rule execute and finish as per the L directive? Or does the 1st execute, followed by the 2nd rule?

Hope this makes sense.

g1smd




msg:4353387
 7:36 am on Aug 19, 2011 (gmt 0)

I really wish you had used a recent Joomla file as an example.

The code you actually used as an example is several years old and has many problems.

Compare it with the file that comes with current versions of Joomla and use the old version as a lesson in how NOT to do .htaccess files.

g1smd




msg:4353388
 7:42 am on Aug 19, 2011 (gmt 0)

Going back to your post 4353320.

In the hotlinking rule, the .*$ part of the pattern is redundant and can simply be omitted. You should also escape all literal periods in that pattern (2 to do).

In the index rule, you might consider changing index\.html to index\.html? (2 to do - you missed one) so that requests for .htm as well as for .html are both redirected.

mdsww




msg:4371759
 4:43 am on Oct 7, 2011 (gmt 0)

I have made the changes and have completed the file. It is working well on the server:


RewriteEngine On

RewriteBase /

##### Protect images from hotlinking
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com\.au/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]

##### Resolve canonical domains
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

##### Resolve index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/
RewriteRule ^index\.html?$ http://www.example.com.au/ [R=301,L]

##### Resolve index.php to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com.au/ [R=301,L]

##### Rewrite function 1
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^blog/(.*)$ blog-article.php?url=$1 [L]

##### Rewrite function 2
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ article.php?url=$1 [L]


1. Can I simplify the index.html and index.php redirect rules into one rule?

Is this correct?


##### Resolve index.php, index.html and index.htm to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php|index\.html?)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com.au/ [R=301,L]


2. I also have specific pages which will need to be 301 redirected to a new page. These were originally static html pages. e.g. contact.html, about.html, faq.html etc.... Should I create a separate rule in my htaccess file just for these unique pages (5 in total)?

Does this look right?


##### Redirect contact.html to contact-us.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /contact\.html?\ HTTP/
RewriteRule ^contact\.html?$ http://www.example.com.au/contact-us.html [R=301,L]

##### Redirect about.html to about-us.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /about\.html?\ HTTP/
RewriteRule ^about\.html?$ http://www.example.com.au/about-us.html [R=301,L]


3. Finally, if my web page name has a space character in it, how would I match this in my expression?

e.g. "our services.html"
lucy24




msg:4371770
 5:45 am on Oct 7, 2011 (gmt 0)

"1. Can I simplify the index.html and index.php redirect rules into one rule?"

Sure. Use pipes.

##### Resolve index.php, index.html and index.htm to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php|index\.html?)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com.au/ [R=301,L]

You can make it even a little more concise:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html?)\ HTTP/
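THE_REQUEST holds the request line exactly as the client sent it, so an internal rewrite to index.php never changes it; that is why this condition only catches direct requests. A Python check of the condition on some sample request lines (a sketch; is_direct_index_request is hypothetical and Python's re stands in for PCRE). Note that the RewriteRule pattern is tested before the condition, so in the combined rule the rule's own pattern (^index\.php$ above) would also need to accept .html and .htm, for example ^index\.(php|html?)$.

```python
import re

INDEX_REQUEST = r"^[A-Z]{3,9}\ /index\.(php|html?)\ HTTP/"


def is_direct_index_request(request_line: str) -> bool:
    return re.search(INDEX_REQUEST, request_line) is not None


for line in ["GET /index.php HTTP/1.1",
             "HEAD /index.htm HTTP/1.0",
             "GET /index.html HTTP/1.1",
             "GET /index.php?page=2 HTTP/1.1",
             "GET /blog/index.php HTTP/1.1"]:
    print(line, is_direct_index_request(line))
```

The first three match; the request with a query string and the one for a subdirectory index do not, because the pattern requires a space right after the extension and the file to sit at the root.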

"2. I also have specific pages which will need to be 301 redirected to a new page. These were originally static html pages. e.g. contact.html, about.html, faq.html etc.... Should I create a separate rule in my htaccess file just for these unique pages (5 in total)?"

Just five? Sure, make separate rules. Not worth the trouble of routing to a php fix-up file. But note that your two examples can be merged. And the Condition doesn't seem to be necessary, unless there's something I missed in the earlier posts. Are there circumstances where files named contact.html etc. would not be redirected?

RewriteRule ^(contact|about)\.html?$ http://www.example.com.au/$1-us.html [R=301,L]
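The merged rule can be tried out in Python. This sketch assumes the .htaccess sits in the document root, so the rule pattern sees "contact.html" with no leading slash; redirect_target is a hypothetical helper that returns the redirect URL or None.

```python
import re

PATTERN = r"^(contact|about)\.html?$"
TARGET = "http://www.example.com.au/$1-us.html"


def redirect_target(path: str):
    """Return the 301 target for this path, or None if the rule does not fire."""
    m = re.search(PATTERN, path)
    if m is None:
        return None
    return TARGET.replace("$1", m.group(1))


print(redirect_target("contact.html"))     # http://www.example.com.au/contact-us.html
print(redirect_target("about.htm"))        # http://www.example.com.au/about-us.html
print(redirect_target("contact-us.html"))  # None
```

The last line is why no extra condition is needed against loops: the new page name cannot match the pattern again.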

"3. Finally, if my web page name has a space character in it, how would I match this in my expression?

e.g. "our services.html""

The situation will not arise, because your web page will not have space characters in it. Unless, ahem, you are talking about pre-existing pages that you are now getting rid of forever and ever.

In htaccess-- unlike generic RegEx-- literal spaces have to be escaped, because the space itself has semantic meaning. Just like you did in your %{THE_REQUEST} condition.

So it would be

our\ services\.html

but only if you swear in blood to get rid of any and all inherited pages whose names were formerly in this form. There have been several recent heated discussions about optimal naming; the preference seems to be a hyphen, as in "our-services.html". Me, I'd just use "services" alone, bypassing the whole issue. Unless you've got another page talking about someone else's services ;)
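Browsers send that space as %20, but the mod_rewrite documentation describes the RewriteRule pattern as being matched against the %-decoded path, which is why a literal (escaped) space works. A Python sketch of that, assuming urllib's unquote is a fair stand-in for Apache's decoding; matches_old_page is hypothetical.

```python
import re
from urllib.parse import unquote

OLD_PAGE = r"^our\ services\.html$"  # the pattern suggested above


def matches_old_page(raw_path: str) -> bool:
    """raw_path is the path as sent on the wire, e.g. with %20 for the space."""
    return re.search(OLD_PAGE, unquote(raw_path)) is not None


print(matches_old_page("our%20services.html"))  # True
print(matches_old_page("our-services.html"))    # False
```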

mdsww




msg:4371775
 6:25 am on Oct 7, 2011 (gmt 0)

Hi Lucy,

I am upgrading an old website, hence the space character appearing in the page name. I can assure you I would not do this myself. ;)

Here is my final htaccess file with all of the suggestions and help along the way:


RewriteEngine On

RewriteBase /

##### Protect images from hotlinking
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]

##### Resolve canonical domains
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

##### Resolve index html/php page to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html?)\ HTTP/

##### Resolve static html pages to static version
RewriteRule ^(about|company|services|support|contact)\.html?$ http://www.example.com/$1-static.html [R=301,L]

##### Rewrite function 1
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^blog/(.*)$ blog-article.php?url=$1 [L]

##### Rewrite function 2
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ article.php?url=$1 [L]


Can anyone see an issue with this file in production?

g1smd




msg:4371776
 7:03 am on Oct 7, 2011 (gmt 0)

You must list the index redirect before the www/non-www redirect otherwise there will be an unwanted multiple step redirection chain for a non-www index request.

I pointed this out in post #4353220 but it was overlooked.
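The ordering point can be shown with a toy Python model of the two redirect rules. Each function stands in for one [R=301,L] rule (only its effect is modelled, not its regex, and the blank-Host case is ignored), and every redirect is treated as a fresh request that starts again at the first rule. All names are hypothetical.

```python
from typing import Callable, List, Optional
from urllib.parse import urlsplit

Redirect = Callable[[str], Optional[str]]


def www_rule(url: str) -> Optional[str]:
    """Stand-in for the non-www to www redirect."""
    parts = urlsplit(url)
    if parts.netloc != "www.example.com":
        return f"http://www.example.com{parts.path}"
    return None


def index_rule(url: str) -> Optional[str]:
    """Stand-in for the index.php/index.html/index.htm to / redirect."""
    if urlsplit(url).path in ("/index.php", "/index.html", "/index.htm"):
        return "http://www.example.com/"
    return None


def redirect_chain(rules: List[Redirect], url: str, limit: int = 10) -> List[str]:
    """Follow redirects like a browser; every hop restarts at the first rule."""
    hops: List[str] = []
    for _ in range(limit):
        for rule in rules:  # each rule carries [R=301,L]: first one that fires wins
            target = rule(url)
            if target is not None:
                hops.append(target)
                url = target
                break
        else:
            return hops  # no rule fired: the page is served
    return hops


print(redirect_chain([www_rule, index_rule], "http://example.com/index.html"))
# ['http://www.example.com/index.html', 'http://www.example.com/']
print(redirect_chain([index_rule, www_rule], "http://example.com/index.html"))
# ['http://www.example.com/']
```

Because the index rule already redirects to the full www URL, listing it first resolves a non-www index request in a single hop.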

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved