Apache Web Server Forum

    
What you don't know you need to know...
So you can do what you need to do...
jd01




msg:1495758
 11:27 pm on Apr 29, 2005 (gmt 0)

1. What mod_rewrite does not do:
A. Create anything.
B. Write a 'fake' URL in a browser.
C. Change anything, except the location the request is delivered from, or the location of the information delivered to the page requested.

2. What mod_rewrite does:
A. Delivers a different result when a specific request is made.

EG Someone requests the page yoursite.com/stuff.html

What You Can Do:
You can use mod_rewrite to redirect the browser to any page of your choosing, say: yoursite.com/stuff.php [OR]

You can serve the information from any page of your choosing including: yoursite.com/stuff.php without changing the location in the browser.

What You Cannot Do:
You cannot use mod_rewrite by itself to change the address displayed in the browser, including to a more 'friendly' URL. The location displayed must either be requested or redirected to. That location does not need to contain any information (no file has to exist there), but it must exist in a request.
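
A minimal sketch of the two options, using the stuff.html and stuff.php names from above. It assumes an .htaccess file in the site root and is only an illustration, not a drop-in configuration. Use one rule or the other, not both (the second is commented out here).

```apache
RewriteEngine on

# Option 1: external redirect. The browser is told to request
# /stuff.php, so the address bar changes to /stuff.php.
RewriteRule ^stuff\.html$ /stuff.php [R=302,L]

# Option 2: internal rewrite. The server quietly serves stuff.php,
# and the address bar keeps showing /stuff.html.
# RewriteRule ^stuff\.html$ /stuff.php [L]
```

In both cases the address shown in the browser is one that was either requested (/stuff.html) or redirected to (/stuff.php).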

3. Most common mistake:
People try to go backward. EG They try to put a link to the page where the information is, then try to use mod_rewrite to change the location that appears in the browser.

4. What has to be, before you can use mod_rewrite:
A. The mod_rewrite module has to be installed and enabled on the server.
B. To use it from .htaccess, you must have AllowOverride set to FileInfo, or higher (All, etc.).
C. You must be able to follow sym links, usually Options +FollowSymLinks either in the httpd.conf (server configuration) or in the .htaccess file itself.
D. You must precede your rules/conditions with RewriteEngine on.
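
As a rough sketch, the four requirements might look like this on an Apache 2.x server. The module path and the /var/www/html directory are assumptions for illustration; your server layout will probably differ.

```apache
# --- httpd.conf (server configuration) ---
# A. Load the module (path is an assumption; check your installation)
LoadModule rewrite_module modules/mod_rewrite.so

<Directory "/var/www/html">
    # B. Allow .htaccess files to use mod_rewrite directives
    AllowOverride FileInfo
    # C. Allow symbolic links to be followed in this directory
    Options +FollowSymLinks
</Directory>

# --- .htaccess (inside /var/www/html) ---
# D. Switch the engine on before any rules or conditions
RewriteEngine on
```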

5. When using conditions:
A. Know that a condition (or a block of consecutive conditions) only affects the single rule that immediately follows it.
B. Know that conditions are only read after a rule matches the pattern of a request. If the rule does not match, no condition will ever be checked.
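
A small sketch of point A. The host name and file names are made up for the example.

```apache
RewriteEngine on

# This condition guards only the rule directly below it
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^old\.html$ /new.html [L]

# This rule has no condition of its own, so it applies to every host
RewriteRule ^other\.html$ /elsewhere.html [L]
```

If the second rule should also be limited to example.com, the condition has to be repeated in front of it.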

6. When using variables in rules, they are always designated by ().
A. Variables in a rule can be used either in the right side of the rule (new location), or in any conditions.
B. They are gathered in the order they appear, and are retrieved by preceding the number of the variable with a $.
EG RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-$2

Result: to-use-variables-type-var1-and-var2
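
The same idea with a more realistic pattern, as a sketch only. The products path and the product.php script with its cat and id parameters are hypothetical names.

```apache
RewriteEngine on

# $1 is the first ( ) group, $2 is the second
# A request for /products/shoes/42 is served by /product.php?cat=shoes&id=42
RewriteRule ^products/([a-z]+)/([0-9]+)$ /product.php?cat=$1&id=$2 [L]
```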

7. When using variables from conditions, they are always designated by ().
A. Variables from a condition can be used on the right side of the rewrite rule (and in the test string of a later condition belonging to the same rule), but not in the rule's pattern.
B. They are gathered in the order they appear, and are retrieved by preceding the number of the variable with a %.

EG RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^no-var/no-var/no-var$ /to-use-variables-type-%1-and-%2

Result: to-use-variables-type-var1-and-var2

* The only exception to this is you can also use the %{CONDITION_STUFF} in the rule, but it must appear exactly as it does in the condition:

EG RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^no-var/no-var/no-var$ /%{CONDITION_STUFF}-to-use-variables-type-%1-and-%2
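
A sketch with a real server variable in place of the CONDITION_STUFF placeholder. The example.com domain and the /sites/ directory layout are assumptions for illustration.

```apache
RewriteEngine on

# %1 holds the part of the host name captured by the condition
# A request for blog.example.com/page.html is served by /sites/blog/page.html
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]
RewriteRule ^page\.html$ /sites/%1/page.html [L]
```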

8. When using regular expressions and conditions, it is very easy to create an infinite loop... be careful and test all rules in a test directory before installing them in the main directory.
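
One common way a loop happens in .htaccess, sketched with a hypothetical /new/ directory: the output of the rule matches the rule's own pattern again.

```apache
RewriteEngine on

# Looping version (do not use): /page.html becomes /new/page.html,
# which matches ^(.*)$ again and becomes /new/new/page.html, and so on.
# RewriteRule ^(.*)$ /new/$1 [L]

# Guarded version: skip requests that already point into /new/
RewriteCond %{REQUEST_URI} !^/new/
RewriteRule ^(.*)$ /new/$1 [L]
```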

9. If you are using .htaccess, the file must be in the directory where the requested file you want to rewrite is located, or in a directory 'above' it.

From the root yoursite.com, you can write rules for any location in your site.

From the directory yoursite.com/stuff/, you can only affect files and pages that are farther 'in' '/stuff/' or 'below' the directory '/stuff/'.

So if your .htaccess file is in the location yoursite.com/stuff/, and you also have the location yoursite.com/more-stuff/, your rules will only affect the directory '/stuff/', and '/more-stuff/' will function as usual.

EG
yoursite.com/stuff/anything-at-all can be affected,
yoursite.com/more-stuff/anything-at-all will never be affected.

10. The location of your .htaccess file will affect the left side of the rules involved.

In the home or root directory, you would need the full path (from where you are) to rewrite the page /stuff/yourpage.html. The rule may look like this:

RewriteRule ^stuff/yourpage\.html$ /more-stuff/yourotherpage.html

In the folder or directory 'stuff' you would need the path to the folder or page from where you are, so to rewrite the same location as before from this location your rule might look like this:

RewriteRule ^yourpage\.html$ /more-stuff/yourotherpage.html

* Please notice the right side of the rule, or new location, is the same in both cases, because any rewrite starts over from the beginning as if it were a new request, so you must have the full path to the 'new' file in the 'new' location.

** Please also notice the left side of the rule does not start with /, while the right side does.

Hopefully this will add some basic clarity to the use of mod_rewrite for those who are new to the subject. I will add more as time allows.

Justin

 

skymanhonor




msg:1495759
 2:40 pm on Apr 30, 2005 (gmt 0)

Thanks! I had some confusion around that (although I did not know it until just reading your post).

Imaster




msg:1495760
 1:29 pm on May 1, 2005 (gmt 0)

Great post, cleared up some confusion that I had regarding this issue!

henry0




msg:1495761
 7:39 pm on May 1, 2005 (gmt 0)

Justin, I second IMaster
very good and concise summary

Henry

Do you have a regex in the making :)

vincevincevince




msg:1495762
 8:16 pm on May 1, 2005 (gmt 0)

Great post - and it covers all the main points of a confusing module.

However - what would be great to clear up is the handling of headers. Just when the extra headers are output, and which ones you should put back in.

Having used mod_rewrite quite a bit, I often find myself using 404 handling functions because I cannot be certain that the mod_rewrite will be totally undetected by search engines, etc.

jd01




msg:1495763
 8:44 pm on May 1, 2005 (gmt 0)

Just wanted to say thanks for all the positive feedback... I will work on answering some of the requests in another post later this week.

Justin

zigx




msg:1495764
 11:05 pm on May 2, 2005 (gmt 0)

hey.. i dont post much, but this is a really great starter for people.

Thanks for your time!

jdMorgan




msg:1495765
 3:22 am on May 3, 2005 (gmt 0)

vincevincevince,

> Having used mod_rewrite quite a bit, I often find myself using 404 handling functions because I cannot be certain that the mod_rewrite will be totally undetected by search engines, etc.

I don't want to take this thread off-topic, but it is far more dangerous to your search engine rankings to use a 404 handler than it is to use internal rewrites. Should you wish to discuss this topic, please feel welcome to start another thread.

Jim

EBear




msg:1495766
 10:32 am on May 3, 2005 (gmt 0)

Great post, JD. Immediately flagged. Thanks.

elguiri




msg:1495767
 3:23 pm on May 3, 2005 (gmt 0)

Nice one Justin. Very useful.

Just as a cross-reference, I started a related thread a couple of weeks ago on ISAPI rewrite for the IIS world at this address:

[webmasterworld.com...]

The motivation was different - I was needing help rather than offering it - but it'll help for someone needing a fast start.

Oh, yes, and regex help always welcome!

bostongio




msg:1495768
 8:14 pm on May 3, 2005 (gmt 0)

Actually, of course mod_rewrite can change what is shown in the browser URL line. I use mod_rewrite to do just that all the time. Putting [R] at the end of a rewriterule line will redirect folks to the "new" URL.

And, no, the actual file doesn't have to exist in order to write a mod_rewrite rule to direct someone to it. A perfect example of this is Wordpress, which uses mod_rewrite to make it seem like every blog page is an actual page (like /wordpress/archives/04/05/05/) when it is, in reality, just a database query (like index.php?query=this&date=040505 ).

Putting [QSA] at the end of rewriterule line will engage this behavior and allows you to completely "hide" the fact that your site is a database-driven site.

jd01




msg:1495769
 9:29 pm on May 3, 2005 (gmt 0)

bostongio:

What You Can Do:
You can use mod_rewrite to redirect the browser to any page of your choosing, say: yoursite.com/stuff.php [OR]

>> Actually, of course mod_rewrite can change what is shown in the browser URL line. I use mod_rewrite to do just that all the time. Putting [R] at the end of a rewriterule line will redirect folks to the "new" URL.

You can serve the information from any page of your choosing including: yoursite.com/stuff.php without changing the location in the browser.

>> And, no, the actual file doesn't have to exist in order to write a mod_rewrite rule to direct someone to it.

The location displayed (added: in the browser) must either be requested or redirected to.

Please read and understand what is being said before you confuse people.

Somehow I'm sure Jim would have corrected any errors I made in my post as it is designed to be basic, and leading people in the wrong direction from the start would not be something he would allow to happen.

Justin

jd01




msg:1495770
 11:52 pm on May 3, 2005 (gmt 0)

Well, I needed to free my brain for a while today, so here are the basics of regular expressions.

I guess I should start by describing a regular expression. (They aren't too scary once you get to know them.) A regular expression is basically a small piece of code that checks for patterns. The pattern can range from a single character that matches to absolutely everything.

Regular Expression Pre-qualifier... these definitions are how regular expressions are generally used in .htaccess files and though most definitions will be applicable globally, there are some that may not.

There are some predefined 'terms' in regular expressions to make your life easier. (At least, they are supposed to make your life easier.) Here is a short list, with what each does in the mod_rewrite setting.

[ ] enclose a set of characters, or a range of characters, to be matched. The brackets as a whole match exactly one character from that set.

letter-letter (EG [a-z] matches any single lowercase alphabetical character in the range of a to z), so [c-e] will match any single character that is the lowercase letter c, d, or e.

LETTER-LETTER (EG [A-Z] matches any single capital alphabetical character in the range of A to Z), so [C-E] will match any single character that is the capital letter C, D, or E.

number-number (EG [0-9] matches any single number in the range of 0 to 9), so [4-6] would match any single number 4, 5, or 6.

^ has two purposes. When used inside of [ ] it designates 'not'. (EG [^0-9] would match any character that is not 0 to 9 and [^abc] would match any character that is not a lowercase a, b, or c.) When used outside of [ ] in mod_rewrite it designates the beginning of a 'line'.

It is very important to understand and remember [dog] does not match the word 'dog', it matches any individual lowercase letter d, o, or g anywhere in the comparison. In the same way, [^dog] does not exclude the word 'dog' from matching, it excludes the lowercase letters d, o, or g from matching individually.

To match a 'word' or a group of characters in order, you need to use () so (dog) would match the word dog, and not d, o, or g as a single character.

.(dot) matches any single character, except the ending of a line.

+ matches 1 or more of the characters or set of characters immediately before it. (EG a+ would match the lowercase letter 'a' 1 or more times, while [a-z]+ would match 1 or more lowercase letters from 'a to z'.)

? matches 0 or 1 of the characters or set of characters immediately before it. (EG a? would match the lowercase letter 'a' 0 or 1 time, while [a-z]? would match any lowercase letter from 'a to z' 0 or 1 time.)

* matches 0 or more of the characters or set of characters immediately before it. (EG a* would match the lowercase letter 'a' 0 or more times, while [a-z]* would match 0 or more lowercase letters from 'a to z'.) Because it also matches nothing at all, it is easy to match more than you intended, so use + whenever at least one character is required.

These are the basic building blocks of regular expressions as used in .htaccess and associated with mod_rewrite. By themselves, they do little, but when you put them together, they become very powerful.
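
To see a few of these building blocks in place, here is a sketch of how they might appear in rules. All script names (page.php, list.php, profile.php, photos.php) and their parameters are hypothetical.

```apache
RewriteEngine on

# [a-z]+ : one or more lowercase letters, e.g. /page/about
RewriteRule ^page/([a-z]+)$ /page.php?name=$1 [L]

# [0-9]? : an optional single digit, e.g. /list or /list2
RewriteRule ^list([0-9]?)$ /list.php?page=$1 [L]

# [^/]+ : one or more characters that are not a slash, e.g. /user/jd01/profile
RewriteRule ^user/([^/]+)/profile$ /profile.php?user=$1 [L]

# (dog) : the literal word dog, unlike [dog], which matches one of d, o, or g
RewriteRule ^(dog)/photos$ /photos.php?animal=$1 [L]
```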

Along with regular expressions, mod_rewrite allows for the use of special characters. It's a good thing to understand what these are before you begin writing rules. (Mainly because you need one or more of them in almost every rule.)

RewriteRule tells the server to interpret the following information as a rule.

RewriteCond tells the server to interpret the following information as a condition of the rule that is immediately after it.

^ defines the beginning of a 'line' (starting anchor). Remember, ^ also designates 'not' inside [ ] in a regular expression, so please don't get confused.

( ) creates a variable to be stored and possibly used later, and is also used to group text.

$ defines the ending of a 'line' (ending anchor), and also defines a variable that comes from the RewriteRule (used for variables on the right side of the equation or to match a variable from the rule in a condition, see example below).

% defines a variable that comes from a rewrite condition. (used for variables on the right side of the equation only, see example below)

* The right side of the equation is the new location: everything that follows the pattern (which usually ends with $) in a RewriteRule.

Examples: All variables are given a number according to the order they appear. The following rule and condition each have two variables, defined by parentheses, so to use them you would put them where you need them in the results:
(the '-' is for spacing only to make the line more readable, and is not necessary to use variables.)

RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-$2
The final result would look like this:
to-use-variables-type-var1-and-var2

RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^no-var/no-var/no-var$ /to-use-variables-type-%1-and-%2
The final result would look like this:
to-use-variables-type-var1-and-var2

To use a combination of the Condition and Rule Variables
RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-%2-$2
The final result would look like this:
to-use-variables-type-var1-and-var2-var2

The only exception to the above examples is, you can also use the %{CONDITION_STUFF} in the right side of a rule, but it must appear exactly as in the condition:
RewriteRule ^(var1)/no-var/(var2)$ /type-%{CONDITION_STUFF}

|(bar) stands for 'or', normally used with text or expressions grouped with parentheses. (EG (with|without) matches the string 'with' or the string 'without'. Keep in mind since these are inside parentheses, the match is stored as a variable.)

\ is called an escaping character; it removes the function from a 'special character'. (EG If you needed to match index.php?, which has both a .(dot) and a ?, you would have to 'escape' the special characters .(dot) and ? with a \ to remove their 'special' value. It looks like this: index\.php\?)

! is like the ^ in a regular expression and stands for 'not', but can only be used at the beginning of a rule or condition, not in the middle.

- on the right side of the equation stands for 'No Rewrite.' (It is often used in conjunction with a condition to check and see if a file or directory exists.)
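
A sketch that puts the bar, the escape, the 'not' and the 'No Rewrite' dash together. It assumes a hypothetical index.php in the site root that accepts a path parameter; the css and images directory names are also just examples.

```apache
RewriteEngine on

# (css|images) : 'or' inside a group; - : leave the URL alone
RewriteRule ^(css|images)/ - [L]

# \. : escaped dot, matches a literal dot rather than any character
RewriteRule ^index\.html$ /index.php [L]

# ! : 'not'; -f and -d test for an existing file or directory,
# so only requests for things that do not exist reach the rule
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ /index.php?path=$1 [L]
```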

Mod_Rewrite Directives for URL Redirection

Directives, in mod_rewrite are what give you the control of the response sent by the server when a specific URL is requested. They are an integral part of the rule writing process, because they designate any special instructions that might be needed. (EG If I want to tell everyone a page is moved permanently, I can add R=301 to my rule and they will know.)

Directives (the Apache documentation calls them flags) follow the rule and are enclosed in [ ]. (Not all directives are covered here, but the main, widely used ones are.)

[R] stands for redirect. The default is 302, temporarily moved. This can be set to any number between 300 and 400, by entering it as [R=301] or [R=YourNumberHere], but 301 (permanently moved) and 302 (temporarily moved) are the most common.

(If you just use [R] this will work, and defaults to 302, or temporarily moved)

** Do not use this 'flag' or directive if you are trying to have a 'silent' redirect.

[F] stands for forbidden. Any URL or file that matches the rule (and condition(s) if present) will return FORBIDDEN (403) to anyone who tries to access it. (Useful for files that you would like to keep private, or you do not want indexed prior to 'going live' with them.)

[G] stands for gone. (It's like Not Found, only different.) Not recommended for use yet, this is a newer rule/message (410 code) and many browsers and user-agents, like googlebot do not understand them yet.

[P] stands for proxy. This creates a type of 'silent redirect' for files or pages that are not actually part of your site and can be used to serve pages from a different host, as though they were part of your site. (DO NOT mess with copyrighted material, some of us get very upset.)

[NC] stands for 'No Case' as applied to letters, so if you use this on a rule, MYsite.com, will match mysite.com... even though they are not the same. (This can also be used with regular expressions, so instead of [a-zA-Z], you can use [a-z] and [NC] at the end of the rule for the same effect.)

[QSA] stands for Query String Append. This means the 'query string' (stuff after the ?) should be passed from the original URL (the one we are rewriting) to the new URL.

[L] stands for last rule. As soon as this 'flag' or directive is read, no other rules are processed. (Every rule should contain this flag, until you know exactly what you are doing.)
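
A few of these flags in use, as a sketch. The page names are invented for the example; note that several flags are combined inside one pair of brackets, separated by commas.

```apache
RewriteEngine on

# [R=301,L] : permanent external redirect, then stop processing rules
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# [F] : answer 403 Forbidden for anything under /private/
RewriteRule ^private/ - [F]

# [G] : answer 410 Gone for a retired page
RewriteRule ^retired\.html$ - [G]

# [NC,L] : /About, /ABOUT and /about are all served by about.php
RewriteRule ^about$ /about.php [NC,L]
```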

In an attempt to put together regular expressions and mod_rewrite special characters here are some examples of what they do:

Goal: to match any lowercase words, or group of letters:
Possible Matches: lfie, page, site, or information
Expression: [a-z]+
Explanation: [a-z] matches any single lowercase letter. + matches 1 or more of the previous character or string of characters. When you put the two together you have a regular expression that matches any single letter from a to z over and over, until it runs into a character that is not a letter.

Goal: to match any words, or groups of letters, and store them in a variable:
Possible Matches: lfie, Page, site, or InforMation
Expression: ([a-z]+) [NC]
Explanation: Same as above with the addition of () and [NC]. In mod_rewrite, () creates a single variable out of the regular expression, so the word matched is now in a variable. [NC], which stands for 'No Case' (from mod_rewrite), makes the regular expression or regular text strings match both upper and lowercase letters, so with this expression you can match any single word.

Goal: to match any word, or group of letters, then any single number, and store them in separate variables:
Possible Matches: lfie1, Page2, site6, or InforMation9
Expression: ([a-z]+)([0-9]) [NC]
Explanation: Same as above, except notice there is no + in the number expression. This way only a single number will match.

Goal: to match any word, or group of letters, then any single number, and store them in the same variable:
Possible Matches: lfie1, Page2, site6, or InforMation9
Expression: ([a-z]+[0-9]) [NC]
Explanation: Same as above, except notice the plus is immediately following (no space) the [a-z], but before the [0-9] (again no space), so the + affects the [a-z], but not the [0-9].

Goal: to match any word, or group of letters, then any group of numbers, and store them in the same variable:
Possible Matches: lfie11, Page2, site642, or InforMation9987653
Expression: ([a-z]+[0-9]+) [NC]
Explanation: Same as above with the addition of a + immediately following the numerical expression to match 1 or more numbers instead of only 1.

Goal: to match any word, or group of letters, any group of numbers, and any random letters and numbers, which might or might not be mixed together:
Possible Matches: 11, gPaE, s17ite642, or 2CreateInfo4UisCool
Expression: ([a-z0-9]+) [NC]
Explanation: The change here is to the regular expression grouping. Putting a-z and 0-9 in the same grouping followed by [NC] matches any combination of letters and numbers.

Goal: to match any word, or group of letters, then a single /, then any group of numbers, and store only the numbers in a variable.
Possible Matches: lfie/10, gPaE/1, site/642, or CreateInfoUisCool/2474890
Expression: [a-z]+/([0-9]+) [NC]
Explanation: Using the [a-z]+ without () matches the letters as usual. By putting the / outside of any expression, the only thing that will match is the exact character of /. Then using the ([0-9]+) again, stores any group of numbers in a variable.

Goal: to match anything before the / and store it in a variable, then match anything after the / and store it in a separate variable:
Possible Matches: lfie/10.html, gP..aE/1page_two.file, si-te/642-your-site, or CreateInfo/245390.php
Expression: ([^/]+)/(.+)
Explanation: Using two new forms of regular expressions, this is actually easier than it may seem. Making use of the ^(not) character inside [ ] matches anything that is not a /, and the () again save it in a variable. Then, using the same form as above, the single, exact character of / is matched. Finally, the .(dot) character is used, because it matches any single character that is not the end of a line, and when combined with the + character, matches anything up to a line break. Once again () are used to create the variable. *Also, notice the use of 'catch-alls' eliminates the need for the [NC] 'flag' of mod_rewrite.
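
The last two expressions, dropped into complete rules as a sketch. The item.php and show.php scripts and their parameters are hypothetical; the ^ and $ anchors are added so the expression has to cover the whole requested path.

```apache
RewriteEngine on

# Letters, a slash, then digits; only the digits are captured as $1
# e.g. /site/642 is served by /item.php?id=642
RewriteRule ^[a-z]+/([0-9]+)$ /item.php?id=$1 [NC,L]

# Anything before the first slash as $1, everything after it as $2
# e.g. /si-te/642-your-site is served by /show.php?section=si-te&item=642-your-site
RewriteRule ^([^/]+)/(.+)$ /show.php?section=$1&item=$2 [L]
```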

Justin

Please note this is copyrighted material and although a soft license is granted to WebmasterWorld, all other duplication is prohibited.

abates




msg:1495771
 1:30 am on May 4, 2005 (gmt 0)

Any idea how evaluation order for RewriteCond works?

E.G. If I have:
RewriteCond A
RewriteCond B [OR]
RewriteCond C
RewriteCond D
RewriteRule Foo

Is the complete condition ((A and B) or (C and D)) or is it (A and (B or C) and D), or possibly (((A and B) or C) and D)?

I'm guessing the third option, but it's never been clear to me.

jdMorgan




msg:1495772
 3:11 am on May 4, 2005 (gmt 0)

abates,

The [OR] is referred to in the documentation as a "local OR" so it operates between B and C in your example:
A & ( B | C ) & D
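
As a sketch of that grouping with real conditions (the host names and the /status path are made up for the example), the rule below fires only when A and (B or C) and D are all true:

```apache
RewriteEngine on

# A: the request method is GET
RewriteCond %{REQUEST_METHOD} ^GET$
# B or C: the host is one of the two old names
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^legacy\.example\.com$ [NC]
# D: the request is not for anything under /status
RewriteCond %{REQUEST_URI} !^/status
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```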

A note on the above discussion: [QSA] is only needed if you wish to append new information to an existing query string. If no "?" appears in the substitution URL, then [QSA] is not needed.
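
The three cases can be sketched like this. The list.php, product.php and item.php scripts are hypothetical names.

```apache
RewriteEngine on

# No "?" in the substitution: the original query string is passed along
# unchanged, so /list?page=2 is served by /list.php?page=2
RewriteRule ^list$ /list.php [L]

# "?" in the substitution without QSA: the original query string is
# replaced, so /product/7?color=red is served by /product.php?id=7
RewriteRule ^product/([0-9]+)$ /product.php?id=$1 [L]

# "?" in the substitution with QSA: the original query string is appended,
# so /item/7?color=red is served by /item.php?id=7&color=red
RewriteRule ^item/([0-9]+)$ /item.php?id=$1 [QSA,L]
```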

Jim

henry0




msg:1495773
 11:24 am on May 4, 2005 (gmt 0)

Justin,
Great piece!
I would even say it is more understandable than the first one.
Even I can comprehend it :)

The untold influence of such posts (like those in the “bag o‘ tricks”) is that it encourages WebmasterWorld members to create a tutorial displaying fluency on one’s forte

Thanks
Henry

PS the forum's script uses regex (possibly) to rewrite some terms
for example I wrote only the short of webmasterworld word (I cannot even duplicate it, W as whiskey, M as Mike and W as whiskey) and the result came as webmasterworld in its whole spelling.

gzip




msg:1495774
 8:40 pm on May 4, 2005 (gmt 0)

Just for clarification,
* matches the character(s) before it 0 or more times (as opposed to ? [0 or 1 time] or + [1 or more times]).

jd01




msg:1495775
 6:47 am on May 5, 2005 (gmt 0)

gzip,

Thanks for the clarification, I will remember to edit that on the original.

Justin

solex16




msg:1495776
 11:45 am on May 8, 2005 (gmt 0)

I have mod_rewrite working well but would still like the url to be the pre-redirected string,
ie.
I have
www.mysite.com/categoryname/productname
redirecting to
www.mysite.com/product.php?cat=catid&prod=prodid

All my internal links are now well optimised but I would like my future partners to be able to deeplink to my pages by copying the url they need in the address bar for the page they want to link to.

Is there a way of keeping the url to how it was before the redirection?

solex16




msg:1495777
 3:05 pm on May 8, 2005 (gmt 0)

Having looked into this further it seems that Firefox will show the unredirected url in the address bar only if it is not a 301 (permanent redirect, i.e. flag set to [r=301])
Whereas IE always shows the unredirected url

Just need to inform partners to use IE when copying the url to create links as would need to set this 301 flag to help convince the bots

jdMorgan




msg:1495778
 3:43 pm on May 8, 2005 (gmt 0)

Firefox versus IE has nothing to do with this. The definition of a redirect is that the address bar will change; You are using a server response -- a message sent by the server to the browser -- that redirects the browser. In order to avoid this, you simply use an internal rewrite -- as opposed to a redirect. An internal rewrite only changes the server file-path associated with a requested URL, and is never seen by the browser.
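
A minimal sketch of such an internal rewrite for the URLs in question. It assumes product.php can look up the category and product from the names in the URL, which may not match how the actual script works.

```apache
RewriteEngine on

# Internal rewrite: no R flag, so the browser keeps showing
# /categoryname/productname while product.php builds the page
RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)$ /product.php?cat=$1&prod=$2 [L]
```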

The reason you see IE keeping the original URL is probably that you did not flush your temporary internet files after changing your server-side code. If you do not flush your cache prior to testing, then the request will be served from the browser's local cache, and no request will be sent to the server. If a request is not sent to the server, then server-side code such as mod_rewrite can have no effect on the browser; it just displays its old copy of the page.

If you follow the best-practice of flushing your browser cache(s) before testing any change to server-side code, you will see all browsers behaving identically for a redirect.

Jim

solex16




msg:1495779
 4:05 pm on May 8, 2005 (gmt 0)

Yes, Jim
Thanks for that, even though I have IE set to
'check for newer versions on every visit'
I was still seeing the cache..

So the problem remains as to how to show the search-friendly URLs in the address bar whilst using the R=301 flag,
any ideas?

jd01




msg:1495780
 5:45 pm on May 8, 2005 (gmt 0)

Hi solex16,

The reality is we have no idea what you are doing now that is producing the results, so the question is impossible to answer.

Are you using RewriteRules, Redirect or RedirectMatch?
Where are the rules located .htaccess, or httpd.conf?
What exactly is happening now that you would like to have different? (I am guessing the external redirection)
What have you changed to try to solve this?
What part of that didn't work and why? (*Hint: logs, look for lines that contain 404, 500, 501, etc.)
What does your actual file look like now? (IOW generalize and post a portion of your code.)
Anything else that has to do with the navigation of this portion of your site.

Maybe, you could answer the questions above and start another thread, so we can focus on your file?

I am sure we will be able to help you find what you need...

Sorry, if this seems a bit much for what should be a simple change. Keep in mind anything you create/use to do this has to be a perfect match, and without knowing what you are doing now, and how your files are set up, there is no way to answer your question. It may very well be you will get a one or two line response, of 'change this to...', but as soon as it gets more involved than that, all the other information is necessary.

Hope this helps.

Justin

solex16




msg:1495781
 9:19 pm on May 8, 2005 (gmt 0)

OK JD, thanks for the response..
as mentioned earlier
I have rewritten all my links to something like
www.mysite.com/categoryname/productname

and then have used RewriteRules in an htaccess file (placed in the root directory) so that they find the original pages, located at
www.mysite.com/product.php?cat=catid&prod=prodid

The problem is if I use the R=301 flag then the less friendly URLs appear in the browser.

If I leave out the 301 flag then all is cool, except that I am worried the bots will not treat it as a permanent re-direct...

Need I be worried about this?
