Questions about .htaccess mod rewrite
Sylver




msg:4560721
 7:12 pm on Apr 2, 2013 (gmt 0)

I just spent the whole day trying to redirect the old design of my website to the new one.

Essentially, it means going from this:

website/filename.extension


to

website/folder/index.php?pge=filename


It looked like an easy task, but after over a hundred attempts, taking my own website down in the process with 500 errors and redirect loops (some with 301 codes because I am an idiot, apparently), I am beginning to feel slightly frustrated.

Mod_rewrite is activated (it did create infinite loops and even managed to redirect a few things here and there).

Here is one of the many attempts:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^website&TDL [NC]
RewriteRule ^(.*\.?)website&TDL/(.*)\.php $1website&TDL/_YT/index.php?pge=$2 [L,R=301]


Of course, I would be very happy if someone could help me write a proper rewrite rule, but failing that, here are a bunch of questions about mod_rewrite:

1. RewriteBase:
"Sets the base URL for per-directory rewrites". What the F does that mean?
Should I set RewriteBase to the website URL? The website URL plus the new folder where the content is located? The relative path to the old content? The path to the new content? The full physical path?
I am at a loss understanding how this works.

2. Separators:
Spaces or tabs or doesn't matter? I assume it doesn't matter, but I am not sure of anything anymore.

3. mod_rewrite version
Does it make a difference what version of mod_rewrite is running on the server? (PHP runs as CGI so I can't check directly, but I can ask the host.)

4. In RewriteRule, is there any way to see the URI before the pattern is applied?
"Pattern is a perl compatible regular expression, which is applied to the current URL. ``Current'' means the value of the URL when this rule is applied. This may not be the originally requested URL, which may already have matched a previous rule, and have been altered."
What is the format of this URL (assuming no rules have been matched yet)? Is it the website address plus path? Only the path? The relative path of the resource? The resource name only? The full physical path? Does it change depending on what RewriteCond statements come before the RewriteRule?

5. At this point, I am considering replacing the content of every existing old page with a php redirect.
Is there any downside to that solution? (Apart from having to manually change the content of 200+ files, and presumably a slight performance hit because the entrance page has to be loaded before the redirect).

Thanks a lot for your help.

 

lucy24




msg:4560803
 11:59 pm on Apr 2, 2013 (gmt 0)

Short answer: Why would you want to redirect from a pretty-good URL to a definitely-worse URL?



Long answer:
Of course, I would be very happy if someone could help me write a proper rewrite rule

I'll bet you would, so I'm glad you know it ain't happening :)

1. RewriteBase:
"Sets the base URL for per-directory rewrites". What the F does that mean?
Should I set RewriteBase to the website URL? The website URL plus the new folder where the content is located? The relative path to the old content? The path to the new content? The full physical path?
I am at a loss understanding how this works.

Forget you ever heard the word "RewriteBase". The RewriteBase is attached to the front of the target WHEN AND ONLY WHEN the target you give begins with something other than a / (or full protocol-plus-domain if you're redirecting). Since you will never do this, the RewriteBase doesn't matter. Your target will begin with either / plus rest of path if it's a rewrite, or with http: et cetera if it's a redirect.
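To make that concrete, here is a minimal sketch (the directory and file names are made up for illustration, not taken from this thread):

# In /somedir/.htaccess with "RewriteBase /somedir", a relative target such as
# "new.html" would have the base prepended, giving /somedir/new.html.
# With a target that starts with a slash (rewrite) or a full URL (redirect),
# RewriteBase is never consulted:
RewriteRule ^old\.html$ /somedir/new.html [L]
RewriteRule ^gone\.html$ http://www.example.com/replacement.html [R=301,L]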

2. Separators:
Spaces or tabs or doesn't matter? I assume it doesn't matter, but I am not sure of anything anymore.

Yawp! I had no idea you even COULD use a tab instead of a space. Use a single space.

3. mod_rewrite version
Does it make a difference what version of mod_rewrite is running on the server? (PHP runs as CGI so I can't check directly, but I can ask the host.)

I do not want to even consider the possibility that your host is running mod_rewrite from one Apache version with the core from a different Apache version. Yes, later versions have more features. But let's assume for the sake of discussion that you're on 2.2. If it's really 1.something, change hosts. If it's 2.4, good for you, but afaik nothing was removed between 2.2 and 2.4, so just keep it basic and you'll be fine either way.

4. In RewriteRule, is there any way to see the URI before the pattern is applied?
"Pattern is a perl compatible regular expression, which is applied to the current URL. ``Current'' means the value of the URL when this rule is applied. This may not be the originally requested URL, which may already have matched a previous rule, and have been altered."
What is the format of this URL (assuming no rules have been matched yet)? Is it the website address plus path? Only the path? The relative path of the resource? The resource name only? The full physical path? Does it change depending on what RewriteCond statements come before the RewriteRule?

The pattern is the "path" part of the URL. Minus the protocol-plus-domain-plus-port at one end, minus the query string if any at the other. In htaccess the pattern begins with the directory name; in config files it begins with a slash.

The words "directory name" are not technically correct
:: looking uneasily at assorted grownups ::
but for present purposes it's the easiest way to understand it.
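For example (a sketch using the poster's /folder/ placeholder):

# In a root .htaccess the leading slash has already been stripped, so the
# pattern starts at the directory or file name:
RewriteRule ^somedir/somepage\.php$ /folder/index.php?pge=somepage [L]
# The same rule in httpd.conf or a <VirtualHost> matches against /somedir/somepage.php,
# so the pattern needs the leading slash:
# RewriteRule ^/somedir/somepage\.php$ /folder/index.php?pge=somepage [L]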


5. At this point, I am considering replacing the content of every existing old page with a php redirect.
Is there any downside to that solution? (Apart from having to manually change the content of 200+ files, and presumably a slight performance hit because the entrance page has to be loaded before the redirect).

Sounds like enough of a downside to me. Why do it page by page in php when you can do it all in one fell swoop in mod_rewrite?



Now then. When you say
going from:
website/filename.extension
to
website/folder/index.php?pge=filename


please, please say that what you really meant was:

Any requests for
filename.extension
may or may not first be REDIRECTED to
filename
(browser's address bar changes)
and then-- with or without the preceding redirect-- your server will quietly serve content that really comes from
folder/index.php?pge=filename
(address bar doesn't change)
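A bare-bones sketch of that two-step arrangement, assuming /folder/ is fixed and ".extension" stands for whatever the old pages use (adjust before relying on it):

# 1. External redirect: old URL with extension -> clean URL (address bar changes)
RewriteRule ^([^/.]+)\.extension$ http://www.example.com/$1 [R=301,L]
# 2. Internal rewrite: clean URL -> the script that really builds the page
#    (address bar doesn't change)
RewriteRule ^([^/.]+)$ /folder/index.php?pge=$1 [L]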

Your post doesn't say where the name of /folder/ comes from. Is it always the same or does it depend on /filename ?

Sylver




msg:4560853
 6:38 am on Apr 3, 2013 (gmt 0)

Thanks for the extensive reply. Of course I would love to keep my old, already indexed URLs and simply serve the new design, but I have no clue how to do it.

and then-- with or without the preceding redirect-- your server will quietly serve content that really comes from
folder/index.php?page=filename


That would be absolutely brilliant... How do I do that?

To clarify the previous request:
/folder/ is always the same.

What happened is that I have an old, mostly static website located in the public_html root.

I redesigned it and went with a dynamic model whereby all pages go through index.php, and index.php pulls the content to display based on the ?pge=filename GET variable. I uploaded the new site in a subfolder of the old site /new-site/, and now, I wanted to redirect all the pages from the old website to the new one.

Thanks to your explanations, I managed to write a redirection that nearly works. I say nearly because watching the HTTP traffic on Fiddler I can see this loop happening:

/oldpage.php
http://www.example.com/folder/index.php?pge=oldpage // This is exactly what I wanted and then...
http://www.example.com/folder/index.php?pge=index
http://www.example.com/folder/index.php?pge=index
http://www.example.com/folder/index.php?pge=index
http://www.example.com/folder/index.php?pge=index
... redirection timeout

So apparently my redirection works, but for some reason the resulting URL is then passed through mod_rewrite again and again and again...

I am already using the L flag, so I don't understand why the loop is happening. There is nothing else in that htaccess file, and no .htaccess file in /folder/

Here is the content of .htaccess:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^website [NC,OR]
RewriteCond %{HTTP_HOST} ^www.website [NC]
RewriteCond %{REQUEST_URI} !="^/.*?/.*"
RewriteRule ([^/]*)\.php$ protocol://www.website/folder/index.php?pge=$1 [R=302,L]

Obviously, the ideal situation would be to serve the new pages at the old, already indexed URLs silently. However, since I have spent quite a bit of time on this, I would also like to know why this code is causing an infinite loop and how one would go about fixing it.

Sylver




msg:4560861
 7:07 am on Apr 3, 2013 (gmt 0)

Oops, I found the cause of the loop and made it work. (I had missed the fact that in .htaccess the path doesn't start with a "/".) Thanks a lot for your explanations!

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^website [NC,OR]
RewriteCond %{HTTP_HOST} ^www.website [NC]
RewriteCond %{REQUEST_URI} !="^.*?/.*"
RewriteRule ^([^/]*)\.php$ protocol://www.website/folder/index.php?pge=$1 [R=302,L]
RewriteRule ^$ protocol://www.website/folder/ [R=302,L]

Now that it works fine... I would like to get rid of it and serve the new pages at the old, already indexed URLs silently, as you suggested.

How is that done?

lucy24




msg:4560871
 7:39 am on Apr 3, 2013 (gmt 0)

This is in reply to your second-to-last post. We overlapped.

Show me an infinite loop and I'll show you a missing

RewriteCond %{THE_REQUEST} {more-stuff-here}

In your case the explanation is different: It's because you forgot to exclude requests for "index.php" from your redirection, so you're getting
www.example.com/folder/index.php?pge=index
looping back on itself forever. But we'll get back to that.

RewriteCond %{HTTP_HOST} ^website [NC,OR]
RewriteCond %{HTTP_HOST} ^www.website [NC]

Have you got other domains passing through the same htaccess? If no, dump these lines entirely. If yes, reduce them to
%{HTTP_HOST} example
or at most
^(www\.)?example
to shut out subdomains.
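Put together, that would look something like this (a sketch, with example.com standing in for the real domain):

RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteRule ^([^/]*)\.php$ /folder/index.php?pge=$1 [L]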

The REQUEST_URI condition is very rarely necessary. If at all possible, put it into the body of the rule. The one exception is when you need to make it negative.

RewriteCond %{REQUEST_URI} !="^/.*?/.*"


Uhm...

:: detour to text editor to paste in this line and squint at it in a bigger font ::

:: further detour to mod_rewrite docs to confirm that there's no reason for that = sign and the quotation marks, unless they've gone and added something very weird in 2.4 ::

"The requested URI is not equal to some optional stuff-- as little of it as possible-- followed by a directory slash and some more optional stuff".

Is this intended to mean "the requested URI cannot contain two or more directory slashes"? If so, that definitely belongs in the body of the Rule. Except that it doesn't need to be there, because you've already got it. Assuming for the sake of discussion that you forgot the crucial ^ opening anchor in the Rule. Without it, you could get

http://www.example.com/filename.php
>>
RewriteRule (filename)\.php
>>
rewrite involving "pge=filename"

but also
http://www.example.com/directory/subdir/filename.php
>>
RewriteRule directory/subdir/(filename)\.php
>>
rewrite again involving "pge=filename"

Have you already started doing this? If you are redirecting, you want a R=301 permanent redirect; R=302 (or plain R) means "for now you'll find the page over here but it really lives at the old URL, so keep indexing that one".

This, paradoxically, is good if you never wanted a redirect in the first place, because the search engines haven't started racking up duplicate URLs.

What you really want-- assuming I have successfully read your mind w/r/t the depth of URLs involved-- is

RewriteRule ^([^/.]+)\.php /folder/index.php?pge=$1 [L]

Dump the HTTP_HOST bits unless as already noted there are other (sub)domains involved. Dump the REQUEST_URI bit entirely.

Put this rule at the end of your htaccess, after all rules in [F] or [G] and after all redirects.
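As a rough ordering sketch (every rule here is a placeholder, not something to paste):

# 1. access-control rules ([F]/[G]) first
RewriteRule ^private/ - [F]
# 2. external redirects next, most specific first
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
# 3. internal rewrites last
RewriteRule ^([^/.]+)\.php /folder/index.php?pge=$1 [L]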


Wait, you're not done yet. In fact, you haven't started yet. You also need a series of redirects. The last two are:

RewriteCond %{THE_REQUEST} index
RewriteRule ^(([^/.]+/)*)index\.any-extensions-you-use http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Now, if you are already getting requests for "index.php?pge=some-stuff-here" you need to nip them in the bud with a rule that goes something like

--don't cut and paste, because you will probably need more detail--

RewriteCond %{THE_REQUEST} index
RewriteCond %{QUERY_STRING} pge=([^&]+)
RewriteRule ^folder/index\.php http://www.example.com/%1 [R=301,L]

This goes before the previous two redirects. From most specific to most general. Note that most rules involving URLs ending in "index.something" will require a RewriteCond looking at THE_REQUEST. (Some can get away with a NS flag, but it won't work here.) You are only redirecting the people who expressly typed in the words "index.php", not the ones who have been quietly rewritten to "index.php".
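For reference, a sketch of what THE_REQUEST holds and why it matters here (the request line shown is made up):

# THE_REQUEST is the raw request line, e.g.
#   GET /folder/index.php?pge=oldpage HTTP/1.1
# and is never altered by later rewrites, so a condition on it matches only what
# the client literally asked for:
RewriteCond %{THE_REQUEST} index
RewriteRule ^(([^/.]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]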

g1smd




msg:4560874
 8:10 am on Apr 3, 2013 (gmt 0)

I can't understand why you would want to redirect friendly URLs to parameter-based URLs. Usually it's the other way round.

What you need is an internal rewrite. As long as the "folder" name is always the same and your friendly URLs are all lower-case and don't have folders, this will work...
RewriteRule ^([a-z0-9-]+)\.extension$ /folder/index.php?pge=$1 [L]

When you request example.com/foo.extension the server will fetch content from /folder/index.php?pge=foo

If you could go with extensionless URLs, your code will be even easier to manage. You'll need to add redirects from your old URLs to the new format and from the example.com/folder/index.php?pge=pagename format to the new format.

Sylver




msg:4560892
 9:34 am on Apr 3, 2013 (gmt 0)

Got it on the overlap.

Have you got other domains passing through the same htaccess?

Yes. And these addon domains can also be called through addon.example.com (though I don't think anyone does that).

...confirm that there's no reason for that = sign and the quotation marks

That's because I read in the Apache wiki that it's a good practice:
We should really encourage people to use the lexicographically equal operator instead of a RegEx if they want to check whether the test string is lexicographically equal to the cond pattern.
E.g. using
RewriteCond %{HTTP_HOST} !=""

I don't think it really makes a difference in my case, but I figured it couldn't hurt.

www.example.com/folder/index.php?pge=index

I took care of that case inside index.php:

if ($_GET['pge'] == "index")
{
$page = "pg_home";
}

Where "pg_home" is of course the home page. index.php, index.php?pge=pg_home and index.php?pge=index all produce the same result. If I am not mistaken, it means I don't need the first RewriteRule below the line, right?

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

I have addon domains in subfolders. Won't this rule cause all those other websites to be redirected to my main website?

Have you already started doing this? If you are redirecting, you want a R=301 permanent redirect; R=302 (or plain R) means "for now you'll find the page over here but it really lives at the old URL, so keep indexing that one".

Yes, this is live and I am using 302 instead of 301 until I am sure everything works well.

What you really want-- assuming I have successfully read your mind w/r/t the depth of URLs involved-- is

RewriteRule ^([^/.]+)\.php /folder/index.php?pge=$1 [L]


Yes, this is what I want, BUT it causes all other file types to return 404: that means no CSS, no JS, etc.

When this rewrite rule is in effect, what happens with the paths in the script called? Does it think the base path is still "/" instead of "/folder/", where the script is actually located?


PS:
:: detour to text editor to paste in this line and squint at it in a bigger font ::

In most Windows browsers, you can press CTRL + to increase the size and CTRL+0 to reset the size. That'll save you a trip or two to the editor.

Sylver




msg:4560893
 9:45 am on Apr 3, 2013 (gmt 0)

@G1SMD:
I can't understand why you would want to redirect friendly URLs to parameter-based URLs. Usually it's the other way round.

It's pretty simple: I didn't want that, I just didn't know how to avoid it.

What you need is an internal rewrite. As long as "folder" name is always the same and your friendly URLs are all lower-case and don't have folders, this will work...
RewriteRule ^([a-z0-9-]+)\.extension$ /folder/index.php?pge=$1 [L]

Looks like it's exactly what I need, and it does work, except for the paths inside the script: CSS, JS, images, etc. all return 404. Of course, that means all the pages display but they look a mess.

What happens to the base path inside the php files? Is this an easy fix in .htaccess or do I need to fix the resource paths in the php scripts?

lucy24




msg:4560909
 10:22 am on Apr 3, 2013 (gmt 0)

There is nothing in your existing rule that would result in a 404 for non-php files. There must be another rule lurking in the background.

I read in the Apache wiki that it's a good practice:
We should really encourage people to use the lexicographically equal operator instead of a RegEx if they want to check whether the test string is lexicographically equal to the cond pattern.
E.g. using
RewriteCond %{HTTP_HOST} !=""

I don't think it really makes a difference in my case, but I figured it couldn't hurt.

It makes a hell of a difference, because "lexicographically equal" doesn't mean "it fits this Regular Expression". It means lexicographically equal. So the condition, as written, means

If the request is not exactly equal to the literal string
^/.*?/.*


I think it is safe to say that this condition will ALWAYS succeed, because the request will NEVER be for
http://www.example.com/.*?/.*

That'll save you a trip or two to the editor.

Ahem. I am not on w###, and I should hope that all browsers on the planet allow you to resize text. But they don't let me put it into a variety of fonts so I can stare at it upside-down, backward and sideways ;)

I took care of that case inside index.php:

Except that the request will never reach the php if it keeps getting rewritten or redirected forever. You need something in the htaccess that says "If the request is already for index.php, you do not need to redirect to index.php".

What happens to the base path inside the php files? Is this an easy fix in .htaccess or do I need to fix the resource paths in the php scripts?

Oh, oops, we need to figure out where the mistake is happening. Luckily this will take approximately three seconds. Pull up your error logs and see what files are being requested. If the requested path-plus-filename is correct, the problem is in htaccess. If it is incorrect, the problem is in php.

Oh yes and: The [L] flag doesn't mean "Stop here and proceed directly to the page". It only means "You're done with mod_rewrite for now, so go back to the first mod-- which may happen to be mod_rewrite, but it doesn't matter-- start from the beginning with the newly rewritten URL, and continue until everything rinses clean".
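One way to picture it, as a sketch rather than anything from this thread: a guard condition keeps the rewritten URL from matching again on the next pass, and on Apache 2.4 the [END] flag stops all further rewrite passes outright.

# Skip anything already rewritten into /folder/, so a second pass cannot match:
RewriteCond %{REQUEST_URI} !^/folder/
RewriteRule ^([^/.]+)\.php$ /folder/index.php?pge=$1 [L]
# On Apache 2.4 and later, [END] ends rewrite processing completely:
# RewriteRule ^([^/.]+)\.php$ /folder/index.php?pge=$1 [END]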

Sylver




msg:4560975
 1:51 pm on Apr 3, 2013 (gmt 0)

It makes a hell of a difference, because "lexicographically equal" doesn't mean "it fits this Regular Expression".

Oops. Got it. Good thing this condition wasn't needed after all.

Ahem. I am not on w###, and I should hope that all browsers on the planet allow you to resize text. But they don't let me put it into a variety of fonts so I can stare at it upside-down, backward and sideways ;)

LOL. You could always create a custom browser CSS to suit your err... unconventional tastes. ;)

Of course all browsers can do that. I only mentioned W.{5}s' browsers because the shortcut (Ctrl +) might be different on other systems.

Except that the request will never reach the php if it keeps getting rewritten or redirected forever. You need something in the htaccess that says "If the request is already for index.php, you do not need to redirect to index.php".

I understand what you mean, but it works properly:
RewriteRule ^([^/]*)\.php$ protocol://www.website/folder/index.php?pge=$1 [R=302,L]
example.com/index.php matches the first time and becomes example.com/folder/index.php?pge=index.
The second time, folder/index.php does not match the pattern, so there is no looping.
Unless I am missing something here?

Oh, oops, we need to figure out where the mistake is happening. Luckily this will take approximately three seconds. Pull up your error logs and see what files are being requested. If the requested path-plus-filename is correct, the problem is in htaccess. If it is incorrect, the problem is in php.

Inside the PHP files, the paths are relative: "css/mycssfile.css" or "images/pic.jpg"

With the external redirect as above, everything displays fine, the new path is: "example.com/folder/images/pic.jpg"

Rewriting internally breaks the paths: "example.com/images/pic.jpg".

PHP include() seems to work regardless; the PHP script's base path is still set correctly.

I fixed the problem by adding
<base href="http://www.example.com/folder/">
in the head of the template files.

It looks like everything is working fine now. Thanks for your help.

g1smd




msg:4560999
 2:25 pm on Apr 3, 2013 (gmt 0)

Images, CSS and JS are accessed from the web using URLs.

Rather than using <base>, alter the links in each href to begin with a leading slash and mention the full path to the file. Don't use relative linking.

Let's see the final .htaccess code, as I'm sure there are several bits you don't need, and maybe some stuff you need but don't know about is missing.

lucy24




msg:4561127
 8:40 pm on Apr 3, 2013 (gmt 0)

The second time, folder/index.php does not match the pattern, so there is no looping.
Unless I am missing something here?

Yes, and it's important. The body of the RewriteRule looks ONLY at the "path".

:: shuffling papers ::

Thought so. See my first post in this thread, on your question #4. Adding a query string has no effect on the path. So the URL will continue matching until the cows come home.

Seriously, though: I'd get a major headache if I zoomed in on the entire text of the page. (Browsers will happily do "print selection" but I don't think they're at "zoom selection" yet.)

Inside the php files, the paths are relative:

Site-absolute links are safest. I'm not in a position to advise on this, seeing as how I only speak four words of php

and recent detour prompted by a different thread has just revealed that I made a vast booboo when I created a scad of php pages the weekend before last

but so far
$_SERVER['DOCUMENT_ROOT'] . "/blahblah"
has always behaved as expected.

Sylver




msg:4561357
 2:25 pm on Apr 4, 2013 (gmt 0)

Sorry for the delay getting back to the thread. Converting the URLs and making everything work turned out to be a gnarly business, especially because I needed everything to work both on the live server and on my local testing server. Anyway, it's done, and as far as I can tell everything important is operational.

Images, CSS and JS are accessed from the web using URLs.

Rather than using <base>, alter the links in each href to begin with a leading slash and mention the full path to the file. Don't use relative linking.

Let's see the final .htaccess code, as I'm sure there are several bits you don't need, and maybe some stuff you need but don't know about is missing.

Thanks, that's very kind of you to offer.

Here is the content of my .htaccess:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} example.com [NC]
RewriteRule ^([^/]+)\.php /folder/index.php?pge=$1 [L]


I'd be really impressed if you found anything to remove here!

On the other hand, there might be some things still missing. One problem I noticed is that the query string seems to get lost during the redirect (I can access the $_GET vars on the testing server but not on the live server).

@Lucy: Yes, this also works for example.com/index.php:

Yes, and it's important. The body of the RewriteRule looks ONLY at the "path".

:: shuffling papers ::

Thought so. See my first post in this thread, on your question #4. Adding a query string has no effect on the path. So the URL will continue matching until the cows come home.

I understand that the query string has no effect on the path... but the folder is part of the path, right? Second time around, the path becomes:

folder/index.php

And because there is a folder in the path, ^([^/]*)\.php$ does not match, which is why it works without looping. (It's live and working.)

Site-absolute links are safest. I'm not in a position to advise on this, seeing as how I only speak four words of php

and recent detour prompted by a different thread has just revealed that I made a vast booboo when I created a scad of php pages the weekend before last

but so far
$_SERVER['DOCUMENT_ROOT'] . "/blahblah"
has always behaved as expected.

It's a little more complicated in my situation because I need everything to work in 2 different environments, one of which is Windows, with slightly different file structures. Also some of the files are not processed by php (for instance, js files contain links - a fair bit of AJAX is involved).

In the end, I solved the issue by defining different paths in the configuration file and setting a flag to differentiate the live server and the testing server:


// Paths
$liveServer = false;   // set to true on the live copy

if ($liveServer)
{
    define('ROOT_PATH', "/folder/");
    define('IMAGE_PATH', "/folder/images");
}
else
{
    define('ROOT_PATH', "");
    define('IMAGE_PATH', "/images");
}

Then I added <?= ROOT_PATH ?> before each resource link. Same thing in JS, so that I only need to switch a couple of liveServer variables on the live copy for the resource paths to work.

For the PHP links I ended up replacing all the index.php?pge=mypage with <?php if ($liveServer) { echo "mypage.php"; } else { echo "index.php?pge=mypage"; } ?>

I still need to do a lot of testing, but so far it looks like everything is behaving as it should and there are no longer any index.php?pge=kjbchz paths anywhere on the live site.

However, some of these paths might have been spidered within the last 2 days. Does it matter?

Sylver




msg:4561358
 2:27 pm on Apr 4, 2013 (gmt 0)

Oh yes and: The [L] flag doesn't mean "Stop here and proceed directly to the page". It only means "You're done with mod_rewrite for now, so go back to the first mod-- which may happen to be mod_rewrite, but it doesn't matter-- start from the beginning with the newly rewritten URL, and continue until everything rinses clean".

Ouch! That's not nice.

g1smd




msg:4561375
 3:12 pm on Apr 4, 2013 (gmt 0)

RewriteEngine on
RewriteCond %{HTTP_HOST} example.com [NC]
RewriteRule ^([^/]+)\.php /folder/index.php?pge=$1 [L]


I'm not sure why you test HTTP_HOST. There's no need.

^([^/]+)\.php is way below optimum. It means "keep on parsing 'not a slash' until you come to a slash, and check that the slash is a literal period". This will always fail because a slash is not a period, and that will result in many "back off and retry" trial match operations being performed to find out what you really meant.

You want ^([^.]+)\.php which means, "keep on parsing 'not a period' until you come to a period, then check it is followed by 'php'".

If the "friendly" URLs can be folder-based, you'll need ^(([^/]+/)*[^.]+)\.php instead.

Do you really need the .php extension in your URLs? I can highly recommend "going extensionless". Your RegEx pattern then becomes
^([a-z0-9-]+)$ or similar (or ^(([a-z0-9-]+/)*[a-z0-9-]+)$ if there are folders).



You should also position a non-www-to-www or a www-to-non-www redirect ahead of all this code.

RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]


OR

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Sylver




msg:4561394
 4:15 pm on Apr 4, 2013 (gmt 0)

I'm not sure why you test HTTP_HOST. There's no need.

There are addon subdomains inside the same webspace and I don't know how they are handled, so I thought it was safer to make sure that this rewrite only applies to that specific domain.

^([^/]+)\.php is way below optimum. It means "keep on parsing 'not a slash' until you come to a slash, and check that the slash is a literal period". This will always fail because a slash is not a period, and that will result in many "back off and retry" trial match operations being performed to find out what you really meant.

You want ^([^.]+)\.php which means, "keep on parsing 'not a period' until you come to a period, then check it is followed by 'php'".

If the "friendly" URLs can be folder-based, you'll need ^(([^/]+/)*[^.]+)\.php instead.

Good suggestion but there are a few requirements that make it unworkable here:

The "friendly" URLs with folders must be excluded from this rule because I don't want files in folders to be affected, particularly the files for other websites which are contained in subfolders for that website.

That's why I use [^/]: if there is a /, there is a folder and it should fail.

Given that requirement, [^/] is sometimes faster than [^.], because if there is a folder (which will happen regularly) the rule will bail out before the "not-a-period" version would.

The second part of the requirement is that I want to allow URLs containing periods, as this is a requirement for some URL-based verification programs. Unless absolutely necessary, I don't want to arbitrarily ban legal characters in file names, because I know that 3-4 years from now I will have forgotten all about it and will tear my hair out trying to figure out why those files never work.

There is some "back off and retry" with [^/] but since the extension is always very short (usually 3 chars), that back off and retry is limited to 4 extra loops through the string (which is also very short). The performance hit is not an issue here.

You should also position a non-www-to-www or a www-to-non-www redirect ahead of all this code.

Good point. Thanks!

g1smd




msg:4561399
 4:39 pm on Apr 4, 2013 (gmt 0)

As you have extra hostnames resolving to folders, the non-www/www code I supplied will need an extra Condition:

RewriteCond %{HTTP_HOST} example\.com
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]


"If the requested hostname contains example.com but is not exactly example.com then redirect" will redirect example.com:80, www.example.com, etc.

OR

RewriteCond %{HTTP_HOST} example\.com
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


"If the requested hostname contains example.com but is not exactly www.example.com then redirect" will redirect www.example.com:80, example.com, etc.


Noted, your "odd" requirement for root URLs with multiple periods:
^(([^/.]+\.)+)php$
may be what you are looking for, as long as you're aware that the passed parameter will end in a period, which your PHP script will need to strip.

Of course, life is even easier when you use extensionless URLs. :)

Sylver




msg:4561451
 6:29 pm on Apr 4, 2013 (gmt 0)

As you have extra hostnames resolving to folders, the non-www/www code I supplied will need an extra Condition:

Thanks. I missed that and hadn't noticed that my other websites were being redirected to www.example.com/example1!

Noted, your "odd" requirement for root URLs with multiple periods:
^(([^/.]+\.)+)php$
may be what you are looking for, as long as you're aware that the passed parameter will end in a period, which your PHP script will need to strip.

This one has an error in it: the period is part of the capture group so it would give "index.php?pge=name."

I am not quite sure why you want to change ^([^/]*)\.php$. I ran a quick benchmark and it properly handles a 250-character path (far longer than any path on my site) in 0 ms. It might not be perfect performance-wise, but what difference does it make? If performance were this critical, I could never afford an extra round trip to the browser for something as trivial as a canonical address anyway.

Of course, life is even easier when you use extensionless URLs. :)

How does it work? Would I need to redirect existing URLs to extensionless, then rewrite to my parameter URLs?

What are the advantages?

g1smd




msg:4561459
 7:03 pm on Apr 4, 2013 (gmt 0)

This one has an error in it: the period is part of the capture group so it would give "index.php?pge=name."

Yes, "as long as you're aware that the passed parameter will end in a period, which your PHP script will need to strip".

I am not quite sure why you want to change

I prefer to write unambiguous code (that can be read left to right in a single parse if possible), so that I don't get caught out in the future when other changes interact in an unexpected way. There's more than one right way to write code but a huge number of wrong ways. I think I have used most of the wrong ways over the years. :)

The advantage of extensionless URLs is simpler patterns in the rewrite, and never a need to use expensive -f and -d tests. URL requests with extensions are served by real files: images, CSS, JS, robots.txt, etc. Extensionless requests are rewritten so that they are served by the index.php script that queries a database and builds the page. You can also block direct access to .php scripts and you can block all external requests with parameters. That can be a security gain too.
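A rough sketch of how that might look for this site (the folder name, domain, and character classes are placeholders and would need adapting):

# 1. Redirect old parameter URLs to the clean form, but only when the client
#    literally asked for index.php (hence the THE_REQUEST condition)
RewriteCond %{THE_REQUEST} index\.php
RewriteCond %{QUERY_STRING} ^pge=([^&]+)
RewriteRule ^folder/index\.php$ http://www.example.com/%1? [R=301,L]
# 2. Redirect old .php URLs to extensionless ones
RewriteRule ^([a-z0-9-]+)\.php$ http://www.example.com/$1 [R=301,L]
# 3. Internally rewrite extensionless requests to the script that builds the page
RewriteRule ^([a-z0-9-]+)$ /folder/index.php?pge=$1 [L]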

Sylver




msg:4561490
 8:14 pm on Apr 4, 2013 (gmt 0)

Yes, "as long as you're aware that the passed parameter will end in a period, which your PHP script will need to strip".


Sorry, my bad. I noticed the dot while benchmarking and forgot you had already mentioned it.

...never a need to use expensive -f and -d tests.

What are -f and -d tests?

lucy24




msg:4561540
 10:55 pm on Apr 4, 2013 (gmt 0)

What are -f and -d tests?

Ignorance is bliss. Stay that way ;) The conventional CMS htaccess is built around a package that goes something like
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond {there's a third piece which I've forgotten}
RewriteRule (.*) /index.php?$1 [L]

This is supposed to mean "all requests for pages get quietly rewritten to a php page that deals with everything". But since the Rule isn't constrained to requests ending in / or .php, this test has to run on absolutely all requests all the time. And there are very few situations where a request for, say, an image file is answered by php. (Few but not zero. For example it's how the <noscript> version of piwik works: <img src blahblah piwik.php et cetera>)

Second time around, the path becomes:

folder/index.php

Oops, my bad. So if your RewriteRule has an opening anchor, it should only work once. If you forget the anchor, it will cycle forever.

One problem I noticed is that apparently the query seems to get lost during the redirect (I can access the $_GET vars on the testing server but not on the live server).

Uh-oh. Do you mean that the request already has a query string before it meets the rule whose target includes
?pge=$1
If so, you need to add the flag [QSA] for "query string append". The default behaviors are:
No ? in target = leave existing query, if any, untouched
? in target = delete the existing query, with optional addition of new one.
So [QSA] is only needed if you've added a new query.
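A quick sketch of the difference, using the rewrite rule already discussed in this thread:

# Without QSA, an incoming query string (e.g. ?utm_source=foo) is replaced by pge=$1:
RewriteRule ^([^/.]+)\.php$ /folder/index.php?pge=$1 [L]
# With QSA, the incoming query string is appended after pge=$1:
RewriteRule ^([^/.]+)\.php$ /folder/index.php?pge=$1 [QSA,L]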

phranque




msg:4561544
 11:06 pm on Apr 4, 2013 (gmt 0)

What are -f and -d tests?


the CondPattern of the RewriteCond directive can perform various file attribute tests:
http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond

g1smd




msg:4561560
 1:01 am on Apr 5, 2013 (gmt 0)

The -f and -d tests check whether the URL request resolves to a physical file or to a physical folder. These are very slow server filesystem read operations that should be avoided.

Lucy mentioned how they are used, the idea being that because a request for robots.txt resolves to a real file, the rewrite will not occur, but a request for any other file that does not test true for -f or -d will be rewritten to be handled by the index.php file.

This is a horrible method that should be avoided. The fact that several popular CMS packages use it is not an endorsement. They have gone for "easy code" with a huge performance hit.

phranque




msg:4561599
 5:40 am on Apr 5, 2013 (gmt 0)

Filesystem caching by the OS may reduce the practical inefficiency of the file and directory existence tests in some cases, but you shouldn't count on that unless you have tested it in your specific web server and OS configuration.

In any case, it's theoretically faster to use regex to test for the things you know first than to test for the things you don't know.
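A sketch of that idea (the extension list is an assumption; it would need to match the site's real static files):

# Let known static files fall straight through to the filesystem, no -f test needed:
RewriteRule \.(css|js|png|jpe?g|gif|ico|txt)$ - [L]
# Requests with no extension are pages, handled by the front controller:
RewriteRule ^([^/.]+)$ /folder/index.php?pge=$1 [L]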

Sylver




msg:4561617
 7:04 am on Apr 5, 2013 (gmt 0)

Lucy24:
Ignorance is bliss. Stay that way ;) The conventional CMS htaccess is built around a package that goes something like
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond {there's a third piece which I've forgotten}
RewriteRule (.*) /index.php?$1 [L]

This is supposed to mean "all requests for pages get quietly rewritten to a php page that deals with everything". But since the Rule isn't constrained to requests ending in / or .php, this test has to run on absolutely all requests all the time. And there are very few situations where a request for, say, an image file is answered by php. (Few but not zero. For example it's how the <noscript> version of piwik works: <img src blahblah piwik.php et cetera>)

I am a blissful individual sometimes. Fortunately, this is nearly the opposite of what I need: I need to redirect from existing files to folder/index.php

Lucy24:
Uh-oh. Do you mean that the request already has a query string before it meets the rule whose target includes
?pge=$1
If so, you need to add the flag [QSA] for "query string append". ...

Thanks! QSA is indeed the solution.

g1smd:
The -f and -d tests check whether the URL request resolves to a physical file or to a physical folder. These are very slow server filesystem read operations that should be avoided.

Lucy mentioned how they are used, the idea being that because a request for robots.txt resolves to a real file, the rewrite will not occur, but a request for any other file that does not test true for -f or -d will be rewritten to be handled by the index.php file.

This is a horrible method that should be avoided. The fact that several popular CMS packages use it is not an endorsement. They have gone for "easy code" with a huge performance hit.


I see. That's quite a performance penalty, but I can see why they would do that. From a CMS viewpoint it probably saves them from handling a lot of support queries that require telling the user to fiddle with their .htaccess when they find out they can't access their existing files any more.

phranque:
the CondPattern of the RewriteCond directive can perform various file attribute tests:
[httpd.apache.org...]

Thanks. I am in the process of reading the documentation but it's not an easy read. However, I think I am starting to figure it out thanks to the outstanding explanations I have received so far.
