Newb q: rewriting all urls on a site permanently

         

DiscoStu

8:43 pm on Feb 16, 2010 (gmt 0)

10+ Year Member



I have a question that concerns the fundamentals of rewrites vs redirects. Say I have a file structure on the server in which all the files are named

domain.com/file-name.php

but I'd rather all URLs be

domain.com/file-name.html

If I use the following:

RewriteEngine On

RewriteRule ^(.-)*(.+).html$ $1$2$3.php

When I type in domain.com/any-url.html it returns the file at domain.com/any-url.php, so the html version now resolves (when using up to 3 words in the URL, though it seems to work for more for some reason). But now I have two URLs pointing to each file (both php and html extensions), and I want ONLY the .html version to work. This causes me problems, as the html version doesn't actually exist. Is there a way to handle this with rewrites, or am I going about it the wrong way? Should I change all the files on the server to .html instead? Redirecting (301) to the html version doesn't work as there is no file at that location...

I guess I'm following the part where you type in one URL and actually display the content from a different URL (rewrite), but I'm not sure exactly how it works with duplicate content etc. If I have an extension on all the files on my server that I want to change (like php to html), can that be done with rewrites without creating duplicate URLs to the same file?

jdMorgan

10:49 pm on Feb 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't confuse URLs with files. It would almost be accurate to say that they are two utterly different things. In fact, they are not at all related -- except through the primary action of the server.

A URL "exists" as soon as you put it in a link on a page. It does not matter if that URL is valid or if it will resolve to a file somewhere, on some server. It has been defined and now exists.

A file exists as soon as you create it on or upload it to your server. It makes no difference if there is a URL associated with that filename on that server -- a link in other words. The file exists independent of the Web.

The server "associates" files with URLs. It has a 'default method' for doing this: Remove the protocol and domain from the requested URL, add the defined DocumentRoot for that domain, and use the result as a filepath. This is the basic function of a server: To translate requested URLs to filepaths.
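As an illustration of that default mapping, here is a minimal server-config sketch. The host name and the DocumentRoot path are placeholders chosen for the example, not values from any real setup.

```apache
# Hypothetical virtual host definition (main server config, not .htaccess)
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/home/site/public_html"
</VirtualHost>

# With no rewriting in effect, the default translation is:
#   requested URL:              http://www.example.com/file-name.php
#   strip protocol and domain:  /file-name.php
#   prepend DocumentRoot:       /home/site/public_html/file-name.php
```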

mod_rewrite is a way to change this default URL-to-filepath translation. It can do three main things:
1) Modify the URL-to-filepath translation.
2) Redirect a request for one URL to another URL by responding with a redirect response and terminating the current HTTP transaction.
3) "Forward" the client's request to another server -- either out on the Web, or inside the server's local network -- perhaps a back-end application server. This is the reverse-proxy pass-through function, which we'll set aside at this point for the sake of simplicity.
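As a rough sketch of those three functions, the fragment below shows one rule of each kind as it might appear in an .htaccess file in the document root. All file names and host names are hypothetical, and the third rule assumes mod_proxy and mod_proxy_http are loaded.

```apache
RewriteEngine On

# 1) Internal rewrite: the URL /catalog.html is served from the file /catalog.php.
#    The client never sees the new path.
RewriteRule ^catalog\.html$ /catalog.php [L]

# 2) External redirect: the server answers with a 301 and the new URL,
#    and the client then makes a second request.
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# 3) Reverse proxy: the request is passed through to a back-end server
#    (hypothetical host and port).
RewriteRule ^app/(.*)$ http://backend.example.internal:8080/$1 [P]
```

The order here simply follows the list; in a real file, external redirects would normally be placed before internal rewrites.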

Taking your example above, the correct steps to "change from html URLs to php files" without creating duplicate content would be:
1) Create a RewriteRule to internally rewrite requests for URLs ending in ".html" to filepaths ending in ".php"
2) To prevent duplicate content, create a second RewriteRule, this one to externally redirect only direct client requests for URLs ending in ".php" to URLs ending in ".html". Because of other requirements and basic organizational simplicity, this external redirect rule should --along with all other external redirects-- precede any internal rewrites. On a new site or on a site where the change is well-planned, this second rule isn't required. It would serve mainly to guarantee that none of the new .php filepaths would ever get listed as URLs due to coding errors or accidents, and that if they did, the redirect would signal search engines to quickly get rid of these "wrong" URLs.

So an external redirect is a URL-to-URL translation involving the client (the server sends a redirect response containing the new URL and terminates the current HTTP transaction, and the client then usually issues a second HTTP request, now using the new URL just provided by the server).

An internal rewrite is a (non-default) URL-to-filepath translation occurring solely inside the server.

Now note that I've used the somewhat-redundant phrases "internal rewrite" and "external redirect" and I've been careful to distinguish URLs "out on the Web" from filepaths inside the server. If you adopt this (or a similar) framework, your experience with mod_rewrite will be much simplified, less "mysterious" and/or stressful, and likely more successful.

You *will* see phrases like "internal redirects" in Apache error messages and logs. Just understand that this is a reference to an internal rewrite, and carry on... They also mis-spelled "referrer" as "referer" in the HTTP header specifications, and this error was carried forward into the %{HTTP_REFERER} server variable name -- No-one's perfect... :)

Jim

[edited by: jdMorgan at 1:30 am (utc) on Feb 17, 2010]

DiscoStu

12:16 am on Feb 17, 2010 (gmt 0)

10+ Year Member



Jim, thanks for taking the time to explain this. You're right, using the right terminology clears up a lot of the confusion. Actually what I wanted to do was make sure requests for .php files instead yield .html. So trying to follow your explanation (and this is what confused me a bit before):

RewriteEngine On

#redirect direct client requests for URLs ending in .php to URLs ending in ".html":

RewriteRule ^(.+).php$ $1\.html [R=301]

# internally rewrite requests for URLs ending in ".html" to filepaths ending in .php:

RewriteRule ^(.+).html $1\.php [L]


But this creates an infinite redirect loop? I also told you in a different thread that I had a problem with the URL being rewritten to include a bunch of folders from the server (home/site/public_html), and you suggested adding the protocol and domain to avoid it. But that turns it into a 302 redirect, right? I'm having this problem with rewrites too... but it's inconsistent: it works fine for some stuff and then inserts the folders for others. Sometimes it seems like certain rewrites/redirects still stay in effect after the .htaccess file has been updated.

I realize the above code is wrong, but I'm confused about why and how.

jdMorgan

12:34 am on Feb 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no provision in your redirect rule to detect "direct client requests," so you've created a loop. The rules countermand each other in all cases.

The literal periods in your patterns should be escaped. The literal periods in your substitutions should not.

Missing protocol, domain, and [L] flag on first rule.

Missing end-anchor on 2nd rule's pattern.
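To illustrate the points above about escaping periods and end-anchoring, here is a small comparison sketch for an .htaccess file in the document root. The example request paths in the comments are made up for illustration.

```apache
RewriteEngine On

# Loose pattern, left commented out: the unescaped "." matches any character
# and there is no "$" end-anchor, so it would also match requests such as
# "page.html.bak" or "pageXhtml-old".
# RewriteRule ^(.+).html /$1.php [L]

# Tight pattern: "\." matches only a literal period and "$" requires the
# path to end there. The period in the substitution is plain text and
# needs no escaping.
RewriteRule ^(.+)\.html$ /$1.php [L]
```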

# redirect only direct client requests for URLs ending in ".php" to URLs ending in ".html"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/?#\ ]+/)*[^.?#\ ]+\.php(\?[^#\ ]*)?(#[^\ ]*)?\ HTTP/
RewriteRule ^(.+)\.php$ http://www.example.com/$1.html [R=301,L]
#
# internally rewrite requests for URLs ending in ".html" to filepaths ending in ".php"
RewriteRule ^(.+)\.html$ /$1.php [L]
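
One way to see why the condition breaks the loop: %{THE_REQUEST} holds the request line exactly as the client sent it (for example "GET /file-name.php HTTP/1.1"), and it is not changed by internal rewrites. The annotated sketch below repeats the two rules with a deliberately simplified condition and traces two requests through them. The simplified pattern and the file names are assumptions made for illustration, not a replacement for the fuller pattern above, and the rules are assumed to live in an .htaccess file in the document root.

```apache
RewriteEngine On

# Simplified check (assumption): the original request line asked for a path
# ending in ".php", optionally followed by a query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?#\ ]*\.php[?\ ]
RewriteRule ^(.+)\.php$ http://www.example.com/$1.html [R=301,L]

RewriteRule ^(.+)\.html$ /$1.php [L]

# Trace 1: client sends "GET /file-name.html HTTP/1.1"
#   - the first rule's pattern does not match the ".html" path
#   - the second rule rewrites the request to /file-name.php
#   - when the rules run again on the rewritten path, THE_REQUEST still reads
#     "GET /file-name.html HTTP/1.1", so the condition fails and no redirect
#     is issued
#
# Trace 2: client sends "GET /file-name.php HTTP/1.1"
#   - the condition matches, so the first rule answers with a 301 to
#     http://www.example.com/file-name.html
#   - the client's follow-up request then proceeds as in trace 1
```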

Flush (delete) your browser cache to avoid seeing stale cached server responses.

Jim

DiscoStu

1:27 am on Feb 17, 2010 (gmt 0)

10+ Year Member



Thanks, I was pretty close on the RewriteRule, but I have to learn about RewriteCond, which I haven't really looked into specifically yet... Anyway, thanks a lot for pointing me in the right direction. I will continue reading.

jdMorgan

1:36 am on Feb 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



g1smd pointed out that in my long post above, I had reversed the php/html of your .html URL to .php file rewrite and .php URL to .html URL redirect requirements. I have edited the post to correct that error, so that it won't confuse subsequent readers.

As I said in that post, "nobody's perfect." :)

Jim