Forum Moderators: phranque


htaccess url rewrite from dynamic url to path and 301 redirect

Advice on my solution

         

Jelger

7:03 am on Apr 2, 2015 (gmt 0)

10+ Year Member



The goal is to redirect the dynamic URL to a path-style URL, and also to change the links Google has for the site to the new path-style URLs.

I have a simple website with an index.php as the controller. It parses the query string to determine what to send back to the browser.
The URL looked like this:
www.example.com/?page1
I wanted to rewrite this to www.example.com/page1 and send out a 301 for the new URL.

I came up with the following solution after a lot of reading and four days of trial and error:

RewriteEngine On
RewriteBase /

#1 rewrites url without www to an url with www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301]

#2 rewrites a query to a path and gives a 301 redirect
RewriteCond %{QUERY_STRING} page1
RewriteRule .? page1 [QSD,R=301,L]

#3 redirects path to query and prevents a loop by the END flag, L is definitely not working.
RewriteRule ^(page1)$ \?$1 [END]

The above works on my local test machine, as far as I can see in Firebug. Did I miss something, or make a novice's mistake?

After changing the .htaccess, the cleanup:
- Update the sitemap and resubmit it to Google (and possibly other search engines).
- Update the internal links: an internal link just sends the word 'page1' so that the PHP controller serves the content for page1. (I hope this is the way to go?)
- When Google and the other search engines have updated their links, delete the 301 redirect listed as #2.

Any advice is welcome.
best regards, Jelger Kingma

lucy24

8:11 am on Apr 2, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#1 rewrites url without www to an url with www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301]

This rule should be LAST of all your external redirects. The condition is best expressed as
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$

meaning "anything other than the exact form I prefer". The sole purpose of this rule is to pick up requests that have not already been redirected in the course of other rules; that's why it always comes last.

Incidentally, you don't need anchors in the pattern. They do no harm; they just aren't needed.

#2 rewrites a query to a path and gives a 301 redirect
RewriteCond %{QUERY_STRING} page1
RewriteRule .? page1 [QSD,R=301,L]

Eeuw. Are you on Apache 2.4? The new flag QSD will work, but there's honestly no reason not to stick with putting a ? at the end of the target, for a net savings of three bytes. The target also needs a full protocol-plus-hostname. And, finally, the pattern should be constrained to only those URLs where this query string can actually occur. If that's the root, the rule will look like this:
RewriteCond %{QUERY_STRING} page1
RewriteCond %{THE_REQUEST} \?
RewriteRule ^(index\.php)?$ http://www.example.com/page1? [R=301,L]

The second condition is essential to prevent infinite loops: It says "only execute this rule if the user asked for the form with query string". I added the (index\.php)? in case some annoying search engine makes a request in this form. May as well redirect them with the same rule.

#3 redirects path to query and prevents a loop by the END flag, L is definitely not working.
RewriteRule ^(page1)$ \?$1 [END]

Wtf?
:: detour to docs ::
Oh, I see. I suppose it must be needed for someone, somewhere, but here all you need is [L].
The infinite loop is not caused by this rule; it's caused by the lack of a %{THE_REQUEST} condition in the previous rule. The rewrite will look like this:
RewriteRule ^(page1)$ /index.php?$1 [L]
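To see why [L] is enough once the %{THE_REQUEST} condition is in place, here is a hand-traced walk-through (written as comments) of a request for http://www.example.com/page1. It assumes the rules sit in the site's root .htaccess, in the order #2, #1, #3, and in the forms given in this reply. In a per-directory context an internal rewrite makes Apache run the ruleset again on the new URL, and that second pass is where a loop can start; [END] stops the second pass altogether, which is presumably why it seemed to fix things.

```apache
# Hand-traced request: GET /page1 HTTP/1.1, Host: www.example.com
#
# Pass 1: URL-path "page1", query string empty
#   #2 skipped: pattern ^(index\.php)?$ does not match "page1"
#   #1 skipped: host is already www.example.com
#   #3 matches: internally rewritten to /index.php?page1, [L] ends this pass
#
# Pass 2 (ruleset re-run after the internal rewrite):
#   URL-path "index.php", query string "page1"
#   #2: pattern and %{QUERY_STRING} both match, BUT %{THE_REQUEST} is still
#       the original "GET /page1 HTTP/1.1", which contains no "?", so no redirect
#   #1 skipped, #3 skipped ("index.php" does not match ^(page1)$)
#   No rule fires: index.php is served with QUERY_STRING "page1"
#
# Without the THE_REQUEST condition, #2 would fire in pass 2 and send a 301
# back to /page1, and the browser would repeat the whole cycle: the loop.
```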

I assume the point of the capture and reuse is that on your real site there will be lots of options, like (page1|page2|morestuff|etcetera). Otherwise of course you'd just use literal text in the target, with no capture.
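With several pages, both rules could share one list of names. A sketch, where page2 and morestuff are hypothetical names standing in for the real ones. The %1 in the redirect target refers to the group captured in the last matched RewriteCond, so the QUERY_STRING condition has to come last. Anchoring the query-string pattern is my own choice, so that a query like page10 is not read as page1.

```apache
# Hypothetical page names: replace with the site's real ones.

# External redirect: /?page2 (or /index.php?page2) -> /page2
# Only when the visitor really asked for a URL with a query string.
RewriteCond %{THE_REQUEST} \?
RewriteCond %{QUERY_STRING} ^(page1|page2|morestuff)$
RewriteRule ^(index\.php)?$ http://www.example.com/%1? [R=301,L]

# Internal rewrite: /page2 is served by /index.php?page2
# (the address bar keeps showing /page2)
RewriteRule ^(page1|page2|morestuff)$ /index.php?$1 [L]
```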

Again, the order of these three rules should be #2 #1 #3.
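Put together in that order, the .htaccess would look roughly like this. It is a sketch for the single page1 case in a root-level .htaccess; the rule bodies are taken from this reply, and the [L] on the hostname rule is an addition (the original #1 had only [R=301]).

```apache
RewriteEngine On
RewriteBase /

# #2 External redirect: /?page1 or /index.php?page1 -> /page1
# THE_REQUEST condition: fire only if the visitor actually requested a "?" URL.
RewriteCond %{QUERY_STRING} page1
RewriteCond %{THE_REQUEST} \?
RewriteRule ^(index\.php)?$ http://www.example.com/page1? [R=301,L]

# #1 Canonical hostname: catches anything not already redirected above.
# Matches any Host other than www.example.com; the "?" after the group also
# exempts an empty Host, presumably so clients that never send a Host
# header are not redirected endlessly.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# #3 Internal rewrite: /page1 is served by index.php?page1
RewriteRule ^(page1)$ /index.php?$1 [L]
```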

Jelger

12:12 pm on Apr 2, 2015 (gmt 0)

10+ Year Member



Thanks a lot for the reply; it took some time to sink in, though.
I reordered and edited the rules as you advised and all works fine now.

At first I did not see why the full protocol-plus-hostname was needed, but this way there is only one 301 instead of two when the www is left off. (I guess it also gives search engines the complete URL of each page.)
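A hand-traced comparison for a request without the www, assuming the rules are ordered #2, #1, #3. The relative target "page1?" is hypothetical, shown only for contrast, and the trace assumes Apache's default UseCanonicalName Off, under which a relative redirect keeps the host name the visitor typed.

```apache
# Request: http://example.com/?page1
#
# Absolute target in #2 (http://www.example.com/page1?):
#   #2 fires -> 301 to http://www.example.com/page1   (one redirect, done)
#
# Hypothetical relative target "page1?" in #2 instead:
#   #2 fires -> 301 to http://example.com/page1       (host still lacks www)
#   browser requests http://example.com/page1
#   #1 fires -> 301 to http://www.example.com/page1   (second redirect)
```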

I also did not know that a ? at the end of the target gets rid of the query string; the double use of ? (regex quantifier in the pattern, query-string marker in the Apache target) is confusing.
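The same redirect written three ways makes the difference visible, for a request to /?page1. In the pattern ^(index\.php)?$ the ? is a regex quantifier ("optional"); in the target, a trailing ? ends the URL with an empty query string, which tells mod_rewrite not to carry the old one over. Only variant a) is active below; b) and c) are commented out for comparison.

```apache
# Conditions apply to the next RewriteRule, comments and blank lines included.
RewriteCond %{QUERY_STRING} page1
RewriteCond %{THE_REQUEST} \?

# a) Trailing "?" in the target (any Apache version):
#    query string dropped -> http://www.example.com/page1
RewriteRule ^(index\.php)?$ http://www.example.com/page1? [R=301,L]

# b) QSD flag (Apache 2.4 and later): same result
#RewriteRule ^(index\.php)?$ http://www.example.com/page1 [QSD,R=301,L]

# c) Neither: the original query string is appended again
#    -> http://www.example.com/page1?page1
#RewriteRule ^(index\.php)?$ http://www.example.com/page1 [R=301,L]
```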
Thanks again, was a real eye-opener.

Best regards, Jelger

phranque

6:32 pm on Apr 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, Jelger!