Redirect from non www and remove trailing slash and more!

         

maxed

12:15 am on Oct 17, 2008 (gmt 0)

10+ Year Member



Okay, so I'm trying to redirect from

http://example.com/ to http://www.example.com/
http://example.com/name1 to http://www.example.com/name1
so that all non-www urls go to www urls.

Also, I'm trying to redirect from
http://www.example.com/name1/ to http://www.example.com/name1
http://example.com/name1/ to http://www.example.com/name1 (no www and unwanted trailing slash)
so that unwanted trailing slashes on these URLs are 301 redirected to URLs without the slash.

Additionally, I want to organize my site so that categories are redirected to:
http://www.example.com/+london to http://www.example.com/index.php?r=london
http://www.example.com/+london+restaurant to http://www.example.com/index.php?r=london&c=restaurant
http://example.com/name1 to http://www.example.com/index.php?d=name1

After a lot of googling I came up with this rewrite code in my .htaccess:
------------------------------------------------------------------------
Options +FollowSymLinks
RewriteEngine On

# Redirect from non www to www version of domain
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,NC]

# Redirect to remove trailing slashes on extensionless URL requests
# If requested URL ending with slash does not resolve to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Externally redirect to remove trailing slash
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,NC]

# Redirect from non www to www version of domain
RewriteRule ^+([a-z0-9]+)+([a-z0-9]+)$ index.php?r=$1&c=$2 [NC,L]
RewriteRule ^+([a-z]+)$ index.php?a=$1 [NC,L]
RewriteRule ^([a-z0-9]+)$ index.php?d=$1 [NC,L]
-----------------------------------------------------------------------------

But it doesn't seem to work :(
Help please!

Thanks
max

[edited by: jdMorgan at 12:22 am (utc) on Oct. 17, 2008]
[edit reason] Please use example.com only. [/edit]

g1smd

12:59 am on Oct 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I follow what you want to do with most of the redirects, but the very last one should not be a redirect. The last one should be coded as a rewrite, so that users no longer see the dynamic URLs.

There are several threads that cover this, with example code, in the last few weeks. Check out this one... [webmasterworld.com...] as well as some later threads that build on that example.

There are comments in here that might also be useful: [webmasterworld.com...]

maxed

1:52 am on Oct 17, 2008 (gmt 0)

10+ Year Member



Could you clarify? I don't seem to understand what you mean... I'm trying to play around with some of the rules, but
----
RewriteRule ^+([a-z0-9]+)+([a-z0-9]+)$ index.php?r=$1&c=$2 [NC,L]
RewriteRule ^+([a-z]+)$ index.php?a=$1 [NC,L]
RewriteRule ^([a-z0-9]+)$ index.php?d=$1 [NC,L]
-----
seems to give 500 errors.

and http://example.com/name1/ is redirected to http://www.example.com/http:/www.example/name1

jdMorgan

3:03 am on Oct 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to take a look at the regular-expressions tutorial linked from our forum charter [webmasterworld.com], since several of those patterns are malformed, and there's little likelihood of correcting them by "playing around."

There is also a link to the mod_rewrite documentation, and some information about how to get the most from this forum.

Jim
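
For anyone hitting the same 500 errors: an unescaped "+" is a quantifier, so a pattern that begins with "^+" has nothing to repeat. Below is a quick check outside Apache, sketched in Python. Python's re module is close to, but not identical to, the PCRE engine Apache uses, so treat this as an approximation. The compiles() helper and the sample path are made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if the regex compiles, False if it is malformed."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Unescaped "+" right after "^" is a quantifier with nothing to repeat.
broken = r"^+([a-z0-9]+)+([a-z0-9]+)$"
# Escaping each plus sign turns it into a literal character.
fixed = r"^\+([a-z0-9]+)\+([a-z0-9]+)$"

print(compiles(broken))
print(compiles(fixed))

# With literal plus signs, the two parts of the path are captured separately.
m = re.match(fixed, "+london+restaurant")
print(m.groups() if m else None)
```

In a RewriteRule the two captured groups would be available as $1 and $2.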

maxed

3:24 am on Oct 17, 2008 (gmt 0)

10+ Year Member



Hi, thanks. I think I got everything to work now using the code below:

--
Options +FollowSymLinks
RewriteEngine On

RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

#get rid of trailing slashes
RewriteCond %{HTTP_HOST} ^(www.)?example\.com$ [NC]
RewriteRule ^(.+)/$ [%{HTTP_HOST}...] [R=301,L]

#page redirect
RewriteRule ^\+([a-zA-Z]+)$ index.php?r=$1 [L]
--

maxed

3:55 am on Oct 17, 2008 (gmt 0)

10+ Year Member



Everything seems to be working fine now, but I was just wondering: if I visit http://example.com/name/ the server does 2 rewrites, first removing the trailing slash and then adding the www. Is this okay? I get these HTTP headers:

#1 Server Response: http://example.com/name/
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Fri, 17 Oct 2008 03:50:50 GMT
Server: Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8b mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Location: http://www.example.com/name/
Content-Length: 419
Connection: close
Content-Type: text/html; charset=iso-8859-1
Redirect Target: http://www.example.com/name/

#2 Server Response: http://www.example.com/name/
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Fri, 17 Oct 2008 03:50:50 GMT
Server: Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8b mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Location: http://www.example.com/name
Content-Length: 422
Connection: close
Content-Type: text/html; charset=iso-8859-1
Redirect Target: http://www.example.com/name

#3 Server Response: http://www.example.com/name
HTTP Status Code: HTTP/1.1 200 OK
Date: Fri, 17 Oct 2008 03:50:50 GMT
Server: Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8b mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By: PHP/5.2.6
Connection: close
Content-Type: text/html

maxed

3:57 am on Oct 17, 2008 (gmt 0)

10+ Year Member



I'm sorry for triple posting, but I meant to say it adds www first, then removes the trailing slash... but I'm worried that it has to do a 301 redirect two times. Is this okay for the search engines?

g1smd

8:27 am on Oct 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is NOT OK. It is called a redirection chain.

You need to have the "/" rule first, and it needs to force www at the same time as fixing the slash, for all the URLs where it operates.

The general non-www to www rule comes next, and that will operate for all remaining non-www requests.

One major point. You tag the last item as "#page redirect". Please be aware that the last item is *not* a redirect. The last item is a rewrite.

It is crucial that you understand the difference between a redirect and a rewrite. The notes in the two linked examples should clarify that.

A redirect ends the current HTTP transaction, and forces the browser to make a new HTTP request for a new URL.

A rewrite takes a URL request and silently fetches the content from a different location in the server (different to what the URL hints at as being the real location) and does not reveal what that internal location actually is.

Now that you have a rewrite, you also need a redirect ("in the other direction") to stop the "real location" on the server being exposed as an external URL and therefore open to being indexed. The linked examples show the process in full.
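
To make the redirection chain concrete, here is a rough Python model (not Apache code). The rule functions and the follow() helper are invented for illustration. They only mimic "the first matching redirect wins, then the browser makes a new request".

```python
def www_rule(host, path):
    """Non-www to www redirect; leaves the path untouched."""
    if host == "example.com":
        return ("www.example.com", path)
    return None

def slash_rule_same_host(host, path):
    """Strip a trailing slash but keep whatever host was requested."""
    if len(path) > 1 and path.endswith("/"):
        return (host, path[:-1])
    return None

def slash_rule_force_www(host, path):
    """Strip a trailing slash and force www in the same redirect."""
    if len(path) > 1 and path.endswith("/"):
        return ("www.example.com", path[:-1])
    return None

def follow(rules, host, path):
    """Apply the first matching rule per request; return the list of 301 targets."""
    hops = []
    while True:
        for rule in rules:
            target = rule(host, path)
            if target is not None:
                host, path = target
                hops.append("http://" + host + path)
                break
        else:
            # No rule matched: the request is served, no further redirect.
            return hops

# www rule first, slash rule keeps the host: two hops, as in the headers above.
chained = follow([www_rule, slash_rule_same_host], "example.com", "/name/")
# Slash rule first and forcing www: one hop.
single = follow([slash_rule_force_www, www_rule], "example.com", "/name/")
print(chained)
print(single)
```

The second ordering is the one described above: the "/" rule comes first and fixes the hostname at the same time, and the general non-www rule only handles what is left.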

jdMorgan

3:29 pm on Oct 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest:

Options +FollowSymLinks
RewriteEngine On
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?r=([a-zA-Z]+)
RewriteRule ^index\.php$ http://www.example.com/+%1? [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.¦\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to index.php
RewriteRule ^\+([a-z]+)$ index.php?r=$1 [NC,L]

Be aware that "+" is not a valid character in a URL-path-part in HTTP. Therefore, this character will be encoded for transmission, and may appear as "%2B" in search results, bookmarks, and links and citations elsewhere on the Web. I strongly suggest that you use a different character -- One that is unreserved according to the HTTP/1.1 URI specification [faqs.org].

Your choices are limited to these characters:

- _ . ! ~ * ' ( )

but none of these are very attractive, and all would be prone to people typing them in incorrectly or overlooking them if used alone. Therefore, you may want to consider 'tagging' the friendly URLs with a short-but-unique string instead -- Something like "prod", "cat", "pg-" -- anything that won't be likely to conflict with existing or future URL-paths on the server, and that "makes sense" to you.

[added] Replace the broken pipe "¦" character above with a solid pipe character before use; Posting on this forum modifies the pipe character. [/added]

Jim

[edit] Corrections/additions as noted in post below. [/edit]

[edited by: jdMorgan at 3:41 am (utc) on Oct. 18, 2008]
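
As a rough illustration of what the second hostname condition catches, and of why the broken pipe has to be replaced, the pattern can be tried against a few Host values in Python. This is only an approximation of Apache's matching (re.IGNORECASE stands in for [NC]), and the sample hostnames are assumptions.

```python
import re

# Pattern from the second RewriteCond, with a solid pipe.
noncanonical_www = re.compile(r"^www\.example\.com(\.|\.?:[0-9]+)$", re.IGNORECASE)
# Same pattern as it appears when pasted with the broken pipe character.
pasted_broken = re.compile(r"^www\.example\.com(\.¦\.?:[0-9]+)$", re.IGNORECASE)

hosts = ["www.example.com", "www.example.com.", "www.example.com:80", "WWW.EXAMPLE.COM.:8080"]
for host in hosts:
    # True means the condition matches, so the redirect to the canonical host would fire.
    print(host, bool(noncanonical_www.search(host)), bool(pasted_broken.search(host)))
```

With the solid pipe, the plain canonical hostname is left alone, while a trailing dot or an appended port number triggers the redirect. With the broken pipe, none of these sample values match, because the character is treated as a literal.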

maxed

3:08 am on Oct 18, 2008 (gmt 0)

10+ Year Member



Thanks to both of you for taking the time to reply, it's really helped me a lot.
jdMorgan, I tried the code you specified and everything seems to work fine, except I think you mean the (pipe) character, not the broken pipe, in the line "^www\.example\.com(\.¦\.?:[0-9]+)$ [NC] "?

I think I shall use the characters "in-" instead of the + sign, as you suggested, to specify a category.

Also, this code seems to redirect www.example.com/index?r=name to www.example.com/in-name?r=name.
I just can't seem to figure out how to get rid of the "?r=name" that is added on to the URL.
The code I have now is:

--
Options +FollowSymLinks
RewriteEngine On
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?r=([a-zA-Z]+)
RewriteRule ^index\.php$ http://www.example.com/in-%1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.¦\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to index.php
RewriteRule ^in-([a-z]+)$ index.php?r=$1 [NC,L]
--

jdMorgan

3:34 am on Oct 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Blaghh!

Every time I post this rule, I forget to clear that query string. Change that one line to:


RewriteRule ^index\.php$ http://www.example.com/in-%1? [R=301,L]

That "?" will not appear in the redirected URL; It is a mod_rewrite token meaning "clear the query string by replacing it with a blank one."

[added] and yes, you must replace any broken pipe "¦" character you see in code posted here with a solid pipe character before use; Posting on this forum modifies the pipe characters. And I forgot to add this warning to my post as well... I must've needed a lot more coffee... :o [/added]

Jim

[edited by: jdMorgan at 3:38 am (utc) on Oct. 18, 2008]
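
A small sketch of why the condition on THE_REQUEST only catches direct client requests: THE_REQUEST holds the request line exactly as the browser sent it, so it still shows the friendly URL after an internal rewrite. The pattern below is the one from the RewriteCond, tried in Python as an approximation; the two request lines are made-up samples.

```python
import re

# Pattern from the RewriteCond on %{THE_REQUEST}.
the_request = re.compile(r"^[A-Z]{3,9}\ /index\.php\?r=([a-zA-Z]+)")

# A client asking for the script directly.
direct = the_request.match("GET /index.php?r=london HTTP/1.1")
# A client asking for the friendly URL (later rewritten internally to index.php).
friendly = the_request.match("GET /in-london HTTP/1.1")

print(direct.group(1) if direct else None)
print(friendly)

# The captured value plays the role of %1 in the redirect target.
if direct:
    print("http://www.example.com/in-" + direct.group(1))
```

Only the direct request matches, so only that one is redirected back to the friendly URL.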

maxed

4:07 am on Oct 18, 2008 (gmt 0)

10+ Year Member



Thanks for the fast reply. With your and g1smd's help, everything now seems to be working fine (*gasp*)...
I really don't know how you are able to tame the beast that is mod_rewrite; many sleepless nights and much coffee were required for me to get to this stage with mod_rewrite!

jdMorgan

5:06 am on Oct 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Easy... re-read the documentation once a month for ten years. Then write tens of thousands of lines of code, with at least 25% of it screwed-up. Between memorizing stuff, coding practice, and the resulting debugging practice, it gets a lot easier... ;)

"500-Server Error? Great! Only 499 to go, and then it'll probably work!"

Jim

g1smd

9:16 am on Oct 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For the first four years I was completely unaware of this stuff. For the next two years I had no clue what it was all about; it looked too difficult.

I then started experimenting. For the next year or two I saw "Error 500" more times than I saw working code - on about a 50 to 1 ratio (perhaps 20 to 1 on a good day). In the last year or two I have become a lot more happy with it, but it requires a lot of thought, and very careful testing - because it is totally unforgiving of typos and logic errors.

I often write code that appears to work, because it does what I want for the particular URL format that I am thinking of - then I test it with other URLs and it does something unintended.

It is good to be selective. If the code only needs to work for URLs in one folder, or only for certain types (html, images, etc), then code it to do only that.

.

RewriteRule ^in-([a-z]+)$ index.php?r=$1 [NC,L]

Your code responds to mixed-case URLs. You need to make sure the PHP script checks the incoming request and if the request was for "thispage" you serve the page, and if the request was for "ThiSpaGE" the PHP script sends HEADER "404 Not Found"; otherwise you still have a duplicate content issue to contend with.
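
A minimal sketch of that check, written in Python only to show the logic (the real check would live in the PHP script). The status_for() helper is hypothetical, and re.IGNORECASE stands in for the [NC] flag.

```python
import re

# Friendly-URL pattern from the rule above; re.IGNORECASE plays the role of [NC].
friendly = re.compile(r"^in-([a-z]+)$", re.IGNORECASE)

def status_for(path):
    """Hypothetical check: 200 only for the all-lowercase form, otherwise 404."""
    if friendly.match(path) is None:
        return 404
    return 200 if path == path.lower() else 404

for path in ["in-thispage", "in-ThiSpaGE", "IN-thispage"]:
    # All three match the case-insensitive pattern, but only one is served.
    print(path, bool(friendly.match(path)), status_for(path))
```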

maxed

10:06 am on Oct 18, 2008 (gmt 0)

10+ Year Member



Thanks for the replies. I think mod_rewrite is powerful but, as you say, very unforgiving.

It seems, as you say, that different cases are treated differently, but I'm wondering whether it would be better to have the script send a 404 or do a 301 redirect to the properly cased URL. After some reading, it seems that since I don't have access to httpd.conf it won't be worth using mod_rewrite to convert the string to lowercase and do a redirect. Am I right?

g1smd

10:37 am on Oct 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is your choice whether incorrect case results in 404 Not Found or whether you 301 to the correct URL.

You would do the case-checking and error handling all within your PHP script. The script would send the 301 or 404 HEADER out.

maxed

8:43 pm on Oct 18, 2008 (gmt 0)

10+ Year Member



Ah, I was just thinking of the RewriteMap feature that httpd.conf allows, but since I'm using shared hosting I think it will be just fine to let the script redirect by sending a 301 header to the lowercase URL.

g1smd

8:50 pm on Oct 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PHP really is the best place for doing that.

maxed

10:00 pm on Oct 18, 2008 (gmt 0)

10+ Year Member



Using some PHP I am able to redirect any URI containing uppercase characters to its lowercase form:
$uri = $_SERVER["REQUEST_URI"];
$lower = strtolower($uri);
// Redirect only if lowercasing changed the URI, i.e. it contained uppercase characters
if ($lower !== $uri)
{
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com" . $lower);
exit();
}

This works great. If, for example, I were to enter http://www.example.com/NAME it would be 301 redirected to http://www.example.com/name

But if someone were to enter http://example.com/NAME then there would be a redirection chain, where mod_rewrite does a 301 to http://www.example.com/NAME and then another 301 is done by the PHP script to http://www.example.com/name

I guess this is unavoidable, since there is no fast way to do this on shared hosting that does not allow me access to RewriteMap in httpd.conf. So I am wondering whether I should move to semi-dedicated hosting to be able to use httpd.conf, so that there aren't two 301s being performed, or just leave it as is?

g1smd

10:49 pm on Oct 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would add a third line to your PHP-generated redirect; one that sends HEADER ("Connection: close"); as well.

In the .htaccess file, you need to add an exception to the rule that does the non-www to www redirect, such that stuff with uppercase characters is not fixed by that rule. The rule will be skipped for those. Those requests will instead be passed through into the PHP module where the PHP code will fix the casing and - for those improperly-cased URLs - will also fix the www at the same time.

I would add an ELSE condition to your PHP code such that it sends a 404 response if it isn't doing a 301 and the requested URL still isn't valid in some way.
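
The decision logic described above might look roughly like this. It is sketched in Python purely to show the branches; the real code would be in the PHP script. KNOWN_PAGES and respond() are hypothetical names, and the page list is invented.

```python
# Hypothetical list of valid lowercase URIs the script knows how to serve.
KNOWN_PAGES = {"/name", "/in-london"}

def respond(uri):
    """Return (status, location) for a request that reached the script."""
    lowered = uri.lower()
    if lowered != uri:
        # Wrong case: fix the case and force www in a single 301.
        return (301, "http://www.example.com" + lowered)
    if uri not in KNOWN_PAGES:
        # The ELSE branch: not redirecting and still not a valid URL.
        return (404, None)
    return (200, None)

for uri in ["/NAME", "/name", "/nope"]:
    print(uri, respond(uri))
```

Because the 301 target always carries the www hostname, a request for http://example.com/NAME that skipped the hostname rule in .htaccess would be fixed in one hop.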

maxed

2:16 am on Oct 19, 2008 (gmt 0)

10+ Year Member



Thanks, I added the line to close the connection. Thanks for the PM, by the way.

I added this to my .htaccess but it seems to be giving me some strange results.
# Externally redirect to force canonical hostname
RewriteCond %{REQUEST_URI} ![^A-Z]+
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.¦\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

My reasoning was that the redirect should only take place if the URI did not contain any uppercase A-Z letters.

This works fine for http://example.com/name, which the rule leaves alone and which is then redirected by the PHP script. However, this rule also seems to stop http://example.com/ and every other URL from being redirected...

jdMorgan

2:29 am on Oct 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




RewriteCond %{REQUEST_URI} ![^A-Z]+

This says, "Execute the rule only if the requested URI does not contain any characters which are not uppercase letters." So, the rule won't be executed if the URL contains any lowercase letters, numbers, or punctuation...

You probably want to use


RewriteCond %{REQUEST_URI} ![A-Z]+

which says, "Execute the rule only if the requested URI does not contain any uppercase letters."

Jim
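
The difference between the two conditions can be checked quickly outside Apache. This is a Python approximation; cond_negated() is a made-up helper that mimics a RewriteCond with a leading "!", and the sample URIs are assumptions.

```python
import re

def cond_negated(pattern, uri):
    """Mimic a RewriteCond with a leading '!': true when the pattern does NOT match."""
    return re.search(pattern, uri) is None

for uri in ["/name", "/NAME", "/"]:
    # First column: ![^A-Z]+   Second column: ![A-Z]+
    print(uri, cond_negated(r"[^A-Z]+", uri), cond_negated(r"[A-Z]+", uri))
```

Every request URI starts with "/", which is not an uppercase letter, so the first form is never true. The second form is true for "/name" and "/" and false for "/NAME".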

maxed

2:49 am on Oct 19, 2008 (gmt 0)

10+ Year Member



That for some strange reason redirects http://example.com/in-NAME to http://www.example.com/index.php?r=NAME which redirects to http://www.example.com/index.php?r=name which then redirects to [example...]

Hmm... I was wondering whether it's better to do
RewriteCond %{REQUEST_URI} ^[a-z0-9-/]+$

This would mean the rule only takes place if the URI contains only lowercase a-z, numbers, dashes or slashes. Everything seems fine, but whenever I add /. to allow dots as well, other weird things happen...

maxed

4:47 am on Oct 19, 2008 (gmt 0)

10+ Year Member



Okay, when I apply:
--
# Externally redirect to force canonical hostname
RewriteCond %{REQUEST_URI} ^[.-/0-9a-z]+$
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.¦\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
--

It seems that
http://example.com/in-name is redirected to
http://www.example.com/index.php?r=name (*shock*), which then redirects to http://www.example.com/in-name .

If I remove the "." from the ^[.-/0-9a-z]+$ then everything seems fine, but then any URI that contains a "." is simply ignored...
I think mod_rewrite is slowly tearing away my sanity...
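
One possible explanation, offered as a guess rather than a diagnosis: inside a character class, ".-/" is read as a range from "." to "/", so the hyphen itself is no longer in the class. A quick Python check of the two spellings (an approximation of Apache's regex engine; the sample URIs are assumptions):

```python
import re

# ".-/" is parsed as a range from "." to "/", which contains only "." and "/".
with_range = re.compile(r"^[.-/0-9a-z]+$")
# Putting the hyphen first makes it a literal hyphen.
hyphen_first = re.compile(r"^[-./0-9a-z]+$")

for uri in ["/in-name", "/index.php", "/name"]:
    print(uri, bool(with_range.match(uri)), bool(hyphen_first.match(uri)))
```

If that is what happens here, "/in-name" fails the condition and skips the hostname redirect, gets rewritten internally to index.php, and "/index.php" then passes the condition on the next pass through the rules, which would expose the script URL as described.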

-edit:
My host has agreed to see whether he can edit httpd.conf for me. If he can, I believe that would be better, as none of my scripts would need extra code to deal with case issues. I think it would be wise to take care of the trailing slash and non-www URLs there as well, so that there isn't a redirection chain. I'm wondering if the following code would be good for what I need:

--
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) http://www.example.com${lc:$1} [R=301,L]
--

I'm just wondering how I would take care of any trailing slash... How would I be able to check whether there is a trailing slash and, if there is, remove it?

maxed

7:23 am on Oct 19, 2008 (gmt 0)

10+ Year Member



I'm wondering if this would work:
--
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^([^/]+)/?$ http://www.example.com/{lc:$1} [R=301,L]
--

g1smd

8:00 am on Oct 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The double redirect you describe happens when the rewrite is listed before the redirect in the .htaccess file.

The rewrite *changes* the URL seen by Mod_Rewrite and that "new" URL is then processed by the redirect, and exposes the internal filepath.

There may be another reason, but that is mostly the issue whenever it comes up in conversation here. Always list all redirects from specific to general, and list all redirects before all rewrites.

Additionally, if you are using RewriteRule, then make sure that none of your other code contains a plain Redirect or RedirectMatch. If you use RewriteRule for one, then use it for all. The others come from a different Apache module, and when you use both together you cannot guarantee the order they are processed in.

maxed

8:27 am on Oct 19, 2008 (gmt 0)

10+ Year Member



Hmm, I'm not so sure which rule exactly causes this madness; however, here is my current .htaccess:

--
Options +FollowSymLinks
RewriteEngine On
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?r=([a-z]+)
RewriteRule ^index\.php$ http://www.example.com/in-%1? [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{REQUEST_URI} ^[-/0-9a-z]+$
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.¦\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to index.php
RewriteRule ^in-([a-z]+)$ index.php?r=$1 [NC,L]
--

The exception I added was
RewriteCond %{REQUEST_URI} ^[.-/0-9a-z]+$

which I thought would cause the redirect to take place only if the URI contained lowercase letters...

maxed

8:22 pm on Oct 19, 2008 (gmt 0)

10+ Year Member



It seems that I am lucky and my host has added this to my httpd.conf:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^([^/]+)/?$ http://www.example.com/{lc:$1} [R=301,L]

However, this doesn't seem to work as expected; the rule doesn't seem to redirect anything to lowercase...

jdMorgan

8:48 pm on Oct 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That code will only redirect to the lowercase-translated URL if the requested URL is in the top-level directory; No subdirectory URLs will be lowercased.

Also, there is a "$" missing in the substitution, so the RewriteMap will likely never be invoked:

I'd suggest:


RewriteEngine on
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^/(.+)$ http://www.example.com/${lc:$1} [R=301,L]

Jim
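
A rough way to compare the two patterns outside Apache, assuming the rule sits in httpd.conf at server or virtual-host level, where the path being matched starts with a slash. Python's str.lower() stands in for the int:tolower map, and lowercase_target() is a hypothetical helper.

```python
import re

top_level_only = re.compile(r"^([^/]+)/?$")  # pattern from the host's version
any_depth = re.compile(r"^/(.+)$")           # pattern from the suggestion above

def lowercase_target(pattern, url_path):
    """Return the redirect target, or None if the rule would not fire."""
    if re.search(r"[A-Z]", url_path) is None:  # RewriteCond %{REQUEST_URI} [A-Z]
        return None
    m = pattern.match(url_path)
    if m is None:
        return None
    return "http://www.example.com/" + m.group(1).lower()

for path in ["/Name", "/Dir/Page", "/dir/page"]:
    print(path, lowercase_target(top_level_only, path), lowercase_target(any_depth, path))
```

With a leading slash on the path, the first pattern never matches in this sketch, while the second one lowercases paths at any depth and leaves already-lowercase requests alone.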

maxed

9:15 pm on Oct 19, 2008 (gmt 0)

10+ Year Member



But that may not remove any trailing slashes from URLs?
How about:
RewriteEngine on
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^/(.+)/?$ http://www.example.com/${lc:$1} [R=301,L]

Hmm... would that be fine?
-edit
On second thought, I guess ^/(.+)/? would still not be great, as it would stop at the first subdirectory as well?

[edited by: maxed at 9:31 pm (utc) on Oct. 19, 2008]
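
On the trailing-slash question, a quick experiment in Python (an approximation of Apache's regex engine) suggests the issue is greediness rather than subdirectories: "(.+)" matches slashes too, and being greedy it also swallows the final slash, so the optional "/?" matches nothing. The second pattern below is only one possible alternative and an assumption on my part, not something taken from this thread.

```python
import re

# Pattern from the post above: greedy (.+) followed by an optional slash.
greedy = re.compile(r"^/(.+)/?$")
# Hypothetical alternative: the captured part must end in a non-slash character.
strip_slash = re.compile(r"^/(.*[^/])/?$")

for path in ["/Dir/Page", "/Dir/Page/"]:
    print(path, greedy.match(path).group(1), strip_slash.match(path).group(1))
```

With the greedy pattern the captured value for "/Dir/Page/" still ends in a slash; with the alternative it does not.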
