
Forum Moderators: Ocean10000 & incrediBILL & phranque


Canonicalization: best code to redirect no-www to www & index to /

8:00 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Canonicalization: is this the best code to redirect no-www to www & index to root? I've read around the web for a couple days now and this is the best code I can find. Should I include Options +FollowSymLinks? And is this the leanest this can be to accomplish all it does? Thanks!

[size=2]Options +FollowSymLinks
RewriteEngine On
# redirect index.htm and index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.mydomain.com/$1 [R=301,L]

# redirect no-www to www.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L][/size]
8:02 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Should be without size like this actually...

Options +FollowSymLinks
RewriteEngine On
# redirect index.htm and index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ [mydomain.com...] [R=301,L]

# redirect no-www to www.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ [mydomain.com...] [R=301,L]
8:37 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Use example.com in this forum to suppress URL auto-linking.

There are a lot of errors in there, not least the .* in the middle of the index RewriteCond pattern and at the beginning of the index RewriteRule pattern.

The non-www redirect fails to redirect many requests, such as those with port numbers and others.

The correct code has been published several times so far this month. It's a regular question here.
8:46 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



I've been reading up on this for days, including all the threads in this forum. The .* seems pretty typical, I thought. I tested it and it works. Here is what I have now...

# Engine on; only needed once
RewriteEngine On
# 301 permanent redirect index.html to root (including subdirectories)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ /$1 [R=301,L]
# 301 permanent redirect from non-www to www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
8:47 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



FYI, the index one targets all index files in all subdirectories too.
8:49 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Never use .* at the beginning or in the middle of a RegEx pattern. It causes hundreds of "back off and retry" trial match attempts.

If the hostname is missing from the index redirect, non-www requests will cause an unwanted double-redirect chain.

Errors in the HOST condition in the non-www rule prevent many non-canonical hostname requests from being properly redirected.

[edited by: g1smd at 8:52 pm (utc) on Oct 15, 2012]

8:52 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



OK, how do I target all index files then?
8:53 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



/.*index
=>
/([^/]+/)*index


^(.*)index
=>
^(([^/]+/)*)index
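A minimal sketch of what this substitution changes, using Python's re module as a stand-in for Apache's regex engine (an assumption; the sample paths and the helper name are made up for illustration). Apart from avoiding the retries, the stricter pattern only matches when "index" is the whole filename:

```python
import re

# Patterns as they would appear in a per-directory RewriteRule
# (in .htaccess context the path has no leading slash).
loose = re.compile(r'^(.*)index\.html?$')
strict = re.compile(r'^(([^/]+/)*)index\.html?$')

def captured_prefix(pattern, path):
    """Return what $1 would hold, or None when the pattern does not match."""
    m = pattern.search(path)
    return m.group(1) if m else None

# Hypothetical sample paths, not taken from the thread.
for path in ['index.html', 'a/b/index.htm', 'a/myindex.html']:
    print(path, '|', repr(captured_prefix(loose, path)),
          '|', repr(captured_prefix(strict, path)))
```

With these samples the loose pattern also matches a/myindex.html and captures "a/my", while the stricter pattern leaves that file alone.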

[edited by: g1smd at 8:55 pm (utc) on Oct 15, 2012]

8:55 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Is this the one by jdmorgan that you are referring to? [webmasterworld.com...]
8:56 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



There are so many versions floating around. I'm trying to lock down the best one.
8:59 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Is this the best one? It's the cleanest, with no errors, posted by jdmorgan...

# Externally redirect requests for index.html in any directory to "/" in that directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect requests for *all* non-canonical hostnames to canonical hostname,
# including case errors and appended FQDN indicator and/or port numbers.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
9:01 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



# Redirect index.html and .htm to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]


# Redirect non-canonical to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

[edited by: g1smd at 9:15 pm (utc) on Oct 15, 2012]

9:04 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I predict this question will be asked again within the next ten days. :)
9:11 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member



I predict this question will be asked again within the next ten days. :)


Not by meeeeee. I just copied it into a text file and saved it on my HDD :)

Even though there are many examples, they can be difficult to find at times. I didn't in the past, but now every time you post a nugget I put it into my library. I'm tapping your extensive knowledge, something the search engines have not been able to duplicate. That's why it's hard to find this stuff sometimes.
9:14 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I'm finding that we have posted the right answer so many times, that Google now treats them all as Duplicate Content and prefers to show all the various versions of wrong answers instead.
9:46 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Awesome thank you! So I can stop looking now? That's the single cleanest version on the web?

Could you please indulge me with a couple of questions? Yours differs from jdmorgan's a little. 1) Why did you add a ? to html?\ HTTP/, another ? to index\.html?$, and another after example.com)? 2) Why say it like this, !^(www\.example\.com, when you can seemingly say it like this, ^example\.com$ 3) And finally, why did you exclude the ^ before (.*)
9:51 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



1a) The \.html? pattern matches both .html and .htm requests. Redirect both. I actually use \.(html?|php)$ or \.(html?|php[45]?)$


1b) The ? after hostname paired with !^ and $ stops an infinite redirect loop for pure HTTP/1.0 requests. Pure HTTP/1.0 requests do not include a host header.
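A small sketch of the empty-host case, assuming Python's re behaves like Apache's regex engine for these patterns and that a missing Host header leaves HTTP_HOST as an empty string (the helper name is hypothetical):

```python
import re

def should_redirect(host_pattern, host):
    """Mimic a negated RewriteCond: redirect when the pattern does NOT match."""
    return re.search(host_pattern, host) is None

with_optional = r'^(www\.example\.com)?$'   # pattern from the ruleset above
without_optional = r'^www\.example\.com$'   # same pattern without the ?

# Assumption: a pure HTTP/1.0 request without a Host header gives an empty host.
print(should_redirect(with_optional, ''))     # False: the request is left alone
print(should_redirect(without_optional, ''))  # True: redirected
```

Without the ?, such a client would be redirected, would request the new URL again without a Host header, and would be redirected once more.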

2) The ^example\.com$ pattern doesn't allow a redirect for requests for example.com:80 or www.example.com:80 nor many other non-canonical hostnames.

The code above redirects all requests when the requested hostname is not "exactly" www.example.com all in lower case.
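To make the difference concrete, here is a rough comparison of the two host conditions, again using Python's re as a stand-in for Apache's engine (an assumption). The list of hostnames and the function names are illustrative only:

```python
import re

old_cond = re.compile(r'^example\.com$', re.IGNORECASE)  # original, with [NC]
new_cond = re.compile(r'^(www\.example\.com)?$')         # used with a leading !

def old_redirects(host):
    """The old rule fires only when the pattern matches."""
    return old_cond.search(host) is not None

def new_redirects(host):
    """The new rule is negated: it fires when the pattern does not match."""
    return new_cond.search(host) is None

# Hypothetical Host header values.
hosts = ['example.com', 'example.com:80', 'www.example.com:80',
         'WWW.EXAMPLE.COM', 'www.example.com.', 'www.example.com']

for host in hosts:
    print(f'{host:22} old={old_redirects(host)!s:5} new={new_redirects(host)}')
```

In this sketch the old condition fires only for the bare example.com, while the negated condition fires for every entry except the exact lower-case www.example.com.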

3) The (.*) is greedy and captures "everything" so you don't need to tell it to start at the beginning and continue to the end. It does that on its own.
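A quick check of that claim, sketched with Python's re (assumed to behave like Apache's engine here; the sample paths are made up and contain no newlines):

```python
import re

def capture(pattern, path):
    """Return the first capture group, or None when there is no match."""
    m = re.search(pattern, path)
    return m.group(1) if m else None

# Hypothetical request paths as seen by a per-directory RewriteRule.
paths = ['', 'page.html', 'dir/sub/page.php']

for path in paths:
    print(repr(capture(r'(.*)', path)), repr(capture(r'^(.*)$', path)))
```

For each of these paths both forms capture the same text, which is the whole path.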

[edited by: g1smd at 9:59 pm (utc) on Oct 15, 2012]

9:58 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Lol. I'll just have to take your word for it. I'm a CSS guy. This stuff is Greek to me. Thanks for the explanation though. Also, I'd like to use this on another site of mine where every page has a php extension. How does that look? Do I just replace the html with php and remove the question marks? What if I wanted it more universal for both? Thanks a ton!
10:01 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Pick whichever:

\.php$


\.(html?|php)$


\.(html?|php[45]?)$
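A short sketch of which filenames each of the three options would match, using Python's re as an approximation of Apache's engine (an assumption; the filenames and labels are illustrative):

```python
import re

# The three pattern endings offered above.
patterns = {
    'php only': r'\.php$',
    'html, htm, php': r'\.(html?|php)$',
    'plus php4/php5': r'\.(html?|php[45]?)$',
}

# Hypothetical index filenames.
names = ['index.html', 'index.htm', 'index.php', 'index.php4', 'index.php5']

def matching_names(pattern):
    """List the sample filenames whose ending matches the pattern."""
    return [name for name in names if re.search(pattern, name)]

for label, pattern in patterns.items():
    print(label, '->', matching_names(pattern))
```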
10:11 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



You are a wealth of knowledge, my friend. So which is better, just php or php45? Also, can I safely change this part to read as follows with no problems? RewriteRule ^(([^/]+/)*)index\.html?$ /$1 [R=301,L] And I assume the same abbreviation can't be done for the non-www to www statement.
10:19 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Do you get requests for index.php4 or for index.php5 ever? If so, include those as options using [45]? here.

If you omit the hostname from the rule target of the redirect some requests will trigger an unwanted multiple step redirection chain.

Each rule shouldn't fix just one issue, it must fix multiple issues...

First rule: redirect index requests but make sure the user ends up on www. in this rule not in a second redirect.

Second rule: redirect requests to www.example.com when original request wasn't for "exactly" that hostname.

The [L] flag stops rule processing once "this" rule matches. Every rule should have this flag.
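The redirect chain can be illustrated with a simplified simulation of the two rules. This is a sketch in Python, not Apache's actual processing: it assumes that a redirect target without a hostname sends the client back to whatever host it originally asked for, and all function names are hypothetical.

```python
import re

INDEX_RE = re.compile(r'^(([^/]+/)*)index\.html?$')
CANONICAL_HOST_RE = re.compile(r'^(www\.example\.com)?$')
CANONICAL_HOST = 'www.example.com'

def next_hop(host, path, index_target_has_host):
    """Return the (host, path) a request is redirected to, or None."""
    m = INDEX_RE.search(path)
    if m:  # first rule: index redirect
        # Assumption: a host-less target keeps the requested hostname.
        new_host = CANONICAL_HOST if index_target_has_host else host
        return new_host, m.group(1)
    if not CANONICAL_HOST_RE.search(host):  # second rule: hostname redirect
        return CANONICAL_HOST, path
    return None

def count_redirects(host, path, index_target_has_host):
    """Follow redirects until the request is served; return the hop count."""
    hops = 0
    while hops < 10:  # safety limit for the sketch
        hop = next_hop(host, path, index_target_has_host)
        if hop is None:
            break
        host, path = hop
        hops += 1
    return hops

print(count_redirects('example.com', 'dir/index.html', False))  # 2
print(count_redirects('example.com', 'dir/index.html', True))   # 1
```

With the hostname in the index rule's target, the non-www index request reaches its final URL in a single redirect.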
10:30 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Awesome. Many thanks. Have a good one! I have a good guess as to why you're answering this question so often lately: the Google Panda update. Everyone else and I are scrambling to find fixes for a sudden drop in page rank.
10:32 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This question has come up almost every week for the last ten years.




<--- Certainly boosted that number by a few thousand. :)
10:51 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



# Redirect index.html, .htm and .php to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]


# Redirect non-canonical to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
10:54 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Ha, that's funny. Lots of messages. So the php ones should look like this...

# Redirect index.html,.htm, and php to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)$\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

and...

# Redirect index.html,.htm, and php to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php$\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]

Seems like a no-brainer. But I won't put it past me.

[edited by: ewwatson at 11:09 pm (utc) on Oct 15, 2012]

10:56 pm on Oct 15, 2012 (gmt 0)

5+ Year Member



Ahh, we posted at the same time. No $ sign after the php?
10:59 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Edit the comments. They don't match the code.

The $ sign in the condition is wrong in both rulesets.
11:00 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



the .* seems pretty typical I thought.

Yup: In questions, not in answers.

I'm finding that we have posted the right answer so many times, that Google now treats them all as Duplicate Content and prefers to show all the various versions of wrong answers instead.

Hee. This is where I usually quote Tolstoy: Correct answers are all alike. Incorrect answers are all incorrect in their own way.

Do you need all that, uhm, stuff when looking at %{THE_REQUEST}? I just say index\.html and that's it. I guess there's a remote chance the words "index.html" could show up in a query string in some context that doesn't cause the requested URL itself to say "index.html" but, uhm, trying to think how that might work will simply give me a headache.

every week for the last ten years

Memo to self: cite number 520, not 3650, when needed.

no $ sign after the php?

$ means "at the very end of the utterance", as ^ means "at the very beginning", so they have no meaning anywhere else. You might be thinking of \b but that's not necessary here.
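A small sketch of why the extra $ breaks the condition, using Python's re as a stand-in for Apache's engine (an assumption; the request line is a made-up example):

```python
import re

# Hypothetical request line as it would appear in THE_REQUEST.
request_line = 'GET /index.php HTTP/1.1'

with_dollar = r'^[A-Z]{3,9}\ /([^/]+/)*index\.php$\ HTTP/'
without_dollar = r'^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/'

def condition_matches(pattern, line):
    """True when the RewriteCond pattern would match the request line."""
    return re.search(pattern, line) is not None

print(condition_matches(with_dollar, request_line))     # False
print(condition_matches(without_dollar, request_line))  # True
```

The $ demands the end of the string, yet the pattern then asks for " HTTP/" to follow, so the version with $ can never match and the rule never fires.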

On the plus side, it only took me about 12 hours to figure out why one small innocuous SSI refused to play nice on the live site while it worked fine on MAMP, and other more complicated includes-- including the newly constructed php ones-- behaved identically in both places. And I didn't even have to ask anyone.
11:08 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



In the index redirect, all the "stuff" matches specific requests.
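One visible difference between the bare and the full condition pattern, sketched with Python's re (assumed to approximate Apache's engine; the request lines are hypothetical and this is not an exhaustive comparison):

```python
import re

bare = re.compile(r'index\.html')
full = re.compile(r'^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/')

# Hypothetical request lines as they would appear in THE_REQUEST.
lines = [
    'GET /dir/index.html HTTP/1.1',
    'GET /search.php?ref=index.html HTTP/1.1',
    'GET /dir/ HTTP/1.1',
]

def matches(pattern, line):
    """True when the pattern is found in the request line."""
    return pattern.search(line) is not None

for line in lines:
    print(f'{line:42} bare={matches(bare, line)!s:5} full={matches(full, line)}')
```

For these samples the bare pattern also matches the query-string case, while the full pattern matches only the direct request for the index file.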
11:48 pm on Oct 15, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



:) I mean the beginning stuff ^[A-Z]{3,9}\ /([^/]+/)* not the \.(html?|php) part.
This 77-message thread spans 3 pages.