Apache Web Server Forum

This 77 message thread spans 3 pages.
Canonicalization: best code to redirect no-www to www & index to /
ewwatson




msg:4508253
 8:00 pm on Oct 15, 2012 (gmt 0)

Canonicalization: is this the best code to redirect no-www to www & index to root? I've read around the web for a couple days now and this is the best code I can find. Should I include Options +FollowSymLinks? And is this the leanest this can be to accomplish all it does? Thanks!

[size=2]Options +FollowSymLinks
RewriteEngine On
# redirect index.htm and index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.mydomain.com/$1 [R=301,L]

# redirect no-www to www.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L][/size]

 

ewwatson




msg:4508254
 8:02 pm on Oct 15, 2012 (gmt 0)

Should be without size like this actually...

Options +FollowSymLinks
RewriteEngine On
# redirect index.htm and index.html to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ [mydomain.com...] [R=301,L]

# redirect no-www to www.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ [mydomain.com...] [R=301,L]

g1smd




msg:4508261
 8:37 pm on Oct 15, 2012 (gmt 0)

Use example.com in this forum to suppress URL auto-linking.

There are a lot of errors in there, not least the .* in the middle of the index RewriteCond pattern and at the beginning of the index RewriteRule pattern.

The non-www redirect fails to redirect many requests, such as those with port numbers and others.

The correct code has been published several times so far this month. It's a regular question here.

ewwatson




msg:4508264
 8:46 pm on Oct 15, 2012 (gmt 0)

I've been reading up on this for days, including all the threads in this forum. The .* seems pretty typical I thought. I tested it and it works. Here is what I have now...

# RewriteEngine On is only needed once
RewriteEngine On
# 301 permanent redirect index.html to root (including subdirectories)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ /$1 [R=301,L]
# 301 permanent redirect from non-www to www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

ewwatson




msg:4508268
 8:47 pm on Oct 15, 2012 (gmt 0)

FYI, the index rule targets all index files in all subdirectories too.

g1smd




msg:4508269
 8:49 pm on Oct 15, 2012 (gmt 0)

Never use .* at the beginning or in the middle of a RegEx pattern. It causes hundreds of "back off and retry" trial match attempts.

If the hostname is missing from the index redirect, non-www requests will cause an unwanted double-redirect chain.

Errors in the HOST condition in the non-www rule prevent many non-canonical hostname requests from being properly redirected.

[edited by: g1smd at 8:52 pm (utc) on Oct 15, 2012]

ewwatson




msg:4508273
 8:52 pm on Oct 15, 2012 (gmt 0)

OK, how do I target all index files then?

g1smd




msg:4508274
 8:53 pm on Oct 15, 2012 (gmt 0)

/.*index => /([^/]+/)*index

^(.*)index => ^(([^/]+/)*)index
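
The difference between the two patterns is not only speed. Here is a minimal Python sketch, assuming Python's re module behaves like Apache's regex engine for these simple patterns; the sample request lines are made up for illustration. It compares the original condition pattern with the revised one.

```python
import re

# Condition patterns from the thread, applied to sample THE_REQUEST values.
LOOSE = re.compile(r"^[A-Z]{3,9}\ /.*index\.html?\ HTTP/")
TIGHT = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/")

# Hypothetical request lines, in the form THE_REQUEST would hold them.
SAMPLES = [
    "GET /index.html HTTP/1.1",
    "GET /blog/2012/index.htm HTTP/1.1",
    "GET /myindex.html HTTP/1.1",
    "GET /blog/page.html HTTP/1.1",
]


def matches(pattern, request_line):
    """Return True if the pattern is found anywhere in the request line."""
    return pattern.search(request_line) is not None


for line in SAMPLES:
    print(f"{line!r}: loose={matches(LOOSE, line)} tight={matches(TIGHT, line)}")
```

The loose pattern also accepts /myindex.html, which the original RewriteRule would then redirect to /my. The tight pattern only accepts a file named exactly index.html or index.htm at the end of the path.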

[edited by: g1smd at 8:55 pm (utc) on Oct 15, 2012]

ewwatson




msg:4508276
 8:55 pm on Oct 15, 2012 (gmt 0)

Is this the one by jdmorgan that you are referring to? [webmasterworld.com...]

ewwatson




msg:4508277
 8:56 pm on Oct 15, 2012 (gmt 0)

There are so many versions floating around. I'm trying to lock down the best one.

ewwatson




msg:4508278
 8:59 pm on Oct 15, 2012 (gmt 0)

Is this the best one? It looks like the cleanest, error-free version, posted by jdmorgan...

# Externally redirect requests for index.html in any directory to "/" in that directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect requests for *all* non-canonical hostnames to canonical hostname,
# including case errors and appended FQDN indicator and/or port numbers.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

g1smd




msg:4508280
 9:01 pm on Oct 15, 2012 (gmt 0)

# Redirect index.html and .htm to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]


# Redirect non-canonical to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

[edited by: g1smd at 9:15 pm (utc) on Oct 15, 2012]

g1smd




msg:4508282
 9:04 pm on Oct 15, 2012 (gmt 0)

I predict this question will be asked again within the next ten days. :)

SevenCubed




msg:4508284
 9:11 pm on Oct 15, 2012 (gmt 0)

I predict this question will be asked again within the next ten days. :)


Not by meeeeee. I just copied it into a text file and saved it on my HDD :)

Even though there are many examples, they can be difficult to find at times. I didn't in the past, but now every time you post a nugget I put it into my library. I'm tapping your extensive knowledge. That is something the search engines have not been able to duplicate, which is why this stuff is sometimes hard to find.

g1smd




msg:4508285
 9:14 pm on Oct 15, 2012 (gmt 0)

I'm finding that we have posted the right answer so many times, that Google now treats them all as Duplicate Content and prefers to show all the various versions of wrong answers instead.

ewwatson




msg:4508290
 9:46 pm on Oct 15, 2012 (gmt 0)

Awesome thank you! So I can stop looking now? That's the single cleanest version on the web?

Could you please indulge me with a couple of questions? Yours differs from jdmorgan's a little. 1) Why did you add a ? in html?\ HTTP/, another ? in index\.html?$, and another after example\.com)? 2) Why write !^(www\.example\.com when you can seemingly just write ^example\.com$? 3) And finally, why did you leave out the ^ before (.*)?

g1smd




msg:4508291
 9:51 pm on Oct 15, 2012 (gmt 0)

1a) The \.html? pattern matches both .html and .htm requests. Redirect both. I actually use \.(html?|php)$ or \.(html?|php[45]?)$

1b) The ? after hostname paired with !^ and $ stops an infinite redirect loop for pure HTTP/1.0 requests. Pure HTTP/1.0 requests do not include a host header.

2) The ^example\.com$ pattern doesn't allow a redirect for requests for example.com:80 or www.example.com:80 nor many other non-canonical hostnames.

The code above redirects all requests when the requested hostname is not "exactly" www.example.com all in lower case.

3) The (.*) is greedy and captures "everything" so you don't need to tell it to start at the beginning and continue to the end. It does that on its own.
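
Points 1b and 2 can be checked outside Apache. The following is a minimal Python sketch, not Apache itself: the function names and the sample hostnames are made up for illustration, and Python's re module is assumed to treat these patterns the same way Apache does. Each function answers "would this condition trigger the redirect for this Host value?".

```python
import re

# Host patterns discussed in the thread.
OLD = re.compile(r"^example\.com$", re.IGNORECASE)   # ^example\.com$ [NC]
STRICT = re.compile(r"^www\.example\.com$")          # used negated: !^www\.example\.com$
CANONICAL = re.compile(r"^(www\.example\.com)?$")    # used negated: !^(www\.example\.com)?$


def old_redirects(host):
    """Positive condition: redirect only when the host matches."""
    return OLD.search(host) is not None


def strict_redirects(host):
    """Negated condition without the trailing ?: redirect when the host differs."""
    return STRICT.search(host) is None


def new_redirects(host):
    """Negated condition with the trailing ?: an empty host is left alone."""
    return CANONICAL.search(host) is None


# Hypothetical Host header values; "" stands for a pure HTTP/1.0 request.
for host in ["www.example.com", "example.com", "example.com:80",
             "www.example.com:80", "WWW.Example.com", "www.example.com.", ""]:
    print(f"{host!r}: old={old_redirects(host)} "
          f"strict={strict_redirects(host)} new={new_redirects(host)}")
```

In this sketch the old pattern only catches the bare example.com, while the negated patterns catch every variant that is not exactly www.example.com. Only the version with the trailing ? leaves an empty Host value alone, which is what avoids the loop described in 1b.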

[edited by: g1smd at 9:59 pm (utc) on Oct 15, 2012]

ewwatson




msg:4508294
 9:58 pm on Oct 15, 2012 (gmt 0)

Lol. I'll just have to take your word for it. I'm a CSS guy. This stuff is Greek to me. Thanks for the explanation though. Also, I'd like to use this on another site of mine that is all php extensions. How does that look? Just replace the HTML with php and remove the question marks? What if I wanted it more universal for both? Thanks a ton!

g1smd




msg:4508295
 10:01 pm on Oct 15, 2012 (gmt 0)

Pick whichever:

\.php$

\.(html?|php)$

\.(html?|php[45]?)$
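
To compare the three options side by side, here is a small Python sketch. The labels and sample filenames are made up for illustration, and Python's re module stands in for Apache's regex engine.

```python
import re

# The three extension patterns offered above, keyed by a descriptive label.
PATTERNS = {
    "php only": re.compile(r"\.php$"),
    "html, htm, php": re.compile(r"\.(html?|php)$"),
    "html, htm, php, php4, php5": re.compile(r"\.(html?|php[45]?)$"),
}


def matching_patterns(filename):
    """Return the labels of every pattern that matches the filename."""
    return [label for label, pattern in PATTERNS.items()
            if pattern.search(filename)]


# Hypothetical index filenames.
for name in ["index.html", "index.htm", "index.php", "index.php4", "index.phtml"]:
    print(name, "->", matching_patterns(name))
```

Only the last pattern picks up index.php4 and index.php5, so it is worth using only if such requests actually arrive.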

ewwatson




msg:4508296
 10:11 pm on Oct 15, 2012 (gmt 0)

You are a wealth of knowledge, my friend. So which is better, just php or php[45]? Also, can I safely shorten this part to read as follows with no problems: RewriteRule ^(([^/]+/)*)index\.html?$ /$1 [R=301,L]? And I assume the same abbreviation can't be done for the non-www to www rule.

g1smd




msg:4508299
 10:19 pm on Oct 15, 2012 (gmt 0)

Do you get requests for index.php4 or for index.php5 ever? If so, include those as options using [45]? here.

If you omit the hostname from the rule target of the redirect some requests will trigger an unwanted multiple step redirection chain.

Each rule shouldn't fix just one issue, it must fix multiple issues...

First rule: redirect index requests but make sure the user ends up on www. in this rule not in a second redirect.

Second rule: redirect requests to www.example.com when original request wasn't for "exactly" that hostname.

The [L] flag stops rule processing once "this" rule matches. Every rule should have this flag.
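
The redirect chain can be illustrated with a rough Python simulation of the two rules. This is only a sketch under stated assumptions: the function names are hypothetical, the THE_REQUEST condition is left out, paths are written without the leading slash as in .htaccess, and a host-less target such as /$1 is assumed to be completed with the hostname that was requested (this depends on server configuration).

```python
import re

INDEX_RULE = re.compile(r"^(([^/]+/)*)index\.html?$")
CANONICAL_HOST = re.compile(r"^(www\.example\.com)?$")


def next_hop(host, path, index_target_has_host):
    """Return the (host, path) this request is redirected to, or None."""
    match = INDEX_RULE.search(path)
    if match:
        # First rule: strip the index filename, keeping the directory part.
        new_host = "www.example.com" if index_target_has_host else host
        return new_host, match.group(1)
    if CANONICAL_HOST.search(host) is None:
        # Second rule: any non-canonical, non-empty host goes to www.
        return "www.example.com", path
    return None


def redirect_chain(host, path, index_target_has_host):
    """Follow redirects until no rule fires; return the list of hops."""
    hops = []
    while True:
        step = next_hop(host, path, index_target_has_host)
        if step is None:
            return hops
        hops.append(step)
        host, path = step


print(redirect_chain("example.com", "blog/index.html", False))
print(redirect_chain("example.com", "blog/index.html", True))
```

With the hostname left out of the index rule's target, the request for example.com/blog/index.html takes two hops in this model. With the full http://www.example.com/$1 target, it takes one.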

ewwatson




msg:4508301
 10:30 pm on Oct 15, 2012 (gmt 0)

Awesome. Many thanks. Have a good one! I have a good guess as to why you're answering this question so often lately: the Google Panda update. Everyone else and I are scrambling to find fixes for a sudden drop in page rank.

g1smd




msg:4508302
 10:32 pm on Oct 15, 2012 (gmt 0)

This question has come up almost every week for the last ten years.




<--- Certainly boosted that number by a few thousand. :)

g1smd




msg:4508308
 10:51 pm on Oct 15, 2012 (gmt 0)

# Redirect index.html, .htm and .php to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]


# Redirect non-canonical to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

ewwatson




msg:4508309
 10:54 pm on Oct 15, 2012 (gmt 0)

Ha, that's funny. Lots of messages. So the PHP ones should look like this...

# Redirect index.html,.htm, and php to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)$\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

and...

# Redirect index.html,.htm, and php to folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php$\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]

Seems like a no-brainer. But I won't put it past me.

[edited by: ewwatson at 11:09 pm (utc) on Oct 15, 2012]

ewwatson




msg:4508310
 10:56 pm on Oct 15, 2012 (gmt 0)

Ahh, we posted at the same time. No $ sign after the php?

g1smd




msg:4508314
 10:59 pm on Oct 15, 2012 (gmt 0)

Edit the comments. They don't match the code.

The $ sign in the condition is wrong in both rulesets.

lucy24




msg:4508315
 11:00 pm on Oct 15, 2012 (gmt 0)

the .* seems pretty typical I thought.

Yup: In questions, not in answers.

I'm finding that we have posted the right answer so many times, that Google now treats them all as Duplicate Content and prefers to show all the various versions of wrong answers instead.

Hee. This is where I usually quote Tolstoy: Correct answers are all alike. Incorrect answers are all incorrect in their own way.

Do you need all that, uhm, stuff when looking at %{THE_REQUEST}? I just say index\.html and that's it. I guess there's a remote chance the words "index.html" could show up in a query string in some context that doesn't cause the requested URL itself to say "index.html" but, uhm, trying to think how that might work will simply give me a headache.

every week for the last ten years

Memo to self: cite number 520, not 3650, when needed.

no $ sign after the php?

$ means "at the very end of the utterance", as ^ means "at the very beginning", so they have no meaning anywhere else. You might be thinking of \b but that's not necessary here.
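
A quick way to convince yourself is the following Python sketch. The sample request line is made up, and Python's re module is assumed to treat $ the same way Apache's regex engine does here. A $ followed by more literal text can never match, because "end of input" and "a space comes next" cannot both be true.

```python
import re

# Sample THE_REQUEST value (hypothetical).
REQUEST = "GET /index.php HTTP/1.1"

# Condition pattern as posted, with a stray $ before "\ HTTP/".
WITH_DOLLAR = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.php$\ HTTP/")
# Same condition with the $ removed.
WITHOUT_DOLLAR = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/")
# In the RewriteRule pattern the $ is fine: the path really ends there.
RULE_PATTERN = re.compile(r"^(([^/]+/)*)index\.php$")


def condition_holds(pattern, request_line=REQUEST):
    """Return True if the condition pattern matches the request line."""
    return pattern.search(request_line) is not None


print("with $:   ", condition_holds(WITH_DOLLAR))
print("without $:", condition_holds(WITHOUT_DOLLAR))
print("rule path:", RULE_PATTERN.search("index.php") is not None)
```

With the stray $ the condition never holds, so the index redirect would silently never fire.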

On the plus side, it only took me about 12 hours to figure out why one small innocuous SSI refused to play nice on the live site while it worked fine on MAMP, and other more complicated includes-- including the newly constructed php ones-- behaved identically in both places. And I didn't even have to ask anyone.

g1smd




msg:4508316
 11:08 pm on Oct 15, 2012 (gmt 0)

In the index redirect, all the "stuff" matches specific requests.

lucy24




msg:4508320
 11:48 pm on Oct 15, 2012 (gmt 0)

:) I mean the beginning stuff ^[A-Z]{3,9}\ /([^/]+/)* not the \.(html?|php) part.
