index.html to /

redirect index.html to directory

         

smallcompany

8:51 pm on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just saw Google reporting duplicate content in Webmaster Tools. I know it happens with query strings for a reason, but I have never had a clean

/folder/index.html

and

/folder/

be considered duplicate content.

Anyhow, since I like the "/" mode, I have a question about the code below:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ http://www.example.com/$1 [R=301,L]

The * prior to the index, would that also match a page like /123index.html?

Thanks

g1smd

9:31 pm on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It would, but (.*) might not be the best pattern to capture that with.

However, would you really want to redirect that to www.example.com/123 and without a trailing slash?

The code would also redirect /folder/index.html to www.example.com/folder/

There's also a much more efficient way to capture folder depths for redirecting. It's been posted dozens of times over the last year or two.
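To make the point about (.*) concrete, here is a minimal Python sketch of how the RewriteRule pattern from the question behaves. It assumes per-directory (.htaccess) context, where the path is matched without its leading slash, and it models only the RewriteRule, not the RewriteCond. The helper name redirect_target is made up for illustration.

```python
import re

# Pattern from the RewriteRule in the question. Assumption: .htaccess
# context, so the path is matched without its leading slash.
ORIGINAL_RULE = re.compile(r"^(.*)index\.html$")

def redirect_target(path):
    """Return the URL the original rule would redirect to, or None if no match."""
    match = ORIGINAL_RULE.match(path)
    if match is None:
        return None
    # $1 in the substitution is whatever (.*) captured.
    return "http://www.example.com/" + match.group(1)

for path in ("index.html", "folder/index.html", "123index.html", "folder/page.html"):
    print(path, "->", redirect_target(path))
```

Because (.*) accepts any characters, "123index.html" matches and the captured "123" ends up in the target without a trailing slash.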

jdMorgan

10:04 pm on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, it is not "*" prior to the index, it is ".*" prior to the index. That is, "Match zero or more of any character."

I like this better:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1? [R=301,L]
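For anyone who wants to check which requests this pair would catch, here is a rough Python model of the condition and the rule together. It is only a sketch: Python's re module stands in for Apache's regex engine, the escaped space "\ " becomes a plain space, and the helper name redirect_target is hypothetical.

```python
import re

# Python translations of the two patterns above.
REQUEST_COND = re.compile(r"^[A-Z]+ /([^/]+/)*index\.html[^ ]* HTTP/")
RULE = re.compile(r"^(([^/]+/)*)index\.html$")

def redirect_target(request_line):
    """Model the condition and the rule for one raw request line."""
    if not REQUEST_COND.match(request_line):
        return None
    # Assumption: .htaccess context, so the rule sees the path without
    # its leading slash and without the query string.
    url = request_line.split(" ")[1]
    path = url.split("?", 1)[0].lstrip("/")
    match = RULE.match(path)
    if match is None:
        return None
    # The trailing "?" in the substitution drops the query string.
    return "http://www.example.com/" + match.group(1)

for line in (
    "GET /index.html HTTP/1.1",
    "GET /folder/index.html?x=1 HTTP/1.1",
    "GET /123index.html HTTP/1.1",
):
    print(line, "->", redirect_target(line))
```

Since ([^/]+/)* only accepts whole path segments ending in a slash, "123index.html" no longer matches, while "folder/index.html" still redirects to /folder/.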

Jim

smallcompany

3:48 am on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks.

Let's just put it the other way:

I don't want anything except a "bare real" index.html to be redirected to its folder, aka the same page.

The example I provided is from this forum, and the ".*" has confused me as it looked like more than just "index.html" could be picked, so I wondered about it.

My goal is to resolve an issue of "index.html" pages only, period.

I'll check the previous posts again, and try to figure out the code that looks like it matches "index" only.

Thanks

TheMadScientist

4:11 am on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don't look too far...
See jdMorgan's post, then copy / paste.

smallcompany

4:39 am on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Funny...

I just found that on one of the shared hosts I use, if I type a full URL with a typo like "5index.html" or "tindex.html", the server "nicely" redirects to index.html.

I removed the code I asked about, and the same thing happened again. I checked a site that has never had this code: same again.

I checked one of my other sites with a different provider, and got the expected 404.

Is it common that shared hosting providers correct what they think is a mistake? I assume if I had the page "tindex.html" that it would load - I guess, haven't tried.

Anyway, now after I tested this on a different server (that does not guess), it worked as expected, so thank you very much!

I guess my original code would work too, but as one said, listen to Jim! ;)

jdMorgan

1:10 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Exactly "/index.html", not in any subdirectory, no URL-fragment, no query string:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

Exactly "/index.html", no URL-fragment, no query string, but in any subdirectory:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
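As a quick sanity check, the two stricter conditions can be compared in Python. This is only a sketch under the same assumptions as before: Python's re module stands in for Apache's regex engine, escaped spaces become plain spaces, and classify is a made-up helper name.

```python
import re

# Python translations of the two RewriteCond patterns above.
ROOT_ONLY = re.compile(r"^[A-Z]+ /index\.html HTTP/")
ANY_DEPTH = re.compile(r"^[A-Z]+ /([^/]+/)*index\.html HTTP/")

def classify(request_line):
    """Return (matches root-only condition, matches any-depth condition)."""
    return (
        bool(ROOT_ONLY.match(request_line)),
        bool(ANY_DEPTH.match(request_line)),
    )

for line in (
    "GET /index.html HTTP/1.1",
    "GET /folder/index.html HTTP/1.1",
    "GET /index.html?x=1 HTTP/1.1",
    "GET /5index.html HTTP/1.1",
):
    print(line, "->", classify(line))
```

With no [^\ ]* after index\.html, a request carrying a query string fails both conditions, and so does "5index.html".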

The problem you saw with "5index.html" may be due to the actions of mod_speling or mod_negotiation. Both have associated directives to disable them. For mod_speling, it's "CheckSpelling off" and for mod_negotiation, it's "Options -MultiViews".

Jim

g1smd

2:44 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"the server "nicely" redirects to index.html"

For reader benefit, never redirect TO named index pages. Redirect to /folder/ instead.

smallcompany

4:40 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks very much. So Jim's original proposal would also match a query string: the one with [^\ ]*\ after index\.html on the first line.

Now we're covered. No mysteries... ;)

Thanks very much!