Apache Web Server Forum

    
.htaccess redirects for index.html, index.php and index.htm to /
riospace




msg:4485316
 5:34 am on Aug 16, 2012 (gmt 0)

I have been reading many different opinions and code snippets for redirecting all of my index files to /. Most of my pages end in index.htm, but some have index.html and index.php.

From what I have read, is it considered duplicate content if you do not redirect all of the index files in all folders/directories to /?

I have seen this used in .htaccess for redirects:

RewriteCond %{THE_REQUEST} ^.*/index\.html
RewriteRule ^(.*)index.html$ http://www.mydomain.com/$1 [R=301,L]


Is this correct, and do I need to do this for index.htm and index.php separately, or is there a way to put this all together in one redirect?

I currently only have this in my .htaccess file:

RewriteEngine on

RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www\.mydomain\.com/$1 [R=301,L]


I know that there are many discussions on this, but I cannot find one that deals specifically with index.html, index.htm and index.php all together. Like I said earlier, most of my files have index.htm, but I also have index.html and index.php files that I want to redirect. I am not even sure that I should be redirecting, but I don't want to have duplicate content problems. This is way over my pay grade.

 

riospace




msg:4485326
 5:53 am on Aug 16, 2012 (gmt 0)

I just found this .htaccess code:

RewriteEngine on

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.{htm|html|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(htm|html|php)$ http://example.com/$1 [R=301,L]


So, if the code above is correct, this is what my .htaccess file will look like:

RewriteEngine on

RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www\.example\.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.{htm|html|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(htm|html|php)$ http://example.com/$1 [R=301,L]
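One likely reason this version fails: in the Condition, "\.{htm|html|php)" opens with a brace but closes with a parenthesis, so the pattern cannot be compiled at all. A minimal sketch that checks this, using Python's re module as a stand-in for the PCRE engine Apache uses (an assumption; both treat this syntax the same way):

```python
import re

# Condition pattern as posted: "{htm|html|php)" has a brace where "(" belongs.
posted = r"^[A-Z]{3,9}\ /([^/]+/)*index\.{htm|html|php)\ HTTP/"
# The same pattern with the opening bracket corrected to "(".
fixed = r"^[A-Z]{3,9}\ /([^/]+/)*index\.(htm|html|php)\ HTTP/"

def compiles(pattern):
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(posted))  # False: the ")" has no matching "("
print(compiles(fixed))   # True
print(bool(re.match(fixed, "GET /subdir/index.htm HTTP/1.1")))  # True
```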

riospace




msg:4485331
 6:04 am on Aug 16, 2012 (gmt 0)

That last code did not work, but this one did:

RewriteEngine on

RewriteCond %{HTTP_HOST} ^website\.com$ [NC]
RewriteRule ^(.*)$ http://www\.website\.com/$1 [R=301,L]

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.website.com/$1? [R=301,L]


Before I use this permanently, is this a good idea, and what am I going to be in for with Google when I make this change to my entire site?

lucy24




msg:4485333
 6:13 am on Aug 16, 2012 (gmt 0)

#1. Dump the ^.* anywhere it occurs. If you're not capturing it, you don't need it.
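Because RewriteCond and RewriteRule patterns are searched for anywhere in the string unless anchored, a leading ^.* that is not captured adds nothing. A small sketch, with Python's re.search standing in for Apache's matching (assumed equivalent for patterns this simple), shows both versions giving the same answer on some sample request lines:

```python
import re

# Unanchored patterns are searched for anywhere in the string, like re.search.
with_prefix = re.compile(r"^.*/index\.html")
without_prefix = re.compile(r"/index\.html")

requests = [
    "GET /index.html HTTP/1.1",
    "GET /subdir/index.html HTTP/1.1",
    "GET /subdir/ HTTP/1.1",
    "GET /about.html HTTP/1.1",
]

for line in requests:
    a = bool(with_prefix.search(line))
    b = bool(without_prefix.search(line))
    print(f"{line!r}: {a} {b}")  # the two columns always agree
```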

#2. By the usual jaw-dropping, mind-boggling coincidence I have just this minute finished reading a post in this very Forum about redirecting to index files. Putter around and you will find the exact wording for your htaccess. It should be only a few posts away from this one.

Yes, you can combine extensions. You don't even need to be specific about them; if someone comes along and requests /index.bzzt, you can redirect them right along with everyone else. (ONLY for redirects! When rewriting, accept only canonical URLs.)
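To see how far "you don't even need to be specific" goes, here is a sketch comparing a broad extension pattern (index\.\w+, as suggested just below) with an explicit php/html/htm list. Python's re stands in for Apache's PCRE, and the file names are made-up examples:

```python
import re

broad = re.compile(r"index\.\w+$", re.ASCII)  # any word-character extension
narrow = re.compile(r"index\.(php|html?)$")   # only php, html and htm

for name in ["index.php", "index.html", "index.htm",
             "index.bzzt", "index.", "index.one-two-three"]:
    print(f"{name:20} broad={bool(broad.search(name))} narrow={bool(narrow.search(name))}")
```

Neither pattern accepts "index." or "index.one-two-three"; only the broad one also sweeps up "index.bzzt".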

So instead of {blahblah}index\.(php|html?) you could simply say

index\.\w+

It seems safe to say that if someone asks for "index." alone, or "index.one-two-three", they are probably up to no good and can be safely 404'd. You gotta draw the line somewhere. Your RewriteCond can be even more minimalist and just say

RewriteCond %{THE_REQUEST} index

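because the more detailed format is already in the Rule. The Condition just needs to pick out requests where the client actually asked for something containing "index"; that way Apache's own internal lookup of the index file, after a request for the bare directory, doesn't set off the redirect. That's assuming you have not goofed horribly by having directory names or query terms that also contain the string "index". If they do, the Condition may have to be more exact.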
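The reason for checking THE_REQUEST at all: when a client asks for /subdir/, Apache serves it by looking up subdir/index.html internally, and in .htaccess that internal lookup can pass through the rewrite rules again. The following hypothetical model in Python (not Apache code; the function name and host are made up) shows the Condition stopping that internal path from bouncing back to /subdir/:

```python
import re

# In .htaccess the Rule sees the path without its leading slash, while
# THE_REQUEST holds the raw request line exactly as the client sent it.
RULE = re.compile(r"^(([^/]+/)*)index\.(php|html?)$")
COND = re.compile(r"index")  # the minimal Condition suggested above

def redirect_target(the_request, path, check_request=True):
    """Return the redirect target, or None if the rule should not fire."""
    m = RULE.match(path)
    if not m:
        return None
    if check_request and not COND.search(the_request):
        return None
    return "http://www.example.com/" + m.group(1)

# Client asked for the index file by name: redirect to the directory.
print(redirect_target("GET /subdir/index.html HTTP/1.1", "subdir/index.html"))
# Client asked for /subdir/ and Apache looked up subdir/index.html internally.
print(redirect_target("GET /subdir/ HTTP/1.1", "subdir/index.html"))  # None
# Without the Condition, that internal path redirects straight back to /subdir/.
print(redirect_target("GET /subdir/ HTTP/1.1", "subdir/index.html", check_request=False))
```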

riospace




msg:4485334
 6:24 am on Aug 16, 2012 (gmt 0)

I have also come across these:

RewriteCond %{THE_REQUEST} ^.*\/index\.html?
RewriteRule ^(.*)index\.html?$ http://www.domain.com/$1 [R=301,L]

OR

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.domain.com/$1 [R=301,L]



So basically, what is the best course of action? This can get really confusing and I just want to do the best thing. Thanks!

riospace




msg:4485337
 6:31 am on Aug 16, 2012 (gmt 0)

Also, once I get this right, can I keep linking to the index.htm files instead of the folders when I am building my site? It is much easier for me to manage my links while working on the website if I link to index.htm etc. instead of just /.

lucy24




msg:4485380
 8:54 am on Aug 16, 2012 (gmt 0)

Goodness. I don't think I even saw your in-between posts while composing mine.

You can link to "index.htm" for development purposes, so long as you remember to delete all of them before uploading the files to the real www space. If it's a big site, you may need to locate a doodad that does this globally without having to open every file. (Practice on a backup first!)

Or you can install a pseudo-server like MAMP or WAMP; then everything will work just like on a "real" site. That also includes the, er, opposite end of any link: the part where you generally want to say /directory/overhere rather than ../../directory/overhere. Site-absolute rather than relative links. Not only for other pages but for stylesheets, includes and similar.
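A global find-and-replace "doodad" of the kind mentioned above can be as small as one regular expression. A minimal Python sketch, assuming links are written as href="..." with double quotes (a hypothetical convention; other quoting styles are left alone), that turns links ending in index.php, index.html or index.htm into links to the directory:

```python
import re

# Group 1: href=" ; group 2: everything up to the last "/" (may be empty);
# group 3: the closing quote. The index file name itself is dropped.
INDEX_HREF = re.compile(r'(href=")((?:[^"]*/)?)index\.(?:php|html?)(")')

def strip_index_links(html):
    # A bare relative link such as href="index.htm" becomes href="./".
    return INDEX_HREF.sub(lambda m: m.group(1) + (m.group(2) or "./") + m.group(3), html)

page = '<a href="/subdir/index.htm">Sub</a> <a href="index.html">Home</a>'
print(strip_index_links(page))
# <a href="/subdir/">Sub</a> <a href="./">Home</a>
```

Because the directory part must end in "/", a page such as /myindex.htm is not touched.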

g1smd




msg:4485469
 1:30 pm on Aug 16, 2012 (gmt 0)

Any code with .* at the beginning or in the middle of a pattern can and should be changed to be more "specific".
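A concrete case of why .* in the middle of a pattern is risky: with ^(.*)index\.html?$, a page whose name merely ends in "index" also gets redirected. A short sketch using Python's re as a stand-in for Apache's PCRE, where "subdir/myindex.html" is a hypothetical file name:

```python
import re

# Rule patterns applied to a path in .htaccess context (no leading slash).
loose = re.compile(r"^(.*)index\.html?$")            # .* in the pattern
specific = re.compile(r"^(([^/]+/)*)index\.html?$")  # whole path segments only

def target(pattern, path):
    m = pattern.match(path)
    return "/" + m.group(1) if m else None

for path in ["index.html", "subdir/index.htm", "subdir/myindex.html"]:
    print(path, "->", target(loose, path), "|", target(specific, path))
# The loose version sends subdir/myindex.html to the bogus URL /subdir/my
```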

The order of your redirects is important. The index redirect must be before the non-www/www redirect otherwise you introduce an unwanted multiple step redirection chain when a non-www index URL is requested.
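The redirect chain can be shown with a simplified, hypothetical model of the two rules in Python (the rule functions are illustrations, not Apache code, and query strings are ignored). Each request stops at the first rule that fires, as with [R=301,L], and the client then requests the new URL:

```python
import re
from urllib.parse import urlsplit

INDEX = re.compile(r"^(([^/]+/)*)index\.(php|html?)$")

def index_redirect(url):
    """Send /dir/index.php, .html or .htm to /dir/ on the canonical host."""
    m = INDEX.match(urlsplit(url).path.lstrip("/"))
    return "http://www.example.com/" + m.group(1) if m else None

def www_redirect(url):
    """Send any other hostname to www.example.com, keeping the path."""
    parts = urlsplit(url)
    if parts.netloc != "www.example.com":
        return "http://www.example.com" + parts.path
    return None

def follow(url, rules):
    """Return every URL the client visits, redirects included."""
    visited = [url]
    while True:
        # Like [R=301,L]: the first rule that fires ends this request.
        nxt = next((t for t in (rule(url) for rule in rules) if t), None)
        if nxt is None:
            return visited
        url = nxt
        visited.append(url)

start = "http://example.com/subdir/index.html"
print(follow(start, [index_redirect, www_redirect]))  # one redirect
print(follow(start, [www_redirect, index_redirect]))  # two redirects
```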

riospace




msg:4485536
 4:47 pm on Aug 16, 2012 (gmt 0)

Thanks Lucy24 and g1smd! It is a large site, and I have found a "doodad" and have already changed all my links globally to / but have not uploaded yet. I am still trying to get the redirect correct, and honestly I am confused about the fine details and the (.*).

I have this now:

RewriteEngine on

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.website.com/$1? [R=301,L]

RewriteCond %{HTTP_HOST} ^website\.com$ [NC]
RewriteRule ^(.*)$ http://www\.website\.com/$1 [R=301,L]


#1. Dump the ^.*



Yes, you can combine extensions. You don't even need to be specific about them; if someone comes along and requests /index.bzzt, you can redirect them right along with everyone else. (ONLY for redirects! When rewriting, accept only canonical URLs.)

So instead of {blahblah}index\.(php|html?) you could simply say index\.\w+ It seems safe to say that if someone asks for "index." alone, or "index.one-two-three" they are probably up to no good and can be safely 404'd. You gotta draw the line somewhere. Your RewriteCond can be even more minimalist and just say

RewriteCond %{THE_REQUEST} index


I am not sure how to apply Lucy24's suggestions above. Could someone give me an example of the code above changed with Lucy24's suggestions? Thanks!

riospace




msg:4485587
 7:14 pm on Aug 16, 2012 (gmt 0)

This is all I want to do, nothing more - using .htaccess:

Here are a few examples:
http://mydomain.com
http://mydomain.com/index.html
http://mydomain.com/index.htm
http://mydomain.com/index.php
http://www.mydomain.com/index.php
http://www.mydomain.com/index.html
http://www.mydomain.com/index.htm
all redirect to:
http://www.mydomain.com/

http://mydomain.com/subdir
http://mydomain.com/subdir/index.html
http://mydomain.com/subdir/index.htm
http://mydomain.com/subdir/index.php
http://www.mydomain.com/subdir/index.php
http://www.mydomain.com/subdir/index.html
http://www.mydomain.com/subdir/index.htm
all redirect to:
http://www.mydomain.com/subdir/


I am just looking for the best code to do this, and I have seen it done a bunch of different ways in this forum with many differing suggestions. I cannot seem to understand the specifics, and I need to get it right, as I have changed my entire site to point to / only and I am waiting to upload until I get this right. Thanks!
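The list of examples above works well as a test table. Here is a minimal sketch in plain Python (not Apache syntax) of the mapping it describes, handy for checking any candidate ruleset against the same URLs. Query strings are ignored, and the directories set is a hypothetical input. On a real server, adding the missing trailing slash to /subdir is normally done by Apache's mod_dir rather than by these rewrite rules:

```python
import re
from urllib.parse import urlsplit

CANONICAL_HOST = "www.mydomain.com"
INDEX_FILE = re.compile(r"/index\.(php|html?)$")

def canonical(url, directories=frozenset({"subdir"})):
    """Return the canonical form of url (query strings are ignored).

    directories is a hypothetical set of real directory paths, needed to know
    when a missing trailing slash should be added.
    """
    path = urlsplit(url).path or "/"
    path = INDEX_FILE.sub("/", path)  # drop a trailing index.php/.html/.htm
    if path.strip("/") in directories and not path.endswith("/"):
        path += "/"
    return f"http://{CANONICAL_HOST}{path}"

print(canonical("http://mydomain.com/subdir/index.php"))  # http://www.mydomain.com/subdir/
print(canonical("http://mydomain.com/subdir"))            # http://www.mydomain.com/subdir/
```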

g1smd




msg:4485595
 7:53 pm on Aug 16, 2012 (gmt 0)

I use something similar to this:

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z](3,9)\ /([^/]+/)*index\.[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]


# Redirect non-canonical hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


(\.[a-z0-9]+)?[^\ ]*\ HTTP/ is "period followed by some characters followed by stuff that is not a space then HTTP/" and simplifies to \.[^\ ]*\ HTTP/
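The simplification can be checked by running both Conditions over a few request lines. A sketch with Python's re standing in for PCRE; both patterns use the intended {3,9} method prefix, and the request lines are made up. They agree whenever there is a dot after "index"; the simplified one no longer matches an extensionless .../index request:

```python
import re

original = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/")
simplified = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.[^\ ]*\ HTTP/")

lines = [
    "GET /index.html HTTP/1.1",
    "GET /subdir/index.php?x=1 HTTP/1.1",
    "HEAD /a/b/index.htm HTTP/1.0",
    "GET /subdir/ HTTP/1.1",
    "GET /subdir/index HTTP/1.1",  # no extension at all
]
for line in lines:
    print(f"{line!r}: {bool(original.match(line))} {bool(simplified.match(line))}")
```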

Don't redirect only "example.com", as you would miss redirecting "www.example.com:80".

The NC flag is unwanted.

(.*) captures everything, so anchoring is not required.

Do not escape periods in rule target.
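How the two hostname Conditions differ, as a sketch with Python's re standing in for PCRE. HTTP_HOST holds the Host header as sent, port included. The "!" in the RewriteCond is modeled by negating the match, and the sample hosts are illustrative:

```python
import re

narrow = re.compile(r"^example\.com$", re.IGNORECASE)  # ^example\.com$ [NC]
broad = re.compile(r"^(www\.example\.com)?$")          # !^(www\.example\.com)?$

def narrow_redirects(host):
    return bool(narrow.search(host))

def broad_redirects(host):
    # Redirect anything that is not exactly www.example.com; a blank Host
    # (old HTTP/1.0 clients) matches the optional group and is left alone.
    return not broad.search(host)

for host in ["example.com", "www.example.com", "www.example.com:80",
             "example.com:80", "WWW.EXAMPLE.COM", ""]:
    print(f"{host!r:22} narrow={narrow_redirects(host)} broad={broad_redirects(host)}")
```

Only the broad version catches the ":80" variants and odd capitalizations, which is also why the NC flag is not wanted there.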

riospace




msg:4485671
 10:07 pm on Aug 16, 2012 (gmt 0)

Thanks g1smd!

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z](3,9)\ /([^/]+/)*index\.[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]


Not sure why, but this code did not redirect for me when I tested it.

The following code did, but I am not sure if it needs to be tweaked or not:

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]

g1smd




msg:4485684
 10:37 pm on Aug 16, 2012 (gmt 0)

Typo! Wrong type of brackets: (3,9) should have been {3,9}

Take the time to learn the RegEx syntax. This stuff pops up all the time.
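The difference between the two bracket types, shown with Python's re as a stand-in for PCRE: (3,9) is a group that must match the literal text "3,9", while {3,9} means "between 3 and 9 of the previous item".

```python
import re

typo = re.compile(r"^[A-Z](3,9)\ /")   # capture group: literal "3,9"
fixed = re.compile(r"^[A-Z]{3,9}\ /")  # quantifier: 3 to 9 capital letters

line = "GET /subdir/index.html HTTP/1.1"
print(bool(typo.match(line)))       # False
print(bool(fixed.match(line)))      # True: "GET" is 3 capital letters
print(bool(typo.match("G3,9 /x")))  # True: what the typo actually matches
```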

riospace




msg:4486082
 2:56 am on Aug 18, 2012 (gmt 0)

I am working on learning more RegEx syntax, but the code is still not working and I can't figure it out.

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]


This is what I am currently using:

RewriteEngine on

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]

# Redirect non-canonical hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Will the code that I am currently using slow down page loads? It really seems to be slowing down page loads on pages that should be in the cache. Also, how long do I need to leave this redirect up after the change from index.html to / ?

phranque




msg:4486092
 6:47 am on Aug 18, 2012 (gmt 0)

the code is still not working

precisely how is it "not working"?

how long do I need to leave this redirect up after the change from index.html to / ?

as long as your server is receiving requests for index.html you will need a redirect to the canonical url.

lucy24




msg:4486095
 8:50 am on Aug 18, 2012 (gmt 0)

This element

index(\.[a-z0-9]+)?

strikes me as overkill. You said at the beginning that your index files may have php OR html OR htm. That's three. If someone asks for a nonexistent extension, go ahead and dump a 404 on 'em. And if someone asks for "index" without extension, they are probably up to no good. So cut it back to

index\.(php|html?)$

And then, once you've canonicalized all your directory names, you can rename the physical index files so Apache doesn't have to waste time looking for three possible files every time there's a request for a directory. You might possibly need both php and htm(l) but you can definitely regularize to either html or htm throughout. The Apache default is html, so that will save the usual nano-micro-thingie.
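Why renaming helps: Apache's mod_dir tries each name listed in DirectoryIndex in order and serves the first one that exists. A toy Python model of that lookup (the three-name list is an assumed configuration, not taken from this site) counts how many names get tried:

```python
def resolve_index(existing, directory_index):
    """Return (file served, number of names tried) for a directory request."""
    for tried, name in enumerate(directory_index, start=1):
        if name in existing:
            return name, tried
    return None, len(directory_index)

mixed = ["index.html", "index.htm", "index.php"]  # hypothetical DirectoryIndex
print(resolve_index({"index.php"}, mixed))             # ('index.php', 3)
print(resolve_index({"index.html"}, mixed))            # ('index.html', 1)
print(resolve_index({"index.html"}, ["index.html"]))   # ('index.html', 1)
```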

as long as your server is receiving requests for index.html you will need a redirect to the canonical url.

If any of those requests are coming from the googlebot, you can condense that sentence to "forever" ;)

riospace




msg:4486164
 8:24 pm on Aug 18, 2012 (gmt 0)

Actually the code that g1smd recommended does work. I made a slight typo.

riospace




msg:4486167
 8:43 pm on Aug 18, 2012 (gmt 0)

This is what I have now and it seems to be doing the trick:

RewriteEngine on

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(php|html?)$ http://www.example.com/$1? [R=301,L]

# Redirect non-canonical hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Thanks for all the help and please let me know if there are any other recommendations on the rewrite codes above.
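The final ruleset above can be exercised offline by translating it to Python, with re standing in for Apache's PCRE. This sketch is an approximation: the function name is made up, every request is assumed to be HTTP/1.1, and it does not model DirectoryIndex lookups or mod_dir's trailing-slash redirect:

```python
import re
from urllib.parse import urlsplit

INDEX_COND = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.[^\ ]*\ HTTP/")
INDEX_RULE = re.compile(r"^(([^/]+/)*)index\.(php|html?)$")
HOST_COND = re.compile(r"^(www\.example\.com)?$")

def respond(method, url):
    """Return the Location the two rules would send for this request, or None."""
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    the_request = f"{method} {parts.path or '/'}{query} HTTP/1.1"
    path = parts.path.lstrip("/")  # in .htaccess the Rule sees no leading slash

    # Rule 1: index file requested by the client; the trailing "?" drops the query.
    m = INDEX_RULE.match(path)
    if m and INDEX_COND.match(the_request):
        return "http://www.example.com/" + m.group(1)

    # Rule 2: any host other than www.example.com (a blank Host is left alone).
    if not HOST_COND.match(parts.netloc):
        return "http://www.example.com/" + path + query
    return None

print(respond("GET", "http://example.com/subdir/index.php?x=1"))  # http://www.example.com/subdir/
print(respond("GET", "http://example.com/about.html"))            # http://www.example.com/about.html
```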

riospace




msg:4486647
 3:16 am on Aug 21, 2012 (gmt 0)

Now that I have changed all of my links, uploaded and redirected in htaccess, roughly how long does it take for these changes to show up in Google search results?

phranque




msg:4486664
 5:02 am on Aug 21, 2012 (gmt 0)

The last time I measured something like this, Google was dropping about 50% of the non-canonical URLs from the index each month.
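To put that rate in perspective, here is a back-of-the-envelope sketch assuming the roughly 50% monthly drop holds steady (a big assumption, since it comes from one measurement):

```python
def remaining_fraction(months, monthly_drop=0.5):
    """Fraction of non-canonical URLs still indexed after `months` months."""
    return (1 - monthly_drop) ** months

for months in (1, 3, 6):
    print(months, f"{remaining_fraction(months):.1%}")
# 1 50.0%
# 3 12.5%
# 6 1.6%
```

On that assumption most of the old URLs would be gone within a few months, roughly in line with the three-to-six-month estimate in the next post.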

g1smd




msg:4486960
 7:40 pm on Aug 21, 2012 (gmt 0)

You'll see one or two effects within weeks, but I usually say three to six months, and it can be longer.

Kimkia




msg:4532429
 6:40 am on Jan 2, 2013 (gmt 0)

If I use the above code, but have a link rel="canonical" on http://www.example.com/categoryname/index.html within each directory, am I shooting myself in the foot?

phranque




msg:4532437
 8:30 am on Jan 2, 2013 (gmt 0)

what is the link element's href attribute value?
which url are you linking to internally?
it is technically better if you redirect requests for non-canonical urls using a 301 status code.

g1smd




msg:4532491
 11:22 am on Jan 2, 2013 (gmt 0)

Hopefully, you have

link rel="canonical" href="http://www.example.com/categoryname/"

but you should add the redirect.

Make sure that the site internally links to URLs without the index file name.

Kimkia




msg:4532588
 7:12 pm on Jan 2, 2013 (gmt 0)

Hopefully, you have

link rel="canonical" href="http://www.example.com/categoryname/"


Well, I confess what I currently have is:
link rel="canonical" href="http://www.example.com/categoryname/index.shtml"

with this in htaccess:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=permanent,L]

When linking to the home page of each directory within my site, I always use the full url with the index.shtml extension - with one exception, which is a breadcrumb script that automatically links to http://www.example.com/categoryname/

I think this has led to some dilution of page importance in Google, but I'm not sure. I just know my 11-year-old site is suffering badly after years of doing very well. Is dropping the index.shtml extension recommended for SEO?

If so, then I assume that I need to 1) rewrite all internal links to the new format and 2) change the htaccess to the code cited above. Are there any other steps I should take?

g1smd




msg:4532594
 7:45 pm on Jan 2, 2013 (gmt 0)

Change all internal links to point to URLs without the index filename.

Change the rel="canonical" to point to URLs without the index filename.

Redirect requests for index URLs to the URL without the index filename.

Make sure the index redirect is listed before the non-www redirect.

Update your non-www redirect to use the more inclusive code mentioned a few posts back.
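For a static site without a global search-and-replace, a small script can at least find the leftover index links and canonical tags. A minimal sketch using Python's standard html.parser; the sample page and URLs are made up, and rel values containing several tokens are not handled:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# A URL path that ends in an index file name (shtml, html, htm or php).
INDEX_PATH = re.compile(r"(?:^|/)index\.(?:s?html?|php)$")

class IndexLinkFinder(HTMLParser):
    """Collect <a href> and <link rel="canonical"> URLs that still name an index file."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        is_link = tag == "a" or (tag == "link" and attrs.get("rel") == "canonical")
        if is_link and INDEX_PATH.search(urlsplit(href).path):
            self.found.append((tag, href))

page = """<link rel="canonical" href="http://www.example.com/categoryname/index.shtml">
<a href="/categoryname/index.shtml">Category</a>
<a href="/categoryname/">Category (already fixed)</a>"""

finder = IndexLinkFinder()
finder.feed(page)
print(finder.found)
```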

lucy24




msg:4532619
 9:08 pm on Jan 2, 2013 (gmt 0)

Look at it this way: One side of your mouth is calling something canonical, and the other side of your mouth is deploying a 301 to say it's not canonical. Who you gonna believe?

Google will eventually get annoyed if it follows your own internal links and runs smash into a redirect. How annoyed it gets, and how that annoyance manifests itself, are questions for another thread. Or two, or a hundred ;)

Kimkia




msg:4532874
 4:25 pm on Jan 3, 2013 (gmt 0)

Thank you for the helpful responses. I've been busy. My site is static, and there's no easy way to make global changes, but here's what I've done:

- Changed as many of the internal directory links as I could find (navigation, past newsletters, etc)
- Changed the rel="canonical" on individual directory indexes to point to urls without the index.shtml extension
- Updated htaccess to this:

RewriteEngine on
# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(shtml|html?)$ http://www.example.com/$1? [R=301,L]
# Redirect non-canonical hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Should I also put in individual 301 redirects for each category index?

I also regenerated my sitemap and pinged all the major search engines.

The bad news is that my traffic from Google has plummeted overnight, down by about 40%, which is going to cost me a fortune in lost revenue. Any insight as to whether this might improve and, if so, how long it might take?

g1smd




msg:4532898
 5:04 pm on Jan 3, 2013 (gmt 0)

I would have added the redirect some time after fixing the canonical tag and fixing the navigation.

Hopefully, you are now pointing to URLs ending with a slash.

Use the Live HTTP Headers extension for Firefox to check the responses for a few canonical and non-canonical URLs.

Run Xenu LinkSleuth over the site and carefully check the reports.

(shtml|html?) simplifies to s?html?
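A quick check of that simplification on the extension alone, with Python's re standing in for PCRE: the two forms agree on shtml, html and htm, and the short form additionally accepts "shtm", which is harmless in a redirect.

```python
import re

long_form = re.compile(r"^(shtml|html?)$")
short_form = re.compile(r"^s?html?$")

for ext in ["shtml", "html", "htm", "shtm", "php"]:
    print(f"{ext:6} long={bool(long_form.match(ext))} short={bool(short_form.match(ext))}")
```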

Add a blank line after each RewriteRule for clarity.

Should I also put in individual 301 redirects for each category index?

That's what the ([^/]+/)* and (([^/]+/)*) bits are for.
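To see why one Rule covers every category: ([^/]+/)* repeats once per directory level, and the outer parentheses capture all of those levels together as $1 (the inner group on its own keeps only the last level). A sketch with Python's re standing in for PCRE, using hypothetical directory names:

```python
import re

# Kimkia's Rule pattern, with s?html? as suggested above.
RULE = re.compile(r"^(([^/]+/)*)index\.s?html?$")

for path in ["index.shtml", "crafts/index.shtml", "crafts/knitting/scarves/index.html"]:
    m = RULE.match(path)
    print(path, "->", "http://www.example.com/" + m.group(1))
# index.shtml -> http://www.example.com/
# crafts/index.shtml -> http://www.example.com/crafts/
# crafts/knitting/scarves/index.html -> http://www.example.com/crafts/knitting/scarves/
```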
© Webmaster World 1996-2014 all rights reserved