

Redirect "index.html" to "www.example.com"?

         

lee_sufc

1:17 pm on Oct 2, 2006 (gmt 0)

10+ Year Member Top Contributors Of The Month



Just a quick question:

For the sake of Google, last year, I changed my htaccess file so that http://example.com redirected to the www version.

Should I also be redirecting www.example.com/index.html to just www.example.com? I have changed my internal links to remove the "index.html" but was wondering if I should redirect it completely?

If so, how do I do this?

[edited by: tedster at 7:24 pm (utc) on Oct. 2, 2006]
[edit reason] use example.com [/edit]

WiseWebDude

9:56 pm on Oct 2, 2006 (gmt 0)

10+ Year Member



You can't...it would then be an "infinite loop", trust me, I've tried, LOL...the browser will sit there forever and show nothing :)

g1smd

10:07 pm on Oct 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can redirect index.html to / and do it for index pages in both the root and in any folders. Stick this in your .htaccess file:

RewriteCond %{THE_REQUEST} ^.*\/index\.html?
RewriteRule ^(.*)index\.html?$ http://www.domain.com/$1 [R=301,L]

OR

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.domain.com/$1 [R=301,L]

There is no infinite loop. When you ask for / the server gets the index page, whatever it is called, without telling you what it is actually called.
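As a rough illustration of how the second pair of lines above behaves, here is a small Python sketch. It is not Apache itself: the function name redirect_target and the sample request lines are made up for the example, and it assumes the rules sit in the .htaccess of the document root, where the path seen by RewriteRule has no leading slash.

```python
import re

# Same expressions as the second RewriteCond/RewriteRule pair above.
# In Apache syntax "\ " is an escaped space, so a plain space is used here.
REQUEST_PATTERN = re.compile(r"^[A-Z]{3,9} /.*index\.html? HTTP/")
PATH_PATTERN = re.compile(r"^(.*)index\.html?$")


def redirect_target(request_line, path, prefix="http://www.domain.com/"):
    """Return the 301 target, or None when no redirect would be issued.

    request_line: the raw first line sent by the client (THE_REQUEST).
    path: the path the RewriteRule sees (assumed: no leading slash).
    """
    if not REQUEST_PATTERN.search(request_line):
        return None  # the client never typed "index.htm(l)" itself
    match = PATH_PATTERN.match(path)
    if match is None:
        return None
    return prefix + match.group(1)  # $1 carries the folder name, if any


print(redirect_target("GET /folder/index.html HTTP/1.1", "folder/index.html"))
print(redirect_target("GET /index.htm HTTP/1.1", "index.htm"))
# A request for the bare folder is served via the index page, not redirected:
print(redirect_target("GET /folder/ HTTP/1.1", "folder/index.html"))
```

The last call shows why there is no loop: the server may map /folder/ to folder/index.html internally, but the condition looks only at what the client actually sent.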

jdMorgan

10:11 pm on Oct 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're on Apache, and can use mod_rewrite, it's a bit tricky, but possible.

In .htaccess:


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

By testing for "/index.html" in the original request sent by the client, the looping problem is avoided, since the rule won't be retriggered by the action of DirectoryIndex.

Jim
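For anyone who wants to see what that RewriteCond pattern does and does not match, here is a minimal Python sketch. The helper name client_asked_for_index and the sample request lines are only for illustration. The {3,9} part simply skips over the HTTP method (GET, HEAD, OPTIONS and so on) at the start of the request line.

```python
import re

# Pattern taken from the RewriteCond above; the Apache "\ " is an escaped space.
THE_REQUEST_PATTERN = re.compile(r"^[A-Z]{3,9} /index\.html")


def client_asked_for_index(request_line):
    """True if the raw request line explicitly names /index.html."""
    return THE_REQUEST_PATTERN.search(request_line) is not None


print(client_asked_for_index("GET /index.html HTTP/1.1"))  # True
print(client_asked_for_index("GET / HTTP/1.1"))            # False
```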

lee_sufc

10:19 pm on Oct 2, 2006 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks for the reply - is it actually worth me doing this in the first place, or is it a case of "if it ain't broke, don't fix it"?

g1smd

10:21 pm on Oct 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It will ensure that search engines only ever list www.domain.com/ and www.domain.com/folder/ type URLs, and also ensure that all Pagerank flows to the correct version (without "index" filename) too.

lee_sufc

10:40 pm on Oct 2, 2006 (gmt 0)

10+ Year Member Top Contributors Of The Month



OK - I might do this then. At present, index.html isn't listed in Google, but example.com and example.com/index.html both have PR5. I am a bit worried that by changing this, I might affect my Google SERPs in a negative way...

Have many people here actually done this and experienced positive results?

g1smd

10:50 pm on Oct 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Raises Hand!

I wouldn't recommend it, if I hadn't seen it in action. :-)

Oliver Henniges

6:15 am on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I tried as well. This morning I changed all remaining internal relative links from index.html to the absolute domain name and also added that rewrite condition to .htaccess. I also changed the first line in my sitemaps.xml accordingly.

Yet, I am not too familiar with Apache and .htaccess: there is no problem with having TWO such rewrite conditions, is there (one for the canonical www/non-www and one for this index.html thing)?

The past weeks I was wondering why google has indexed some pages designed in summer with the same pagerank as the domain itself, though almost all backlinks point to the mere domain-name. Even that current TBPR-update did not seem to change that. I will now see, if this PR-dilution between domain.de and domain.de/index.html was the reason.

However: those changes were only made for search-engines, not for my visitors.

lee_sufc

7:22 am on Oct 3, 2006 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Oliver - I had the same question: is it OK to have both rewrites in the .htaccess file? This is what mine looks like now:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.net [NC]
RewriteRule (.*) http://www.example.net/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm
RewriteRule ^index\.htm$ http://www.example.net/ [R=301,L]

jtara

9:49 am on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What's up with this line?

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm

I pondered that a bit, wondering why it is matching from 3 to 9 letters. I guess this is an attempt to skip over the method part of THE_REQUEST.

But why use THE_REQUEST in the first place? Why not just use REQUEST_URI?

This condition also doesn't address multiple directories containing an index.html - only the root instance.

This seems to deal with multiple index.html's on your site:

RewriteRule ^(.*/)index.html$ $1 [L,R=301]

No need for a RewriteCond at all.

WiseWebDude

1:08 pm on Oct 3, 2006 (gmt 0)

10+ Year Member



g1smd,

I appreciate that, I did not realize you could do it like that, my apologies for telling you wrong, lee_sufc...

as a matter of fact, I will try that myself :)-

jdMorgan

1:19 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jtara,

Try your RewriteRule-only code. It will create an infinite redirection loop due to interaction with DirectoryIndex, as described in my first post.

It is true that the version I posted only redirects requests in the root directory. However, g1smd posted two variants that handle all subdirectories as well, if that is needed.

Jim

jtara

4:33 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try your RewriteRule-only code. It will create an infinite redirection loop due to interaction with DirectoryIndex

I did. It seems to do exactly what I want it to do. And I do use DirectoryIndex.

Can you describe under what circumstances this causes an infinite loop?

g1smd

5:58 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It causes an infinite loop if you forget to expressly check that index.htm or index.html is in the original request.

My sample code redirects both index.htm and index.html in both the root and in folders. It preserves the folder name in the redirect too.

One change to your code. You need to do the index redirect first, so that all */index get redirected to www.domain.com/*/

If you do it the other way, then domain.com/*/index first gets redirected to www.domain.com/*/index before being redirected on to www.domain.com/*/ after that. You need to avoid a redirection chain like that.

Do the index redirect to www non-index first (because that one works for all index pages, whether www or non-www and whether in the root or in a folder), and only after that then do the test for non-www and redirect all that remains to www.
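To make the point about rule order concrete, here is a small Python simulation of the two orderings. It is only a sketch: the function names are invented for the example, each hop stands for a fresh client request (so the THE_REQUEST check is implied), and only index.html is handled.

```python
def step_index_first(host, path):
    """Index rule first, then the non-www rule. Returns the redirect or None."""
    if path.endswith("index.html"):
        # The index rule always redirects straight to the www host.
        return ("www.domain.com", path[: -len("index.html")])
    if host == "domain.com":
        return ("www.domain.com", path)
    return None


def step_www_first(host, path):
    """Non-www rule first, then the index rule."""
    if host == "domain.com":
        return ("www.domain.com", path)
    if path.endswith("index.html"):
        return ("www.domain.com", path[: -len("index.html")])
    return None


def count_redirects(step, host, path):
    """Follow redirects until the server would serve content."""
    hops = 0
    while True:
        target = step(host, path)
        if target is None:
            return hops, host + path
        host, path = target
        hops += 1


print(count_redirects(step_index_first, "domain.com", "/folder/index.html"))
print(count_redirects(step_www_first, "domain.com", "/folder/index.html"))
```

Both orderings end up at www.domain.com/folder/, but the index-first ordering gets there in one redirect while the other needs two.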

jtara

7:10 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you are also doing secondary-site redirection, you can avoid a double-redirect thusly:

RewriteRule ^(.*/)index.html$ http://example.com$1 [L,R=301]

(I use the non-www convention.)

Follow this with your secondary-site redirection rule.

Since you are doing a redirect anyway if you find index.html, there is no harm done in adding "http://example.com" to the front of the redirection, whether it is needed or not.

Still not grokking the "infinite loop" issue, as I haven't seen it. I've only addressed index.html, though, not index.htm, index.php, etc. I'm not sure that those are too important.

My goal is to have "clean" URLs. The only reason for redirecting index.html if your internal links are written using just the directories is that many people assume its existence, and will automatically stick it in when hand-constructing a link, etc. You want to gently correct those links back to the clean version.

I notice some sites go one way or the other with this. Yahoo doesn't bother to redirect, nor does Google. I've seen sites that redirect from / to /index.html.

Wikipedia redirects from / to /wiki/Main_Page. This is pretty typical for CMSs, and makes some sense. Wikipedia wants a permanent URL for the wiki, but allows for the possibility of sticking some other home page in front of it in the future.

ashear

7:40 pm on Oct 3, 2006 (gmt 0)

10+ Year Member



You can always do..
.htaccess
Redirect 301 /index.html http://www.example.com/something

[edited by: tedster at 11:06 pm (utc) on Oct. 3, 2006]
[edit reason] use example.com [/edit]

jdMorgan

7:41 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I recall, the circumstances under which an 'infinite loop' can result are as follows:

  • Redirect to "/" is from the page specified as a DirectoryIndex local-URL-path
  • The index page specified by the DirectoryIndex directive exists
  • The redirect code is located in .htaccess, and is therefore executed during the Apache API fix-up phase
  • Server is configured such that mod_dir executes before mod_rewrite (a very common configuration)

    If all of those are true, you'll get a loop.

    So, a simple example (omitting other config/setup directives) would be:


    DirectoryIndex index.html
    RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

    Here, a client request for "/" is internally rewritten to "/index.html" by mod_dir.
    Mod_rewrite is then invoked, redirecting the request back to "/" because the REQUEST_URI is now "/index.html"
    On the resulting subsequent client HTTP request for "/", the process repeats, leading to a loop.
    This looping continues until either the client or the server hits its maximum redirection limit.

    The cure is to use THE_REQUEST to verify that the client originally requested /index.html before invoking the redirect. By doing so, the loop can be broken, since THE_REQUEST will not be updated, as REQUEST_URI is, by the action of mod_dir.

    Many if not most Webmasters are on shared hosting, and are stuck with .htaccess solutions. This is one of the differences between a .htaccess context and that of httpd.conf or conf.d. Since I address many more .htaccess questions, I tend to forget to explicitly mention or describe the differences. Hopefully, that's not the point of confusion here.

    Jim
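The sequence described above can be mimicked with a toy Python model. This is not how Apache is implemented. It only mirrors the order of events in the explanation (mod_dir maps "/" to the index file, then the redirect rule runs in .htaccess), and the names handle, follow and MAX_REDIRECTS are assumptions made up for the sketch.

```python
DIRECTORY_INDEX = "index.html"
MAX_REDIRECTS = 20  # hypothetical client-side redirect limit


def handle(requested_path, check_the_request):
    """Return the redirect target, or None if the server would serve content."""
    the_request_path = requested_path  # what the client sent; never changes
    request_uri = requested_path       # may be changed internally
    if request_uri.endswith("/"):
        request_uri += DIRECTORY_INDEX  # internal rewrite by mod_dir
    if request_uri == "/index.html":
        # With the cure applied, redirect only if the client itself asked
        # for /index.html.
        if check_the_request and the_request_path != "/index.html":
            return None
        return "/"
    return None


def follow(path, check_the_request):
    """Count redirects until content is served or the limit is reached."""
    hops = 0
    while hops < MAX_REDIRECTS:
        target = handle(path, check_the_request)
        if target is None:
            return hops
        path = target
        hops += 1
    return hops  # limit reached: the client gives up


print(follow("/", check_the_request=False))           # hits the limit
print(follow("/", check_the_request=True))            # served at once
print(follow("/index.html", check_the_request=True))  # one redirect to "/"
```

Without the check, every request for "/" is bounced back to "/" until the limit is hit. With it, "/" is served directly and only an explicit /index.html request is redirected, once.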

    jtara

    8:18 pm on Oct 3, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    The redirect code is located in .htaccess, and is therefore executed during the Apache API fix-up phase

    OK, that explains why I am not looping, as I am putting the directive in httpd.conf.

    And thanks for that very thorough explanation. Now I see why people are using THE_REQUEST.

    g1smd

    10:54 pm on Oct 3, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Please also swap your pairs of lines of code around so that the index stuff is first and the non-www stuff is last; as I described above. Avoid the chain!

    youfoundjake

    8:17 pm on Oct 4, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Ok, this may seem like a silly question, but after reading numerous posts on WebmasterWorld and gathering as much info, is the correct way to have navigation in a website use the FQDN for each link? www.example.com/index.html gets 301'd to www.example.com/ and the link to the about us page is www.example.com/about-us.html instead of just about-us.html?

    g1smd

    8:29 pm on Oct 4, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    You're correct with the index stuff.

    As for other internal pages, you can use full links like "http://www.domain.com/folder/that.page.html" or you can use simpler links like "/folder/that.page.html" that BEGIN with a "/" each time, combined with the <base> tag like this: <base href="http://www.domain.com/"> where that tag appears once in the head section of each and every page of the site.

    Oliver Henniges

    9:22 pm on Oct 4, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    webmasterpearls like this thread should be copied/filed in an extra section for better retrieval besides all the everyday noise. Thank you very very much.

    youfoundjake

    8:49 pm on Oct 6, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Got a question.
    IBLs are pointing to www.example.com (never to www.example.com/index.htm),
    but I never set up a redirect from /index.htm to /,
    and I just redirected example.com to www.example.com,
    and nowhere in site:www.example.com does /index.htm show up.
    Will all the redirects I set up help smooth things out, including maybe increasing the results of the link: operator and having more than 2 pages not return as supplemental?

    Also

    As for other internal pages, you can use full links like "http://www.domain.com/folder/that.page.html" or you can use simpler links like "/folder/that.page.html" that BEGIN with a "/" each time, combined with the <base> tag like this: <base href="http://www.domain.com/"> where that tag appears once in the head section of each and every page of the site.

    Wouldn't the links then be www.domain.com/folder//that.page.html? with the two leading slashes? which would be a 404?

    jtara

    9:19 pm on Oct 6, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    There's no need for a <base> tag in this case. References starting with a / are always relative to the root of the current site.

    The intended use of the <base> tag is as a shortcut to some directory other than the root.

    For example, perhaps a page contains a lot of images which are not in the current directory, or perhaps not even on the current site. Let's say they are all in www.example2.com/project22/results/images.

    You could use a <base> tag:

    <base href="http://www.example2.com/project22/results/images/" />

    Now you can refer to the images simply by their file names, without needing to add all that stuff in front of them.
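Python's standard urljoin follows the same resolution rules a browser applies to a <base> href, so it can be used as a quick check. This is a sketch only: the wrapper name resolve and the file name chart.png are made up for the example.

```python
from urllib.parse import urljoin


def resolve(base_href, link):
    """Resolve a link against a <base href> the way a browser would."""
    return urljoin(base_href, link)


# A bare file name is resolved inside the base directory.
print(resolve("http://www.example2.com/project22/results/images/", "chart.png"))

# A link starting with "/" replaces the whole path of the base,
# so no doubled slash appears.
print(resolve("http://www.domain.com/", "/folder/that.page.html"))
```

The second call gives http://www.domain.com/folder/that.page.html, so a root-relative link combined with that <base> tag does not produce a double slash.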

    jdMorgan

    9:42 pm on Oct 6, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Jake,

    Taking your specific question, there are three main ways you can link to an object on a page (such as an image) or to another page on your site.

    The client (browser or SE robot) will resolve these links as follows:

    <img src="blue_widget.gif"> (Page-relative path) Remove the 'file name' from current URL (in browser address bar) and add "blue_widget.gif"

    <img src="/red_widget.gif"> (Server-relative path) Remove entire local URL from current URL (leaving only domain name), and add "red_widget.gif"

    <img src="http://example.com/green_widget.gif"> (Canonical URL) Use this URL to get the object, disregarding the current URL.


    Using the page-relative linking method, you can also remove and add directory levels as desired:

    <img src="../../images/blue_widget.gif"> (Page-relative path) Remove two subdirectory levels and the 'file name' from the current URL and add "images/blue_widget.gif"

    The methods above will give the stated results, without injecting any extra slashes into the URL.

    Jim
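The resolution rules listed above can be checked with urljoin from the Python standard library. The current page URL used here is a made-up example. The image names are the ones from the post.

```python
from urllib.parse import urljoin

# Hypothetical page the links appear on.
current_page = "http://example.com/shop/widgets/index.html"

links = [
    "blue_widget.gif",                      # page-relative
    "/red_widget.gif",                      # server-relative
    "http://example.com/green_widget.gif",  # full URL
    "../../images/blue_widget.gif",         # page-relative, two levels up
]

resolved = [urljoin(current_page, link) for link in links]
for link, url in zip(links, resolved):
    print(link, "->", url)
```

This resolves to http://example.com/shop/widgets/blue_widget.gif, http://example.com/red_widget.gif, http://example.com/green_widget.gif and http://example.com/images/blue_widget.gif respectively, with no extra slashes.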

    [edited by: jdMorgan at 9:44 pm (utc) on Oct. 6, 2006]

    youfoundjake

    11:41 pm on Oct 6, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    My page links to images as
    <img src="Images/logo.gif" alt="description" width="371" height="59" longdesc="http://www.domain.com" />

    And the question I brought up earlier is should all my links be relative like above or FQDN such as
    <img src="http://www.domain.com/Images/logo.gif" alt="description" width="371" height="59" longdesc="http://www.domain.com" />

    I ask this because sometimes it seems that people that post here have the FQDN for their internal linking structure, instead of the relative path. Is there some kind of benefit to linking internally with the FQDN?

    g1smd

    11:55 pm on Oct 6, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Two reasons:

    More immune to duplicate content issues. If all internal links include the www, even if there is no "non-www to www redirect" it is much harder for any non-www URLs to be indexed, and then "stick".

    More immune to URL hijacking. That was very common a year or two back. This was where people used to point multiple dodgy 302 redirects at your site, and get your content listed at their URL.

    youfoundjake

    12:01 am on Oct 7, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Cool beans, looks like I'm going to have a long weekend of find and replace to update all my links. It makes sense to me, because it certainly does distinguish each page; I'm just not quite sure why I never thought of it before, probably because I was operating in the relative-path mindset. Of course, adding www.domain.com to all the links is going to screw up the code-to-text ratio, but eh, it's worth it if it helps prevent supplemental results and duplicate content, and makes crawling that much easier.
    Have a good weekend, y'all

    g1smd

    12:03 am on Oct 7, 2006 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Instead of adding www.domain.com to every link on the site, you can get away with just adding the <base href="http://www.domain.com/"> tag once in the head section of each page.