index.html questions - filenames and correct indexing of URLs

         

Marfola

11:34 am on Jun 27, 2007 (gmt 0)

10+ Year Member



I was recently advised to 301 redirect http://www.example.com/section/category/index.html to the folder root, in our case http://www.example.com/section/category/, to avoid duplicate indexing.

Currently, the correct url is http://www.example.com/section/category/index.html. Thus I would need to rewrite the urls to exclude the filename, implement a site wide 301 redirect and remove the urls with a filename (index.html) from sitemaps.

Is this really necessary? Wouldn’t excluding root urls without the filename (http://www.example.com/section/category/) from sitemaps produce the same result? Why would this solution be less effective? Should I be worried about the indexing of duplicate links, root and filename, for each webpage?

[edited by: tedster at 11:45 am (utc) on June 27, 2007]
[edit reason] switch to example.com - it can never be owned [/edit]

Marcia

12:19 pm on Jun 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It does make a difference with the main index page of subdirectories. It's best all around if they're referred to as /subdirectoryname/ with the final forward slash included, for uniformity and consistency.

tedster

12:57 pm on Jun 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some questions:

1. Do you currently see both versions in the Google index?
2. If you only see one version, is it the index.html version or the folder root?
3. How are your internal links set up?
4. Do you have any external backlinks that point to either or both types of URL?

The reason I ask is that sometimes it's just "borrowing trouble" to place a redirect if there's no problem. You can always keep this information on the back burner in case trouble does show up some day.

And sometimes, when Google already has all the index.html versions indexed, redirecting to the folder root can cause a long process of ranking problems while Google sorts out your new instructions.

In a purely theoretical way, the advice you received is good. If you were setting up a domain for the first time, that would be the best practice. But if things have already been rolling for a while, and if search engines already have indexed your site one way -- then switching over can be a tricky thing, especially if you do not have a real blockbuster of a website, say a high PR7 home page or better.

From what I've been seeing recently, Google has improved their handling of this kind of canonical issue, even for sites that don't explicitly have a fix in place. So unless there is evidence of trouble, I'd say be very cautious about making changes.

Marfola

8:35 am on Jun 28, 2007 (gmt 0)

10+ Year Member



Hi tedster,

thanks for your comments. In answer to your questions:

Google has indexed both versions, in some cases both versions of the same page.
Our internal links are /index.html.
Almost all backlinks point to /index.html.

We’ve removed the pages without the filename (index.html) from Google sitemaps. I assume this will facilitate the de-indexation of these pages but will it create other problems? What should I look for as evidence of trouble?

tedster

1:00 pm on Jun 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We’ve removed the pages without the filename (index.html) from Google sitemaps. I assume this will facilitate the de-indexation of these pages but will it create other problems? What should I look for as evidence of trouble?

If you mean that you no longer have those urls in your sitemap.xml file, the answer is no -- that alone will not remove a url from the Google index. If your server still resolves that url with a 200 status, or even a 302 redirect, Google will keep it. A sitemap does not tell Google that "these are the only urls you should index."

In the case of index.html urls, I wouldn't even trust submitting a url removal request -- I've heard reports that trying to fix canonical troubles through the url removal tool in Google Webmaster Tools can cause both versions to vanish.

The only thing that will help is if your server now redirects from one version of the url to your chosen version. Since you are seeing both versions currently indexed, that means PageRank may be split between the two versions of those urls instead of being "all in one pile." So I would study the site: results first and see which version predominates, and set all the redirects to that. Also, at the same time, ensure that all your internal links use your chosen version and not the other.

And, as Marcia mentions, I would also suggest a strong preference for the /directory/ form. Only redirect the other way if you see a significant percentage of indexed urls in the index.html form. And even then, for the best long term results, you may still want to choose the 301 redirect FROM /directory/index.html TO /directory/ -- knowing that it may mean a period of lower traffic for now.
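
As a rough illustration of the 301 FROM /directory/index.html TO /directory/ mapping described above, here is a minimal Python sketch. The function name `redirect_target` and the `INDEX_NAMES` tuple are hypothetical, and the sketch assumes index.html is the only default document name on the site. In practice the redirect itself would be configured on the web server; this only shows the mapping.

```python
from typing import Optional

# Assumption: index.html is the only default document name in use.
INDEX_NAMES = ("index.html",)


def redirect_target(path: str) -> Optional[str]:
    """Return the /directory/ path to 301 to, or None if no redirect is needed."""
    # Split off the last path segment, e.g. "/section/category" and "index.html".
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:
        # Keep the trailing slash so the target is the folder root.
        return head + "/"
    return None


if __name__ == "__main__":
    for p in ("/section/category/index.html", "/section/category/", "/index.html"):
        print(p, "->", redirect_target(p))
```

A request for the folder root itself returns `None`, so the chosen version is served with a 200 and never redirects.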

Marfola

3:49 pm on Jun 28, 2007 (gmt 0)

10+ Year Member



The only thing that will help is if your server now redirects from one version of the url to your chosen version.

We removed the 301 redirect earlier this week after finding the following sitemaps warning in webmasters tools.

‘When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot because they contained too many redirects. Please change the URLs in your Sitemap that redirect and replace them with the destination URL (the redirect target). All valid URLs will still be submitted.’

HTTP Error: 301 (Moved permanently)

URL: http://www.example.com/section/category/

Date: June 7, 2007

NB: There were no URLs in our sitemap that redirected on June 7. The only other possible explanation for the problem is that Google doesn’t like more than one redirect on the same page, i.e. http://www.example.com/section/category/index.html 301 redirects to http://www.example.com/section/directory/index.html and http://www.example.com/section/directory/ 301 redirects to http://www.example.com/section/directory/index.html. Or was this the reason for the warning?

So I would study the site: results first and see which version predominates, and set all the redirects to that. Also, at the same time, ensure that all your internal links use your chosen version and not the other.

All internal links and nearly 100% of our external links are to /directory/index.html. More than 50% of the pages indexed in google and yahoo (which brings us as much traffic as google) end in /directory/index.html.

And even then, for the best long term results, you may still want to choose the 301 redirect FROM /directory/index.html TO /directory/ -- knowing that it may mean a period of lower traffic for now.

Why is /directory/ better than /directory/index.html? Is this the best solution for Yahoo as well?

A period of lower traffic isn’t our only concern. Getting thousands of sites to change links to our webpages (we have thousands of links to internal pages) would be a nightmare!

g1smd

9:52 pm on Jun 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The "too many redirects" problem occurs if you have separate redirects for non-www to www and for index.html to "/" -- what you need is for redirects to NOT be chained.

There may be multiple rules, one for index files and another for non-www entries; but each rule should take you to the final destination in just one move. Check that out first.
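
To make the "one move" idea concrete, here is a minimal Python sketch that applies both fixes (non-www to www, and index.html to the folder root) before issuing a single redirect. The names `canonical_url`, `redirect_for` and `CANONICAL_HOST` are hypothetical, and the choice of www.example.com and the /directory/ form as the final destination is an assumption for illustration only.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumptions: the www host and the /directory/ form are the chosen versions.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"


def canonical_url(url: str) -> str:
    """Apply every canonicalisation rule at once and return the final URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path
    if path.endswith("/index.html"):
        # Drop the filename but keep the trailing slash.
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_for(url: str) -> Optional[str]:
    """Return the single 301 target for url, or None if it is already canonical."""
    target = canonical_url(url)
    return None if target == url else target


if __name__ == "__main__":
    print(redirect_for("http://example.com/section/category/index.html"))
```

Because the target is computed in full before redirecting, a non-www index.html request reaches its final destination in one hop rather than two chained ones.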

Marfola

7:22 am on Jul 3, 2007 (gmt 0)

10+ Year Member



Thanks!

Anyone willing to take a stab at the following?

And even then, for the best long term results, you may still want to choose the 301 redirect FROM /directory/index.html TO /directory/ -- knowing that it may mean a period of lower traffic for now.


Why is /directory/ better than /directory/index.html? Is this the best solution for Yahoo as well?

A period of lower traffic isn’t our only concern. Getting thousands of sites to change links to our webpages (we have thousands of links to internal pages) would be a nightmare!

tedster

8:20 am on Jul 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you ever change your website platform/technology -- then urls that do not use file extensions such as .htm, .asp or whatever will not need to change. This means any future re-development you may do has one less hurdle to clear, because all your directory root pages will not change their URL.

Since you have thousands of links to the index.html version, you may be better off not worrying about this aspect of "future-proofing" your site. It would not have an impact on rankings for the current site.

g1smd

11:07 pm on Jul 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> Why is /directory/ better than /directory/index.html? <<

1. Because search engines generally prefer to list the shorter one, if they find that both exist.

2. You can also future-proof your URLs (see tedster's comments directly above).

3. If people already link to index.html, your redirect will get the traffic over to the correct URL anyway. Still, do get people to update their links to point at the correct URL as soon as possible.