/directory/ and /directory/index.html are duplicate content?



6:48 pm on Aug 19, 2008 (gmt 0)

5+ Year Member

Hi there,

I hope there are some people around here that are willing to answer my question.

Unfortunately, on my website I have sometimes linked to a page using a URL that includes index.html, because I always assumed it would not matter. I never had any complaints from Google.
Until recently, that is, when I got a message in Webmaster Tools about duplicate title and description tags. The pages that Google refers to are, for example, /carnival/ and /carnival/index.html.
What can I do?

Thanks in advance.



2:56 am on Aug 28, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Hey, not fair - that's two questions!

1. Sounds like you've already got trouble. I'd say fix all the parts of the problem as fast as you can. Using a tool that has an "extended find and replace" function can help a lot.

2. No, the number of 301 redirects of this type shouldn't cause a problem. Not this kind of canonical redirect, at least. But how many directory indexes are you talking about?

I'd really suggest looking at ALL the possible canonical problems that your website might have and fixing everything at one time. Otherwise you may get quite tangled up going forward.
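To make "all the possible canonical problems" concrete, here is a minimal sketch of collapsing the common duplicate forms of a URL into one. The host names, the list of index filenames, and the choice of www as the preferred host are assumptions for illustration, not something stated in this thread; the actual fix still belongs in the server's redirect rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed for illustration: the preferred host, its bare form, and the
# filenames the server treats as directory indexes.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"
INDEX_FILES = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Return the single preferred form of a URL on this site."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Fold the bare domain into the www host (a policy choice, not a rule).
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Collapse /dir/index.html to /dir/.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"
    # The fragment is dropped: it is never sent to the server anyway.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def needs_redirect(url: str) -> bool:
    """True when the requested URL should 301 to its canonical form."""
    return canonical_url(url) != url
```

Any URL for which `needs_redirect` is true is a candidate for a 301 to `canonical_url(url)`, and also a URL that should not appear in internal links.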

jdMorgan has a great thread about that here:


3:23 am on Aug 28, 2008 (gmt 0)

10+ Year Member

I have about 700 indexes on a 6500 page site. I use breadcrumb navigation on all 6500 pages - this is where most of my changes are taking place.

I'm going as fast as I can using search and replace in Dreamweaver, checking one folder at a time so as not to trip over which index.html is which. I've also changed my Dreamweaver preferences for internal links to be root relative rather than document relative, which helps me see what I'm doing.
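For anyone without Dreamweaver, the same search and replace can be sketched in a few lines of Python. This is only an illustration under assumptions: it touches root-relative links only (those starting with "/"), it assumes the index file is named index.html, and the function name is made up for this example.

```python
import re

# Matches a root-relative href that ends in index.html, optionally followed
# by a fragment or query string. Group 1 keeps everything up to and
# including the final slash of the directory.
INDEX_LINK = re.compile(
    r'''(href=["']/(?:[^"'#?]*/)?)index\.html(?=["'#?])''',
    re.IGNORECASE,
)


def strip_index_from_links(html: str) -> str:
    """Rewrite href="/dir/index.html" to href="/dir/" in an HTML string."""
    return INDEX_LINK.sub(r"\1", html)
```

Document-relative links such as href="index.html" and links to other sites are deliberately left alone, since rewriting them safely depends on where the file lives. Run it on a copy of the site first.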

Thanks for the link to the other thread, Tedster. My little brain can't quite wrap around it at the moment...I'm sure I'll be pumped once I get through this crisis.


2:52 pm on Aug 29, 2008 (gmt 0)

5+ Year Member

I have a related question, and was hoping to get some feedback. I have set up a new directory on a site's domain containing 'widget' content pages ...which url structure is better for Google rankings:

a) http://www.example.com/widgets/widgets.html
b) http://www.example.com/widgets/ (using index.html in this directory, w/ the redirect described above).

I would think 'a' would allow me to get an additional keyword in the URL string, and associate the file name with the theme of the directory. Then I thought 'b' may be more powerful in the eyes of Google because it is basically declaring 'index' the main page of all the supporting widget content behind it.

I browsed the SERPs to see patterns, but couldn't come to a conclusion. I would really appreciate any feedback from those of you that have experienced success with one or the other.



7:32 pm on Aug 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

JDMorgan - quite possibly, King of Apache and mod_rewrite!


11:32 pm on Aug 29, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Google will often prefer to list the shorter of the two URLs.

You should also use the shorter one in your internal links.


3:08 am on Aug 30, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member


I like:

c) http://www.example.com/widgets/widgets

No need to 'lock yourself in' to any particular page type or server technology by stating it in the URL... If you've already committed to rewriting URLs, you might as well take the opportunity to get rid of "file" extensions from your URLs. This will save you much trouble if you later want to use PHP or ASP for some of those pages; you won't have to change the URL at all, just the rewrite code.
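The idea can be sketched outside of any particular server: the public URL has no extension, and a small lookup decides which file actually serves it. The extension list, its order, and the function name below are assumptions for illustration; on a real server this job is done by the rewrite rules.

```python
from typing import Optional

# Assumed lookup order; a real site would list only the types it serves.
EXTENSIONS = (".html", ".php", ".asp")


def resolve(path: str, files: set[str]) -> Optional[str]:
    """Map an extensionless URL path to the file that should serve it.

    `files` stands in for the set of files that exist on the server.
    """
    for ext in EXTENSIONS:
        candidate = path + ext
        if candidate in files:
            return candidate
    return None
```

If /widgets/widgets.html is later replaced by /widgets/widgets.php, the public URL /widgets/widgets keeps working and only the lookup result changes.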



I've come back a bit late to the party, but the reason that named anchors such as "#myanchor" are problematic is that they are not sent in HTTP requests by most clients (browsers); they are used only within the client itself as an "offset" into a page. This makes sense because the client always requests the entire page from a server; a page is the minimum conventional "serving size" (HTTP partial-content transactions notwithstanding, and keeping it simple here). Since only the client 'cares' about which anchor should be placed at the top of the current display screen, there is no reason for a client to send named anchors to a server.

Since named anchors are not sent, no redirect can include them, and it is up to the client to 're-attach' the named anchor to the redirected URL to find the right 'spot' in the new document. Some browsers handle this fairly well, while others don't. Best advice is to not rely on them as the sole on-page location method; a good document structure and breaking up very long pages into several smaller pages are ideas worth considering.
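A small sketch of the two steps described above, using Python's standard urllib.parse: first, what a client sends (the URL minus the fragment), and second, how a well-behaved client might re-attach the fragment after a redirect. The function names and example URLs are made up for illustration, and real browsers may differ in the details.

```python
from urllib.parse import urldefrag, urljoin


def request_target(url: str) -> str:
    """The part of the URL a client actually requests; the fragment stays local."""
    return urldefrag(url).url


def follow_redirect(original_url: str, location: str) -> str:
    """Resolve a redirect and re-attach the original fragment on the client side."""
    fragment = urldefrag(original_url).fragment
    # The Location header may be relative, so resolve it against the original.
    target = urljoin(original_url, location)
    # Only carry the old fragment over if the redirect did not supply its own.
    if fragment and not urldefrag(target).fragment:
        target = f"{target}#{fragment}"
    return target
```

The server only ever sees what `request_target` returns, which is why a redirect rule cannot match on or preserve "#myanchor" by itself.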

Also, the new way forward is to use id attributes on elements (for example <div id="">) instead of the now-deprecated <a name=""> named anchors, so that's another reason to not rely on named anchors.



3:23 pm on Oct 13, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member

I am hoping someone can please help me with this. I have the homepage of a site indexed by Google. It appears as www.example.com. After several months of that being indexed, I am now also seeing it indexed as www.example.com/home.html

So in the Google index the homepage is there twice when you do site:www.example.com, once each way.

Does anyone know why it would appear the second way after the first way was already indexed several months ago? Also, in this situation can I do anything with a 301 redirect from my hosting account or through Google Webmaster so it only appears in the Google index the first way?

I would appreciate if you guys could help me with this.


4:42 pm on Oct 13, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

gouri, the help you need is in the Hot Topics area [webmasterworld.com], which is always pinned to the top of this forum's index page. Look for the Duplicate Content area and start with the Canonical URL Issues thread. That will give you access to many linked discussions that cover the topic very thoroughly.

The fix will depend on your type of server. If you need technical help on that beyond what is already in the posts you read, you can post your questions in the forum that deals with the kind of server you use: Apache [webmasterworld.com] or Windows IIS [webmasterworld.com].


4:48 pm on Oct 13, 2008 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

For those of you on Windows following this topic...

If you are on Windows using ISAPI_Rewrite 2.0 and the httpd method...

RewriteRule (.*/)index\.asp $1 [I,RP,L]

If you are on Windows using ISAPI_Rewrite 3.0 and the htaccess method...

RewriteRule (.*)default\.aspx /$1 [R=301,NC,L]

And, on Windows Server 2008 with IIS 7, XML and the Web Config file are your new friends for rewriting. :)
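Before deploying a rule like these, the pattern can be checked offline. The sketch below uses Python's re module, which only approximates how ISAPI_Rewrite evaluates patterns; the anchors, the sample paths, and the function name are assumptions added for the test, not part of the rules above. It also shows why the dot in the filename is worth escaping.

```python
import re
from typing import Optional

# Approximation of the default.aspx rule; ^ and $ are added here so the
# check is strict. IGNORECASE plays the role of the NC flag.
INDEX_PATTERN = re.compile(r"^(.*/)?default\.aspx$", re.IGNORECASE)

# The same pattern with an unescaped dot, which matches any character there.
LOOSE_PATTERN = re.compile(r"^(.*)default.aspx$", re.IGNORECASE)


def redirect_target(path: str) -> Optional[str]:
    """Return the 301 target for a path (given without its leading slash),
    or None if the path should be served as-is."""
    match = INDEX_PATTERN.match(path)
    if match is None:
        return None
    return "/" + (match.group(1) or "")
```

With these assumptions, products/default.aspx redirects to /products/, while a lookalike such as products/defaultxaspx is left alone by the strict pattern but would be caught by the loose one.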


7:41 pm on Oct 13, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member

Now I notice this strange thing happening. Maybe someone can help me to understand it.

Before today, when I typed a keyword in Google one of my inner pages was ranking for it. Then last week, I made some changes to my homepage, and this included mentioning that keyword on the homepage although the inner page has the keyword many more times and that is the page I want to rank for it.

Today I see the second version of the homepage indexed (as mentioned above). Now, when I search for the keyword that the inner page was ranking for, the new version of the homepage ranks for it, and the Google SERP switches back and forth between that homepage and the inner page that originally ranked for the keyword.

Also, after I make changes to the homepage, does a new version have to be indexed, or will there be only the old version, showing the changes after a few days? The cached copy of the original does not show the changes made last week, but the new one does.

I know that I have said a lot but can anyone please help me to make sense of this?


10:41 pm on Oct 13, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member

I wanted to add a little more information. I checked in Google Webmaster and it says that Googlebot has successfully accessed my home page, but it does not provide a date. I think it usually tells you when it last accessed your homepage.

Maybe this might help to analyze the situation.



9:35 am on Oct 14, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

As of the last 12 hours, Google no longer show the date the Home Page was last accessed in the WMT data.

They now link to the Crawl graphs (which have always been there) instead. Those show you how many pages per day Google has accessed on your site.

There were a number of changes and bug fixes overnight: [webmasterworld.com...]

This 42 message thread spans 2 pages: 42
