Page Names & Duplicate Content

I know much has been written here about the duplicate content penalty, and how some people believe it's a major problem and some people feel it's less of a problem.

Here's a question I haven't been able to find an answer for.

I know from experience that if a link on one page of a site points to "Index.htm" and a link on another page points to "index.htm", then a site: query on Google will show two pages: Index.htm and index.htm

As a programmer, I find this quite surprising, as this web server is not case sensitive, i.e. both Index.htm and index.htm resolve to the same page, and it's a no-brainer for a programmer to remove all capitalisation from a string of text. That's a long way of saying that Google ought to know that Index.htm and index.htm are the same page.

BUT, I know that it does actually view them as separate pages. My question is this: as a consequence of this, Google has two pages in its index, which are in fact the very same page, so will this lead to the duplicate content penalty?

And the last question I have on this subject is, given Matt Cutts's blog posts about duplicate content, is this something I need to worry about?

The main reason I ask is that I am working on a client site and the original developer was REALLY sloppy about the link URLs used throughout the site, and when I went to build a Google sitemap using a software tool, some pages were appearing four or five times due to inconsistencies in the capitalisation used.

To compound matters, Google knows about both http://www.example.co.uk and http://example.co.uk (both point to the same content) and, to make matters worse, they have a .ie domain as well carrying the same content (both [yyyy.ie...] and [yyyy.ie...]). This adds up to potentially something like 20 pages known in the index, all with identical content.
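
To get a feel for the scale of the problem, here's a rough sketch of how the variants could be grouped. It assumes the server really does treat paths case-insensitively (on a case-sensitive server, lowercasing the path would be wrong), and the names canonical_key, group_variants and the crawled list are made up for illustration, not taken from any sitemap tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url):
    """Collapse case and a leading 'www.' so variants of one page share a key.

    Assumption: the server treats paths case-insensitively, as described
    in the post. This is not true of every server.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.lower() or "/"
    return host + path


def group_variants(urls):
    """Map each canonical key to the distinct URL spellings found for it."""
    groups = defaultdict(set)
    for url in urls:
        groups[canonical_key(url)].add(url)
    # Keep only keys that were reached by more than one spelling.
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


# Hypothetical crawl output, using the example.co.uk placeholder.
crawled = [
    "http://www.example.co.uk/Index.htm",
    "http://www.example.co.uk/index.htm",
    "http://example.co.uk/index.htm",
    "http://www.example.co.uk/about.htm",
]

for key, variants in group_variants(crawled).items():
    print(key, len(variants), "variants")
```

This only collapses capitalisation and www versus non-www on a single domain. The .ie copy would still come out under its own key, since it is a different host.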

Do I need to worry about this?

[edited by: tedster at 3:31 pm (utc) on April 20, 2007]
[edit reason] switch to example.co.uk [/edit]

Influence of hyperlink case on duplicate content

hedwig

tedster
