I've been looking for an answer to this problem for some time; hopefully someone here can help.
Just about when Dominic happened I completely redesigned my site, including new content, artwork and... filenames. I changed the filenames from, for example, My Widgets.html to My_Widgets.html. Alas, I forgot to remove the old filenames from the server, so I had duplication for a number of weeks. I realised my mistake and pulled the "old" pages from the server, with not too much damage done. In the meantime, just to make things neater for me, I had changed the filenames again, from My_Widgets.html to my_widgets.html. I did an automatic removal of the now very old My Widgets.html pages and assumed Google would see My_Widgets.html and my_widgets.html as one and the same page. Unfortunately Google now has two sets of identical pages in the index, one with caps and one without. I would use an automatic removal, but I am sure it would remove both sets of pages: if you click on either one when it is displayed as a result you get the same page, i.e. the correct one, so either page is "valid" and still live according to Google. Can anyone please help?
Google have suggested using a robots.txt, but the old capitalised pages are no longer on the server. Even if they were, would Googlebot differentiate between caps and no caps?
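On the caps question, one rough way to see how a robots.txt rule treats case is to run it through Python's standard-library parser. This is only a sketch: the parser is not Googlebot, and the robots.txt contents and the www.example.com host below are made up for illustration. It does show that a Disallow line for the capitalised filename is compared case-sensitively, so the lowercase page is left alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the capitalised filename.
ROBOTS_TXT = """\
User-agent: *
Disallow: /My_Widgets.html
"""


def allowed(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # www.example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/My_Widgets.html", "/my_widgets.html"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Because the rule is matched against the requested URL rather than against files on disk, it does not matter that the capitalised file is no longer on the server.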
My guess is that you are not, and are using Windows. Windows, I believe, cares not what the case of the file is. I assume that after some time the old pages will just disappear from the index after a "deep bot" data update, but who knows if the "deep bot" idea still exists. I would also make sure that none of your links point to the mixed-case URLs.
So, make sure that the mixed-case URLs are gone from all pages and wait for Google to do a deep crawl and major update.
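Checking every page by hand for leftover mixed-case links is tedious, so here is a minimal sketch of how it could be automated. It assumes you can feed it the HTML of each page as a string; the class and function names are my own, and it only looks at the href of anchor tags.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def mixed_case_links(html_text):
    """Return the hrefs whose path part contains any uppercase letter."""
    collector = LinkCollector()
    collector.feed(html_text)
    flagged = []
    for href in collector.hrefs:
        path = urlsplit(href).path  # ignore the host name, which is case-insensitive
        if path != path.lower():
            flagged.append(href)
    return flagged


if __name__ == "__main__":
    page = '<a href="My_Widgets.html">old</a> <a href="my_widgets.html">new</a>'
    print(mixed_case_links(page))
```

Any href it reports is a link that could keep the old capitalised URL alive for the crawler.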
For anyone else out there who has never heard of 'mod_speling', here is an explanation:
Apache Reference: mod_speling
mod_speling - URL Spelling Correction
Since Apache 1.3, src/modules/standard/mod_speling.c
Alexei Kosut, Martin Kraemer (1997)
mod_speling is a very handy module. It corrects minor spelling or capitalization errors in URLs --- indeed its droll name provides a hint as to its task. The module addresses this problem by trying to find a matching document, even after all other modules (such as mod_alias, mod_rewrite, or mod_userdir) give up. It works by comparing each document name in the requested directory against the requested document name without regard to case, allowing a maximum of one misspelling (character insertion, omission, transposition, or wrong character). The drawback of this nice feature is that the complicated disk I/O usually increases the response time. Often, it is a better choice to force the user to fix the reference.
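To make the matching rule in that description concrete, here is a rough Python sketch of the idea: compare names without regard to case and accept at most one insertion, omission, transposition, or wrong character. This is not the module's actual code. The function names are hypothetical, and returning the first acceptable name is a simplification of my own.

```python
def within_one_edit(a, b):
    """True if a and b are equal or differ by a single insertion,
    omission, adjacent transposition, or wrong character."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Find the first position where the two strings differ.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one wrong character
        # Two adjacent characters swapped, rest identical.
        return a[i:i + 2] == b[i:i + 2][::-1] and a[i + 2:] == b[i + 2:]
    # Lengths differ by one: drop one character from the longer string.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]


def find_match(requested, directory_listing):
    """Return the first name in the listing that matches `requested`
    case-insensitively with at most one misspelling, else None."""
    wanted = requested.lower()
    for name in directory_listing:
        if within_one_edit(wanted, name.lower()):
            return name
    return None


if __name__ == "__main__":
    files = ["index.html", "my_widgets.html"]
    print(find_match("My_Widgets.html", files))
    print(find_match("my_wigdets.html", files))
```

Note that every lookup walks the whole directory listing, which is the extra disk work the description warns about.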
All you need is this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>Page has moved -- it uses all lowercase for the URL</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta name="Description" content="Page has moved -- it uses all lowercase for the URL.">
<meta name="robots" content="noindex,follow"></head>
<body><p>Page has moved -- it uses all lowercase for the URL</p></body></html>
You could also include a clickable link back to the site homepage in the body of that file. I wouldn't use a meta redirect though.
The noindex tag will see the page drop out of the Google listings in about 6 to 8 weeks I would think.