joined:June 13, 2006
I'm not sure if this is the right category to post in, as it relates both to Webmaster Tools and to Google's natural spidering. Please move it if you think it belongs elsewhere.
I have a large site that, when it was created 5 years ago, made the mistake of mixing upper case and lower case letters in all of its URLs, which are separated into directories.
This mish mash of upper and lower case letters in the directories and URLs meant that as soon as a link used the wrong sequence of upper and lower case letters, a new duplicate page was created.
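To show the kind of duplication I mean, here is a rough Python sketch that groups URLs differing only by letter case. The function name `group_case_variants` and the example.com URLs are made up for illustration; they are not from my site.

```python
from collections import defaultdict

def group_case_variants(urls):
    """Group URLs that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(set)
    for url in urls:
        # The lower-cased URL is the grouping key; the original spelling is kept.
        groups[url.lower()].add(url)
    # Only keys with more than one spelling are duplicates.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

# Illustrative URLs only (assumption, not real crawl data).
crawled = [
    "http://www.example.com/Widgets/Blue/",
    "http://www.example.com/widgets/blue/",
    "http://www.example.com/WIDGETS/BLUE/",
    "http://www.example.com/Contact/",
]
duplicates = group_case_variants(crawled)
```

Each key in `duplicates` is one real page, and the list under it holds every spelling that would end up as a separate duplicate.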
This happened across many pages of our site. It was bearable until Panda came along and took the whole site's ranking down, rather than just the pages concerned.
So I got a programmer to add some code that forces all URLs to upper case (always a 301, no matter what sequence of letters is requested) and then changed the linking structure (menus) throughout the site to reflect the new upper case versions.
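I don't have the programmer's actual code to hand, but the redirect rule works roughly like this Python sketch. The function name `uppercase_redirect` and the example URLs are my own placeholders, and I am assuming only the path is upper-cased while the host and query string are left alone.

```python
from urllib.parse import urlsplit, urlunsplit

def uppercase_redirect(url):
    """Return (status, location) for a requested URL (illustrative sketch).

    301 to the upper case path when the path is not already upper case,
    otherwise 200 with no redirect target.
    """
    parts = urlsplit(url)
    upper_path = parts.path.upper()
    if parts.path == upper_path:
        return 200, None
    # Assumption: only the path changes; scheme, host and query are kept as-is.
    target = urlunsplit((parts.scheme, parts.netloc, upper_path, parts.query, parts.fragment))
    return 301, target

mixed = uppercase_redirect("http://www.example.com/Widgets/Blue/?id=7")
upper = uppercase_redirect("http://www.example.com/WIDGETS/BLUE/")
```

The important part is that an upper case URL never redirects, so every mish mash spelling should land on the upper case page after a single 301.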
The problem with the new menu was that Google was no longer finding the old pages, so it never learned that they redirect to the new upper case pages. This actually made things worse: it was finding the new upper case URLs whilst still keeping the older mish mash ones.
Now the question
So I started to remove all non upper case directories through Webmaster Tools, which cleaned the site up: with the site: command you will now only see upper case URLs.
I then added as many pages as I could to sitemaps to speed up the process of finding the new upper case pages.
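For anyone wanting to picture the sitemaps, this is a minimal Python sketch of how a file listing only the upper case URLs could be built. The function name `build_sitemap` and the example URLs are placeholders and this is not the tool I actually used; the namespace is the standard one from sitemaps.org.

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML as a string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Illustrative upper case URLs only (assumption).
pages = [
    "http://www.example.com/WIDGETS/BLUE/",
    "http://www.example.com/WIDGETS/RED/",
]
sitemap_xml = build_sitemap(pages)
```

Only the final upper case spellings go in, never the mish mash versions that 301.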
But rather than Google finding new pages, it is in fact showing me that Google is REMOVING the new upper case pages instead, approximately a thousand a day.
This strongly suggests that Google can remove directories with a mish mash of upper and lower case letters through Webmaster Tools (and leave the upper case version in the index), but when it comes to NATURAL spidering of the site, it treats a URL the same whatever its sequence of upper and lower case letters.
It sees that a particular sequence of letters has been removed in Webmaster Tools, and even though the URL it has visited (through natural spidering) is now upper case, it does not distinguish between the upper case version and the mish mash version, so it removes the upper case version as well.
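To make my theory concrete, here is a small Python sketch of the two matching behaviours I am describing. This is only my guess at what might be happening, not documented Google behaviour, and `removed_by` and the paths are invented for the example.

```python
def removed_by(removal_prefix, indexed_paths, case_sensitive):
    """Return the indexed paths a removal request would match (hypothetical model)."""
    if case_sensitive:
        return [p for p in indexed_paths if p.startswith(removal_prefix)]
    # Case-insensitive model: compare lower-cased prefix and paths.
    prefix = removal_prefix.lower()
    return [p for p in indexed_paths if p.lower().startswith(prefix)]

# Illustrative paths only (assumption).
indexed = ["/Widgets/Blue/", "/WIDGETS/BLUE/", "/WIDGETS/RED/"]

# What I expected: only the mish mash spelling is removed.
strict = removed_by("/Widgets/", indexed, case_sensitive=True)
# What seems to be happening: the upper case pages are caught as well.
loose = removed_by("/Widgets/", indexed, case_sensitive=False)
```

If the removal really is applied like the second case, it would explain why the upper case pages are dropping out of the index.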
So I am now losing pages simply through Google spidering my site, even though the only pages being blocked or redirected are those with a mish mash of letters.
No directory with upper case only letters is being blocked or redirected in any way, either in the robots.txt file or in the .htaccess file. I have checked many times.
No upper case page within the index redirects anywhere either.
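The robots.txt side of that check can be repeated offline with Python's standard `urllib.robotparser`. The rules and URLs below are made-up examples, not my real robots.txt, and this only shows how Python's parser matches paths (case-sensitively), which is also how I understand Google to treat robots.txt paths.

```python
from urllib.robotparser import RobotFileParser

# Example rules only (assumption): block one mish mash directory.
robots_lines = [
    "User-agent: *",
    "Disallow: /Widgets/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# The upper case spelling is not matched by the mixed case Disallow rule.
upper_allowed = parser.can_fetch("Googlebot", "http://www.example.com/WIDGETS/BLUE/")
# The mish mash spelling is blocked.
mixed_allowed = parser.can_fetch("Googlebot", "http://www.example.com/Widgets/Blue/")
```

Running every upper case URL from the sitemap through a check like this is a quick way to confirm that none of them is blocked by accident.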
I can see no reason why Google is losing the upper case directory pages rather than adding more of them.
Thanks for your time.