not2easy - 6:42 am on Sep 17, 2012 (gmt 0)
One of my sites has been growing little by little and recently nearly doubled, to about 2,000 pages. When I first built the site I stored the images and includes for each area/department in its own subdirectory and put a small noindexed index.html file in each one to prevent directory listings. When the page count nearly doubled all at once, I decided to move the new pages into the subdirectories that already held their images and includes. Everything works fine, but my sitemap script adds each subdirectory to the sitemap as http://www.example.com/subdirectory/, and if that URL is crawled, the dummy index file is what shows up. As I said, it is meta noindexed; it's just a useless placeholder page.
I can't stop the sitemap script from listing the subdirectory URL without also keeping every page in that subdirectory out of the sitemap. I have not seen any sign of a problem in Google Webmaster Tools (GWT), but I half expect one. Each subdirectory still has its dummy index.html file, but it now also has a real named page, such as http://www.example.com/subdirectory/what-is-here.html, that links to the various pages in that subdirectory.
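One workaround that leaves the sitemap script itself alone is to post-process its output. Below is a minimal sketch in Python, assuming the script writes a standard sitemaps.org XML sitemap that can be filtered before it is uploaded. It drops every `<url>` entry whose `<loc>` path ends in `/` (other than the site root) and keeps all named pages. The function name `drop_directory_urls` is hypothetical and not part of any sitemap generator.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard sitemap namespace (sitemaps.org protocol 0.9).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize with the sitemap namespace as the default (no "ns0:" prefix).
ET.register_namespace("", NS)


def drop_directory_urls(sitemap_xml: str) -> str:
    """Hypothetical helper: remove <url> entries whose <loc> is a bare
    directory URL (path ends in '/'), keeping the site root and all
    named pages such as /subdirectory/what-is-here.html."""
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy so removing children does not skip entries.
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        path = urlparse(loc).path
        if path.endswith("/") and path != "/":
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

This only makes sense while the directory URLs serve the placeholder page; if the named pages are later renamed to index.html, the directory URLs become the real landing pages and should stay in the sitemap.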
My question: should I redirect the index file to the named page, or will that get me a duplicate content problem? Alternatively, I could rename the named page to index.html, fix the navigation links with find/replace, and just let the old named page URL return a 404. The named pages have only existed since late June, when all of these major changes were made. The page URLs would not change, only the navigation links. There are 3 subdirectories, each containing 3-5 sub-subdirectories. My brain hurts. I am sure this has all been done before, and I hope someone with the experience can share some ideas. I have built a few dozen sites but never one with this kind of structure; if I had had time to spare before doing it this way, I would simply have named these pages "index.html" and let them do their job. It is easy to see now what I should have done, but what's the best way to fix it now? Or should I leave alone things that "aren't broken"?
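For the first option (a 301 from each dummy index to its named page), here is a minimal sketch that generates the rules. It assumes an Apache server with mod_alias and `.htaccess` overrides allowed; the `redirect_rules` helper and the directory-to-page mapping are hypothetical. Each pattern is anchored so it matches only the bare directory URL and its `index.html`, never the other pages in that directory.

```python
import re

SITE = "http://www.example.com"  # placeholder domain from the post


def redirect_rules(landing_pages: dict) -> list:
    """Hypothetical helper: build one Apache RedirectMatch line per
    directory. landing_pages maps a directory path ("/subdirectory/")
    to the named page that should replace its placeholder index.html."""
    rules = []
    for directory, page in sorted(landing_pages.items()):
        d = directory.strip("/")
        # Anchored regex: matches "/dir/" and "/dir/index.html" only.
        pattern = "^/" + re.escape(d) + r"/(index\.html)?$"
        rules.append(f"RedirectMatch 301 {pattern} {SITE}/{d}/{page}")
    return rules


for line in redirect_rules({"/subdirectory/": "what-is-here.html"}):
    print(line)
```

The anchoring is the important design choice: a plain `Redirect 301 /subdirectory/ ...` matches by URL-path prefix, so it would also catch every other page in that subdirectory. The second option (renaming the named page to index.html) needs no rules like these, only the link find/replace.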