carfac - 3:34 pm on Jul 30, 2010 (gmt 0)
Thank you all again VERY MUCH for your help... and your insight. I admit, I was panicked for a bit... but I have looked at my site, gone over it, and I suddenly saw a BIG problem. I THINK I can see where to dump a huge number of pages... duplicate pages... but I am not sure how to remove these pages from Google, as they are all dynamically generated. That is, even if I remove the links, the page will still be there if Google asks for it.
Let's go back to the Animals site analogy. I have a "Lion" main page, with a bullet-point list of facts... and three sub sections... let's say Images, Coloring and Habitat. Now Coloring and Habitat are large text fields, so on the main page, the first 200 characters are displayed, with a link to the "Coloring" and "Habitat" sub pages for the full summary. I probably have something written up for 30-40 percent of each of these fields.
So my first inclination was to combine "Coloring" and "Habitat" into one page... but then (epiphany!) I saw that when I had neither "Coloring" nor "Habitat" text for an animal, both those pages were almost exactly the same! And this was probably 150K pages... maybe 1/4 or more of the total number of my pages in Google. If I can lose these worthless, empty pages (all of which duplicate each other), I think I would go a LONG way to restoring link juice across the site.
So my idea is to remove the links from pages that do not have any "Coloring" or "Habitat" information. Easy enough. The tough question is how to get these pages out of Google. The paging scheme is all done through mod_rewrite of dynamic links... so even if I remove the actual page links to the "Coloring" and "Habitat" pages from my site, Google will still get the page if it asks for it from its own existing database of my URLs.
So do I rename the "Coloring" and "Habitat" links to something like "Colors" and "Habitats" and then block the old "Coloring" and "Habitat" pages in robots.txt? What is the proper way to get these out of the index (keeping in mind that since they are all dynamic, they will still be generated if asked for, even if not linked to)?
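
To make the idea concrete, here is a rough sketch of the logic I have in mind, written in Python just for illustration. The Animal class, the function names and the field names are all made up for this example. The 410 (Gone) status for empty sub pages is only an assumption about one possible answer, not something I know to be the right approach; the only part I am sure of is the "only link when there is text" rule.

```python
from dataclasses import dataclass

# Hypothetical record for one animal; the real site keeps these in a database.
@dataclass
class Animal:
    name: str
    coloring: str = ""
    habitat: str = ""

SECTIONS = ("coloring", "habitat")

def subpage_links(animal):
    """Return the sub sections that should get a link from the main page."""
    # Only link to a sub page when there is actual text to show on it.
    return [s for s in SECTIONS if getattr(animal, s).strip()]

def subpage_status(animal, section):
    """Return the HTTP status to send when a sub page is requested directly."""
    # Assumption: an empty sub page answers 410 (Gone) instead of rendering
    # a near-identical placeholder page with a 200.
    text = getattr(animal, section, "")
    return 200 if text.strip() else 410
```

So for a Lion with Coloring text but no Habitat text, the main page would link only to "Coloring", and a direct request for the Habitat sub page would no longer produce the empty duplicate.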