Uber_SEO - 3:46 pm on Jul 6, 2011 (gmt 0)
I've got another random Panda problem that I'd like to get your advice on. We had a whole bunch of pages that contained duplicate intro copy. For years it was fine, and Google did a great job of ensuring the correct page ranked, but since we've been Pandalized, we've taken the opportunity to clean up. So first up, we removed that duplicate intro copy from all pages apart from the top-level page, where it should appear. After removing the dupe content, we realised the pages were actually pretty weak anyway, so we've just blocked them using the noindex meta tag.
This has resulted in some random weirdness. We didn't leave it long enough before adding the robots.txt block, so Google has stopped crawling these pages but is keeping the last copy it fetched (i.e. the versions with the dupe content) in its index. So if you do a site: search for the dupe content, it still shows up in Google's index, even though we removed it and blocked those pages over 7 weeks ago.
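For clarity, here's roughly the combination we've ended up with (paths are examples, not our real URLs). The catch is that the robots.txt Disallow stops Googlebot from ever re-fetching the pages, so it never sees the noindex tag or the cleaned-up copy:

```text
# robots.txt -- added shortly after the content clean-up
User-agent: *
Disallow: /category-intro-pages/

<!-- meta tag now on each of the weak pages -->
<!-- Googlebot can't see this, because the Disallow blocks the fetch -->
<meta name="robots" content="noindex, follow">
```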
I'm now really confused as to what to do. We've removed the dupe content and blocked these pages, but this could potentially still be seen as duplicate content if Google is checking its indexed data. Should I lift the robots.txt block to allow Google to recrawl these pages (now without dupe content, and with the noindex tag) before blocking them again? Seems counter-intuitive, but maybe I just need to flush this dupe content from Google's cache. Thoughts?