Forum Moderators: Robert Charlton & goodroi
I now find myself with a number of 'duplicate' content URLs that have been picked up by the search engines (mostly Google, due to its rabid spidering). Although I changed the URLs, in many cases months ago, the old ones still appear in the index.
Sometimes I made mistakes, but the main issue is that even after setting up 301 redirects and internal rewriting, many of the old URLs are still listed.
How do I get rid of them? I'm sure Google is penalising me for so-called duplicate content.
They are dynamic URLs in that content is provided by a database into a standard template for each product.
I've seen people talk about using robots noindex etc., then going to Google and using the URL removal request, but that process appears to apply only to single URLs. What should I do?
Google won't get rid of them because it says they are functioning, even though many have no content, just an outer template.
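If the empty-template pages are answering with a normal "200 OK", that would explain why they are treated as functioning. A rough sketch of the idea in Python, assuming a hypothetical lookup of old-to-new URLs and a hypothetical set of live paths (neither comes from the poster's site): redirect what has moved, serve what exists, and return a 404 for everything else rather than an empty template.

```python
# Hypothetical data: old dynamic URLs mapped to their replacements,
# and the set of paths that still have real content behind them.
REDIRECTS = {"/product.php?id=12": "/products/blue-widget"}
LIVE_PATHS = {"/products/blue-widget"}


def respond(path):
    """Return (status_code, headers) for a requested path."""
    if path in REDIRECTS:
        # Moved permanently: point the crawler at the new URL.
        return 301, {"Location": REDIRECTS[path]}
    if path in LIVE_PATHS:
        # A real page with content.
        return 200, {}
    # No product behind this URL: say so instead of serving an empty template.
    return 404, {}


print(respond("/product.php?id=12"))   # old URL that has a replacement
print(respond("/product.php?id=999"))  # old URL with nothing behind it
```

How this hooks into the actual site depends on the server and scripting language in use; the point is only that the status code, not the page body, tells the spider whether a URL still exists.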
These pages came back after the 6-month removal period was over, and I was not able to delete them again using URL removal. (Besides, URL removal doesn't really remove them anyway.)
Solution:
I'm using Google's new Sitemaps feature. I hand-coded the non-existent pages into the sitemap, and within 72 hours they were totally gone from my site: command results.
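For anyone with more old URLs than they want to type by hand, the list could be generated instead. This is only a sketch: the URLs are made up, and it assumes the standard sitemaps.org 0.9 XML format, which may differ from whatever format the poster above actually submitted.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (an assumption about the format used).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, listing each URL once in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each URL appears only once
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL automatically.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical old URLs that should be recrawled and dropped.
old_urls = [
    "http://www.example.co.uk/product.php?id=12",
    "http://www.example.co.uk/product.php?id=13&cat=2",
    "http://www.example.co.uk/product.php?id=12",
]
sitemap_xml = build_sitemap(old_urls)
print(sitemap_xml)
```

Listing a URL only prompts a recrawl; whether it then drops out presumably depends on what the server returns for it.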
I've recently submitted a full, correct sitemap and was waiting for things to settle before submitting a sitemap full of old links. I didn't want to be the guinea pig for such an experiment.
Doing a site:url search in Google switches between two sets of results: one with tons of old, wrong links (including http:// example.co.uk) and one with pretty much just the sitemap-submitted URLs. Which will it settle on?
I think I might try your method to flush out the remaining few duff pages after this fiasco has settled.