I am struggling with a duplicate content issue caused by a site parameter. I noticed this at a late stage; by that point we already had thousands of these pages indexed by Google.
On 11 May, using Webmaster Tools, I added the following rule to robots.txt: Disallow: /*?virtuemart=*
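For completeness, here is the relevant part of my robots.txt as I have it (the User-agent grouping is shown from memory; note that wildcard matching in Disallow is a Google extension, not part of the original robots.txt standard):

    User-agent: *
    Disallow: /*?virtuemart=*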
This checked out as valid in the robots.txt checker tool in Webmaster Tools, and since then I have seen an increase in blocked URLs listed in Webmaster Tools (though not even 10% of the total), as expected. However, Google keeps adding pages containing the 'banned' parameter to its index. Here's an overview of the reported result counts:
Before the Disallow rule: Results 1 - 10 of about 13,000
Now: Results 1 - 10 of about 13,700
What I fail to understand is: why did Google add another 700 pages to its index _after_ I added the Disallow rule?
How do I get rid of all these duplicate pages? Our site only has 629 pages. I'm at a loss as to why this is happening.
P.S. I have added a .htaccess rewrite as of 18 May 2009, which strips the URL of its parameter (even though the Disallow rule should keep Google from indexing these pages, I'm just trying everything I can here).
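For reference, the rewrite I added is along these lines (reconstructed from memory, so treat it as a sketch: it matches any request whose query string contains the virtuemart parameter and 301-redirects to the same path with the query string stripped):

    RewriteEngine On
    # Match requests whose query string contains the virtuemart parameter
    RewriteCond %{QUERY_STRING} (^|&)virtuemart= [NC]
    # 301-redirect to the same path; the trailing ? strips the query string
    RewriteRule ^(.*)$ /$1? [R=301,L]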