I'm in the process of blocking thousands of files on a dynamic site with meta robots noindex.
I want these removed from the index as soon as possible.
The site has a few hundred thousand pages indexed, but is not getting crawled too heavily by Google. I have a feeling that it will take Google months or even years to hit some of these files and discover the noindex.
The files I'm blocking are all URLs that differ only by their parameters, so I can't remove them in batches via Google Webmaster Tools, since it is not possible to remove a directory using wildcards.
So:
I thought of creating an XML sitemap with a dump of all the URLs I blocked with noindex, hoping it will speed up the removal process.
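To make the idea concrete, here is a rough Python sketch of how I would generate that sitemap. The example.com URLs and the function names are made up for illustration. The namespace and the limit of 50,000 URLs per file come from the sitemaps.org protocol, which is why the dump is split into chunks.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file limit in the sitemaps.org protocol


def build_sitemap(urls):
    """Return one sitemap XML document (as a string) listing the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    for url in urls:
        # escape() turns "&" into "&amp;", which parameter URLs need in XML.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def build_sitemaps(urls, chunk_size=MAX_URLS_PER_FILE):
    """Split the URL dump into chunks and return one XML document per chunk."""
    urls = list(urls)
    return [
        build_sitemap(urls[i:i + chunk_size])
        for i in range(0, len(urls), chunk_size)
    ]


if __name__ == "__main__":
    # Hypothetical noindexed URLs; the real list would come from the site's database.
    noindexed = [
        "http://www.example.com/list?cat=1&sort=asc",
        "http://www.example.com/list?cat=1&sort=desc",
    ]
    for doc in build_sitemaps(noindexed):
        print(doc)
```

Each returned string would be saved as its own file (for example sitemap-noindex-1.xml, a hypothetical name) and submitted like any other sitemap.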
Is this a legitimate approach? Will Google actually hit the files and notice the noindex, or will these files (or the entire XML sitemap) just get ignored?
If this is not a valid approach, does anyone have any ideas on how to speed up the removal process?
Would it make sense to create an HTML index page linking to these files instead?