We're dealing with a site that got indexed under both HTTP and HTTPS, so the same content shows up as duplicates.
So we added the noindex robots meta tag to the HTTPS pages:
<meta name="robots" content="noindex" />
Except that nearly a year later, Google still retains those pages.
Google says NOT to use the removal tool to remove HTTPS pages (see the bottom of http://support.google.com/webmasters/answer/1269119),
but it wouldn't be an option anyway with 40,000 pages.
Is this possibly because they are orphaned pages, since Google sees the "noindex" on the parent pages and stops following links? Doesn't Google visit orphaned pages sooner or later anyway?
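
In case it helps narrow things down, here is a rough sketch of how one might confirm that a given HTTPS page really serves the noindex tag. It only inspects HTML you pass in; fetching each of the pages is left out. The names RobotsMetaParser and has_noindex are made up for illustration, and the check says nothing about whether Google has actually recrawled the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            # content may hold several comma-separated directives, e.g. "noindex, nofollow"
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex" /></head></html>'
    print(has_noindex(sample))
```

Running has_noindex over the HTML of a sample of the HTTPS URLs would at least rule out the tag being missing or malformed on some templates.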