partyark - 4:09 pm on Jun 11, 2013 (gmt 0)
They're being served with a 410 http response; when they're crawled they end up in Crawl Errors with a 410. So that's working ok.
Returning a 410 (or 404, or 301) does do what you'd expect: the page no longer shows up, and perhaps some of its 'juice' flows to where you want it to if you're 301'ing. Except that's not the whole story...
What I know so far:
RemoveURL's documentation is misleading. It says that you should back up any RemoveURL request with any or all of a) a robots.txt exclusion; b) a 410/404; c) a meta NoIndex. However, if you do a) your page will just be added to the Uncrawlable stack. It will drop out of the results, but Google will say "I can't see this page any more, so I'll assume it's still linking to whatever it did". If you do b) Google says "Hey, this page is GONE, I'd better remove any links, but I'll still hang on to the page content just in case."
So how do I encourage Google to go and look again at the pages and completely remove them from its index? I've actually got quite a few of these bogus domains (yeah, yeah) so I can do a bit of testing.
What I've done for one bogus domain is to allow crawling through robots.txt, but to continue to return 410s. What I think will happen is that crawling will be very slow, and that once Google is sure a page is "gone" its links will get culled. However, I'm reasonably sure that the page content is still stored, so the dupe issue might not be solved.
On the second domain, I've allowed crawling and all pages now return 200 with a NoIndex, and an empty BODY. I'm hoping that this new, empty content will replace the old stuff, and that the NoIndex will have the effect of removing links.
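For anyone wanting to reproduce the two setups, here is a minimal sketch, assuming a bare Python standard-library server rather than whatever the domains really run on. The names build_response and make_handler and the mode labels "gone" and "noindex" are hypothetical, made up for illustration. In both modes robots.txt allows crawling; "gone" answers every page with a 410, "noindex" answers with a 200, a NoIndex meta tag and an empty BODY.

```python
from http.server import BaseHTTPRequestHandler

# An empty Disallow line permits crawling of every path.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"

# A page with a NoIndex meta tag and an empty BODY (second test domain).
NOINDEX_PAGE = (
    "<!DOCTYPE html><html><head>"
    '<meta name="robots" content="noindex">'
    "<title></title></head><body></body></html>"
)


def build_response(mode, path):
    """Return (status, content_type, body) for a request path.

    mode is a hypothetical label: "gone" for the 410 domain,
    "noindex" for the 200 + NoIndex domain.
    """
    if path == "/robots.txt":
        return 200, "text/plain", ROBOTS_TXT
    if mode == "gone":
        # Assumption: an empty body is sent with the 410.
        return 410, "text/html", ""
    if mode == "noindex":
        return 200, "text/html", NOINDEX_PAGE
    raise ValueError("unknown mode: %r" % (mode,))


def make_handler(mode):
    """Build a request handler class that serves one of the two setups."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, content_type, body = build_response(mode, self.path)
            data = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", content_type + "; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler
```

To try it locally, something like HTTPServer(("127.0.0.1", 8000), make_handler("gone")).serve_forever() (with HTTPServer imported from http.server) should serve the 410 variant; the address and port are arbitrary. This only shows what the crawler is handed, it says nothing about how Google reacts to it.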
On both domains I've added some entries to the sitemaps to see if that encourages crawling.
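The sitemap entries can be generated rather than hand-written. A rough sketch, assuming the standard sitemaps.org urlset format; build_sitemap is a hypothetical helper and the example.com URLs in the usage line are placeholders, not the real domains.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

For example, build_sitemap(["http://example.com/old-page-1", "http://example.com/old-page-2"]) gives a urlset with two entries that can be saved as sitemap.xml and submitted. Whether listing 410'd or NoIndex'd URLs this way actually speeds up recrawling is exactly what the test is meant to find out.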