Msg#: 3274474 posted 11:33 am on Mar 9, 2007 (gmt 0)
They keep checking that those URLs are still gone, almost forever.
They do that in case, one day, the pages are no longer gone but have been republished.
Say you bought a domain name from someone and put up a new site. Say that Google refused to pick up your /about.html and /contact.html pages, and you eventually found out that this was because, once a URL had returned a 404, Google would never request it again.
You would think that was a bad policy. That is why it doesn't work like that.
Msg#: 3274474 posted 12:13 pm on Mar 9, 2007 (gmt 0)
g1smd -- what if there are no links to the resource anywhere on the Internet? I'd think Google would eventually purge it and stop checking for it. They'd then pick it up again cleanly someday down the road, once one or more links to it showed up.
Msg#: 3274474 posted 2:56 pm on Mar 9, 2007 (gmt 0)
I have a site with 28 pages. Once we had trouble because of a DoS attack, and Google's requests timed out. Although all 28 pages are listed in sitemap.xml, Google still refuses to index half of them. Due to a lucky mistake, index.html wasn't listed in sitemap.xml. By listing it explicitly, I was able to force it back into the index after a month of absence. The rest of the lost pages have not reappeared so far.
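For anyone wanting to make sure every page (including index.html) really is listed, here is a minimal sketch that builds a sitemap.xml in the sitemaps.org 0.9 format using only Python's standard library. The function name `build_sitemap`, the domain www.example.com and the page list are hypothetical placeholders, not the poster's actual setup, and listing a page is of course no guarantee that Google will index it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return sitemap.xml text with one <url><loc> entry per path.

    build_sitemap is a hypothetical helper; base_url and paths are
    whatever your own site uses.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Join base URL and path with exactly one slash between them.
        loc.text = base_url.rstrip("/") + "/" + path.lstrip("/")
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder pages; note index.html is listed explicitly.
    pages = ["index.html", "about.html", "contact.html"]
    print(build_sitemap("https://www.example.com", pages))
```

Listing the home page explicitly (as index.html, "/" or both, depending on which URL you actually link to) is the step that, according to the post above, brought that page back.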
To make a long story short: If you want to get pages out of the index, you can't, and if you want them in, you can't either ;-)
Maybe you should try timing them out when they are requested, or returning "500 Internal Server Error". That's what happened here.
You could also try returning "410 Gone" via .htaccess.
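On Apache, mod_alias's `Redirect gone /old-page.html` is one way to send a 410 from .htaccess. To show the distinction between "never existed" (404) and "deliberately removed" (410), here is a minimal sketch using Python's built-in `http.server`. The paths in `LIVE_PAGES` and `GONE_PATHS` and the handler name are hypothetical examples, and whether Googlebot drops a 410 URL faster than a 404 one is not something this code can demonstrate.

```python
import http.server

# Hypothetical site content: pages that still exist...
LIVE_PAGES = {
    "/": b"Home\n",
    "/about.html": b"About us\n",
}
# ...and URLs that were removed on purpose and should report 410 Gone.
GONE_PATHS = {"/old-page.html", "/discontinued/product.html"}


class GoneAwareHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when matching paths.
        path = self.path.split("?", 1)[0]
        if path in LIVE_PAGES:
            status, body = 200, LIVE_PAGES[path]
        elif path in GONE_PATHS:
            status, body = 410, b"This page has been permanently removed.\n"
        else:
            status, body = 404, b"Not found.\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; drop this to see access logs


if __name__ == "__main__":
    server = http.server.HTTPServer(("127.0.0.1", 8000), GoneAwareHandler)
    server.serve_forever()
```

Requesting /old-page.html then returns 410, /about.html returns 200, and any other path returns 404.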