TheOptimizationIdiot - 3:31 pm on Mar 26, 2013 (gmt 0)
So the general line of recommendation is to actually let Googlebot in again, just so it can see that the content it used to spider is gone?
To be honest I don't fully understand how deliberately placing a robots.txt block and deliberately removing pages from Google's index don't convey the same message, namely that the removal was intentional.
Because when you block their access they don't know whether anything has been removed or not; they can't access it to find out. When you let them access the URL and they find a "removed" status, then they know.
It's like if you visited my site one day and found a page, then came back the next day and got a message saying "you don't have permission to visit this page". Would you know whether I had removed the page, or whether it was still there and I just didn't want you to see it any more? There would be no way for you to tell.
A robots.txt block is totally different from letting them spider the URLs and answering each request with a status code that tells them the current status of the information at that URL: 200 OK, 301 Moved Permanently, 302 Found / 307 Temporary Redirect, 403 Forbidden, 404 Not Found (could be temporary or permanent), 410 Gone (purposely removed, permanently), and so on.
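To make that concrete, here's a minimal sketch (nobody's actual setup; the paths are made-up examples) using only Python's standard library, showing a server that answers removed and moved URLs with the status codes above instead of hiding them behind robots.txt:

from http.server import BaseHTTPRequestHandler, HTTPServer

REMOVED = {"/old-widget.html"}             # deliberately removed pages -> 410
MOVED = {"/old-home.html": "/home.html"}   # permanently moved pages    -> 301

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in REMOVED:
            self.send_response(410)        # Gone: purposely removed, permanently
            self.end_headers()
        elif self.path in MOVED:
            self.send_response(301)        # Moved Permanently, with the new location
            self.send_header("Location", MOVED[self.path])
            self.end_headers()
        else:
            self.send_response(404)        # Not Found: could be temporary or permanent
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), StatusHandler).serve_forever()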
If they can't get to a URL because it's blocked by robots.txt, they have no way to know the current status of that URL or the information associated with it; letting them spider means you can tell them the status of each one.
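From the crawler's side the difference looks roughly like this sketch (example.com and the path are placeholders, and "Googlebot" is just the user-agent string being checked): if robots.txt disallows the URL, the status stays unknown; if it's allowed, a single fetch reveals it.

from urllib import error, request, robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

url = "https://example.com/old-widget.html"
if not rp.can_fetch("Googlebot", url):
    # Blocked: the crawler never requests the URL, so its status is unknowable
    print("blocked by robots.txt -- status unknown")
else:
    try:
        resp = request.urlopen(url)
        print("status:", resp.getcode())   # e.g. 200 (redirects are followed automatically)
    except error.HTTPError as e:
        print("status:", e.code)           # e.g. 403, 404 or 410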