aakk9999 - 3:19 pm on Mar 16, 2011 (gmt 0)
But noindex pages are a different case. Google needs to crawl them to see the noindex meta tag.
I know that, and I know this will still impact crawl budget (i.e. not recover it significantly), as per Tedsteer's reply above.
My question was about what I called the "URLs TODO" list.
For those pages that return 404 or 410, I don't see Google crawling them as they don't exist anymore. So, they too wouldn't remain on the to-do list.
I am not sure the above is correct, because I am seeing some URLs that return 404 still being requested after a very long time. It seems that if there is an external link to them, they will be re-requested "forever". Without any links pointing to them, I am not sure how long they would be kept on the "TODO" list.
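One way to put numbers on this is to pull the Googlebot requests for dead URLs out of the server access log and look at the first and last request dates per URL. The following is only a rough sketch: it assumes the log is in Combined Log Format, it identifies Googlebot by the user-agent string alone (which can be spoofed), and the function name `gone_urls_still_requested` is made up for illustration.

```python
import re
from datetime import datetime

# Assumption: Combined Log Format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def gone_urls_still_requested(lines, statuses=("404", "410")):
    """Return {url: (first_seen, last_seen, hits)} for requests whose
    user agent mentions Googlebot and whose status is in `statuses`."""
    seen = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        if m.group("status") not in statuses:
            continue
        # Timestamp example: 16/Mar/2011:15:19:00 +0000
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        url = m.group("url")
        first, last, hits = seen.get(url, (ts, ts, 0))
        seen[url] = (min(first, ts), max(last, ts), hits + 1)
    return seen
```

Feeding it the lines of an access log (for example `gone_urls_still_requested(open("access.log"))`, where the file name is a placeholder) gives, for each dead URL, how long it has been re-requested and how often. Comparing URLs with and without known external links would show whether the unlinked ones ever drop off.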
I know they can be blocked by robots.txt. The questions are: will they drop off the "TODO" list if there are no links pointing to them, and if so, what is the best way: robots.txt, returning 404, returning 410, or returning 301?
I have to say it makes me a bit uncomfortable to see nnn URLs blocked by robots.txt in WMT!
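For reference, this is roughly how the robots.txt option can be checked before deploying it. It is a minimal sketch using Python's standard `urllib.robotparser`; the `/old-section/` path, the example.com URLs and the helper name `is_blocked` are placeholders, not anything from my site. Note that a blocked URL is never fetched at all, so the crawler cannot see whether it now returns 404, 410 or 301.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; /old-section/ stands in for the removed URLs.
rules = """\
User-agent: *
Disallow: /old-section/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_blocked(url, agent="Googlebot"):
    """True if the rules above forbid `agent` from fetching `url`."""
    return not parser.can_fetch(agent, url)


print(is_blocked("http://example.com/old-section/page.html"))
print(is_blocked("http://example.com/live.html"))
```

This only tells you what a compliant crawler is allowed to request; it says nothing about how long the URL stays on the "TODO" list.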