aakk9999 - 4:13 pm on Mar 16, 2011 (gmt 0)
I agree with both of you and this is pretty much what I am seeing.
But my question is whether a long "URLs TODO" list has an impact on crawl budget? And if so, how to reduce it?
Or to give an example:
Let's say you made a mistake and exposed thousands of dynamic URLs to Google. Now, let's assume that these dynamic URLs have a friendly URL version, and that the mistake was so bad that there are 10-20 dynamic URLs resolving to the same friendly URL (I am inventing a really bad case here!). Obviously, you never wanted G. to come across these dynamic URLs, but a mistake has been made and now you need to find the best way to fix it (here comes Shadow's saying in another thread, "You cannot uncook the egg..", but let's try to do the best we can here).
Let's assume these URLs were exposed only for a short time, have not gained external links, that you have since fixed the problem, and that they are no longer interlinked from anywhere within the site. However, let's assume they were visible long enough for G. to find them and put them in its "URLs TODO" list for crawling.
If such mistakenly exposed URLs still resolve, then G. will want to crawl them, thereby eating into your crawl budget. The options I can see are:
a) noindex, follow (will be crawled)
b) noindex, nofollow (will be crawled)
c) set the canonical to the friendly version (will be crawled)
d) stop them via robots.txt (will not be crawled, but you will end up with a long list of URLs stopped by robots in WMT)
e) set up a 301 redirect to its friendly version
f) return 404 (not recommended)
g) return 410 Gone
In cases a, b and c, they will definitely stay in the "URLs TODO" list and will impact crawl budget.
In cases d, e, f and g they should not keep being crawled (although for e, f and g, G. still has to fetch the URL at least once to see the redirect or status code). However, I am wondering:
1) whether a large "URLs TODO" list impacts crawling or affects the site negatively in any way, even if the URLs are stopped from being crawled?
2) whether, in any of these cases, G. will eventually drop them from its "URLs TODO" list?
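For what it's worth, here is a rough sketch of what I mean by options (e) and (g), written as a small Python/Flask app. This is purely an illustration: the /page.php and /widgets/... paths and the ID-to-slug mapping are invented for the example, not anything from a real site.

```python
# Rough sketch of options (e) and (g), assuming a small Flask app.
# All URL patterns and the ID-to-slug mapping below are invented
# examples, not anyone's real site structure.
from flask import Flask, abort, redirect, request

app = Flask(__name__)

# Hypothetical mapping from old dynamic IDs to friendly slugs.
ID_TO_SLUG = {"123": "blue-widget", "124": "red-widget"}

@app.route("/page.php")
def old_dynamic_url():
    slug = ID_TO_SLUG.get(request.args.get("id", ""))
    if slug:
        # Option (e): permanently redirect the dynamic URL to its friendly version.
        return redirect("/widgets/" + slug, code=301)
    # Option (g): tell crawlers the URL is gone for good.
    abort(410)

@app.route("/widgets/<slug>")
def friendly_page(slug):
    return "Friendly page for " + slug
```

The idea being that each mistakenly exposed URL answers with a 301 (or a 410 where there is no friendly equivalent), so G. only has to fetch it once more before, hopefully, dropping it from the "URLs TODO" list.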