|Our Crawl Rate Dropped to Near Zero|
| 7:51 pm on Apr 28, 2011 (gmt 0)|
We've been having some issues, and I'd like to see if anyone has experienced anything similar:
Firstly, we inadvertently started serving pages as NoIndex,NoFollow. This led to many pages dropping out of the index. We fixed this issue, and have been waiting for Google to reindex the pages.
Secondly, we've seen many "Unreachable" errors for URLs submitted in our XML sitemaps via WMT. We've also seen many errors related to crawl rate (e.g., "Crawl rate problem: We were not able to download your Sitemap file due to the crawl rate we are using for your server."). I went ahead and increased the crawl rate to max, but I think we are still in the doghouse with regard to crawl budget.
Thirdly, we've seen the activity from Googlebot (as reported in Google WMT) plummet to near zero, whereas we had previously seen levels as high as 500k a day.
And as a side note, when using the "Fetch as Googlebot" feature, we can't even fetch our homepage (which is still in the index).
Could this be a weird error on the part of Google Webmaster Tools? Or could our crawl budget have been severely cut due to the previous problems (noindex'd pages, outages)? I.e., could the "Unreachable" error from the "Fetch as Googlebot" tool be an indication that we don't have crawl budget allocated for our site?
I have a suspicion that the rampant resubmission of XML sitemaps following the noindex debacle could have led to a red-flag for crawl budget, but I don't know anything for sure.
Any help would be greatly appreciated, as always.
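For anyone wanting to confirm that a NoIndex,NoFollow fix like the one described above has really taken effect on every template, here is a minimal Python sketch. It assumes you already have a page's HTML and, optionally, the value of its X-Robots-Tag response header as plain strings. The names `RobotsMetaParser` and `blocking_directives` are made up for this example, and the header parsing is simplified (it ignores user-agent-prefixed values such as "googlebot: noindex").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def blocking_directives(html, x_robots_tag=""):
    """Return the sorted list of directives that would block indexing or link following."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Simplified: treats the header as a plain comma-separated directive list.
    header = [d.strip().lower() for d in x_robots_tag.split(",") if d.strip()]
    found = set(parser.directives) | set(header)
    return sorted(found & {"noindex", "nofollow", "none"})


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NoIndex,NoFollow"></head></html>'
    print(blocking_directives(page))
```

An empty list means no blocking directive was found in the markup or header that was passed in; it says nothing about how Google currently treats the page.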
| 8:20 pm on Apr 28, 2011 (gmt 0)|
"And as a side note, when using the "Fetch as Googlebot" feature, we can't even fetch our homepage (which is still in the index). "
This is a problem! "Fetch as Googlebot" should be working fine, especially for your home page! Fix that, as I think it will reveal other problems you still have...
| 8:43 pm on Apr 28, 2011 (gmt 0)|
For what it is worth, I had noindexed my home page by accident. Once I realized that and corrected the issue, it took a week for it to reappear in the SERPs.
As for crawling, I am pretty sure it was still crawled regularly even while it was noindexed.
I hope this helps.
| 8:49 pm on Apr 28, 2011 (gmt 0)|
Helpnow: It's a weird problem. Our homepage never got de-indexed, but we cannot fetch it, or any other page, via the "Fetch as Googlebot" tool. I suspect it's because we've exhausted our crawl budget, but I can't be sure.
Planet13: We were seeing a steady re-indexation, but some of our most heavily linked pages are still de-indexed, and now our crawl rate is down so low that it might take a very long time for them to be crawled again. I'm trying to figure out whether we broke something that killed our crawl budget and, if so, whether that can be fixed.
Thanks for the feedback. Has anyone else had similar experiences?
| 9:07 pm on Apr 28, 2011 (gmt 0)|
I presume you can see your site in a browser normally?
So if you try to fetch as Googlebot via WMT, what do you get? Do you get the message "Your request was completed successfully" at the top of the screen? And after this, when you refresh the WMT page, what do you see under the "Status" column for this fetch request (if it appears there at all)? You could get something like "Success", "Not found", "Denied by robots" or similar.
Also, have you checked your robots.txt? Have you tried checking your homepage in WMT under Site Configuration >> Crawler access to make sure it is allowable?
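As a rough local cross-check of the robots.txt question, something like the following Python sketch can be used. It relies on the standard library's `urllib.robotparser`, which applies the first matching rule and so may not agree with Google's own matching in every case. The function name `googlebot_allowed` and the example.com URLs are placeholders invented for this example.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Paste the contents of your own robots.txt in place of this sample.
    sample = "User-agent: *\nDisallow: /private/\n"
    print(googlebot_allowed(sample, "http://www.example.com/"))
    print(googlebot_allowed(sample, "http://www.example.com/private/page.html"))
```

If the homepage comes back as disallowed here, that would be consistent with a "Denied by robots" status in WMT; if it comes back as allowed, robots.txt is probably not the cause.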
Failing all this, have you tried changing your user agent to Googlebot and fetching the page via the browser?
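The user agent test can also be done from a script instead of a browser. Below is a minimal Python sketch using only the standard library. The helper names `build_request` and `fetch_status` and the example.com URL are assumptions for illustration. Note that this only imitates Googlebot's User-Agent header: if the server or a firewall treats Google's IP ranges differently, this test will not reveal it.

```python
import urllib.error
import urllib.request

# The User-Agent string Googlebot has long identified itself with.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the HTTP status code, or a short error string if the server is unreachable."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 503.
        return err.code
    except urllib.error.URLError as err:
        return f"unreachable: {err.reason}"


if __name__ == "__main__":
    url = "http://www.example.com/"  # replace with your own homepage
    print("as Googlebot:", fetch_status(url))
    print("as browser:  ", fetch_status(url, user_agent="Mozilla/5.0"))
```

If the two requests return different results, something on the server side is handling the Googlebot user agent differently from ordinary visitors.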