
anatomy of deep-crawl

how does it work?

         

flex55

6:07 pm on May 24, 2003 (gmt 0)

10+ Year Member



Over the past few months I've been watching the deep-crawl to get a better grip on what all the Googlebots crawling my site are up to.

I'm trying to figure out how the bots decide how deep to crawl and how wide to crawl (i.e., if you think of your site as a tree, the bot can either descend to deeper nodes or move sideways to sibling nodes).

I'm sure the bots have some kind of "budget" of pages for your site, and that budget probably depends on the PR of the site's entry point (the homepage, in most cases). What I'm trying to figure out is the relation between PR and the number of pages the bots take. Can anyone shed some light on it? (e.g., if you have PR3, do you get roughly 100 pages crawled...?)
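
Purely as illustration, since nobody outside Google knows the real PR-to-pages relation: here's a rough Python sketch of how a budgeted, breadth-first crawl would behave. Everything in it is an assumption - the page_budget numbers are invented, and get_links stands in for whatever fetch-and-parse step the bot really uses.

    from collections import deque

    # Hypothetical mapping only: the real PR -> budget relation (if one
    # even exists) is unknown; these numbers are made up for illustration.
    def page_budget(entry_pr):
        return 50 * (2 ** entry_pr)  # e.g. PR3 -> 400 pages

    def budgeted_crawl(start_url, get_links, entry_pr):
        # Breadth-first order means the bot goes "wide" before "deep":
        # every page at depth n is fetched before any page at depth n+1,
        # so a small budget gives shallow coverage of the whole tree
        # rather than one deep branch.
        budget = page_budget(entry_pr)
        seen = {start_url}
        queue = deque([start_url])
        fetched = []
        while queue and len(fetched) < budget:
            url = queue.popleft()
            fetched.append(url)
            for link in get_links(url):  # caller supplies fetch/parse
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return fetched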

Also, I've noticed that in some cases when the bots got 500 errors, they kept trying that page again and again (I got hundreds of Googlebot requests to a page that kept returning 500), and eventually the bots just went away. So I assume your "budget" of pages gets penalized in some way when your site replies with 500 errors.

The one thing I didn't get, though, is why the bots kept trying to fetch that page that returned 500. Any ideas?
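
One guess, with a sketch of a plausible retry policy (not Googlebot's actual logic): HTTP treats a 5xx as a temporary server error, so a crawler has reason to come back to such a URL rather than drop it, and every retry is still another request against the site.

    import time
    import urllib.request
    from urllib.error import HTTPError

    # Plausible policy sketch, not Googlebot's real behaviour: retry 5xx
    # responses with exponential backoff, treat other errors as final.
    def fetch_with_retries(url, max_tries=5, base_delay=1.0):
        for attempt in range(max_tries):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.read()
            except HTTPError as e:
                if 500 <= e.code < 600:
                    time.sleep(base_delay * (2 ** attempt))  # back off
                    continue  # each retry is another hit on the server
                raise  # 4xx etc. is treated as permanent, no retry
        return None  # persistent 500: give up on this URL for now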

Dayo_UK

10:46 pm on May 24, 2003 (gmt 0)



Hi Flex55,

If we are talking about deepbot: IMO it should follow all the links on your pages, and all pages will get indexed eventually. I don't think a "budget" number of pages applies to deepbot. Deepbot may, I suppose, have a limit on how deep it crawls a new site, so it may take a couple of updates to get the whole site listed.

For freshbot, you may be right that the PR of the domain or page affects how deep little freshie goes on your site and on external links, but personally I don't think freshie's activities are solely dependent on a site's PR.

Bit of speculation on my part here though.

Cheers

Dayo

brotherhood of LAN

11:09 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I'm trying to figure out how the bots decide how deep to crawl and how wide to crawl (i.e., if you think of your site as a tree, the bot can either descend to deeper nodes or move sideways to sibling nodes)."

For freshbot, deepbot, whatever: this paper might help you understand the way they go about grabbing pages.

Efficient Crawling Through URL Ordering [www-db.stanford.edu]

Google only spiders a fraction of the web, so they have to prioritise which direction the bot goes.

As a general rule of thumb, if you have lots of PageRank and links pointing to your deeper pages, the bot will spider them...
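
A minimal sketch of the ordering idea from that paper: keep the frontier in a priority queue and always fetch the URL with the best importance estimate first. The score function is assumed to be supplied by the caller - backlink counts, partial PageRank, or any other metric; the paper compares several.

    import heapq

    # Best-first crawl: pop the most "important" frontier URL first.
    # heapq is a min-heap, so scores are negated to pop highest first.
    def best_first_crawl(seed, get_links, score, budget):
        frontier = [(-score(seed), seed)]
        seen = {seed}
        order = []
        while frontier and len(order) < budget:
            _, url = heapq.heappop(frontier)
            order.append(url)
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score(link), link))
        return order  # URLs in the order the crawler fetched them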

steveb

11:28 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"how does it work?"

It doesn't. Not since at least February.