
anatomy of deep-crawl

how does it work?

         

flex55

6:07 pm on May 24, 2003 (gmt 0)

10+ Year Member



Over the past few months I've been watching the deep-crawl to get a better grip on what all the Googlebots crawling my site are up to.

I'm trying to figure out how the bots decide how deep to crawl and how wide to crawl (i.e., if you think of your site as a tree, the bot can either descend to deeper nodes or move sideways to sibling nodes).

I'm sure the bots have some kind of "budget" of pages for your site, and that budget probably depends on the PR of the site's entry point (the homepage, in most cases). What I'm trying to figure out is the relation between PR and the number of pages the bots take. Can anyone shed some light on it? (e.g., if you have PR3, do you get roughly 100 pages crawled...?)
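
Purely as illustration, since nobody outside Google knows the real PR-to-pages relation: here's a rough Python sketch of how a budgeted, breadth-first crawl would behave. Everything in it is an assumption - the page_budget numbers are invented, and get_links stands in for whatever fetch-and-parse step the bot really uses.

    from collections import deque

    # Hypothetical mapping only: the real PR -> budget relation (if one
    # even exists) is unknown; these numbers are made up for illustration.
    def page_budget(entry_pr):
        return 50 * (2 ** entry_pr)  # e.g. PR3 -> 400 pages

    def budgeted_crawl(start_url, get_links, entry_pr):
        # Breadth-first order means the bot goes "wide" before "deep":
        # every page at depth n is fetched before any page at depth n+1,
        # so a small budget gives shallow coverage of the whole tree
        # rather than one deep branch.
        budget = page_budget(entry_pr)
        seen = {start_url}
        queue = deque([start_url])
        fetched = []
        while queue and len(fetched) < budget:
            url = queue.popleft()
            fetched.append(url)
            for link in get_links(url):  # caller supplies fetch/parse
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return fetched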

Also, I've noticed that in some cases when the bots got 500 errors, they kept trying that page again and again (I got hundreds of Googlebot requests to a page that kept returning 500), and eventually the bots just went away. So I assume your "budget" of pages gets penalized in some way when your site replies with 500 errors.

The one thing I didn't get, though, is why the bots kept trying to fetch that page that returned 500. Any ideas?
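
One guess, with a sketch of a plausible retry policy (not Googlebot's actual logic): HTTP treats a 5xx as a temporary server error, so a crawler has reason to come back to such a URL rather than drop it, and every retry is still another request against the site.

    import time
    import urllib.request
    from urllib.error import HTTPError

    # Plausible policy sketch, not Googlebot's real behaviour: retry 5xx
    # responses with exponential backoff, treat other errors as final.
    def fetch_with_retries(url, max_tries=5, base_delay=1.0):
        for attempt in range(max_tries):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.read()
            except HTTPError as e:
                if 500 <= e.code < 600:
                    time.sleep(base_delay * (2 ** attempt))  # back off
                    continue  # each retry is another hit on the server
                raise  # 4xx etc. is treated as permanent, no retry
        return None  # persistent 500: give up on this URL for now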

Dayo_UK

10:46 pm on May 24, 2003 (gmt 0)



Hi Flex55,

If we are talking about deepbot: IMO it should follow all the links on your pages, and all pages will get indexed eventually. I don't think a "budget" number of pages applies to deepbot. Deepbot may, I suppose, have a limit on how deep it crawls a new site, so it may take a couple of updates to get the whole site listed.

For freshbot, you may be right that the PR of the domain or page affects how deep little freshie goes on your site and on external links, but personally I don't think freshie's activities are solely dependent on a site's PR.

Bit of speculation on my part here though.

Cheers

Dayo

brotherhood of LAN

11:09 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I'm trying to figure out how the bots decide how deep to crawl and how wide to crawl (i.e., if you think of your site as a tree, the bot can either descend to deeper nodes or move sideways to sibling nodes)."

For freshbot, deepbot, whatever: this paper might help you understand the way they go about grabbing pages.

Efficient Crawling Through URL Ordering [www-db.stanford.edu]

Google only spiders a fraction of the web, so they have to prioritise which direction the bot goes.

As a general rule of thumb, if you have lots of PageRank and links pointing to your deeper pages, the bot will spider them...
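
A minimal sketch of the ordering idea from that paper: keep the frontier in a priority queue and always fetch the URL with the best importance estimate first. The score function is assumed to be supplied by the caller - backlink counts, partial PageRank, or any other metric; the paper compares several.

    import heapq

    # Best-first crawl: pop the most "important" frontier URL first.
    # heapq is a min-heap, so scores are negated to pop highest first.
    def best_first_crawl(seed, get_links, score, budget):
        frontier = [(-score(seed), seed)]
        seen = {seed}
        order = []
        while frontier and len(order) < budget:
            _, url = heapq.heappop(frontier)
            order.append(url)
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score(link), link))
        return order  # URLs in the order the crawler fetched them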

steveb

11:28 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"how does it work?"

It doesn't. Not since at least February.