History of the Deepcrawl.

         

Jesse_Smith

4:24 am on Mar 23, 2003 (gmt 0)

Does anyone have a history of the deepcrawls? This is what I've got so far from looking through the board, and I would love to make it as correct as possible, going back as far as possible. If you have a log file with a year or two of info, you can help by making a Googlebot log and Sticky Mailing me the URL, and I'll look through it to get the dates. 20 megs? That's OK, I'm on DSL. You can make a Googlebot log by entering this in Telnet:

cat /logs/web.log | grep 216.239.46 > /public_html/deep.log

The '|' between 'log' and 'grep' is the pipe character (the vertical bar, typed with the key above the Return button), with a space before it.
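For anyone who would rather pull the dates out themselves, here is a rough sketch that counts Googlebot fetches per day. It assumes the log is in the common Apache format, where the fourth space-separated field looks like [10/Mar/2003:01:00:00; the function name count_by_day is made up for this post and would need adjusting for other log layouts.

```shell
# Sketch only: count log lines per calendar day, in order of first appearance.
# Assumes Common Log Format, where field 4 looks like [10/Mar/2003:01:00:00
# count_by_day is a hypothetical helper name, not a standard tool.
count_by_day() {
  awk '{
    split($4, t, ":")            # t[1] = "[10/Mar/2003"
    day = substr(t[1], 2)        # drop the leading "["
    if (!(day in n)) order[++k] = day
    n[day]++
  }
  END {
    for (i = 1; i <= k; i++) print order[i], n[order[i]]
  }'
}

# Usage, with the same paths as the command in this post:
#   grep 216.239.46 /logs/web.log | count_by_day
```

A run of days with high counts would then stand out as a likely deepcrawl window.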

Here's what I got so far, probably not 100% correct right now.

August 2002: 8/8-?/8 --- 8/26-9/4 Two deepcrawls?
October 2002: 1-12 Two deepcrawls?
November 2002: 3-15
December 2002: 2-?
January 2003: 4-14
February 2003: 6-16
March 2003: 10-20

vitaplease

9:16 am on Mar 25, 2003 (gmt 0)

I think much depends on the highest Pagerank of your site. Highest Pagerank tends to get deepcrawled first. I think Ciml once posted something to that effect.

Generally it's several days after the update, often with a mid-term mini-deepcrawl.

[webmasterworld.com...]
[webmasterworld.com...]

ciml

7:17 pm on Mar 25, 2003 (gmt 0)

Since the start of October, the start-of-cycle main crawl and mid-cycle partial crawl seem to have been pretty consistent for me and others who've commented.

These below are from a fairly stable high PR6/low PR7 page. I've deleted fetches occurring on the same or subsequent days; that behaviour is presumably due to different Googlebots following links from different pages.

03-Oct-02
11-Oct-02

04-Nov-02
13-Nov-02

02-Dec-02
11-Dec-02

03-Jan-03
12-Jan-03

06-Feb-03
15-Feb-03

10-Mar-03
21-Mar-03

Pretty similar to Jesse's pattern. For those who want to play along, a simple log check like this does the job, e.g.:
grep Googlebot access* | grep "GET / " | grep 216.239 > googlebot-deep.txt
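The step of dropping fetches on the same or subsequent days can also be automated. The following is only a sketch of one way to do it, not necessarily how the list above was produced: it assumes Common Log Format dates such as [10/Mar/2003:01:00:00 in field 4 and lines in chronological order, and the name crawl_starts is invented for illustration.

```shell
# Sketch only: print the first day of each run of consecutive fetch days.
# Assumes Common Log Format (field 4 like [10/Mar/2003:01:00:00) and that
# the input lines are in chronological order.
# crawl_starts is a hypothetical helper name.
crawl_starts() {
  awk '
    # Julian day number of a Gregorian date; only day differences matter here.
    function jdn(y, m, d,   a, yy, mm) {
      a  = int((14 - m) / 12)
      yy = y + 4800 - a
      mm = m + 12 * a - 3
      return d + int((153 * mm + 2) / 5) + 365 * yy \
             + int(yy / 4) - int(yy / 100) + int(yy / 400) - 32045
    }
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    {
      split(substr($4, 2), t, "[/:]")   # t[1]=day, t[2]=month name, t[3]=year
      n = jdn(t[3] + 0, month[t[2]], t[1] + 0)
      # Keep a date only if it is more than one day after the previous fetch.
      if (NR == 1 || n - prev > 1) print t[1] "-" t[2] "-" t[3]
      prev = n
    }'
}

# Usage, reading the file produced by the grep above:
#   crawl_starts < googlebot-deep.txt
```

If access* matches several rotated logs, they may need concatenating in date order first, since the sketch compares each line only with the one before it.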