
Google News Archive Forum

This 58 message thread spans 2 pages.
Feb Crawl Has Started
The 216's have started

 1:36 am on Feb 6, 2003 (gmt 0)

While discussion has started in some other misc threads, the February crawl is underway. Here is an organized place to discuss it more.

I have one site with googlebot requests from 216.239.46.*




 1:43 am on Feb 6, 2003 (gmt 0)


Deepcrawler from 216.239.46.* on two of my sites here, too. For reference, both are currently PR5.



 1:45 am on Feb 6, 2003 (gmt 0)

Yeah, I've just seen the little blighter! Grabbed robots.txt and a few pages from my root directory.


 2:10 am on Feb 6, 2003 (gmt 0)

Good! I was getting worried that I might be missed. Normally, the deep crawl on my site begins right on the first of the month.

Hopefully I'll see the bugger in my logs tomorrow.


 2:34 am on Feb 6, 2003 (gmt 0)

Does it start out slow? So far I've only seen one listing in one of my logs (two year old site), out of 10 domains (nine two month old sites). How long does the deep crawl last?


 3:48 am on Feb 6, 2003 (gmt 0)

Could it be a proxy? Just got hit with 216* and 64*.


 3:53 am on Feb 6, 2003 (gmt 0)

crawl4.googlebot.com is on one of my sites right now

interesting thing is that the site was down for 30 minutes just before the bot showed up... hope that doesn't hurt anything...

it also didn't ask for robots.txt, I guess it's using a cached version from freshbot which was on the site 12 hours earlier?

troels nybo nielsen

 4:11 am on Feb 6, 2003 (gmt 0)

Me too, PR5 sites.

Welcome to WebmasterWorld, Jesse. On my sites the deep crawl usually starts with rather few hits the first day, then comes back and finishes the job the next day. I have heard about websites where it may take longer than that.


 4:31 am on Feb 6, 2003 (gmt 0)

The only thing of interest I have to add is that my Flash file was grabbed for the first time. Normally it's just the pages themselves that get picked up.


 4:34 am on Feb 6, 2003 (gmt 0)

Yep, it's underway here too... not many hits yet, but the freshbot is grabbing a lot right now as well. Googlebots everywhere...


 4:37 am on Feb 6, 2003 (gmt 0)

crawl7.googlebot.com (216 domain) is crawling my PR 6 web site.


 4:49 am on Feb 6, 2003 (gmt 0)

216.239.46.*, 1 personal web site, 3 PR6 pages and 1 PR5 page.


[edited by: GeorgeGG at 5:40 am (utc) on Feb. 6, 2003]


 5:35 am on Feb 6, 2003 (gmt 0)

Two of my very different PR6 sites got hit about an hour apart.

Host: Url: /

Only the top page was taken and there was no interest in robots.txt here either.


 2:12 pm on Feb 6, 2003 (gmt 0)

The bots were having a party at my site overnight. Freshie took 90% of the pages the first while, with a few hits from 216.239, then 216.239 took over and has been staying very busy. Ink dropped by for a few rounds to check out the action and is still lurking about, and a bizarro-bot that disguises itself and doesn't look for a robots.txt crawled the entire place.
Welcome back deepbot, freshie will show you where the bar is. Help yourself to the pretzels.

[edited by: Stefan at 3:35 pm (utc) on Feb. 6, 2003]


 2:27 pm on Feb 6, 2003 (gmt 0)

My site's a PR7 that usually has about 50-150k pages read in during the deep crawl. Several of the deep crawl bots finally started visiting last night, albeit at a rather slow pace. Whereas they usually gobble up pages at a rate measured in thousands/hour, for the past 12 hours it's been a figure in the hundreds/hour. As noted above, the pattern is for the deep crawl to start off slow and then ramp up but, if memory serves me correctly, this is a little slower than usual.


 2:31 pm on Feb 6, 2003 (gmt 0)

uber_boy, I would agree with your observation. This deep crawl seems to not be picking up steam like it usually does -- yet.


 2:55 pm on Feb 6, 2003 (gmt 0)

perhaps they are testing some changes and seeing how the algorithms react by doing light reads only for now


 2:57 pm on Feb 6, 2003 (gmt 0)

Usually it starts out slow for me, then finds the right pace, then slows down again near the end. But it does seem pretty darn slow right now.


 3:00 pm on Feb 6, 2003 (gmt 0)

Yeah... Googlebot is slow... :(
Now I doubt that she will crawl all the pages of my new site (150k pages)... I hope the Google staff give her more power :)


 6:14 pm on Feb 6, 2003 (gmt 0)

I had 89 pages deep crawled yesterday (2/5) between 7:25PM US Eastern Time and 8:27PM.

Requested one page so far today (2/6) at 8:39am. Not a sign since.

This is a PR5 site with currently over 50k pages in the index. This makes me nervous.



 6:16 pm on Feb 6, 2003 (gmt 0)

Little bit early this month.......


 8:14 pm on Feb 6, 2003 (gmt 0)

I'm getting really crazy "touch-and-go" action from the 216.239.* range. Seems like they are using a different crawler to read different pages on one site, sporadically all day... what's the chance that Google has switched the functions of the 216.239 and 64.68 ranges? Either that, or it's some kind of netiquette where they are spreading out the bandwidth load over the day.
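
For anyone wanting to sort their own hits by the two ranges mentioned here, a minimal sketch in Python follows. It assumes "216.239.*" and "64.68.*" can be treated as /16 networks, which is only a reading of how the ranges are written in this thread, not anything Google has published. The function name `classify` and the labels are made up for illustration.

```python
import ipaddress

# Ranges as written by posters in this thread. Treating them as /16
# networks is an assumption made for this sketch.
RANGES = {
    "216.239.*": ipaddress.ip_network("216.239.0.0/16"),
    "64.68.*": ipaddress.ip_network("64.68.0.0/16"),
}

def classify(ip):
    """Return the label of the thread-reported range containing ip, or 'other'."""
    addr = ipaddress.ip_address(ip)
    for label, network in RANGES.items():
        if addr in network:
            return label
    return "other"

if __name__ == "__main__":
    # Hypothetical addresses, used only to exercise the function.
    for ip in ("216.239.46.20", "64.68.82.5", "192.0.2.1"):
        print(ip, "->", classify(ip))
```

Counting the labels per hour would show whether one range really is taking over from the other during the day.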


 8:43 pm on Feb 6, 2003 (gmt 0)

if google got no response from my server earlier today, is it guaranteed to try at least once more sometime?


 8:47 pm on Feb 6, 2003 (gmt 0)


I would think so. Gbot is persistent. Give it time.


 10:05 pm on Feb 6, 2003 (gmt 0)

Hmm... Got a bunch of picture crawls, only a few page crawls so far.


 11:31 pm on Feb 6, 2003 (gmt 0)

So, I'm not too late? I usually don't pay attention to the full crawl, but I had 20 pages mostly done, so I uploaded them. I'll tidy up over the next few days.


 12:48 am on Feb 7, 2003 (gmt 0)

My site has only about 60 pages, but it's a PR6, and deepbot has seen them all since 02:00 UTC, Feb 06. When I saw it starting, I put up some more pages fast and then it picked them up a few hours later. Freshbot was working away as well and I have Feb 5 tags showing for a lot of pages. Google rocks.

<added>Webmasterworld rocks too.</added>


 1:43 am on Feb 7, 2003 (gmt 0)

About how many times does it show up in logs? So far I've only seen this show up once on each of my domains.

- - [05/Feb/2003:18:50:39 -0800] "GET /robots.txt HTTP/1.0" 404 645 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
- - [05/Feb/2003:18:50:39 -0800] "GET / HTTP/1.0" 200 65291 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

About how long does it spend doing deep crawling?
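
One rough way to answer the "how many times" question from your own logs is to count the lines whose user agent mentions Googlebot. The sketch below assumes the server writes the common "combined" log format with the client host as the first field. The sample host addresses are made-up placeholders, and `googlebot_hits` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Pattern for a combined-format access log line (an assumption about the
# server's log configuration).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Return (hits per day, requested paths) for Googlebot user agents."""
    per_day = Counter()
    paths = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        day = match.group("time").split(":", 1)[0]  # e.g. "05/Feb/2003"
        per_day[day] += 1
        paths.append(match.group("path"))
    return per_day, paths

# Illustrative sample only: the host addresses are placeholders.
SAMPLE = [
    '216.239.46.1 - - [05/Feb/2003:18:50:39 -0800] "GET /robots.txt HTTP/1.0" 404 645 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '216.239.46.1 - - [05/Feb/2003:18:50:39 -0800] "GET / HTTP/1.0" 200 65291 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.7 - - [05/Feb/2003:18:51:02 -0800] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    per_day, paths = googlebot_hits(SAMPLE)
    print(dict(per_day), paths)
```

Matching on the user agent alone will also count anything that merely claims to be Googlebot, so checking the host field as well is a sensible extra step.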


 2:37 am on Feb 7, 2003 (gmt 0)


thank you God.


 2:52 am on Feb 7, 2003 (gmt 0)

In early Jan it went for about 10 days compared to other crawls that were only 3 days. Who knows how long it will go this time. It's early days, and it found you, so it will probably be back to get everything else.

It will show a log entry for every page linked by the time it's done. It might crawl through a few times.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved