Feb Crawl Has Started

The 216's have started

     
1:36 am on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 30, 2002
posts:436
votes: 0


While discussion has started in some other misc threads, the February crawl is underway. Here is an organized place to discuss it more.

I have one site with googlebot requests from 216.239.46.*

-Pete

1:43 am on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes,

Deepcrawler from 216.239.46.* on two of my sites here, too. For reference, both are currently PR5.

Jim

1:45 am on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2002
posts:666
votes: 0


Yeah, I've just seen the little blighter! Grabbed robots.txt and a few pages from my root directory.
2:10 am on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 24, 2002
posts:485
votes: 5


Good! I was getting worried that I might be missed. Normally, the deep crawl on my site begins right on the first of the month.

Hopefully I'll see the bugger in my logs tomorrow.

2:34 am on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 5, 2003
posts:807
votes: 0


Does it start out slow? So far I've only seen one listing in one of my logs (a two-year-old site), out of 10 domains (the other nine are two-month-old sites). How long does the deep crawl last?
3:48 am on Feb 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 29, 2002
posts:130
votes: 0


Could it be a proxy? Just got hit with 216* and 64*.
3:53 am on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


crawl4.googlebot.com (216.239.46.104) is on one of my sites right now

interesting thing is that the site was down for 30 minutes just before the bot showed up... hope that doesn't hurt anything...

it also didn't ask for robots.txt, I guess it's using a cached version from freshbot which was on the site 12 hours earlier?

4:11 am on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 23, 2002
posts:1052
votes: 0


Me too, PR5 sites.

Welcome to WebmasterWorld, Jesse. On my sites the deep crawl usually starts with rather few hits the first day and then comes back and finishes the job the next day. I have heard about websites where it may take longer than that.

4:31 am on Feb 6, 2003 (gmt 0)

Moderator

WebmasterWorld Administrator 10+ Year Member

joined:July 2, 2000
posts:2454
votes: 0


The only thing of interest that I have to add is that my flash file was grabbed for the first time. Normally it's just the pages themselves that get picked up.
4:34 am on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


Yep, it's underway here too... not many hits yet, but the freshbot is grabbing a lot right now as well. Googlebots everywhere...
4:37 am on Feb 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 22, 2002
posts:65
votes: 0


crawl7.googlebot.com (216 domain) is crawling my PR 6 web site.
4:49 am on Feb 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 6, 2002
posts:191
votes: 0


216.239.46.*, 1 personal web site, 3 PR6 pages and 1 PR5 page.

GeorgeGG

[edited by: GeorgeGG at 5:40 am (utc) on Feb. 6, 2003]

5:35 am on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 25, 2002
posts:407
votes: 0


Two of my very different PR6 sites got hit about an hour apart.

Host: 216.239.46.19 Url: /

Only the top page was taken and there was no interest in robots.txt here either.

2:12 pm on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


The bots were having a party at my site overnight. Freshie took 90% of the pages the first while, with a few hits from 216.239, then 216.239 took over and has been staying very busy. Ink dropped by for a few rounds to check out the action and is still lurking about, and a bizarro-bot that disguises itself and doesn't look for a robots.txt crawled the entire place.
Welcome back deepbot, freshie will show you where the bar is. Help yourself to the pretzels.

[edited by: Stefan at 3:35 pm (utc) on Feb. 6, 2003]

2:27 pm on Feb 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 3, 2003
posts:58
votes: 0


My site's a PR7 that usually has about 50-150k pages read in during the deep crawl. Several of the deep crawl bots finally started visiting last night, albeit at a rather slow pace. Whereas they usually gobble up pages at a rate measured in thousands/hour, for the past 12 hours it's been a figure in the hundreds/hour. As noted above, the pattern is for the deep crawl to start off slow and then ramp up but, if memory serves me correctly, this is a little slower than usual.
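
For anyone who wants to put a number on "hundreds/hour" versus "thousands/hour", here is a minimal sketch that buckets request timestamps by clock hour. It assumes Apache-style timestamps (the format seen in access logs such as `05/Feb/2003:18:50:39 -0800`); the function name `hits_per_hour` and the sample timestamps are made up for illustration and are not taken from any real log.

```python
from collections import Counter
from datetime import datetime

def hits_per_hour(timestamps):
    """Count requests per clock hour from Apache-style timestamps."""
    buckets = Counter()
    for stamp in timestamps:
        # Example input: "06/Feb/2003:01:05:10 +0000"
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        buckets[when.strftime("%Y-%m-%d %H:00")] += 1
    return buckets

# Hypothetical timestamps, purely for illustration.
sample = [
    "06/Feb/2003:01:05:10 +0000",
    "06/Feb/2003:01:40:02 +0000",
    "06/Feb/2003:02:15:45 +0000",
]
rates = hits_per_hour(sample)
for hour, count in sorted(rates.items()):
    print(hour, count)
```

Feeding it only the timestamps of Googlebot requests would give a rough per-hour crawl rate to compare from one month's crawl to the next.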
2:31 pm on Feb 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:May 19, 2002
posts:61
votes: 0


uber_boy, I would agree with your observation. This deep crawl seems to not be picking up steam like it usually does -- yet.
2:55 pm on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


perhaps they are testing some changes and seeing how the algorithms react by doing light reads only for now
2:57 pm on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:May 1, 2002
posts:439
votes: 0


Usually it starts out slow for me, then finds the right pace, then slows down again near the end. But it does seem pretty darn slow right now.
3:00 pm on Feb 6, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 6, 2003
posts:22
votes: 0


Yeah... Googlebot is slow... :(
Now I doubt that she will crawl all the pages on my new site (150k pages)... I hope the Google staff give her more power :)
6:14 pm on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 30, 2002
posts:436
votes: 0


I had 89 pages deep crawled yesterday (2/5) between 7:25PM US Eastern Time and 8:27PM.

Requested one page so far today (2/6) at 8:39am. Not a sign since.

This is a PR5 site with currently over 50k pages in the index. This makes me nervous.

-Pete

6:16 pm on Feb 6, 2003 (gmt 0)

Preferred Member

joined:Oct 23, 2002
posts:449
votes: 0


Little bit early this month.......
8:14 pm on Feb 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 16, 2002
posts:2010
votes: 0


I'm getting really crazy "touch-and-go" action from the 216.239.* range. Seems like they are using a different crawler to read different pages on one site, sporadically all day... what's the chance that Google has switched the functions of the 216.239 ranges and 64.68 ranges? Either that or some kind of netiquette where they are spreading out the bandwidth load over the day?
8:43 pm on Feb 6, 2003 (gmt 0)

New User

10+ Year Member

joined:Jan 24, 2003
posts:40
votes: 0


if google got no response from my server earlier today, is it guaranteed to try at least once more sometime?
8:47 pm on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:May 1, 2002
posts:439
votes: 0


SubZeroGTS,

I would think so. Gbot is persistent. Give it time.

10:05 pm on Feb 6, 2003 (gmt 0)

Full Member

10+ Year Member

joined:May 8, 2002
posts:325
votes: 0


Hmm... Got a bunch of picture crawls, only a few page crawls so far.
11:31 pm on Feb 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 2, 2001
posts:597
votes: 0


So, I'm not too late? I usually don't pay attention to the full crawl, but I had 20 pages mostly done, so I uploaded them. I'll tidy up over the next few days.
12:48 am on Feb 7, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


My site has only about 60 pages, but it's a PR6, and deepbot has seen them all since 02:00 UTC, Feb 06. When I saw it starting, I put up some more pages fast and then it picked them up a few hours later. Freshbot was working away as well and I have Feb 5 tags showing for a lot of pages. Google rocks.

<added>Webmasterworld rocks too.</added>

1:43 am on Feb 7, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 5, 2003
posts:807
votes: 0


About how many times does it show up in logs? So far I've only seen this show up once on each of my domains.

216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET /robots.txt HTTP/1.0" 404 645 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET / HTTP/1.0" 200 65291 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

About how long does it spend doing deep crawling?
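
For counting entries like the two above, here is a minimal sketch that parses combined-format access log lines, keeps the ones whose user agent mentions Googlebot, and tallies them by the first two octets of the IP (so 216.239.* and 64.68.* hits can be told apart). The regular expression assumes the combined log format shown in the lines above; the names `LOG_LINE`, `googlebot_hits`, and `count_by_prefix` are illustrative only.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "method path protocol" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def googlebot_hits(lines):
    """Yield a dict of fields for each line whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "Googlebot" in match.group("agent"):
            yield match.groupdict()

def count_by_prefix(hits):
    """Count hits by the first two octets of the IP, e.g. '216.239'."""
    return Counter(".".join(hit["ip"].split(".")[:2]) for hit in hits)

# The two log lines quoted in the post above.
sample = [
    '216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET /robots.txt HTTP/1.0" 404 645 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET / HTTP/1.0" 200 65291 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]
hits = list(googlebot_hits(sample))
for hit in hits:
    print(hit["ip"], hit["path"], hit["status"])
print(count_by_prefix(hits))
```

Note that matching on the user agent string alone only shows what the client claims to be; it does not prove the request really came from Google.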

2:37 am on Feb 7, 2003 (gmt 0)

New User

10+ Year Member

joined:Jan 24, 2003
posts:40
votes: 0


GOOGLEBOT CAME BACK! AHAHHAHAFGHAHSdjasoidisdjs;'lk'dfh

thank you God.

2:52 am on Feb 7, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


In early Jan it went for about 10 days compared to other crawls that were only 3 days. Who knows how long it will go this time. It's early days, and it found you, so it will probably be back to get everything else.

It will show a log entry for every page linked by the time it's done. It might crawl through a few times.
