
Google News Archive Forum

This 58-message thread spans 2 pages.
Feb Crawl Has Started
The 216's have started
peterdaly




msg:111129
 1:36 am on Feb 6, 2003 (gmt 0)

While discussion has started in some other misc threads, the February crawl is underway. Here is an organized place to discuss it more.

I have one site with googlebot requests from 216.239.46.*

-Pete
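The simplest check discussed here is whether a request came from the 216.239.46.* block. Here is a minimal Python sketch using the standard `ipaddress` module. The /24 is the range reported in this thread in February 2003, so treat it as a historical observation and not a current Googlebot allowlist. The name `in_deepcrawl_range` and the 64.68.x.x test address are illustrative only.

```python
import ipaddress

# The /24 reported by posters in this thread (Feb 2003). Historical
# observation only; Google's crawler addresses change over time.
DEEPCRAWL_NET = ipaddress.ip_network("216.239.46.0/24")


def in_deepcrawl_range(ip: str) -> bool:
    """Return True if ip falls inside 216.239.46.* (216.239.46.0/24)."""
    return ipaddress.ip_address(ip) in DEEPCRAWL_NET


print(in_deepcrawl_range("216.239.46.104"))  # True
print(in_deepcrawl_range("64.68.82.10"))     # False (illustrative 64.68 address)
```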

 

jdMorgan




msg:111130
 1:43 am on Feb 6, 2003 (gmt 0)

Yes,

Deepcrawler from 216.239.46.* on two of my sites here, too. For reference, both are currently PR5.

Jim

Krapulator




msg:111131
 1:45 am on Feb 6, 2003 (gmt 0)

Yeah, I've just seen the little blighter! Grabbed robots.txt and a few pages from my root directory.

jimh009




msg:111132
 2:10 am on Feb 6, 2003 (gmt 0)

Good! I was getting worried that I might be missed. Normally, the deep crawl on my site begins right on the first of the month.

Hopefully I'll see the bugger in my logs tomorrow.

Jesse_Smith




msg:111133
 2:34 am on Feb 6, 2003 (gmt 0)

Does it start out slow? So far I've only seen one listing in one of my logs (two year old site), out of 10 domains (nine two month old sites). How long does the deep crawl last?

Bio4ce




msg:111134
 3:48 am on Feb 6, 2003 (gmt 0)

Could it be a proxy? Just got hit with 216* and 64*.

amznVibe




msg:111135
 3:53 am on Feb 6, 2003 (gmt 0)

crawl4.googlebot.com (216.239.46.104) is on one of my sites right now

interesting thing is that the site was down for 30 minutes just before the bot showed up... hope that doesn't hurt anything...

it also didn't ask for robots.txt, I guess it's using a cached version from freshbot, which was on the site 12 hours earlier?
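Seeing a hostname like crawl4.googlebot.com suggests a sturdier check than matching IP ranges: a forward-confirmed reverse DNS lookup (IP to hostname, then hostname back to IP). This Python sketch assumes, as in this thread, that genuine crawler hostnames end in `.googlebot.com`. Google's current verification documentation may accept other domains, so check it before relying on this. The resolvers are injectable parameters so the logic can be tested without network access. `is_googlebot_host` is a hypothetical helper name.

```python
import socket
from typing import Callable, List, Tuple

# Both socket.gethostbyaddr and socket.gethostbyname_ex return
# (hostname, aliaslist, ipaddrlist).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_googlebot_host(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: ip -> hostname -> ip must round-trip."""
    try:
        hostname = reverse(ip)[0]  # e.g. "crawl4.googlebot.com"
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # Assumption: crawler hostnames end in .googlebot.com, as seen in this thread.
    if not hostname.endswith(".googlebot.com"):
        return False
    try:
        # The original IP must appear among the hostname's IPv4 addresses.
        return ip in forward(hostname)[2]
    except OSError:
        return False
```

Calling `is_googlebot_host("216.239.46.104")` with the default resolvers performs live DNS lookups, so today's result will not match what posters saw in 2003.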

troels nybo nielsen




msg:111136
 4:11 am on Feb 6, 2003 (gmt 0)

Me too, PR5 sites.

Welcome to WebmasterWorld, Jesse. On my sites, the deep crawl usually starts with rather few hits on the first day, then comes back and finishes the job the next day. I have heard about websites where it may take longer than that.

eljefe3




msg:111137
 4:31 am on Feb 6, 2003 (gmt 0)

The only thing of interest I have to add is that my Flash file was grabbed for the first time. Normally it's just the pages themselves that get picked up.

Stefan




msg:111138
 4:34 am on Feb 6, 2003 (gmt 0)

Yep, it's underway here too... not many hits yet, but the freshbot is grabbing a lot right now as well. Googlebots everywhere...

johnraphone




msg:111139
 4:37 am on Feb 6, 2003 (gmt 0)

crawl7.googlebot.com (216 domain) is crawling my PR 6 web site.

GeorgeGG




msg:111140
 4:49 am on Feb 6, 2003 (gmt 0)

216.239.46.*, 1 personal web site, 3 PR6 pages and 1 PR5 page.

GeorgeGG

[edited by: GeorgeGG at 5:40 am (utc) on Feb. 6, 2003]

quotations




msg:111141
 5:35 am on Feb 6, 2003 (gmt 0)

Two of my very different PR6 sites got hit about an hour apart.

Host: 216.239.46.19 Url: /

Only the top page was taken and there was no interest in robots.txt here either.

Stefan




msg:111142
 2:12 pm on Feb 6, 2003 (gmt 0)

The bots were having a party at my site overnight. Freshie took 90% of the pages the first while, with a few hits from 216.239, then 216.239 took over and has been staying very busy. Ink dropped by for a few rounds to check out the action and is still lurking about, and a bizarro-bot that disguises itself and doesn't look for a robots.txt crawled the entire place.
Welcome back deepbot, freshie will show you where the bar is. Help yourself to the pretzels.

[edited by: Stefan at 3:35 pm (utc) on Feb. 6, 2003]

uber_boy




msg:111143
 2:27 pm on Feb 6, 2003 (gmt 0)

My site's a PR7 that usually has about 50-150k pages read in during the deep crawl. Several of the deep crawl bots finally started visiting last night, albeit at a rather slow pace. Whereas they usually gobble up pages at a rate measured in thousands/hour, for the past 12 hours it's been a figure in the hundreds/hour. As noted above, the pattern is for the deep crawl to start off slow and then ramp up, but if memory serves me correctly, this is a little slower than usual.
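To put numbers on "thousands/hour" versus "hundreds/hour", you can bucket a crawler's requests by clock hour. This Python sketch assumes Apache-style timestamps like the ones in the log lines quoted later in this thread. `hits_per_hour` is a hypothetical helper name, and the `%b` directive assumes English month abbreviations (the default C locale).

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def hits_per_hour(timestamps: Iterable[str]) -> Counter:
    """Count requests per clock hour from Apache-style log timestamps."""
    counts: Counter = Counter()
    for ts in timestamps:
        # e.g. "05/Feb/2003:18:50:39 -0800"
        dt = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        # Key on the hour, keeping the log's own UTC offset.
        counts[dt.strftime("%Y-%m-%d %H:00 %z")] += 1
    return counts


example = hits_per_hour([
    "05/Feb/2003:18:50:39 -0800",
    "05/Feb/2003:18:59:59 -0800",
    "05/Feb/2003:19:00:01 -0800",
])
print(example)
```

Feed it only the lines already filtered to the crawler you care about. Otherwise, human visitors and other bots will inflate the hourly counts.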

coolshop




msg:111144
 2:31 pm on Feb 6, 2003 (gmt 0)

uber_boy, I would agree with your observation. This deep crawl seems to not be picking up steam like it usually does -- yet.

amznVibe




msg:111145
 2:55 pm on Feb 6, 2003 (gmt 0)

perhaps they are testing some changes and seeing how the algorithms react by doing light reads only for now

taxpod




msg:111146
 2:57 pm on Feb 6, 2003 (gmt 0)

Usually it starts out slow for me, then finds the right pace, then slows down again near the end. But it does seem pretty darn slow right now.

Albaba




msg:111147
 3:00 pm on Feb 6, 2003 (gmt 0)

Yeah... Googlebot is slow... :(
Now I doubt she will crawl all of my new site's pages (150k pages)... I hope the Google staff give her more power :)

peterdaly




msg:111148
 6:14 pm on Feb 6, 2003 (gmt 0)

I had 89 pages deep crawled yesterday (2/5) between 7:25PM US Eastern Time and 8:27PM.

Requested one page so far today (2/6) at 8:39am. Not a sign since.

This is a PR5 site with currently over 50k pages in the index. This makes me nervous.

-Pete

EquityMind




msg:111149
 6:16 pm on Feb 6, 2003 (gmt 0)

Little bit early this month.......

amznVibe




msg:111150
 8:14 pm on Feb 6, 2003 (gmt 0)

I'm getting really crazy "touch-and-go" action from the 216.239.* range. Seems like they are using a different crawler to read different pages on one site, sporadically all day... What's the chance that Google has switched the functions of the 216.239 and 64.68 ranges? Either that, or some kind of netiquette where they are spreading out the bandwidth load over the day?

SubZeroGTS




msg:111151
 8:43 pm on Feb 6, 2003 (gmt 0)

if google got no response from my server earlier today, is it guaranteed to try at least once more sometime?

taxpod




msg:111152
 8:47 pm on Feb 6, 2003 (gmt 0)

SubZeroGTS,

I would think so. Gbot is persistent. Give it time.

WindSun




msg:111153
 10:05 pm on Feb 6, 2003 (gmt 0)

Hmm... Got a bunch of picture crawls, only a few page crawls so far.

Craig_F




msg:111154
 11:31 pm on Feb 6, 2003 (gmt 0)

So, I'm not too late? I usually don't pay attention to the full crawl, but I had 20 pages mostly done, so I uploaded them. I'll tidy up over the next few days.

Stefan




msg:111155
 12:48 am on Feb 7, 2003 (gmt 0)

My site has only about 60 pages, but it's a PR6, and deepbot has seen them all since 02:00 UTC, Feb 06. When I saw it starting, I put up some more pages fast and then it picked them up a few hours later. Freshbot was working away as well and I have Feb 5 tags showing for a lot of pages. Google rocks.

<added>Webmasterworld rocks too.</added>

Jesse_Smith




msg:111156
 1:43 am on Feb 7, 2003 (gmt 0)

About how many times does it show up in logs? So far I've only seen this show up once on each of my domains.

216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET /robots.txt HTTP/1.0" 404 645 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET / HTTP/1.0" 200 65291 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

About how long does it spend doing deep crawling?
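The two log lines above are in Apache's "combined" format. This Python sketch pulls out the Googlebot entries and checks whether robots.txt was requested, which several posters in this thread said did not happen. It assumes the default combined field order. `LOG_RE` and `googlebot_hits` are illustrative names. The 404 on the first line just means the site has no robots.txt file, which crawlers generally treat as "no restrictions". Because the user-agent can be faked (see the disguised bot mentioned earlier in the thread), this only narrows the log down. It does not prove that a request came from Google.

```python
import re
from typing import Dict, Iterable, List

# Apache "combined": host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return parsed entries whose user-agent mentions Googlebot."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("agent"):
            hits.append(m.groupdict())
    return hits


sample = [
    '216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET /robots.txt HTTP/1.0" 404 645 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '216.239.46.204 - - [05/Feb/2003:18:50:39 -0800] "GET / HTTP/1.0" 200 65291 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]
hits = googlebot_hits(sample)
fetched_robots = any(h["path"] == "/robots.txt" for h in hits)
print(len(hits), fetched_robots)  # 2 True
```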

SubZeroGTS




msg:111157
 2:37 am on Feb 7, 2003 (gmt 0)

GOOGLEBOT CAME BACK! AHAHHAHAFGHAHSdjasoidisdjs;'lk'dfh

thank you God.

Stefan




msg:111158
 2:52 am on Feb 7, 2003 (gmt 0)

In early Jan it went for about 10 days compared to other crawls that were only 3 days. Who knows how long it will go this time. It's early days, and it found you, so it will probably be back to get everything else.

It will show a log entry for every page linked by the time it's done. It might crawl through a few times.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved