Forum Moderators: open
Yesterday, a Googlebot/2.1 crawler coming from 66.249.66.205 besieged my website. In one day it requested more than 250,000 pages. Averaged out (250,000 / (24 * 3600)), that works out to about 2.89 requests per second, which is not that much. However, this particular crawler performed batched requests: it paused for minutes, then ferociously read dozens of pages again. I hoped it would not exceed a threshold of 10 pages per second, or 20; you would think that would kill any database-driven website. My hopes were wrong:
cat my-really-nice-website.com.20041029 | grep Googlebot | awk -F - '{print $3}' | sort | uniq -c | sort -r | head
47 [29/Oct/2004:21:33:44
44 [29/Oct/2004:08:23:21
41 [29/Oct/2004:21:33:42
41 [29/Oct/2004:08:36:29
41 [29/Oct/2004:08:23:19
40 [29/Oct/2004:08:43:23
40 [29/Oct/2004:08:36:25
40 [29/Oct/2004:08:34:20
39 [29/Oct/2004:10:12:13
39 [29/Oct/2004:09:22:17
As this small measurement shows, there were seconds when the crawler exceeded 40 requests per second; at one point it issued 47 requests in a single second.
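For anyone who wants to reproduce this kind of per-second count, here is a self-contained variant of that one-liner run against a tiny fabricated sample (the log lines and filename are made up). With awk's default whitespace separator, `$4` is the `[dd/Mon/yyyy:hh:mm:ss` timestamp field, and `sort -rn` guarantees the counts are ordered numerically:

```shell
# Fabricated sample lines in Apache combined log format.
cat > sample.log <<'EOF'
66.249.66.205 - - [29/Oct/2004:21:33:44 +0000] "GET /a HTTP/1.0" 200 512 "-" "Googlebot/2.1"
66.249.66.205 - - [29/Oct/2004:21:33:44 +0000] "GET /b HTTP/1.0" 200 512 "-" "Googlebot/2.1"
66.249.66.205 - - [29/Oct/2004:21:33:45 +0000] "GET /c HTTP/1.0" 200 512 "-" "Googlebot/2.1"
EOF
# Hits per second: $4 is the timestamp field; the busiest second sorts first.
grep Googlebot sample.log | awk '{print $4}' | sort | uniq -c | sort -rn | head
```

The top line of the output is the worst second, e.g. `2 [29/Oct/2004:21:33:44` for the sample above.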
Note: this IP definitely belongs to Google. There are no new links pointing to my website, and nothing new has happened there for quite some time; nothing that I know of would justify such a rage.
Today everything is calm again; in fact, Googlebot has completely vanished from the crawlers list for today.
This never happened in the past, and I guess it won't happen again soon. Any similar experiences?
196 [29/Oct/2004:11:13:47
193 [29/Oct/2004:11:13:48
192 [29/Oct/2004:11:13:46
184 [29/Oct/2004:18:48:26
182 [29/Oct/2004:18:48:25
178 [29/Oct/2004:11:13:51
160 [29/Oct/2004:18:48:24
160 [29/Oct/2004:18:48:18
158 [29/Oct/2004:11:13:45
156 [29/Oct/2004:11:13:44
Top rate: 196 pages per second.
Total Pages: 400,000+
Beat that!
:D
My point was not to beat you or anybody in numbers :) My website traffic is really low anyway.
I was flagging an anomaly, and my concern is what comes next. Do you know what follows this demented crawl? Because, as I said, today Googlebot has completely vanished.
Does this happen to your website every day?
I would imagine that Google is either doing a full test run of their new bot, or building a new index, or cleaning house on their existing indices (getting rid of old pages), or (likely) some combination of the three.
I'm also expecting some major changes in the coming weeks.
[google.com...]
I'm only seeing a couple of instances of the Mozilla Googlebot, which reports itself as being v.2.1, just like the "original" non-Moz bot.
I DO note, however, that with the exception of a Dutch bot, I've had NO requests for my robots.txt files since July. Yahoo's bot is freaking out, Googlebot is freaking out ... no rest for the wicked.
Somebody is spoofing Yahoo and Google bot IPs.
The stuff you are reporting is very similar to a situation I reported about the Yahoo bot, late last week ... hundreds of thousands of hits per day in what looks like a bad programming reaction (same pages over and over, etc.)
All of my log entries for the Yahoo-bot-gone-out-of-control are from one IP address, which is definitely one of the Yahoo bot addresses, but just the one, not a series of IPs like the normal bot uses.
Today I woke up and found an 18 MB log file, up from the usual 2-3 MB (and that's with *zero* image, CSS, or JS requests logged), mostly filled with Googlebot requests from 66.249.65.101.
I noticed several points where it was crawling at 30 pages/second, which is far too much, even if I do want Google to crawl my website quickly and completely.
All my URLs are presented as static pages using mod_rewrite, and there's nothing the bot could trip up on. I can't remember the last time Google crawled like this; it's not like them!
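(For context, the setup being described is the usual mod_rewrite pattern: static-looking URLs mapped onto a database-driven script. This is an illustrative rule with made-up path and script names, not my actual config:

```apache
# Illustrative only: serve a static-looking URL from a dynamic script.
RewriteEngine On
RewriteRule ^articles/([0-9]+)\.html$ /show.php?id=$1 [L]
```

From the bot's point of view these look like ordinary .html pages, so there's no URL structure for it to choke on.)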
I noticed that the connections that Mozilla/Googlebot was making specified a KeepAlive option, so requests were basically filed one after the other on the same connection, rather than being piled up in parallel and overwhelming the server.
Seriously, I see no other way to explain why I was getting 200 requests per second, which is practically the *maximum* number of serial requests that could be fulfilled given the latency from Googlebot to my server (30 ms or so). I would imagine that the better your site responds, the more apt Google would be to send you more traffic (i.e. higher ranking).
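That serial ceiling can be sanity-checked with back-of-the-envelope arithmetic: on a single keep-alive connection handling one request at a time, the maximum rate is roughly 1000 / (per-request turnaround in ms). The turnaround figures below are illustrative, not measurements; note that sustaining 200 req/s on one connection implies about 5 ms per request, while a 30 ms turnaround would cap a single connection near 33 req/s:

```shell
# Rough ceiling for serial requests on one keep-alive connection:
# max rate ~ 1000 / turnaround_ms. The turnaround values are illustrative.
awk 'BEGIN {
  n = split("5 10 30", ms, " ")
  for (i = 1; i <= n; i++)
    printf "turnaround %2d ms -> ~%d requests/sec max\n", ms[i], int(1000 / ms[i])
}'
```

So rates around 200/s point to either very fast per-request turnaround or more than one connection in flight.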
[edited by: Critter at 1:08 am (utc) on Nov. 2, 2004]
the Google team did reply, here's their answer:
"We are sorry about any excessive strain Google is causing your web servers. If you would like us to slow down the rate at which Googlebot crawls your site, please send us a copy of your most recent weblog that lists Googlebot, and we will pass your request on to our engineers."
I had very few Googlebot requests for a few days. Then today I had 30k requests again, at this rate:
% cat access_log | grep Googlebot | awk -F - '{print $3}' | sort | uniq -c | sort -r | head
40 [01/Nov/2004:18:17:28
39 [01/Nov/2004:18:16:50
36 [01/Nov/2004:18:17:22
35 [01/Nov/2004:19:17:06
35 [01/Nov/2004:19:15:51
35 [01/Nov/2004:18:17:32
35 [01/Nov/2004:18:17:29
35 [01/Nov/2004:18:17:21
34 [01/Nov/2004:19:17:09
I should email them the access logs. Your growing thread here makes me think my website was by accident one of their first targets; now the crawler is hitting other websites with the same greediness.
What is interesting, though, is that they don't seem to have throttling built into the new crawler. Or maybe this is the next-generation crawler, adapting its crawling speed to the speed of the website's replies.
My site is database-driven, but is uber-optimized so it can handle the load...while my competitors, by necessity, are also db-driven but have pages that take 4 to 5 seconds to load. I'm HOPING that the new bot does rank me higher because of the performance disparity.
If they started differentiating websites based on speed, one side effect would be to rank much lower any website hosted outside the US (because of the latency). Actually, since they launched, they have never expanded their crawler centers, only the frontend datacenters; so crawling-wise, they are still US-centric.