googlebot gone crazy

website under siege; 250k requests in one day, next day vanished

     
5:38 pm on Oct 30, 2004 (gmt 0)

New User

10+ Year Member

joined:Sept 6, 2002
posts:27
votes: 0


Googlebot behaved yesterday in a completely irrational way. Googlebot usually “reads” about 1-200 pages a day. My traffic is quite low also: around 250 visitors per day.

Yesterday, a Googlebot/2.1 crawler coming from 66.249.66.205 literally besieged my website. In one day it requested more than 250,000 pages from this website. If I calculate the average requests per second, 250,000/(24*3600), this gives around 2.89 requests per second, which is not that much. However, this particular crawler performed batched requests: it paused for minutes, and then ferociously read dozens of webpages again. I hoped it would not exceed a threshold of 10 pages per second, or 20; you would think this would kill any database driven website. My hopes were wrong:

cat my-really-nice-website.com.20041029 | grep Googlebot | awk -F - '{print $3}' | sort | uniq -c | sort -r | head
47 [29/Oct/2004:21:33:44
44 [29/Oct/2004:08:23:21
41 [29/Oct/2004:21:33:42
41 [29/Oct/2004:08:36:29
41 [29/Oct/2004:08:23:19
40 [29/Oct/2004:08:43:23
40 [29/Oct/2004:08:36:25
40 [29/Oct/2004:08:34:20
39 [29/Oct/2004:10:12:13
39 [29/Oct/2004:09:22:17

As this small measurement shows, there were seconds when the crawler issued 40 or more requests, and once it issued 47 requests in a single second.
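For anyone who would rather not chain shell tools, the same per-second tally can be sketched in Python. This is a minimal sketch, assuming an Apache-style access log where the timestamp sits in square brackets and the user agent string contains "Googlebot"; the function names and the sample lines are made up for illustration and are not taken from the real log.

```python
import re
from collections import Counter

# Matches the "[29/Oct/2004:21:33:44" part of an Apache-style log line (assumed format).
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

def busiest_seconds(lines, agent="Googlebot", top=10):
    """Return the `top` (timestamp, hits) pairs with the most hits for `agent`."""
    hits = Counter()
    for line in lines:
        if agent not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common(top)

def average_rate(total_requests, seconds=24 * 3600):
    """Average requests per second over the given period (one day by default)."""
    return total_requests / seconds

# Hypothetical sample lines, only to show the call.
sample = [
    '66.249.66.205 - - [29/Oct/2004:21:33:44 -0500] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.205 - - [29/Oct/2004:21:33:44 -0500] "GET /b HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.205 - - [29/Oct/2004:21:33:45 -0500] "GET /c HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [29/Oct/2004:21:33:45 -0500] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
print(busiest_seconds(sample))          # per-second hit counts, busiest first
print(round(average_rate(250_000), 2))  # daily average for 250,000 requests
```

The daily average hides the bursts, which is why the per-second tally is the number to watch.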

Note: this IP definitely belongs to Google. There are no new links pointing to my website, and nothing new has happened there for quite some time; nothing that I know of would justify such a rage.

Today everything is calm again; actually Googlebot has completely vanished from the crawlers list for today.
This never happened in the past, and I guess won’t happen again soon. Any similar experiences?

6:26 pm on Oct 30, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 22, 2000
posts:487
votes: 0


I doubt the guys at Google would ever have their bots running at that rate on purpose. A temporary glitch maybe?
6:51 pm on Oct 30, 2004 (gmt 0)

New User

10+ Year Member

joined:Sept 6, 2002
posts:27
votes: 0


i did email them at webmaster@g.. but i guess they never read that.
7:02 pm on Oct 30, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


BAHAHA. That's nothing :)

196 [29/Oct/2004:11:13:47
193 [29/Oct/2004:11:13:48
192 [29/Oct/2004:11:13:46
184 [29/Oct/2004:18:48:26
182 [29/Oct/2004:18:48:25
178 [29/Oct/2004:11:13:51
160 [29/Oct/2004:18:48:24
160 [29/Oct/2004:18:48:18
158 [29/Oct/2004:11:13:45
156 [29/Oct/2004:11:13:44

Top rate: 196 pages per second.

Total Pages: 400,000+

Beat that!

:D

7:07 pm on Oct 30, 2004 (gmt 0)

New User

10+ Year Member

joined:Sept 6, 2002
posts:27
votes: 0


hi Critter,

my point was not to beat you or anybody in numbers :) My website traffic is really low anyway.

I was pointing out an anomaly, and my actual concern is what comes next. do you know what follows this demented crawl? because as i said, today Googlebot completely vanished.

does this happen to your website everyday?

7:10 pm on Oct 30, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


Well actually the new bot has been poking around for some time, around two months I guess. Day before last it got some 65,000 pages but nothing substantial so far today.

I would imagine that Google's either doing a full run test of their new bot, or building a new index, or cleaning house on their existing indices--getting rid of old pages, or (likely) some combination of the three.

I'm also expecting some major changes in the coming weeks.

7:34 pm on Oct 30, 2004 (gmt 0)

New User

10+ Year Member

joined:Sept 6, 2002
posts:27
votes: 0


if this is the new bot, I hope not every webmaster will experience this knock-out crawling series.

are you suggesting that another Florida is on its way?
i think they are busy integrating their new acquisitions and don't plan anything major in their core area.

11:13 pm on Oct 30, 2004 (gmt 0)

Senior Member

joined:Dec 29, 2003
posts:5428
votes: 0


expect huge changes in the first week of November. I just have this feeling
3:50 am on Oct 31, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 5, 2003
posts:807
votes: 0



you would think this would kill any database driven website.

You don't have session IDs in the URL, right?! Session ID= Possible Google bombs!

5:35 pm on Nov 1, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


It's at it again :)

Looks like it's getting mostly newer URLs this time though, and not quite as ferocious in its rate, but that may change.

6:20 pm on Nov 1, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2002
posts:48
votes: 0


Googlebot has pulled 48,000 pages in 13 hours today - might make 100K by the end of the day..... that's about 2 gig of bandwidth on its own today, probably 4 gig for the whole day - will get expensive if it carries on for very long!
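As a rough check on those numbers, a straight-line extrapolation from 13 hours to 24 can be sketched as below. It assumes the crawl continues at the same average pace for the rest of the day, which nothing in the logs guarantees; the function name is illustrative.

```python
def extrapolate(observed, hours_observed, hours_total=24):
    """Scale an observed total linearly to a longer period."""
    return observed * hours_total / hours_observed

pages_per_day = extrapolate(48_000, 13)  # pages, from 48,000 in 13 hours
gigs_per_day = extrapolate(2, 13)        # gigabytes, from about 2 gig in 13 hours

print(round(pages_per_day))    # roughly 88,600 pages
print(round(gigs_per_day, 1))  # roughly 3.7 gig
```

At a constant pace the day would end a little short of 100K pages, while the 4 gig estimate is about right.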
6:28 pm on Nov 1, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


Something big is coming...
6:59 pm on Nov 1, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 9, 2003
posts:29
votes: 0


Total Pages: 500,000+ in 15 hours and counting..i've never seen anything like this.

And it's 98% Google Mozilla bot..been a while since i saw that one.

7:10 pm on Nov 1, 2004 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9068
votes: 4


uioreanu, if Googlebot is hitting your site too hard and affecting its operations, you need to email googlebot@google.com rather than the general address.

[google.com...]

9:59 pm on Nov 1, 2004 (gmt 0)

New User

10+ Year Member

joined:Oct 14, 2004
posts:40
votes: 0


I had the same thing and it is the mozilla thing. What does it mean? has anybody else had their site crawled the same way?
10:40 pm on Nov 1, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 20, 2004
posts:1475
votes: 0


I'm seeing normal (up to 8 hits per second) activity until Oct. 28 with a max of 79 h-p-s, then back to normal.

I'm only seeing a couple of instances of the Mozilla Googlebot, which reports itself as being v.2.1, just like the "original" non-Moz bot.

I DO note, however, that with the exception of a Dutch bot, I've had NO requests for my robots.txt files since July. Yahoo's bot is freaking out, Googlebot is freaking out ... no rest for the wicked.

10:48 pm on Nov 1, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 20, 2004
posts:1475
votes: 0


What do you guys think about this ...?

Somebody is spoofing Yahoo and Google bot IPs.

The stuff you are reporting is very similar to a situation I reported about the Yahoo bot, late last week ... hundreds of thousands of hits per day in what looks like a bad programming reaction (same pages over and over, etc.)

All of my log entries for the Yahoo-bot-gone-out-of-control are from one IP address, which is definitely one of the Yahoo bot addresses, but just the one, not a series of IPs like the normal bot uses.
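One way to test the spoofing theory is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve that hostname back and make sure the original IP is in the answer. The following is a minimal sketch, assuming genuine crawler hosts resolve to names under `googlebot.com` or `google.com`; the suffix list and the function name are assumptions, and the resolver functions are passed in so the logic can be tried without network access.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot hosts.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_crawler(ip, reverse=None, forward=None, suffixes=TRUSTED_SUFFIXES):
    """Forward-confirmed reverse DNS check for a crawler IP."""
    # Default to real DNS lookups from the standard library.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse(ip)
        if not hostname.lower().endswith(suffixes):
            return False
        # The hostname must resolve back to the IP we started from.
        return ip in forward(hostname)
    except OSError:
        # Lookup failed (no PTR record, unknown host, ...): treat as unverified.
        return False

# Stand-in resolvers with illustrative data, so no network is needed here.
fake_ptr = {"66.249.65.72": "crawl-66-249-65-72.googlebot.com"}
fake_a = {"crawl-66-249-65-72.googlebot.com": ["66.249.65.72"]}

print(is_verified_crawler(
    "66.249.65.72",
    reverse=lambda addr: fake_ptr[addr],
    forward=lambda host: fake_a.get(host, []),
))
```

Calling `is_verified_crawler(ip)` with no resolvers performs live lookups instead. A hostname that merely ends in the right domain is not enough, which is why the forward lookup is part of the check.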

11:06 pm on Nov 1, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 23, 2002
posts:602
votes: 0


Google has been checking older 404 pages, from nearly a year ago, at a high rate. The rate it is going at is interesting, and so is the fact that the database they are using predates Oct 2003.
12:57 am on Nov 2, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 20, 2002
posts:352
votes: 0


It's easy to say big changes are coming, etc. Happens every year around the beginning of Nov. The reason: Christmas
1:00 am on Nov 2, 2004 (gmt 0)

New User

10+ Year Member

joined:June 22, 2004
posts:9
votes: 0


Funny, I was reading this yesterday and hoping Googlebot might pay me a similar visit since it's been pretty boring on my website lately. (whoops!)

Today I woke up and found an 18mb log file, up from the usual 2-3mb (and this includes *zero* image requests logged, css, js etc) mostly filled with Googlebot requests from 66.249.65.101.

I noticed several points where it was crawling at 30 pages/second, which is far too much even if I do want Google to crawl my website quickly and completely.

All my URLs are presented as static pages using mod_rewrite, and there's nothing the bot could trip up on. I can't remember the last time Google crawled like this, it's not like them!

1:06 am on Nov 2, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


The number of requests per second that the "new" Googlebot is making is probably some way for Google to determine how optimized/fast/responsive your site is.

I noticed that the connections that Mozilla/Googlebot was making specified a KeepAlive option, so requests were basically filed one after the other on the same connection, rather than being piled up in parallel and overwhelming the server.

Seriously, I see no other way to explain why I was getting 200 requests per second, which is practically the *maximum* amount of serial requests that would be possible to fulfil given the latency from Googlebot to my server (30ms or so). I would imagine that the better your site responds the more apt Google would be to send you more traffic (i.e. higher ranking).
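A quick back-of-the-envelope check of that ceiling, as a sketch. It assumes strictly serial requests on one keep-alive connection with no pipelining, so each request costs at least one network round trip plus the server's processing time; the function names are illustrative.

```python
def max_serial_rate(round_trip_s, server_time_s=0.0):
    """Upper bound on requests per second when each request waits for the previous reply."""
    return 1.0 / (round_trip_s + server_time_s)

def time_budget_ms(requests_per_second):
    """Total time available per request, in milliseconds, at a given serial rate."""
    return 1000.0 / requests_per_second

print(round(max_serial_rate(0.030), 1))  # bound with a 30 ms round trip
print(time_budget_ms(200))               # per-request budget at 200 requests/second
```

Under these assumptions, a 30 ms round trip caps one strictly serial connection at about 33 requests per second, and 200 per second leaves only 5 ms per request. That would point to pipelining, several connections at once, or a much shorter round trip than 30 ms.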

[edited by: Critter at 1:08 am (utc) on Nov. 2, 2004]

1:08 am on Nov 2, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 11, 2003
posts:427
votes: 0


I do not think there is anything too abnormal going on. Googlebot comes and goes and it just so happens that today it's very busy. If this trend continues for a few days, then I would say Google is up to something. Slurp activity also increased sharply today, which is a strange coincidence.
1:32 am on Nov 2, 2004 (gmt 0)

New User

10+ Year Member

joined:Oct 14, 2004
posts:40
votes: 0


but does anybody know what the difference is between this mozilla bot and the other bot? I mean does it have a different job than the non mozilla?

any comments?

Note that it crawled my site from two different IPs.

1:46 am on Nov 2, 2004 (gmt 0)

New User

10+ Year Member

joined:Sept 6, 2002
posts:27
votes: 0


again, this is definitely Google. run a whois check on the IP if you want, but this thread is not about Halloween Googlebot spoofing crawlers :)

the Google team did reply, here's their answer:
“We are sorry about any excessive strain Google is causing your web servers. If you would like us to slow down the rate at which Googlebot crawls your site, please send us a copy of your most recent weblog that lists Googlebot, and we will pass your request on to our engineers.”

I had very few Googlebot requests for a few days. Then today I had 30k requests again, at this rate:
% cat access_log | grep Googlebot | awk -F - '{print $3}' | sort | uniq -c | sort -r | head
40 [01/Nov/2004:18:17:28
39 [01/Nov/2004:18:16:50
36 [01/Nov/2004:18:17:22
35 [01/Nov/2004:19:17:06
35 [01/Nov/2004:19:15:51
35 [01/Nov/2004:18:17:32
35 [01/Nov/2004:18:17:29
35 [01/Nov/2004:18:17:21
34 [01/Nov/2004:19:17:09

I should email them with the access logs. The way this thread is growing makes me think my website just happened to be one of their first targets; now the crawler is targeting other websites as well, with the same greed.

what is interesting though is that they don't seem to have throttling built into the new crawler. Or maybe this is the next generation crawler, adapting its crawling speed to the speed of the website's replies.
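What crawl-speed adaptation might look like, purely as an illustration: nothing in this thread shows how Google's crawler actually works. The sketch below backs off when replies are slow and speeds up gently when they are fast; the thresholds, the factors, and the name `next_delay` are all assumptions.

```python
def next_delay(current_delay, response_time, target=0.5,
               min_delay=0.05, max_delay=60.0):
    """Pick the pause before the next request from the last response time (seconds)."""
    if response_time > target:
        proposed = current_delay * 2.0  # slow reply: back off quickly
    else:
        proposed = current_delay * 0.9  # fast reply: speed up cautiously
    # Clamp so the crawler neither hammers the site nor stalls forever.
    return max(min_delay, min(max_delay, proposed))

delay = 1.0
for observed in (0.1, 0.1, 2.0, 0.1):  # hypothetical response times in seconds
    delay = next_delay(delay, observed)
    print(round(delay, 3))
```

With a lower bound like `min_delay` in place, even a very fast site would never see more than a fixed number of requests per second from one such crawler.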

2:12 am on Nov 2, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


Well I just wish they'd crawl 1000 pages a second with multiple crawlers and get this over with fast. :)

My site is database-driven, but is uber-optimized so it can handle the load...while my competitors, by necessity, are also db-driven but have pages that take 4 to 5 seconds to load. I'm HOPING that the new bot does rank me higher because of the performance disparity.

2:30 am on Nov 2, 2004 (gmt 0)

New User

10+ Year Member

joined:Sept 6, 2002
posts:27
votes: 0


Every day rumours of new algos pop up, with a reason or without it; I hope this thread won't continue fueling such rumours (because it has already started). I mean, from a new crawler behaviour to new SERPs there's some way to go.

If they started differentiating websites based on their speed, one side effect would be that any website hosted outside the US would rank much lower (because of the delays). Actually, since they launched, they have never expanded their crawler centers, only the frontend datacenters, so crawler-wise they are still US-centric.

3:32 am on Nov 2, 2004 (gmt 0)

New User

10+ Year Member

joined:Oct 14, 2004
posts:40
votes: 0


did anybody whose site has been crawled by this mozilla bot see those crawled pages in the Google index? or do we just have to forget about seeing those pages indexed?
3:53 am on Nov 2, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


Man, it just started crawling like this on Friday. Be patient.
5:44 am on Nov 2, 2004 (gmt 0)

New User

10+ Year Member

joined:July 24, 2003
posts:39
votes: 0


Googlebot crawled my website like crazy: 20,820 pages on Oct 28, 2004.
6:07 pm on Nov 3, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:522
votes: 0


it's getting insane here... 30+ googlebot hits per second from 66.249.65.72 crawl-66-249-65-72.googlebot.com on a highly database driven site.
Critter: what G is doing here does not make sense... your 200 requests/second would translate into 1/4B+ requests per month... that's easy to serve from a static setup, but for a site with a dynamic backend it would amount to a DoS attack.
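For scale, the monthly figure works out as follows, assuming the 200 requests per second were sustained around the clock for a 30-day month (the logs in this thread only show short bursts at that rate):

```python
SECONDS_PER_DAY = 24 * 3600

def requests_per_month(rate_per_second, days=30):
    """Total requests if a per-second rate were held constant for `days` days."""
    return rate_per_second * SECONDS_PER_DAY * days

print(requests_per_month(200))  # 518400000
```

That is a little over half a billion requests, comfortably past the quarter-billion mark.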
This 35 message thread spans 2 pages.
 
