
Google News Archive Forum

googlebot gone crazy
website under siege; 250k requests in one day, next day vanished
uioreanu




msg:178997
 5:38 pm on Oct 30, 2004 (gmt 0)

Googlebot behaved yesterday in a completely irrational way. Googlebot usually “reads” about 1-200 pages a day. My traffic is quite low too: around 250 visitors per day.

Yesterday, a Googlebot/2.1 crawler coming from 66.249.66.205 literally laid siege to my website. In one day it requested more than 250,000 pages from this website. If I calculate the average rate (250,000/(24*3600)), it comes to around 2.89 requests per second, which is not that much. However, this particular crawler performed batched requests: it paused for minutes, and then ferociously read dozens of webpages again. I hoped it would not exceed a threshold of 10 pages per second, or 20; you would think this would kill any database driven website. My hopes were wrong:

cat my-really-nice-website.com.20041029 | grep Googlebot | awk -F - '{print $3}' | sort | uniq -c | sort -r | head
47 [29/Oct/2004:21:33:44
44 [29/Oct/2004:08:23:21
41 [29/Oct/2004:21:33:42
41 [29/Oct/2004:08:36:29
41 [29/Oct/2004:08:23:19
40 [29/Oct/2004:08:43:23
40 [29/Oct/2004:08:36:25
40 [29/Oct/2004:08:34:20
39 [29/Oct/2004:10:12:13
39 [29/Oct/2004:09:22:17

As this small measurement shows, there were seconds when the crawler exceeded 40 requests per second, and once it issued 47 requests in a single second.
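
The counting done by the pipeline above can also be sketched in Python, which makes it easy to put the peak second next to the daily average. This is only a minimal sketch: it assumes Apache common/combined log lines with the timestamp in square brackets, and the function name `requests_per_second` and the sample lines are made up for illustration, not taken from the real log.

```python
import re
from collections import Counter

# Matches the bracketed timestamp of a common/combined log line,
# e.g. [29/Oct/2004:21:33:44 -0500]; only the part up to the second is kept.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")


def requests_per_second(lines, agent="Googlebot"):
    """Count log lines mentioning `agent`, grouped by timestamp second."""
    counts = Counter()
    for line in lines:
        if agent not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines (assumed log format, invented requests).
sample = [
    '66.249.66.205 - - [29/Oct/2004:21:33:44 -0500] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.205 - - [29/Oct/2004:21:33:44 -0500] "GET /b HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [29/Oct/2004:21:33:44 -0500] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '66.249.66.205 - - [29/Oct/2004:21:33:45 -0500] "GET /d HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]

counts = requests_per_second(sample)
busiest = counts.most_common(1)[0]  # (timestamp, hits) for the peak second

# Daily average from the figures in the post: 250,000 requests in 24 hours.
average = 250_000 / (24 * 3600)
```

On a real log, passing an open file object instead of `sample` and printing `counts.most_common(10)` should give the same kind of top-ten list as the shell version.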

Note: this IP definitely belongs to Google. There are no new links pointing to my website, and nothing new has happened there for quite some time, nothing that I know of that would justify such a rage.

Today everything is calm again; actually, Googlebot has completely vanished from the crawlers list for today.
This never happened in the past, and I guess it won’t happen again soon. Any similar experiences?

 

JeremyL




msg:178998
 6:26 pm on Oct 30, 2004 (gmt 0)

I doubt the guys at Google would ever have their bots running at that rate on purpose. A temporary glitch maybe?

uioreanu




msg:178999
 6:51 pm on Oct 30, 2004 (gmt 0)

I did email them at webmaster@g.. but I guess they never read that.

Critter




msg:179000
 7:02 pm on Oct 30, 2004 (gmt 0)

BAHAHA. That's nothing :)

196 [29/Oct/2004:11:13:47
193 [29/Oct/2004:11:13:48
192 [29/Oct/2004:11:13:46
184 [29/Oct/2004:18:48:26
182 [29/Oct/2004:18:48:25
178 [29/Oct/2004:11:13:51
160 [29/Oct/2004:18:48:24
160 [29/Oct/2004:18:48:18
158 [29/Oct/2004:11:13:45
156 [29/Oct/2004:11:13:44

Top rate: 196 pages per second.

Total Pages: 400,000+

Beat that!

:D

uioreanu




msg:179001
 7:07 pm on Oct 30, 2004 (gmt 0)

hi Critter,

my point was not to beat you or anybody in numbers :) My website traffic is really low anyway.

I was flagging an anomaly, and my real concern is what comes next. Do you know what follows this demented crawl? Because, as I said, today Googlebot has completely vanished.

Does this happen to your website every day?

Critter




msg:179002
 7:10 pm on Oct 30, 2004 (gmt 0)

Well actually the new bot has been poking around for some time, around two months I guess. Day before last it got some 65,000 pages but nothing substantial so far today.

I would imagine that Google's either doing a full run test of their new bot, or building a new index, or cleaning house on their existing indices--getting rid of old pages, or (likely) some combination of the three.

I'm also expecting some major changes in the coming weeks.

uioreanu




msg:179003
 7:34 pm on Oct 30, 2004 (gmt 0)

If this is the new bot, I hope not every webmaster will experience this knock-out crawling series.

Are you suggesting that another Florida is on its way?
I think they are busy integrating their new acquisitions and aren't planning anything major in their core area.

walkman




msg:179004
 11:13 pm on Oct 30, 2004 (gmt 0)

Expect huge changes in the first week of November. I just have this feeling.

Jesse_Smith




msg:179005
 3:50 am on Oct 31, 2004 (gmt 0)


you would think this would kill any database driven website.

You don't have session IDs in the URL, right?! Session ID= Possible Google bombs!

Critter




msg:179006
 5:35 pm on Nov 1, 2004 (gmt 0)

It's at it again :)

Looks like it's getting mostly newer URLs this time though, and not quite as ferocious in its rate, but that may change.

uksports




msg:179007
 6:20 pm on Nov 1, 2004 (gmt 0)

Googlebot has pulled 48,000 pages in 13 hours today - might make 100K by the end of the day..... that's about 2 gig of bandwidth on its own today, probably 4 gig for the whole day - will get expensive if it carries on for very long!

Critter




msg:179008
 6:28 pm on Nov 1, 2004 (gmt 0)

Something big is coming...

SEOPutte




msg:179009
 6:59 pm on Nov 1, 2004 (gmt 0)

Total Pages: 500,000+ in 15 hours and counting.. I've never seen anything like this.

And it's 98% Google Mozilla bot.. been a while since I saw that one.

encyclo




msg:179010
 7:10 pm on Nov 1, 2004 (gmt 0)

uioreanu, if Googlebot is hitting your site too hard and affecting its operations, you need to email googlebot@google.com rather than the general address.

[google.com...]

Blackguy




msg:179011
 9:59 pm on Nov 1, 2004 (gmt 0)

I had the same thing, and it is the Mozilla bot. What does it mean? Has anybody else had their site crawled the same way?

StupidScript




msg:179012
 10:40 pm on Nov 1, 2004 (gmt 0)

I'm seeing normal (up to 8 hits per second) activity until Oct. 28 with a max of 79 h-p-s, then back to normal.

I'm only seeing a couple of instances of the Mozilla Googlebot, which reports itself as being v.2.1, just like the "original" non-Moz bot.

I DO note, however, that with the exception of a Dutch bot, I've had NO requests for my robots.txt files since July. Yahoo's bot is freaking out, Googlebot is freaking out ... no rest for the wicked.

StupidScript




msg:179013
 10:48 pm on Nov 1, 2004 (gmt 0)

What do you guys think about this ...?

Somebody is spoofing Yahoo and Google bot IPs.

The stuff you are reporting is very similar to a situation I reported about the Yahoo bot, late last week ... hundreds of thousands of hits per day in what looks like a bad programming reaction (same pages over and over, etc.)

All of my log entries for the Yahoo-bot-gone-out-of-control are from one IP address, which is definitely one of the Yahoo bot addresses, but just the one, not a series of IPs like the normal bot uses.
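
One way to test the spoofing theory is a reverse-then-forward DNS check: look up the hostname for the IP, check that it sits under a crawler domain, then resolve that hostname again and confirm it points back to the same IP. The sketch below is only an illustration under assumptions: the function name `is_googlebot` and the suffix list are mine (a `googlebot.com` hostname does show up further down this thread; `google.com` is a guess), and the resolver arguments are injectable so the logic can be tried without touching the network.

```python
import socket


def is_googlebot(ip,
                 reverse=socket.gethostbyaddr,
                 forward=socket.gethostbyname_ex,
                 suffixes=(".googlebot.com", ".google.com")):
    """Reverse-resolve `ip`, check the hostname suffix, then confirm that the
    hostname resolves forward to the same IP. Returns True only if all pass."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
        if not hostname.endswith(suffixes):
            return False
        addresses = forward(hostname)[2]   # list of IPv4 addresses
    except OSError:                        # covers socket.herror / socket.gaierror
        return False
    return ip in addresses
```

Calling `is_googlebot("66.249.66.205")` with the default resolvers performs real DNS lookups, so the answer depends on what DNS returns at the time. A spoofed user-agent string alone would fail the check; whether the IP in a log line can itself be forged is a separate question.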

Visi




msg:179014
 11:06 pm on Nov 1, 2004 (gmt 0)

Google has been checking at a high rate older 404 pages, from nearly a year ago. Interesting the rate it is going at and also that the age of the database they are using is pre Oct 2003.

wellzy




msg:179015
 12:57 am on Nov 2, 2004 (gmt 0)

It's easy to say big changes are coming, etc. Happens every year around the beginning of Nov. The reason: Christmas

Sparrow Nine




msg:179016
 1:00 am on Nov 2, 2004 (gmt 0)

Funny, I was reading this yesterday and hoping Googlebot might pay me a similar visit since it's been pretty boring on my website lately. (whoops!)

Today I woke up and found an 18 MB log file, up from the usual 2-3 MB (and this includes *zero* image requests logged, css, js etc), mostly filled with Googlebot requests from 66.249.65.101.

I noticed several points where it was crawling at 30 pages/second, which is far too much even if I do want Google to crawl my website quickly and completely.

All my URLs are presented as static pages using mod_rewrite, and there's nothing the bot could trip up on. I can't remember the last time Google crawled like this, it's not like them!

Critter




msg:179017
 1:06 am on Nov 2, 2004 (gmt 0)

The number of requests per second that the "new" Googlebot is making is probably some way for Google to determine how optimized/fast/responsive your site is.

I noticed that the connections that Mozilla/Googlebot was making specified a KeepAlive option, so requests were basically filed one after the other on the same connection, rather than being piled up in parallel and overwhelming the server.

Seriously, I see no other way to explain why I was getting 200 requests per second, which is practically the *maximum* amount of serial requests that would be possible to fulfil given the latency from Googlebot to my server (30ms or so). I would imagine that the better your site responds the more apt Google would be to send you more traffic (ie. higher ranking)

[edited by: Critter at 1:08 am (utc) on Nov. 2, 2004]
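
A rough back-of-the-envelope check of the latency argument, under assumptions the post does not state: that the 30 ms is a full round trip, that requests on the keep-alive connection are strictly serial (no pipelining), and that server time is negligible. The function name and defaults are illustrative only. Under those assumptions one connection tops out near 33 requests per second, so 200 per second would need either about 5 ms per request or several connections in parallel.

```python
import math


def max_serial_rate(round_trip_ms, server_time_ms=0):
    """Upper bound on requests/second over one connection when each request
    waits for the previous response to arrive (no pipelining)."""
    return 1000 / (round_trip_ms + server_time_ms)


ceiling_at_30ms = max_serial_rate(30)            # roughly 33 requests/second
budget_ms_at_200 = 1000 / 200                    # time available per request at 200/s
connections_needed = math.ceil(200 * 30 / 1000)  # serial connections needed at 30 ms
```

If the 30 ms figure was one-way latency, or the bot pipelined requests, the numbers shift accordingly.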

iblaine




msg:179018
 1:08 am on Nov 2, 2004 (gmt 0)

I do not think there is anything too abnormal going on. The googlebot comes and goes and it just so happens that today it's very busy. If this trend continues for a few days, then I would say google is up to something. Slurp activity also increased sharply today, which is a strange coincidence.

Blackguy




msg:179019
 1:32 am on Nov 2, 2004 (gmt 0)

But does anybody know what the difference is between this Mozilla bot and the other bot? I mean, does it have a different job than the non-Mozilla one?

Any comments?

Note that it crawled my site from two different IPs.

uioreanu




msg:179020
 1:46 am on Nov 2, 2004 (gmt 0)

Again, this is definitely Google. Run a whois check on the IP if you want, but this thread is not about Halloween Googlebot-spoofing crawlers :)

the Google team did reply, here's their answer:
“We are sorry about any excessive strain Google is causing your web servers. If you would like us to slow down the rate at which Googlebot crawls your site, please send us a copy of your most recent weblog that lists Googlebot, and we will pass your request on to our engineers.”

I had very few Googlebot requests for a few days. Then today I had 30k requests again, at this rate:
% cat access_log | grep Googlebot | awk -F - '{print $3}' | sort | uniq -c | sort -r | head
40 [01/Nov/2004:18:17:28
39 [01/Nov/2004:18:16:50
36 [01/Nov/2004:18:17:22
35 [01/Nov/2004:19:17:06
35 [01/Nov/2004:19:15:51
35 [01/Nov/2004:18:17:32
35 [01/Nov/2004:18:17:29
35 [01/Nov/2004:18:17:21
34 [01/Nov/2004:19:17:09

I should email them with the access logs. Your growing thread here makes me think my website was, by accident, one of their first targets; now the crawler targets other websites as well with the same greediness.

What is interesting though is that they don't have throttling built into the new crawler. Or maybe this is the next-generation crawler, adapting its crawling speed to the speed of the website's reply.
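
To make the "adapting its crawling speed" idea concrete, here is a small sketch of a delay rule a polite crawler could use: wait a multiple of the last response time before the next request, clamped to a sensible range. This is purely hypothetical and says nothing about how Googlebot actually works; the name `next_delay`, the factor of 10, and the bounds are all invented for illustration.

```python
def next_delay(response_time_s, factor=10.0, min_delay_s=0.1, max_delay_s=30.0):
    """Return the pause (in seconds) before the next request.

    The pause grows with the server's last response time, so slow sites are
    hit less often, and is clamped to [min_delay_s, max_delay_s].
    """
    delay = response_time_s * factor
    return max(min_delay_s, min(delay, max_delay_s))
```

With these made-up defaults, a site answering in half a second would be asked for a page every five seconds, while a very fast site would still see at most ten requests per second from a single fetcher.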

Critter




msg:179021
 2:12 am on Nov 2, 2004 (gmt 0)

Well I just wish they'd crawl 1000 pages a second with multiple crawlers and get this over with fast. :)

My site is database-driven, but is uber-optimized so it can handle the load...while my competitors, by necessity, are also db-driven but have pages that take 4 to 5 seconds to load. I'm HOPING that the new bot does rank me higher because of the performance disparity.

uioreanu




msg:179022
 2:30 am on Nov 2, 2004 (gmt 0)

Every day rumours of new algos pop up, with or without a reason; I hope this thread won't continue fueling such rumours (because it has already started). I mean, from a new crawler behaviour to new SERPs there's some way to go.

If they started differentiating websites based on their speed, one side effect would be that any website hosted outside the US would rank much lower (because of the delays). Actually, since they launched, they have never expanded their crawler centers, only the frontend datacenters, so crawler-wise they are still US-centric.

Blackguy




msg:179023
 3:32 am on Nov 2, 2004 (gmt 0)

Has anybody whose site has been crawled by this Mozilla bot seen those crawled pages in the Google index? Or do we just have to forget about seeing those pages indexed?

Critter




msg:179024
 3:53 am on Nov 2, 2004 (gmt 0)

Man, it just started crawling like this on Friday. Be patient.

ericli




msg:179025
 5:44 am on Nov 2, 2004 (gmt 0)

Googlebot crawled my website like crazy: 20,820 pages on Oct 28, 2004.

freeflight2




msg:179026
 6:07 pm on Nov 3, 2004 (gmt 0)

It's getting insane here... 30+ Googlebot hits per second from 66.249.65.72 (crawl-66-249-65-72.googlebot.com) on a highly database-driven site.
Critter: what G is doing here does not make sense... your 200 requests/second would translate into 1/4B+ requests per month... it's easy to serve that through a static setup, but for a site with a dynamic backend it would amount to a DoS attack.
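
The monthly figure is easy to check, assuming (hypothetically) that the 200 requests/second burst rate were sustained around the clock for a 30-day month; the bursts reported in this thread lasted only seconds. The helper name is made up for illustration. Under that assumption the total comes to roughly half a billion requests, comfortably above the "1/4B+" mentioned.

```python
SECONDS_PER_DAY = 24 * 3600


def sustained_requests(rate_per_second, days):
    """Total requests if `rate_per_second` were held constant for `days` days."""
    return rate_per_second * SECONDS_PER_DAY * days


per_day = sustained_requests(200, 1)     # 17,280,000
per_month = sustained_requests(200, 30)  # 518,400,000
```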

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved