googlebot gone crazy

website under siege; 250k requests in one day, next day vanished

uioreanu

5:38 pm on Oct 30, 2004 (gmt 0)

Yesterday Googlebot behaved in a completely irrational way. It usually "reads" about 100-200 pages a day here, and my traffic is quite low too: around 250 visitors per day.

Yesterday, a Googlebot/2.1 crawler coming from 66.249.66.205 laid siege to my website: in one day it requested more than 250,000 pages. The average rate, 250,000/(24*3600), works out to about 2.89 requests per second, which is not that much. However, this crawler made its requests in batches: it paused for minutes, then ferociously read dozens of pages at once. I hoped it would not exceed a threshold of 10 pages per second, or 20; you would think anything beyond that would kill any database-driven website. My hopes were wrong:

cat my-really-nice-website.com.20041029 | grep Googlebot | awk '{print $4}' | sort | uniq -c | sort -rn | head
47 [29/Oct/2004:21:33:44
44 [29/Oct/2004:08:23:21
41 [29/Oct/2004:21:33:42
41 [29/Oct/2004:08:36:29
41 [29/Oct/2004:08:23:19
40 [29/Oct/2004:08:43:23
40 [29/Oct/2004:08:36:25
40 [29/Oct/2004:08:34:20
39 [29/Oct/2004:10:12:13
39 [29/Oct/2004:09:22:17

As this small measurement shows, there were seconds when the crawler exceeded 40 requests per second, and at its peak it issued 47 requests in a single second.
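
To double-check the daily total and the average rate from the shell (this is just a sketch against the same log file; bc redoes the division above):

grep -c Googlebot my-really-nice-website.com.20041029
echo "scale=2; 250000/86400" | bc

The second command prints 2.89, matching the back-of-the-envelope figure.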

Note: this IP definitely belongs to Google. There are no new links pointing to my website and nothing new has happened there for quite some time; nothing I know of would justify such a rage.
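
A reverse lookup is a quick way to check an IP like this (Google's crawler addresses typically resolve to names under googlebot.com; the exact output depends on your resolver):

host 66.249.66.205
205.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-205.googlebot.com.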

Today everything is calm again; in fact, Googlebot has completely vanished from today's crawler list.
This never happened in the past, and I guess it won't happen again soon. Any similar experiences?

pipster2004

6:17 pm on Nov 3, 2004 (gmt 0)

Yup...

I think we hit about 20-30 requests a second on some rather stressful areas of our site, and it has seriously affected overall site performance...

So far 160k pages today!

Certainly shows up the "rather dubious" ASP code of the site...(memory munch in full effect!)

Been frantically trying to streamline code all weekend!

It's large-scale war out here....

surfgatinho

7:38 pm on Nov 7, 2004 (gmt 0)

I have never really tracked the Google bots before, but since the beginning of this month something has been absolutely caning my bandwidth. I'm up to nearly 5GB on a site that gets around 500 UVs a day.
The total bandwidth for the whole of last month was 4GB.

Yesterday I got an email from my hosting company warning me that my mod_rewrite rules in the .htaccess file were using 30% of the CPU.
On further investigation, the rule causing the problem dealt with a session ID.

All I can conclude is that Googlebot has been indexing the same pages with different session IDs and causing havoc.
The reason I think it's the Gbot is that the prime user of bandwidth is in the 66.249.65.#*$! range.

The site is a Postnuke site using mod_rewrite for friendly URLs. It's a very common setup, so I'm wondering if anyone else is experiencing similar problems?
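
One quick way to confirm the theory from the logs is to count how many distinct session IDs Googlebot has requested (a rough sketch, assuming the session shows up as a PHPSESSID parameter; adjust the pattern to whatever your rewrite rule actually matches):

grep Googlebot access_log | grep -o 'PHPSESSID=[a-z0-9]*' | sort -u | wc -l

If that number runs into the thousands, the bot is fetching the same pages over and over under different session IDs.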

ogletree

8:00 pm on Nov 7, 2004 (gmt 0)

I'll tell you what: I put 1,500 pages up on Monday and they were live and getting hits by Friday. It was an established site. I'm glad Gbot is active.

sasha

8:02 pm on Nov 7, 2004 (gmt 0)

I have 'friendly' URLs for ColdFusion, but without mod_rewrite. I had 100,000 hits in one day.

Still waiting to see any of the spidered pages in the index.

uioreanu

7:36 pm on Nov 8, 2004 (gmt 0)

Maybe Google, via Googlebot, is teaching fellow webmasters a lesson!

Think about this a little; I wouldn't mind having only fast websites as Google results. Old rule: any website page should load in under 8 seconds, and 8 seconds is already an eternity. New rule: how about 1 second? How about a tenth of a second? Think about how much time you would save over a lifetime if every website you visited obeyed this rule.
