
Googlebot brings my server down every day :(

Tags: googlebot, DDoS, DoS attack
7:09 pm on Apr 7, 2007 (gmt 0)

New User

10+ Year Member

joined:Aug 4, 2004
posts:13
votes: 0


I have a dual Xeon with 4 GB of RAM, and I still have downtime problems. Further investigation showed that the culprit is Googlebot: it seems to grab the whole site at once, spawning a huge number of simultaneous httpd processes. From my terminal the load average can reach 50, and then I have to reboot the poor server.

I need this web site indexed, but I don't want Google to take the server down every day. What can I do? Are there special rules in robots.txt that I could use?

Thanks,
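
For what it's worth, the robots.txt lever for crawl speed is the non-standard Crawl-delay directive. A minimal sketch (the 10-second value is illustrative); note that Googlebot has generally ignored Crawl-delay, honouring the Webmaster Tools crawl-rate setting instead, so this mainly helps with other major bots:

    # Non-standard directive; honoured by e.g. Yahoo's Slurp and msnbot,
    # but ignored by Googlebot
    User-agent: *
    Crawl-delay: 10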

7:25 pm on Apr 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Here's what Google suggests:

Q: Googlebot is crawling my site too fast. What can I do?

A: Please contact us with the URL of your site and a detailed description of the problem. Please also include a portion of the weblog that shows Google accesses so we can track down the problem quickly.

Google Webmaster Support [google.com]

8:44 pm on Apr 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


calande,

A dual Xeon with 4 GB of RAM should be able to handle Google when it comes calling, unless you have scripts that use a lot of CPU or a process running that thrashes the cache.

8:56 pm on Apr 7, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


I would have thought your server would be able to handle it, too.

You could tell Google to crawl your site more slowly via Google Webmaster Tools (formerly Sitemaps).

Also, if you're using PHP, you could try a PHP cache such as XCache or APC.
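
A minimal sketch of the idea using APC's user cache (apc_fetch and apc_store are the real APC functions; the key scheme, the 300-second TTL, and build_page() are illustrative assumptions):

    <?php
    // Serve the page from APC's user cache when possible,
    // regenerating it at most once per TTL.
    $key  = 'page:' . $_SERVER['REQUEST_URI'];
    $html = apc_fetch($key);
    if ($html === false) {
        $html = build_page();           // hypothetical: your expensive page generation
        apc_store($key, $html, 300);    // cache the result for 5 minutes
    }
    echo $html;

Even a short TTL like that turns a crawler hammering the same URLs into one regeneration per interval instead of one per hit.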

9:15 pm on Apr 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 25, 2004
posts:2156
votes: 0


Hi,

Googlebot used to bring my main sites and Internet connection down repeatedly ~8 years ago.

1) I sent them an email, and they tweaked things to be less toxic. You can still do the same now AFAIK.

2) In their Webmaster services you can ask the bot to crawl more slowly than it otherwise would. I do that on one of my sites that is really only a fallback, for example.

3) On the grounds that NO one remote entity should be able to bring your site down casually, put in behaviour-based controls that throttle the traffic/load any one remote /24 (Class C) or similar set of addresses/hosts/users can impose (see the sketch after this list). This will save you from all sorts of other DoS grief too. (It won't help with DDoS, but not much will.)

4) Your system is more powerful than most of mine, and yet I survive the Googlebot plus lots of other less well-behaved bots/scrapers/idiots. How? Partly (3), and partly by tuning the site code to keep the cost of most operations down and caching the results of others. What is your normal page generation time: seconds or milliseconds? If seconds, then (a) you'll be irritating humans and (b) you won't keep up with most spiders' demands either.
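
One way to implement (3) on a Linux box is iptables' connlimit match, which can cap concurrent connections per source /24. A minimal sketch (the threshold of 30 is an illustrative assumption, and this caps concurrency rather than request rate):

    # Drop new HTTP connections from any /24 that already has
    # more than 30 connections open to us.
    iptables -A INPUT -p tcp --syn --dport 80 \
        -m connlimit --connlimit-above 30 --connlimit-mask 24 -j DROP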

Rgds

Damon

10:13 pm on Apr 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


What do you have, like a billion pages? With that server I'd think you'd be able to survive for quite some time, even under a prolonged attack.

10:37 pm on Apr 7, 2007 (gmt 0)

Junior Member

joined:Mar 15, 2007
posts:120
votes: 0


My advice is to sign up for Webmaster Tools, where you can select a slower crawl rate. But to be honest, if your hosting can't handle Googlebot, think what happens when you get a few visitors. Your hosting should be the priority to look at, not Google.

3:10 am on Apr 8, 2007 (gmt 0)

New User

5+ Year Member

joined:Jan 28, 2007
posts:30
votes: 0


Are you sure it's the Googlebot, and not some problem with your site's software or server?

We get nailed constantly by Google's robots, in a triple whammy: the indexer, the Google News bot (we're a news source), and the AdWords bot. And we're on a shared server at Pair.com with hundreds of other users. No problems at all.

If you're using PHP 3 or some really old, inefficient software, maybe that's it. Or try caching, such as jpcache.
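
If memory serves, jpcache is a drop-in full-page cache: you include it at the very top of a script, before any output, and it handles storing and serving the cached copy. A sketch of that pattern (the install path is an illustrative assumption; check the jpcache docs for your setup):

    <?php
    // Must run before any other output so jpcache can
    // capture and replay the whole page.
    require_once '/usr/local/lib/jpcache/jpcache.php';

    // ...normal page generation follows; jpcache serves the
    // cached copy on subsequent hits until it expires.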

9:57 pm on Apr 11, 2007 (gmt 0)

New User

10+ Year Member

joined:Aug 4, 2004
posts: 13
votes: 0


Thank you, guys. I asked Google to go lighter on my server, and it seems that solved the problem. I don't get these simultaneous deluges of Googlebot connections anymore.