5646 hits - 1498227Kb - 66.249.65.230
And that was over three days!
Yes, the site is dynamic, but the structure has been unchanged for the last six months. So why now?
The first time, I wrote to them requesting that they rein in their bot or be banned. They wrote back apologising.
The second time, they never replied.
Google's not been the only bad bot to hit the sites, so the sites are now protected by flood control... and any bot that runs wild gets an automatic 10-minute to 72-hour ban (depending on the degree of wildness), with no exceptions for bots like google.
Any pages they grab while banned contain nothing but a short flood-control note explaining that the bot which indexed the page is behaving badly. At any given time I can find several of my pages in several search engines by searching for the appropriate phrase from my flood-control text.
This is not, of course, cloaking -- any user who hits the F5 key enough times will start getting the flood-control page.
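For anyone wanting to roll their own, here's a minimal sketch of that kind of flood control, assuming a PHP site and a file-based hit log (the window, hit limit, and temp-file store are illustrative assumptions, not the actual settings described above):

<?php
// Flood-control sketch: count recent hits per IP and, once a client
// goes over the limit, serve nothing but a short note.
// Thresholds and the temp-file store are illustrative assumptions.
$ip     = $_SERVER['REMOTE_ADDR'];
$window = 60;   // look-back window in seconds
$limit  = 20;   // hits allowed per window before the ban kicks in
$store  = sys_get_temp_dir() . '/flood_' . md5($ip);

$now  = time();
$hits = array();
if (is_file($store)) {
    foreach (explode("\n", file_get_contents($store)) as $t) {
        if ((int)$t > $now - $window) {
            $hits[] = (int)$t;
        }
    }
}
$hits[] = $now;
file_put_contents($store, implode("\n", $hits));

if (count($hits) > $limit) {
    // Served as an ordinary 200 page, which is how the note can end up
    // in a search index; a 503 with Retry-After would keep it out while
    // still telling polite bots to back off.
    exit('Flood control: the client that fetched this page was requesting pages too fast.');
}
// ...normal page generation continues below...

A longer ban (the 10-minute to 72-hour range mentioned above) would just mean writing a ban-expiry timestamp to the same store and checking it before anything else.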
I do something similar with RSS feeds too: there are some badly behaved readers that ignore the <skipHours> tag and want to revisit multiple times an hour.
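For reference, <skipHours> is a standard RSS 2.0 channel element listing the GMT hours during which aggregators should not re-read the feed; a channel fragment might look like this (the hours shown are just an example):

<channel>
  <!-- ...title, link, description... -->
  <skipHours>
    <hour>0</hour>
    <hour>1</hour>
    <hour>2</hour>
    <hour>3</hour>
  </skipHours>
</channel>

Well-behaved readers honour it; the badly behaved ones are exactly what the flood control above catches.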
Contractor's got a good point. If your pages really are that large (detailed pictures, perhaps?), you might want to consider buying more bandwidth. For what it's worth, 1498227Kb over 5646 hits works out to roughly 265Kb per fetch, which is a hefty average. How many pages of this size do you have (or perhaps you've got one large page that is pushing you over...).
5646 hits - 1498227Kb - 66.249.65.230
And that was over three days!
To be honest, I don't see why this hit rate is overloading your site: 5646 fetches over three days works out to one page roughly every 45 seconds.
For my part, I'm quite happy when google is spidering my site heavily, because that's a sign that the site is popular.
The last time google overloaded my site (one year ago) at a rate of 30 pages/second, I decided to upgrade my hardware to a cluster (15 machines) to solve this problem.
Every visit from googlebot raises the chance of getting more visitors.
If it's a PHP site, have a look at this thread for a Content-Negotiation Class [webmasterworld.com]. It's easy to implement and will fix the above problems for good. Some support on the Class is also available on this and the following pages [webmasterworld.com].
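For those who can't follow the links, the core of that approach is honouring conditional GETs, so a revisiting bot that already has the current copy gets a 304 with no body at all, plus gzip on whatever you do send. A minimal sketch along those lines (the linked Class does more; the per-page timestamp here is an assumed stand-in):

<?php
// Conditional-GET sketch: if the client already holds the current copy,
// answer 304 Not Modified and send no body.
// filemtime(__FILE__) is an assumed stand-in for a real per-page timestamp.
$lastModified = filemtime(__FILE__);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $since = strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']);
    if ($since !== false && $since >= $lastModified) {
        header('HTTP/1.1 304 Not Modified');
        exit;
    }
}

// ob_gzhandler only compresses when the client advertises gzip support,
// so it is safe to enable unconditionally.
ob_start('ob_gzhandler');
// ...normal page output follows...

On a site being re-spidered heavily, the 304s alone can cut the transferred bytes dramatically, since an unchanged page costs a few header lines instead of the full body.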
last time google overloaded my site ... a rate of 30 pages/second ... upgrade my hardware to a cluster (15 machines) to solve this problem.
Any other scenario means that your site is mis-configured (see msg#11), and a few hours' work would have fixed the problem at no cost.
I'm quite happy when google is spidering my site heavily, because that's a sign that the site is popular