Did you submit a sitemap, get a crazy number of new inbound links, add new 301/302 redirects, create an accidental spider trap, or anything else? I've seen the speed of Googlebot fluctuate historically, but an increase of this size is unusual.
No sitemap, and no increase in backlinks that I know of.
I'm taking a closer look at the second site, which is on another host and covers a totally different subject. For example, a Google IP ending in .1, which I assume is one of the Googlebots, visited 10 times, grabbed over 27,000 files (I don't think there are 27,000 files on the site) and chewed through a little over half a gig of bandwidth. Another one with an IP ending in .168 visited 6 times, grabbed 13,000 files and used a third of a gig of bandwidth.
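For anyone who wants to pull the same per-IP numbers out of their own logs, here is a rough sketch. It assumes Apache's common/combined log format; the sample lines and the 192.0.2.x addresses are made up for illustration, and `tally_by_ip` is just a name I picked.

```python
import re
from collections import defaultdict

# Assumes Apache common/combined log format:
# IP identity user [time] "request" status bytes ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-)')

def tally_by_ip(lines):
    """Return {ip: [request_count, total_bytes]} for lines that parse."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip anything that is not a recognizable log line
        ip, size = match.groups()
        totals[ip][0] += 1
        # Apache writes "-" when no body was sent (e.g. a 304 response)
        totals[ip][1] += 0 if size == "-" else int(size)
    return dict(totals)

# Made-up sample lines, only to show the shape of the input
sample = [
    '192.0.2.1 - - [01/Jan/2000:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 2048 "-" "Googlebot/2.1"',
    '192.0.2.1 - - [01/Jan/2000:10:00:01 +0000] "GET /b.html HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
    '192.0.2.168 - - [01/Jan/2000:10:00:02 +0000] "GET /a.html HTTP/1.1" 304 - "-" "Googlebot/2.1"',
    'not a log line',
]

for ip, (hits, size) in sorted(tally_by_ip(sample).items()):
    print(f"{ip}: {hits} requests, {size} bytes")
```

On a real site you would pass in the open log file instead of `sample`, and probably filter on the user agent as well, since this version counts every client.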
I have some other sites that are similar and I have not seen the same behavior on them yet.
Some common factors
- Both are registered with the same registrar
- Both are running phpbb
- Both have an Apache mod_rewrite rule that changes the URLs in phpbb to more search engine friendly URLs (I forget all the details since it's been a while since I did that)
- Both are running adsense
Anyway, the curve on the hits seems to be a little bell shaped. I'm hoping it's declining a bit.
I'll keep watching WebmasterWorld to see if anyone else is running into this.
Whole thread on the subject.
My advice is msg 87.
Months later, Google started pounding the site again. Again, I emailed them telling them to behave or be banned; and that time I asked for compensation for the squandered bandwidth.
They didn't pay anything, but they have been well behaved since.
Read this thread before you consider contacting Google on this matter. IMO bandwidth is cheap; invisibility is expensive.
jomaxx - I agree. I'm reluctant to rattle any cages because I want the googlebot there. However I want it to behave a little.
The problem is definitely with the phpbb area. It just came in and made about 100 requests for discussions/viewtops.11.01.11.html plus a session ID. There is no such thing as discussions/viewtops that I'm aware of. There is a discussion/viewtopics.
I read about some mods that will remove the session IDs for the googlebot but there are some issues with them.
I have a couple of choices now
1. A nice email to google asking them to take a look at this
2. Modify robots.txt to ban googlebot from the forums. They were just starting to get traffic, but there are only a hundred messages or less on the entire board. It's not worth it, since it seems the googlebot is kind of locked up.
3. Apply the sessions mod to phpbb.
4. Let it keep running and hope it quits before I have to mortgage the house to pay for the bandwidth.
I have been trying to tweak the robots.txt. I'm going to add a disallow on the discussions/viewtops. I'm hoping that won't hurt anything.
Lose the session ID. It's generating an infinite number of URLs for Google to spider.
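To see why the session ID multiplies URLs, here is a small sketch that strips it and shows several crawled URLs collapsing into one page. It assumes the session ID travels in a `sid` query parameter; the URLs and the helper name `strip_session_id` are only for illustration, and this is not the phpbb patch itself.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_session_id(url, param="sid"):
    """Return the URL with the session parameter removed (assumed name: sid)."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Illustrative URLs: same topic, different session IDs
urls = [
    "http://example.com/discussions/viewtopic.php?t=11&sid=aaa111",
    "http://example.com/discussions/viewtopic.php?t=11&sid=bbb222",
    "http://example.com/discussions/viewtopic.php?t=11",
]

unique_pages = {strip_session_id(u) for u in urls}
print(len(urls), "crawled URLs,", len(unique_pages), "actual page")
```

Every new session hands the spider a fresh set of links to the same pages, so the URL count grows without bound while the real page count stays put.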
If there's not much indexed yet, you might also want to use the robots.txt file to bar spiders from crawling the non-rewritten URLs.
I'm not running any kind of forum at all. As far as PHP goes, the only thing I am using is a few includes. Also, almost all the pages end in ".php" (the same as last month, when Google hits were much lower).
The number of hits from Google's bot is up to 37,364 and 1.2 gigs of bandwidth, and that is just for this month so far! If the trend continues, I will have used over 2 gigs of bandwidth, just for this, by the time the month is over.
Also, I'm not using any session IDs.
I generally add several pages to the site every week. But I have done the same in previous months.
I am going to check my google alerts for inbound links to see if I can match any date to what has happened but I doubt that is the answer.
Thanks - I just patched the phpbb code on three sites. That should take care of things. So far it looks good.
Just an update to my original post..
I checked out a third site that was running phpbb, and it was getting killed too. I had tried a couple of different phpbb mods to stop the session ID for googlebot, but they don't appear to be working.
Since most of the traffic to all three sites comes from the regular content pages and not the phpbb boards, I decided to block googlebot from the discussions via robots.txt for the time being.
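If you go the robots.txt route, it is worth checking the rule before you upload it. A quick sketch using Python's standard `urllib.robotparser`; the `/discussions/` path and the example.com URLs are assumptions based on the paths mentioned earlier in the thread, so substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: keep Googlebot out of the forum directory only
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /discussions/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

forum_url = "http://example.com/discussions/viewtopic.php?t=11"
content_url = "http://example.com/articles/page.php"

print(parser.can_fetch("Googlebot", forum_url))    # forum is blocked for Googlebot
print(parser.can_fetch("Googlebot", content_url))  # regular content stays crawlable
print(parser.can_fetch("OtherBot", forum_url))     # other bots are not affected by this rule
```

This only tells you how a standards-following parser reads the file. It says nothing about how quickly a crawler will pick up the change.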
If you are running phpbb you might want to take a look at your logs. I found that the entry and exit pages had all gone from the regular content to the discussion pages. That was the effect of the googlebot hammering.
Again, I realize that banning the bot from the discussions will probably make them drop from the Google index, but it's better than running out of bandwidth on all the sites and losing all of the other search engines when the site disappears. Also, I think the bot activity was slowing the site down for regular visitors, causing a little bit of a traffic drop.
You might have luck with this code in your robots.txt file. I was having some similar issues with BecomeBot; once I added this code, it fixed it. GoogleBot should follow this command as well.
I don't think crawl delay will work with googlebot.
Everything I've read says that Googlebot does not honor the crawl delay statement.
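For reference, a crawl delay is just a per-agent line in robots.txt, and whether a crawler obeys it is entirely up to that crawler. Here is a sketch of how such a rule is declared and read back with Python's `urllib.robotparser`. The 10-second value is a made-up example, not the setting from the post above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: ask BecomeBot to wait 10 seconds between requests
ROBOTS_WITH_DELAY = """\
User-agent: BecomeBot
Crawl-delay: 10
"""

delay_parser = RobotFileParser()
delay_parser.parse(ROBOTS_WITH_DELAY.splitlines())

# The delay applies only to the agent it was declared for
print(delay_parser.crawl_delay("BecomeBot"))
print(delay_parser.crawl_delay("Googlebot"))  # None: no rule names this agent
```

A bot that ignores the directive will simply never look at that value, which is why it is no help against Googlebot.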
I think this behavior is indicative of a deep crawl going on right now. In the past when I have seen this activity, it seems that a dance was coming soon. I have seen other metrics on my sites that also suggest an index update might be on the horizon. Anyone else seeing signs of an update coming soon?
On my site I have a script that dynamically generates some funny/stupid content (recursion - see the recursion type of joke). URLs take the form example.com/something.cgi?cacbbabc and so on. Up to August 8th, Googlebot had not tried to spider the script. Since August 8th it has been crawling deeper and deeper, to the point where I was forced to modify the script to force an end to the crawling.
Perhaps the Gbot code was modified and it is able to crawl dynamically generated pages much better than before? IIRC phpbb forums using IDs in the URL were not spidered up to now (or at least were not always spidered - I know pretty old sites with high PR, no robots.txt, no nofollow tags, where phpbb is used and the forum is not indexed).
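One way to force an end to that kind of crawl is to stop emitting deeper links past a fixed depth. This is only a guess at the general idea, not the actual change made to the script: it assumes each character in the query string is one level of depth, and the names `MAX_DEPTH`, `ALPHABET` and `next_links` are hypothetical.

```python
MAX_DEPTH = 8      # hypothetical cutoff; choose whatever suits the content
ALPHABET = "abc"   # assumed from the example URL, which only uses a, b and c

def next_links(token, max_depth=MAX_DEPTH):
    """Return the deeper links to print on a page, or none once the cap is hit."""
    if len(token) >= max_depth:
        return []  # dead end: a spider has nothing further to follow
    return [f"/something.cgi?{token}{letter}" for letter in ALPHABET]

print(next_links("cac"))       # still below the cap, so three deeper links
print(next_links("cacbbabc"))  # at the cap, so no links at all
```

With a cap in place the number of reachable URLs is finite, so the crawl has to end eventually even if the bot follows every link.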