From the 8th on it has risen to 17,000 pages. (No, I haven't been Farked or Drudged as far as I know.) I haven't seen a corresponding increase in the AdSense page views either. So it's either the Googlebot, referrer spam or some type of hack.
I looked a bit further and found that the top 3 googlebot ips chewed up over a gig of bandwidth in about 3 days... This is on a site that averages only a quarter gig of bandwidth each month.
I have phpbb running on the site and it looks like that was the hardest hit. The strange thing is that there aren't a whole lot of posts since the board was down and I just put it back up.
I noticed similar behavior on another site of mine where I had just added a whole bunch of content. That site did not have phpbb.
Don't get me wrong, I'm happy to see the googlebot because it usually means that the pages are getting indexed. However, this type of bandwidth will start shutting sites down or cost the owners for bandwidth overages.
I'd appreciate any thoughts...
I'm taking a closer look at the second site, which is on another host and covers a totally different subject. For example, a Google IP ending in .1, which I assume is one of the Googlebots, visited 10 times, grabbed over 27,000 files (I don't think there are 27,000 files on the site) and chewed up a little over half a gig of bandwidth. Another one with an IP ending in .168 visited 6 times, grabbed 13,000 files and used a third of a gig of bandwidth.
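For anyone who wants to run the same per-IP check on their own logs, here is a rough sketch in Python. It assumes the Apache "combined" log format and picks out Googlebot by its user-agent string, which can be spoofed, so treat the numbers as a first pass. The sample lines, the 192.0.2.x addresses and the function name `tally_googlebot` are made up for illustration.

```python
import re
from collections import defaultdict

# Apache "combined" format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"'
)

def tally_googlebot(lines):
    """Return {ip: [hits, bytes]} for requests whose user agent mentions Googlebot."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, _status, size, agent = match.groups()
        if "googlebot" not in agent.lower():
            continue
        totals[ip][0] += 1
        totals[ip][1] += 0 if size == "-" else int(size)  # "-" means no body sent
    return dict(totals)

# Hypothetical sample lines, not taken from any real log.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /a.html HTTP/1.1" 200 20000 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.1 - - [10/Oct/2005:13:55:37 -0700] "GET /b.html HTTP/1.1" 200 10000 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.168 - - [10/Oct/2005:13:55:38 -0700] "GET /c.html HTTP/1.1" 304 - "-" "{GOOGLEBOT_UA}"',
    '192.0.2.50 - - [10/Oct/2005:13:55:39 -0700] "GET /a.html HTTP/1.1" 200 20000 "-" "Mozilla/4.0"',
]

totals = tally_googlebot(sample)
for ip, (hits, nbytes) in sorted(totals.items(), key=lambda kv: -kv[1][1]):
    print(f"{ip}: {hits} hits, {nbytes / 1e6:.2f} MB")
```

Feeding it `open("access.log")` instead of the sample list should work the same way, as long as the log really is in combined format.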
I have some other sites that are similar and I have not seen the same behavior on them yet.
Some common factors
- Both are registered with the same registrar
- Both are running phpbb
- Both have an apache mod rewrite that changes the urls in phpbb to a more search engine friendly url (I forget all the details since it's been a while since I did that)
- Both are running adsense
Anyway, the curve on the hits seems to be a little bell shaped. I'm hoping it's declining a bit.
I'll keep watching WebmasterWorld to see if anyone else is running into this.
My advice is msg 87.
Months later, Google started pounding the site again. Again, I emailed them telling them to behave or be banned; and that time I asked for compensation for the squandered bandwidth.
They didn't pay anything, but they have been well behaved since.
The problem is definitely with the phpbb area. It just came in and made about 100 requests for discussions/viewtops.11.01.11.html plus a session ID. There is no such thing as discussions/viewtops that I'm aware of. There is a discussion/viewtopics.
I read about some mods that will remove the session IDs for the googlebot but there are some issues with them.
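To see why the session IDs matter, here is a small Python sketch, assuming the session ID shows up as a `sid` query parameter on the rewritten URLs. It is not one of the phpbb mods, just an illustration of what they aim for: every new session ID makes the same page look like a new URL to the bot. The helper name `strip_sid` and the sample session values are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_sid(url, param="sid"):
    """Return the URL with the session-ID query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# Hypothetical requests: the same page fetched under three different session IDs.
requests = [
    "/discussions/viewtops.11.01.11.html?sid=aaa111",
    "/discussions/viewtops.11.01.11.html?sid=bbb222",
    "/discussions/viewtops.11.01.11.html?sid=ccc333",
]

unique_raw = len(set(requests))
unique_clean = len({strip_sid(url) for url in requests})
print(f"{unique_raw} distinct URLs as crawled, {unique_clean} after dropping sid")
```

Running request paths from a log through `strip_sid` and counting the duplicates gives a rough idea of how much of the crawl is the same pages fetched over and over.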
I have a couple of choices now
1. A nice email to google asking them to take a look at this
2. Modify robots.txt to ban googlebot from the forums. They were just starting to get traffic but there are only a hundred messages or less in the entire board. It's not worth it since it seems the googlebot is kind of locked up.
3. Apply the sessions mod to phpbb.
4. Let it keep running and hope it quits before I have to mortgage the house to pay for the bandwidth.
I have been trying to tweak the robots.txt. I'm going to add a disallow on the discussions/viewtops. I'm hoping that won't hurt anything.
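One way to check that a rule like this won't hurt anything before uploading it is to run it through Python's standard `urllib.robotparser`. This is only a sketch: it assumes the forum lives at `/discussions/` off the site root, the test paths are made up, and Python's parser does plain prefix matching, which may not match Googlebot's behavior in every detail.

```python
from urllib.robotparser import RobotFileParser

# Candidate rule (assumed path layout): block only the bogus viewtops URLs.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /discussions/viewtops
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="Googlebot"):
    """True if the given agent may fetch the path under ROBOTS_TXT."""
    return parser.can_fetch(agent, path)

# Hypothetical paths to try against the rule.
for path in (
    "/discussions/viewtops.11.01.11.html?sid=abc123",
    "/discussions/viewtopics.5.html",
    "/index.html",
):
    print(path, "->", "allowed" if allowed(path) else "blocked")
```

Because the match is by prefix, `/discussions/viewtops` does not catch `/discussions/viewtopics`, so the real topic pages stay crawlable. Changing the rule to `Disallow: /discussions/` would block the whole forum instead.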
The number of hits from Google's bot is up to 37,364 and 1.2 gigs of bandwidth, and that is just for this month so far! If the trend continues I will have used over 2 gigs of bandwidth, just for this, by the time the month is over.
Also I'm not using any session IDs
I generally add several pages to the site every week. But I have done the same in previous months.
I am going to check my google alerts for inbound links to see if I can match any date to what has happened but I doubt that is the answer.
I checked out a third site that was running phpbb and this was getting killed also. I had tried a couple of different phpbb mods to stop the session id for googlebot but they don't appear to be working.
Since most of the traffic on all three sites comes from the regular content pages and not the phpbb boards, I decided to block googlebot from the discussions via robots.txt for the time being.
If you are running phpbb you might want to take a look at your logs. I found that the entry and exit pages had all gone from the regular content to the discussion pages. That was the effect of the googlebot hammering.
Again, I realize that banning the bot from the discussions will probably make them drop from the Google index, but it's better than running out of bandwidth on all the sites and losing all of the other search engines when the site disappears. Also, I think the bot activity was slowing the site down for regular visitors, causing a little bit of a traffic drop.
Perhaps the Gbot code was modified and it is able to crawl dynamically generated pages much better than before? IIRC phpbb forums using IDs in the URL were not spidered up to now (or at least were not always spidered; I know pretty old sites with high PR, no robots.txt, no nofollow tags, where phpbb is used and the forum is not indexed).