
Google SEO News and Discussion Forum

    
Heavy GoogleBot Attack?
Amazing bandwidth devoured on 2 sites
cmendla
msg:766321
4:49 am on Aug 14, 2005 (gmt 0)

I got an admin email from a site that I have simmering on the back burner. The site gets about 100 visitors/day and uses about 5 MB of bandwidth/day.

From the 8th on, it has risen to 17,000 pages. (No, I haven't been Farked or Drudged, as far as I know.) I haven't seen a corresponding increase in AdSense page views either. So it's either Googlebot, referrer spam, or some type of hack.

I looked a bit further and found that the top 3 Googlebot IPs chewed up over a gig of bandwidth in about 3 days. This is on a site that averages only a quarter gig of bandwidth each month.
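If you want to reproduce this kind of per-IP tally from raw logs, here is a minimal Python sketch that sums response bytes per client IP for requests whose user agent mentions Googlebot. It assumes Apache's combined log format; the sample lines and IPs (from documentation address ranges) are made up, and keep in mind that a user-agent string on its own can be spoofed.

```python
import re
from collections import Counter

# Apache combined format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-) "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_bytes_by_ip(lines):
    """Sum response bytes per client IP for requests whose user agent mentions Googlebot."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        size = m.group("bytes")  # "-" means no body was sent
        totals[m.group("ip")] += 0 if size == "-" else int(size)
    return totals

UA_BOT = '"Googlebot/2.1 (+http://www.google.com/bot.html)"'
UA_IE = '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
sample = [
    f'192.0.2.1 - - [14/Aug/2005:04:00:00 +0000] "GET /index.html HTTP/1.1" 200 15000 "-" {UA_BOT}',
    f'192.0.2.1 - - [14/Aug/2005:04:00:05 +0000] "GET /forum/index.php HTTP/1.1" 200 25000 "-" {UA_BOT}',
    f'198.51.100.7 - - [14/Aug/2005:04:01:00 +0000] "GET /index.html HTTP/1.1" 200 15000 "-" {UA_IE}',
]

# Top 3 Googlebot IPs by bytes served; for a real log, pass an open file
# (e.g. a hypothetical open("access_log")) instead of the sample list.
for ip, total in googlebot_bytes_by_ip(sample).most_common(3):
    print(f"{ip}: {total / 1e6:.2f} MB")
```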

I have phpBB running on the site, and it looks like that was the hardest hit. The strange thing is that there aren't a whole lot of posts, since the board was down and I just put it back up.

I noticed similar behavior on another site that I had just added a whole bunch of content to. That site did not have phpBB.

Don't get me wrong, I'm happy to see Googlebot, because it usually means that the pages are getting indexed. However, this much bandwidth will start shutting sites down or cost the owners bandwidth overage fees.

I'd appreciate any thoughts...

thanks

 

goodroi
msg:766322
2:29 pm on Aug 15, 2005 (gmt 0)

Did you submit a sitemap, get a crazy number of new inbound links, add new 301/302 redirects, create an accidental spider trap, or change anything else? I've seen Googlebot's crawl rate fluctuate historically, but an increase of this size is unusual.

cmendla
msg:766323
10:31 pm on Aug 15, 2005 (gmt 0)

No site map, and no increase in backlinks that I know of.

I'm taking a closer look at the second site, which is on another host and covers a totally different subject. For example, a Google IP ending in .1, which I assume is one of the Googlebots, visited 10 times, grabbed over 27,000 files (I don't think there are 27,000 files on the site), and chewed up a little over half a gig of bandwidth. Another one, with an IP ending in .168, visited 6 times, grabbed 13,000 files, and used a third of a gig of bandwidth.
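Rather than assuming an IP belongs to Googlebot, you can check it with a reverse DNS lookup followed by a forward lookup that must resolve back to the same IP. Below is a minimal Python sketch of that double check using only the standard `socket` module. The accepted hostname suffixes are an assumption (confirm them against Google's current documentation), the sketch is IPv4-only, and the lookups need network access.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot machines; confirm against
# Google's current documentation before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(host: str) -> bool:
    """True if host (case-insensitive, trailing dot ignored) ends in a Google suffix."""
    return host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)

def is_real_googlebot(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve the name and make
    sure it maps back to the same IPv4 address (needs network access)."""
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not is_google_hostname(host):
            return False
        _name, _aliases, forward_ips = socket.gethostbyname_ex(host)
    except OSError:  # socket.herror / socket.gaierror: no record or lookup failure
        return False
    return ip in forward_ips
```

Running `is_real_googlebot(ip)` over the top IPs from your log separates real Googlebot traffic from anything merely claiming the name. The forward lookup matters because whoever controls an IP's reverse DNS can make it return any hostname.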

I have some other sites that are similar and I have not seen the same behavior on them yet.

Some common factors:
- Both are registered with the same registrar.
- Both are running phpBB.
- Both have an Apache mod_rewrite setup that changes the phpBB URLs into more search-engine-friendly URLs (I forget the details, since it's been a while since I did that).
- Both are running AdSense.

Anyway, the curve on the hits seems a little bell-shaped. I'm hoping it's declining a bit.

I'll keep watching WebmasterWorld to see if anyone else is running into this.

thanks

C

victor
msg:766324
6:03 am on Aug 16, 2005 (gmt 0)

Whole thread on the subject.

[webmasterworld.com...]

My advice is in msg 87 of that thread.

Months later, Google started pounding the site again. Again, I emailed them, telling them to behave or be banned, and this time I asked for compensation for the squandered bandwidth.

They didn't pay anything, but they have been well behaved since.

jomaxx
msg:766325
8:27 am on Aug 16, 2005 (gmt 0)

Read this thread before you consider contacting Google on this matter. IMO bandwidth is cheap; invisibility is expensive.

[webmasterworld.com ]

cmendla
msg:766326
10:20 pm on Aug 17, 2005 (gmt 0)

jomaxx, I agree. I'm reluctant to rattle any cages because I want Googlebot there. However, I want it to behave a little better.

The problem is definitely in the phpBB area. Googlebot just came in and made about 100 requests for discussions/viewtops.11.01.11.html, each with a session ID appended. There is no such thing as discussions/viewtops that I'm aware of; there is a discussion/viewtopics.

I've read about some mods that remove the session IDs for Googlebot, but there are some issues with them.
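One way a mod like that can work is simply not to append the session ID when the visitor looks like a crawler, so bots always see the same stable URL. Here is a minimal Python sketch of that idea. It assumes the session ID travels in a query parameter named `sid` (the name phpBB uses), and the list of crawler user-agent substrings is hypothetical. This is an illustration, not the code of any particular phpBB mod.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical substrings used to recognise crawlers; extend for the bots in your logs.
CRAWLER_MARKERS = ("googlebot", "slurp", "msnbot")

def is_crawler(user_agent: str) -> bool:
    """Case-insensitive check for any known crawler marker in the user agent."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)

def link_with_session(url: str, session_id: str, user_agent: str) -> str:
    """Append sid=... for ordinary visitors, but give crawlers the bare, stable URL."""
    if is_crawler(user_agent):
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sid", session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(link_with_session("/viewtopic.php?t=11", "abc123", "Googlebot/2.1 (+http://www.google.com/bot.html)"))
print(link_with_session("/viewtopic.php?t=11", "abc123", "Mozilla/4.0 (compatible; MSIE 6.0)"))
```

The trade-off is that crawlers then get pages without a session, which is usually fine for read-only forum pages but means anything that depends on the session will behave as if the visitor were a logged-out guest.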

I have a few choices now:

1. Send a nice email to Google asking them to take a look at this.

2. Modify robots.txt to ban Googlebot from the forums. They were just starting to get traffic, but there are only a hundred messages or fewer in the entire board. It's not worth the bandwidth, since Googlebot seems to be stuck in a loop there.

3. Apply the sessions mod to phpBB.

4. Let it keep running and hope it quits before I have to mortgage the house to pay for the bandwidth.

I have been trying to tweak the robots.txt. I'm going to add a Disallow for discussions/viewtops; I'm hoping that won't hurt anything.
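Before uploading a change like this, it's worth checking what the rules actually block. Python's standard `urllib.robotparser` module can evaluate a robots.txt against sample URLs. The rules and `example.com` URLs below are illustrative only, and Python's parser uses simple prefix matching, which is not necessarily identical to how any given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: keep Googlebot out of the bogus "viewtops" URLs only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /discussions/viewtops
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://example.com/discussions/viewtops.11.01.11.html?sid=abc123",
    "http://example.com/discussion/viewtopic.php?t=11",
    "http://example.com/index.html",
):
    # can_fetch() answers: may this user agent crawl this URL under the parsed rules?
    print(parser.can_fetch("Googlebot", url), url)
```

Because a Disallow value is a path prefix, `/discussions/viewtops` also covers every `viewtops...` variant with any session ID attached, while leaving the real topic pages crawlable.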

Thanks
c

jomaxx
msg:766327
1:44 am on Aug 18, 2005 (gmt 0)

Lose the session ID. It's generating an infinite number of URLs for Google to spider.

If there's not much indexed yet, you might also want to use the robots.txt file to bar spiders from crawling the non-rewritten URLs.
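To see why a session ID is so costly, here is a small Python sketch that normalizes crawled URLs by dropping the session parameter (assumed here to be named `sid`) and counts how many distinct pages remain. The sample URLs are invented: one topic page handed out under four different session IDs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAMS = {"sid"}  # assumed name(s) of the session-ID query parameter

def strip_session(url: str) -> str:
    """Return url with any session-ID parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

crawled = [f"http://example.com/forum/viewtopic.php?t=11&sid={s}" for s in ("a1", "b2", "c3", "d4")]
print(len(set(crawled)), "URLs as the crawler sees them")
print(len({strip_session(u) for u in crawled}), "actual page(s)")
```

Every new session multiplies the apparent URL count while the real page count stays fixed, which is how a board with a hundred messages can look like tens of thousands of pages to a spider.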

kentuckyslone
msg:766328
2:24 am on Aug 18, 2005 (gmt 0)

I'm not running any kind of forum at all. As far as PHP goes, the only things I'm using are a few includes. Also, almost all the pages end in ".php" (the same as last month, when the Google hits were much lower).

The number of hits from Google's bot is up to 37,364, using 1.2 GB of bandwidth, and that is just for this month so far! If the trend continues, I will have used over 2 GB of bandwidth on this alone by the time the month is over.
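The end-of-month estimate is a straight-line extrapolation. A quick Python check, assuming roughly 17 full days of August had passed when this was written (the post is timestamped early on Aug 18):

```python
def project_month(used_so_far: float, days_elapsed: float, days_in_month: int = 31) -> float:
    """Linear projection: average daily usage so far times the length of the month."""
    return used_so_far / days_elapsed * days_in_month

# 1.2 GB after ~17 days of August (assumed day count)
print(f"{project_month(1.2, 17):.2f} GB projected for August")
```

That works out to roughly 2.2 GB, consistent with "over 2 GB"; a straight line is of course optimistic if the crawl rate keeps climbing.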

Also, I'm not using any session IDs.

I generally add several pages to the site every week. But I have done the same in previous months.

I am going to check my Google Alerts for inbound links to see if I can match any date to what has happened, but I doubt that is the answer.

cmendla
msg:766329
2:42 am on Aug 18, 2005 (gmt 0)

Jomaxx

Thanks. I just patched the phpBB code on three sites. That should take care of things. So far it looks good.

Thanks Again
c

cmendla
msg:766330
10:36 pm on Aug 18, 2005 (gmt 0)

Just an update to my original post.

I checked out a third site that was running phpBB, and it was getting killed too. I had tried a couple of different phpBB mods to suppress the session ID for Googlebot, but they don't appear to be working.

Since most of the traffic on all three sites comes from the regular content pages and not the phpBB boards, I decided to block Googlebot from the discussions via robots.txt for the time being.

If you are running phpBB, you might want to take a look at your logs. I found that the entry and exit pages had all shifted from the regular content to the discussion pages. That was the effect of the Googlebot hammering.
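One way to spot that shift in the logs is to count Googlebot's requests per top-level directory. Here is a minimal Python sketch, assuming Apache's combined log format and a forum living under `/discussions/` (both illustrative); the sample lines are invented.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of an Apache combined-format log line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_hits_by_section(lines):
    """Count Googlebot requests by first path segment, e.g. '/discussions'."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        segments = m.group("path").split("?", 1)[0].split("/")
        counts["/" + (segments[1] if len(segments) > 1 else "")] += 1
    return counts

UA_BOT = '"Googlebot/2.1 (+http://www.google.com/bot.html)"'
UA_IE = '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
sample = [
    f'192.0.2.1 - - [18/Aug/2005:10:00:00 +0000] "GET /discussions/viewtopic.php?t=5&sid=a1 HTTP/1.1" 200 20000 "-" {UA_BOT}',
    f'192.0.2.1 - - [18/Aug/2005:10:00:02 +0000] "GET /discussions/viewtopic.php?t=5&sid=b2 HTTP/1.1" 200 20000 "-" {UA_BOT}',
    f'192.0.2.1 - - [18/Aug/2005:10:00:04 +0000] "GET /articles/page1.html HTTP/1.1" 200 12000 "-" {UA_BOT}',
    f'198.51.100.7 - - [18/Aug/2005:10:01:00 +0000] "GET /discussions/index.php HTTP/1.1" 200 8000 "-" {UA_IE}',
]

for section, hits in googlebot_hits_by_section(sample).most_common():
    print(section, hits)
```

If the forum section suddenly dwarfs the content sections, the bot is probably chasing session-ID variants rather than new pages.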

Again, I realize that banning the bot from the discussions will probably make them drop out of the Google index, but that's better than running out of bandwidth on all the sites and losing all the other search engines too when the sites disappear. Also, I think the bot activity was slowing the sites down for regular visitors, causing a small drop in traffic.

cg

Big_Gig
msg:766331
11:54 pm on Aug 18, 2005 (gmt 0)

Hey guys,

You might have luck with this code in your robots.txt file. I was having some similar issues with BecomeBot, and once I added this code, it fixed them. Googlebot should follow this directive as well.


User-agent: BecomeBot
Crawl-Delay: 30
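Python's standard `urllib.robotparser` (3.6+) can show which user agent a `Crawl-delay` line applies to. In this sketch, built from the two robots.txt lines above, the delay is reported for BecomeBot only; asking about Googlebot returns `None` because no record names it. Whether a crawler actually honors the directive is up to the crawler, not the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: BecomeBot
Crawl-Delay: 30
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("BecomeBot"))  # delay (seconds) requested of BecomeBot
print(parser.crawl_delay("Googlebot"))  # no record in this file applies to Googlebot
```

To address Googlebot at all, the file would need its own `User-agent: Googlebot` record (or a `User-agent: *` record).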

cmendla
msg:766332
11:34 am on Aug 19, 2005 (gmt 0)

Big_Gig

I don't think Crawl-delay will work with Googlebot.

Everything I've read says that Googlebot does not honor the Crawl-delay directive.

cg

Andy_r
msg:766333
12:34 am on Aug 21, 2005 (gmt 0)

I think this behavior is indicative of a deep crawl going on right now. In the past when I have seen this activity, it seems that a dance was coming soon. I have seen other metrics on my sites that also suggest an index update might be on the horizon. Anyone else seeing signs of an update coming soon?

Borek
msg:766334
1:00 pm on Aug 21, 2005 (gmt 0)

On my site I have a script that dynamically generates some funny/stupid content (recursion, as in the "recursion: see recursion" joke). URLs take the form example.com/something.cgi?cacbbabc and so on. Up to August 8th, Googlebot had not tried to spider the script. Since August 8th it has been crawling deeper and deeper, to the point that I was forced to modify the script to put an end to the crawling.
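A self-referencing script like this is a classic spider trap: every page links to more pages, so the URL space never ends. One way to force an end is to stop emitting child links once the query string reaches a maximum length. The Python sketch below shows this with a hypothetical three-letter alphabet and depth cap (it is not Borek's actual script); the simulated breadth-first crawl then visits a finite number of URLs.

```python
from collections import deque

ALPHABET = "abc"  # hypothetical: characters the script appends to its query string
MAX_DEPTH = 8     # hypothetical cap on query-string length

def links_for(query: str, max_depth: int = MAX_DEPTH) -> list[str]:
    """Child links that the page at ?<query> would emit; none once the cap is reached."""
    if len(query) >= max_depth:
        return []
    return [query + ch for ch in ALPHABET]

def crawl_size(max_depth: int) -> int:
    """Breadth-first 'crawl' from the empty query string, counting distinct URLs reached."""
    seen = {""}
    queue = deque([""])
    while queue:
        query = queue.popleft()
        for child in links_for(query, max_depth):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen)

# Each extra level multiplies the URL count by len(ALPHABET); without a cap it never ends.
for depth in (2, 4, MAX_DEPTH):
    print(depth, crawl_size(depth))
```

In a real CGI script the equivalent would be omitting the links (or returning a 404) for query strings past the cap; a robots.txt Disallow on the script path is the blunter alternative.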

Perhaps the Googlebot code was modified and it can now crawl dynamically generated pages much better than before? IIRC, phpBB forums using IDs in the URL were not spidered until now (or at least were not always spidered; I know of fairly old sites with high PR, no robots.txt, and no nofollow tags where phpBB is used and the forum is not indexed).

Best,
Borek

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved