Forum Library, Charter, Moderators: phranque & physics

Webmaster General Forum

    
Hundreds of fake Googlebot hits from different IPs
Clinton




msg:4554445
 10:12 pm on Mar 13, 2013 (gmt 0)

Hi.

I run one of the hundreds of stupid "what is my IP" websites out there. For about the past month, I've been receiving about 500 hits per hour, from distinct IP addresses, using a miscapitalized Googlebot user-agent:

Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)

Many of these seem to be from Russia, but enough have unique reverse DNS entries to suggest that these IPs may be hosting content. I suspect this is a botnet of some sort.

Any ideas on how to proceed?

 

lucy24




msg:4554479
 11:19 pm on Mar 13, 2013 (gmt 0)

Are you asking about the mechanics of blocking them? Or about deeper issues that involve direct contact with the offending sites?

There are two rules that almost all sites should have. Exact wording depends on your server type-- I assumed Apache, but you don't say --and then your chosen method. Apache, for example, generally does it in mod_rewrite but you could also do it in mod_setenvif.

Essential Rule 1:
UA is "Googlebot" (case-sensitive)
IP is not 66.249 or other legitimate G IPs.

Essential Rule 2:
IP is bing (there are lots of them)
UA is not bingbot/msnbot (OR: UA is MSIE-anything)

Some of your offending robots can probably be blocked by IP alone. Others need a UA block. In this case it would be "googlebot". Don't know about IIS, but in Apache I believe everything is case-sensitive by default.

Be careful in wording your rule. "Googlebot" is Capitalized, but www dot google dot com is lower case-- and it's contained within the UA string.
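
For anyone who wants to see Essential Rule 1 spelled out, here is a minimal sketch in Python rather than in mod_rewrite. The function name is_fake_googlebot and the ASSUMED_GOOGLE_NETWORKS list are made up for illustration, and 66.249.0.0/16 is only a stand-in for "66.249 or other legitimate G IPs"; it is not a verified list of Google ranges. The match is deliberately case-insensitive so that the miscapitalized "googlebot" is caught too.

```python
import ipaddress
import re

# Assumption: placeholder for "66.249 or other legitimate G IPs".
# Replace with the ranges you actually trust.
ASSUMED_GOOGLE_NETWORKS = [ipaddress.ip_network("66.249.0.0/16")]

# Case-insensitive on purpose: catches "Googlebot" and "googlebot" alike.
# The URL "www.google.com/bot.html" does not contain this token.
GOOGLEBOT_TOKEN = re.compile(r"googlebot", re.IGNORECASE)


def is_fake_googlebot(user_agent: str, remote_ip: str) -> bool:
    """Return True when the UA claims Googlebot but the IP is outside the trusted ranges."""
    if not GOOGLEBOT_TOKEN.search(user_agent):
        return False  # does not claim to be Googlebot at all
    address = ipaddress.ip_address(remote_ip)
    return not any(address in network for network in ASSUMED_GOOGLE_NETWORKS)
```

With the UA from the first post and an address outside the assumed range, the function returns True; the same check with a properly capitalized UA from inside the range returns False.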

Clinton




msg:4554720
 10:59 am on Mar 14, 2013 (gmt 0)

I could trivially block based upon the mis-capitalized User-Agent... I'm more concerned with how to contact the sites using my service that may be compromised!

Any suggestions?

phranque




msg:4554725
 11:04 am on Mar 14, 2013 (gmt 0)

welcome to WebmasterWorld, Clinton!

I'm more concerned with how to contact the sites using my service that may be compromised

are these requests all referred from other sites or are they direct requests?

Clinton




msg:4554726
 11:09 am on Mar 14, 2013 (gmt 0)

Hi phranque,

These are direct requests. Quickly looking at the headers sent in the requests, the Referer header is not set. Neither are most other common HTTP headers set by normal clients (Accept, Accept-Language, Keep-Alive, etc).

These are almost certainly not coming from live humans or normal browsers.
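
A rough sketch of that header check, in Python. The names COMMON_BROWSER_HEADERS, missing_browser_headers and looks_automated are hypothetical, and the rule "every listed header is absent" is an assumption chosen for illustration, not a tested threshold. Referer is left out of the list because direct requests from real browsers often omit it.

```python
# Headers that ordinary browsers almost always send (taken from the list above).
COMMON_BROWSER_HEADERS = ("Accept", "Accept-Language")


def missing_browser_headers(headers: dict) -> list:
    """Return the common browser headers that are absent from a request."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    return [name for name in COMMON_BROWSER_HEADERS if name.lower() not in present]


def looks_automated(headers: dict) -> bool:
    """Assumed heuristic: flag the request only when every listed header is missing."""
    return len(missing_browser_headers(headers)) == len(COMMON_BROWSER_HEADERS)
```

A request carrying only Host and User-Agent would be flagged; one that also sends Accept and Accept-Language would not.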

phranque




msg:4554741
 11:52 am on Mar 14, 2013 (gmt 0)

in that case there isn't a "site" to contact.
you could do a whois on the IP address(es) and email the abuse contact for the ISP(s).

lucy24




msg:4554935
 9:16 pm on Mar 14, 2013 (gmt 0)

... but if the IP resolves to anything other than an established human ISP-- meaning that someone's running a bot off their home computer --don't hold your breath waiting for action. Just lock out the whole IP range. And hope you don't get too many from neighborhoods like 91. or 198.* where a /20 counts as a vast block.


* The latest trouble spot is 185. which had barely started being assigned when RIPE slapped down the /30 limit. Ugh.
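
Before locking out ranges (or writing to abuse contacts), it may help to see which networks the hits cluster in. A minimal Python sketch, assuming IPv4 addresses pulled from the access log; the function name busiest_networks, the /24 grouping and the min_hits cutoff are all arbitrary choices for illustration, and a /24 will not match real allocation boundaries.

```python
from collections import Counter
import ipaddress


def busiest_networks(ips, prefix_length=24, min_hits=2):
    """Group IPv4 addresses by network and return (network, hits) pairs, busiest first."""
    counts = Counter(
        # strict=False masks off the host bits, e.g. 198.51.100.77 -> 198.51.100.0/24
        ipaddress.ip_network(f"{ip}/{prefix_length}", strict=False)
        for ip in ips
    )
    return [(str(network), hits) for network, hits in counts.most_common() if hits >= min_hits]
```

The output is a short list of candidate ranges to review by hand; a whois lookup on each one is still needed to find the real range boundaries and the abuse contact.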

stucco




msg:4554955
 11:11 pm on Mar 14, 2013 (gmt 0)

Sounds like a botnet to me.

1. Contacting the 'abuse' address for the ISP is one route to take (may or may not get a response). May result in a few hosts getting fixed.

2. You could block their traffic. They would probably just find another service to do the same thing.
2b. They do use other services to do the same thing; you're just seeing some of the traffic.

3. You could return wrong addresses (127.0.0.1, 192.168.1.1, some internal DISA or MILNET reserved address, or cia.gov) -- that might cause some puzzlement for a bit, but same as #2.

Likely these IP responses (from your server) then get posted somewhere else (IRC, a p2p network, some other compromised VPS servers, etc). No single ISP is going to care about it enough to do much more than take care of their own hosts.

There may be some interest by Interpol or a military/intelligence anti-cyberterrorism organization (or they might be causing it). I would contact an organization like this... I remember some big botnet just got shut down in the past couple months, you could try whoever was involved with that.

Clinton




msg:4554957
 11:18 pm on Mar 14, 2013 (gmt 0)

Hi stucco,

Thanks for your insight.

1. Contacting the 'abuse' address for the ISP is one route to take (may or may not get a response). May result in a few hosts getting fixed.


1000 uniques over the past day. This won't scale!

There may be some interest by Interpol or a military/intelligence anti-cyberterrorism organization (or they might be causing it). I would contact an organization like this... I remember some big botnet just got shut down in the past couple months, you could try whoever was involved with that.


I've contacted https://www.team-cymru.org/ about this; they may be able to assist.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved