


Clickbots and other bad unknown crawlers

bad bot blocking issues

     
9:06 pm on Apr 22, 2016 (gmt 0)

Full Member

10+ Year Member

joined:Apr 26, 2009
posts: 286
votes: 6


I have been monitoring and blocking bad crawlers and clickbots for quite some time now (see the list below).
Recently I found some IPs that appear to fall within the official Googlebot IP range according to [chceme.info ], but nothing except the IP indicates that they are actual Googlebot addresses. The UA does not have the usual Googlebot/2.1 indicator.

Please share your thoughts!

Here they are:
66.249.88.161 - GET - HTTP/1.1 - Sunday, February 21st 2016 @ 12:22:03 - Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36
66.249.88.156 - GET - HTTP/1.1 - Sunday, February 21st 2016 @ 12:30:08 - Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36
66.249.88.151 - GET - HTTP/1.1 - Monday, February 22nd 2016 @ 10:08:49 - Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko


I am happy to share the entire list of blocked IPs, but it is too long to post here unless the WebmasterWorld moderators allow it.
9:42 pm on Apr 22, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Recently I have found some IP's which look like they are from the range of official Google Bot IP's
66.249.88.151, 66.249.88.156 & 66.249.88.161 are all Google Proxy, so the hits could be from anyone. Google has numerous ranges; only specific "crawl" ranges are allocated to the various Googlebots. The info source you cited is ambiguous.
1:06 am on Apr 23, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1991
votes: 74


@AlexB77, look at the reverse DNS entry for these IPs:

66.249.88.161 (google-proxy-66-249-88-161.google.com)
66.249.88.156 (google-proxy-66-249-88-156.google.com)
66.249.88.151 (google-proxy-66-249-88-151.google.com)

Those would be proxies, as keyplyr said.
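
For anyone who wants to automate that check, here is a minimal sketch in Python. It assumes the proxy hostnames follow the `google-proxy-` / `rate-limited-proxy-` patterns seen in this thread, and that real Googlebot resolves to a `crawl-...googlebot.com` name (that pattern is not from this thread, so treat it as an assumption and confirm it against Google's own documentation). The function names are made up for illustration.

```python
import socket
from typing import Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def classify_google_hostname(hostname: str) -> str:
    """Label a reverse DNS name by the naming patterns discussed above."""
    host = hostname.rstrip(".").lower()
    if host.endswith(".google.com"):
        if host.startswith("rate-limited-proxy-"):
            return "rate-limited-proxy"
        if host.startswith("google-proxy-"):
            return "proxy"
    # Assumed pattern for genuine Googlebot (not taken from this thread).
    if host.startswith("crawl-") and host.endswith(".googlebot.com"):
        return "googlebot"
    return "other"


if __name__ == "__main__":
    # Needs network access; the answer depends on live DNS.
    for ip in ("66.249.88.161", "66.249.88.156", "66.249.88.151"):
        name = reverse_dns(ip)
        print(ip, name, classify_google_hostname(name) if name else "no PTR")
```

A PTR record on its own can be spoofed by whoever controls the reverse zone, so a stricter check would also resolve the returned hostname forward and confirm it maps back to the same IP.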

They do fall within:
NetRange: 66.249.64.0 - 66.249.95.255
CIDR: 66.249.64.0/19
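
A quick way to confirm that the NetRange and the CIDR describe the same block, and that the three reported IPs sit inside it, is the standard-library `ipaddress` module. This is only a sketch; the variable names are illustrative.

```python
import ipaddress

# The block as listed in WHOIS.
net = ipaddress.ip_network("66.249.64.0/19")

# First and last address of the block, to compare with the NetRange line.
first, last = net[0], net[-1]
print(first, "-", last)

# Going the other way: collapse the NetRange into CIDR blocks.
cidrs = list(
    ipaddress.summarize_address_range(
        ipaddress.ip_address("66.249.64.0"),
        ipaddress.ip_address("66.249.95.255"),
    )
)
print(cidrs)

# The three IPs from the original post.
reported = ["66.249.88.161", "66.249.88.156", "66.249.88.151"]
inside = {ip: ipaddress.ip_address(ip) in net for ip in reported}
print(inside)
```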

That range has been used for crawling for who knows how long... but it also contains proxies, which I personally block and/or captcha on several sites I run, with no problems for a while now. Pretty much the same principle Google uses for TOR IPs.

There was also one thing that caught my attention recently: some hits from 66.249.91.0/24
example: 66.249.91.72 (rate-limited-proxy-66-249-91-72.google.com)

The requests were for site owner verification files and a "noexist_" variety of the same file, so just keep that in mind.

Oh, and there is a boatload of Google IPs listed here: [bgp.he.net...]. Look at the prefixes, both IPv4 and IPv6.
6:29 am on Apr 23, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15808
votes: 846


range of official Google Bot IP's

Careful. There are two neighboring Google ranges used for entirely different purposes.
66.249.64.0/20 (that is, 64-79) is Google Search. That's where you'll meet the bona fide Googlebot. But
66.249.80.0/20 (that is, 80-95) is an array of other Googloid functions-- Preview, Translate, Snippet, maybe favicon. You need to make individual decisions about how to handle these.
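
If you want to act on that split in a script, here is a rough sketch using Python's `ipaddress` module. The two /20s and what they are used for come from the description above, not from any official Google list, and the labels and function name are made up for illustration.

```python
import ipaddress

# Split as described above; treat the purposes as observation, not spec.
GOOGLE_SEARCH = ipaddress.ip_network("66.249.64.0/20")  # third octet 64-79
GOOGLE_OTHER = ipaddress.ip_network("66.249.80.0/20")   # third octet 80-95


def google_range_label(ip: str) -> str:
    """Say which of the two neighboring /20s an address falls in, if any."""
    addr = ipaddress.ip_address(ip)
    if addr in GOOGLE_SEARCH:
        return "search"
    if addr in GOOGLE_OTHER:
        return "other-google"
    return "outside"


if __name__ == "__main__":
    for ip in ("66.249.88.161", "66.249.91.72", "66.249.66.1", "8.8.8.8"):
        print(ip, google_range_label(ip))
```

An address landing in the "search" half still deserves a reverse DNS check before you trust it as Googlebot.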

:: idly wondering how Google manages to do all their crawling from a single /20 while That Other Search Engine sprawls over several /16s ::
6:58 am on Apr 23, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4458
votes: 331


:: idly wondering how Google manages to do all their crawling from a single /20 while That Other Search Engine sprawls over several /16s ::



It's the spaces...all those double spaces. ;)
7:45 am on Apr 23, 2016 (gmt 0)

Full Member

10+ Year Member

joined:Apr 26, 2009
posts: 286
votes: 6


Thank you all for your responses.
I do understand that those IPs are Google proxies, but due to their behaviour on our site they fell into a trap where clicks from Googlebot are allowed. So I will keep them blocked, as I do not see any potential benefit coming from them.

Also, it may be a bit ambiguous to say, but since I started blocking these clickbots that bring no benefit to my site, Google has dramatically reduced the amount of clawbacks in our AdSense revenue.
8:43 am on Apr 25, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Just an FYI: Google proxies are not bad per se. You just can't tell exactly who is behind them, but no more so than with Comcast or any other ISP. There are many legit companies and/or internet users on Google proxies. The broadband I use is a proxy, and I'm not a bad guy... really :)
 
