Blocking JAVA Bots

     
4:11 pm on Oct 6, 2006 (gmt 0)

New User

10+ Year Member

joined:Mar 24, 2005
posts:18
votes: 0


There is a closed thread here:

[webmasterworld.com...]

about blocking Java-based bad bots while still allowing the Google and Yahoo Java-based bots access. I just implemented the rules posted there, and they are blocking Google bots coming from newer IP addresses such as [64.233.172.35...]

The original code was created by jdMorgan. Have you updated the routine you created in 2005? I sure could use it if you have! Tonerman

4:21 pm on Oct 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Just add another RewriteCond line to allow ^64\.233\.172\.

That should take care of it.
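
As a rough illustration only (the original 2005 rule set lives in the closed thread and is not reproduced here), the added line might sit among the existing exemptions like this. The `^Java` user-agent pattern and the 403 response are assumptions about how that rule set is built; only the `^64\.233\.172\.` pattern comes from this thread.

```apache
# Sketch only: assumes a mod_rewrite block that forbids Java user-agents
# unless the request comes from an exempted address range.
RewriteEngine On
# Assumed trigger: User-Agent begins with "Java" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} ^Java [NC]
# ... existing exemption lines from the original rule set stay here ...
# New exemption for the range reported in this thread
RewriteCond %{REMOTE_ADDR} !^64\.233\.172\.
# All conditions matched: answer with 403 Forbidden
RewriteRule .* - [F]
```

RewriteCond lines are ANDed by default, so each negated `REMOTE_ADDR` pattern carves one more range out of the block.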

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim

12:24 am on Oct 7, 2006 (gmt 0)


Thank you for your help, jd. We implemented your change. Do you think there are any other IP addresses we need to allow?

Your code sure does kill the Java site scrapers! We haven't seen one in the logs since we turned it on this morning. Very kind of you to share it with others. Thanks, Tonerman

2:24 am on Oct 7, 2006 (gmt 0)

jdmorgan


> Do you think there are any other ip addresses we need to allow?

No, as a matter of fact, I appreciate you posting this new IP address range, because I was unaware of it. One effect of blocking common scrapers *seems* to be that the more you block, the fewer attempts your site is subjected to over time. I think there are target-site lists, and once you take action, they take you off the 'easy targets' list, probably because some of the hard targets report the scrapers' activities to the scrapers' hosting providers and ISPs... :)

Jim

3:51 pm on Oct 7, 2006 (gmt 0)


Jim,
Here are all the Google datacenter IP addresses:
216.239.37.104
216.239.39.104
216.239.53.104
216.239.57.104
216.239.59.104
216.239.63.104
64.233.161.104
64.233.167.104
64.233.171.104
64.233.179.104
64.233.183.104
64.233.185.104
64.233.187.104
64.233.189.104
66.102.11.104
66.102.7.104
66.102.9.104
66.249.89.104
66.249.93.104
72.14.207.104

Tom

10:24 pm on Oct 8, 2006 (gmt 0)


Jim:

There are 600 known Google data center IP addresses in total. Keeping only the first three octets of each address, the 600 IP addresses boil down to the following prefixes:

^64\.233\.161\.
^64\.233\.163\.
^64\.233\.167\.
^64\.233\.169\.
^64\.233\.171\.
^64\.233\.179\.
^64\.233\.183\.
^64\.233\.185\.
^64\.233\.187\.
^64\.233\.189\.
^66\.102\.1\.
^66\.102\.7\.
^66\.102\.9\.
^66\.102\.11\.
^66\.249\.81\.
^66\.249\.83\.
^66\.249\.85\.
^66\.249\.89\.
^66\.249\.91\.
^66\.249\.93\.
^72\.14\.203\.
^72\.14\.205\.
^72\.14\.207\.
^72\.14\.209\.
^72\.14\.211\.
^72\.14\.215\.
^72\.14\.217\.
^72\.14\.219\.
^72\.14\.221\.
^72\.14\.223\.
^72\.14\.235\.
^72\.14\.253\.
^216\.239\.37\.
^216\.239\.39\.
^216\.239\.51\.
^216\.239\.53\.
^216\.239\.57\.
^216\.239\.59\.
^216\.239\.63\.
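
If someone did want to exempt all of these ranges, the prefixes that share the same first two octets can be folded into alternations, so the rule needs five conditions rather than 39. This is a sketch under the same assumptions as the rule discussed earlier in the thread (a `^Java` user-agent test and a 403 response, neither of which is shown here); the grouping itself uses only the prefixes listed above.

```apache
# Sketch only: the User-Agent test and the [F] response are assumed,
# not taken from the original 2005 rule set.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Java [NC]
# One negated condition per /16, third octet given as an alternation
RewriteCond %{REMOTE_ADDR} !^64\.233\.(161|163|167|169|171|179|183|185|187|189)\.
RewriteCond %{REMOTE_ADDR} !^66\.102\.(1|7|9|11)\.
RewriteCond %{REMOTE_ADDR} !^66\.249\.(81|83|85|89|91|93)\.
RewriteCond %{REMOTE_ADDR} !^72\.14\.(203|205|207|209|211|215|217|219|221|223|235|253)\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.(37|39|51|53|57|59|63)\.
# Java user-agent from any other address: 403 Forbidden
RewriteRule .* - [F]
```

The trailing `\.` after each group matters: without it, `1` would also match the start of `11` or `100`.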

Tom

1:08 pm on Oct 9, 2006 (gmt 0)

jdmorgan


I was really just looking for the ones that make requests using the Java and/or Python UAs, but thanks for the lists.

Jim

 
