Banning Googlebot Spoofers?

keyplyr

7:30 pm on Jul 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you ban every IP address that spoofs as Googlebot?

My list is getting quite long, and although I am not noticing a hit in load time, I am wondering if all this is necessary.

My thinking is that if they are pretending to be something or someone other than what they really are, then they can't be trusted to be a friendly UA and should therefore be banned proactively.

wilderness

9:14 pm on Jul 18, 2007 (gmt 0)



keyplyr,
I have one question and a suggestion.

Are you finding that the majority of the ranges are from RIPE?

Rather than denying the spoofers, why not do the reverse in your rewrites and allow only the genuine Googlebots from specific ranges?

Don

keyplyr

11:53 pm on Jul 18, 2007 (gmt 0)



Thanks Don. Don't know about RIPE. Don't understand those divisions.

And I've never kept track of the various IP addresses that Googlebot uses. Would Googlebot-Image also use this range?

RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^66\.249\.[6-9][0-9]\.
RewriteRule !^forbidden\.html$ - [F]

Are there other ranges I need to include to allow authentic Googlebot(s)?
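
For anyone who wants to check which requests the rule above would turn away, here is a minimal Python sketch of the same logic. It is only an approximation of what mod_rewrite does, not a substitute for testing on the server: the function name is_forbidden and the sample user-agent strings and non-Google addresses are made up for illustration, and the path is assumed to be given without a leading slash, as it is in a per-directory .htaccess context.

```python
import re

# Same pattern as the RewriteCond on REMOTE_ADDR above.
# It admits 66.249.60.x through 66.249.99.x and nothing else.
GOOGLEBOT_RANGE = re.compile(r"^66\.249\.[6-9][0-9]\.")


def is_forbidden(user_agent: str, remote_addr: str, path: str = "index.html") -> bool:
    """Return True when the rule set would answer 403 for this request.

    Hypothetical helper for illustration only; it mirrors the three
    rewrite lines, not Apache's full request handling.
    """
    # RewriteRule !^forbidden\.html$ : the error page itself is never blocked.
    if path == "forbidden.html":
        return False
    # RewriteCond %{HTTP_USER_AGENT} Googlebot : case-sensitive substring match,
    # so "Googlebot-Image" is caught by the same condition.
    claims_googlebot = "Googlebot" in user_agent
    # RewriteCond %{REMOTE_ADDR} !^66\.249\.[6-9][0-9]\. : address outside the range.
    outside_range = GOOGLEBOT_RANGE.search(remote_addr) is None
    return claims_googlebot and outside_range


if __name__ == "__main__":
    samples = [
        ("Googlebot-Image/1.0", "66.249.66.244"),  # inside the range
        ("Googlebot/2.1", "203.0.113.7"),          # made-up spoofer address
        ("Mozilla/5.0", "203.0.113.7"),            # ordinary visitor, rule does not apply
    ]
    for ua, ip in samples:
        print(ua, ip, "forbidden" if is_forbidden(ua, ip) else "allowed")
```

Because the user-agent test is a plain substring match, any UA containing "Googlebot" (including Googlebot-Image) must come from the listed range to get through, while visitors that do not claim to be Googlebot are unaffected.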

wilderness

3:32 am on Jul 19, 2007 (gmt 0)



Thanks Don. Don't know about RIPE. Don't understand those divisions.

keyplyr,
RIPE is the regional registry for European countries.
That said, there are some rogue ranges in the North American ARIN allocations as well.

And I've never kept track of the various IP addresses that Googlebot uses. Would Googlebot-Image also use this range?

I had a problem with the google-image last year and have the following recorded:

66.249.66.244 - - [21/Oct/2006:11:48:30 -0700] "GET /robots.txt HTTP/1.1" 403 - "-" "Googlebot-Image/1.0"

Today I have that bot denied, as it has gone daft on two occasions and crawled my images even though they have been excluded in robots.txt forever.

RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^66\.249\.[6-9][0-9]\.
RewriteRule !^forbidden\.html$ - [F]

Are there other ranges I need to include to allow authentic Googlebot(s)?

Generally, I don't see rogue google bots, likely because I have most non-North American ranges denied access.

Our forum moderator keeps the major SE IP ranges rather current:
[iplists.com...]

The ranges in the Google examples are identified by the leading remarks.
So I believe the ranges that you're using for the actual SE bot are accurate.

Are some rogue spoofers getting through despite requirement of the "Googlebot" from the designated range?

Don

keyplyr

4:25 am on Jul 19, 2007 (gmt 0)



Thanks for the suggestion to just allow from authentic Google ranges. I had considered it, just needed a nudge.

Are some rogue spoofers getting through despite requirement of the "Googlebot" from the designated range?

Well, not yet... I just added that rewrite today. So far, so good. I'll keep a close watch.

While a good percentage of the spoofing comes from Eastern Europe, I do sell products to Europe and about a third of my traffic is worldwide so I can't just ban those ranges.

And thanks for the link: [iplists.com...] I had forgotten about that list.