How to block googleusercontent.com

blocking google and googleusercontent in htaccess

         

rogerroger

3:47 am on Apr 5, 2014 (gmt 0)

10+ Year Member



I want to block googleusercontent.com, and Google overall, in my htaccess file. Is there a post or site that lists the IP ranges for the googleusercontent CDN and for Google's other bots and services, such as Google Translate, Google+ circles, and related ones?

I've tried blocking the hostname itself with:
Deny from *.googleusercontent.com

This is not working, so I think IP ranges will be the most reliable fix, but searching turns up very little. I have successfully blocked the AWS CDN by IP range, but I just can't find similar information for Google.

Thanks in advance.

keyplyr

5:50 am on Apr 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Only allow the Googlebot IP range, and block anything else that has "Google" or "google" anywhere in the User-Agent string:

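RewriteEngine On
# User-Agent contains "google" (NC = case-insensitive) ...
RewriteCond %{HTTP_USER_AGENT} Google [NC]
# ... and the request does NOT come from 66.249.60.x through 66.249.99.x
RewriteCond %{REMOTE_ADDR} !^66\.249\.[6-9][0-9]\.
# then answer 403 Forbidden for everything except robots.txt
RewriteRule !^robots\.txt$ - [F]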


Note: if you use a custom 403 page, exempt it in the last line too, otherwise the request for the error page itself gets blocked:

RewriteRule !^(custom-403-page\.html|robots\.txt)$ - [F]
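Put together, the whole thing might look like the sketch below. The ErrorDocument line is an assumption about how the custom page is wired up, and "custom-403-page.html" is only a placeholder filename; adjust both to your own setup.

```apache
# Assumption: the custom error page lives at the site root under this placeholder name.
ErrorDocument 403 /custom-403-page.html

RewriteEngine On
# User-Agent mentions Google (case-insensitive) ...
RewriteCond %{HTTP_USER_AGENT} Google [NC]
# ... but the client IP is outside the 66.249.6x-9x block
RewriteCond %{REMOTE_ADDR} !^66\.249\.[6-9][0-9]\.
# Forbid everything except the error page and robots.txt
RewriteRule !^(custom-403-page\.html|robots\.txt)$ - [F]
```

Without the exemption, the internal request for the error page would match the same conditions and be refused as well, so the visitor would get the server's default 403 response instead of your page.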

rogerroger

6:40 am on Apr 5, 2014 (gmt 0)

10+ Year Member



Thank you keyplyr... I will try this out and post the outcome.

lucy24

9:41 am on Apr 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Deny from *.googleusercontent.com

Urk, don't do that, it throws your whole server into non-numeric mode (it's got a technical term whose name refuses to stick in my memory). Simplest alternative is

BrowserMatch googleusercontent keep_out
(or bad_bot or label of your choice)

... except that it isn't normally a UA, is it? It's a referer, so that's where you block it.

rogerroger

3:20 am on Apr 6, 2014 (gmt 0)

10+ Year Member



Thanks for that, lucy24.

Since this is a referer, should I use BrowserMatch or "SetEnvIf"?

SetEnvIf Referer googleusercontent bad_referer
Deny from env=bad_referer

lucy24

9:59 am on Apr 6, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



should I use BrowserMatch or "SetEnvIf"?

They're not mutually exclusive; BrowserMatch is simply a shorthand within mod_setenvif. It's equivalent to

:: detour to look up, because I never use the long form ::

SetEnvIf User-Agent etcetera


But if you're not dealing with the user-agent then obviously neither form will do, and you have to go to

SetEnvIf Referer googleusercontent keep_out
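The SetEnvIf line only sets the variable; something still has to act on it. A minimal sketch of the complete block, assuming nothing else in the htaccess already sets up Order/Allow/Deny or Require rules (the IfModule split between Apache 2.2 and 2.4 syntax is my addition, not something discussed above):

```apache
# Flag any request whose Referer contains "googleusercontent"
SetEnvIf Referer googleusercontent keep_out

# Apache 2.4 style (mod_authz_core present)
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not env keep_out
    </RequireAll>
</IfModule>

# Apache 2.2 style (no mod_authz_core)
<IfModule !mod_authz_core.c>
    Order Allow,Deny
    Allow from all
    Deny from env=keep_out
</IfModule>
```

With Order Allow,Deny the Deny line is evaluated last, so a flagged request is refused even though "Allow from all" also matches it.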

keyplyr

5:19 pm on Apr 6, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




And if you are already using Rewrites, then stick with that and test the referer in a RewriteCond.
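A rough mod_rewrite equivalent of the referer block, as a sketch only (the robots.txt exemption is carried over from the earlier rule and is optional here):

```apache
RewriteEngine On
# Referer header contains "googleusercontent" (case-insensitive)
RewriteCond %{HTTP_REFERER} googleusercontent [NC]
# Answer 403 Forbidden for everything except robots.txt
RewriteRule !^robots\.txt$ - [F]
```

Keeping all the blocking in one module makes it easier to see which rule fired when something gets locked out by mistake.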

rogerroger

5:33 am on Apr 7, 2014 (gmt 0)

10+ Year Member



Thanks very much for the clear explanations. I'm going to try these options out in my htaccess and see how it goes.

As this is my first time posting here, I really appreciate the excellent feedback on my question. Before posting I did some searching and didn't find any clear direction. Thanks again!