How to Block Google Translate from Translating Our Sites?

Protecting Content Ripoff Via Google

     
4:03 am on May 22, 2008 (gmt 0)

martinibuster (WebmasterWorld Administrator)



Ok, maybe this is so obvious I can't see it. For instance, how to block visits from this:

[translate.google.ru...]
[translate.google.bg...]

I get the feeling there's potentially more harm than good going on there. How can I stop visits that arrive via those Google translations?

8:23 pm on May 22, 2008 (gmt 0)



The TLD isn't actually relevant, since this just gets turned into a language parameter on the destination page. So, a translation at google.ru turns into:

http://[google IP]/translate_c?hl=ru&sl=en&tl=ru&u=http://www.example.com/
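To make the parameters concrete, here is a minimal Python sketch that pulls them out of a URL of that shape. The function name `parse_translate_request` is made up for illustration, the 192.0.2.1 host is only a documentation placeholder for the Google IP, and the reading of `hl`, `sl` and `tl` as interface, source and target language is an assumption based on the example above.

```python
from urllib.parse import urlsplit, parse_qs


def parse_translate_request(url):
    """Return the language and target-page parameters from a translate_c URL."""
    query = parse_qs(urlsplit(url).query)

    def first(name):
        # parse_qs returns a list per key; take the first value, or None if absent.
        values = query.get(name)
        return values[0] if values else None

    return {
        "interface_lang": first("hl"),  # assumed meaning: interface language
        "source_lang": first("sl"),     # assumed meaning: language of the original page
        "target_lang": first("tl"),     # assumed meaning: language translated into
        "page": first("u"),             # the page being translated
    }


if __name__ == "__main__":
    # 192.0.2.1 is a placeholder address, not a real Google IP.
    example = "http://192.0.2.1/translate_c?hl=ru&sl=en&tl=ru&u=http://www.example.com/"
    print(parse_translate_request(example))
```

The point is only that the target language is carried in the query string, not in the hostname the visitor started from.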

Google do add the text "(via translate.google.com)" to the user-agent; however, this always says google.com regardless of the regional variation used. The requests will come from a Google IP, since they're essentially running a proxy. I don't know if there's a particular IP range allocated to their translate proxies.

So, you could easily block all translations, but not individual regional variations, going by the user-agent alone. I suppose you could check both the UA and the browser language to see if the request was from a country you wanted to block. For extra accuracy you could also check that the request came from a .google.com IP, but I'm not sure that would be necessary.
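A rough Python sketch of that combined check follows. The names `should_block`, `primary_languages` and `is_google_host` are hypothetical, and it assumes the proxy passes the visitor's Accept-Language header through unchanged, which I have not verified. Note that the header gives a language, not a country, so it is only an approximation of "where the visitor is from". The reverse-DNS helper assumes the proxy IPs resolve to a google.com hostname, which is also unconfirmed.

```python
import socket

# Marker quoted above; compared in lowercase so the match is case-insensitive.
TRANSLATE_MARKER = "via translate.google.com"


def primary_languages(accept_language):
    """Extract lowercase primary language tags from an Accept-Language header."""
    tags = []
    for part in (accept_language or "").split(","):
        tag = part.split(";")[0].strip().lower()  # drop any ;q= weight
        if tag and tag != "*":
            tags.append(tag.split("-")[0])  # "ru-RU" -> "ru"
    return tags


def is_google_host(ip, resolve=socket.gethostbyaddr):
    """Reverse-resolve ip and check for a google.com hostname (does a DNS lookup)."""
    try:
        hostname = resolve(ip)[0]
    except OSError:
        # No reverse record or lookup failure: treat as not Google.
        return False
    return hostname.lower().endswith(".google.com")


def should_block(user_agent, accept_language, blocked_languages):
    """True if the request came via Google Translate with a blocked browser language."""
    if TRANSLATE_MARKER not in (user_agent or "").lower():
        return False
    return any(lang in blocked_languages for lang in primary_languages(accept_language))
```

For example, `should_block(ua, "ru-RU,ru;q=0.8", {"ru"})` would flag a translated request from a Russian-language browser, while the same user-agent with an English-only Accept-Language would pass. The `resolve` parameter exists only so the DNS lookup can be swapped out when testing.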

11:59 pm on May 24, 2008 (gmt 0)

WebmasterWorld Administrator



Here is some simple code I use to detect a few common translation services that alter the User-Agent.

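// True when the User-Agent carries a marker added by a known
// translation or transcoding proxy.
public bool IsViaTranslate
{
    get
    {
        // A missing User-Agent cannot carry a marker.
        if (HasBlankUserAgent)
        {
            return false;
        }

        string userAgent = Headers["User-Agent"];

        // Markers for Google Translate, Babel Fish, and Google's mobile transcoder.
        return userAgent.IndexOf("via translate") > -1
            || userAgent.IndexOf("via babelfish") > -1
            || userAgent.IndexOf("Google Wireless Transcoder") > -1;
    }
}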
12:09 am on May 25, 2008 (gmt 0)

encyclo (WebmasterWorld Senior Member)



Would there be any risk involved in doing something like this?

# Return 403 Forbidden when the User-Agent carries one of these markers.
RewriteCond %{HTTP_USER_AGENT} via\ translate [NC,OR]
RewriteCond %{HTTP_USER_AGENT} via\ babelfish [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Google\ Wireless\ Transcoder [NC]
RewriteRule .* - [F]
4:47 am on May 25, 2008 (gmt 0)

WebmasterWorld Administrator



"Google Wireless Transcoder" translates pages for use on mobile devices. I just happen to lump it in with the translation services.

As for the other two, I usually return a 403 with no content. The sites I deal with cater only to US residents who speak English, so blocking these services does not hurt me at all. But I would be careful about doing this on other sites, depending on who their target audiences are and where they are located.
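For anyone not on .NET or Apache, the same "403 with no content" idea might look like this as a Python WSGI middleware. This is only a sketch: `BlockTranslators` and `is_via_translate` are illustrative names, and the marker strings are the three from earlier in this thread, matched case-insensitively.

```python
# User-Agent markers mentioned in this thread, lowercased for comparison.
MARKERS = ("via translate", "via babelfish", "google wireless transcoder")


def is_via_translate(user_agent):
    """True if the User-Agent contains one of the known proxy markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in MARKERS)


class BlockTranslators:
    """WSGI middleware that answers flagged requests with an empty 403."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_via_translate(environ.get("HTTP_USER_AGENT")):
            # No body at all, so there is nothing for the proxy to translate.
            start_response("403 Forbidden", [("Content-Length", "0")])
            return [b""]
        # Everything else goes through to the wrapped application untouched.
        return self.app(environ, start_response)
```

You would wrap an existing WSGI application with `BlockTranslators(app)`. Requests with a blank User-Agent are passed through here; whether to block those as well is a separate decision.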

Also, blocking these services closes off another way people can scrape data from a site. Reducing the ways someone can get to your content reduces your risk of being successfully scraped.

 
