
Website Technology Issues Forum

    
How to Block Google Translate from Translating Our Sites?
Protecting Content Ripoff Via Google
martinibuster




msg:3656014
 4:03 am on May 22, 2008 (gmt 0)

Ok, maybe this is so obvious I can't see it. For instance, how to block visits from this:

[translate.google.ru...]
[translate.google.bg...]

I get the feeling there's potentially more harm going on there than good. How to stop visits from those Google translations?

 

Receptional Andy




msg:3656777
 8:23 pm on May 22, 2008 (gmt 0)

The TLD isn't actually relevant, since this just gets turned into a language parameter on the destination page. So, a translation at google.ru turns into:

http://[google IP]/translate_c?hl=ru&sl=en&tl=ru&u=http://www.example.com/
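
To see how the regional domain collapses into plain query parameters, here is a minimal Python sketch that pulls the language pair and the original page URL out of a proxy request like the one above. The address 203.0.113.5 is a documentation placeholder standing in for "[google IP]", not a real Google address, and the function name is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def parse_translate_url(url):
    """Return the query parameters of a translate_c request as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first of each.
    return {key: values[0] for key, values in query.items()}

# 203.0.113.5 is a placeholder address, assumed for this example only.
example = "http://203.0.113.5/translate_c?hl=ru&sl=en&tl=ru&u=http://www.example.com/"
params = parse_translate_url(example)
print(params["sl"], "->", params["tl"], "for", params["u"])
```

Here `sl` and `tl` carry the source and target languages and `u` carries the page being translated, so the regional domain the visitor started from no longer appears anywhere in the request.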

Google do add the text "(via translate.google.com)" to the user-agent; however, this always says google.com regardless of the regional variation used. The requests will come from a Google IP, since they're essentially running a proxy. I don't know if there's a particular IP range allocated to their translate proxies.

So, you could easily block all translations, but not regional variations. I suppose you could check both the UA and the browser language to see if the request was from a country you wanted to block. For extra accuracy you could also check that the request came from a .google.com IP, but I'm not sure that would be necessary.
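
The UA-plus-language check described above could look something like this in Python. It is only a sketch: the function name, the set of blocked language codes, and the simplistic Accept-Language handling (first listed language only, q-values ignored) are all assumptions, and it takes for granted that the proxy passes the visitor's Accept-Language header through.

```python
TRANSLATE_MARKER = "via translate.google.com"

def should_block(user_agent, accept_language, blocked_languages):
    """Block only translated requests whose browser language is on the list."""
    if not user_agent or TRANSLATE_MARKER not in user_agent.lower():
        return False  # not a Google Translate proxy request
    if not accept_language:
        return False  # no language information, so let it through
    # Take the first language tag, e.g. "ru-RU,ru;q=0.9" -> "ru-ru".
    first_tag = accept_language.split(",")[0].split(";")[0].strip().lower()
    # Reduce the tag to its primary language, e.g. "ru-ru" -> "ru".
    primary = first_tag.split("-")[0]
    return primary in blocked_languages
```

Called as `should_block(ua, accept_language, {"ru", "bg"})`, this would refuse Russian and Bulgarian translations while leaving other translated visits alone. The reverse-DNS check on the requesting IP is left out here.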

Ocean10000




msg:3658418
 11:59 pm on May 24, 2008 (gmt 0)

Some simple code I use to detect some common translating services, which alter the User-Agent.

// Property from my own request class; Headers and HasBlankUserAgent
// are members of that class and are not shown here.
public bool IsViaTranslate
{
    get
    {
        if (HasBlankUserAgent == false)
        {
            if (Headers["User-Agent"].IndexOf("via translate") > -1)
            {
                return true;
            }
            if (Headers["User-Agent"].IndexOf("via babelfish") > -1)
            {
                return true;
            }
            if (Headers["User-Agent"].IndexOf("Google Wireless Transcoder") > -1)
            {
                return true;
            }
        }
        return false;
    }
}

encyclo




msg:3658425
 12:09 am on May 25, 2008 (gmt 0)

Would there be any risk involved in doing something like this?

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} via\ translate [NC,OR]
RewriteCond %{HTTP_USER_AGENT} via\ babelfish [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Google\ Wireless\ Transcoder [NC]
# Assumed completion: refuse matching requests with a 403
RewriteRule ^ - [F]

Ocean10000




msg:3658481
 4:47 am on May 25, 2008 (gmt 0)

"Google Wireless Transcoder" translates pages for use on mobile devices. I just happen to lump it in with translation services.

As for the other two, I usually return a 403 with no content. The sites I deal with cater only to US residents who speak English, so removing these services does not hurt me at all. But I would be careful about doing this on other sites, depending on who their target audiences are and where they are located.

Also, blocking these services closes off another way people can scrape data from a site. Reducing the ways someone can get to your content reduces your risk of being successfully scraped.
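
For anyone not on ASP.NET or Apache, the "403 with no content" approach can be sketched as a small Python WSGI wrapper. This is an illustration rather than what any poster here actually runs: the wrapper name is hypothetical, and unlike the C# property above it matches case-insensitively, the way the [NC] flag does in the rewrite conditions.

```python
# User-agent fragments taken from the snippets earlier in this thread.
TRANSLATION_MARKERS = ("via translate", "via babelfish", "google wireless transcoder")

def block_translators(app):
    """Wrap a WSGI app so translation-proxy requests get an empty 403."""
    def wrapper(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(marker in user_agent for marker in TRANSLATION_MARKERS):
            # Refuse the request and send no body at all.
            start_response("403 Forbidden", [("Content-Length", "0")])
            return [b""]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)
    return wrapper
```

Wrapping an existing application is then a one-liner, for example `application = block_translators(application)`. Requests with no User-Agent header pass through untouched, so a blank-UA policy would need its own check.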
