

The Google Translate Dilemma


web_wheeler

11:52 pm on Dec 27, 2007 (gmt 0)

10+ Year Member



In my opinion, the new Google Translate has the potential to be the second most important application Google has to offer, next to its own search engine, because it could literally make information accessible in any language.

Many will complain that the Google translation is barely legible. However, I feel that legibility is a problem Google will solve in due time, because of the way it has designed its new translation facility. But that is not the purpose of this post.

Unfortunately, a number of issues significantly limit Google Translate's potential, not the least of which is that it can be misused as a proxy server to anonymously visit websites with questionable or illegal content. But that is also not the purpose of this post.

The purpose of my post is to point out a dilemma in the current implementation of Google Translate when it is used to internationalize a website by offering visitors foreign-language translations via the "Translate a Web Page" option. The dilemma is that all search engines, including Google, only index content that appears on the website, so how is a foreigner ever going to find your "international" website content if it does not appear in any SERP?

The alternative, to get listed in foreign-language SERPs, is of course to host your website content in foreign languages. However, this must be accomplished by copy text, paste into Google Translate, translate, copy translated text, paste translated text back into your website, because Google expressly forbids an automated interface to its Translate application in its TOS. For websites with a lot of dynamic content, this represents an enormous amount of vigilance and manual labor.

Your comments please!

lammert

2:38 am on Dec 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



(...) this must be accomplished by copy text, paste into Google Translate, translate, copy translated text, paste translated text back into your website (...)

I have used online and PC-based automatic translation software extensively, and although the results of some translation algorithms--including Google's--are quite impressive, they still do not come close enough to natural language to keep from scaring away foreign visitors after two sentences.

It is no problem when a visitor decides himself to use a translating service like Google's. He expects errors, wrong translations etc, but it was his choice to use such a service. If, however, the website owner decides to present such a machine-translated page to his visitors, hitting the back button right away will probably be the reaction. Don't underestimate the huge number of high-quality pages already available on many non-English websites. Most visitors would rather read those pages than some semi-automatically generated content.

Several times in the last month I have come across the type of site you describe when searching the internet: content originally written in English, now available in 20 or more languages, all copy-and-paste work from Google, Babelfish and others.

Personally, I see these sites as the new generation of search engine spam: 20 or more automatically generated pages per English source page, with the sole intent of feeding more keywords into the search engines in the hope of picking up some traffic.

MFT comes to mind, "Made From Translator", although MFA ("Made For AdSense") would be a fitting description in many cases, because AdSense or other advertising programs are often the reason these pages are made.

web_wheeler

8:44 am on Dec 28, 2007 (gmt 0)

10+ Year Member



It is no problem when a visitor decides himself to use a translating service like Google's. He expects errors, wrong translations etc, but it was his choice to use such a service.

Hmmm... I don't really get it... what is the choice? Isn't the choice between not understanding a single word of what is written and understanding so much that you can actually criticize the grammar?

Or do you feel that all web pages, including dynamic web pages such as this one, should only be translated by expert human translators, or not at all?

Or, are you sitting on the fence waiting for a better automated translation service to become available?

Or, do you feel this whole globalization thing has gone too far and we should just stick to our own language?

lammert

9:17 am on Dec 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmmm... I don't really get it... what is the choice?

The choice is finding excellent content written by native speakers in their own language, or finding your machine generated garbage.

There are many high quality content sites available on the internet in languages other than English. The SERPs in these languages are not so polluted by search engine spam as the English language SERPs. Searching in other languages often gives better results than in English. Why not keep it that way? ;)

[edited by: lammert at 9:28 am (utc) on Dec. 28, 2007]

phranque

9:19 am on Dec 28, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



it is unlikely that the information on your website is so important that people will jump through hoops to consume it.
it depends on the website, but at a minimum you need a first-language human rewrite of automatic translations.
the problems with automatic translation and language usage in context vary widely.
english in particular has a limited vocabulary and many words serve multiple meanings and often different parts of speech.
beyond that, in order to have an optimized or even effective site you may need in-country translators and/or domain experts to use the correct localized spelling, dialect, jargon, vernacular, terminology, etc...

lammert

9:25 am on Dec 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep, in-country translators are a good point. Many Dutch and French sites are translated by people from Belgium, because both Dutch and French are spoken in that country and many Belgians have (near-)native writing skills in both languages. But if you are from the Netherlands or France, you will notice the difference, and after two or three sentences you will know whether the content was written by someone from your own country or by a Belgian translator.

The same goes for people from Switzerland, Canada and other countries where more than one language is spoken. They may well have a thorough understanding of more than one language, but due to historical evolution, the language they know may no longer entirely match the language of the targeted country.

web_wheeler

10:32 am on Dec 28, 2007 (gmt 0)

10+ Year Member



It's all well and good to have in-country translators translate your static website, but perhaps you missed the second-to-last sentence of my first post?

For websites with a lot of dynamic content, this represents an enormous amount of vigilance and manual labor.

For dynamic websites, such as this forum, human translation becomes very impractical, even though there is much valuable information here.

The choice is finding excellent content written by native speakers in their own language, or finding your machine generated garbage.

Let me show you your choice...

It's either this:

A.

De keuze is aan het vinden van uitstekende inhoud geschreven door native speakers in hun eigen taal, of het vinden van uw computer gegenereerd afval.

or this:

B.

The choice is finding excellent content written by native speakers in their own language, or finding your computer-generated waste. (Translated into Dutch and back using Google Translate).

I don't read Dutch, so my choice would be B - the computer-generated waste!

HarryM

1:28 pm on Dec 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In your example you are translating from English to Dutch, which are related languages, so the results are quite close. As I don't speak Dutch I have no idea how the Dutch translation would be perceived by a local, but I doubt they would be impressed.

In the example sentence Google cannot handle "native speakers" and leaves the words untranslated. Checking for the words alone, AltaVista produces "inheemse sprekers" and Google produces "Moedertaalsprekers". But you really need to be a Dutch speaker to know which (if any) is current usage.

This subject has come up time and time again, and the answer is always the same: there is no substitute for human translation. There is so much quality material on the internet in all languages that any literate person would automatically hit the back button when presented with a machine translation.

[added] Another example of the problems encountered in machine translation is that the word "garbage" is an American colloquialism. The translation "afval" may or may not be the equivalent Dutch colloquialism, and only a native speaker would know that. [/added]

[edited by: HarryM at 1:41 pm (utc) on Dec. 28, 2007]

jtara

5:10 pm on Dec 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The dilemma is that all search engines, including Google, only index content that appears on the website, so how is a foreigner ever going to find your "international" website content if it does not appear in any SERP?

That's not a dilemma. That's a good thing. I don't want the SERPs returning poor translations of sites in other languages.

Google already offers foreign-language results, though I'm not sure which language the search is done in when you use them. (Does Google translate in order to match keywords?) If you have foreign-language results turned on, you can then use Google Translate or another service to translate the site if you don't read the language.

web_wheeler

8:58 pm on Dec 28, 2007 (gmt 0)

10+ Year Member



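Ik ben echt verbaasd over het feit dat mensen liever niet lezen dit bericht op alle, of worden gedwongen tot gebruik van een manuele techniek te vertalen. Voor dynamische websites, dit zal leiden tot veel werk, om niet te spreken van de bandbreedte en processor tijd die wordt verspild door opnieuw vertalen dit bericht elke keer iemand wil lezen! Maar als dat is wat de mensen willen doen, dan moet dat maar. Het gebruik van menselijke vertalers vertalen dynamische websites is zo onpraktisch dat het niet zal zelfs een optie. Dus, de keuze is aan jou ... Een computer-gegenereerde vertaling, of niet lezen op dit bericht. Maar, als u er voor kiezen om dit bericht te lezen, waarom zou je wilt vertalen het met de hand?

[English translation: I am really surprised that people would rather not read this message at all, or be forced to use a manual technique to translate it. For dynamic websites, this will lead to a lot of work, not to mention the bandwidth and processor time wasted by re-translating this message every time someone wants to read it! But if that is what people want to do, then so be it. Using human translators to translate dynamic websites is so impractical that it will not even be an option. So, the choice is yours... A computer-generated translation, or not reading this message. But if you choose to read this message, why would you want to translate it by hand?]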

lammert

9:24 pm on Dec 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ik ben echt verbaasd over het feit dat mensen liever niet lezen dit bericht op alle, ...

As a native Dutch speaker, I guess your original sentence was something like:

I am really surprised about the fact that people rather don't read this message at all, ...

which is understandable English. The first part of the sentence was translated quite well, but "at all" is a typically English phrase that has no direct word-for-word translation in Dutch. You could translate it as something like "in het geheel niet", which translates word-for-word back into English as "not in the whole". The translation "op alle" is closer to "on top of all", and the reader expects a noun to follow. After the first part of the Dutch translation he naturally stops reading and thinks: on top of what? Which word did the writer forget to insert? At that moment the reader's interest shifts from the content to figuring out which word should be inserted and, since our human brains are very flexible, the reader forgets why he opened the page in the first place.

Search engines have no flexible brains, and their interest in a page is not guided or distracted by grammar. The words "at" and "all" are just filler words and won't carry any significant meaning in the scanning process, which is (still) based mainly on words and word combinations rather than on the semantic interpretation of sentences. That's why these types of pages unfortunately find their way into the SERPs--they are easily consumed by the bots--but don't find many happy human consumers.

web_wheeler

1:18 am on Dec 29, 2007 (gmt 0)

10+ Year Member



Thanks for your insight, lammert! As a native Dutch speaker, you understood my post, even though it was the output of a machine translation. But, what if this post were in Arabic, or Russian, or Greek... what would you do then?

There is no question that a native human translator would provide a better translation, but then you would need the time, the money and an available human translator before you could understand or respond to the post. You wouldn't publish the Dutch travel guide to Amsterdam using Google Translate from Dutch to English, but if you simply wanted to go onto a local entertainment forum in Amsterdam and post "Kan iedereen aanbevelen leuke plaatsen om te bezoeken tijdens mijn vakantie naar Amsterdam?" (roughly: "Can anyone recommend nice places to visit during my vacation in Amsterdam?"), you could do so using Google Translate, and you could understand the replies without being able to read or write a word of Dutch.

To believe that human translators will ever translate such forum posts back and forth between Dutch and English, let alone the many other languages Google Translate supports, is complete nonsense. There is no question in my mind that a computer-based application such as Google Translate is an invaluable tool.

The point I was making in my first post is that the Google Translate tool, with its current TOS restrictions, is far less useful than it could be, both because of its restrictions on automated interfaces and because search engines will not index the translated content.