

How to integrate a new language into a website

         

Trueman

7:40 pm on Sep 25, 2012 (gmt 0)

10+ Year Member



Hi,

I'm currently implementing the translation of a SaaS web project from English to Spanish. A native speaker is translating the whole website.

I want to share my steps and start a discussion about best practices for integrating the new language easily, keeping the current rankings, and giving the translated pages a good start in the rankings.

Content:
I think it is important not to translate the content and the keywords word for word. Keywords may differ in other languages, so the translator needs to know something about SEO, such as keyword research. Even when readers speak the same language (e.g. Spanish), keywords may differ between Spain, South America and Spanish speakers in the US.

Structure design:
EN (my existing structure):
http://www.example.com
http://category1.example.com/widget1


ES (structure that will be added):
http://www.example.com/es
http://categoria1.example.com/es/trasto1


There are different approaches, but it's important to add the language code to some part of the URL. Translating the URL is of course also important. I also use each page's own URL as its canonical, e.g. on the Spanish page:
<link rel="canonical" href="http://categoria1.example.com/es/trasto1" /> 


Charset:
I will use UTF-8 for the source code and only 7-bit ASCII characters in the URLs, so no accents and the like. I prefer this for better compatibility with old browsers, and I don't like IDNs either.
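As a rough illustration of the ASCII-only URL rule, here is a minimal Python sketch that turns a translated title into a URL segment without accents. The function name `ascii_slug` and the use of hyphens as separators are assumptions made for the example, not something the post specifies.

```python
import re
import unicodedata


def ascii_slug(text: str) -> str:
    """Return a lowercase ASCII-only slug for use in a URL path (hypothetical helper)."""
    # Decompose accented letters, e.g. "ñ" becomes "n" plus a combining tilde.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop everything outside ASCII, which removes the combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of non-alphanumeric characters into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(ascii_slug("Categoría 1"))  # categoria-1
print(ascii_slug("Español"))      # espanol
```

Characters with no ASCII base letter (for example Cyrillic or Arabic script) are simply dropped by this approach, so it only suits languages written in the Latin alphabet.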

Code changes:
I have added these attributes and this meta tag to declare the language of the website:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="es" lang="es">
<meta name="content-language" content="es" />


I was thinking about using the rel="alternate" hreflang="x" tag:
[support.google.com...]

On the ES page (http://categoria1.example.com/es/trasto1):
<link rel="alternate" hreflang="en" href="http://category1.example.com/widget1" /> 


On the EN page (http://category1.example.com/widget1):
<link rel="alternate" hreflang="es" href="http://categoria1.example.com/es/trasto1" /> 


Maybe it's better to avoid this so that the content appears more unique.
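For reference, the reciprocal link elements above could be generated from one table of language versions rather than written by hand on each page. This is only a sketch in Python; the function name `hreflang_links` and the `versions` dictionary are hypothetical, and the URLs are the example ones from this post.

```python
from html import escape


def hreflang_links(versions: dict[str, str], current_lang: str) -> list[str]:
    """Build <link rel="alternate"> tags pointing at every other language version."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in versions.items()
        if lang != current_lang  # mirror the post: link only to the other versions
    ]


versions = {
    "en": "http://category1.example.com/widget1",
    "es": "http://categoria1.example.com/es/trasto1",
}

for tag in hreflang_links(versions, "es"):
    print(tag)
```

Keeping both directions in one table makes it harder for the EN and ES pages to drift out of sync when a URL changes.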

Usability:
I also added a drop-down on every page to easily select the language. When someone selects the translation, they are taken to the translated version of the same page, not to the homepage or somewhere else.

Multilingual versus localization:
The website will be multilingual (e.g. "en"), but not localized (e.g. "en-GB"). So I will not use flags in the drop-down to select a language; flags are for localization. People from different countries may speak the same language and still be in conflict with each other. The language name in the drop-down is shown in its native language on all EN and ES pages, and the title is the language name, translated into the language the user is currently viewing. On the EN and ES page it is:
<option title="Spanish" value="http://categoria1.example.com/es/trasto1">Español</option>

Automatic language detection:
I avoid this completely. Detection by IP or browser preferences may be inappropriate, and IP detection may cause problems with Googlebot. I guess people using a search engine in Spanish will get my Spanish translation. If not, they can easily select the language on my page.

Cookies:
In the future I might store the language code in a cookie, but only if the user has actively selected a language from my drop-down at least once. The cookie will then be read and the user 302-redirected to the previously selected language. This can be changed at any time by using the drop-down again.
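The cookie rule could look roughly like the following Python sketch. It is framework-agnostic and only decides whether to redirect; the names `SUPPORTED`, `cookie_redirect` and the `translations` mapping are assumptions for illustration, and how the cookie is read or set depends on the CMS.

```python
SUPPORTED = {"en", "es"}  # assumed set of language codes


def cookie_redirect(cookie_lang, page_lang, translations):
    """Return (302, url) when a stored choice differs from the page, else None.

    cookie_lang:  value of the language cookie, or None if the user never chose
    page_lang:    language of the page being requested
    translations: mapping of language code -> URL of this page in that language
    """
    # No cookie, an unknown value, or already on the right language: do nothing.
    if cookie_lang not in SUPPORTED or cookie_lang == page_lang:
        return None
    target = translations.get(cookie_lang)
    # Only redirect when a translated version of this page actually exists.
    return (302, target) if target else None


translations = {
    "en": "http://category1.example.com/widget1",
    "es": "http://categoria1.example.com/es/trasto1",
}
print(cookie_redirect("es", "en", translations))
print(cookie_redirect(None, "en", translations))
```

Because the cookie is only written after an explicit choice in the drop-down, first-time visitors and crawlers without cookies are never redirected.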

Translation:
I'm using a self-made CMS, so the translator has a frontend for entering the translation of the dynamic content. For the static strings (like "Terms", "Contact") I create a PO file, which the translator can translate with a tool such as poedit.

Things to investigate:
- Right-to-left languages
- Different design for different cultures (e.g. some colors may seem odd in some cultures)
- Online translation services (API integration)
- URL design for non-ASCII languages like Russian or Arabic.

tedster

4:18 am on Sep 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your approach looks pretty solid to me.

On the ES page (http://categoria1.example.com/es/trasto1):
<link rel="alternate" hreflang="en" href="http://category1.example.com/widget1" />


On the EN page (http://category1.example.com/widget1):
<link rel="alternate" hreflang="es" href="http://categoria1.example.com/es/trasto1" />


Maybe it's better to avoid this so that the content appears more unique.

Content that means the same thing but is translated into a different language is already unique. Duplication of content is about exact character matching, not matched "meanings". I'd say you're in good shape using those <link> elements.

[edited by: Robert_Charlton at 11:25 pm (utc) on Sep 27, 2012]
[edit reason] Delinked example urls [/edit]

deadsea

11:01 am on Sep 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I disagree somewhat about automatic language detection. I wouldn't recommend redirecting automatically based on it, but I would recommend showing a note such as "View this content in English" when somebody lands on the site and their browser preference indicates they use a different language. In my experience, users click a prominent link most of the time when I do language detection by browser preference.
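A minimal Python sketch of that idea: read the browser's Accept-Language header and decide which language, if any, to offer in a notice, without ever redirecting. The function names are made up for the example, and the parsing is simplified (it keeps only the primary language code).

```python
from typing import Optional


def preferred_languages(accept_language: str) -> list[str]:
    """Parse an Accept-Language header into primary language codes, best first."""
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Keep only the primary subtag: "es-MX" -> "es".
            weighted.append((-quality, position, tag.split("-")[0]))
    ordered = []
    for _, _, lang in sorted(weighted):
        if lang not in ordered:
            ordered.append(lang)
    return ordered


def suggested_language(
    accept_language: str, page_lang: str, available: set[str]
) -> Optional[str]:
    """Return a language to offer in a notice, or None. Never redirects."""
    for lang in preferred_languages(accept_language):
        if lang == page_lang:
            return None  # the visitor already reads this page's language
        if lang in available:
            return lang
    return None


print(suggested_language("es-MX,es;q=0.9,en;q=0.8", "en", {"en", "es"}))  # es
```

The page template would then render a link to the translated URL when the result is not None, and nothing otherwise.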

When I do that language detection, I don't need the drop down for users to switch languages. I just need a way for Googlebot to be able to find and crawl all the different languages. So I just use links in the footer to the homepage for that purpose.

As far as different direction languages go, I use the following html tags:
<html lang="en" dir="ltr">
<html lang="ar" dir="rtl">
Seems to work like a charm.
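If the opening tag is generated by a template, the direction can be derived from the language code. A small Python sketch, assuming a hand-maintained set of right-to-left languages (the set below covers only a few common ones, and the function name is hypothetical):

```python
# Assumed, non-exhaustive set: Arabic, Hebrew, Persian, Urdu.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def html_open_tag(lang: str) -> str:
    """Return the opening <html> tag with the matching text direction."""
    primary = lang.split("-")[0].lower()
    direction = "rtl" if primary in RTL_LANGUAGES else "ltr"
    return f'<html lang="{lang}" dir="{direction}">'


print(html_open_tag("en"))  # <html lang="en" dir="ltr">
print(html_open_tag("ar"))  # <html lang="ar" dir="rtl">
```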

I took the route of initial translation done by machine for my website. This allowed me to expand very quickly into 40 languages. To go along with this I created an interface for users to make corrections. When a user makes a correction, it gets stored in their session so that they can see it used live on the site right away. It also gets put in an approval queue so that I can look at it before making it available on the site to everybody. This has worked really well. Three of the languages have been completely corrected by users. I get about two users a week that correct on average 10 strings apiece. Most of the horrendous machine translations have been excised by now (a year after initial launch).
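The lookup order described here (the user's own corrections first, then approved corrections, then the machine translation) can be sketched in Python as follows. The class and method names are invented for the example, and the session is modeled as a plain dictionary rather than any particular framework's session object.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationStore:
    machine: dict = field(default_factory=dict)   # (lang, key) -> machine text
    approved: dict = field(default_factory=dict)  # (lang, key) -> reviewed text
    pending: list = field(default_factory=list)   # corrections awaiting review

    def submit(self, session: dict, lang: str, key: str, text: str) -> None:
        """Store a correction in the user's session and queue it for approval."""
        session[(lang, key)] = text
        self.pending.append((lang, key, text))

    def approve(self, index: int) -> None:
        """Publish a queued correction to all visitors."""
        lang, key, text = self.pending.pop(index)
        self.approved[(lang, key)] = text

    def lookup(self, session: dict, lang: str, key: str) -> str:
        """Prefer the session override, then approved text, then machine text."""
        for source in (session, self.approved, self.machine):
            if (lang, key) in source:
                return source[(lang, key)]
        return key  # fall back to the untranslated string


store = TranslationStore(machine={("es", "Contact"): "Contactar a"})
corrector, visitor = {}, {}
store.submit(corrector, "es", "Contact", "Contacto")
print(store.lookup(corrector, "es", "Contact"))  # Contacto
print(store.lookup(visitor, "es", "Contact"))    # Contactar a
store.approve(0)
print(store.lookup(visitor, "es", "Contact"))    # Contacto
```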

I would recommend NOT using the rel="alternate" hreflang links if you are going to have more than five languages. At that point they add a huge block of HTML to every page.

I chose ASCII-only URLs for my site. I ran into problems with non-ASCII URLs a few years ago. Browsers today seem to be much better about displaying non-ASCII characters. You'll run into non-ASCII issues even in Spanish if you put accents on the letters in your URLs. Post here if you find any problems with them. It might be time for me to internationalize my URLs as well.
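To see what an accented Spanish URL turns into on the wire, here is a short Python sketch using the standard library's percent-encoding. The path `/es/diseño` is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/es/diseño"    # hypothetical path containing one accented letter
encoded = quote(path)  # non-ASCII characters are UTF-8 encoded, then percent-escaped
print(encoded)         # /es/dise%C3%B1o
print(unquote(encoded) == path)  # True
```

Modern browsers usually display the readable form in the address bar, but the escaped form is what tends to show up in logs, copied links and older software.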