For SEO purposes and to reflect local market preferences, I use local ccTLD domains for each language site. Make sure each site uses the proper character encoding so that it displays properly.
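A minimal sketch of what "proper character encoding" means in practice. It declares the same charset in the HTTP Content-Type header and in the page itself, then encodes the bytes with that charset. It is written in Python for illustration only. The helper name `html_response` and the page template are assumptions and do not come from any poster's setup.

```python
from html import escape
from typing import Dict, Tuple


def html_response(title: str, body: str, lang: str,
                  charset: str = "utf-8") -> Tuple[Dict[str, str], bytes]:
    """Build headers and body for one language version of a page (hypothetical helper)."""
    page = (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        "<head>\n"
        # Declare the encoding inside the document as well as in the HTTP header.
        f'<meta charset="{charset}">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body>{escape(body)}</body>\n"
        "</html>\n"
    )
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Encode with exactly the charset that the header announces.
    return headers, page.encode(charset)
```

The usual failure is a mismatch: the header says one charset, the meta tag says another, and the bytes are in a third. Keeping all three tied to one `charset` value avoids that.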
> implementing multiple languages
I have no idea what back end you're using, so that would be a tough one to answer.
There are some search engine duplicate content and index problems [webmasterworld.com] that might occur if you don't implement it correctly, but I have it working with good SE rankings for all language variants on a few domains.
For example, just because I'm surfing from a Japanese IP doesn't mean I want content in Japanese.
Have a look here [w3.org].
Indeed, it calls for experience and knowledge.
However, it can work, and it does if well implemented.
I live in one country, my computer is set to the language of another, my browser is set to my mother tongue and I can use three languages and get by in more.
phranque's point about user-changeable settings is the crucial one here (note the user rather than the site-owner).
Definitely a good read to get a sense of the issues.
Oh, and welcome to WebmasterWorld [webmasterworld.com], teamcoltra!
I don't like subdomains (they dilute the importance of the domain name), and ccTLDs are for countries, not languages. (Of course, if I had country-dependent websites, I would use the proper ccTLDs. If I couldn't afford them, I might use three-letter country codes in the URLs instead.)
Apache content negotiation uses the language configured by the user. By default, that is the language of the browser interface, which should generally be fine.
If it is not, the user will have to use the language link at the top of the home page, unless they have bookmarked the home page in their preferred language.
Internally, my directory structure mirrors the URLs ("./htdocs/{en,fr}/*"). These directories hold my static content. For dynamic content, I use mod_rewrite to check a cache directory first. If the page is not cached, the request is internally redirected to a PHP script that generates the page and caches it. For personalized content, I skip the cache and return the generated page directly.
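The lookup order described above (static file, then cache, then generate and store, with personalized pages bypassing the cache) can be sketched as follows. This is a Python model of the decision logic, not a mod_rewrite configuration or the poster's PHP. The function name `serve`, the `cache_dir` location and the `generate` callback are hypothetical.

```python
from pathlib import Path
from typing import Callable


def serve(url_path: str, htdocs: Path, cache_dir: Path,
          generate: Callable[[str], str], personalized: bool = False) -> str:
    """Return the page for a URL such as "/fr/news.html" (hypothetical model)."""
    relative = Path(url_path.lstrip("/"))
    # Refuse empty paths and any attempt to climb out of the document root.
    if not relative.parts or ".." in relative.parts:
        raise ValueError("unsupported path")
    # 1) Static content lives directly under htdocs/{en,fr}/...
    static_file = htdocs / relative
    if static_file.is_file():
        return static_file.read_text(encoding="utf-8")
    # 2) Personalized pages skip the cache and are generated on every request.
    if personalized:
        return generate(url_path)
    # 3) Otherwise check the cache first...
    cached = cache_dir / relative
    if cached.is_file():
        return cached.read_text(encoding="utf-8")
    # 4) ...and only generate (then store) the page on a cache miss.
    page = generate(url_path)
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(page, encoding="utf-8")
    return page
```

Because the cache mirrors the same `{en,fr}/...` layout as `htdocs`, a single relative path works against both trees.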
[edited by: Mathieu_Bonnet at 11:27 am (utc) on Mar. 30, 2008]
1.) We installed NGINX (an open-source, high-speed reverse proxy) on a Linux box in the country of the new language (.de).
2.) We "tagged" every piece of text in our 300 PHP pages with _translate("tag","This is the text"), which took over a week, and we run a dictionary where the translations are stored.
3.) We installed memcached, an in-memory cache that PHP can use, to hold the translations and avoid database access.
4.) The reverse proxy requests "de.someenglishdomain.com", which points to the SAME pages as www.; our PHP header sees the de. host and switches to German.
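Steps 2 to 4 boil down to two small pieces: pick the language from the host name, then look up tagged strings in a per-language dictionary that is loaded once and cached. The sketch below is in Python. A plain dict stands in for memcached (this is not the memcached client API). The names `HOST_LANGUAGES`, `language_for_host` and `Translator` are hypothetical. Falling back to the inline text when a tag is missing is an assumption about how `_translate` behaves.

```python
from typing import Callable, Dict

# Hypothetical host-prefix -> language table; "de." mirrors the setup above.
HOST_LANGUAGES: Dict[str, str] = {"de": "de", "www": "en"}
DEFAULT_LANGUAGE = "en"


def language_for_host(host: str) -> str:
    """Pick the language from the first label of the Host header (port ignored)."""
    label = host.split(":", 1)[0].split(".", 1)[0].lower()
    return HOST_LANGUAGES.get(label, DEFAULT_LANGUAGE)


class Translator:
    """Tag lookup with an in-process cache standing in for memcached."""

    def __init__(self, load_dictionary: Callable[[str], Dict[str, str]]):
        self._load = load_dictionary          # e.g. a database query per language
        self._cache: Dict[str, Dict[str, str]] = {}

    def translate(self, language: str, tag: str, default_text: str) -> str:
        # Hit the backing store only once per language.
        if language not in self._cache:
            self._cache[language] = self._load(language)
        # Missing tags fall back to the inline (original-language) text.
        return self._cache[language].get(tag, default_text)
```

Because the same page code serves both hosts, only `language_for_host` needs to know about the `de.` prefix. Every template then calls `translate` with whatever language that function returned.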
There are some other things to do, but that is the basic setup. It is based on the idea that local sites IN the country have a better chance of ranking than German pages hosted in the States. It looks like it is working fine!
P!
For SEO purposes and to reflect local market preferences
Bill, this is related to your first response.
What SEO advantages does a separate TLD have over a separate directory? Comparing www.example.com/fr with www.example.fr, wouldn't you be better off, cost-wise and SEO-wise, going with the former?
In addition, I am not sure that language necessarily implies a location.
Using the Spanish ccTLD (.es) for Spanish content might imply that it targets visitors in Spain rather than the rest of the Spanish-speaking population. By contrast, www.example.com/es could imply the same site content in Spanish, regardless of the visitor's location.
What SEO advantages does a separate tld have over using a separate directory?
(I'm straying away from our focus on language here, but language and localization are related...)
You will do much better in the country-specific SERPs if you:
1) use a ccTLD
2) host in target country
3) have target country whois
4) set the geographic target in Google Webmaster Central to the target country
The above can be costly and complicated to manage, but in my experience the benefits outweigh the costs.
#2-4 depend on #1. The (cheaper) alternative to #1 is to use subdomains, since each subdomain can resolve to a different IP and so be hosted in the target country. However, I believe this is much less effective than using ccTLDs.
You could implement content negotiation.
Don't do this. Not with languages.
For example, just because I'm surfing from a Japanese IP doesn't mean I want content in Japanese.
Quite. And just because I'm surfing from a UK IP doesn't mean I don't want the content in Japanese.
You can't second-guess which language users want their content in, so it's better not to try. Instead, give the user as much control as possible.
Is there any sort of industry consensus over which of these would be best practice for a Spanish translation of a page about Augusto Pinochet (former Chilean dictator) on a predominantly English language website published in the US:
1) www.mysite.es/myfolder/mypage.html
2) www.mysite.com/es/myfolder/mypage.html
3) es.mysite.com/myfolder/mypage.html
4) www.mysite.es/myfolder/mypage.php?lang=es&loc=us
5) Something else, possibly involving .cl (the ccTLD for Chile) ...?
I'm thinking either 2), hosted in the USA, or 3) hosted in Chile (if all the articles on the site are Chile related), or else 3) hosted in Spain.
If you look at the site in my profile and at how the languages have been handled, you will find an approach that has been paying off handsomely for the past six years. This includes links on every page to the corresponding page in all other languages. These sites rank very well in Google (typically top 5) for all of the most important keywords.
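Linking every page to its counterpart in each other language is straightforward once you keep a per-page map from language code to URL. A minimal Python sketch follows. The function name, the example URLs and the separator are hypothetical. The `hreflang` and `lang` attributes on `<a>` are standard HTML.

```python
from html import escape
from typing import Dict


def language_links(translations: Dict[str, str], names: Dict[str, str],
                   current: str) -> str:
    """Render links to the same page in every other available language."""
    items = []
    # Sorted by language code so the switcher is in a stable order on every page.
    for lang, url in sorted(translations.items()):
        if lang == current:
            continue  # no link to the page the visitor is already on
        items.append(
            f'<a href="{escape(url)}" hreflang="{escape(lang)}" '
            f'lang="{escape(lang)}">{escape(names[lang])}</a>'
        )
    return " | ".join(items)
```

Usage: `language_links({"en": "/en/widgets.html", "fr": "/fr/gadgets.html"}, {"en": "English", "fr": "Français"}, "en")` yields a single link to the French page. Showing each language name in that language (Français, Deutsch) lets visitors who cannot read the current page still find their own.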
<edit> Not to mention that if you buy a machine in any country, its language preference is set by default to that of the country of purchase.
Nevertheless, you may set it any way you please,
even with a first choice, a second choice, and so on. </edit>
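Those first and second choices reach the server as q-values in the Accept-Language header, for example `fr-CH, fr;q=0.9, en;q=0.8`. A minimal Python parser that turns the header into an ordered preference list is sketched below. The function name is hypothetical. Treating a malformed q-value as 0 ("not acceptable") is an assumption.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language ranges ordered by q-value, highest first."""
    weighted: List[Tuple[float, int, str]] = []
    for index, part in enumerate(header.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag:
            continue  # empty header or stray comma
        q = 1.0  # no q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: malformed weight counts as "not acceptable"
        if q > 0:  # q=0 explicitly means "not acceptable"
            # Negate q for descending order; the index keeps header order on ties.
            weighted.append((-q, index, tag))
    weighted.sort()
    return [tag for _, _, tag in weighted]
```

The result still contains ranges like `*` and regional tags such as `fr-ch`. Mapping them onto the languages a site actually offers is a separate step.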
example.com/en/good
example.com/fr/bon
Both with the same content but in their respective languages?
Would these be seen as duplicate content?
I'd have thought that, no matter where users are in the world, they would search in their preferred language and thus find the appropriate version of your site's pages.
I totally agree.
If you register a .de via GoDaddy (for example; I'm not sure about the other registrars), you get a German trustee as the whois contact. The same goes for most other European countries.
If you then get a VPS in that country with a reverse proxy, you are fully targeted in your country of choice:
1.) TLD is there
2.) whois is there
3.) IP is there
4.) language is there
If you are Linux-savvy, this costs less than $200 per year, and local domains just rank better. I have several .coms in Europe that I can show you as proof that they are NOT ranking in US SERPs, despite incoming links and weak keywords!
P!
Would these be seen as duplicate content?
I am not really sure. Would the same content in different languages generally be considered duplicate content? I think it shouldn't be, especially given the generally accepted approach of creating content for visitors and not for search engines.
Would you then consider that in the context of the example I gave above, the best of all solutions would be: www.misitio.cl/micarpeta/mipagina.html with the site hosted in Chile?
Yes, assuming that both misitio and /micarpeta stand for important keywords. I have not found that hosting is really an issue. All of our sites are hosted in the UK, and that has not represented a problem for us, even though 20 language areas are involved.
example.com/en/good, example.com/fr/bon Both with the same content but in their respective languages? Would these be seen as duplicate content?
NO! Don't worry! All of the sites in my profile are identical except for language and that has not hurt us one bit in six years. But you may want to reconsider the name of the folders. /en and /fr are not good solutions SEO-wise. /my-service-in-english and /mon-service-en-francais will help rankings in Google if "my-service" and "mon-service" are the #1 most important keywords in each language.
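If folder names are translated, the site needs a table that ties each translated slug back to one logical page, so it can route requests and build the "other language" links. A minimal Python sketch is below. The `SLUGS` table, the page id `my-service` and the function names are hypothetical. In an Apache setup, the same mapping would sit behind mod_rewrite rather than in application code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: one logical page id -> translated folder slug per language.
SLUGS: Dict[str, Dict[str, str]] = {
    "my-service": {"en": "my-service-in-english", "fr": "mon-service-en-francais"},
}

# Reverse index: translated slug -> (page id, language), built once at startup.
REVERSE: Dict[str, Tuple[str, str]] = {
    slug: (page, lang)
    for page, per_lang in SLUGS.items()
    for lang, slug in per_lang.items()
}


def resolve(slug: str) -> Optional[Tuple[str, str]]:
    """Map an incoming folder name to (page id, language), or None if unknown."""
    return REVERSE.get(slug)


def translated_url(slug: str, target_lang: str) -> Optional[str]:
    """URL of the same page in another language, or None if there is no translation."""
    found = REVERSE.get(slug)
    if found is None:
        return None
    page, _ = found
    target = SLUGS[page].get(target_lang)
    return f"/{target}" if target else None
```

Note that the language is recovered from the slug itself, so no /en or /fr prefix is needed in the URL.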
I think we can all agree that this approach hasn't harmed their SERPs
If you can persuade Google to put your site at the very top, then you have nothing to worry about. A press release announcing to the world that you intend to start a search engine that will make Google obsolete might do the trick. ;-)
While we're on the subject: internationalization (i18n) is the process of making your software or website ready to be translated into different locales. A locale denotes the combination of language and conventions for a given culture. Localization (l10n) is the act of translating and adapting your software or website for a specific locale. So you have the locales en-US, en-CA, fr-CA, fr-FR, etc. Localization may also involve changing date/time formats, colors and icons. For your particular product, you may want to treat a locale as simply a language.
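The locale/language distinction can be made concrete. A site that formats dates per locale can still fall back to a language-level default when it does not support a particular region. The Python sketch below uses a tiny hand-written table of numeric date patterns (illustrative only; a real site would use a full locale database such as CLDR data). The fallback rules and function names are assumptions.

```python
from datetime import date
from typing import Dict

# Illustrative numeric date patterns for a few locales (not exhaustive).
DATE_PATTERNS: Dict[str, str] = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "fr-FR": "%d/%m/%Y",
    "de-DE": "%d.%m.%Y",
}
# Hypothetical fallback when only the language part is supported.
LANGUAGE_DEFAULTS: Dict[str, str] = {"en": "en-US", "fr": "fr-FR", "de": "de-DE"}
SITE_DEFAULT = "en-US"


def resolve_locale(tag: str) -> str:
    """Normalize e.g. "en_gb" to "en-GB", then fall back from region to language."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    full = f"{language}-{parts[1].upper()}" if len(parts) > 1 else language
    if full in DATE_PATTERNS:
        return full
    # Unsupported region (or no region at all): reduce the locale to its language.
    return LANGUAGE_DEFAULTS.get(language, SITE_DEFAULT)


def format_date(d: date, tag: str) -> str:
    return d.strftime(DATE_PATTERNS[resolve_locale(tag)])
```

With this table, `format_date(date(2008, 3, 30), "en-US")` gives `03/30/2008` and `"de-DE"` gives `30.03.2008`. `"fr-CA"` is reduced to its language and formatted as fr-FR, which is exactly the "locale equals language" simplification.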
The OP asked about multilingual websites. Leaving SEO aside for now, subdomains (en.example.zz), subfolders (www.example.zz/en/) and a query-string parameter (www.example.zz/?l=en) all work fine.
- Subdomains: keep in mind that some people will always add www. to the front of the domain name. You may also need extra handling for things like sharing cookies across subdomains.
- Subfolders: these keep your URLs clean, i.e. without any ?xx=... in them.
- Query-string parameters: if most of your URLs already have a bunch of query parameters, you might as well add the language parameter. Otherwise, use subfolders to keep the URLs clean.

For the first two methods, you can parse $_SERVER['SCRIPT_URI'] in PHP to figure out which locale is required. Note that SCRIPT_URI is only set when mod_rewrite is active; $_SERVER['HTTP_HOST'] and $_SERVER['REQUEST_URI'] are more widely available. But then again... you may also want to translate the domain name itself: example.com, exemple.com, ejemplo.com.
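All three URL schemes can be recognized by one function. The Python sketch below uses the standard `urllib.parse` module. The supported-locale set, the `l` parameter name (taken from the example above) and the precedence order (query string, then folder, then subdomain) are assumptions. It also assumes a two-label registered domain like example.zz and strips a leading "www." so that www.en.example.zz still works.

```python
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlsplit

SUPPORTED = frozenset({"en", "fr", "es"})  # hypothetical locales offered by the site


def locale_from_url(url: str, supported: Iterable[str] = SUPPORTED) -> Optional[str]:
    """Return the locale named by the URL, or None if the URL does not name one."""
    supported = set(supported)
    parts = urlsplit(url)
    # 1) Query string: www.example.zz/?l=en
    values = parse_qs(parts.query).get("l")
    if values and values[0] in supported:
        return values[0]
    # 2) Subfolder: www.example.zz/en/...
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] in supported:
        return segments[0]
    # 3) Subdomain: en.example.zz, tolerating a "www." that users add by habit.
    labels = (parts.hostname or "").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    if len(labels) > 2 and labels[0] in supported:
        return labels[0]
    return None
```

A `None` result means the URL carries no locale, which is the case where the site has to decide on the visitor's behalf.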
I don't pay much attention to SEO, but I'd ignore any search engine that penalized a website for having the same page in multiple languages. Such an engine would penalize all Canadian government websites, for one thing, as well as many major businesses in Canada, Switzerland, etc., and many websites catering to international sports. As has been pointed out, you may want to go the extra mile and translate the folder and page names as well, and use mod_rewrite or something similar to handle the differences.
As for deciding which locale to use initially: you only need to do this if someone visits a URL where the locale is not indicated. So you'd do it for www.example.com but not for www.example.com/somelocale/buy-widgets.html. For this, definitely don't use the visitor's IP. It is a safe bet, however, to use HTTP_ACCEPT_LANGUAGE to make an educated guess at the user's preferred language (keep in mind that some browsers won't send it at all). So if someone goes to www.example.com, you have three options:
- show a splash page that lets them choose from the available locales;
- use HTTP_ACCEPT_LANGUAGE to direct them to www.example.com/preferredlocale/index.html;
- simply send them to www.example.com/defaultlocale/index.html.

Whichever method you use, always give the user the option of switching languages on every page.
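The decision above can be sketched in Python for the case of a locale-less URL. It assumes the visitor's preferences are already available as an ordered list (for example, parsed from HTTP_ACCEPT_LANGUAGE; empty when the browser sends nothing). The available locales, the default and the function names are hypothetical, and matching is on the primary language only.

```python
from typing import Sequence

AVAILABLE = ("en", "fr", "de")  # hypothetical locales the site offers
DEFAULT = "en"                  # hypothetical fallback locale


def has_locale(path: str, available: Sequence[str] = AVAILABLE) -> bool:
    """True if the first path segment already names a locale; never redirect those."""
    first = path.lstrip("/").split("/", 1)[0]
    return first in available


def landing_url(preferences: Sequence[str],
                available: Sequence[str] = AVAILABLE,
                default: str = DEFAULT) -> str:
    """Pick the redirect target for a locale-less URL such as www.example.com/."""
    for pref in preferences:
        # "fr-CH" and "fr" both map to the "fr" section of the site.
        language = pref.strip().lower().split("-", 1)[0]
        if language in available:
            return f"/{language}/index.html"
    # No usable preference (or no Accept-Language header at all): use the default.
    return f"/{default}/index.html"
```

`has_locale` keeps the guess from overriding a locale the user chose explicitly. `landing_url` never looks at the IP address, in line with the advice above.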