I'd appreciate some links on this.
Thank you
I'm willing to search for information but I'm not sure how to formulate my query.
Thanks :)
What I understood from the panel is that the search engines look at all the words on a page to determine its language. There are, of course, problems when a word is spelled the same in two or more languages. I’d guess that the relative weight of one set of words over another determines which language the page is listed as.
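To make the word-weighting idea concrete, here is a minimal Python sketch. It is not how any particular search engine works: the tiny stopword lists and the `guess_language` function are hypothetical, and real detectors use much larger vocabularies or character n-gram statistics. Note that "a" appears in both lists, the kind of shared spelling mentioned above; the page goes to whichever language accumulates more hits.

```python
import re
from collections import Counter

# Hypothetical stopword lists; "a" is deliberately shared by both languages.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "with", "a"},
    "es": {"el", "la", "y", "es", "de", "en", "con", "los", "a"},
}

def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    scores = Counter({lang: 0 for lang in STOPWORDS})
    for token in tokens:
        for lang, words in STOPWORDS.items():
            if token in words:
                scores[lang] += 1  # a shared word like "a" scores for both
    # Highest total wins; on a tie, most_common keeps first-encountered order.
    return scores.most_common(1)[0][0]

print(guess_language("The cat is in the house with a dog"))   # en
print(guess_language("El gato está en la casa con un perro"))  # es
```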
They said that only 10% of Google searchers choose a language profile to search from; the other 90% search across all pages in the index. I would think that a search at google.com.mx might be biased toward returning pages in Spanish rather than English or another language.
A meta tag might also help:
<meta name="dc.language" content="en" /> for English, or
<meta name="dc.language" content="es" /> for Spanish
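If you want to check what a page actually declares, Python's standard-library `html.parser` can pull the `dc.language` value out of the markup. This is only an illustration: `MetaLanguageParser` and `declared_dc_language` are hypothetical names, and there is no guarantee that a given search engine reads this Dublin Core tag at all.

```python
from html.parser import HTMLParser
from typing import Optional

class MetaLanguageParser(HTMLParser):
    """Records the content of a <meta name="dc.language"> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.dc_language: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        # Self-closing <meta ... /> tags are routed here too by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "dc.language" and attributes.get("content"):
            self.dc_language = attributes["content"].strip()

def declared_dc_language(html: str) -> Optional[str]:
    parser = MetaLanguageParser()
    parser.feed(html)
    parser.close()
    return parser.dc_language

page = '<head><meta name="dc.language" content="es" /></head>'
print(declared_dc_language(page))  # es
```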
The panel agreed that there are problems in the area of language determination. It seems to be a particular problem for things like AdWords and contextual ads.
You could try this though:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
The first line lists the character set being used.
The second line lists the language of the content. The first two letters ("en") are a language code from ISO 639, and the last two letters ("gb") are a country code from ISO 3166.
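The two `content` values above can be taken apart with a few lines of Python. This is a simplified sketch with hypothetical helper names: it handles plain language-country pairs like `en-gb` and bare codes like `es`, not longer tags that add a script or other subtags.

```python
from typing import Optional, Tuple

def parse_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value."""
    # "text/html; charset=ISO-8859-1": parameters follow the first ";"
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset":
            return value.strip().strip('"')
    return None

def split_language_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a simple tag like "en-gb" into (ISO 639 language, ISO 3166 country)."""
    language, _, country = tag.strip().partition("-")
    # Language codes are conventionally lowercase, country codes uppercase.
    return language.lower(), (country.upper() or None)

print(parse_charset("text/html; charset=ISO-8859-1"))  # ISO-8859-1
print(split_language_tag("en-gb"))                     # ('en', 'GB')
```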
Each language and country can be identified by an ISO code and/or its character encoding. When these tags are present, identification is easier. When they are absent, the software simply pattern-matches any content that is not immediately identified as English/Latin-1 against a database of language and encoding patterns.
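The order described here (trust a declared tag, otherwise pattern-match the content) might look roughly like the sketch below. The "pattern database" is a hypothetical stand-in that only checks a few Unicode script ranges; real engines match against much richer language and encoding statistics. Falling back to English mirrors the English/Latin-1 default described above.

```python
from typing import Optional

# Hypothetical pattern table: Unicode ranges that strongly suggest a language.
SCRIPT_PATTERNS = [
    ("ja", [(0x3040, 0x309F), (0x30A0, 0x30FF)]),  # Hiragana, Katakana
    ("ar", [(0x0600, 0x06FF)]),                    # Arabic
    ("ko", [(0xAC00, 0xD7AF)]),                    # Hangul syllables
]

def detect_language(text: str, declared: Optional[str] = None) -> str:
    # 1. An explicit tag (e.g. from Content-Language) wins when present.
    if declared:
        return declared.strip().lower()
    # 2. Otherwise look for characters from a known script range.
    for language, ranges in SCRIPT_PATTERNS:
        for ch in text:
            if any(lo <= ord(ch) <= hi for lo, hi in ranges):
                return language
    # 3. Nothing matched: assume the English/Latin-1 default.
    return "en"

print(detect_language("Hola", declared="es-MX"))  # es-mx
print(detect_language("こんにちは"))                # ja
print(detect_language("plain text"))              # en
```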
Most search engines can easily handle the most common language/encoding pairs that cover roughly 36 languages.
Japanese and Arabic are the most complex languages to process. Japanese text is normally written without any spaces, and needs to be separated into words before the text is ready for handling. Additionally, Japanese verbs need to be reduced to dictionary form, and compound words separated into sub-compounds.
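Word separation for Japanese can be illustrated with greedy longest-match ("maximum matching") segmentation, one of the simplest dictionary-based techniques. This is a toy sketch with a hypothetical six-word dictionary; production systems use morphological analyzers such as MeCab, which also handle the verb normalization mentioned above. This code does not.

```python
from typing import List, Set

def max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right longest-match segmentation."""
    longest = max([1] + [len(w) for w in dictionary])
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Unknown characters fall through to a single-character token.
            if candidate in dictionary or size == 1:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical mini-dictionary, for illustration only.
DICTIONARY = {"私", "は", "東京", "に", "住んで", "います"}
print(max_match("私は東京に住んでいます", DICTIONARY))
# ['私', 'は', '東京', 'に', '住んで', 'います']
```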
The best way to get detected properly is to use the following tags:
<meta http-equiv="content-language" content="en"> (insert appropriate language code)
On Japanese pages I would add the following too:
<meta http-equiv="Content-Type" content="text/html; charset=x-sjis">
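The charset declaration matters because the same bytes read as different text under different encodings. The Python sketch below encodes a Japanese string as Shift_JIS and decodes it back; `decode_page` and its alias table are hypothetical, and the table exists only to map the `x-sjis` label from the tag above onto Python's `shift_jis` codec name.

```python
# Hypothetical alias table: charset labels from <meta> tags -> Python codec names.
CHARSET_ALIASES = {"x-sjis": "shift_jis"}

def decode_page(raw: bytes, declared_charset: str) -> str:
    """Decode page bytes using the charset the page declares."""
    label = declared_charset.strip().lower()
    codec = CHARSET_ALIASES.get(label, label)
    return raw.decode(codec)  # raises LookupError for an unknown codec name

raw = "日本語のページ".encode("shift_jis")  # bytes as stored on the server
print(decode_page(raw, "x-sjis"))           # 日本語のページ
print(decode_page(raw, "ISO-8859-1") == "日本語のページ")  # False (mojibake)
```

Latin-1 assigns a character to every byte, so the wrong declaration fails silently instead of raising an error, which is exactly why an explicit charset helps.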
Otherwise, ranking well in another language uses exactly the same optimization techniques as ranking in English. The engines are doing a much better job of detecting and scoring non-English pages.