How does Google determine sites in other languages?

9:20 pm on Mar 4, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Nov 8, 2002
posts:223
votes: 0


I'm sure there are threads on this already, but I can't find them using Google search. My question is: how does Google (or any other search engine) determine the language of a site? One way is, of course, keyword matching (e.g. matching the words against the content of a page). What about other methods? What about two languages on the same page?

I'd appreciate some links on this.

Thank you

5:27 pm on Mar 5, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Nov 8, 2002
posts:223
votes: 0


Anyone? I'd be really interested in learning more about this. Does Google take into account HTML markup that marks changes in language? What about words that are spelled the same in different languages (whether they mean the same thing or something different)?

I'm willing to search for information but I'm not sure how to formulate my query.

Thanks :)

7:49 pm on Mar 5, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:June 21, 2003
posts:59
votes: 0


There was a presentation yesterday at SES New York on Spanish SEM.

What I understood from the panel is that the search engines look at all the words on a page and determine its language from them. There are, of course, problems when a word is spelled the same in two or more languages. I'd guess that the weight of one set of words over another determines which language the page is listed as.
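A minimal Python sketch of the word-weighting idea described above, assuming tiny hand-picked stopword lists (hypothetical; real detectors use far larger lists or statistical models). Each language gets one point per word found in its list, and the highest score wins. Words such as "a" and "no" are spelled the same in English and Spanish, so they add weight to both sides, which is the ambiguity the panel mentioned.

```python
# Hypothetical, tiny stopword lists for illustration only.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for", "on", "a", "no"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "las", "por", "a", "no"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often in `text`."""
    words = [w.strip(".,;:!?¡¿\"'()").lower() for w in text.split()]
    scores = {
        lang: sum(1 for w in words if w in stops)
        for lang, stops in STOPWORDS.items()
    }
    # Shared words ("a", "no") count for both languages; ties go to the first key.
    return max(scores, key=scores.get)
```

For example, `guess_language("El perro de la casa y los gatos en el jardín")` scores 7 for Spanish and 0 for English, so it returns `"es"`.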

They said that only 10% of searchers on Google choose a language profile to search from; the other 90% search across all pages in the index. I would think that someone searching at google.com.mx might get results biased toward pages in Spanish rather than English or another language.

A meta tag might help also:
<meta name="dc.language" content="en" /> for English, or
<meta name="dc.language" content="es" /> for Spanish

The panel agreed that there are problems in the area of language determination. It seems to be a particular problem with things like adwords and contextual ads.

9:12 pm on Mar 5, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I am not aware of search engines indexing Dublin Core metadata yet.

You could try this though:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Language" content="en-gb">

The first line lists the character set being used.

The second line lists the language of the content: the first two letters (en) are an ISO 639 language code and the last two (gb) are an ISO 3166 country code.
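A minimal Python sketch of how a crawler might read these declarations, assuming plain HTML input. It collects the http-equiv Content-Language value (and the Dublin Core dc.language name, if present) and splits a tag such as en-gb into its ISO 639 language and ISO 3166 country parts. The class and function names are hypothetical; whether any engine actually uses these tags is not established here.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaLanguageParser(HTMLParser):
    """Hypothetical helper: collect language declarations from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.declared: dict = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        # Self-closing <meta ... /> tags are routed here too by default.
        if tag != "meta":
            return
        a = {name: (value or "") for name, value in attrs}
        if a.get("http-equiv", "").lower() == "content-language":
            self.declared["content-language"] = a.get("content", "")
        elif a.get("name", "").lower() == "dc.language":
            self.declared["dc.language"] = a.get("content", "")


def split_language_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split e.g. 'en-gb' into ('en', 'GB'): ISO 639 language, ISO 3166 country."""
    parts = tag.strip().replace("_", "-").split("-")
    language = parts[0].lower()
    country = parts[1].upper() if len(parts) > 1 and len(parts[1]) == 2 else None
    return language, country
```

Feeding the parser the Content-Language line above stores `"en-gb"`, and `split_language_tag("en-gb")` returns `("en", "GB")`.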

12:36 am on Mar 7, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 14, 2003
posts:53
votes: 0


Search engines use third-party tools to automatically identify both the language and the encoding of a page's text. Once they know these, they can match the words against their database as usual and return results.

Each language and country can be identified by an ISO code, and text by its character encoding. When pages declare these in tags, detection is easier. When they do not, the software applies pattern matching to any content that is not immediately identified as English/Latin-1, comparing it against a database of language and encoding patterns.
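One well-known way to do this kind of pattern matching is to compare character trigram frequency rankings (the "out-of-place" measure of Cavnar and Trenkle). The Python sketch below assumes two hypothetical one-sentence training samples; a real pattern database would be built from large corpora, and the engines' actual tools are not known from this thread.

```python
from collections import Counter


def trigram_profile(text: str, size: int = 300) -> list:
    """Return the `size` most frequent character trigrams, most frequent first."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return [gram for gram, _ in counts.most_common(size)]


def out_of_place(doc: list, ref: list, penalty: int = 300) -> int:
    """Sum of rank differences; trigrams missing from `ref` cost a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(ref)}
    return sum(abs(i - rank[g]) if g in rank else penalty for i, g in enumerate(doc))


# Hypothetical toy samples; a real pattern database is built from large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs into the forest",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann läuft er in den wald",
}
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def detect(text: str) -> str:
    """Pick the language whose profile is closest to the text's profile."""
    doc = trigram_profile(text)
    return min(PROFILES, key=lambda lang: out_of_place(doc, PROFILES[lang]))
```

Trigrams work without a word list, so the same approach can be applied to any script once profiles exist for it.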

Most search engines can easily handle the most common language/encoding pairs that cover roughly 36 languages.

Japanese and Arabic are the most complex languages to process. Japanese text is normally written without any spaces, and needs to be separated into words before the text is ready for handling. Additionally, Japanese verbs need to be reduced to dictionary form, and compound words separated into sub-compounds.
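The word-separation step can be illustrated with greedy longest-match ("maximum matching") segmentation, one of the simplest dictionary-based approaches. This Python sketch assumes a hypothetical five-entry dictionary and a one-entry lemma table for reducing a verb to dictionary form; production systems use morphological analyzers with large dictionaries.

```python
def max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary and lemma table, for illustration only.
DICTIONARY = {"私", "は", "東京", "に", "行きます"}
LEMMAS = {"行きます": "行く"}  # polite verb form -> dictionary form

tokens = max_match("私は東京に行きます", DICTIONARY)
lemmas = [LEMMAS.get(t, t) for t in tokens]
```

Here `tokens` is `["私", "は", "東京", "に", "行きます"]` and `lemmas` ends with the dictionary form `"行く"`.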

The best way to get detected properly is to use the following tags:

<meta http-equiv="content-language" content="en"> (insert appropriate language code)

On Japanese pages I would add the following too:
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
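The charset declaration matters because the bytes must be decoded correctly before any language detection can work. This Python sketch reads a charset= value from the start of the page and decodes with it, falling back to ISO-8859-1 when nothing usable is declared. Some pages declare Shift_JIS under legacy labels such as x-sjis, which Python's codec registry does not recognize, so the sketch maps them through a small alias table; the table and function name are hypothetical.

```python
import codecs
import re

# Matches charset=... inside a meta tag near the top of the page.
CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)

# Hypothetical table mapping legacy labels to names Python's codecs understand.
LEGACY_ALIASES = {"x-sjis": "shift_jis"}


def decode_page(raw: bytes, default: str = "iso-8859-1") -> tuple:
    """Decode page bytes with the declared charset; return (text, charset used)."""
    match = CHARSET_RE.search(raw[:2048])  # declarations usually sit near the top
    charset = match.group(1).decode("ascii").lower() if match else default
    charset = LEGACY_ALIASES.get(charset, charset)
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = default  # unknown label: fall back instead of failing
    return raw.decode(charset, errors="replace"), charset
```

Falling back rather than raising keeps a crawler moving, at the cost of garbled text on pages whose charset label is unrecognized.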

Otherwise, ranking well in another language uses exactly the same optimization techniques as in English. The engines are doing a much better job of detecting and scoring non-English pages.
