
Google News Archive Forum

This 37 message thread spans 2 pages (page 2 shown).
Multi-lingual pages - language identification
and how Google treats them
vitaplease




msg:215324
 5:08 pm on Oct 29, 2002 (gmt 0)

My index page is a multilingual (five languages) starting point.

That is, most of the body text and the title are in English,
but there is also some body text in the other languages, among other places in pulldown menus.

However, if I do a search in a local Google version with the "Pages in that language" option only, the index page does not turn up. That is, Google has labelled it as English and English only.

The question is whether this is reasonable, since many external Dutch sites or directories link to the index page by default, in many cases with Dutch anchor text, and there is Dutch content on the page.

It looks like the only way out is to always go for multiple single-language sites, each with its respective TLD, thereby also catching the "searches from this country only" option.

Does anyone know how Google classifies the language of a page?

Is it based on the percentage of body text? The title? The majority language of the sites linking in?

 

multilang




msg:215354
 9:19 am on Nov 4, 2002 (gmt 0)

Hi gstewart

How in the world could Google figure out duplicate content if it is in other languages? This is called translation. I don't think Google would penalize any site for having the same content translated into different languages. On the contrary.

I always keep the different languages under the same domain, as all backlinks help the whole site in general.

I just do as you do and that has been working fine for me.

I organise link campaigns in 5 languages and I ask all sites to point to domainname.com/xx according to the language. So in fact I try to attract links to the sub-directories, while I focus on getting links from directories to the normal URL so that it distributes PR to all other pages.

Do I make sense?

Hagstrom




msg:215355
 12:11 pm on Nov 7, 2002 (gmt 0)

I think I'll try a test page with a "no-language title" and 50% English and 50% Dutch body text and see what Google decides.

I have pages like that: A Danish text in the left column with an English (I hope:) ) translation in the right column. Headings and footnotes are in English.

I don't use the meta language-tag - only <HTML LANG="en">.

Google seems to be confused: 6 of the pages are deemed to be Danish and 6 to be English.
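For anyone who wants to check what a page actually declares, here is a minimal Python sketch that lists every lang attribute found in the markup. It uses only the standard library's html.parser. The names LangCollector and declared_languages are made up for this example, and the per-column LANG="da" in the sample page is an assumption for illustration, not something the pages described above contain. Whether Google takes such declarations into account at all is not established in this thread.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect (tag, lang) pairs for every element that declares a lang attribute."""

    def __init__(self):
        super().__init__()
        self.declared = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so LANG matches "lang".
        for name, value in attrs:
            if name == "lang" and value:
                self.declared.append((tag, value.lower()))


def declared_languages(html):
    """Return the list of (tag, lang) declarations found in an HTML string."""
    collector = LangCollector()
    collector.feed(html)
    collector.close()
    return collector.declared


# Hypothetical two-column page: page-level English, one column marked Danish.
sample_page = (
    '<HTML LANG="en"><BODY><TABLE><TR>'
    '<TD LANG="da">Dansk tekst</TD><TD>English text</TD>'
    '</TR></TABLE></BODY></HTML>'
)
print(declared_languages(sample_page))
```

A page that carries only the page-level attribute would return a single entry for the html element, which is the situation described above.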

vitaplease




msg:215356
 12:21 pm on Nov 7, 2002 (gmt 0)

Nice one Hagstrom,

Can you not find any clue as to why Google decides the Danish ones are Danish?

such as:

- slightly higher percentage of Danish words (in character count?)

- some words in the English column that could be identified as Danish?

- more internal links to that page from predominantly Danish pages?

- etc.
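The first guess in the list can be tested on your own pages with a rough Python sketch like the one below. It counts hits against two tiny stopword lists and reports each language's share. The word lists and the function name language_shares are assumptions made up for this example, and this is certainly not a description of how Google classifies pages, only a way to see which language dominates a given text by a simple word count.

```python
import re
from collections import Counter

# Tiny illustrative stopword lists (assumptions, nowhere near complete).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "da": {"og", "det", "at", "en", "den", "til", "er", "som"},
}


def language_shares(text):
    """Return each language's share of the stopword hits found in text."""
    # Match runs of letters only, including non-ASCII letters such as ø and å.
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = Counter()
    for word in words:
        for lang, stops in STOPWORDS.items():
            if word in stops:
                hits[lang] += 1
    total = sum(hits.values())
    if total == 0:
        # No known stopwords at all: report zero for every language.
        return {lang: 0.0 for lang in STOPWORDS}
    return {lang: hits[lang] / total for lang in STOPWORDS}


print(language_shares("The book is old. Det er en bog."))
```

Running it separately on the left and right column of a two-language page, and then on the whole page, would show whether the pages filed as Danish really contain a higher share of Danish words.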

SlyOldDog




msg:215357
 1:29 pm on Nov 7, 2002 (gmt 0)

We often show up with the same page for Italian and Spanish searches. This is just due to the similarities of the languages. I don't think Google cares what language the page is in. They are just matching keywords.

rencke




msg:215358
 4:44 pm on Nov 7, 2002 (gmt 0)

>I don't think Google cares what language the page is in. They are just matching keywords

I think you are right. Reading Hagstrom's post above it occurred to me that I too had a few pages with two languages lined up side by side. The title is bi-lingual too. Searching for either of the titles would turn up the pages in the #1 spot irrespective of language preference settings.

Hagstrom




msg:215359
 10:20 am on Nov 8, 2002 (gmt 0)

I beg to differ!

The pages that Google deems to be Danish will not be found when searching for English pages and vice versa.

Danish and Norwegian (bokmål) are much more closely related than Italian and Spanish (castellano). This means that Google might mistake a Norwegian page for Danish, but once Google has filed the page as Danish, that page won't turn up in a search for Norwegian pages.

Anyway, I have sticky'ed the details to Vitaplease, so let's see what turns up.

Hagstrom




msg:215360
 12:23 pm on Nov 8, 2002 (gmt 0)

SlyOldDog and Rencke - you're saying that the same page might be found by surfers looking for Italian pages and surfers looking for Spanish pages? Do you have examples?
After having studied the ToS carefully, I have a counter-example:

A non-commercial Norwegian site writes about a very old Danish book, Døde-Dans, on two HTML-pages. On one page, the book is so prominently featured that Google deems the page to be Danish.
If you try a search for Danish pages about Døde-Dands on his site, you find one and only one page. If you ask for Norwegian pages, you get the other page.

Danish [google.com] Norwegian [google.com]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved