Can Google tell the language of a page?

get smart quick

10:26 pm on Jul 16, 2004 (gmt 0)

10+ Year Member



I have a .hu site in Hungarian. All pages are marked <meta http-equiv="Content-Language" content="hu-HU"> as well as <meta name="language" content="hu"> for good measure.

When I run a site:mysite.hu search restricted to HUNGARIAN pages, it shows 179 hits. When I run the same site:mysite.hu search for pages in ANY language, it shows 9,150 hits.

Is Google broken, or am I doing something wrong?

troels nybo nielsen

7:36 am on Jul 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure that I understand you. Are you saying that Google finds more pages on your website than there actually are?

As for the question in your title: yes, Google can tell the language of a page, but it is far from perfect at it. For example, it has huge difficulty distinguishing between Danish and Norwegian, though I cannot really blame it for that.

get smart quick

3:13 pm on Jul 20, 2004 (gmt 0)

10+ Year Member



No, no! I have around 9,000 pages with language marked Hungarian, as I described.

If I do an advanced search and select Hungarian-language pages only, it finds 179 pages. When I do the same search specifying any language, it finds them all.

So what language does it think they are? (How do I find out?)

troels nybo nielsen

3:44 pm on Jul 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> So what language does it think they are? (How do I find out?)

You might repeat the search restricted to some other languages that seem like realistic guesses.

g1smd

9:43 pm on Jul 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Content-Language meta tag is used more by translation tools than by search engines.

Nikke

10:26 pm on Jul 22, 2004 (gmt 0)

10+ Year Member



Since you are already using two techniques, you might add the lang attribute as well. That is:
<html lang="hu">

I use it on several thousand Swedish pages and it seems to be interpreted correctly by Google.
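
For anyone who wants to confirm which of the three declarations mentioned so far are actually present in a page's markup, here is a minimal sketch using Python's standard html.parser. The class and function names (LanguageDeclarationParser, find_language_declarations) and the sample page are made up for illustration. It only reports what the markup declares; it says nothing about how Google interprets those hints.

```python
from html.parser import HTMLParser


class LanguageDeclarationParser(HTMLParser):
    """Collects the three language hints discussed in this thread (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.declarations = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "html" and "lang" in attr:
            self.declarations["html lang"] = attr["lang"]
        elif tag == "meta":
            if attr.get("http-equiv", "").lower() == "content-language":
                self.declarations["meta http-equiv"] = attr.get("content", "")
            elif attr.get("name", "").lower() == "language":
                self.declarations["meta name"] = attr.get("content", "")


def find_language_declarations(html_text):
    """Return a dict of the language declarations found in html_text."""
    parser = LanguageDeclarationParser()
    parser.feed(html_text)
    return parser.declarations


# Sample page built from the snippets quoted in this thread.
sample = """<html lang="hu"><head>
<meta http-equiv="Content-Language" content="hu-HU">
<meta name="language" content="hu">
</head><body></body></html>"""

print(find_language_declarations(sample))
# → {'html lang': 'hu', 'meta http-equiv': 'hu-HU', 'meta name': 'hu'}
```

Running this over a handful of saved pages would at least rule out a template that is silently dropping one of the tags.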

g1smd

12:09 pm on Jul 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes. Good idea. I forgot about that alternative method.

RonPK

1:58 pm on Jul 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It seems that the language defaults to English. If I use 'lr=lang_en' in the query string with site:mysite.tld, I get about 900 pages listed with only the URL in the SERP: no title, no snippet. All those pages have <html lang="nl">; they just haven't been properly indexed yet. Maybe that explains what 'get smart quick' is seeing.
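
The two checks suggested in this thread (trying other plausible languages, and restricting with 'lr=lang_en') can be combined by building one query URL per candidate language and comparing the hit counts by hand. A rough sketch follows. The function name language_restricted_query is made up, the candidate codes other than hu and en are just guesses for illustration, and the URL layout assumes the usual www.google.com/search form with the q and lr parameters.

```python
from urllib.parse import urlencode


def language_restricted_query(site, lang_code):
    """Build a site: query URL restricted to one language via the lr parameter.

    Assumes the lr=lang_xx form quoted in this thread; hypothetical helper.
    """
    params = {"q": f"site:{site}", "lr": f"lang_{lang_code}"}
    return "https://www.google.com/search?" + urlencode(params)


# Candidate languages to try; "de" and "fi" are arbitrary guesses.
for code in ["hu", "en", "de", "fi"]:
    print(code, language_restricted_query("mysite.hu", code))
```

Opening each URL in a browser and noting the reported hit count should show which language bucket the missing pages ended up in, if any.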