phranque - 6:26 am on Oct 22, 2012 (gmt 0)
i mentioned during a presentation at pubcon last week that google ignores language specification in html code and was approached several times afterwards for clarification.
i was surprised this was news, especially since some of those who asked were very familiar with multilingual sites.
so just to get this out there for discussion, from the Official Google Webmaster Central Blog - Working with multilingual websites:
Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage.
and from Webmaster Tools Help - Multi-regional and multilingual sites:
Make sure the page language is obvious
Google uses only the visible content of your page to determine its language. We don’t use any code-level language information such as lang attributes. You can help Google determine the language correctly by using a single language for content and navigation on each page, and by avoiding side-by-side translations. Translating only the boilerplate text of your pages while keeping the bulk of your content in a single language (as often happens on pages featuring user-generated content) can create a bad user experience if the same content appears multiple times in search results with various boilerplate languages.
this tells me google isn't that great at language detection, and if even google can't "get it" then it's a universal problem, so i would still recommend properly specifying the language for all content.
just to be clear, "code-level" language information is distinct from "link-level" language information, which is the proprietary rel="alternate" hreflang="x" link annotation google began supporting last year.
Official Google Webmaster Central Blog: New markup for multilingual content:
rel="alternate" hreflang="x" - Webmaster Tools Help:
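
to make the distinction concrete, here is a rough sketch of both kinds of markup on one page. the example.com urls, the "en"/"es" language pair and the page text are placeholders i made up for illustration, not anything taken from the google posts:

```html
<!DOCTYPE html>
<!-- hypothetical english page; every url below is a placeholder -->
<!-- "code-level" information: the lang attribute, which the quoted google docs say is ignored -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>example page</title>
  <!-- "link-level" information: rel="alternate" hreflang annotations
       pointing at each language version of this page -->
  <link rel="alternate" hreflang="en" href="http://example.com/en/">
  <link rel="alternate" hreflang="es" href="http://example.com/es/">
</head>
<body>
  <!-- per the quoted help page, the visible content is what google reads
       to detect language, so keep content and navigation in one language -->
  <p>all visible content and navigation on this page is in english.</p>
</body>
</html>
```

the lang attribute still costs nothing to include and other consumers (browsers, screen readers) can use it, so the two kinds of markup sit side by side without conflict.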