Forum Moderators: Robert Charlton & goodroi
Google ignores all code-level language information
Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage.
Make sure the page language is obvious
Google uses only the visible content of your page to determine its language. We don’t use any code-level language information such as lang attributes. You can help Google determine the language correctly by using a single language for content and navigation on each page, and by avoiding side-by-side translations. Translating only the boilerplate text of your pages while keeping the bulk of your content in a single language (as often happens on pages featuring user-generated content) can create a bad user experience if the same content appears multiple times in search results with various boilerplate languages.
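To make "visible content only" concrete, here is a minimal sketch of stripping a page down to the text a reader would actually see, so that lang attributes and other markup never enter the picture. This is only an illustration using Python's standard html.parser; the class and function names are made up for this example, and it says nothing about how Google's own pipeline works.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring all tags and attributes (hypothetical helper)."""

    # Elements whose text content is not shown in the page body.
    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes such as lang="en" are deliberately never inspected.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    # The lang attributes claim English, but the visible text is Italian.
    sample = (
        '<html lang="en"><head><title>x</title></head>'
        '<body><p lang="en">Questa pagina ha sempre avuto '
        "un insolito numero di visitatori.</p></body></html>"
    )
    print(visible_text(sample))
```

Whatever language detection runs afterwards would only ever see the Italian sentence, which is the point of the quoted guidance: a wrong lang attribute has nothing to contribute.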
Questa pagina ha sempre avuto un insolito numero di visitatori provenienti dall'Italia. [This page has always had an unusual number of visitors from Italy.]
This Pagina Semper ha avuto delle Nazioni Unite insolito Numero di Visitatori provenienti DALL'ITALIA.
just to be clear, "code-level" language information is distinct from "link-level" language information, which is the proprietary "link rel alternate hreflang" attribute Google began supporting last year.
12.3 Document relationships: the LINK element
<!ATTLIST LINK
%attrs; -- %coreattrs, %i18n, %events --
charset %Charset; #IMPLIED -- char encoding of linked resource --
href %URI; #IMPLIED -- URI for linked resource --
hreflang %LanguageCode; #IMPLIED -- language code --
type %ContentType; #IMPLIED -- advisory content type --
rel %LinkTypes; #IMPLIED -- forward link types --
rev %LinkTypes; #IMPLIED -- reverse link types --
media %MediaDesc; #IMPLIED -- for rendering on these media --
>
Links in HTML Documents [w3.org]
even though support for it may be limited to Google, it's not a proprietary attribute
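For anyone who wants to check what "link-level" language information a page actually declares, here is a rough sketch that collects link elements with rel="alternate" and an hreflang attribute. It uses only Python's standard html.parser; the class name, function name, and sample URLs are assumptions for illustration, not part of any spec or Google tool.

```python
from html.parser import HTMLParser


class AlternateLinkCollector(HTMLParser):
    """Gather hreflang -> href pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated list of link types, compared case-insensitively.
        rel_values = (attr.get("rel") or "").lower().split()
        if "alternate" in rel_values and attr.get("hreflang") and attr.get("href"):
            self.alternates[attr["hreflang"]] = attr["href"]


def collect_alternates(html):
    """Return a dict mapping each declared language code to its URL."""
    collector = AlternateLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.alternates


if __name__ == "__main__":
    head = (
        '<head>'
        '<link rel="alternate" hreflang="de" href="http://www.example.com/de">'
        '<link rel="stylesheet" href="/style.css">'
        '</head>'
    )
    print(collect_alternates(head))
```

The stylesheet link is skipped because its rel value does not include "alternate", so only the language annotation is reported.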
Do they also ignore the content-language http header?
what's the use of putting information in a <link>?
Values of the title attribute may be rendered by user agents in a variety of ways. For instance, visual browsers frequently display the title as a "tool tip" (a short message that appears when the pointing device pauses over an object). Audio user agents may speak the title information in a similar context. For example, setting the attribute on a link allows user agents (visual and non-visual) to tell users about the nature of the linked resource
I don't think g### distinguishes between "en" and "en-uk".
they did not mention header-level language specification
Yes, but I want to know! It seems to me to be the easiest way of doing it in most cases.
Content-Language
The Content-Language entity-header field describes the natural language(s) of the intended audience for the enclosed entity. Note that this might not be equivalent to all the languages used within the entity-body.
... The primary purpose of Content-Language is to allow a user to identify and differentiate entities according to the user's own preferred language. Thus, if the body content is intended only for a Danish-literate audience, the appropriate field is
Content-Language: da
If no Content-Language is specified, the default is that the content is intended for all language audiences. This might mean that the sender does not consider it to be specific to any natural language, or that the sender does not know for which language it is intended.
Multiple languages MAY be listed for content that is intended for multiple audiences. For example, a rendition of the "Treaty of Waitangi," presented simultaneously in the original Maori and English versions, would call for Content-Language: mi, en
However, just because multiple languages are present within an entity does not mean that it is intended for multiple linguistic audiences. An example would be a beginner's language primer, such as "A First Lesson in Latin," which is clearly intended to be used by an English-literate audience. In this case, the Content-Language would properly only include "en".
Content-Language MAY be applied to any media type -- it is not limited to textual documents.
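The header format quoted above is simple enough to handle by hand. As a small sketch (the function name is invented here, and this is not a full HTTP header parser), a Content-Language value can be split into its language tags like this, with a missing header treated as "no specific audience" as the spec text describes:

```python
def parse_content_language(value):
    """Split a Content-Language header value into a list of language tags.

    A missing header (None) yields an empty list, which per the quoted text
    means the content is intended for all language audiences.
    """
    if value is None:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


if __name__ == "__main__":
    for header in ("da", "mi, en", None):
        print(repr(header), "->", parse_content_language(header))
```

Note that this only tells you the intended audience. As the Latin primer example shows, it does not have to match every language that appears in the body.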
one language per page only
In all these pages the canonical tag points to http://www.example.com/, the English (US) version.
Having subfolder like http://www.example.com/de will help in verifying the German site and setting its geographic target in GWT to Germany.
"The best practice is to place languages in subdirectory or subfolder rather than language parameter to help search engines more easily understand site structure."
this is an incorrect application of the link rel canonical element.
the content for each language should have its own canonical url.
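To illustrate the point about each language having its own canonical URL, here is a rough sketch that builds the head links for one language version in a subfolder layout. The function name, the choice of "en" as the language served from the root, and the URL pattern are all assumptions for the example; adapt them to your own site structure.

```python
def head_links(base_url, languages, current, default="en"):
    """Build <link> tags for the `current` language version of a page.

    Assumes the default language lives at base_url and every other
    language lives in a subfolder named after its language code.
    """
    def url_for(lang):
        return base_url if lang == default else f"{base_url}{lang}"

    # The canonical points at this language's own URL, not at the English page.
    lines = [f'<link rel="canonical" href="{url_for(current)}">']
    # The other language versions are declared as alternates.
    for lang in languages:
        if lang != current:
            lines.append(
                f'<link rel="alternate" hreflang="{lang}" href="{url_for(lang)}">'
            )
    return lines


if __name__ == "__main__":
    for line in head_links("http://www.example.com/", ["en", "de"], "de"):
        print(line)
```

With this layout the German page's canonical is http://www.example.com/de, and the English root is only referenced as an alternate.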
having a subdomain will have the advantages mentioned above and also allow the subdomain to be hosted within the targeted geography.
I doubt they just started ignoring it yesterday...
There are issues. I would opt for this only if I were a big brand and could generate content specific to each region! Not everyone can get a subdomain to rank, because it spreads the authority.
To me, having a single domain with a subfolder for each target language helps, because all the authority builds up on that one domain.