Google inserts conflicting charset meta tag
Is there any way to force my charset meta tag to override Google's?
On cached pages, Google inserts a <meta> tag specifying the charset at the very top of the page even if the page already includes a charset <meta> tag. Previously Google always specified the same charset that I specified. Their charset definition was duplicative and had no effect on search results or on the display of cached pages.
Now, however, Google has inserted a conflicting charset definition in a <meta> tag at the top of the home page of one of my sites. The page includes many foreign characters, and my specified charset is windows-1252. Google's <meta> tag specifies the windows-1250 charset, which differs significantly from windows-1252. Because Google's <meta> tag comes first, browsers give it preference, and the foreign characters display wrong. As a consequence, the page is not showing up in search results for words that include foreign characters, and the cached page contains gibberish.
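To illustrate the kind of damage described above, here is a minimal Python sketch. It assumes Python's `cp1252` and `cp1250` codecs are faithful stand-ins for windows-1252 and windows-1250; the helper name `decode_as` and the sample characters are made up for the illustration and are not taken from the site in question.

```python
def decode_as(text, actual="cp1252", assumed="cp1250"):
    """Encode text in its real charset, then decode the bytes with the wrong one.

    This mimics a browser that trusts a windows-1250 declaration for a page
    whose bytes were actually written as windows-1252.
    """
    return text.encode(actual).decode(assumed)


if __name__ == "__main__":
    # Some characters share a code point in both charsets and survive;
    # others silently turn into a different letter.
    for ch in "éñèã":
        print(repr(ch), "->", repr(decode_as(ch)))
```

Characters that sit at the same byte value in both code pages (such as é) come through intact, which is why only some words on the page end up garbled.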
I wondered if there is any way to force my <meta> tag to take precedence over the conflicting <meta> tag added by Google. In the case of CSS, it's possible to override a website's specified style definitions in a user style sheet by adding !important after each definition. Is there any similar trick that works with <meta> tags?
Alternatively, is there any way to communicate with Google to get them to correct this? I'm losing traffic every day that the Google cache page with the erroneous charset definition is online. The site was getting about 8 times as many referrals from Google as from Yahoo before Google mangled the charset, and now it's getting about 6 times as many.
The first charset declaration always takes precedence. I've been looking at how Google handles the character encoding when displaying a cached page, but the pattern isn't immediately obvious. Often I see that Google adds a charset of US-ASCII via a meta element even when the page specifies ISO-8859-1. On other occasions, the charset declaration matches the one in the document. In all cases during my (admittedly brief) search, the pages were displayed correctly, in that the charset declared by Google adequately represented the one used within the document (i.e., a document which declared ISO-8859-1 but for which Google chose US-ASCII contained only US-ASCII characters).
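For anyone who wants to check which declaration comes first in a saved copy of a cached page, the following is a rough Python sketch using only the standard library. The class and function names are hypothetical, and it only lists meta declarations in document order; it does not reproduce everything a browser does when sniffing an encoding.

```python
import re
from html.parser import HTMLParser


class MetaCharsetCollector(HTMLParser):
    """Collects charset declarations from <meta> elements in document order."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if "charset" in attr:
            # Short form: <meta charset="...">
            self.charsets.append(attr["charset"].strip().lower())
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
            match = re.search(r"charset\s*=\s*([^\s;\"']+)",
                              attr.get("content", ""), re.IGNORECASE)
            if match:
                self.charsets.append(match.group(1).lower())


def declared_charsets(html):
    """Return every meta charset declaration found, first one first."""
    parser = MetaCharsetCollector()
    parser.feed(html)
    parser.close()
    return parser.charsets
```

Calling `declared_charsets(page_source)` on the cached copy should return a list whose first entry is the one that takes precedence when no HTTP header says otherwise.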
Google is certainly using an auto-detection routine even when the charset is defined within the document. The solution will be to ensure that your choice for the display charset is respected. How exactly are you declaring the windows-1252 charset? What does your meta element look like? (Please copy and paste the exact code here.) Is the meta element placed before or after your
I suspect that the best solution is to declare the charset via an HTTP header rather than (or in combination with) the meta charset element. Initial observations indicate that this value is retained and respected by Google. An interesting case which displays this is the home page of microsoft.com: there are two meta charset elements in the original document, the first declaring UTF-16 and the second UTF-8. The first would normally take precedence, but as the server is also setting the charset as UTF-8 via an HTTP header (i.e., sent before any page data), Google is respecting the specification and declaring UTF-8.
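The precedence described here can be sketched in a few lines of Python. This is a deliberately simplified model, assuming only two inputs (the HTTP Content-Type header and the list of meta declarations in document order); it ignores byte order marks and the other steps a real browser performs, and the function names are invented for the example.

```python
from email.message import Message


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if the header has no charset.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def effective_charset(http_content_type, meta_charsets):
    """Simplified precedence: HTTP header first, then the first meta element."""
    if http_content_type:
        header_charset = charset_from_content_type(http_content_type)
        if header_charset:
            return header_charset
    # No charset in the HTTP header: fall back to the first meta declaration.
    return meta_charsets[0] if meta_charsets else None
```

Under this model, a page served with `text/html; charset=UTF-8` and carrying meta declarations of UTF-16 and then UTF-8 resolves to UTF-8, which matches the behaviour observed for the microsoft.com example.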
Thanks for the help. After reading your suggestion, I did some research on HTTP headers and added the following lines to my .htaccess file (the site uses both .htm and .html extensions):
AddType 'text/html; charset=windows-1252' html
AddType 'text/html; charset=windows-1252' htm
My <meta> tag, copied below, is before the <title> tag:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
The conflicting <meta> tag inserted by Google at the top is:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250">
I guess if I got the HTTP header right, it should obviate the <meta> tag conflict.
Adding the charset definition in an HTTP header didn't help. I made the change to my .htaccess file noted above on January 5, but the Google cache retrieved on January 7 at 14:55 GMT contains the same erroneous <meta> tag at the top specifying the windows-1250 charset.
I would move to UTF-8 on a few test pages and see what you get...