Just checked my Webmaster Tools, and I saw that all my site's content is listed as US-ASCII... Surprise, surprise.
It's a fully valid XHTML 1.0 Transitional site, at least according to the W3C validator.
But my header has utf-8 in lower case, and Google itself has UTF-8 in capitals.
Who is to blame here? And would there be any side effects? Judging by the traffic Google sends, not at the moment...
I saw that all my site's content is listed as US-ASCII
What does your server header say for the character encoding? Not the meta tag in the HTML source code, but the HTTP header sent by the server. The two should agree, but when they don't, the HTTP header takes priority.
As far as case goes, I'm not aware that it's an issue - I see it both ways in server headers, but Google probably uses upper case as the standard in its reporting.
Checked a few other sites, on the same host and on another, and they send the same thing: just text/html.
And for those, Webmaster Tools reports the correct character set - windows-1252 in that case (declared in the head of the page itself).
I think it's a bug - a real Google bug. Either the bot doesn't understand lower-case utf-8, or Webmaster Tools doesn't report it correctly.
Unless lower-case utf-8 is not a valid charset label.
In that case, the W3C validator has a bug.
Anyway, I changed things to upper case.
Would be nice to know whether other webmasters see the same.
Get those server response headers in shape, correct the HTML meta-tags if you use them, and then see if this "case error" goes away.
For Apache, see AddType and AddCharset in mod_mime.
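A rough sketch of what that could look like in an .htaccess file or vhost config (the UTF-8 value here is just an assumption - use whatever encoding your pages are actually saved in):

```apache
# Send "Content-Type: text/html; charset=UTF-8" for .html files (mod_mime)
AddType text/html .html
AddCharset UTF-8 .html

# Or set a site-wide default charset for text/plain and text/html (core directive)
AddDefaultCharset UTF-8
```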
Jim
Checked a few more sites: even ones with windows-1252 are also listed entirely as US-ASCII.
A phpBB3 forum where ALL of the pages are in UTF-8 (upper case): there Webmaster Tools reports about 70% US-ASCII, 29% UTF-8, and 1% unknown...
Am I seeing ghosts?
[edited by: Gede at 10:36 pm (utc) on Aug. 22, 2008]
application/xhtml+xml
I hate to correct jdMorgan, but neither Internet Explorer nor Googlebot understands this MIME type, so you should absolutely stick to text/html even if you are using XHTML. You may be right about this being a bug, as lower-case (utf-8) should be accepted. However, I haven't done any analysis, so I can't say this with any degree of certainty. What's more, I always use upper case, so I don't have a test case to compare.
Is your content really UTF-8, in that you are using at least some multi-byte characters, or are you using English with nothing outside the US-ASCII range?
Googlebot has no problems indexing the site, as far as I can see, and the site does well enough.
I have asked the host to make changes so the site is served correctly, and if they won't, I'll set the header myself with PHP.
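For reference, setting it from PHP is a one-liner - a minimal sketch, assuming UTF-8 is the encoding the pages actually use (`header()` must run before any output is sent):

```php
<?php
// Must be called before any output reaches the browser;
// this overrides the server's default Content-Type header.
header('Content-Type: text/html; charset=UTF-8');
```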