Forum Moderators: Robert Charlton & goodroi
French site encoded in ISO-8859-1 is seen as 100% text/html, ISO-8859-1 (Latin-1).
Chinese site encoded in GB2312 is seen as 100% text/html, GB (Simplified Chinese).
Chinese site encoded in Big5 is seen as 100% text/html, Big5 (Chinese).
But the English language sites are listed as either 100% US-ASCII, or mainly US-ASCII with only a few pages in ISO-8859-1.
I am at a loss to understand this or what to do about it. Does this have any implications for the way Google indexes these sites?
All sites use the same doctype format.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Language" content="en" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
For non-English language sites I change the encoding, etc., in the usual way. For example my French site has:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">
<head>
<meta http-equiv="Content-Language" content="fr" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Any suggestions welcome.
In the caches of my non-English pages both meta tags are identical. I.e., in cached French pages both meta tags are "charset=ISO-8859-1", and in cached Chinese pages both are "charset=Big5" or "charset=GB2312".
But in the English pages the two meta tags are different.
The meta tag prepended by Google is: <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">.
Below the Google stuff is my doc type and meta tag: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />.
This has left me more confused. I can understand why Google would see the encoding of the Chinese pages correctly even without my meta tag, because the text is actually created in that encoding. But the French pages are created with Notepad on the same PC and keyboard as the English pages. The only difference is that I occasionally toggle between FR and EN on the MS language interface to produce characters with accents.
I am not enough of an expert on encoding to understand why this should be, or more importantly, whether I should worry about it. :(
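One possible explanation, offered as a guess rather than anything Google has confirmed: US-ASCII is a subset of ISO-8859-1, so a page with no accented characters contains only bytes below 0x80 and is valid in both. A detector that looks at the bytes rather than the meta tag could then report the narrower label. Here is a minimal Python sketch of that idea. The helper names `declared_charsets` and `is_pure_ascii` and the sample pages are made up for illustration, and the regex is a simplification, not a real HTML parser.

```python
import re

# Matches charset=... inside any <meta> tag (simplified; not a full HTML parser).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)

def declared_charsets(html: bytes):
    """Return every charset named in a <meta> tag, in document order."""
    return [name.decode("ascii").lower() for name in META_CHARSET.findall(html)]

def is_pure_ascii(raw: bytes) -> bool:
    """True when no byte is above 0x7F, i.e. the page is also valid US-ASCII."""
    return all(b < 0x80 for b in raw)

# Hypothetical sample pages: same meta tag, different body text.
META = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />'
english_page = (META + "<p>Welcome to my site</p>").encode("iso-8859-1")
french_page = (META + "<p>Bienvenue, site pr\u00e9f\u00e9r\u00e9</p>").encode("iso-8859-1")

for label, page in (("English", english_page), ("French", french_page)):
    print(label, declared_charsets(page), is_pure_ascii(page))
```

Both sample pages declare iso-8859-1, but only the French one contains bytes outside the ASCII range. If that is what is happening, the US-ASCII label would simply mean "no non-ASCII bytes found" and the pages would decode identically either way.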
I have now deleted it from all, so will have to wait for the next crawl to see whether this was the problem.
[added]
trinorthlighting,
You mentioned you have a similar problem. I notice you also use © in your site's footer. It may be a coincidence, of course. :)
[/added]
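For anyone wanting to check the © theory on their own pages: in ISO-8859-1 a literal © is the single byte 0xA9, which is outside US-ASCII, whereas the entity &copy; is spelled entirely with ASCII characters. A small Python sketch (the footer text is a made-up example, and whether this byte is what changes the reported encoding is only my assumption):

```python
# A literal copyright sign saved as ISO-8859-1 versus the HTML entity.
# "\u00a9" is the copyright sign, written as an escape so this file stays ASCII.
footer_raw = "\u00a9 2006 Example Site".encode("iso-8859-1")
footer_entity = b"&copy; 2006 Example Site"

print(hex(footer_raw[0]))        # value of the byte that encodes the sign
print(footer_raw.isascii())      # does the literal version fit in US-ASCII?
print(footer_entity.isascii())   # does the entity version fit in US-ASCII?
```

bytes.isascii() needs Python 3.7 or later. If the literal version is the only non-ASCII byte on an otherwise English page, swapping it for the entity would leave the page pure ASCII.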
[edited by: HarryM at 1:04 pm (utc) on Oct. 11, 2006]