Specifically, I would like to shorten the tags and put the "&" character in the description.
Is this advisable? And are there any rules regarding characters in the tags?
Thanks in advance
The description property is probably the most important one since it is displayed by some search engines on the SERPs.
The content attribute value may contain CDATA [w3.org] both in HTML4 [w3.org] and XHTML [w3.org].
You should use '&amp;' instead of '&' in the content attribute. Other characters are OK as long as you specify the correct character encoding [w3.org].
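To illustrate the escaping, here is a minimal Python sketch. The function name meta_description is made up for this example; html.escape is from the standard library and replaces '&' with '&amp;' (and, with quote=True, also escapes quote characters so the value stays inside the attribute).

```python
from html import escape


def meta_description(text: str) -> str:
    """Build a description meta tag with the content value escaped.

    meta_description is a hypothetical helper, not part of any spec.
    """
    # quote=True also escapes " and ' so the value cannot break out of content="..."
    return f'<meta name="description" content="{escape(text, quote=True)}">'


print(meta_description("Fish & chips"))
# prints: <meta name="description" content="Fish &amp; chips">
```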
The Content-Type is a field in the HTTP header (14.17 Content-Type) specifying the media type (text/html, image/png) and in its charset parameter the character encoding. The HTTP protocol [ietf.org] does not require that field. In fact it specifies a default character encoding in the absence of the charset parameter.
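As a rough illustration of how the media type and the charset parameter sit in the same header value, the sketch below splits them apart with Python's standard email.message.Message class. The helper name split_content_type is my own, and the header values are made-up examples.

```python
from email.message import Message


def split_content_type(header_value: str):
    """Return (media_type, charset) for a Content-Type header value.

    charset is None when the header carries no charset parameter.
    split_content_type is a hypothetical helper name.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # Both values are returned lowercased by the standard library.
    return msg.get_content_type(), msg.get_content_charset()


print(split_content_type("text/html; charset=ISO-8859-1"))
# prints: ('text/html', 'iso-8859-1')
print(split_content_type("image/png"))
# prints: ('image/png', None)
```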
Whether the HTML spec requires a valid document to specify its character set and encoding is a totally different matter. It must be answered from the HTML spec:
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
- An HTTP "charset" parameter in a "Content-Type" field.
- A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
- The charset attribute set on an element that designates an external resource.
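The priority order quoted above could be sketched roughly as follows. This is only an illustration, not how any real browser does it: the function names, the link_charset parameter (standing in for the charset attribute on the element that pointed to the resource) and the simple regular expressions are all my own assumptions, and the sketch only looks at the first 4096 bytes of the document.

```python
import re
from email.message import Message
from typing import Optional

# Deliberately simple patterns; a real parser would be more careful.
_META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Priority 1: the charset parameter of the HTTP Content-Type field."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(body: bytes) -> Optional[str]:
    """Priority 2: a META declaration with http-equiv set to Content-Type."""
    for tag in _META_TAG.findall(body[:4096]):
        lowered = tag.lower()
        if b"http-equiv" in lowered and b"content-type" in lowered:
            match = _CHARSET.search(tag)
            if match:
                return match.group(1).decode("ascii").lower()
    return None


def pick_charset(content_type: Optional[str], body: bytes,
                 link_charset: Optional[str] = None) -> Optional[str]:
    """Apply the three sources from highest priority to lowest."""
    return (charset_from_header(content_type)
            or charset_from_meta(body)
            or (link_charset.lower() if link_charset else None))


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=windows-1252"></head></html>')
print(pick_charset("text/html; charset=UTF-8", page))  # prints: utf-8
print(pick_charset("text/html", page))                 # prints: windows-1252
```

If none of the three sources names an encoding, pick_charset returns None, and the caller has to guess or fall back to something else.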
In Using Character Encodings [htmlhelp.com] Liam Quinn writes that "[a]n HTML document must specify its character encoding." However, I did not find evidence to support that in the HTML 4.01 spec. In section 5.1 The Document Character Set [w3.org] it says that SGML requires that each application (including HTML) specify its document character set. Character set and encoding are not the same (see the note at the bottom of this post), so I'm a bit baffled by that. To me it seems that you are not required to specify the character encoding, although I would suggest you do if you want your pages to display correctly. Perhaps someone more knowledgeable in these matters can shed some light on that.
If not, does the browser default to one?
The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the "charset" parameter is absent from the "Content-Type" header field. In practice, this recommendation has proved useless because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the "charset" parameter.
Fourth paragraph after the Specifying the character encoding [w3.org] heading.
There is also a section on Using national and special characters in HTML [cs.tut.fi] on Jukka Korpela's excellent IT and communication [cs.tut.fi] website.
Hope this helps
Andreas
------
At least in this context they are not. Evidence: same level section headings in the HTML spec and the following quote:
The document character set, however, does not suffice to allow user agents to correctly interpret HTML documents as they are typically exchanged -- encoded as a sequence of bytes in a file or during a network transmission. User agents must also know the specific character encoding that was used to transform the document character stream into a byte stream.
In RFC 2616 - HTTP Protocol [ietf.org] they are used interchangeably. See the note in section 3.4 Character Sets.
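A small Python sketch may make the distinction more concrete. The character is identified by one number in the character set, but the bytes that represent it depend on the encoding chosen. The character "é" is just my own example.

```python
ch = "\u00e9"  # the character "é" (example chosen for illustration)

# Its position in the document character set is a single number.
code_point = ord(ch)

# The same character becomes different byte sequences under different encodings.
latin1 = ch.encode("iso-8859-1")
utf8 = ch.encode("utf-8")

print(code_point)  # prints: 233
print(latin1)      # prints: b'\xe9'
print(utf8)        # prints: b'\xc3\xa9'
```

A user agent that receives the byte 0xE9 cannot tell which character was meant unless it knows which encoding was used.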