XHTML Strict and Textarea Element - a question - HTML forum at WebmasterWorld


XHTML Strict and Textarea Element - a question

Leaving the ampersand as literal text within the element?

ricfink

6:30 am on Nov 29, 2003 (gmt 0)

I often use a hidden textarea element to hold blocks of text containing HTML so that I can access it using the innerText property later on to make changes to the page.

Example:
<textarea style="display: none;"><p>here is some text about this & that</p></textarea>

My question is: does anyone know offhand if this is legal in XHTML Strict? Or do I technically have to use, for instance, &lt; to designate the "less than" characters in the example? (As well as changing the "greater than" characters and the ampersand.)
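For reference, the encoding being asked about can be done mechanically. Below is a minimal JavaScript sketch, not taken from any library (the helper name `escapeText` is hypothetical, and the `rows`/`cols` values are arbitrary), that turns the raw fragment from the question into text that can sit inside a `<textarea>` under XHTML Strict:

```javascript
// Hypothetical helper: escape the characters the question asks about
// (&, < and >) so they are read as text rather than markup.
// The ampersand is replaced first; otherwise the "&" in the "&lt;" and
// "&gt;" produced by the later replacements would be escaped a second time.
function escapeText(raw) {
  return raw
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Usage: rebuild the hidden textarea from the post with escaped content.
// rows and cols are included because the DTD declares them #REQUIRED.
const fragment = "<p>here is some text about this & that</p>";
const markup =
  '<textarea rows="5" cols="40" style="display: none;">' +
  escapeText(fragment) +
  "</textarea>";
console.log(markup);
```

A browser parsing that markup resolves the references again, so the textarea's value is the original fragment.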

iamlost

10:16 pm on Nov 29, 2003 (gmt 0)

I suggest you validate all your code attempts at:

[validator.w3.org]

I tested your code fragment against both XHTML 1.0 Strict and XHTML 1.1. Both returned the following errors:
1. character "&" is the first character of a delimiter but occurred as data
2. required attribute "rows" not specified
3. required attribute "cols" not specified
4. document type does not allow element "p" here

ricfink

12:47 am on Dec 1, 2003 (gmt 0)

Thanks iamlost. I was being lazy because I'm away on vacation.

Actually, I believe the requirement that the ampersand be encoded as &amp; only applies to attribute values.
In other words, if I had a logo image for the accounting firm H & R Block and put the name in an alt attribute, it would have to read alt="H &amp; R Block Logo" in order to be compliant.
At least, that's the way I now remember it.
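For the attribute half of that recollection, here is a minimal JavaScript sketch of escaping a value before it goes into a double-quoted attribute. The helper name `escapeAttr` is hypothetical and `logo.gif` is a made-up file name:

```javascript
// Hypothetical helper: escape a string for a double-quoted attribute value.
// "&" goes first so the "&" in the replacements added afterwards is not
// escaped again; '"' must be escaped or it would close the attribute early.
function escapeAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

const alt = escapeAttr("H & R Block Logo");
console.log(`<img src="logo.gif" alt="${alt}" />`);
```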

However, I'm intrigued by what the heck those "error" messages mean...

1. character "&" is the first character of a delimiter but occurred as data

Obviously the validator doesn't like my ampersand being where it is. However, my understanding is that anything between the textarea tags is to be treated as default text, no &amp; required. In fact, I just realized that what I've just typed proves the point: if the textarea required &amp; instead of &, it would show up as & when I type &amp;, and it doesn't.
So what the heck is the validator barking about?
--------------------------------------------------------

2. required attribute "rows" not specified
3. required attribute "cols" not specified

These two I understand, but I don't think the spec insists on these attributes being present. A nice tip, but not an indication of non-compliance.
--------------------------------------------------------

4. document type does not allow element "p" here

Let's play, "Fool the Validator!" Hey, I understand why the validator is flagging this but the validator is wrong. No reason that I can see why HTML code can't be placed within a textarea tag under XHTML. I wonder what would happen if I put the same p tag within a pre element?

Anyway, thanks.

iamlost

2:09 am on Dec 1, 2003 (gmt 0)

The W3C <textarea> spec requires both "rows" and "cols". Other attributes are declared #IMPLIED (optional):

<textarea rows="20" cols="80"></textarea> is correct minimum.

<p> is not allowed because the textarea is for the user to enter data to be sent to you, not for you to format anything.

Anything being sent back to you is data and that explains the "&" problem.

You may be using <textarea> incorrectly. It is not a "display text" element; it is a "get lengthy user input" element (usually in a form).

ricfink

7:19 am on Dec 2, 2003 (gmt 0)

OK, now you've done it. You made me look up the spec. I said I was on vacation!
OK - rows and cols are required. Granted.

Thanks for trying to teach me about forms, but it's incorrect to say:
"<p> is not allowed because the textarea is for the user to enter data to be sent to you, not for you to format anything."
Wrong. This is what the spec says:

"User agents should use the contents of this element as the initial value of the control and should render this text initially."
In other words, whatever is in between the tags <textarea></textarea> should be rendered. The validator is providing a useful tip but the markup isn't non-compliant.

And the reason why I'm using a textarea element is because it's the simplest element to use when creating a textRange object.

Appreciate the help.

rich

DrDoc

7:33 am on Dec 2, 2003 (gmt 0)

<p> is not text, it's a tag ;)

In other words, whatever is in between the tags <textarea></textarea> should be rendered.

Then, how about this:

What is the browser supposed to do about that?

ricfink

3:28 pm on Dec 2, 2003 (gmt 0)

Whoa, boy. The doctor is in on this too, now...
With a trick question, no less.
OK, let's see how many angels can dance on the head of this pin:

What the browser is supposed to do is exactly what it would do if a user was inputting that text into the textarea element: that is, render the text </textarea> as text, not markup. Just as I am doing now in responding to you. (Before I hit submit, that is.) You can think of it as a macro of sorts.

This is why the textarea element is called a textarea element. That's why it has a mandatory closing tag as opposed to a text input element which has no closing tag.
With a text input element, if you want to show a default bit of text, it has to be written as a value attribute. As in: <input type="text" value="Show this text as default" />
This is also why the W3C spec says what it says, unless you have another explanation. (See previous post.) This is also why the text within a button element - which also requires a mandatory closing tag - can be used to easily create a textrange object, as well. This is why some DHTML tutorial sites I've seen opt for displaying markup within a textarea element - it saves them the hassle of having to encode the tags.

Now, in the case of your example in IE, it incorrectly does not display the text because IE is making a value judgment about the intent of the author. It assumes that the textarea element is being closed twice by mistake, displays the textarea with no default text, and simply ignores the second </textarea> as an unwanted duplication.
If you substitute </p> for </textarea> it works just fine.
It will also work just fine if you hard-code:
&lt;/textarea&gt;

Obligingly, IE automatically will convert it to unencoded text if you view it with the innerText property and leave it encoded if you view it with the innerHTML property.
Which leads me to a DHTML tip: If you put straight unencoded markup within a textarea element, you can use the innerHTML property to encode the tags and ampersands with an absolute minimum of javascript code - no regular expressions necessary, IE will convert it automatically.

Make sense?

DrDoc

4:17 pm on Dec 2, 2003 (gmt 0)

What about the other browsers?

DrDoc

5:02 pm on Dec 2, 2003 (gmt 0)

<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
<!ATTLIST TEXTAREA
%attrs; -- %coreattrs, %i18n, %events --
name CDATA #IMPLIED
rows NUMBER #REQUIRED
cols NUMBER #REQUIRED
disabled (disabled) #IMPLIED -- unavailable in this context --
readonly (readonly) #IMPLIED
tabindex NUMBER #IMPLIED -- position in tabbing order --
accesskey %Character; #IMPLIED -- accessibility key character --
onfocus %Script; #IMPLIED -- the element got the focus --
onblur %Script; #IMPLIED -- the element lost the focus --
onselect %Script; #IMPLIED -- some text was selected --
onchange %Script; #IMPLIED -- the element value was changed --
>

...any character string not containing elements, known as "PCDATA"
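To make the DTD concrete, here is a rough JavaScript approximation (hypothetical, and far cruder than a real SGML or XML validator) that checks a textarea's attributes and raw source content against the two rules above: `rows` and `cols` are `#REQUIRED`, and the content is `#PCDATA`, so it may contain no element tags and no bare `&` that fails to start a reference. Fed the fragment from the first post, it reports the same four problems the validator did:

```javascript
// Hypothetical, simplified check of a textarea against the DTD rules above.
// `attrs` maps attribute names to values; `content` is the raw source text
// between <textarea> and </textarea>.
function checkTextarea(attrs, content) {
  const errors = [];
  // rows and cols are declared #REQUIRED.
  for (const name of ["rows", "cols"]) {
    if (!(name in attrs)) {
      errors.push(`required attribute "${name}" not specified`);
    }
  }
  // In #PCDATA an "&" must begin a reference such as &amp; or &#38;.
  // This pattern is a simplification: it does not check that a named
  // entity is actually defined.
  const bareAmp = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/;
  if (bareAmp.test(content)) {
    errors.push('bare "&" in content');
  }
  // Anything shaped like a start or end tag is an element, not #PCDATA.
  if (/<\/?[A-Za-z]/.test(content)) {
    errors.push("element tags are not allowed in #PCDATA");
  }
  return errors;
}

console.log(checkTextarea({}, "<p>here is some text about this & that</p>"));
console.log(checkTextarea({ rows: "5", cols: "40" }, "&lt;p&gt;this &amp; that&lt;/p&gt;"));
```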

iamlost

7:37 pm on Dec 2, 2003 (gmt 0)

ricfink - On vacation? You need lessons!

Clarification of issues (I hope! ;-):

text input
Authors may create two types of controls that allow users to input text. The INPUT element creates a single-line input control and the TEXTAREA element creates a multi-line input control. In both cases, the input text becomes the control's current value.

One is designed for short input, one for long.

<!ENTITY % InputType
"(TEXT | PASSWORD | CHECKBOX |
RADIO | SUBMIT | RESET |
FILE | HIDDEN | IMAGE | BUTTON)"
>
This attribute specifies the type of control to create. The default value for this attribute is "text".
Creates a single-line text input control.

INPUT is a multipurpose container for short input that can be of varying appearance and use. The input can be a click or a word, visible or not, etc. Note that text is the default type and does not have to be specified as an attribute.

The TEXTAREA element creates a multi-line text input control. User agents should use the contents of this element as the initial value of the control and should render this text initially.

TEXTAREA is a way to get text input that requires more space than a single line provides. Note that all such input is to be rendered as text, i.e. as in a text editor; formatting is not implied.

Setting the readonly attribute allows authors to display unmodifiable text in a TEXTAREA.

INPUT element:

value = cdata [CA]
This attribute specifies the initial value of the control. It is optional except when the type attribute has the value "radio" or "checkbox".

So both elements may also be given a display ability (one of your points, I believe) for unformatted text, as I understand it. This use is something I had forgotten, as I have never had occasion to think of it as a viable display medium. I would be quite interested to hear why you find <textarea> a useful display method.

Now go put in some vacation practice!

ricfink

6:49 am on Dec 3, 2003 (gmt 0)

I'm back in New York and it's very cold. My house in Naples Florida was much preferable.
I'm beginning to think we're going to have to settle this in WebmasterWorld's virtual parking lot.
Step outside, DrDoc will referee.

Seriously, I've never used the textarea element as a display device but I've seen it done - especially on code tutorial sites, like I said - for two reasons:
1) You don't have to go to the trouble of replacing less-than signs, greater-than signs, and ampersands with their encoded equivalents.
2) It's a lot easier for the user to copy and paste out of a textarea box than anything else. (Just right-click, choose "Select All", press Ctrl+C, and you're ready to paste.) In IE you can even script it using the execCommand method.

What I use the textarea element for, in addition to its main use as a form element to get user input, is as a hidden element where I can put raw, unencoded HTML and script code for different purposes.
As just an example: let's say you don't want to navigate away from the page to keep server calls to a minimum. Well, you can hide an entire document's worth of HTML inside a hidden textarea element, grab it using the innerText property, and assign it as the new innerHTML for the entire page. This way, multiple pages can easily be included in a single file.
It's another, quite valid, use for the element. After all, what else could I use to get the same effect as easily? Where could I hide the HTML and still write valid XHTML code?
Hey, it sure isn't the first time an element that was designed for one primary purpose got used for another secondary purpose.
How often does a table element actually get used to display tabular data as opposed to using it to create an onscreen grid for layout purposes?
Why use an image to display text?

The original inventor of the steam engine saw absolutely no other use for it other than to pump water out of coal mines.
Robert Fulton and others had a few other ideas and the rest is history.

Did the inventors of HTML foresee all of the things HTML is being used for today? If everybody got hung up on what the original intentions were, all we would have is technical documents on the web.

alexhudson

11:28 am on Dec 3, 2003 (gmt 0)

You can't do that "validly" in any version of HTML - this isn't an XHTML Strict question (though, doubtless, it would actually work in tag soup mode).

There seems to be some confusion over the difference between a character and an entity. The character & and the reference &amp; are the same thing, just written in different ways. Certain characters (<, >, &, ", etc.) are used for marking the document up: you cannot use them as data, since you would not be able to distinguish between data and markup. So, as you know, you have the literal substitutes. But these do represent the characters themselves: &amp; is not the five characters & + a + m + p + ;, it's the ampersand.
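The point that `&amp;` *is* the ampersand can be shown with a small decoder. This JavaScript sketch is hypothetical and handles only the five predefined XML entities plus numeric character references, not the full HTML entity set; it maps each reference back to the single character it stands for:

```javascript
// Hypothetical decoder for the five predefined XML entities plus numeric
// character references. A real parser does this while it reads the
// document, so a script only ever sees the decoded characters.
const NAMED = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

function decodeReferences(source) {
  return source.replace(/&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);/g, (whole, ref) => {
    if (ref.startsWith("#x")) return String.fromCodePoint(parseInt(ref.slice(2), 16));
    if (ref.startsWith("#")) return String.fromCodePoint(parseInt(ref.slice(1), 10));
    // Unknown named references are left untouched rather than guessed at.
    return NAMED.has(ref) ? NAMED.get(ref) : whole;
  });
}

const decoded = decodeReferences("&amp;");
console.log(decoded, decoded.length); // & 1  (one character, not five)
```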

Think about this:

<textarea id="test" rows="5" cols="40">
&lt;p&gt;This is some &lt;b&gt;styled&lt;/b&gt; html&lt;/p&gt;
</textarea>

<script type="text/javascript">
var area = document.getElementById('test').value;
document.write(area);
</script>

SuzyUK

11:55 am on Dec 3, 2003 (gmt 0)

After all, what else could I use to get the same effect as easily? Where could I hide the HTML and still write valid XHTML code?

would a hidden div do it too?

It will validate with sub-elements inside it (which is the reason textarea doesn't validate). You will still need to encode ampersands, and maybe some other characters that XHTML requires, but the HTML can remain "raw".

and using alexhudson's script example (thank you ;))... this validates

<div id="test2" style="display: none;">
<h1>This is some <em>styled</em> html &amp; more</h1>
</div>
<script type="text/javascript">
var area2 = document.getElementById('test2').innerHTML;
document.write (area2);
</script>

Suzy

alexhudson

2:49 pm on Dec 3, 2003 (gmt 0)

SuzyUK - there is another difference between your example and mine. Because I'm acting on data, the HTML is 'neutral'. Yours is actually markup - so is not neutral. Even though the div could/would be hidden, it will still side-effect the page.

The difference between mark-up and data is crucial here, and I think is what is being lost. If you don't understand the difference you will get odd bugs like this one:

This is a very contrived example (and is easily fixed ;), but shows what is going on. Our clever 'active' navigation is being thwarted. In Mozilla, changing the order of the navholder / subnav divs will make it work. However, this is relying on undefined behaviour.

Also, this page would never validate (for many reasons ;) - by using a mark-up structure rather than a data structure, you need to avoid id conflict (for example), and although div allows many more tags to be contained within it, there are still tags that cannot be contained. It's not actually "fixing" the problem, it's just making it less evident.

DrDoc

4:12 pm on Dec 3, 2003 (gmt 0)

So, use display:none on the hidden div instead of visibility:hidden :)

SuzyUK

4:39 pm on Dec 3, 2003 (gmt 0)

alex, thanks for the explanation

Yours is actually markup - so is not neutral

Yes, I know, but that is my understanding of what ricfink is trying to do: that is, hide HTML markup (as opposed to the data) until it's required? ...but I'm probably wrong!

there are still tags that cannot be contained.

I'm finding it hard to think of any, since you can literally write an entire page inside a containing div, although I'm not sure if ricfink's intention would be to include metadata.

and also (presuming I'm not totally off track!) you could, for example, use an ID on the body element and change the entire content of the page without using a div. I just used <div> as the best-known example, and if it's just sections of the page then divs would make more sense.

I'm not a javascript person, but I'm presuming that the innerHTML property is for retrieving just that.. help ;)

Ricfink is any of this rambling helping? If not please put me out of my misery!

Suzy

ricfink

5:43 pm on Dec 3, 2003 (gmt 0)

OK. I've been up and down the HTML 4.01 spec for about an hour now and it didn't do much for my faith in validators. (Which, by the way, can only cope with what their creators can foresee.)

Ain't nothin' invalid about this:
<textarea cols="20" rows="20"><textarea></textarea></textarea>

It doesn't display as I intend, but that's because the browser is automatically trying to correct what it thinks is an error. And that's understandable and OK. (See previous post.)

Here's the central idea from the HTML 4.01 spec at W3C, emphasis is mine:
"Each control has both an initial value and a current value, both of which are CHARACTER STRINGS. Please consult the definition of each control for information about initial values and possible constraints on values imposed by the control. In general, a control's "initial value" may be specified with the control element's value attribute. However, the initial value of a TEXTAREA element is given by its contents, and the initial value of an OBJECT element in a form is determined by the object implementation (i.e., it lies outside the scope of this specification)."

drbrain

6:00 pm on Dec 3, 2003 (gmt 0)

DrDoc is right: any use of elements or unescaped & is illegal (character entities are legal). PCDATA means PCDATA; if you stick an element in there, it's an error. (See the XHTML 1.1 Forms module [w3.org], line 205.)

In XHTML, you'll get a parse error (when parsed by an XML parser).

In HTML (SGML parser), a compliant UA is not required to do anything with these documents since they are in error. Rendering as if the contents were CDATA is something you cannot depend upon, because the UA is not required to do anything for documents with errors. A browser could "make David Siegel fly out of your nose" and it would be perfectly compliant.

To get the behavior you're looking for, put the contents in a CDATA section (note that most HTML UAs do not have full SGML parsers, so this may or may not work, it should work in every XML browser):

<textarea rows="10" cols="30"><![CDATA[&not_char_entity <b>element</b>]]></textarea>
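If you go the CDATA-section route, the one sequence that cannot appear inside the section is `]]>`, because that ends it. A hypothetical JavaScript helper (not from any library) that wraps arbitrary text safely by splitting any `]]>` across two adjacent sections. As noted above, only an XML parser can be relied on to honour this; most HTML browsers will not:

```javascript
// Hypothetical helper: wrap text in a CDATA section. A literal "]]>" would
// end the section early, so it is split across two adjacent sections;
// an XML parser concatenates them back into the original text.
function wrapCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(wrapCdata("&not_char_entity <b>element</b>"));
console.log(wrapCdata("a]]>b"));
```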

alexhudson

6:03 pm on Dec 3, 2003 (gmt 0)

SuzyUK, there's a difference between hiding HTML and hiding structure. Take these two:

<div>&lt;p&gt;Paragraph one&lt;/p&gt;</div>

<div><p>Paragraph two</p></div>

The first is holding the string "<p>Paragraph one</p>". The second is holding another node whose name is "p", who has no attributes and has a child text node containing the string "Paragraph two". Do you see the difference? innerHTML will serialize that to a string, but it's not actually a string itself, it's a data structure.
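The string-versus-structure distinction can be modelled without a browser. In this JavaScript sketch the node shape (`{ text }` or `{ name, children }`) and the `serialize` function are hypothetical, a toy stand-in for what a browser's serializer does. A text node that merely contains tag-like characters comes out escaped, while a real element node comes out as tags:

```javascript
// Toy DOM: a text node is { text }, an element node is { name, children }.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize a node and its descendants back to markup, roughly what a
// browser returns as outerHTML (innerHTML would omit the outer tags).
function serialize(node) {
  if ("text" in node) return escapeText(node.text);
  const inner = node.children.map(serialize).join("");
  return `<${node.name}>${inner}</${node.name}>`;
}

// First div: one text node whose characters happen to look like a <p> tag.
const divOne = { name: "div", children: [{ text: "<p>Paragraph one</p>" }] };
// Second div: a real p element that contains a text node.
const divTwo = { name: "div", children: [{ name: "p", children: [{ text: "Paragraph two" }] }] };

console.log(serialize(divOne)); // <div>&lt;p&gt;Paragraph one&lt;/p&gt;</div>
console.log(serialize(divTwo)); // <div><p>Paragraph two</p></div>
```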

At the end of the day, though, you're getting back the same string (modulo the number). However, by keeping it as a string rather than a DOM structure you're not having to worry about any side-effects. (DrDoc: if your suggestion of display: none was to stop the side-effects, I don't think that will work. That just prevents the structure being rendered, it doesn't stop the browser acting on the structure).

As a more practical example, what if you want two code fragments? You can't do that structurally and retain validation. Example (again contrived ;):

<div style="display:none;">
<div id="start">&lt;h1&gt;My heading: </div>
<div id="end">&lt;/h1&gt;</div>
</div>

<script type="text/javascript">
// Return the decoded text of the element's first child (a text node).
function str_val (x) {
return document.getElementById(x).firstChild.nodeValue;
}

var start = str_val ('start');
var end = str_val ('end');
document.write (start + "Product page" + end);
</script>

Incidentally, this confusion is the main reason people get so tied up with innerHTML: it's very difficult to use that property with static strings (e.g., innerHTML = '<p>My paragraph</p>';) and keep the page validating, for all the same reasons as above. That's why it's not in the W3C spec, and why we have all these node creation functions: they prevent confusion over what precisely is going on. If you understand serialization, think of innerHTML in those terms and you won't go far wrong. To illustrate the difference, try replacing 'firstChild.nodeValue' above with 'innerHTML'; that might be clearer than all my waffling ;)

alexhudson

6:11 pm on Dec 3, 2003 (gmt 0)

ricfink, that is most definitely invalid. Your example:

<textarea cols="20" rows="20"><textarea></textarea></textarea>

... would be parsed to:

-[node] textarea
¦- [attr] cols=20
¦- [attr] rows=20
¦- [content] ""
¦-[node] textarea
¦--[content] ""

What you want it to parse to is:

-[node] textarea
¦- [attr] cols=20
¦- [attr] rows=20
¦--[content] "<textarea></textarea>"

What information in the above do you think the browser can use to tell which you mean? The answer is there is none: the browser cannot tell the difference between <textarea> the tag and <textarea> the string you want to have as content. That's why <, >, & etc. are not allowed as data.
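To see why the parser has no way to tell the two readings apart, here is a deliberately naive JavaScript tokenizer (hypothetical, nothing like a complete SGML or XML parser) that splits source into tags and text. Given ricfink's raw markup it finds four tags; given the escaped version it finds only the outer two tags, with the inner part as a single text token:

```javascript
// Naive tokenizer: anything shaped like <name ...> or </name> is a tag, and
// everything between tags is text. Real parsers are far more involved, but
// in #PCDATA content a "<" followed by a name starts a tag for them too.
function tokenize(source) {
  const tokens = [];
  const tagPattern = /<\/?[A-Za-z][^>]*>/g;
  let last = 0;
  let match;
  while ((match = tagPattern.exec(source)) !== null) {
    if (match.index > last) {
      tokens.push({ type: "text", value: source.slice(last, match.index) });
    }
    tokens.push({ type: "tag", value: match[0] });
    last = tagPattern.lastIndex;
  }
  if (last < source.length) {
    tokens.push({ type: "text", value: source.slice(last) });
  }
  return tokens;
}

const raw = '<textarea cols="20" rows="20"><textarea></textarea></textarea>';
const escaped = '<textarea cols="20" rows="20">&lt;textarea&gt;&lt;/textarea&gt;</textarea>';
console.log(tokenize(raw).filter((t) => t.type === "tag").length);     // 4
console.log(tokenize(escaped).filter((t) => t.type === "tag").length); // 2
```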

DrDoc

6:42 pm on Dec 3, 2003 (gmt 0)

If in doubt -- we can always contact W3C to get a definite answer.

ricfink

6:02 am on Dec 4, 2003 (gmt 0)

Look, in order for the textarea element to function in the way we've all come to know and love, it has to treat whatever the user types in as raw text. Whether it's markup or not. And it does that. It really can't be PCDATA and still function as it always has. Don't care what the spec says. (Also, HTML 3.2 doesn't list it as PCDATA)
Hence, there's really no way to enforce XHTML when using the element because you can't control what the user is going to enter.
If the user wants to enter the word </textarea> then, at that point, it can't be legit XML. Unless the UA is automatically replacing the markup characters with character references behind the scenes.
If you check the value of the element it says </textarea>
If you check the innerHTML property it says &lt;/textarea&gt;
If you check the innerText property it says </textarea>

I think that, in regard to this element as it is defined in the HTML spec and as it's traditionally been implemented, there is a natural conflict with XML. To bring it into line with XML would mean making it cease doing some of the useful things that it does.

And, heck, it does what I want it to do and I started this thread so what do you say we keep quiet about it, eh?
No point rocking the boat and mucking up a good thing!

ciao

P.S. alexhudson, you keep saying:
"That's why <, >, & etc. are not allowed as data."
If so, then what the heck is happening when I type in <<<&&&>>> into a textarea and that is what I see?
Not allowed as data by what, where? Is not a block of CDATA data?

alexhudson

8:44 am on Dec 4, 2003 (gmt 0)

If so, then what the heck is happening when I type in <<<&&&>>> into a textarea and that is what I see?
Not allowed as data by what, where? Is not a block of CDATA data?

Different subject ;) What you as a user enter into a textarea is completely different from what exists in the textarea in the document.

When I say <, >, & etc. are not allowed, I mean within the document. For example, this:

...is precisely identical to entering '&' in the same textarea as a user. You seem to think that there is a conflict here, but there is none. The document contains a representation of the string, and that is the only valid way to get that string into the textarea. When you type into a textarea as a user, there is no ambiguity over whether it's markup or data; it's all just data. Hence, there is no need to encode the string specially.

There is no conflict between XML and HTML, it's just getting your head around document encodings :) It's like specifying character sets: unless you tell the browser what character set you're using, it can't actually tell what each character is (although it will probably guess ASCII).