Perl Server Side CGI Scripting Forum

    
Can't save special chars from <textarea> to a file
&spades; gets saved as a question mark
MichaelBluejay




msg:439195
 3:07 am on Mar 5, 2006 (gmt 0)

I wrote a script to let me edit the content of my website. I click the Edit link on any page, which reads in the page from the file and sticks it into a <textarea>. I make the changes, click Save, and the script saves the text back to the file. Works great in IE and Firefox, but Safari (which I prefer) is giving me a weird problem:

The pages contain special character codes like &spades;. The <textarea> renders them as the actual character (the spade symbol), not the code for the character itself. That doesn't really bother me, and if I do a View Source then I see the data in the <textarea> is indeed listed as &spades;. But when I save the <textarea> back to the server, the &spades; has changed into a question mark.

I'd rather not test for &spades; specifically because I'm using a bunch of different character codes and it would be silly to test for dozens of them. I'm sure there's a more elegant solution. Any ideas?

 

DrDoc




msg:439196
 5:54 am on Mar 5, 2006 (gmt 0)

The problem is that the browser does not send "&spades;", but instead sends the spade character itself. Not quite sure what can be done about that. Perhaps use JS to change the contents of the textarea before submitting?
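If you'd rather handle it on the server than in JS, here is a minimal sketch of one option: turn every non-ASCII character in the submitted text back into a numeric character reference before saving. It assumes the form arrives as UTF-8, which is only an assumption about your setup, and to_numeric_refs() is a made-up helper name. It produces &#9824; rather than the original &spades;, and it can't help if the browser has already replaced the character with a question mark.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace each non-ASCII character with a numeric
# character reference. Assumes $bytes holds UTF-8 encoded form data.
sub to_numeric_refs {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# "\xE2\x99\xA0" is the UTF-8 encoding of U+2660, the spade symbol.
my $submitted = "Ace of \xE2\x99\xA0";
print to_numeric_refs($submitted), "\n";   # prints "Ace of &#9824;"
```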

MichaelBluejay




msg:439197
 10:49 am on Mar 5, 2006 (gmt 0)

Sure, I could change the data with Javascript before submitting, though it would be easier to change the data with Perl when pulling the file from the disk the first time. The problem is I don't see how to do this without having to step through all possible &#*$!; character codes one at a time. :(

Damn Safari.

perl_diver




msg:439198
 3:28 pm on Mar 5, 2006 (gmt 0)

Maybe try converting the '&' symbol to &amp; when the text gets loaded into the textarea field.

pinterface




msg:439199
 4:47 am on Mar 6, 2006 (gmt 0)

If the problem is what I expect, it's larger than just getting actual characters rather than their entity-encoded counterparts. What happens if "</textarea>" appears in the middle of the file you want to edit? Does the latter half of the file seem to disappear from the text box?

To avoid such issues, you want to run an html-escaping function on the data before sending it to the browser. e.g.,


sub etags {
    my $text = shift;
    # Escape & first, so the entities added below are not escaped again.
    $text =~ s/&/&amp;/gs;
    $text =~ s/</&lt;/gs;
    $text =~ s/>/&gt;/gs;
    return $text;
}
# ...
print "<textarea>", etags($file_contents), "</textarea>";

Then things like "&spades;" should show up in the page source as "&amp;spades;", in the textarea as "&spades;" and get sent back to your script as "&spades;".
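A rough sketch of that round trip, for checking it without a browser. browser_decode() is a hypothetical stand-in for what the browser does when it parses the textarea; it only handles the three entities etags() produces, and the sample text is made up.

```perl
use strict;
use warnings;

# Same escaping as etags() above.
sub etags {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical stand-in for the browser: decode the three entities,
# with &amp; last so that "&amp;lt;" is not decoded twice.
sub browser_decode {
    my $text = shift;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;
    return $text;
}

my $file_contents = 'Ace of &spades; <a href="page.cgi?a=1&b=2">link</a>';
my $page_source   = etags($file_contents);   # what View Source would show
my $sent_back     = browser_decode($page_source);

print "$page_source\n";
print $sent_back eq $file_contents ? "round trip ok\n" : "round trip FAILED\n";
```

The second line of output should read "round trip ok": the text that comes back is byte-for-byte what was in the file, entity codes included.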

perl_diver




msg:439200
 10:02 pm on Mar 6, 2006 (gmt 0)

That's a good suggestion, pinterface.

MichaelBluejay




msg:439201
 10:41 am on Mar 8, 2006 (gmt 0)

Thanks, Pinterface. My solution to the <textarea> problem had been to replace <textarea> with <textareatag>, then before I saved the file I converted <textareatag> back to <textarea>. That part at least worked well.

The problem with converting all &'s into &amp;'s is that my partner uses a different browser and if I do that then when he edits a file it will contain code like: <a href="http://example.com?&amp;value=1">link</a>.

I guess I can do browser-sniffing, and only replace the ampersands if the browser is Safari. I've always just tried to avoid that, but I guess there's no way around it in this case.

moltar




msg:439202
 11:28 am on Mar 8, 2006 (gmt 0)

If you use CGI.pm, then there is an escapeHTML() method [search.cpan.org] that will do the conversion for you.

> Thanks, Pinterface. My solution to the <textarea> problem had been to replace <textarea> with <textareatag>, then before I saved the file I converted <textareatag> back to <textarea>. That part at least worked well.

Where did you get that from? I've never seen <textareatag> before. I don't think it's a valid tag...

> The problem with converting all &'s into &amp;'s is that my partner uses a different browser and if I do that then when he edits a file it will contain code like: <a href="http://example.com?&amp;value=1">link</a>.

That IS a proper way to write links. Many people neglect it and just use & without escaping it, but it's not right.

> I guess I can do browser-sniffing, and only replace the ampersands if the browser is Safari. I've always just tried to avoid that, but I guess there's no way around it in this case.

No need. Just escape everything and it will work fine. The reason for this is that <textarea> has a weird side effect that confuses everyone. If you put raw HTML into it, it displays it fine, when really it shouldn't.

Think about it. What if the page you want to edit had a textarea in the code? You read the file and place it into the textarea on your editing screen. Then the textarea from the page you want to edit will close the textarea tag earlier than needed... That's why you need to escape all the tags, so that the browser does not treat them literally. All browsers un-escape all the tags back when they send the data to the server, so you shouldn't worry about that.
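To make the early-close problem concrete, here is a small sketch. textarea_content() is a hypothetical stand-in for the browser's parser (it takes everything up to the first closing tag), and the sample page is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the browser: the textarea's content is everything
# between <textarea> and the first </textarea> that follows it.
sub textarea_content {
    my ($html) = @_;
    my ($content) = $html =~ m{<textarea>(.*?)</textarea>}s;
    return $content;
}

# A page that itself contains a textarea.
my $page = "<p>Intro</p>\n<textarea>old form</textarea>\n<p>Outro</p>";

# Unescaped: the page's own </textarea> closes the editing box early.
my $raw = "<textarea>$page</textarea>";
print length(textarea_content($raw)), " of ", length($page), " characters survive\n";

# Escaped: no literal tags remain inside the box, so nothing is cut off.
(my $escaped = $page) =~ s/&/&amp;/g;
$escaped =~ s/</&lt;/g;
$escaped =~ s/>/&gt;/g;
my $safe = "<textarea>$escaped</textarea>";
print textarea_content($safe) eq $escaped ? "escaped copy intact\n" : "escaped copy cut off\n";
```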

MichaelBluejay




msg:439203
 8:13 am on Mar 10, 2006 (gmt 0)

Moltar, I think you're missing the points.

1. Of course <textareatag> isn't a valid tag. It's what I use internally in order to get the code for the textareas to show up in my editing box. Like I said, I change it back to <textarea> when saving it.

2. A url with "&amp;" instead of "&" will definitely, definitely break. There are many other things that break, for example, a Javascript statement like: "if (a &amp;&amp; b)...".

3. Regarding your "no need" comment, I don't think I can explain it better than I already have. You might want to have a look at my description of the problem again.

moltar




msg:439204
 12:46 pm on Mar 10, 2006 (gmt 0)

> 1. Of course <textareatag> isn't a valid tag. It's what I use internally in order to get the code for the textareas to show up in my editing box. Like I said, I change it back to <textarea> when saving it.

Sure, but you are just making your life harder.

> 2. A url with "&amp;" instead of "&" will definitely, definitely break. There are many other things that break, for example, a Javascript statement like: "if (a &amp;&amp; b)...".

URLs with &amp; do work and it's encouraged to use them to avoid confusion with the start of another entity.

JS will not break.

> 3. Regarding your "no need" comment, I don't think I can explain it better than I already have. You might want to have a look at my description of the problem again.

I think I understand it fully. It's a trivial problem. Here is my solution in Perl. If you want better formatted code, sticky me with your email address and I'll send it to you.

Once you run the code and it loads the file into the textarea, view the source and notice that all the <>& characters were escaped automatically by the CGI module.

Perl Script

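#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

# File to edit; hard-coded for this test.
my $file = 'index.html';

my $q = CGI->new;

# Save the submitted text, if any. Assigning to a scalar first avoids
# calling param() in list context.
my $content = $q->param('file');
write_file($file, $content) if defined $content && length $content;

# textarea() HTML-escapes its value, so the file contents go in as-is.
print
    $q->header,
    $q->start_html($file),
    $q->h1($file),
    $q->start_form,
    $q->textarea('file', scalar read_file($file), 10, 80),
    $q->submit,
    $q->end_form,
    $q->end_html;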

Example HTML file I used to test the script

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />

<title>Form Test</title>
<script type="text/javascript">
if (a && b);
</script>
</head>

<body>
<div id="container">

<p>We've got tags and special characters such as &amp;, &spades;, and &nbsp; ...</p>

<p>Also we <a href="http://www.google.com/search?oe=UTF-8&amp;q=learning+perl">have links</a> that contain &amp; signs...</p>

<p><a href="http://www.w3.org/TR/REC-html40/charset.html#h-5.3.2">Authors should use "&amp;amp;"</a> (ASCII decimal 38) instead of "&amp;" to avoid confusion with the beginning of a character reference (entity reference open delimiter).</p>

</div>
</body>
</html>

MichaelBluejay




msg:439205
 9:29 am on Mar 13, 2006 (gmt 0)

Okay, moltar, you're completely right about this. My apologies for misunderstanding.

After some testing, I found that the magic difference between your code and mine was that you printed the form input with CGI (i.e., print $q->textarea rather than $q->textarea('file', scalar read_file($file))). I guess that CGI takes care of all the conversions. I made that change to my file editor and now it appears to work properly cross-platform, and I don't have to screw around with converting back and forth between <textarea> and <textareatag>.

Thanks very much for your help, and your patience.

moltar




msg:439206
 12:26 pm on Mar 13, 2006 (gmt 0)

That's why I suggested in my earlier post to use escapeHTML(). It's called internally by default if you use the built-in form generating functions. Otherwise, if you want to generate all the HTML yourself, just call escapeHTML(), supply your "dirty" HTML code as an argument, and it will return "clean", escaped code.
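For the hand-written case, a minimal sketch might look like this. It assumes CGI.pm is installed; the sample text is made up, and the field name "file" is just borrowed from the script earlier in the thread.

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new;

# Made-up file contents; the real script would read these from disk.
my $dirty = '<p>&spades; and if (a && b)</p>';

# escapeHTML() turns &, < and > (and quotes) into entities.
my $clean = $q->escapeHTML($dirty);

# Hand-written markup around the escaped text.
print qq{<textarea name="file" rows="10" cols="80">$clean</textarea>\n};
```

View the source of the result and the contents should read &lt;p&gt;&amp;spades; and so on, while the textarea itself shows the original text.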

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved