stripping html tags

Forum Moderators: coopster

rokec

8:22 pm on Sep 28, 2006 (gmt 0)

I know there is strip_tags() for stripping all the tags, but I want to keep the strong, i, a and img tags. How can I strip all tags except these?

Thanks.

ahmedtheking

8:51 pm on Sep 28, 2006 (gmt 0)

Have a read: [uk.php.net...]

Especially this bit:

<?php
$text = '<p>Test paragraph.</p> Other text';
echo strip_tags($text);
echo "\n";

// Allow <p>
echo strip_tags($text, '<p>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> Other text

rokec

2:16 pm on Sep 30, 2006 (gmt 0)

Yes, but I want to allow many tags, not just p.

lmo4103

3:11 pm on Sep 30, 2006 (gmt 0)

str_replace or preg_replace maybe?

rokec

4:09 pm on Sep 30, 2006 (gmt 0)

I don't know a way to solve this with str_replace() or preg_match().

lmo4103

4:19 pm on Sep 30, 2006 (gmt 0)

I copied this from the PHP manual page for preg_replace, with no warranty:

// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
"'<[\/\!]*?[^<>]*?>'si", // Strip out html tags
"'([\r\n])[\s]+'", // Strip out white space
"'&(quot|#34);'i", // Replace html entities
"'&(amp|#38);'i",
"'&(lt|#60);'i",
"'&(gt|#62);'i",
"'&(nbsp|#160);'i",
"'&(iexcl|#161);'i",
"'&(cent|#162);'i",
"'&(pound|#163);'i",
"'&(copy|#169);'i",
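); // numeric entities (&#NNN;) are handled by the callback below
$replace = array ("",
"",
"\\1",
"\"",
"&",
"<",
">",
" ",
chr(161),
chr(162),
chr(163),
chr(169));
$text = preg_replace ($search, $replace, $document);
// The manual's example used the /e modifier here ("'&#(\d+);'e" => "chr(\\1)").
// /e was removed in PHP 7, so a callback does the same job.
$text = preg_replace_callback ("'&#(\d+);'", function ($m) { return chr((int) $m[1]); }, $text);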

It would have to be modified to suit.
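One way the pattern could be modified to suit, as a rough sketch only: a negative lookahead lets preg_replace skip the tags you want to keep. The helper name strip_tags_except() and the sample input are made up for illustration. It does not look at attributes, and a stray "<" in ordinary text can confuse it.

```php
<?php
// Hypothetical helper: remove every tag whose name is not listed in $keep.
function strip_tags_except(string $html, array $keep): string
{
    // Build an alternation such as "strong|i|a|img", escaping each name.
    $names = implode('|', array_map(function ($n) {
        return preg_quote($n, '~');
    }, $keep));

    // Match "<", refuse if an optional "/" plus a kept tag name follows,
    // then consume everything up to the next ">".
    $pattern = '~<(?!/?(?:' . $names . ')\b)[^>]*>~i';

    return preg_replace($pattern, '', $html);
}

$out = strip_tags_except(
    '<p>Hi <strong>x</strong> <img src="a.png"><iframe></iframe></p>',
    array('strong', 'i', 'a', 'img')
);
echo $out, "\n";
// Hi <strong>x</strong> <img src="a.png">
```

The \b after the tag name is what stops "i" from also letting <iframe> through.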

[edited by: lmo4103 at 4:23 pm (utc) on Sep. 30, 2006]

jatar_k

4:21 pm on Sep 30, 2006 (gmt 0)

Take a read through the user comments on the strip_tags [php.net] page.

ahmedtheking

5:11 pm on Sep 30, 2006 (gmt 0)

All you do is add:

strip_tags($string, "<p><br><tagyouwannakeep>"); That's it!
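Applied to the four tags from the original question, that might look like the sketch below. The sample input is made up for illustration.

```php
<?php
// Made-up sample input.
$input = '<p>Hello <strong>world</strong>, <a href="/x">a link</a><script>alert(1)</script></p>';

// Second argument: the tags to keep, written one after another.
$clean = strip_tags($input, '<strong><i><a><img>');

echo $clean, "\n";
// Hello <strong>world</strong>, <a href="/x">a link</a>alert(1)
```

Note that strip_tags() removes the <script> tags themselves but leaves the text that was between them.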

rokec

5:44 pm on Sep 30, 2006 (gmt 0)

Thank you, ahmedtheking, that is exactly what I meant.

cyt0plasm

4:51 am on Oct 4, 2006 (gmt 0)

Do be aware strip_tags has some _big_ issues. Depending on the version, you can put tags inside of tags, and have only the inner one removed.

An even bigger issue exists with event-handler attributes such as onMouseOver and onClick: you can still sneak a lot of javascript into a page protected by strip_tags, because it leaves the attributes of allowed tags untouched. Then there are other XSS methods involving CSS abuse and other tricks.
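A small illustration of the attribute problem, with a made-up input string: when <a> is on the allowed list, strip_tags() returns the tag exactly as it came in, handler included.

```php
<?php
// Made-up hostile input: an allowed tag carrying an event handler.
$input = '<a href="#" onmouseover="alert(1)">hover me</a>';

$clean = strip_tags($input, '<a>');

// The allowed tag is passed through untouched, attributes and all.
var_dump($clean === $input); // bool(true)
```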

Depending on the use, you might want to consider a bbcode library instead, or even writing a simple html->bbcode converter, then stripping all tags after it's run, _then_ running it through htmlspecialchars and back through bbcode->html again.

It's a pain, but it's really difficult to prevent cross-site scripting. By doing a lossy conversion to bbcode, you ensure that anything not matching your regex gets stripped in the strip_tags() stage. Anything remaining is converted to harmless HTML entities by htmlspecialchars(), and the formatting is put back through your regex, which you know produces clean HTML.
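The round trip described above could be sketched roughly like this. The function names are hypothetical, and only <strong> and <i> are handled to keep it short. Tags that carry attributes, such as a and img, would need their URLs validated as well, which this sketch leaves out.

```php
<?php
// Hypothetical helper: convert only bare, attribute-free tags to bbcode.
// Anything that does not match exactly is left for strip_tags() to remove.
function html_to_bbcode(string $html): string
{
    $html = preg_replace('~<strong>(.*?)</strong>~is', '[b]$1[/b]', $html);
    $html = preg_replace('~<i>(.*?)</i>~is', '[i]$1[/i]', $html);
    return $html;
}

// Hypothetical helper: turn the bbcode back into known-good HTML.
function bbcode_to_html(string $text): string
{
    $text = preg_replace('~\[b\](.*?)\[/b\]~is', '<strong>$1</strong>', $text);
    $text = preg_replace('~\[i\](.*?)\[/i\]~is', '<i>$1</i>', $text);
    return $text;
}

function clean_html(string $dirty): string
{
    $bb = html_to_bbcode($dirty);                     // 1. lossy conversion to bbcode
    $bb = strip_tags($bb);                            // 2. drop every remaining tag
    $bb = htmlspecialchars($bb, ENT_QUOTES, 'UTF-8'); // 3. neutralise what is left
    return bbcode_to_html($bb);                       // 4. rebuild the formatting
}

echo clean_html('<strong onclick="evil()">Hi</strong> <i>there</i> <script>x()</script> & co'), "\n";
// Hi <i>there</i> x() &amp; co
```

The <strong> with the onclick handler never matches the first regex, so it is simply stripped, while the plain <i> survives the round trip.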