What are current best practices for safely handling form input?

         

zollerwagner

10:59 am on Apr 24, 2009 (gmt 0)

10+ Year Member



I've been using ereg with regular expressions to test HTML form input before storing it in a database, but I've read that ereg won't be supported by PHP6. I'd like to be sure that the code I'm writing today will still work five years from now.

I've considered replacing ereg with PHP's filter_var functions.

$filter_text = filter_var($text, FILTER_SANITIZE_STRING);

But there seems to be controversy about whether they are good enough to provide security. Some say that a combination of strip_tags() and htmlentities() is better. Others say that those two are too simple to keep hackers out.

What are the best practices for filtering or cleaning form input now?
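
On the ereg point: the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, while the PCRE functions (preg_match and friends) remain. A minimal sketch of a preg_match-based check follows. The function name is_valid_username and the rule itself (3 to 10 letters, digits, or underscores) are assumptions made up for illustration, not something from this thread.

```php
<?php
// Hypothetical validator: accept only 3-10 letters, digits, or underscores.
// \A and \z anchor the pattern to the whole string, so a trailing newline
// cannot slip through the way it can with ^ and $.
function is_valid_username(string $value): bool
{
    return preg_match('/\A[A-Za-z0-9_]{3,10}\z/', $value) === 1;
}

$ok  = is_valid_username('zoller_w');   // true
$bad = is_valid_username('<script>');   // false
```

Checking against a whitelist pattern like this rejects bad input outright instead of trying to repair it.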

midtempo

2:25 pm on Apr 24, 2009 (gmt 0)

10+ Year Member



don't know about "best practice", but i parse everything through this...


function my_htmlspecialchars($t="") {
// use a negative lookahead to convert & to &amp;, but leave numeric entities such as &#123; alone
$t = preg_replace("/&(?!#[0-9]+;)/s", '&amp;', $t );
$t = str_replace( "<", "&lt;" , $t );
$t = str_replace( ">", "&gt;" , $t );
$t = str_replace( '"', "&quot;", $t );
$t = str_replace( "'", '&#039;', $t );

// and some bad stuff
$t = preg_replace( "/javascript/i" , "j&#097;v&#097;script", $t );
$t = preg_replace( "/alert/i" , "&#097;lert" , $t );
$t = preg_replace( "/about:/i" , "&#097;bout:" , $t );
$t = preg_replace( "/onmouseover/i", "&#111;nmouseover" , $t );
$t = preg_replace( "/onclick/i" , "&#111;nclick" , $t );
$t = preg_replace( "/onload/i" , "&#111;nload" , $t );
$t = preg_replace( "/onsubmit/i" , "&#111;nsubmit" , $t );
$t = preg_replace( "/document\./i" , "&#100;ocument." , $t );

return trim($t);
}

[edited by: eelixduppy at 1:25 am (utc) on April 25, 2009]
[edit reason] disabled smileys [/edit]

Frank_Rizzo

3:12 pm on Apr 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



$filter_text = filter_var($text, FILTER_SANITIZE_STRING);

Works fine but I would also limit the length of the string:

$filter_text = substr(filter_var($text, FILTER_SANITIZE_STRING), 0, nnn);

If the $text is a username and your account creation policy limits usernames to up to 10 chars, then substr for 10 chars. If the $text is a state field, then substr for 2 or 3 chars, etc.

For numeric fields you could put a 1 * multiplier in there.

$filter_number = 1 * filter_var($number, FILTER_SANITIZE_STRING);

If someone does try to enter bad code in the numeric field box it will always return 0.
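
For numeric fields, an alternative to the multiplier trick is a validation filter, which reports bad input as false instead of quietly turning it into 0. Note that FILTER_SANITIZE_STRING itself was deprecated in PHP 8.1. A rough sketch, where the function name parse_quantity and the 0 to 1000 range are assumptions chosen only for illustration:

```php
<?php
// Hypothetical helper: return the integer if $raw is a whole number in range,
// or null if it is anything else (text, markup, out of range).
function parse_quantity(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 1000],
    ]);

    // filter_var() returns false on failure; compare strictly so that a
    // legitimate 0 is not mistaken for a failure.
    return $value === false ? null : $value;
}

$good = parse_quantity('42');        // 42
$zero = parse_quantity('0');         // 0
$junk = parse_quantity('<script>');  // null
```

Returning null for bad input lets the caller tell "the user typed 0" apart from "the user typed garbage".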

zollerwagner

1:03 am on Apr 25, 2009 (gmt 0)

10+ Year Member



Thanks for the input.

I've been trying to combine your approaches. Overall it's working, but if I type in <script or <script> (with or without the terminal > ) it chokes. The server reads everything that follows as if it's code and truncates the screen output.

It does the same if I type in <b or <b>, with less catastrophic results.

I've tried to use this:

$t = preg_replace( "/script/i" , "&#115;cript", $t ); 

to trim or remove the script, but it doesn't help.

Running

$t = filter_var($t, FILTER_SANITIZE_STRING);

first didn't help either.

How do I fix this problem?

I've added this:

if (preg_match("/<script/i", $t)) die("Prohibited input"); 

but it stops the screen display and removes all navigation.

midtempo

2:21 pm on Apr 25, 2009 (gmt 0)

10+ Year Member



this section:

$t = str_replace( "<", "&lt;" , $t );
$t = str_replace( ">", "&gt;" , $t );

should already deal with the mark-up, including your <script> issue.

perhaps if you provide your entire function i can see whether there's anything i can spot?
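
To illustrate the point about < and >: if the value is escaped at the moment it is written into the page, a stray <script is shown as text rather than parsed as markup. A minimal sketch using the built-in htmlspecialchars(); the wrapper name escape_html and the sample input are hypothetical.

```php
<?php
// Hypothetical wrapper: escape a value for use inside HTML text or a
// quoted attribute. ENT_QUOTES also converts single quotes.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$input = '<script>alert(1)</script>';
$html  = '<p>' . escape_html($input) . '</p>';
// $html is now: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

If the page still breaks after this, the value is probably being echoed somewhere else without passing through the escaping step.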

zollerwagner

8:31 am on Apr 26, 2009 (gmt 0)

10+ Year Member



Thanks again, MidTempo.

I've been studying the CMS I'm writing this plugin for (Textpattern) to see how it handles validation. I don't think it does!

They use mysql_real_escape_string before adding input to the database and then use htmlspecialchars when displaying the data later. I don't see anything else.

It appears that the incompatibility arises when I use

$text = filter_var($text, FILTER_SANITIZE_STRING);

That stumped me because I didn't think that FILTER_SANITIZE_STRING encoded ' and ", but apparently it does. Once they get encoded, Textpattern will show the code: &#39;

That's because the htmlspecialchars function translates the & and displays the underlying code rather than the ' glyph. So a name like O'Hara becomes O&#39;Hara.

Well, at least I know what's going on.

Any suggestions?

------

For what it's worth, I also found that typing <script into an input field in Textpattern also breaks the display, so it's not just my script; TXP does it too. The script never has a chance to replace the < or translate "script". But I can at least stop processing with something like this:

if (preg_match("/<script/i", $text)) die("Prohibited input");
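
The O'Hara problem described above is a double-encoding effect: the value is entity-encoded once on input and then again on output. A small sketch that reproduces it with htmlspecialchars() alone (which writes the quote as &#039; rather than the &#39; seen above), plus the double_encode parameter as one possible guard. Whether Textpattern exposes a way to pass that parameter is not something this sketch assumes.

```php
<?php
$name = "O'Hara";

// Encoded once: the quote becomes an entity and the browser shows O'Hara.
$once = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Encoded twice: the & of the entity is itself encoded, so the browser
// shows the entity code literally.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// The fourth argument (double_encode = false) leaves existing entities alone.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);
```

The simpler fix is usually to store the raw value and encode exactly once, at output time.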

dreamcatcher

8:41 am on Apr 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I use a similar method to midtempo, but also run the filter through strip_tags [uk3.php.net] for additional filtering. If HTML isn't allowed in your code it offers a pretty effective defence.

dc

idfer

6:05 pm on Apr 26, 2009 (gmt 0)

10+ Year Member



I'm curious to know how htmlentities() can be hacked. I've done a quick search on the net and also php.net and the only thing i found was a reference to a buffer overflow vulnerability (which was fixed in PHP 5.2?).

For my part, i do this:

- Validate all user input, make sure it doesn't have any strange characters for the type of value in question, make sure it's not longer than the db field size, etc.
- When i build the SQL statement, i wrap all text values with mysql_real_escape_string(), and all numeric values with intval().
- When i output values to HTML, i wrap them with htmlentities(), this is both for user input and values coming from the database.

There's really no danger of storing HTML tags in the database without translation. It's on the output side that you have to be careful, and as long as you pass all data through htmlentities(), you should be fine. In fact, if you translate HTML special characters like < and > before storing the data in the db, you'll have to untranslate them for other types of output like csv files.
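
The store-raw, escape-on-output flow above can be sketched as follows. Two things here are swapped in and are not from the thread: PDO prepared statements take the place of mysql_real_escape_string() (the old mysql extension was removed in PHP 7.0), and an in-memory SQLite database stands in for the real MySQL server so the example runs on its own. The table and column names are made up.

```php
<?php
// Assumption: in-memory SQLite instead of MySQL, purely so this is runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (name TEXT, age INTEGER)');

// Raw user input, stored untranslated.
$name = "O'Hara <b>";
$age  = 42;

// Bound parameters keep the data separate from the SQL text, so no manual
// escaping of quotes is needed.
$insert = $pdo->prepare('INSERT INTO people (name, age) VALUES (:name, :age)');
$insert->bindValue(':name', $name, PDO::PARAM_STR);
$insert->bindValue(':age', $age, PDO::PARAM_INT);
$insert->execute();

$row = $pdo->query('SELECT name, age FROM people')->fetch(PDO::FETCH_ASSOC);

// Escape only at the point of HTML output.
$forHtml = htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8');
```

Because the stored value is still the raw string, the same row can be written to a CSV file or another format without undoing any HTML encoding first.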