I've considered replacing ereg with PHP's filter_var functions:
$filter_text = filter_var($text, FILTER_SANITIZE_STRING);
What are the best practices for filtering or cleaning form input now?
function my_htmlspecialchars($t="") {
// use a negative lookahead to convert & but not existing numeric entities such as &#123;
$t = preg_replace("/&(?!#[0-9]+;)/s", '&amp;', $t );
$t = str_replace( "<", "&lt;" , $t );
$t = str_replace( ">", "&gt;" , $t );
$t = str_replace( '"', "&quot;", $t );
$t = str_replace( "'", '&#039;', $t );
// and some bad stuff
// (the forum display decoded the entities in these replacements; the exact
// entity-encoded letter in each keyword below is an assumed reconstruction)
$t = preg_replace( "/javascript/i" , "j&#097;v&#097;script", $t );
$t = preg_replace( "/alert/i" , "&#097;lert" , $t );
$t = preg_replace( "/about:/i" , "&#097;bout:" , $t );
$t = preg_replace( "/onmouseover/i", "&#111;nmouseover" , $t );
$t = preg_replace( "/onclick/i" , "&#111;nclick" , $t );
$t = preg_replace( "/onload/i" , "&#111;nload" , $t );
$t = preg_replace( "/onsubmit/i" , "&#111;nsubmit" , $t );
$t = preg_replace( "/document\./i" , "doc&#117;ment." , $t );
return trim($t);
}
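For the entity-encoding half of a function like this, PHP's built-in htmlspecialchars() handles the same five characters when called with ENT_QUOTES. A minimal sketch, assuming UTF-8 input; the helper name escape_html is made up for illustration:

```php
<?php
// Hypothetical helper (not from the thread): escape a value for display
// inside HTML text or a quoted attribute. Assumes UTF-8 input.
function escape_html(string $value): string
{
    // ENT_QUOTES encodes both ' and " in addition to &, < and >.
    return htmlspecialchars(trim($value), ENT_QUOTES, 'UTF-8');
}

echo escape_html('<script>alert("hi")</script>'), "\n";
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Unlike the hand-written regex, htmlspecialchars() re-encodes the & of an existing numeric entity unless its fourth argument (double_encode) is set to false.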
[edited by: eelixduppy at 1:25 am (utc) on April 25, 2009]
[edit reason] disabled smileys [/edit]
Works fine but I would also limit the length of the string:
$filter_text = substr(filter_var($text, FILTER_SANITIZE_STRING), 0, nnn);
If $text is a username and your account creation policy limits usernames to 10 characters, then substr to 10 characters. If $text is a state field, substr to 2 or 3 characters, and so on.
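The length-limiting idea can be wrapped in a small helper. This is only a sketch: the name clean_field is hypothetical, and strip_tags() stands in for FILTER_SANITIZE_STRING, which is deprecated as of PHP 8.1. Note that substr() counts bytes, so multibyte input would need a different cut.

```php
<?php
// Hypothetical helper: strip tags, trim whitespace, then cut the result
// to the maximum length the field allows (counted in bytes).
function clean_field(string $text, int $maxLength): string
{
    $text = trim(strip_tags($text));
    return substr($text, 0, $maxLength);
}

echo clean_field('  averyveryverylongusername ', 10), "\n"; // averyveryv
echo clean_field('<b>New York</b>', 2), "\n";               // Ne
```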
For numeric fields you could put a 1 * multiplier in there.
$filter_number = 1 * filter_var($number, FILTER_SANITIZE_STRING);
If someone does try to enter bad code in the numeric field box it will always return 0.
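An alternative to the multiplier, offered as a sketch rather than as what the poster uses, is FILTER_VALIDATE_INT, which lets you tell a genuine 0 apart from rejected input. The helper name parse_int_field is hypothetical:

```php
<?php
// Hypothetical helper: return the integer value, or null when the input
// is not a whole number. filter_var() returns false on failure, so the
// comparison must be strict to keep a legitimate 0.
function parse_int_field($raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

var_dump(parse_int_field('42'));        // int(42)
var_dump(parse_int_field('0'));         // int(0)
var_dump(parse_int_field('<script>'));  // NULL
```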
I've been trying to combine your approaches. Overall it's working, but if I type in <script or <script> (with or without the terminal > ) it chokes. The server reads everything that follows as if it's code and truncates the screen output.
It does the same if I type in <b or <b>, with less catastrophic results.
I've tried to use this:
$t = preg_replace( "/script/i" , "script", $t );
Running
$t = filter_var($t, FILTER_SANITIZE_STRING);
How do I fix this problem?
I've added this:
if (preg_match("/<script/i", $t)) die("Prohibited input");

I've been studying the CMS I'm writing this plugin for (Textpattern) to see how it handles validation. I don't think it does!
They use mysql_real_escape_string before adding input to the database and then use htmlspecialchars when displaying the data later. I don't see anything else.
It appears that the incompatibility arises when I use
$text = filter_var($text, FILTER_SANITIZE_STRING);
That stumped me because I didn't think that FILTER_SANITIZE_STRING encoded ' and ", but apparently it does. Once they get encoded, Textpattern will show the code: &#39;
That's because the htmlspecialchars function translates the & and displays the underlying code rather than the ' glyph. So a name like O'Hara becomes O&#39;Hara.
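The double-encoding effect is easy to reproduce in isolation. The sketch below assumes the stored value already contains the numeric entity for the apostrophe; whether Textpattern's own output call can be changed this way is not something this example establishes.

```php
<?php
// Value as it would be stored after the quote has been entity-encoded.
$stored = 'O&#39;Hara';

// Default behaviour: the & of the existing entity is encoded again,
// so the browser shows the entity code instead of the apostrophe.
$double = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Fourth argument (double_encode) set to false: existing entities are kept.
$single = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // O&amp;#39;Hara
echo $single, "\n"; // O&#39;Hara
```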
Well, at least I know what's going on.
Any suggestions?
------
For what it's worth, I also found that typing <script into an input field in Textpattern also breaks the display, so it's not just my script, it's TXP, too. The script never has a chance to replace the < or translate "script". But I can at least stop processing with something like this:
if (preg_match("/<script/i", $text)) die("Prohibited input");
dc
For my part, I do this:
- Validate all user input: make sure it doesn't have any strange characters for the type of value in question, make sure it's not longer than the db field size, etc.
- When I build the SQL statement, I wrap all text values with mysql_real_escape_string(), and all numeric values with intval().
- When I output values to HTML, I wrap them with htmlentities(). This applies both to user input and to values coming from the database.
There's really no danger of storing HTML tags in the database without translation. It's on the output side that you have to be careful, and as long as you pass all data through htmlentities(), you should be fine. In fact, if you translate HTML special characters like < and > before storing the data in the db, you'll have to untranslate them for other types of output like csv files.
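A small sketch of the escape-on-output idea: the same raw value is translated only for HTML and written untouched to CSV. The variable names are illustrative, and an in-memory stream stands in for a real CSV file.

```php
<?php
// Raw value as it would sit in the database, with no HTML translation.
$raw = "<b>O'Hara</b>";

// HTML output: translate special characters at display time.
$html = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// CSV output: the raw value is written as-is, so nothing has to be
// untranslated. php://memory is used here instead of a file on disk.
$fh = fopen('php://memory', 'w+');
fputcsv($fh, [17, $raw], ',', '"', '\\');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $html, "\n"; // &lt;b&gt;O&#039;Hara&lt;/b&gt;
echo $csv;
```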