Forum Moderators: coopster


Input cleansing


wheelie34

11:38 am on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As with all forms on the web, they eventually get found and abused. If I am expecting the following format from an input, what do I use to check it's valid?

01/03/2009 (a date that the user needs to type in)

I'm getting 2 to 4 submissions a day containing this sort of cr@p: 'hhfgdgdf'

I get the post value to process like this

$startdate = $_POST['startdate'];

I am not asking for the code to be written; I am asking for the method to use (pseudocode, I believe).

Where do I start? Thanks.

trillianjedi

12:09 pm on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use a RegEx to match the pattern.

Here's the pattern I'm using:-

'\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b'

Check it using PHP's preg_match, something like:-

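function validDate($date) {
    // Day (1-31), separator, month (1-12), separator, 2- or 4-digit year.
    // The single quotes inside the string are the regex delimiters.
    $datePatternRegEx = "'\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b'";

    // preg_match returns 1 on a match, 0 on no match
    return (preg_match($datePatternRegEx, $date));
}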

$startdate = $_POST['startdate'];

if (!validDate($startdate)) {
// Define an outcome for failed dates
} else {
// Carry on processing
}

The above RegEx pattern should match:-

d/m/yy
dd/mm/yyyy

It is not intelligent enough to stop bad dates like 31st February getting through.

Should be sufficient to stop a bot though, unless the bot is sending a dd/mm/yyyy format date of course...

If any pipes in the above show up as broken pipes (¦), change them to solid pipes (|). WebmasterWorld changes them when posting. They're meant to be solid.

I'm not a RegEx expert and the pattern above is one I used a while ago for a non-critical application. Have it double-checked by a RegEx expert and don't rely on mine.
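
For the 31st February problem, one option is to follow the pattern check with PHP's built-in checkdate(). This is only a minimal sketch, assuming the form always sends dd/mm/yyyy with slashes and a four-digit year. The function name isRealDate is made up for illustration.

```php
<?php
// Hypothetical helper: accepts only d/m/yyyy or dd/mm/yyyy and rejects impossible dates.
function isRealDate($date) {
    // ^ and \z force the whole input to be the date, so junk around it fails.
    if (!is_string($date) || !preg_match('#^(\d{1,2})/(\d{1,2})/(\d{4})\z#', $date, $m)) {
        return false;
    }
    // checkdate() takes month, day, year, in that order.
    return checkdate((int) $m[2], (int) $m[1], (int) $m[3]);
}
```

With this, '01/03/2009' passes, while '31/02/2009' and 'hhfgdgdf' are both rejected.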

wheelie34

12:47 pm on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks trillianjedi, I always like to know what everything means as it makes finding problems easier.

'\b(0? \b' # what are these bits for?

[1-9]¦[12][0-9]¦3[01]) # the day part of the date

[- /.] # the separator (does it also check for - or .?)

(0?[1-9]¦1[012]) # the month

[- /.] # separator again

(19¦20)?[0-9]{2} # year part

I know about the broken pipes here, just if someone could explain what each part does in a nutshell

Thanks

rocknbil

3:06 pm on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, any time you put it up to the user to input data, even valid input, you're asking for a headache.

Another angle on this: even though your form probably says "Enter a date in the format MM/DD/YYYY", there are a variety of valid date formats users are familiar with:

YYYY/MM/DD
MM-DD-YY
MM.DD.YY
.....

I'd do this with select lists, so that each field is an individual input. This allows you to control each field. You might think this is more trouble than it's worth, but you can output the select lists as automated loops:

$day_list = '<select name="start_day" id="start_day">';

for ($i = 1; $i <= 31; $i++) {
    // Zero-pad single-digit days so every value is two digits
    $dd = (strlen($i) < 2) ? '0' . $i : $i;
    $day_list .= "<option value=\"$dd\">$dd</option>\n";
}
$day_list .= '</select>';

For year, you can set the start and end of the list with internal variables, or database settings controlled by the client, or base it off the current year ($yearstart = current year - 2, etc.). If you set these, you won't need strlen():

$startyear = 2000;
$endyear = 2020;
$year_list = '<select name="start_year" id="start_year">';
for ($i = $startyear; $i <= $endyear; $i++) {
    $year_list .= "<option value=\"$i\">$i</option>\n";
}
$year_list .= '</select>';

Now that you have input for each of the three values day, month, and year, it simplifies your regexps:

if (
preg_match("/^\d{2}$/",$input['start_day']) and
preg_match("/^\d{2}$/",$input['start_month']) and
preg_match("/^\d{4}$/",$input['start_year'])
) {
// it's all good
}
else { error_func("Invalid date entered"); }

\d = any digit, {2}/{4} = exactly that number of digits. ^ = starts with, $ = ends with. By default they anchor to the start and end of the whole string; with the multiline (m) modifier they mean "line starts/ends with". trillianjedi's solution works too; you could use the word boundary, \b, for both instead. But this does work.
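
To see how \b and the ^/$ anchors behave differently on junk input, here is a small sketch. The input strings are made up for illustration.

```php
<?php
// Hypothetical input: a valid-looking date buried in junk.
$junk = "hhfgdgdf 01/03/2009 hhfgdgdf";

// \b only needs a word boundary on each side, so the buried date still matches.
$bounded = '#\b\d{2}/\d{2}/\d{4}\b#';

// ^ and $ (without the m modifier) tie the pattern to the whole string.
$anchored = '#^\d{2}/\d{2}/\d{4}$#';

var_dump(preg_match($bounded, $junk));          // int(1)
var_dump(preg_match($anchored, $junk));         // int(0)
var_dump(preg_match($anchored, "01/03/2009"));  // int(1)
```

So if the aim is to reject anything that is not purely a date, the anchored form is the stricter choice.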

You might say, well then I have to recode how I accept the date field - you don't. If your current field is "startdate",

if (
preg_match("/^\d{2}$/",$input['start_day']) and
preg_match("/^\d{2}$/",$input['start_month']) and
preg_match("/^\d{4}$/",$input['start_year'])
) {
$input['startdate'] = "{$input['start_year']}-{$input['start_month']}-{$input['start_day']}";
.....
}

if (! isset($input['startdate'])) { error_func("invalid date"); }

... so now "startdate" is an internal program variable, not a result of user input, and it's "safe." Being so, it's easy to just see if it's set, because if your three date fields don't pass, it won't be set.

You should be able to plug all this right in above wherever you enter the date into your database.
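
Pulling those pieces together, a rough sketch of the whole check might look like the following. It assumes $input is the array of submitted form values used above, and it adds PHP's checkdate() so that digit-only but impossible dates are refused as well. buildStartDate is a made-up name.

```php
<?php
// Hypothetical helper: returns "YYYY-MM-DD" or null if any field is missing or invalid.
function buildStartDate(array $input) {
    // Field name => exact number of digits expected.
    $fields = array('start_day' => 2, 'start_month' => 2, 'start_year' => 4);

    foreach ($fields as $field => $digits) {
        if (!isset($input[$field]) || !is_string($input[$field])
            || !preg_match('/^\d{' . $digits . '}\z/', $input[$field])) {
            return null;
        }
    }

    // checkdate(month, day, year) rejects combinations such as 31 February.
    if (!checkdate((int) $input['start_month'], (int) $input['start_day'], (int) $input['start_year'])) {
        return null;
    }

    return "{$input['start_year']}-{$input['start_month']}-{$input['start_day']}";
}
```

The caller then only has to test for null before touching the database.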

So you've done five things:

- Created a reusable chunk of code for date lists (create a function or class that accepts the list name, then returns the three lists - you could use it for startdate, enddate . . . anywhere in your programming.)
- Removed "startdate" from user input entirely.
- Eliminated the possibility of user error or junk input on this field.
- Simplified your validation tasks on this field.
- Adhered to one of the fundamental rules of web apps: accept only valid, expected data and throw everything else away, rather than trying to guess at "bad data" and block it.
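
As a rough illustration of the first point, the reusable chunk could be a single function that takes the list name prefix and returns the three lists. The function name, parameters and year range here are assumptions, not a fixed recipe.

```php
<?php
// Hypothetical helper: returns day, month and year <select> markup for a given prefix.
function dateSelectLists($prefix, $startYear, $endYear) {
    $ranges = array(
        'day'   => array(1, 31),
        'month' => array(1, 12),
        'year'  => array($startYear, $endYear),
    );

    $lists = array();
    foreach ($ranges as $part => $range) {
        $name = htmlspecialchars($prefix . '_' . $part);
        $html = "<select name=\"$name\" id=\"$name\">\n";
        for ($i = $range[0]; $i <= $range[1]; $i++) {
            // Zero-pad to two digits; four-digit years come out unchanged.
            $value = sprintf('%02d', $i);
            $html .= "<option value=\"$value\">$value</option>\n";
        }
        $lists[$part] = $html . "</select>\n";
    }
    return $lists;
}

// Usage: years from two before the current year to five after (an arbitrary choice).
$thisYear = (int) date('Y');
$lists = dateSelectLists('start', $thisYear - 2, $thisYear + 5);
echo $lists['day'], $lists['month'], $lists['year'];
```

Calling it with 'end' instead of 'start' would give the matching lists for an end date.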

wheelie34

3:48 pm on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks rocknbil. I do have dynamic date fields on my personal sites, but this is a form on a customer's site, on an html page where I can't get access to the htaccess file, so I created a form handler.php and removed their formmail.cgi. They have one of those fancy calendars that pop up, and the date gets put in in the format in my opening post. They say they have never had any different date format, since everyone uses the calendar. I did explain about javascript being turned off, but it didn't faze them: "we want the calendar to stay".

I am just reading up on regular expressions and testing tj's example.