

Basic Form Input Validation in PHP using Regular Expression Matching

A 2 Minute How-To


dmorison

3:41 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does this look familiar?

if (
  (strlen($username) < 3) ||
  (strlen($username) > 15) ||
  (!ctype_alnum($username))
)
{
  // invalid username
}

It's the bog-standard way to check that an input variable (in this case a username that must be between 3 and 15 characters long and contain only letters and numbers) satisfies your application's formatting requirements.

Regular Expressions provide a much neater (although not necessarily any more efficient) way to validate form input against a required pattern and length. This can be achieved by using PHP's ereg function. The function reference can be found at:

[uk.php.net...]

ereg is defined as:

int ereg ( string pattern, string string [, array regs])

...where $pattern is the regular expression we are going to test against, and $string is the item to be tested. We can ignore the third optional parameter.

When performing form input validation, the most important parts of the pattern are the start and end anchors. Without them, ereg succeeds if the pattern matches any part of the input, so "abc!!!" would pass a check for three letters. So, before working on the pattern itself, note that any form input validation pattern must begin with "^" (the start-of-line anchor) and end with "$" (the end-of-line anchor).

So our starting point is always:

"^$"

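To see why the anchors matter, here is a small sketch comparing an anchored and an unanchored pattern. It uses preg_match() rather than ereg(), because ereg() was deprecated in PHP 5.3.0 and removed in PHP 7.0.0; preg_match() takes PCRE patterns, which need delimiters (the "/" characters). It uses "\z" as the end anchor, because in PCRE a plain "$" also matches just before a trailing newline. The seven-digit pattern is borrowed from the customer account example; the function names are made up for illustration.

```php
<?php
// Unanchored vs anchored matching, sketched with preg_match() (PCRE),
// since ereg() no longer exists in PHP 7+.
// "\z" = absolute end of string (PCRE's "$" would also accept a final "\n").

// Unanchored: true if seven digits appear ANYWHERE in the input.
function has_seven_digits_somewhere(string $input): bool
{
    return preg_match('/[0-9]{7}/', $input) === 1;
}

// Anchored: true only if the WHOLE input is exactly seven digits.
function is_exactly_seven_digits(string $input): bool
{
    return preg_match('/^[0-9]{7}\z/', $input) === 1;
}

var_dump(has_seven_digits_somewhere('abc1234567xyz')); // bool(true)
var_dump(is_exactly_seven_digits('abc1234567xyz'));    // bool(false)
var_dump(is_exactly_seven_digits('1234567'));          // bool(true)
```

The unanchored version happily accepts junk around the digits, which is exactly what a validation check must not do.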
What goes between the anchors depends on the input we wish to match. Our $username example must contain only letters and numbers. Within a regular expression, square brackets match any one of the characters listed between them. Rather than listing every character in a range, you can specify the range itself using a hyphen; for example, the upper-case letters A through Z can be matched with "[A-Z]". Similarly, we can use "[a-z]" and "[0-9]".

You can combine several ranges inside one pair of brackets, so to specify the valid characters for our $username example we could use "[A-Za-z0-9]".

So now we have a regular expression that looks like:

"^[A-Za-z0-9]$"

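A quick sketch confirming that a bracket expression on its own matches only one character. It uses preg_match() in place of ereg() (removed in PHP 7.0), with "/" delimiters and "\z" as the end anchor; the function name is hypothetical.

```php
<?php
// "^[A-Za-z0-9]$" in the post's notation: exactly ONE letter or digit.
// Sketched with preg_match(); "\z" anchors to the absolute end of string.
function is_single_alnum(string $input): bool
{
    return preg_match('/^[A-Za-z0-9]\z/', $input) === 1;
}

// Print a verdict for a few sample inputs.
foreach (['a', 'Z', '7', 'ab', '!', ''] as $candidate) {
    printf(
        "%-6s => %s\n",
        var_export($candidate, true),
        is_single_alnum($candidate) ? 'match' : 'no match'
    );
}
```

Only the one-character inputs made of letters or digits match; "ab" fails because nothing yet allows a second character.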
But we're not there yet. The above expression matches only a single character, and our $username has to be between 3 and 15 characters long. Any item within a regular expression may be followed by curly braces containing one or two numbers that indicate how many times the previous item must match. If one number is specified, such as "{3}", the previous item must match exactly 3 times. If two numbers are specified, separated by a comma, the previous item must match at least the first number of times and no more than the second; so for our $username example we can use "{3,15}".

This gives us our final $username regular expression pattern of:

"^[A-Za-z0-9]{3,15}$"

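A sketch that exercises the {3,15} bounds at their edges. As elsewhere it assumes PHP 7+, so preg_match() stands in for ereg() and "\z" is used as the end anchor; USERNAME_PATTERN and is_valid_username() are illustrative names, not part of any library.

```php
<?php
// The final username pattern, converted to PCRE for preg_match().
const USERNAME_PATTERN = '/^[A-Za-z0-9]{3,15}\z/';

function is_valid_username(string $username): bool
{
    return preg_match(USERNAME_PATTERN, $username) === 1;
}

// Boundary cases: input => expected result.
$cases = [
    'ab'                => false, // 2 characters: too short
    'abc'               => true,  // 3 characters: lower bound
    str_repeat('a', 15) => true,  // 15 characters: upper bound
    str_repeat('a', 16) => false, // 16 characters: too long
    'abc_def'           => false, // "_" is not in the bracket expression
];

foreach ($cases as $input => $expected) {
    $actual = is_valid_username((string) $input);
    printf("%-18s %s\n", $input, $actual === $expected ? 'as expected' : 'UNEXPECTED');
}
```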
So to put this into code; the initial example is simply replaced by:

if (!ereg("^[A-Za-z0-9]{3,15}$",$username))
{
// invalid username
}

Notice the "!" operator: ereg returns a true (non-zero) value when the pattern matches, so we negate it to catch invalid input.
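On PHP 7.0 and later ereg() no longer exists (it was deprecated in 5.3.0), so the same check is usually written with preg_match(). A minimal sketch, assuming PCRE semantics: the pattern gains "/" delimiters, and "\z" replaces "$" because PCRE's "$" also matches just before a final newline. preg_match() returns 1 on a match, 0 on no match and false on error, so a `!preg_match(...)` test catches invalid input just as `!ereg(...)` did. The helper name and sample inputs are made up.

```php
<?php
// The post's username check rewritten for PHP 7+, where ereg() is gone.
// Hypothetical helper: true only for 3 to 15 letters/digits.
function username_is_valid(string $username): bool
{
    // preg_match() returns 1 (match), 0 (no match) or false (error);
    // only 1 counts as valid.
    return preg_match('/^[A-Za-z0-9]{3,15}\z/', $username) === 1;
}

// Usage mirroring the original if-block.
foreach (['alice99', 'x', "bob\n", 'no spaces'] as $username) {
    if (!username_is_valid($username)) {
        // invalid username
        echo json_encode($username), " rejected\n";
    } else {
        echo json_encode($username), " accepted\n";
    }
}
```

Adding the D modifier ('/^[A-Za-z0-9]{3,15}$/D') is an equivalent way to stop "$" from accepting a trailing newline.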

So that's the basics. Within my applications I tend to create an array of regular expressions within a common include file covering various different input types (such as username, password, email address etc.), which can then be called up at any point in the code.

As the best way to learn (in my experience) is by example, here are a couple of other patterns for a fictitious "Customer Account" number:

// Customer Account number - exactly 7 numbers only

"^[0-9]{7}$"

// More complex Customer Account number - exactly 2 letters, and then between 1 and 4 numbers

"^[a-zA-Z]{2}[0-9]{1,4}$"

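The post mentions keeping an array of patterns in a common include file. Here is one way that might look, assuming PHP 7+ (preg_match() instead of ereg()). The constant name, its keys and the validate() helper are hypothetical; the patterns are the ones from this post, converted to PCRE by adding "/" delimiters and swapping "$" for "\z".

```php
<?php
// Hypothetical contents of a shared include file: named validation patterns.
const VALIDATION_PATTERNS = [
    'username'        => '/^[A-Za-z0-9]{3,15}\z/',
    'account_simple'  => '/^[0-9]{7}\z/',              // exactly 7 digits
    'account_complex' => '/^[a-zA-Z]{2}[0-9]{1,4}\z/', // 2 letters, then 1-4 digits
];

// Look up the named pattern and test the value against it.
function validate(string $type, string $value): bool
{
    if (!array_key_exists($type, VALIDATION_PATTERNS)) {
        throw new InvalidArgumentException("No pattern defined for '$type'");
    }
    return preg_match(VALIDATION_PATTERNS[$type], $value) === 1;
}

var_dump(validate('account_simple', '1234567'));  // bool(true)
var_dump(validate('account_simple', '123456'));   // bool(false)
var_dump(validate('account_complex', 'AB12'));    // bool(true)
var_dump(validate('account_complex', 'AB12345')); // bool(false)
```

Throwing on an unknown key is a design choice: a typo in a pattern name fails loudly instead of silently accepting or rejecting everything.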
These examples should give you the mechanics of basic form field validation using regular expressions. Patterns can of course be made as complicated as you like; sadly, most web-based tutorials I've come across dive straight in at the deep end!

I hope this helps somebody!

hakre

3:57 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A nice introduction to the subject! Is there a way to specify "2 or more"?

dmorison

4:14 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry - yes!

{2,}

would do the trick - leave out the second number but include the comma.
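A small sketch of the open-ended "{2,}" quantifier, meaning two or more of the previous item with no upper limit. It is written for preg_match() (ereg() was removed in PHP 7.0) with "\z" as the end anchor; the letters-only pattern and function name are illustrative assumptions.

```php
<?php
// "{2,}": at least two letters, no maximum.
function has_two_or_more_letters_only(string $input): bool
{
    return preg_match('/^[A-Za-z]{2,}\z/', $input) === 1;
}

var_dump(has_two_or_more_letters_only('a'));                  // bool(false)
var_dump(has_two_or_more_letters_only('ab'));                 // bool(true)
var_dump(has_two_or_more_letters_only(str_repeat('x', 100))); // bool(true)
```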