Forum Moderators: coopster

Checking if a string includes punctuation

I'm not sure how to check for this

         

MatthewHSE

9:32 pm on Sep 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm processing some form data that I want to include only numbers, letters, periods, commas, and the & symbol. <edit> And the @ symbol. </edit> If the data includes slashes, quotes, etc., I need to end the script and send an error.

I know I'm going to have to use regular expressions for this. I can probably figure out the exact regex to use, too. But my problem is how to check the string to see if it contains any of the conditions in the regex. What function would I use? And, I hate to ask, but what do I need to do to my regex so it will "trigger" if one of the illegal characters is found, given that they could come up at any point in the string?

Basically I'm having a mental block. I'm sure I've done this before but I can't remember how. Any pointers, references, tips, etc. will be appreciated.

Thanks,

Matthew

coopster

10:17 pm on Sep 12, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Create a user-defined function (your own function) that accepts a string and runs the regular expression against the string. If a bad character is found, exit or do whatever you need to do.
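function badchars($string)
{
    // The negated class [^...] matches any single character that is NOT allowed.
    if (preg_match("/[^0-9a-z\.\&\@]/i", $string)) {
        exit('I found a bad character in ' . $string);
    }
}
badchars('example');  // passes
badchars('EXAMPLE');  // passes, because the i flag ignores case
badchars('Example*'); // exits, because * is not in the allowed set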

MatthewHSE

11:37 pm on Sep 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, that works great, but I'm a little confused as to why it works. I would have thought that the contents of the "if" statement would only be executed if the data did not contain any of the disallowed characters. I would have thought !preg_match... would have been necessary, as in "the data does not meet our standards, so do such-and-such." So just what is this doing?

Also, perl regular expressions are entirely new to me. I know a little about the "other" kind of regex (non-perl, can't remember the name right now) but not much. So I'm having a hard time understanding your syntax. Could you explain a bit of what's going on there? I've read around the Internet some, but haven't really found a guide that started where I am - at the beginning! ;)

One more thing: how can I add characters that are allowed? For instance, some of my fields will be allowed to have periods and commas; how can I add these to the list of permitted characters?

Thanks again,

Matthew

jd01

12:13 am on Sep 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The expression would be TRUE if there were any characters that were NOT in the class definition... You could also reverse it and use !preg_match with a positive definition, that is, a class without the caret, anchored so that it has to match the whole string.

Either will get you the results. (I prefer preg_match/!preg_match, because then I know my defined characters are always positive, and I can see positive/negative at the beginning of the expression (easier for me to read quickly), but that is personal preference.)
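Here is a minimal sketch of the two approaches side by side, assuming PHP 7 or later for the type declarations. The function names is_clean and has_bad_chars are made up for illustration and are not from this thread. Note that the two forms disagree on an empty string: the positive form with + rejects it, while the negated form finds nothing wrong with it.

```php
<?php
// Positive definition (hypothetical helper): the whole string, from the
// start (^) to the very end (\z), must consist of allowed characters.
function is_clean(string $string): bool
{
    return preg_match('/^[0-9a-z.&@]+\z/i', $string) === 1;
}

// Negated definition (hypothetical helper): true as soon as one character
// outside the allowed set is found anywhere in the string.
function has_bad_chars(string $string): bool
{
    return preg_match('/[^0-9a-z.&@]/i', $string) === 1;
}
```

With these, if (!is_clean($input)) and if (has_bad_chars($input)) reject the same non-empty inputs, so which one you use comes down to which reads better to you.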

Justin

coopster

4:18 pm on Sep 13, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



if (preg_match("/[^0-9a-z\.\&\@]/i", $string)) {

The caret symbol (^) negates the character class, which is the set of characters and character ranges between the brackets ([]). Any symbol that we aren't certain is free of special meaning in a regular expression can be escaped with a backslash (\). This expression says: if any character other than 0 through 9, a through z, a period, an ampersand, or an @ sign is found in the variable $string, perform the necessary action. The lowercase "i" at the end of the pattern makes it case-insensitive, so a through z also covers capital letters.
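To see the negated class at work, it can help to ask preg_match which character it actually matched. The sketch below assumes PHP 7.1 or later (for the nullable return type), and first_bad_char is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical helper: returns the first disallowed character,
// or null when every character is in the allowed set.
function first_bad_char(string $string): ?string
{
    // [^...] matches one character that is NOT listed in the class;
    // the i flag makes a-z cover A-Z as well.
    if (preg_match('/[^0-9a-z.&@]/i', $string, $matches) === 1) {
        return $matches[0]; // the text matched by the whole pattern
    }
    return null;
}
```

For example, first_bad_char('Example*') returns '*', while first_bad_char('EXAMPLE') returns null. The match is byte-based here, so this is only meant for plain ASCII input.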

POSIX is the other engine you are thinking of. Sorry for the switch on you here; I always use the PCRE [php.net] (Perl Compatible Regular Expressions) functions, as they are binary-safe and usually faster.

To add more acceptable characters, simply add them to the class. Periods are already included, so now we'll add the comma. This time I won't escape the characters that I am certain have no special meaning, just to show that the pattern works with or without escaping them:

if (preg_match("/[^0-9a-z\.&@,]/i", $string)) {
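If only some fields may contain the extra characters, one option is to pass them in per call rather than keeping several copies of the pattern. This is a rough sketch under a few assumptions: PHP 7 or later, and a made-up function name (has_bad_chars_extra) and parameter ($extraAllowed) that are not from this thread. preg_quote escapes anything with special meaning, so characters such as - or ] can be passed safely.

```php
<?php
// Hypothetical helper: the base set is always allowed, and each field
// can permit a few extra characters on top of it.
function has_bad_chars_extra(string $string, string $extraAllowed = ''): bool
{
    // preg_quote escapes regex-special characters; the second argument
    // also escapes the / delimiter in case it is passed in.
    $class = '0-9a-z.&@' . preg_quote($extraAllowed, '/');

    // True when at least one character falls outside the allowed class.
    return preg_match('/[^' . $class . ']/i', $string) === 1;
}
```

So has_bad_chars_extra('Smith,John', ',') is false for a field that permits commas, while has_bad_chars_extra('Smith,John') is true for one that does not.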