
PHP Server Side Scripting Forum

    
Regular expressions
A bit of a reference
ShawnR




msg:1277788
 3:37 pm on Jun 25, 2003 (gmt 0)

I think regular expressions are cool, but it seems that some avoid them like the plague. e.g. [webmasterworld.com...]

And I've also seen quite a number of FAQs asking really basic regex questions.

So that inspired me to put together this brief overview of regular expressions. Since there are lots of good regex tutorials out there, I thought I'd rather put down a very brief overview of the regex commands; sort of a reference rather than a tutorial, although aimed at the absolute beginner. I thought I'd try to make this applicable to wherever regex's can be used (php, perl, javascript), so I'm avoiding some of the cool (and not so cool) additions that php gives.

Here goes:

The traditional character to use before and after your pattern is /
e.g. preg_match("/hello/", $my_string) will return 1 if $my_string contains "hello", and 0 if it does not.

If your pattern contains the / character, you then need to escape it with the \ character (e.g. \/ )

But you can use other characters such as #, {}, (), [], etc. instead of /, and then you don't need to escape the /. e.g. preg_match("#hello#", $my_string) or preg_match("{hello}", $my_string) or preg_match("(hello)", $my_string). In php, any character that is not alphanumeric, whitespace, or a backslash will work, and bracket-style delimiters must be used as a matching pair.
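
Here is a minimal sketch of the delimiter choices described above. The sample string and variable names are made up purely for illustration.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$my_string = 'see http://example.com/docs for details';

// With / as the delimiter, each literal / in the pattern must be escaped.
$with_slashes = preg_match('/example\.com\/docs/', $my_string);

// With # as the delimiter, the / needs no escaping.
$with_hashes = preg_match('#example\.com/docs#', $my_string);

// Bracket-style delimiters are used as an opening/closing pair.
$with_braces = preg_match('{example\.com/docs}', $my_string);

var_dump($with_slashes, $with_hashes, $with_braces); // int(1) three times
```

All three calls test for the same text; only the amount of escaping changes.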

Some special characters (meta characters)
==========================================

\ - The escape character. Treats the next character as a literal character instead of a metacharacter
^ - The beginning of the string (or start of line... see /m later)
$ - The end of the string (or end of line... see /m later)
. - Any single character
| - OR. Matches either the regex on the left of it OR the regex on the right of it
[ ] - Allows you to specify a set or a range of characters to match. e.g. [aeiou] matches any one vowel character. [a-z0-9] matches any lower case alpha or digit
[^ ] - specifies characters not to match. e.g. [^aeiou] matches any one non-vowel character.
() - brackets to group things together. This allows you to use $1, $2, etc in replacement strings to represent the strings that matched the regex's in the brackets.
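
A short sketch of the metacharacters above in php. The test strings are invented for illustration, and the alternation character used is | (the vertical bar).

```php
<?php
// ^ and $ anchor the pattern to the start and end of the string.
$starts_with_hello = preg_match('/^hello/', 'hello world'); // 1
$ends_with_hello   = preg_match('/hello$/', 'hello world'); // 0

// . matches any single character; \. matches only a literal dot.
$any_char    = preg_match('/^h.t$/', 'hat');  // 1
$literal_dot = preg_match('/^h\.t$/', 'hat'); // 0

// | matches either the regex on its left or the one on its right.
$either = preg_match('/cat|dog/', 'hot dog'); // 1

// [ ] lists characters to match, [^ ] lists characters not to match.
$vowel_first     = preg_match('/^[aeiou]/', 'apple');  // 1
$non_vowel_first = preg_match('/^[^aeiou]/', 'apple'); // 0

// ( ) groups are available as $1, $2 in the replacement string.
$swapped = preg_replace('/(hello) (world)/', '$2 $1', 'hello world');

print $swapped; // world hello
```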

Some special character sequences (Not all of them)
==================================================

\w - Word characters: upper or lower case alphanumeric (i.e. alphabetical and numeric) characters and the underscore _
\W - non Word characters
\s - white-space: spaces, tabs, etc
\S - non-white-space
\d - digit: 0 to 9
\D - non-digit
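\b - word boundary: the zero-width position between a word character and a non-word character (e.g. between a letter and a space). It does not consume the space itself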
\B - non-word-boundary
\t - Tab
\n - new line
\r - carriage return
\f - form feed
\a - alarm character
\e - escape character
\0nn - Octal character nn (e.g. \026)
\xhh - Hex character hh (e.g. \x1b)
\cC - control character C. e.g. \cM is control-M
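
To make a few of these sequences concrete, here is a small sketch in php. The sample string is an assumption made up for the example.

```php
<?php
// Hypothetical sample input; the double quotes turn \t into a real tab.
$my_string = "Order 66 shipped\ton 2003-06-25";

// \d matches one digit, so four in a row finds the year.
$has_year = preg_match('/\d\d\d\d/', $my_string); // 1

// \s matches spaces and tabs alike.
$underscored = preg_replace('/\s/', '_', $my_string);

// \D matches every non-digit, so removing them leaves only the digits.
$digits_only = preg_replace('/\D/', '', $my_string);

// \b matches a position, not a character: "on" stands alone as a word,
// but "hip" only occurs inside "shipped".
$whole_word_on  = preg_match('/\bon\b/', $my_string);  // 1
$whole_word_hip = preg_match('/\bhip\b/', $my_string); // 0

// \xhh matches the character with that hex code; 41 is "A".
$hex_a = preg_match('/\x41/', 'A'); // 1

print $underscored . "\n"; // Order_66_shipped_on_2003-06-25
print $digits_only . "\n"; // 6620030625
```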

Quantifiers
===========

Adding the following quantifiers after a regex adjusts the number of times that regex matches

* - 0 or more times. e.g. .* matches any string (by default it will not cross a newline). [aeiou]* matches any number of consecutive vowels, including none.
+ - 1 or more times
? - 0 or 1 times (i.e. optional)
{n} - n times
{n,} - n or more times
{n,m} - between n and m times
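
A quick sketch of each quantifier. The patterns are anchored with ^ and $ so the counts apply to the whole string, and the test strings are made up for illustration.

```php
<?php
// * allows zero occurrences, + requires at least one.
$star_on_empty = preg_match('/^[aeiou]*$/', ''); // 1
$plus_on_empty = preg_match('/^[aeiou]+$/', ''); // 0

// ? makes the preceding item optional.
$color  = preg_match('/^colou?r$/', 'color');  // 1
$colour = preg_match('/^colou?r$/', 'colour'); // 1

// {n} means exactly n times.
$four_digits  = preg_match('/^\d{4}$/', '2003'); // 1
$three_digits = preg_match('/^\d{4}$/', '203');  // 0

// {n,} means n or more times.
$at_least_two = preg_match('/^\d{2,}$/', '7'); // 0

// {n,m} means between n and m times, inclusive.
$two_to_four = preg_match('/^\d{2,4}$/', '12345'); // 0

var_dump($star_on_empty, $plus_on_empty, $color, $colour);
```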

By default, the regex will try to eat as much of the string it is testing as possible. e.g. print preg_replace("/.*model (\d.*\d).*/", "model number: $1", $my_string); will print "model number: 12.23 serial number za4" if $my_string is "model 12.23 serial number za4asdj"

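The ? modifier, placed directly after a quantifier, changes that to eating as little as possible, so e.g. print preg_replace("/.*model (\d.*?\d).*/", "model number: $1", $my_string); stops at the first digit that can end the match and prints "model number: 12". Note that the ? must follow the .* inside the brackets; putting it after the closing bracket only makes the whole group optional. To capture exactly "12.23", a more specific pattern such as (\d[\d.]*\d) is the better fix.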
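
Here is a sketch contrasting greedy and lazy matching on the sample string from the post. Note that the ? sits directly after the quantifier it modifies. The third pattern is my own assumption of a more specific alternative and is not part of the original example.

```php
<?php
$my_string = 'model 12.23 serial number za4asdj';

// Greedy: .* inside the group runs as far as the last digit in the string.
$greedy = preg_replace('/.*model (\d.*\d).*/', 'model number: $1', $my_string);

// Lazy: .*? stops at the first digit that lets the match succeed.
$lazy = preg_replace('/.*model (\d.*?\d).*/', 'model number: $1', $my_string);

// More specific (assumed alternative): only digits and dots may appear
// between the first and last digit of the model number.
$specific = preg_replace('/.*model (\d[\d.]*\d).*/', 'model number: $1', $my_string);

print $greedy . "\n";   // model number: 12.23 serial number za4
print $lazy . "\n";     // model number: 12
print $specific . "\n"; // model number: 12.23
```

Lazy matching takes the shortest match that works, which is not always the one you wanted, so describing exactly what may appear between the digits is often the more robust choice.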

Some Modifiers (Not all of them)
================================

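Modifiers go at the end, after the closing delimiter (e.g. after the last /).

g - global: match every occurrence of the regex, throughout the string. Don't just stop after finding the first one. (This one is for perl and javascript; php has no g modifier. There, preg_replace already replaces every occurrence, and preg_match_all finds every match.)
i - ignore case
m - If the string contains newlines, the ^ and $ match the start and end of each line, not just the start & end of the entire string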
x - Allows you to specify white space in the expression for clarity, without the white space being considered part of what needs to match.
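
A small sketch of the modifiers in php, with invented test strings. Since php has no g modifier, the last call uses preg_match_all to find every occurrence instead.

```php
<?php
// i - ignore case.
$case_sensitive   = preg_match('/hello/', 'HeLLo World');  // 0
$case_insensitive = preg_match('/hello/i', 'HeLLo World'); // 1

// m - ^ and $ also match at the start and end of each line.
$lines = "first line\nsecond line";
$without_m = preg_match('/^second/', $lines);  // 0
$with_m    = preg_match('/^second/m', $lines); // 1

// x - white space in the pattern is ignored, so it can be spaced out.
$dated = preg_match('/ (\d{4}) - (\d{2}) - (\d{2}) /x', '2003-06-25', $parts);

// "Global" matching in php: preg_match_all returns the number of matches.
$count = preg_match_all('/\d+/', 'a1 b22 c333', $all);

print $parts[1] . "\n";             // 2003
print implode(',', $all[0]) . "\n"; // 1,22,333
```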

Hope that helps. regex's are really cool, and I hope I've shown that they are quite easy too.

Shawn

 

jatar_k




msg:1277789
 4:47 pm on Jun 25, 2003 (gmt 0)

Awesome post ShawnR, I will add a pair of threads that might help as well

Regular Expression Basics [webmasterworld.com]
Forgotten my regex basics [webmasterworld.com]

some avoid them like the plague
<------ raises hand ;)

They definitely are a tough thing to get a handle on at first. I have always found ways around them so I seldom use them but I do write some very basic ones once in a while.

It is one of those things that I kick myself about all the time, I'll immerse myself one of these days.

dingman




msg:1277790
 5:13 pm on Jun 25, 2003 (gmt 0)

Yes, very nice post. I'll be bookmarking it just 'cause I always forget the \x character classes.

Oddly enough, pretty much the only thing I like better about Perl than PHP is the integration of regexps into the language. I very much like being able to just stick regexp operators into the flow of my code rather than having to call preg_foo($whats,$the,$argument,$order,$again?)

jamie




msg:1277791
 6:19 pm on Jun 25, 2003 (gmt 0)

thanks shawn! very helpful post.

one for the library me thinks.

lorax




msg:1277792
 8:49 pm on Jun 25, 2003 (gmt 0)

Lovely contribution ShawnR. And timely too I might add. I still get caught up on REGEX.

vincevincevince




msg:1277793
 11:21 pm on Jun 25, 2003 (gmt 0)

great post :-)

i personally find the two top entries in google a great reference

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved