Forum Moderators: coopster & jatar k

Regular expressions

A bit of a reference

3:37 pm on Jun 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 27, 2003
votes: 0

I think regular expressions are cool, but it seems that some avoid them like the plague. e.g. [webmasterworld.com...]

And I've also seen quite a number of posts asking really basic regex questions.

So that inspired me to put together this brief overview of regular expressions. Since there are lots of good regex tutorials out there, I thought I'd rather put down a very brief overview of the regex commands; sort of a reference rather than a tutorial, although aimed at the absolute beginner. I thought I'd try to make this applicable wherever regexes can be used (php, perl, javascript), so I'm avoiding some of the cool (and not so cool) additions that php gives.

Here goes:

The traditional character to use before and after your pattern is /
e.g. preg_match("/hello/", $my_string) will return 1 if $my_string contains "hello", and 0 if it does not.

If your pattern contains the / character, you then need to escape it with the \ character (e.g. \/ )

But you can use other characters such as #, {}, (), [], etc instead of /, and then you don't need to escape the /. e.g. preg_match("#hello#", $my_string) or preg_match("{hello}", $my_string) or preg_match("(hello)", $my_string). Pretty much any character that is not a letter, a digit, a backslash or white space will work, and bracket-style delimiters must be used as a matching pair.
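
Here's a quick sketch of the delimiter choice in practice. The path string is made up for the example, and the patterns are in single quotes so PHP itself leaves the backslashes alone.

```php
<?php
// Sample input, invented for this example.
$path = '/site/images/logo.png';

// With / as the delimiter, every / inside the pattern must be escaped.
$with_slashes = preg_match('/\/images\//', $path);

// With # or {} as the delimiter, the slashes need no escaping.
$with_hashes = preg_match('#/images/#', $path);
$with_braces = preg_match('{/images/}', $path);

// Each call returns 1 for a match and 0 for no match.
echo "$with_slashes $with_hashes $with_braces\n";   // 1 1 1
```

All three patterns mean the same thing; the second and third are just easier to read when the pattern is full of slashes (URLs, file paths).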

Some special characters (meta characters)

\ - The escape character. Treats the next character as a literal character instead of a metacharacter
^ - The beginning of the string (or start of line... see /m later)
$ - The end of the string (or end of line... see /m later)
. - Any single character (except a newline, by default)
| - OR. Matches either the regex on the left of it OR the regex on the right of it
[ ] - Allows you to specify a set or a range of characters to match. e.g. [aeiou] matches any one vowel character. [a-z0-9] matches any lower case alpha or digit
[^ ] - specifies characters not to match. e.g. [^aeiou] matches any one non-vowel character.
() - brackets to group things together. This also allows you to use $1, $2, etc in replacement strings to represent the strings that matched the regexes in the brackets.
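
A few of the metacharacters above in action, as a rough sketch. The sample strings are invented; the comments show what each call should return.

```php
<?php
// ^ and $ anchor the match to the start and end of the string.
$starts_with_hello = preg_match('/^hello/', 'hello world');   // 1
$ends_with_hello   = preg_match('/hello$/', 'hello world');   // 0

// . matches any single character; | means OR.
$dot_match = preg_match('/c.t/', 'a cut above');               // 1 ("cut")
$either    = preg_match('/cat|dog/', 'hot dog');               // 1

// [ ] matches one character from a set; [^ ] matches one character not in it.
$has_vowel       = preg_match('/[aeiou]/', 'rhythm');          // 0
$non_vowel_start = preg_match('/^[^aeiou]/', 'rhythm');        // 1

// ( ) groups capture text that $1, $2, ... refer to in the replacement.
$swapped = preg_replace('/([a-z]+) ([a-z]+)/', '$2 $1', 'hello world');
echo $swapped, "\n";   // world hello
```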

Some special character sequences (Not all of them)

\w - Word characters: upper or lower case alphanumeric (i.e. alphabetical and numeric) characters and the underscore _
\W - non Word characters
\s - white-space: spaces, tabs, etc
\S - non-white-space
\d - digit: 0 to 9
\D - non-digit
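\b - word boundary: the zero-width position between a word character and a non-word character (e.g. just before or after a space). It does not eat the space itself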
\B - non-word-boundary
\t - Tab
\n - new line
\r - carriage return
\f - form feed
\a - alarm character
\e - escape character
\0nn - Octal character nn (e.g. \026)
\xhh - Hex character hh (e.g. \x1b)
\cC - control character C. e.g \cM is control-M
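
To make the more common sequences concrete, here is a small sketch. The sample text is invented, and the variable names are just for illustration.

```php
<?php
// Double quotes here so that \t becomes a real tab character in the text.
$text = "Order 66 shipped\ton 2003-06-25";

// \d+ grabs the first run of digits.
preg_match('/\d+/', $text, $m);
$first_number = $m[0];                                   // "66"

// \s matches the tab as well as the spaces.
$single_spaced = preg_replace('/\s+/', ' ', $text);

// \b is a position, not a character, so this finds "on" only as a whole word.
$whole_word  = preg_match('/\bon\b/', $text);            // 1
$inside_word = preg_match('/\bon\b/', 'second');         // 0

// \W is anything that is not a letter, digit or underscore.
$slug = preg_replace('/\W+/', '-', 'Hello, World!');

echo $first_number, "\n";    // 66
echo $single_spaced, "\n";   // Order 66 shipped on 2003-06-25
echo $slug, "\n";            // Hello-World-
```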


Adding the following quantifiers after a regex adjusts the number of times that regex matches

* - 0 or more times. e.g. .* matches any string. [aeiou]* matches any number of consecutive vowels.
+ - 1 or more times
? - 0 or 1 times (i.e. optional)
{n} - n times
{n,} - n or more times
{n,m} - between n and m times
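
One line per quantifier, as a minimal sketch with made-up test strings. Each pattern is anchored with ^ and $ so the whole string has to fit.

```php
<?php
// * : zero or more, so a missing "o" is still fine.
$star = preg_match('/^go*gle$/', 'ggle');            // 1

// + : one or more, so at least one "o" is required.
$plus = preg_match('/^go+gle$/', 'ggle');            // 0

// ? : zero or one, i.e. the "u" is optional.
$optional = preg_match('/^colou?r$/', 'color');      // 1

// {n} : exactly n times.
$exact = preg_match('/^\d{4}$/', '2003');            // 1

// {n,} : n or more times.
$at_least = preg_match('/^\d{2,}$/', '7');           // 0

// {n,m} : between n and m times.
$between = preg_match('/^[a-z]{3,8}$/', 'regex');    // 1

echo "$star $plus $optional $exact $at_least $between\n";   // 1 0 1 1 0 1
```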

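By default, the regex will try to eat as much of the string it is testing as possible. e.g. print preg_replace("/.*model (\d.*\d).*/", "model number: $1", $my_string); will print "model number: 12.23 serial number za4" if $my_string is "model 12.23 serial number za4asdj"

Putting ? directly after a quantifier changes that to eating as little as possible, e.g. .*? instead of .* (note that (\d.*\d)? would only make the whole group optional). Here, though, print preg_replace("/.*model (\d.*?\d).*/", "model number: $1", $my_string); prints "model number: 12", because the lazy .*? stops at the very next digit. Describing the number exactly, e.g. print preg_replace("/.*model (\d+\.\d+).*/", "model number: $1", $my_string); will behave better in this circumstance and print "model number: 12.23".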
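
For anyone who wants to try it, here is the model number string run three ways: greedy, lazy (with ? straight after the quantifier), and with a more specific pattern. The third pattern is my own suggestion for this particular string, not the only way to do it.

```php
<?php
$my_string = 'model 12.23 serial number za4asdj';

// Greedy: the .* inside the group runs on to the last digit in the string.
$greedy = preg_replace('/.*model (\d.*\d).*/', 'model number: $1', $my_string);

// Lazy: .*? gives up as soon as the following \d can match, which is at once.
$lazy = preg_replace('/.*model (\d.*?\d).*/', 'model number: $1', $my_string);

// Specific: spell out what the number looks like (digits, a dot, digits).
$specific = preg_replace('/.*model (\d+\.\d+).*/', 'model number: $1', $my_string);

echo $greedy, "\n";     // model number: 12.23 serial number za4
echo $lazy, "\n";       // model number: 12
echo $specific, "\n";   // model number: 12.23
```

Lazy matching is handy when you know what comes right after the bit you want; when you know what the bit itself looks like, saying so in the pattern is usually the safer route.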

Some Modifiers (Not all of them)

Modifiers go at the end, after the last /.

g - global: match every occurrence of the regex throughout the string, rather than stopping after the first one. (This is the Perl and JavaScript spelling. PHP's preg functions have no g modifier: preg_replace already replaces every match, and preg_match_all finds them all.)
i - ignore case
m - If the string contains newlines, ^ and $ match the start and end of each line, not just the start and end of the entire string
x - Allows you to specify white space in the expression for clarity, without the white space being considered part of what needs to match.
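
A short sketch of the modifiers in PHP, with an invented three-line sample string. Since PHP's preg functions don't take g, the "find them all" case uses preg_match_all instead.

```php
<?php
// Double quotes so that \n becomes a real newline in the sample text.
$text = "Apple pie\napple tart\nBanana split";

// i : ignore case.
$case_insensitive = preg_match('/^apple/i', $text);        // 1

// m : ^ and $ also match at line breaks. "Banana" starts the third line,
// not the string, so it is only found with m switched on.
$without_m = preg_match('/^banana/i', $text);              // 0
$with_m    = preg_match('/^banana/im', $text);             // 1

// "Global" in PHP: preg_match_all returns how many matches it found.
$count = preg_match_all('/apple/i', $text, $matches);      // 2

// x : white space inside the pattern is ignored, so it can be spaced out.
$date_found = preg_match('/ (\d{4}) - (\d{2}) - (\d{2}) /x', 'posted 2003-06-25', $d);

echo "$case_insensitive $without_m $with_m $count $date_found\n";   // 1 0 1 2 1
echo "$d[1] $d[2] $d[3]\n";                                          // 2003 06 25
```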

Hope that helps. Regexes are really cool, and I hope I've shown that they are quite easy too.


4:47 pm on June 25, 2003 (gmt 0)


WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
votes: 0

Awesome post ShawnR, I will add a pair of threads that might help as well

Regular Expression Basics [webmasterworld.com]
Forgotten my regex basics [webmasterworld.com]

some avoid them like the plague
<------ raises hand ;)

They definitely are a tough thing to get a handle on at first. I have always found ways around them so I seldom use them but I do write some very basic ones once in a while.

It is one of those things that I kick myself about all the time, I'll immerse myself one of these days.

5:13 pm on June 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 12, 2002
votes: 0

Yes, very nice post. I'll be bookmarking it just 'cause I always forget the \x character classes.

Oddly enough, pretty much the only thing I like better about Perl than PHP is the integration of regexps into the language. I very much like being able to just stick regexp operators into the flow of my code rather than having to call preg_foo($whats,$the,$argument,$order,$again?)

6:19 pm on June 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 24, 2002
votes: 0

thanks shawn! very helpful post.

one for the library me thinks.

8:49 pm on June 25, 2003 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
votes: 0

Lovely contribution ShawnR. And timely too I might add. I still get caught up on REGEX.

11:21 pm on June 25, 2003 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
votes: 0

great post :-)

i personally find the two top entries in google a great reference