Regular expressions

I think regular expressions are cool, but it seems that some people avoid them like the plague, e.g. [webmasterworld.com...]

And I've also seen quite a number of FAQs asking really basic regex questions.

So that inspired me to put together this brief overview of regular expressions. Since there are lots of good regex tutorials out there, I thought I'd rather put down a very brief overview of the regex commands: more of a reference than a tutorial, although aimed at the absolute beginner. I've tried to make this applicable wherever regexes can be used (PHP, Perl, JavaScript), so I'm avoiding some of the cool (and not so cool) additions that PHP gives.

Here goes:

The traditional character to use before and after your pattern (the delimiter) is /
e.g. preg_match("/hello/", $my_string) will return 1 if $my_string contains "hello", and 0 if it does not.

If your pattern contains the / character, you then need to escape it with the \ character (e.g. \/ )

But you can use other characters such as #, {}, (), [], etc instead of /, and then you don't need to escape the /. e.g. preg_match("#hello#", $my_string) or preg_match("{hello}", $my_string) or preg_match("(hello)", $my_string). In PHP, pretty much any character that is not alphanumeric, a backslash or white-space will work, and bracket characters are used as matching pairs.
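
To make the delimiter choice concrete, here is a small sketch in PHP. The sample string is made up for illustration, and single-quoted PHP strings are used so the backslashes reach the regex engine untouched.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$my_string = 'see http://example.com/hello for details';

// With / as the delimiter, a literal / in the pattern must be escaped as \/
$with_slashes = preg_match('/example\.com\/hello/', $my_string);

// With # as the delimiter, the / needs no escaping.
$with_hashes = preg_match('#example\.com/hello#', $my_string);

// Bracket-style delimiters close with the matching bracket.
$with_braces = preg_match('{example\.com/hello}', $my_string);

// preg_match returns 1 when the pattern is found, 0 when it is not.
echo $with_slashes, $with_hashes, $with_braces, "\n"; // prints 111
```

All three calls test for the same text; only the amount of escaping differs.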

Some special characters (meta characters)
==========================================
\ - The escape character. Treats the next character as a literal character instead of a metacharacter
^ - The beginning of the string (or start of a line... see the m modifier later)
$ - The end of the string (or end of a line... see the m modifier later)
. - Any single character
| - OR. Matches either the regex on the left of it OR the regex on the right of it
[ ] - Allows you to specify a set or a range of characters to match. e.g. [aeiou] matches any one lowercase vowel. [a-z0-9] matches any one lowercase letter or digit
[^ ] - Specifies characters not to match. e.g. [^aeiou] matches any one character that is not a lowercase vowel (consonants, but also digits, spaces and punctuation).
() - Brackets to group things together. They also capture: you can use $1, $2, etc in replacement strings to represent the strings that matched the regexes in the brackets.
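
A quick sketch of the metacharacters above in PHP. The input strings are made up for illustration, and the expected results are shown in the comments.

```php
<?php
// \ turns a metacharacter into a literal: . matches any character, \. only a dot.
$dot_any     = preg_match('/3.14/', '3x14');    // 1
$dot_literal = preg_match('/3\.14/', '3x14');   // 0

// ^ and $ anchor the pattern to the start and end of the string.
$starts_with_hello = preg_match('/^hello/', 'hello world'); // 1
$ends_with_hello   = preg_match('/hello$/', 'hello world'); // 0

// | matches either alternative; the brackets limit how far it reaches.
$is_pet = preg_match('/^(cat|dog)$/', 'dog');   // 1

// [ ] matches one character from the set, [^ ] one character not in it.
$no_vowels   = preg_replace('/[aeiou]/', '', 'regular');  // "rglr"
$only_vowels = preg_replace('/[^aeiou]/', '', 'regular'); // "eua"

// ( ) captures text that $1 and $2 refer to in the replacement string.
$swapped = preg_replace('/([a-z]+) ([a-z]+)/', '$2 $1', 'hello world'); // "world hello"

echo $no_vowels, ' ', $only_vowels, ' ', $swapped, "\n";
```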

Some special character sequences (Not all of them)
==================================================
\w - Word characters: upper or lower case alphanumeric (i.e. alphabetical and numeric) characters and the underscore _
\W - non Word characters
\s - white-space: spaces, tabs, etc
\S - non-white-space
\d - digit: 0 to 9
\D - non-digit
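\b - word boundary: the position between a word character (\w) and a non-word character (\W), or the start or end of the string. It matches a position, not a character such as a space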
\B - non-word-boundary
\t - Tab
\n - new line
\r - carriage return
\f - form feed
\a - alarm (bell) character
\e - escape character
\0nn - Octal character nn (e.g. \026)
\xhh - Hex character hh (e.g. \x1b)
\cC - control character C. e.g. \cM is control-M
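
As a rough illustration of a few of these sequences, the PHP sketch below counts matches with preg_match_all. The sample string is made up, and it is double-quoted so that PHP turns \t into a real tab before the regex sees it.

```php
<?php
// Hypothetical sample input containing one space and one tab.
$my_string = "cat 42\tdog_7";

// preg_match_all returns how many times the pattern matched.
$digits      = preg_match_all('/\d/', $my_string); // 3: 4, 2, 7
$word_chars  = preg_match_all('/\w/', $my_string); // 10: letters, digits and the _
$white_space = preg_match_all('/\s/', $my_string); // 2: the space and the tab
$non_word    = preg_match_all('/\W/', $my_string); // 2: the same space and tab

// \b matches the position at the edge of a word, not a character.
$whole_word  = preg_match('/\bcat\b/', 'the cat sat'); // 1
$inside_word = preg_match('/\bcat\b/', 'concatenate'); // 0

// \t and \x09 both describe the tab character.
$tab = preg_match('/\t/', $my_string);   // 1
$hex = preg_match('/\x09/', $my_string); // 1

echo "$digits $word_chars $white_space $non_word\n"; // prints 3 10 2 2
```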

Quantifiers
===========
Adding the following quantifiers after a regex adjusts the number of times that regex matches

* - 0 or more times. e.g. .* matches any string (by default it stops at a newline, since . does not match one). [aeiou]* matches any number of consecutive vowels, including none.
+ - 1 or more times
? - 0 or 1 times (i.e. optional)
{n} - n times
{n,} - n or more times
{n,m} - between n and m times
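
Here is a small PHP sketch of each quantifier. The test words are made up, and the patterns are anchored with ^ and $ so the whole string has to fit.

```php
<?php
// * : zero or more, so "bt" (no vowels at all) still matches.
$star_none = preg_match('/^b[aeiou]*t$/', 'bt');   // 1
$star_some = preg_match('/^b[aeiou]*t$/', 'beat'); // 1

// + : one or more, so "bt" no longer matches.
$plus_none = preg_match('/^b[aeiou]+t$/', 'bt');   // 0

// ? : zero or one, i.e. the u is optional.
$color  = preg_match('/^colou?r$/', 'color');      // 1
$colour = preg_match('/^colou?r$/', 'colour');     // 1

// {n} : exactly n times.
$four_digits  = preg_match('/^\d{4}$/', '2024');   // 1
$three_digits = preg_match('/^\d{4}$/', '202');    // 0

// {n,} : n or more times.  {n,m} : between n and m times.
$two_or_more = preg_match('/^\d{2,}$/', '12345');  // 1
$two_to_four = preg_match('/^\d{2,4}$/', '12345'); // 0

echo $star_none, $star_some, $plus_none, $color, $colour, "\n"; // prints 11011
```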

By default, the regex will try to eat as much of the string it is testing as possible (it is "greedy"). e.g. print preg_replace("/.*model (\d.*\d).*/", "model number: $1", $my_string); will print "model number: 12.23 serial number za4" if $my_string is "model 12.23 serial number za4asdj"

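Putting ? directly after a quantifier changes that to eating as little as possible, e.g. .*? instead of .* (a ? after the closing bracket would only make the whole group optional, and the .* inside it would still be greedy). So print preg_replace("/.*model (\d.*?\d).*/", "model number: $1", $my_string); stops at the first place it can and prints "model number: 12". To pick up the whole number in this circumstance, a tighter pattern such as ([\d.]+) in place of (\d.*?\d) prints "model number: 12.23".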
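
The difference is easiest to see side by side. The sketch below reuses the model number string from above. Note that the ? has to sit directly after the quantifier (.*?) to make it lazy, and that the third pattern, ([\d.]+), is just one possible tighter alternative rather than the only fix.

```php
<?php
$my_string = 'model 12.23 serial number za4asdj';

// Greedy: the .* inside the brackets runs on to the last digit in the string.
$greedy = preg_replace('/.*model (\d.*\d).*/', 'model number: $1', $my_string);

// Lazy: .*? takes as little as it can, so the group ends at the very next digit.
$lazy = preg_replace('/.*model (\d.*?\d).*/', 'model number: $1', $my_string);

// Tighter: only digits and dots are allowed inside the brackets.
$tight = preg_replace('/.*model ([\d.]+).*/', 'model number: $1', $my_string);

echo $greedy, "\n"; // model number: 12.23 serial number za4
echo $lazy, "\n";   // model number: 12
echo $tight, "\n";  // model number: 12.23
```

Lazy matching on its own is often too eager to stop, so saying exactly which characters are allowed tends to be the more predictable choice.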

Some Modifiers (Not all of them)
================================
Modifiers go at the end, after the closing delimiter (the last /).

g - global: match every occurrence of the regex throughout the string, rather than stopping after the first one. (This one is for Perl and JavaScript. PHP's preg functions have no g: preg_replace already replaces every match, and preg_match_all finds them all.)
i - ignore case
m - If the string contains newlines, ^ and $ match the start and end of each line, not just the start and end of the entire string
x - Allows you to specify white space in the expression for clarity, without the white space being considered part of what needs to match.
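
A short PHP sketch of the modifiers, with made-up input strings. PHP's preg functions do not accept g, so the last line uses preg_match_all to get the "every match" behaviour instead.

```php
<?php
// i : ignore case.
$ignore_case = preg_match('/hello/i', 'HELLO world'); // 1

// m : ^ and $ also match at the start and end of each line.
$text      = "first line\nsecond line";
$without_m = preg_match('/^second/', $text);  // 0
$with_m    = preg_match('/^second/m', $text); // 1

// x : white space inside the pattern is ignored, so it can be spaced out.
$with_x = preg_match('/ \d{4} - \d{2} /x', '2024-01'); // 1

// No g in PHP: preg_match_all counts every match instead.
$count = preg_match_all('/o/', 'foo boo'); // 4

echo $ignore_case, $without_m, $with_m, $with_x, $count, "\n"; // prints 10114
```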

Hope that helps. Regexes are really cool, and I hope I've shown that they are quite easy too.

Shawn

