
PHP Server Side Scripting Forum

    
regular expression basics
anyone want to supply some?
jatar_k




msg:1285818
 6:56 am on Aug 29, 2002 (gmt 0)

Regular expressions are the bane of my existence. I always get someone else to write them and I am more than happy to cash in favors to make sure it happens.

What I am looking for is the basics. If I was going to start using regular expressions and needed to understand the fundamentals involved how would you explain it to me?

I know what they are and I am very familiar with all of the functions that use them, I am just talking straight up regex. What do all the complicated strings of chars in those functions mean?

 

Damian




msg:1285819
 8:27 am on Aug 29, 2002 (gmt 0)

Maybe this helps:
This page [artswebsite.com] has an overview with the basics, taken from the Macromedia Dreamweaver help files and from O'Reilly.

jatar_k




msg:1285820
 4:35 pm on Aug 29, 2002 (gmt 0)

Thanks Damian, looks like a good place to start.

Robber




msg:1285821
 8:11 pm on Aug 29, 2002 (gmt 0)

I reckon the best advice I could give regarding regex is to look at the expression one character at a time. If you try to figure out the whole thing all at once you're heading for trouble - especially when you start capturing with parentheses!

Another thing I find (which is probably obvious!): take the time to figure out which parts of the regex are special characters, e.g. \s matches any whitespace character and \w matches any word character (letter, digit or underscore), but at a glance it's easy to miss that.
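As a quick reference for the shorthand classes, here is a minimal sketch using PHP's preg_match(). The variable names are made up for illustration; in PCRE, \s is the whitespace class and \w is the word-character class.

```php
<?php
// \s matches one whitespace character (space, tab, newline, ...).
$spaceIsWhitespace = preg_match('/^\s$/', ' ');   // 1

// \w matches one "word" character: a letter, a digit or an underscore.
$spaceIsWordChar      = preg_match('/^\w$/', ' ');  // 0: a space is not a word character
$underscoreIsWordChar = preg_match('/^\w$/', '_');  // 1
$letterIsWordChar     = preg_match('/^\w$/', 'a');  // 1

var_dump($spaceIsWhitespace, $spaceIsWordChar, $underscoreIsWordChar, $letterIsWordChar);
```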

Oh yeah, and one other thing that's useful to remember - ^ means match at the start of a string, except as the first character inside a character class, where it negates the class. That catches me out quite a bit.
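The two roles of ^ can be seen side by side in a small sketch. The sample strings and variable names below are assumptions chosen only for illustration.

```php
<?php
// Outside brackets, ^ anchors the match to the start of the string.
$startsWithDigit = preg_match('/^[0-9]/', '42 apples');  // 1: first character is a digit
$digitLater      = preg_match('/^[0-9]/', 'apples 42');  // 0: string starts with "a"

// As the first character inside brackets, ^ negates the character class.
$hasNonDigit = preg_match('/[^0-9]/', '12a45');  // 1: "a" is not a digit
$onlyDigits  = preg_match('/[^0-9]/', '12345');  // 0: there is no non-digit to find

var_dump($startsWithDigit, $digitLater, $hasNonDigit, $onlyDigits);
```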

Well that's my 2p worth!

ergophobe




msg:1285822
 10:08 pm on Aug 29, 2002 (gmt 0)

jatar,

why don't you try this article

So What's A $#!%% Regular Expression, Anyway?! [devshed.com] from devshed.

Are you working under *nix? If so, just play around with grep and such.

If under Windows, there are lots of regex utilities. BKReplacem is a good one and there are various ports of grep to Win.

Tom

transistor




msg:1285823
 11:44 pm on Aug 29, 2002 (gmt 0)

A simple one:

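<?php
// eregi() no longer exists in PHP 7+; preg_match() with the i flag is the equivalent.
$var = "Regex 101"; // sample input to test
if (preg_match('/^[a-zA-Z0-9\ ]+$/i', $var)) { // a space after the backslash
    echo "Passed!";
} else {
    echo "Failed";
}
?>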

Like Robber said: ^ starts with...
between brackets are the characters allowed, in this case:
a-zA-Z (any lowercase or uppercase letter)
0-9 (any digit)
\ (this is a space escaped, allows a space, doh!)
the brackets end and then
+$ (which I understand as "ends with")
So, this code will return:
Passed! for $var="My name is Transistor"
Failed for $var="No time, to lose!" // the comma and the exclamation mark are not allowed
Passed! for $var="12345678"
Passed! for $var="Regex 101"
Failed for $var="$100.50" // Dollar sign and period not allowed
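The five cases listed above can be checked in one go. This is only a sketch: it uses preg_match() rather than eregi() (which current PHP versions no longer provide), and isAlnumOrSpace() is a hypothetical helper name, not something from the original post.

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
function isAlnumOrSpace(string $var): bool
{
    // Letters, digits and spaces only, one or more, from start to end.
    return preg_match('/^[a-zA-Z0-9\ ]+$/', $var) === 1;
}

$samples = [
    'My name is Transistor',
    'No time, to lose!',
    '12345678',
    'Regex 101',
    '$100.50',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', isAlnumOrSpace($sample) ? 'Passed!' : 'Failed', "\n";
}
```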

jdMorgan




msg:1285824
 11:44 pm on Aug 29, 2002 (gmt 0)

jatar,

Another comment I read somewhere that I've found to be absolutely true is that regex are easier to write than they are to read! A good comparison is that writing your own scripts is much easier than reading someone else's. So do give it a try.

Jim

jatar_k




msg:1285825
 7:22 am on Aug 30, 2002 (gmt 0)

Ok so I have written a few good ones previously and debugged a ton but oooh do I hate them.

These all look like good resources and tips. Who knows I might even get good at them.

I will add my own personal one. I have always used Perl in a Nutshell from O'Reilly.

The times I have been forced to do them it has gotten me through quite well but I am going to add this thread to my resource list.

<added>transistor, sweet little example, very intuitive

jdMorgan




msg:1285826
 8:19 pm on Aug 30, 2002 (gmt 0)

Just a minor correction to this "resource" :)

^[a-zA-Z0-9\ ]+$

...the brackets end and then +$ (which I understand as "ends with")

The "+" means "require one or more of the preceding character or group" - in this case, one or more of the characters listed in the square brackets.

The "$" means, "and this must match at the end of the string being tested."
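To see what "+" and "$" each contribute, here is a small sketch that removes them one at a time. The test strings and variable names are assumptions picked for illustration.

```php
<?php
$pattern = '/^[a-zA-Z0-9\ ]+$/';

// "+" requires at least one character from the class, so an empty string fails.
$emptyWithPlus = preg_match($pattern, '');                  // 0
// Swapping "+" for "*" (zero or more) lets the empty string through.
$emptyWithStar = preg_match('/^[a-zA-Z0-9\ ]*$/', '');      // 1

// "$" forces the run of allowed characters to reach the end of the string.
$withEndAnchor    = preg_match($pattern, 'abc!');               // 0: "!" is in the way
$withoutEndAnchor = preg_match('/^[a-zA-Z0-9\ ]+/', 'abc!');    // 1: "abc" alone is enough

var_dump($emptyWithPlus, $emptyWithStar, $withEndAnchor, $withoutEndAnchor);
```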

Jim

tonic




msg:1285827
 8:56 pm on Aug 30, 2002 (gmt 0)

and for more help this tool is very kewl :
[gotdotnet.com...]

you enter the regexp, some text, it displays the output

mdharrold




msg:1285828
 9:29 pm on Aug 30, 2002 (gmt 0)

The regex page I use to sort it all out. [troubleshooters.com]

"Another comment I read somewhere that I've found to be absolutely true is that regex are easier to write than they are to read! A good comparison is that writing your own scripts is much easier than reading someone else's. So do give it a try."

I agree completely. I have to take several minutes of complete silence to understand someone else's regular expression.

gsx




msg:1285829
 9:54 pm on Aug 30, 2002 (gmt 0)

They are easy to write.

If you want to match a letter, type the letter. If you want to match a symbol literally, type a backslash then the symbol.

Then there are the special codes, the opposite of the above: a letter preceded by a backslash or a symbol without a backslash.

You will most likely use:
[...-...] : Range of chars : e.g. [A-Z]
^ : Match start of string
$ : Match end of string
. : Match any character (except a newline, by default)
\b : Match any word boundary : e.g. \bit\b will match 'a it b' but not 'a bite b'
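A few of these building blocks, tried out with preg_match(). This is a sketch with made-up sample strings; only the \bit\b case comes from the list above.

```php
<?php
// \b: match "it" only as a whole word.
$wholeWord  = preg_match('/\bit\b/', 'a it b');    // 1
$insideWord = preg_match('/\bit\b/', 'a bite b');  // 0: "it" sits inside "bite"

// An unescaped "." matches any character; "\." matches only a literal dot.
$anyChar     = preg_match('/^a.c$/', 'abc');   // 1
$literalDot  = preg_match('/^a\.c$/', 'abc');  // 0
$dotInString = preg_match('/^a\.c$/', 'a.c');  // 1

// A range inside brackets.
$upper = preg_match('/^[A-Z]$/', 'Q');  // 1
$lower = preg_match('/^[A-Z]$/', 'q');  // 0

var_dump($wholeWord, $insideWord, $anyChar, $literalDot, $dotInString, $upper, $lower);
```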

Then there are quantifiers:
* : Match zero or more times : e.g. X* will match '', 'X', 'XX', 'XXX' etc...
*?: Match zero or more times (same as above but will take as few characters as possible, above will take as many characters as possible)
+ : Match one or more times : e.g. X+ will match 'X', 'XX' etc.. but not ''
+?: Match one or more times : (same as above but will take as few characters as possible, above will take as many characters as possible)
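The four forms above can be compared with a short sketch. The array and variable names are assumptions for illustration; the X strings are the ones from the list.

```php
<?php
// Which whole strings do X* and X+ accept?
$star = [];
$plus = [];
foreach (['', 'X', 'XX', 'XXX'] as $s) {
    $star[$s] = preg_match('/^X*$/', $s);  // zero or more: accepts all four
    $plus[$s] = preg_match('/^X+$/', $s);  // one or more: rejects ''
}

// How much do the greedy and minimal forms take from the same input?
preg_match('/X+/',  'XXX', $greedyPlus);  // takes as many as possible
preg_match('/X+?/', 'XXX', $lazyPlus);    // takes as few as possible (one X)
preg_match('/X*?/', 'XXX', $lazyStar);    // as few as possible is zero characters

var_dump($star, $plus, $greedyPlus[0], $lazyPlus[0], $lazyStar[0]);
```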

Minimal and maximal matching work as follows:
If you have the string <b><a href=x>ThisLink</a></b>
and you match with \<.*\>, the whole string matches, because .* takes as many characters as possible.
If you match with \<.*?\> instead, only <b> is returned, because .*? takes the shortest possible match (starting from the left).

You will find .*? invaluable: e.g. \<span.*?\/span\> will match any string of the form <span....>....</span>.

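(Technically, you do not need to backslash the < and > in Perl-compatible regex, but it makes it easier to see that it is a literal char when you read it in years to come. Be careful in other tools: in GNU grep, for example, \< and \> are word-boundary anchors rather than literal characters.)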
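The maximal and minimal matches described above can be reproduced with preg_match(). This is a sketch: the < and > are left unescaped here, and the $text string with two spans is an invented sample.

```php
<?php
$html = '<b><a href=x>ThisLink</a></b>';

preg_match('/<.*>/',  $html, $greedy);  // .* takes as many characters as possible
preg_match('/<.*?>/', $html, $lazy);    // .*? takes as few characters as possible

echo $greedy[0], "\n";  // <b><a href=x>ThisLink</a></b>
echo $lazy[0], "\n";    // <b>

// With .*? each span is matched separately instead of being merged into one.
$text = '<span class=a>one</span> and <span>two</span>';
preg_match_all('/<span.*?\/span>/', $text, $spans);

print_r($spans[0]);  // the two spans, as two separate matches
```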

I recommend O'Reilly books for further information, very in depth but brilliant for quick reference.

Robber




msg:1285830
 9:15 am on Aug 31, 2002 (gmt 0)

Nice one gsx, can't forget the ?. The first time I saw it was in something simple like .*?. At the time I hadn't come across the concept of greediness and wondered what the hell was going on - I assumed the ? meant zero or one, which in that context was rubbish. So watch out folks: directly after a quantifier like * or +, the ? doesn't mean zero or one at all, it makes the quantifier non-greedy.
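To keep the two roles of ? apart, here is a minimal sketch. The "colour" pattern and the sample strings are assumptions chosen for illustration.

```php
<?php
// After an ordinary character or group, ? means "zero or one".
$american = preg_match('/^colou?r$/', 'color');   // 1: the "u" is absent
$british  = preg_match('/^colou?r$/', 'colour');  // 1: the "u" is present once

// Directly after a quantifier (* or +), ? switches it to non-greedy.
preg_match('/a.*b/',  'a1b2b', $greedy);  // runs to the last "b"
preg_match('/a.*?b/', 'a1b2b', $lazy);    // stops at the first "b"

echo $greedy[0], "\n";  // a1b2b
echo $lazy[0], "\n";    // a1b
```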

lorax




msg:1285831
 12:25 pm on Aug 31, 2002 (gmt 0)

Having spent my share of time tripping over regular expressions, I'll add that just like everything else - syntax is everything. The difference is that when regular expressions don't work, it can be awfully hard to find that little typo.

"A good comparison is that writing your own scripts is much easier than reading someone else's." I almost agree. ;) Reading isn't so much the problem for me (and this may be what you were really getting at) but rather wrapping my mind around where the programmer was headed with the code and building a mental picture of all the pieces. When you write it yourself that develops naturally. Much the same for regular expressions - but on a smaller scale.

I personally found it easier to work my way through regular expressions by taking someone's example code and playing with it. I used one that checked email addresses for the "@" and looked for the "." as well. The problem I noted is that it didn't account for the fact that some email addresses use a . before @ like "john.smith@roger.com". Playing with that code taught me a lot. I spun my wheels for a time over a syntax problem. So after a cup of tea and a bit of lunch I came back to it and spotted the bugger right off. That's how it usually goes.
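A rough sketch of that kind of check, for anyone who wants something to play with. This is not the code referred to above: the pattern and the looksLikeEmail() name are assumptions, and it only checks for an "@" followed later by a ".", so it is nowhere near a full email validator.

```php
<?php
// Hypothetical helper: something@something.something, no spaces, exactly one "@".
function looksLikeEmail(string $address): bool
{
    // [^@\s]+ allows dots, so a "." before the "@" is accepted as well.
    return preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+$/', $address) === 1;
}

var_dump(looksLikeEmail('john.smith@roger.com'));  // bool(true): dot before the @ is fine
var_dump(looksLikeEmail('john.smith'));            // bool(false): no @
var_dump(looksLikeEmail('john@roger'));            // bool(false): no dot after the @
```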

Just my 2 cents.
GB
