What I am looking for is the basics. If I was going to start using regular expressions and needed to understand the fundamentals involved how would you explain it to me?
I know what they are and I am very familiar with all of the functions that use them, I am just talking straight up regex. What do all the complicated strings of chars in those functions mean?
Another thing I find (which is probably obvious!): take the time to figure out which parts of the regex are special characters, e.g. \s would match any whitespace character and \w any word character (letter, digit or underscore), but at a glance it's easy to miss that.
Oh yeah, and one other thing that's useful to remember: ^ means match at the start of a string, except as the first character inside a character class, where it negates the class. That catches me out quite a bit.
Well, that's my 2p worth!
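To make the two meanings of ^ concrete, here is a small sketch. It uses Python's re module rather than PHP, purely for illustration, and the sample strings are made up.

```python
import re

# Outside a character class, ^ anchors the match to the start of the string.
starts_with_cat = re.compile(r"^cat")

# Inside a character class, a leading ^ negates the class:
# [^0-9] matches any single character that is NOT a digit.
non_digit = re.compile(r"[^0-9]")

print(bool(starts_with_cat.search("catalog")))  # True
print(bool(starts_with_cat.search("bobcat")))   # False
print(non_digit.findall("a1b2"))                # ['a', 'b']

# \w matches word characters, \s matches whitespace.
print(re.findall(r"\w+", "two words"))          # ['two', 'words']
print(re.findall(r"\s", "two words"))           # [' ']
```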
Why don't you try this article:
So What's A $#!%% Regular Expression, Anyway?! [devshed.com] from devshed.
Are you working under *nix? If so, just play around with grep and such.
If under Windows, there are lots of regex utilities. BK ReplaceEm is a good one, and there are various ports of grep to Windows.
Tom
<?php
// Passes only if $var holds nothing but letters, digits and spaces.
if (preg_match('/^[a-zA-Z0-9\ ]+$/', $var)) { // a space after the backslash
    echo "Passed!";
} else {
    echo "Failed";
}
?>
These all look like good resources and tips. Who knows I might even get good at them.
I will add my own personal one: I have always used Perl in a Nutshell from O'Reilly.
The times I have been forced to do them it has gotten me through quite well but I am going to add this thread to my resource list.
<added>transistor, sweet little example, very intuitive
^[a-zA-Z0-9\ ]+$...the brackets end and then +$ (which I understand as "ends with")
The "+" means "require one or more of the preceding character or group", which in this case is the contents of the square brackets.
The "$" means, "and this must match at the end of the string being tested."
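Here is the same pattern tried against a few inputs, as a rough sketch in Python's re module rather than PHP. The helper name passes and the test strings are my own, made up for illustration.

```python
import re

# Same pattern as the PHP example: letters, digits and spaces only.
# ^ and $ anchor the match at the start and end of the string,
# and + demands at least one character from the class.
ALNUM_SPACE = re.compile(r"^[a-zA-Z0-9\ ]+$")

def passes(value):
    """Return True when value is non-empty and holds only letters, digits, spaces."""
    return ALNUM_SPACE.search(value) is not None

print(passes("Hello World 42"))  # True
print(passes("Hello, World"))    # False: the comma is not in the class
print(passes(""))                # False: + needs one or more characters
```

One thing to watch: in Python, $ also matches just before a single trailing newline, so "abc\n" would pass this check. Use \Z instead of $ if that matters.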
Jim
you enter the regexp, some text, it displays the output
Another comment I read somewhere that I've found to be absolutely true is that regexes are easier to write than they are to read! A good comparison is that writing your own scripts is much easier than reading someone else's. So do give it a try.
I agree completely. I have to take several minutes of complete silence to understand someone else's regular expression.
If you want to match a letter, type the letter. If you want to match a symbol, always type a backslash then the symbol.
Then there are the special codes, the opposite of the above: a letter preceded by a backslash or a symbol without a backslash.
You will most likely use:
[...-...] : Range of chars : e.g. [A-Z]
^ : Match start of string
$ : Match end of string
. : Match any character (except a newline, by default)
\b : Match any word boundary : e.g. \bit\b will match 'a it b' but not 'a bite b'
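A few of the codes above in action, sketched with Python's re module (not PHP, just for illustration). The sample strings other than 'a it b' and 'a bite b' are made up.

```python
import re

# \b marks a word boundary, so \bit\b only matches "it" as a whole word.
word_it = re.compile(r"\bit\b")
print(bool(word_it.search("a it b")))    # True
print(bool(word_it.search("a bite b")))  # False

# [A-Z] is a range; . matches any single character except a newline by default.
print(re.findall(r"[A-Z]", "Hello World"))  # ['H', 'W']
print(re.findall(r"c.t", "cat cot c\nt"))   # ['cat', 'cot']

# A symbol with a backslash is taken literally: \. matches only a real dot.
print(re.findall(r"\.", "a.b.c"))           # ['.', '.']
```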
Then there are quantifiers:
* : Match zero or more times : e.g. X* will match '', 'X', 'XX', 'XXX' etc...
*?: Match zero or more times (same as above but will take as few characters as possible, above will take as many characters as possible)
+ : Match one or more times : e.g. X+ will match 'X', 'XX' etc.. but not ''
+?: Match one or more times (same as above but will take as few characters as possible, above will take as many characters as possible)
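The four quantifiers can be checked quickly like this. It is a minimal sketch in Python's re module, not PHP, and fullmatch is used only so that the whole string has to match.

```python
import re

# fullmatch requires the whole string to match, which makes the counts clear.
for text in ["", "X", "XXX"]:
    star = re.fullmatch(r"X*", text) is not None
    plus = re.fullmatch(r"X+", text) is not None
    print(repr(text), star, plus)
# '' True False
# 'X' True True
# 'XXX' True True

# Greedy forms take as much as they can, lazy forms as little as they can.
print(re.match(r"X+", "XXX").group())         # XXX
print(re.match(r"X+?", "XXX").group())        # X
print(repr(re.match(r"X*?", "XXX").group()))  # ''
```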
Minimal and Maximal are as follows:
if you have the string <b><a href=x>ThisLink</a></b>
then you match with \<.*\>, you will get the whole string matching because it takes as many characters as possible with the .*
but if you match with \<.*?\> you will get <b> returned, but no more, because it is the shortest possible (from the left)
You will find .*? invaluable: e.g. \<span.*?\/span\> will get any string <span....>....</span> matched.
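(Technically, you do not need to backslash the < and > in PHP's regex functions, but it makes it easier to see that it is a literal char when you read it in years to come. Be careful elsewhere, though: in GNU grep, \< and \> mean start and end of word.)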
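The minimal versus maximal difference is easy to see by running both patterns on that string. This is a sketch in Python's re module, where \< and \> are also plain literals. The second test string with two spans is my own invention.

```python
import re

html = "<b><a href=x>ThisLink</a></b>"

# Greedy .* runs to the last > in the string.
print(re.search(r"\<.*\>", html).group())   # <b><a href=x>ThisLink</a></b>

# Lazy .*? stops at the first > it can.
print(re.search(r"\<.*?\>", html).group())  # <b>

# Without the backslashes the result is the same; findall lists every tag.
print(re.findall(r"<.*?>", html))  # ['<b>', '<a href=x>', '</a>', '</b>']

# The span pattern picks out each <span ...>...</span> separately.
spans = "<span class=a>one</span> and <span>two</span>"
print(re.findall(r"\<span.*?\/span\>", spans))
# ['<span class=a>one</span>', '<span>two</span>']
```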
I recommend O'Reilly books for further information, very in depth but brilliant for quick reference.
"A good comparison is that writing your own scripts is much easier than reading someone else's."
I almost agree. ;) Reading isn't so much the problem for me (and this may be what you were really getting at) but rather wrapping my mind around where the programmer was headed with the code and building a mental picture of all the pieces. When you write it yourself that develops naturally. Much the same for regular expressions, but on a smaller scale.
I personally found it easier to work my way through regular expressions by taking someone's example code and playing with it. I used one that checked email addresses for the "@" and looked for the "." as well. The problem I noted is that it didn't account for the fact that some email addresses use a . before @ like "john.smith@roger.com". Playing with that code taught me a lot. I spun my wheels for a time over a syntax problem. So after a cup of tea and a bit of lunch I came back to it and spotted the bugger right off. That's how it usually goes.
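For anyone who wants something similar to play with, here is a deliberately loose sketch of that kind of check, written in Python's re module rather than PHP. The pattern and the name looks_like_email are my own, not the code described above, and it is nowhere near full email validation. It only checks for an @ and a dot in the domain, while allowing dots before the @.

```python
import re

# One or more characters that are not @ or whitespace (dots allowed),
# then an @, then a domain that contains at least one dot.
EMAIL_LIKE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(address):
    """Rough shape check only; hypothetical helper for experimenting."""
    return EMAIL_LIKE.search(address) is not None

print(looks_like_email("john.smith@roger.com"))  # True
print(looks_like_email("john.smith@roger"))      # False: no dot after the @
print(looks_like_email("john smith@roger.com"))  # False: space before the @
```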
Just my 2 cents.
GB