
Forum Moderators: coopster & jatar k & phranque

RegExp Perl

   
12:14 pm on Jul 21, 2008 (gmt 0)

5+ Year Member



I am new to regular expressions. I believe the best way to actually understand the subject is to attempt some questions and solve the problems.

I found the following question in Perl in 24 Hours, Third Edition, by Clinton Pierce.

The question is as follows:

Write a short program that does the following:
1. Opens a file;
2. Reads all the lines into an array;
3. Extracts all the words that have at least four consecutive consonants, or non-vowels.

I am not expecting the answer to this question, but maybe some suggestions, along with what I already know, may help me solve it and start getting a better understanding of Perl.

12:34 pm on Jul 21, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



the trouble you're having is at step 3?

well, in regular expressions, you can define character classes, for example [a-z] to represent all the lowercase letters. you can also invert the meaning by adding a ^ to the beginning, like [^a-z], which would mean "any character except a lowercase letter". combine that with the ability to specify the number of occurrences needed, like {1,2}, and you've got something.
so, for example, [a-z]{2} would match on any two consecutive lowercase letters. you can also specify a range {1,3} (matching on one, two or three consecutive elements of the class) or a minimum value {1,} (one or more, no upper limit).
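to see those pieces in action, here is a minimal sketch. the helper first_match and the sample strings are made up for illustration, and the capturing parentheses are only there so the matched text can be printed:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first piece of $str that matches $re,
# or undef if nothing matches.
sub first_match {
    my ($str, $re) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

print first_match('abcdef',  qr/[a-z]{2}/),   "\n";  # exactly two lowercase letters
print first_match('abcdef',  qr/[a-z]{1,3}/), "\n";  # one to three, as many as possible
print first_match('abcdef',  qr/[a-z]{1,}/),  "\n";  # one or more, no upper limit
print first_match('abc-def', qr/[^a-z]/),     "\n";  # first character that is NOT a-z
```

this should print ab, abc, abcdef and - on separate lines. quantifiers are greedy by default, which is why {1,3} grabs three letters rather than one.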

if you have any more questions, shoot. and post code, where appropriate.
btw: [perldoc.perl.org...] gives a pretty good overview of regular expressions in perl

4:52 pm on Jul 21, 2008 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Welcome aboard Daveo1977!

3. Extracts all the words that have at least four consecutive consonants, or non-vowels.

[] = a class of characters

[^aeiou] = anything that is NOT in this class - they kinda give you a clue by saying "non-vowels." This one can get tricky if you need to match on a literal caret - so in those cases, make sure the caret is NOT the first character after the bracket. [^....] will always mean "anything that is NOT this".

{4,} = four or more. This one's easier to understand than it looks. {4,6} = at least 4, no more than 6. A blank second parameter means "no upper limit." There are special ones for "zero or more" * or "one or more" + as shorthand:

if ($w =~ /a*/) { } #match zero or more lower case "a's"
if ($w =~ /a+/) { } #match one or more lower case "a's"
if ($w =~ /a{4,6}/) { } #match 4 to 6 lower case "a's"

The i modifier makes the match case-insensitive, which is shorter than writing [^AEIOUaeiou].

So this regexp should work (tested):

if ($word =~ /[^aeiou]{4,}/i) { ...... }

Reading files in and extracting the words is the fun part of learning Perl, won't deny you that by exemplifying it. :-)

9:15 pm on Jul 21, 2008 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



[^aeiou] = anything that is NOT in this class - they kinda give you a clue by saying "non-vowels."

actually the question specifies consonants, not just non-vowels.
the character class [^aeiou] will include numerics, blanks, punctuation, etc.
this might be a better consonant class:
[b-df-hj-np-tv-z]
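
a quick sketch of the difference between the two classes. the sub names and the sample line are made up for illustration, and splitting on whitespace is just one assumed way of getting "words":

```perl
use strict;
use warnings;

# Words containing four or more consecutive consonants (explicit class).
sub consonant_words {
    return grep { /[b-df-hj-np-tv-z]{4,}/i } @_;
}

# Same check with the negated vowel class, for comparison.
sub non_vowel_words {
    return grep { /[^aeiou]{4,}/i } @_;
}

# Made-up sample line; split ' ' breaks it on runs of whitespace.
my $line  = 'The catchphrase cost 12345 in strength';
my @words = split ' ', $line;

print "consonant class: @{[ consonant_words(@words) ]}\n";
print "negated class:   @{[ non_vowel_words(@words) ]}\n";
```

both lists should contain catchphrase and strength, but only the negated class also picks up 12345, since digits are not vowels either.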

10:06 pm on Jul 21, 2008 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member



***sooooooo busted ******

<eek>

10:24 pm on Jul 21, 2008 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



busted

btdt - we all like to see the easy way out...
=8)

 
