Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
RegExp Perl
Davo1977




msg:3703379
 12:14 pm on Jul 21, 2008 (gmt 0)

I am new to regular expressions. I believe the way to actually understand the subject is to attempt some exercises and solve them.

I have read the following question from Perl in 24 Hours, Third Edition, by Clinton Pierce.

The question is as follows:

Write a short program that does the following:
1. Opens a file.
2. Reads all the lines into an array.
3. Extracts all the words that have at least four consecutive consonants, or non-vowels.

I am not expecting the answer to this question, but some suggestions, along with what I already know, may help me solve it and start getting a better understanding of Perl.

 

janharders




msg:3703391
 12:34 pm on Jul 21, 2008 (gmt 0)

the trouble you're having is at step 3?

well, in regular expressions, you can define character classes, for example [a-z] to represent all the lowercase letters. you can also invert the meaning by adding a ^ to the beginning, like [^a-z], which would mean "any character except a lowercase letter". combine that with the ability to specify the number of occurrences needed, like {1,2}, and you've got something.
so, for example, [a-z]{2} would match on any two consecutive lowercase letters. You can also specify a range {1,3} (matching on one, two or three consecutive elements of the class) or a minimum value {1,} (one or more, no limit).
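
To make the class and quantifier syntax above concrete, here is a minimal sketch. The sub names and sample strings are made up for illustration. Note that a range such as {1,3} only acts as an upper limit when the pattern is anchored; unanchored, it happily matches inside a longer run.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per pattern discussed above.
sub has_two_lower   { return $_[0] =~ /[a-z]{2}/      ? 1 : 0 }  # two consecutive lowercase letters
sub has_non_lower   { return $_[0] =~ /[^a-z]/        ? 1 : 0 }  # any character outside a-z
sub is_one_to_three { return $_[0] =~ /^[a-z]{1,3}$/  ? 1 : 0 }  # whole string is 1 to 3 lowercase letters
sub has_one_or_more { return $_[0] =~ /[a-z]{1,}/     ? 1 : 0 }  # same as [a-z]+

# Sample strings chosen only for this sketch.
for my $s ('ab', 'abcd', 'a1', 'A') {
    printf "%-5s two=%d non_lower=%d one_to_three=%d one_or_more=%d\n",
        $s, has_two_lower($s), has_non_lower($s),
        is_one_to_three($s), has_one_or_more($s);
}
```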

if you have any more questions, shoot. and post code, where appropriate.
btw: [perldoc.perl.org...] gives a pretty good overview of regular expressions in perl

rocknbil




msg:3703604
 4:52 pm on Jul 21, 2008 (gmt 0)

Welcome aboard Davo1977!

3. Extracts all the words that have at least four consecutive consonants, or non-vowels.

[] = a class of characters

[^aeiou] = anything that is NOT in this class - they kinda give you a clue by saying "non-vowels." This one can get tricky if you need to match on a literal caret - in those cases, make sure it's NOT the first character after the bracket. [^....] will always mean "anything that is NOT this".
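
A small sketch of the caret rule, in case it helps. The sub names and test strings are invented for this example; the only point is where the caret sits inside the brackets.

```perl
use strict;
use warnings;

# Caret first: negated class, matches any character that is not a lowercase vowel.
sub has_non_vowel      { return $_[0] =~ /[^aeiou]/ ? 1 : 0 }

# Caret not first: it is just another literal member of the class.
sub has_vowel_or_caret { return $_[0] =~ /[aeiou^]/ ? 1 : 0 }

# Escaped caret: literal even in first position.
sub has_caret          { return $_[0] =~ /[\^]/ ? 1 : 0 }

print has_non_vowel('aei'), "\n";        # string holds only vowels
print has_vowel_or_caret('x^y'), "\n";   # string holds a literal caret
print has_caret('2^8'), "\n";
```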

{4,} = four or more. This one's easier to understand than it looks. {4,6} = at least 4, no more than 6. A blank second parameter means "no upper limit." There are special ones for "zero or more" * or "one or more" + as shorthand:

if ($w =~ /a*/) { }     # zero or more lowercase "a"s (always true for a defined $w, since zero occurrences also match)
if ($w =~ /a+/) { }     # one or more lowercase "a"s
if ($w =~ /a{4,6}/) { } # a run of 4 to 6 lowercase "a"s

The i modifier makes the match case-insensitive, which is shorter than writing [^AEIOUaeiou].

So this regexp should work (tested):

if ($word =~ /[^aeiou]{4,}/i) { ...... }

Reading files in and extracting the words is the fun part of learning Perl; I won't deny you that by exemplifying it. :-)

phranque




msg:3703907
 9:15 pm on Jul 21, 2008 (gmt 0)

[^aeiou] = anything that is NOT in this class - they kinda give you a clue by saying "non-vowels."

actually it specifies consonants, not non-vowels.
the character class [^aeiou] will include numerics, blanks, punctuation, etc.
this might be a better consonant class (it lists lowercase only, so keep the i modifier to cover uppercase):
[b-df-hj-np-tv-z]
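
A rough sketch of how the two classes differ in practice, not a full answer to the exercise. The helper name, the sample lines, and the definition of a "word" as a run of letters are all assumptions made for this example. Note that this class counts "y" as a consonant.

```perl
use strict;
use warnings;

# Consonant class from the post above; /i extends it to uppercase letters.
my $CONSONANT_RUN = qr/[b-df-hj-np-tv-z]{4,}/i;

# Hypothetical helper: returns every word in the given lines that contains
# four or more consecutive consonants. A "word" is assumed to be a run of
# letters; the exercise itself does not define it.
sub words_with_consonant_runs {
    my @found;
    for my $line (@_) {
        for my $word ($line =~ /[A-Za-z]+/g) {
            push @found, $word if $word =~ $CONSONANT_RUN;
        }
    }
    return @found;
}

# Made-up lines standing in for the file contents. In the exercise they would
# come from the file, e.g. open(my $fh, '<', $path) or die; my @lines = <$fh>;
my @sample = ("The strengths of 1234 rhythms\n", "A quiet idea.\n");
print join(', ', words_with_consonant_runs(@sample)), "\n";

# The broader non-vowel class also accepts digits, blanks and punctuation:
print "[^aeiou]{4,} matches '1234'\n" if '1234' =~ /[^aeiou]{4,}/i;
```

Keeping the pattern in a qr// variable makes it easy to swap in a different definition of "consonant" without touching the loop.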

rocknbil




msg:3703959
 10:06 pm on Jul 21, 2008 (gmt 0)

***sooooooo busted ******

<eek>

phranque




msg:3703970
 10:24 pm on Jul 21, 2008 (gmt 0)

busted

btdt - we all like to see the easy way out...
=8)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved