I have read the following question from Perl in 24 Hours, Third Edition, by Clinton Pierce.
The question is as follows:
Write a short program that does the following:
1. Opens a file,
2. Reads all the lines into an array;
3. Extracts all the words that have at least four consecutive consonants, or non-vowels.
I am not expecting the answer to this question, but some suggestions, along with what I already know, may help me solve it and start getting a better understanding of Perl.
Well, in regular expressions you can define character classes, for example [a-z] to represent all the lowercase letters. You can also invert the meaning by adding a ^ at the beginning, like [^a-z], which means "any character except a lowercase letter". Combine that with the ability to specify the number of occurrences needed, like {1,2}, and you've got something.
So, for example, [a-z]{2} would match any two consecutive lowercase letters. You can also specify a range, {1,3} (matching one, two or three consecutive elements of the class), or a minimum, {1,} (one or more, no upper limit).
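To see how classes, negation and counts behave, here is a minimal sketch. The helper names and sample strings are made up for illustration, and the ^ and $ anchors are an addition so that the whole string has to fit the count:

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
# ^ and $ anchor the pattern so the WHOLE string has to fit the quantifier.
sub is_two_lower      { return $_[0] =~ /^[a-z]{2}$/   ? 1 : 0 }
sub is_one_to_three   { return $_[0] =~ /^[a-z]{1,3}$/ ? 1 : 0 }
# No anchors here: one character outside a-z anywhere in the string is enough.
sub has_non_lowercase { return $_[0] =~ /[^a-z]/       ? 1 : 0 }

for my $s ('ab', 'abc', 'abcd', 'a1') {
    printf "%-5s two=%d one_to_three=%d non_lowercase=%d\n",
        $s, is_two_lower($s), is_one_to_three($s), has_non_lowercase($s);
}
```

Without the anchors, /[a-z]{2}/ would also succeed on "abcd", because a match only needs to be found somewhere inside the string.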
If you have any more questions, shoot. And post code where appropriate.
btw: [perldoc.perl.org...] gives a pretty good overview of regular expressions in Perl.
3. Extracts all the words that have at least four consecutive consonants, or non-vowels.
[] = a class of characters
[^aeiou] = anything that is NOT in this class - they kinda give you a clue by saying "non-vowels." This one can get tricky if you need to match a literal caret - in those cases, make sure the caret is NOT the first character after the opening bracket, or escape it with a backslash. A ^ right after the opening bracket will always mean "anything that is NOT this".
{4,} = four or more. This one's easier to understand than it looks. {4,6} = at least 4, no more than 6. A blank second parameter means "no upper limit." There are also shorthands, * for "zero or more" and + for "one or more":
if ($w =~ /a*/) { } #match zero or more lower case "a's"
if ($w =~ /a+/) { } #match one or more lower case "a's"
if ($w =~ /a{4,6}/) { } #match 4 to 6 lower case "a's"
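On the caret point above, a small sketch of how its position changes the meaning of a class. The variable names and test strings are only for illustration:

```perl
use strict;
use warnings;

my $not_vowel     = qr/[^aeiou]/;  # ^ first: negates the class
my $caret_or_a    = qr/[a^]/;      # ^ not first: a literal caret (or an "a")
my $escaped_caret = qr/[\^a]/;     # escaped ^: also a literal caret (or an "a")

for my $s ('a', 'x', '^') {
    printf "%s  not_vowel=%s  caret_or_a=%s  escaped_caret=%s\n",
        $s,
        ($s =~ $not_vowel     ? 'yes' : 'no'),
        ($s =~ $caret_or_a    ? 'yes' : 'no'),
        ($s =~ $escaped_caret ? 'yes' : 'no');
}
```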
The i modifier makes the match case-insensitive, which is shorter than writing [^AEIOUaeiou].
So this regexp should work (tested):
if ($word =~ /[^aeiou]{4,}/i) { ...... }
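If you want to watch that pattern work on single words before wiring it into a program, something like this sketch might help. The sub name and word list are hypothetical, and reading the file and splitting lines into words is deliberately left out:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the post above.
sub has_four_nonvowels { return $_[0] =~ /[^aeiou]{4,}/i ? 1 : 0 }

# Made-up test words, not taken from any real input file.
for my $word (qw(rhythm STRENGTH twelfth banana apple)) {
    printf "%-9s %s\n", $word, has_four_nonvowels($word) ? 'match' : 'no match';
}
```

One thing to keep in mind: [^aeiou] means "anything that is not a vowel", so digits and punctuation count too (a string like "1234" matches). If that ever gets in the way, a stricter class that lists only the consonants, such as [b-df-hj-np-tv-z], is one option.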
Reading files in and extracting the words is the fun part of learning Perl, won't deny you that by exemplifying it. :-)