#third.pl FILE
open(MYINPUTFILE,"database.txt") or die "database.txt not found!\n";
while (<MYINPUTFILE >) {
    chop;
    tr/;:,.!?-//d;
    foreach $w (split) {
        if ($w eq 'the') {
            print "$.\n";
            $score++;
        }
    }
}
print "\n'the' occurs $score++ times\n";
Assume database.txt is an essay containing 35 instances of 'the'.
When I run this from the command line, it prints:
'the' occurs times.
Why doesn't the output show the number of instances?
Thanks for helping me.
[edited by: phranque at 10:56 am (utc) on Dec. 11, 2008]
open(MYINPUTFILE,"database.txt") or die "database.txt not found!\n";
It's better to use the three-argument form of open, e.g. open(FH, '<', 'file.txt'), and to put $! in the message so you can see why the open failed:
open(MYINPUTFILE,'<', "database.txt") or die "database.txt could not be opened: " . $! . "!\n";

while (<MYINPUTFILE >) {
Here's the major problem (and it's just a typo): the space before the closing >. With that space Perl no longer reads from the filehandle. It treats <MYINPUTFILE > as a filename glob, so the loop never sees a line of database.txt and $score is never set.
I'd recommend not relying on $_ too much, but rather reading the line into a non-special variable, e.g.
while (my $line = <MYINPUTFILE>) {

chop;
chop cuts the last character off the string, regardless of what it is, while chomp only does so if it's a line break. Use chomp; it'll save you a lot of trouble debugging why your lines are mutilated when you read them from a file and didn't remember you already chop'ed ;)
chomp $line;

tr/;:,.!?-//d;
Personally, although it's probably slower, I'd use a regexp here:
$line =~ s/[;:,.!?-]/ /gis;

foreach $w (split) {
You should tell split where to split. Without arguments it works on $_, so once the line lives in $line you have to pass it in, e.g.
foreach $w (split(/ /, $line)) {

print "\n'the' occurs $score++ times\n";
Inside a double-quoted string only $score is interpolated; the ++ is printed literally. Concatenate instead (the increment isn't needed just to print the value):
print "\n'the' occurs " . $score . " times\n";

As a whole, I'd do it like this:
my $score = 0;
open(MYINPUTFILE,'<', "database.txt") or die "database.txt could not be opened: " . $! . "!\n";
while (my $line = <MYINPUTFILE>) {
    chomp $line;                        # strip the line break only
    $line =~ s/[;:,.!?-]/ /gis;         # turn punctuation into spaces
    foreach my $w (split(/ /, $line)) {
        if ($w eq 'the') {
            print "$.\n";               # $. is the current line number
            $score++;
        }
    }
}
close(MYINPUTFILE);
print "\n'the' occurs " . $score . " times\n";

With a database.txt looking like this:
hello this is the file to be read by the script. just to make it interesting, have a the. directly followed by a non-space-character.
it prints:
1
1
1

'the' occurs 3 times
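For anyone who wants to see the two bugs from the original script in isolation, here is a minimal sketch. The sub names are made up for illustration; MYINPUTFILE is the handle name from the script, and no file is opened here on purpose, because the spaced form never touches a file anyway.

```perl
use strict;
use warnings;

# Pitfall 1: a space inside the angle brackets changes the operator.
# <MYINPUTFILE> reads a line from the handle, but <MYINPUTFILE > is
# parsed as glob("MYINPUTFILE "), i.e. a filename pattern.
sub loop_values_with_space {
    my @seen;
    local $_;                        # the while loop below assigns to $_
    while (<MYINPUTFILE >) {         # glob, not readline
        push @seen, $_;
    }
    return @seen;
}

# Pitfall 2: "++" is not evaluated inside a double-quoted string.
sub report_interpolated {
    my ($score) = @_;
    return "'the' occurs $score++ times";       # "++" stays literal text
}

sub report_concatenated {
    my ($score) = @_;
    return "'the' occurs " . $score . " times";
}

print join(",", loop_values_with_space()), "\n";   # MYINPUTFILE
print report_interpolated(35), "\n";               # 'the' occurs 35++ times
print report_concatenated(35), "\n";               # 'the' occurs 35 times
```

The loop body runs exactly once, with the pattern text in $_, so no word ever equals 'the' and the counter is never touched.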
Hope that helps. The cost of the help is the unrequested advice ;)
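As a small illustration of the chop versus chomp advice, here is a sketch that assumes the default input record separator ("\n"). The helper names chopped and chomped are hypothetical and exist only to make the comparison easy to run.

```perl
use strict;
use warnings;

# chop removes the last character, whatever it is.
sub chopped {
    my ($s) = @_;
    chop $s;
    return $s;
}

# chomp removes a trailing line ending only (the value of $/, "\n" by default).
sub chomped {
    my ($s) = @_;
    chomp $s;
    return $s;
}

# The second string stands in for a last line that has no trailing newline.
for my $line ("the end\n", "the end") {
    printf "chop: [%s]  chomp: [%s]\n", chopped($line), chomped($line);
}
# chop: [the end]  chomp: [the end]
# chop: [the en]  chomp: [the end]
```

This matters for the word count: if the file's last line ends in 'the' without a newline, chop turns it into 'th' and the match is missed.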
Quoted from perlretut:
[perldoc.perl.org...]
An anchor useful in basic regexps is the word anchor \b . This matches a boundary between a word character and a non-word character \w\W or \W\w :
$x = "Housecat catenates house and cat";
$x =~ /cat/; # matches cat in 'housecat'
$x =~ /\bcat/; # matches cat in 'catenates'
$x =~ /cat\b/; # matches cat in 'housecat'
$x =~ /\bcat\b/; # matches 'cat' at end of string
Note in the last example, the end of the string is considered a word boundary.
\b correctly matches the end of a line, with or without a newline or whatever the OS considers an end of record character.
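Applied to the question in this thread, the word anchor allows counting 'the' without stripping punctuation or splitting first. This is only a sketch: count_the is a hypothetical helper name, and the match is case-sensitive like the eq 'the' test in the scripts above.

```perl
use strict;
use warnings;

# Count whole-word occurrences of 'the' in one line of text.
sub count_the {
    my ($line) = @_;
    # The empty list in the middle forces list context, so the match
    # returns every hit and the scalar assignment counts them.
    my $count = () = $line =~ /\bthe\b/g;
    return $count;
}

print count_the("hello this is the file to be read by the script."), "\n";   # 2
print count_the("have a the. then other theme"), "\n";                       # 1
```

Note that \b also sees a boundary next to a hyphen or apostrophe, so 'the-' would be counted as well.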
-krugs