regex substitution in Perl

having a problem iterating

         

volatilegx

11:43 pm on Dec 4, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm having a problem with a subroutine that is called from within a loop. In the subroutine, a substitution happens, where certain text in a line is replaced with a string held in a variable. The contents of the string change with each iteration of the loop.

The first iteration works fine, but on each subsequent iteration the string seems locked to what it was in the first iteration BUT ONLY FOR THE PURPOSES OF THE SUBSTITUTION. In other places where the variable is used within the same subroutine, the string changes normally.

What I need to know is whether there is some limitation in the substitution function of Perl that causes this, or is my logic just off?

Any help or suggestions would be greatly appreciated.

sugarkane

11:59 pm on Dec 4, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> limitation in the substitution function of perl

Not that I know of... could you post a snippet?

seriesint

1:58 am on Dec 5, 2001 (gmt 0)



Hi,
Gotta agree with sugarkane, need to see the code. But doesn't the /o modifier cause Perl to "lock" the value as it was when first compiled? Outside of that, it's probably a placement issue. So check for the /o, and then post back with the code if that doesn't fix it.

HTH
later
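
For anyone checking the /o theory, here is a minimal sketch of what the modifier does, using a made-up two-word list (the variable names are illustrative and not taken from any script in this thread). As far as I know, /o only freezes variables interpolated into the pattern side of s///; the replacement side is still interpolated on every pass.

```perl
use strict;
use warnings;

my @words = qw(cat dog);

# Pattern side with /o: $word is interpolated into the regex only on the
# first pass, so the second pass still searches for "cat".
my @pattern_locked;
for my $word (@words) {
    my $text = "cat dog";
    $text =~ s/$word/X/o;
    push @pattern_locked, $text;
}

# Replacement side with /o: the pattern is a constant, and the replacement
# string is interpolated fresh on each pass.
my @replacement_fresh;
for my $word (@words) {
    my $text = "#keyword#";
    $text =~ s/#keyword#/$word/o;
    push @replacement_fresh, $text;
}

print "pattern side:     ", join(" | ", @pattern_locked), "\n";
print "replacement side: ", join(" | ", @replacement_fresh), "\n";
```

So a "stuck" replacement string, with a fixed pattern like #keyword#, would probably not be explained by /o alone.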

volatilegx

5:16 pm on Dec 5, 2001 (gmt 0)




Well, the code "snippet" itself would be HUGE, but I'll post the relevant parts:

# this is the loop which takes an array @keywordphrases
# which contains a list of keyword phrases input by the
# user and iterates through it, calling the &populateStaticHTMLfromTemplate
# function each time.
foreach $currentphrase (@keywordphrases){
    &populateStaticHTMLfromTemplate;
    $filename = $currentphrase;
    $filename =~ s/[ \W]/-/g;
    $filename = $filename . $suffix;
    print STDOUT "<a href=\"$stathtmldirectory/$filename\">$_</a><br>\n";
    $countttt++;
}

# this function opens a template and iterates through
# each line looking for template replacement codes with
# the format #keyword# or #keywordUC# or similar.
# For each keyword phrase, a new HTML document is created.
sub populateStaticHTMLfromTemplate {
    open (TEMPLATE, "template.txt");
    @plate = <TEMPLATE>;
    close (TEMPLATE);
    $rr=0;
    foreach $line (@plate) {
        if ($line =~ /#keyword[ULT]?C?#/){ &prikey; }
        $tobedisplayed[$rr] = $line;
        $rr++;
    }
    open (THISPAGE, "+>$stathtmldirectory/$filename");
    print THISPAGE @tobedisplayed;
    close (THISPAGE);
}

# this subroutine does the substitution when the replacement
# codes are found. The substitution is where I am having the
# problem. $currentphrase is iterated within the loop above,
# but not within the confines of the s/// statement for some
# reason.
sub prikey{
    @keywordarray = split (/ /, $currentphrase);
    $lckey = lc($currentphrase);
    $uckey = uc($currentphrase);
    $ucfirstkey = "";
    foreach $circ (@keywordarray){
        $circ = ucfirst($circ);
        $ucfirstkey = $ucfirstkey . " " . $circ;
    }
    $ucfirstkey =~ s/^ //;

    if ($line =~ /#keywordLC#/){ $line =~ s/#keywordLC#/$lckey/; }
    if ($line =~ /#keywordUC#/){ $line =~ s/#keywordUC#/$uckey/; }
    if ($line =~ /#keywordTC#/){ $line =~ s/#keywordTC#/$ucfirstkey/; }
    if ($line =~ /#keyword#/) { $line =~ s/#keyword#/$currentphrase/; }
    if ($line =~ /#keyword.*.*#/) { &prikey; }
}

seriesint

6:45 pm on Dec 5, 2001 (gmt 0)



Is the problem in the files that result? I tested the values with a simple print statement in the loop and those come out correct. But the file names would be off by one at the least, because $filename is only set after the sub that writes the file has already run.
Change this section

&populateStaticHTMLfromTemplate;
$filename = $currentphrase;
$filename =~ s/[ \W]/-/g;
$filename = $filename . $suffix;

to

$filename = $currentphrase;
$filename =~ s/[ \W]/-/g;
$filename = $filename . $suffix;
&populateStaticHTMLfromTemplate;

And it looks like all of it works fine to me. I end up with a file per keyword with the proper subs. Something like

Philip IV of France 1268-1314
spot the dog
Pheonix The 1670
SPOT THE DOG

Philip IV of France 1268-1314
fido another dog
Pheonix The 1670
FIDO ANOTHER DOG
with "spot the dog" and "fido another dog" as the values for the keywords array.

Does that fix it? Or did I just completely take a walk into left field?
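
To make the off-by-one concrete, here is a small sketch that only records which file name each phrase would be written under, in both orders. It does not open any files; filename_for, @wrong and @right are hypothetical names made up for this illustration, and the phrases are the two test values mentioned above.

```perl
use strict;
use warnings;

my @phrases = ("spot the dog", "fido another dog");
my $suffix  = ".txt";

# Turn a phrase into a file name by replacing non-word characters with "-".
sub filename_for {
    my ($phrase) = @_;
    (my $name = $phrase) =~ s/\W/-/g;
    return $name . $suffix;
}

# Original order: the "write" happens before $filename is updated,
# so each phrase lands under the previous phrase's name.
my $filename = "";
my @wrong;
for my $phrase (@phrases) {
    push @wrong, "$phrase -> '$filename'";   # stands in for the sub's open()
    $filename = filename_for($phrase);
}

# Swapped order: $filename is set first, then used.
my @right;
for my $phrase (@phrases) {
    $filename = filename_for($phrase);
    push @right, "$phrase -> '$filename'";
}

print "original order:\n", map({ "  $_\n" } @wrong);
print "swapped order:\n",  map({ "  $_\n" } @right);
```

In the original order the first phrase gets an empty file name and the second is written under spot-the-dog.txt, which is the shift described above.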

volatilegx

11:39 pm on Dec 5, 2001 (gmt 0)




In which loop did you put the print statement?

If you put it in the first loop, it will appear to work. However, if you put the print statement in the loop right after:

open (THISPAGE, "+>$stathtmldirectory/$filename");
print THISPAGE @tobedisplayed;
close (THISPAGE);

the keyword that replaces the #keyword# code gets stuck on the first keyword in the array.

Changing the processing order regarding the filename was a nice catch, by the way :)

seriesint

1:19 am on Dec 6, 2001 (gmt 0)



Alright, I have to be missing something. I figure there's a massive difference between your input files and the simplistic one I'm using to test with. But here's the output I get, with some light snipping of repeated parts:

Calling StaticHTMLfromTemplate currentphrase is spot the dog

in the sub prikey... phrase is spot the dog
lower sub Spot lower sub The lower sub Dog
...
printing spot-the-dog.txt currentphrase keyword is spot the dog
<a href="/spot-the-dog.txt"></a><br>
Calling StaticHTMLfromTemplate currentphrase is fido another dog

in the sub prikey... phrase is fido another dog
lower sub Fido lower sub Another lower sub Dog
......
printing fido-another-dog.txt currentphrase is fido another dog
<a href="/fido-another-dog.txt"></a><br>
Calling StaticHTMLfromTemplate currentphrase is pus the cat

in the sub prikey... phrase is pus the cat
lower sub Pus lower sub The lower sub Cat
....
printing pus-the-cat.txt currentphrase is pus the cat
<a href="/pus-the-cat.txt"></a><br>

OK, and the template input file for this is as blasted simple as it comes.
All I did was copy the matches from the sub section at the end, so it's

Philip IV of France
#keywordLC#
Phillis Has Such Charming Graces Anthony Young
Pheonix The
#keywordUC#

And I did put one of the keywords within other text to see if it matched there, and it did. So at this point I'm stumped. It looks right to me. Print statements show it working, and the files I get are the way I "think" you want as the final output. To check even further, I plastered print statements throughout those snippets. It still comes up as passing the current $currentphrase down the line.
If you want, I can post the file I'm using and you can run it locally. But could it be the input file (the template)? As I said, mine is really simple.

volatilegx

5:30 pm on Dec 6, 2001 (gmt 0)




Yeah, if you could post your snippet it would probably help.

Maybe I've been looking at my code so long I just can't see the problem.

Thanks a million!!!!!!!

seriesint

8:19 pm on Dec 6, 2001 (gmt 0)



OK, here goes:

#! perl test.pl
$suffix = ".txt";
@keywordphrases = ( "spot the dog" , "fido another dog", "pus the cat" );
foreach $currentphrase (@keywordphrases){

    $filename = $currentphrase;
    $filename =~ s/[\W]/-/g;
    $filename = $filename . $suffix;
    print "Calling populateStaticHTMLfromTemplate, currentphrase is $currentphrase\n";
    &populateStaticHTMLfromTemplate;
    print "$currentphrase after the populateStatic sub\n";
    print STDOUT "<a href=\"$stathtmldirectory/$filename\">$_</a><br>\n";
    $countttt++;
}

sub populateStaticHTMLfromTemplate {
    open (TEMPLATE, "test.txt");
    @plate = <TEMPLATE>;
    close (TEMPLATE);
    $rr=0;
    foreach $line (@plate) {
        if ($line =~ /#keyword[ULT]?C?#/){ &prikey; }
        #print $line;
        $tobedisplayed[$rr] = $line;
        $rr++;
    }
    print "\nprinting $filename currentphrase is $currentphrase \n";
    open (THISPAGE, "+>$filename");
    print THISPAGE @tobedisplayed;
    close (THISPAGE);
    print "finished printing $filename currentphrase is $currentphrase \n";
}

sub prikey{
    @keywordarray = split (/ /, $currentphrase);
    print "in the sub prikey... phrase is $currentphrase \n";
    $lckey = lc($currentphrase);
    $uckey = uc($currentphrase);
    $ucfirstkey = "";
    foreach $circ (@keywordarray){
        $circ = ucfirst($circ);
        $ucfirstkey = $ucfirstkey . " " . $circ;
        print "lower sub $circ\t";
    }
    $ucfirstkey =~ s/^ //;

    if ($line =~ /#keywordLC#/){ $line =~ s/#keywordLC#/$lckey/; }
    if ($line =~ /#keywordUC#/){ $line =~ s/#keywordUC#/$uckey/; }
    if ($line =~ /#keywordTC#/){ $line =~ s/#keywordTC#/$ucfirstkey/; }
    if ($line =~ /#keyword#/) { $line =~ s/#keyword#/$currentphrase/; }
    if ($line =~ /#keyword.*.*#/) { &prikey; }
}

That's the code, and here's the file I was using. It's short and simple.

Philip IV of France
#keywordLC#
Phillis Has Such Charming Graces Anthony Young
Pheonix The
#keywordUC#
Physical Snob The c
Pheonix #keywordTC# Pheonix
Piano Concerto #2 Rachmaninoff
#keyword#
Piano Concerto Grieg

Maybe try running the snippet with your template file to see if it's something in the simplicity of my template.

volatilegx

12:35 am on Dec 7, 2001 (gmt 0)




OK I figured out my problem.

It turns out I was opening the template file at the wrong time, so it was only being used in its unmodified form once. On each subsequent pass, the already-parsed text was being used instead, so there were no template codes left in the text, only the keywords substituted on the first pass.

I fixed it by reopening the template for each iteration instead of reading it only once.

Thanks for your help, seriesint!
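
For later readers, the behaviour described above can be reproduced with a short sketch. It assumes an in-memory two-line template instead of a file, and the names fill, @cached, @buggy and @fixed are made up for the illustration. The key detail is that a foreach loop variable is an alias to the array element, so substituting into it rewrites the stored template itself.

```perl
use strict;
use warnings;

my @template = ("Title: #keyword#\n", "Body about #keywordUC#\n");
my @phrases  = ("spot the dog", "fido another dog");

# Replace the template codes in one line and return the filled-in copy.
sub fill {
    my ($line, $phrase) = @_;
    my $upper = uc $phrase;
    $line =~ s/#keywordUC#/$upper/g;
    $line =~ s/#keyword#/$phrase/g;
    return $line;
}

# Buggy pattern: the template is loaded once and edited in place.
# After the first phrase there are no codes left to replace.
my @cached = @template;
my @buggy;
for my $phrase (@phrases) {
    for my $line (@cached) {          # $line aliases the element of @cached
        $line = fill($line, $phrase);
    }
    push @buggy, join "", @cached;
}

# Fixed pattern: build a fresh page from the untouched template each time.
my @fixed;
for my $phrase (@phrases) {
    push @fixed, join "", map { fill($_, $phrase) } @template;
}

print "buggy, second page:\n$buggy[1]";
print "fixed, second page:\n$fixed[1]";
```

Re-reading the template file on every iteration, as in the fix described above, has the same effect as the per-phrase copy here; keeping a pristine copy in memory is just one way to avoid reopening the file each time.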