Forum Moderators: coopster & phranque

Message Too Old, No Replies

Any Regex Experts in the House?

         

Robber

4:44 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Heres the problem, I'm trying to write a Perl ad managment script - I know theres a load already out there, but Im trying to educate myself.

I've got myself an admin screen where images and urls can be added (they get written into a txt file), that works fine. But before they get added I do a check to make sure they arent already in there, to do this Im using a Regex to check the file contents and see if the new entry is matched. But something funny is going on. The match expression is actually a variable which is made by concatenating two variables (img url and link url) with a pipe separating them. But the match evaluates to true even if change the img url. Does Perl have a problem using scalars as the match expression?

Hope that makes some sense. Thanks for any help.

amoore

4:55 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Does Perl have a problem using scalars as the match expression?

Nope.

Any chance of an example of the failing piece of code?

Robber

5:04 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



OK heres an example, but try not to laugh at my sloppy code too much!!

Heres the sub:

sub write_file {

$entry = $image . "¦" . $link;
$bannerFound = "False";

open (FILE, "banner_db.txt");
@banner_info = <FILE>;
close FILE;

foreach $banner (@banner_info){
if ($banner =~ m!$entry!){
$bannerFound = "True";
}
}

open(FILE, ">>banner_db.txt");

if ($bannerFound eq "False"){
print FILE "$image¦$link\n";
}

close FILE;

}

Basically I can change the value of $image so that it shouldnt be matched but it fails to write to the file. It seems to work if the value of $link is changed though.

Robber

5:24 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Could it be something such as the match getting messed up because the value in the expression variable contains a URL and hence has slashes, which would usually need to be escaped?

amoore

5:27 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



I'm not sure if you realize this or not, but the pipe symbol, "¦", means "or" in your regular expression. Are you trying to search for something that matches $image or $link? or are you trying to match anything that matches the concatenation of $image with "¦" and $link (like in a pipe-delimited text file)?

That could be your problem.

Also, you're obviously not using "-w" and "use strict". there's no reason for that in this case, is there?

DylanW

5:28 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



You might try adding $entry = quotemeta($entry); to escape it.

jatar_k

5:30 pm on Jul 16, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WmW DylanW

Brett_Tabke

5:37 pm on Jul 16, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Looks to me like the pipe character (regex OR) is out of place. You need to escape that "\¦".

Also, I wouldn't use a regex to compare the "equality" of strings.

Instead of the regex, I would use a straight "eq" to compare them.

sub write_file {

$entry = $image . "¦" . $link;
$bannerFound = "False";

open (FILE, "banner_db.txt");
@banner_info = <FILE>;
close FILE;
chomp @banner_info; #delete all returns from the array

foreach $banner (@banner_info){
if ($banner eq $entry){
$bannerFound = "True";
}
}

open(FILE, ">>banner_db.txt");

if ($bannerFound eq "False"){
print FILE "$image¦$link\n";
}

close FILE;

}

It's tempting to do it the way you are doing it and it is a common method for those new to perl.
There's an easier, faster, and longer term solution though.

By using a HASH instead of a scalar array, you will have longer term mantainable code.

sub write_file {
$entry = "$image¦$link";
open (FILE, "banner_db.txt");
@banner_info = <FILE>;
close FILE;
chomp @banner_info; #delete all returns from the scalar

#build a temp hash
foreach $banner (@banner_info){
$temphash{$banner}++;
}

#test if your key exists and write it if it does.
if (!exists $temphash($entry)) {
open(FILE, ">>banner_db.txt");
print FILE "$image¦$link\n";
close FILE;
}

}

Robber

6:12 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Thats starting to explain things a bit now, should've known about the pipe, whoops. I was using it delimit the fields in my text filed, but it makes perfect sense that when it appears in the regex it operates as an OR.

As Brett suggests, I had tried a straight "eq" first but that didnt work too well either - I guess that could also be because I had a pipe in their.

Also thanks for the tip re using the hash.

Thanks for the help everyone.

amoore

6:14 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Looks like we're making progress on this script. It just may work after all ;)

I wouldn't take all of the file into memory like that (especially not twice.)
I also wouldn't open the file unless I was going to write to it.
Also, there's no reason not to make your code run under 'strict':
You might as well catch the file open and close errors and print them out and die as well.

sub write_file {
my $entry = $image."¦".$link;
my $bannerfound = 0;
open (FILE, "banner_db.txt") or die( "can't open banner file.");
while (my $banner = <FILE>) {
chomp $banner;
$bannerfound = 1 if ($entry eq $banner);
}
close FILE;

unless( $bannerfound ) {
open (FILE, ">>banner_db.txt") or die( "can't open banner file for writing");
print FILE $entry, "\n";
close FILE;
}
}

Robber

6:14 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Just one more quick question then.

Is it possible that when I used the eq instead of the regex it didnt work because I hadnt got rid of line ends using chomp?

Cheers

amoore

6:17 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Is it possible that when I used the eq instead of the regex it didnt work because I hadnt got rid of line ends using chomp?

yep.

Robber

6:26 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Hi amoore, we seem to have come back at the same time! Thanks for the ongoing support. I'll have to post the final script when its up and running.

Your last post there is useful, I had wondered about opening the file twice but hadnt got around to streamlining it yet, that seems like an excellent solution. Just noticed that Bretts sample also does this by only opening the file to write if the key exists in the hash.

Also, just to run through your suggestion, rather than storing each line of the file in the array, you are now just looking at each line individually yes, hence not needing to store the whole file in memory? I guess we could stream line that more if as soon as $bannerfound = 1 we exit the loop. Does that sound sensible?

The script will probably be serving upwards of 10k banners a day (assuming I can get it finished!). Are then any special considerations I should make to ensure the script can handle it all, or should it cope fine?

Cheers

[edited by: Robber at 6:46 pm (utc) on July 16, 2002]

amoore

6:44 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



I guess we could stream line that more if as soon as $bannerfound = 1 we exit the loop. Does that sound sensible?

Absolutely. I'd change that line:
$bannerfound = 1 if ($entry eq $banner );

to something like:

if ( $entry eq $banner ) {
$bannerfound = 1;
last;
}

I think this script will cope just fine. Your file may get pretty big, but I'm not sure what to do about that right now.

Good luck.

Robber

6:53 pm on Jul 16, 2002 (gmt 0)

10+ Year Member



Thanks for all the pointers, cheers.

Just so as you all know I have got as far as testing Bretts suggestions and the script works fine now. I will be trying all the rest to, so by the time I have worked my way to the end of the thread I should have had quite an education - and all in a single evening. Excellent, thanks!!