Perl pattern matching

Help! I can't figure it out!

         

gsx

5:50 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



I'm using Perl to wrap the search term in <span class=...> </span> on the results page for a search on my site, so visitors can quickly see how their results match their expectations.

I can easily insert the span tags, but this inserts them everywhere (including inside links).

A result may look like:

---
Widget (Product Code)
This widget is one that you can buy.
You may be interested in <a href='cgi-bin/showproducts.pl?bluewidget,redwidget,greenwidget' target='content'>a bluewidget, a redwidget or even a greenwidget</a>
---

Each result (i.e. all of the above) is stored in a variable $pdesc.

When I wrap the word 'widget' (if that is what the user searched for), it corrupts the link: showproducts.pl?bluewidget becomes showproducts.pl?blue<span class=...>widget</span>

So what I want is to replace 'widget' globally within the string, but not if it is between the < and > symbols. Is it possible to do this?

I couldn't figure it out:

$pdesc =~ s/($search)/<span class=...>$1<\/span>/gi;
is what I have used to replace all occurrences. What else do I need to add to do what I want, bearing in mind that $pdesc may contain many <'s and >'s for various links and formatting?

idiotgirl

6:00 pm on Aug 8, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



Are you using a separate print statement using the search value? It looks like you're altering your query by inserting the span class as a combined value, rather than altering the print output. (Or maybe I'm misunderstanding.)

gsx

7:25 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



I have one variable called $pdesc, which contains:
Widget (Product Code)<br>This widget is one that you can buy.<br>You may be interested in <a href='cgi-bin/showproducts.pl?bluewidget,redwidget,greenwidget' target='content'>a bluewidget, a redwidget or even a greenwidget</a>

Another variable is what the user searches on ($search) and may contain:

widget

I want to adjust $pdesc to become:

<span class="...">Widget</span> (Product Code)<br>This <span class="...">widget</span> is one that you can buy.<br>You may be interested in <a href='cgi-bin/showproducts.pl?bluewidget,redwidget,greenwidget' target='content'>a blue<span class="...">widget</span>, a red<span class="...">widget</span> or even a green<span class="...">widget</span></a>

The changes are the added <span class="..."> tags. Performing the above would be simple; however, notice the href, which contains widget several times. I obviously do not want to surround those with the <span...> tags. I want to replace all occurrences of widget except those between any '<' and '>'.

ergophobe

9:13 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know Perl, but as a general idea, how about this for a possibility...

If your $pdesc content is being put together inside a function (it looks like it's running through a data set and building a set of links, so I'm assuming it's returned by a function or could be), could you do the substitution inside the function by passing the search term to it? At that point I'm assuming that the product descriptions and the links to them are separated, so there's no problem, is there?

Otherwise, since I stink at regex, I would probably end up writing a function that parsed the text and then returned it marked up as needed, but I'm sure there's a better solution.

Tom
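
The "parse the text and return it marked up" idea could be sketched roughly as below. This is only a sketch under assumptions: the helper name highlight_outside_tags is hypothetical, the sample string is shortened from gsx's example, and it assumes every tag is a plain <...> run with no '>' inside attribute values. The search term goes through quotemeta so that user input is treated as literal text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap each match of $search
# in a span, but only in the text that lies outside <...> tags.
sub highlight_outside_tags {
    my ($html, $search, $class) = @_;
    my $term = quotemeta $search;    # treat the search term as literal text

    # The capturing group makes split return the tags as well as
    # the text between them, so nothing is lost when we re-join.
    my @pieces = split /(<[^>]*>)/, $html;

    for my $piece (@pieces) {
        next if $piece =~ /\A<[^>]*>\z/;    # a complete tag: leave untouched
        $piece =~ s{($term)}{<span class="$class">$1</span>}gi;
    }
    return join '', @pieces;
}

my $pdesc = "This widget: <a href='showproducts.pl?bluewidget'>a bluewidget</a>";
print highlight_outside_tags($pdesc, 'widget', '...'), "\n";
```

Because the tags are separate list elements, the substitution never sees the href at all, so no clean-up pass is needed afterwards.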

mdharrold

10:31 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



I have a fix, but it is ugly.


$pdesc = "Widget (Product Code)<br>This widget is one that you can buy.<br>You may be interested in <a href='cgi-bin/showproducts.pl?bluewidget,redwidget,greenwidget' target='content'>a bluewidget, a redwidget or even a greenwidget</a>";

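# Wrap every occurrence, keeping the original capitalisation.
$pdesc =~ s/(widget)/<span class="...">$1<\/span>/ig;

fix();

# Strip the spans that ended up inside the <a href ... target='content'> tag.
# The greedy (.*) assumes $pdesc contains only one such link.
sub fix {
    $pdesc =~ s/<a href(.*)<span class="\.\.\.">(widget)<\/span>(.*)target='content'>/<a href$1$2$3target='content'>/ig;
    # Each pass removes one span, so recurse while any remain in the tag.
    fix() if $pdesc =~ /<a href(.*)<span class(.*)<\/span>(.*)target='content'>/i;
}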

It does work, but I'm sure someone can come up with a simpler solution.

mdharrold

2:13 am on Aug 9, 2002 (gmt 0)

10+ Year Member



Prettier version:


# Wrap every occurrence, keeping the original capitalisation.
$pdesc =~ s/(widget)/<span class="...">$1<\/span>/ig;
# Remove one span per pass until none is left inside the link tag.
while ($pdesc =~ /<a href(.*)<span class(.*)<\/span>(.*)target='content'>/i) {
    $pdesc =~ s/<a href(.*)<span class="\.\.\.">(widget)<\/span>(.*)target='content'>/<a href$1$2$3target='content'>/ig;
}

Brett_Tabke

3:00 am on Aug 9, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This thread might help:

[webmasterworld.com...]

Robber

7:04 am on Aug 9, 2002 (gmt 0)

10+ Year Member



Hi gsx,

Could you extend your regex so that you include something that will look for the angle brackets, and only do the change if the text is not inside them, e.g.:

/<[^>]($search)/

I'm no pro on regex so that is probably not quite right, but it might look something like that.

Cheers
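
One way the "only change it if it is not inside the angle brackets" idea might be written is with a negative lookahead, sketched below. It rests on assumptions that may not hold for every page: the markup must be well formed, with no stray '<' or '>' in the visible text and no '>' inside attribute values. The variable $marked is just an illustrative name, and the sample string is shortened from gsx's example.

```perl
use strict;
use warnings;

my $search = 'widget';
my $pdesc  = "This widget is one that you can buy.<br>"
           . "<a href='cgi-bin/showproducts.pl?bluewidget' target='content'>a bluewidget</a>";

my $term = quotemeta $search;    # escape regex metacharacters in user input

# (?![^<>]*>) rejects a match when the next angle bracket after it is '>',
# which is the case exactly when the match sits inside a tag.
(my $marked = $pdesc) =~ s{($term)(?![^<>]*>)}{<span class="...">$1</span>}gi;

print "$marked\n";
```

Here the href is left alone because "widget" inside it is followed by ordinary characters and then '>', while "widget" in the visible text is followed by '<' or the end of the string.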

gsx

2:07 pm on Aug 9, 2002 (gmt 0)

10+ Year Member



Thanks for your help.

mdharrold, your solution does work! Thanks.