

Help on Perl Regular Expressions

Stripping the last character from specific strings

         

frodo

5:47 pm on Oct 22, 2002 (gmt 0)



Hello all.

I have a variable ($sch_body_file) in my Perl script that holds the contents of an HTML page. All links in this page have a double quote (") at the end of the URL. For example:

<A href=http://www.cokama.com/" target=_blank>a link to cokama</A>

Basically, I need to strip the trailing double quote from all strings that begin with href=http://

The resulting code should look like:
<A href=http://www.cokama.com/ target=_blank>a link to cokama</A>

Bear in mind that there could be multiple links in the HTML page. Unfortunately, I'm not that good at regular expressions, so if anybody can help, I would appreciate it.

Cormac.

andreasfriedrich

8:41 pm on Oct 22, 2002 (gmt 0)




While I'm not sure what you want to use such invalid HTML for, here goes:

($without_quotes = $sch_body_file) =~ s{(href=http://[^"]+?)"}{$1}g;

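This RE matches 'href=http://' followed by one or more characters that are not '"', followed by a '"', and replaces the whole match with the captured part ($1), i.e. the same text without the closing '"'. The /g modifier applies the substitution to every link in the page. Because the assignment happens first, the substitution runs on $without_quotes and $sch_body_file is left unchanged.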
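A minimal self-contained sketch of that substitution, run against a hypothetical page with two broken links. The sample HTML and the helper name strip_trailing_quotes are illustrative assumptions, not part of the original answer:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a copy of $html with
# the stray closing quote removed after each href=http://... URL.
sub strip_trailing_quotes {
    my ($html) = @_;
    # Same pattern as above: capture href=http:// plus the non-quote
    # characters, match the quote, and keep only the captured part.
    (my $fixed = $html) =~ s{(href=http://[^"]+?)"}{$1}g;
    return $fixed;
}

# Hypothetical page content with two broken links.
my $sch_body_file =
    '<A href=http://www.cokama.com/" target=_blank>a link to cokama</A>' . "\n"
  . '<A href=http://www.example.com/" target=_blank>another link</A>';

my $without_quotes = strip_trailing_quotes($sch_body_file);
print "$without_quotes\n";
```

The ? in [^"]+? makes no difference here: [^"] can never consume the quote, so the greedy and non-greedy forms stop at the same place.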

Hope this helps.

Andreas

amoore

9:27 pm on Oct 22, 2002 (gmt 0)




You might try chop($sch_body_file).

Here's a little bit on chop from "perldoc -f chop":

chop Chops off the last character of a string and
returns the character chopped. It is much more
efficient than "s/.$//s" because it neither scans
nor copies the string. If VARIABLE is omitted,
chops $_. If VARIABLE is a hash, it chops the
hash's values, but not its keys.
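A small sketch of what chop does in this situation (the sample strings are hypothetical). It removes exactly one character from the end of the whole string, so it fixes a lone URL but, on a full page, touches only the page's last character rather than the quote inside each link:

```perl
use strict;
use warnings;

# A lone URL with a trailing quote: chop removes it.
my $url = 'http://www.cokama.com/"';
my $removed = chop $url;    # chop returns the character it removed
print "$url (removed $removed)\n";

# A whole page: chop removes only the very last character of the string.
my $page = '<A href=http://www.cokama.com/" target=_blank>a link</A>' . "\n";
chop $page;                 # strips the trailing newline, nothing else
print "$page\n";
```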

amoore

9:29 pm on Oct 22, 2002 (gmt 0)




Oh, I see, that variable holds the contents of the whole page. In that case chop won't help, since it only removes the last character of the entire string; you'll need a regular expression instead.
Sorry.

Robber

9:38 pm on Oct 22, 2002 (gmt 0)




You might want to consider adding a " after the = rather than stripping the closing one; this would bring you more in line with XHTML.
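A minimal sketch of that alternative, assuming every broken link has the form href=http://...", i.e. no opening quote and one closing quote (the sample markup is hypothetical). It captures the URL and writes it back with a quote on both sides:

```perl
use strict;
use warnings;

my $html = '<A href=http://www.cokama.com/" target=_blank>a link to cokama</A>';

# Capture the URL up to the stray quote and re-emit it wrapped in quotes.
# Already-quoted links (href="http://...") do not match, because the
# pattern requires http:// immediately after the =.
(my $quoted = $html) =~ s{href=(http://[^"]+)"}{href="$1"}g;

print "$quoted\n";
```

This only fixes the href value; for valid XHTML the other attribute values (target=_blank) would also need quotes, and element names would need to be lowercase.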

frodo

10:00 am on Oct 23, 2002 (gmt 0)



Thanks very much, andreasfriedrich. Your solution worked perfectly. I'll have to learn regular expressions; they are so powerful.