Regexp challenge

sugarkane

1:31 pm on Jun 9, 2001 (gmt 0)

After reading this post [webmasterworld.com] on switching the case of HTML tags, I thought it'd be fun to try and do it in Perl. (Fun?? I need to get out more...)

Anyhow, I came up with this:

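#!/usr/bin/perl
use strict;
use warnings;

# Read the whole input file into memory.
open(my $in, '<', 'foo.html') or die "Cannot open foo.html: $!";
my @lines = <$in>;
close($in);

open(my $out, '>', 'bar.html') or die "Cannot open bar.html: $!";
foreach my $line (@lines) {
    # Lowercase everything between < and > (tag name, attributes and values alike).
    $line =~ s/<((?:(?!<).)*)>/"<" . lc($1) . ">"/eg;
    print $out $line;
}
close($out);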

The snag is that it lowercases *everything* inside a tag, e.g. alt text. Can anyone come up with a better solution?

Brett_Tabke

4:18 am on Jun 12, 2001 (gmt 0)

I looked at that the other day for quite a while and never could get it to work. There should be a way to match <(.*) up to the first space/tab, or <(.*)> when the tag has no attributes, and lowercase only that part. (It would take a more regex-friendly person than me.)
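
A minimal sketch of that idea, assuming it is enough to lowercase the tag name and leave the attributes alone: match the name right after < (or </) and stop at the first whitespace, / or >. The helper name lc_tag_names and the sample markup are made up for illustration, and the pattern will also fire on tag-like text inside attribute values or scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lowercase only the tag name, i.e. the text right after
# "<" (or "</") up to the first whitespace, "/" or ">".
# Attribute names and values are left exactly as they were.
sub lc_tag_names {
    my ($html) = @_;
    $html =~ s{<(/?)([A-Za-z][A-Za-z0-9]*)(?=[\s/>])}{"<" . $1 . lc($2)}eg;
    return $html;
}

print lc_tag_names(qq{<A HREF="Index.HTML">Home</A>\n});
# prints: <a HREF="Index.HTML">Home</a>
```

This leaves attribute names such as HREF in upper case, so it only covers part of the original request.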

Bolotomus

9:00 pm on Aug 3, 2001 (gmt 0)

This is actually pretty tricky if you want it to be 100% compatible with all the weird tags and syntaxes you see out there.

Just to make sure I understand, how should it handle a tag like this:

<IMG SRC=MyFile.gif Alt="La De Da" bOrDeR=3>

I assume you want it to produce

<img src=MyFile.gif alt="La De Da" border=3>

The hard part comes in identifying what's a value and what's not, especially when the use of quotes around URLs etc. is optional.
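
One way to draw that line, as a rough sketch rather than a complete answer: treat an attribute as a name optionally followed by = and a value, where the value is double-quoted, single-quoted, or a bare run of non-space characters, and lowercase only the name. The helper names lc_tag and lc_markup are made up for illustration. The sketch assumes each tag fits in the string it is given and that unquoted values contain no quote characters, and it does not try to skip comments or script blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rebuild one tag with the tag name and attribute names
# lowercased, copying attribute values through unchanged.
sub lc_tag {
    my ($name, $rest) = @_;
    # An attribute is a name, optionally followed by "=" and a value that is
    # double-quoted, single-quoted, or a bare run of non-space characters.
    $rest =~ s{([A-Za-z][-\w:]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?}
              {lc($1) . (defined $2 ? $2 : '')}eg;
    return '<' . lc($name) . $rest . '>';
}

# Hypothetical helper: apply lc_tag to every tag in a chunk of HTML.
sub lc_markup {
    my ($html) = @_;
    # Match a whole tag, allowing ">" to appear inside quoted values.
    $html =~ s{<(/?[A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>}
              {lc_tag($1, $2)}eg;
    return $html;
}

print lc_markup(qq{<IMG SRC=MyFile.gif Alt="La De Da" bOrDeR=3>\n});
# prints: <img src=MyFile.gif alt="La De Da" border=3>
```

Consuming the value together with its name is what keeps words inside Alt="La De Da" from being mistaken for attribute names.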

I think I'd have to suggest using the HTML module to parse the HTML, but that takes all the 'fun' out of finding the solution.