Changing Text in a HTML Tag

Question about changing text

         

Bally

8:08 pm on May 6, 2005 (gmt 0)


Hello everyone

I'm trying to extract the text between an <H1> and </H1> tag and change it to sentence case. I can open the files and change the text once I have it, but I can't get the text out cleanly.

My problems occur when the <H1> tag has attributes, e.g. <H1 align="center">. I can't find a way to match all the possibilities and then work out where the text I want starts.

Thanks in advance for your help

wruppert

10:49 pm on May 6, 2005 (gmt 0)


You can only get so far with patterns in HTML; after that you have to start mucking with HTML::Parse. But this may work:


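use strict;
use warnings;

# Lower-case the text, then capitalise the first letter of each word.
sub titlecase {
    my ($text) = @_;
    return join(' ', map { ucfirst($_) } split(/\s+/, lc $text));
}

my $old_h1 = qq(<h1 class="This is a class" align="center">THIS IS THE HEADER</h1>);

# \b[^>]* matches the opening tag with or without attributes;
# /s lets the heading text span more than one line.
my ($head_start, $head_text, $head_finish) =
    $old_h1 =~ m{(<h1\b[^>]*>)(.*?)(</h1>)}is;

# Parenthesise the call so that only the heading text is passed to titlecase.
my $new_h1 = $head_start . titlecase($head_text) . $head_finish;

print "Old: $old_h1\n";
print "New: $new_h1\n";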

Output:
Old: <h1 class="This is a class" align="center">THIS IS THE HEADER</h1>
New: <h1 class="This is a class" align="center">This Is The Header</h1>
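
The original question asks for sentence case rather than title case, so here is a minimal sketch of that variant. The helper name sentencecase and the in-place s///e substitution are assumptions for illustration, not something from the posts in this thread. It simply lower-cases everything and capitalises the first letter, so proper nouns and acronyms in the heading would lose their capitals.

```perl
use strict;
use warnings;

# Hypothetical helper (assumed name): lower-case the whole string,
# then capitalise only its first character.
sub sentencecase {
    my ($text) = @_;
    return ucfirst lc $text;
}

my $old_h1 = qq(<h1 class="This is a class" align="center">THIS IS THE HEADER</h1>);

# Copy the string, then rewrite every <h1>...</h1> pair in the copy.
# /e evaluates the replacement as Perl code, /g handles several headings,
# /i ignores tag case and /s lets the heading text span lines.
(my $new_h1 = $old_h1) =~
    s{(<h1\b[^>]*>)(.*?)(</h1>)}{$1 . sentencecase($2) . $3}gise;

print "New: $new_h1\n";
```

Because the substitution uses /g, the same statement could be run over the contents of a whole file held in one string, which would avoid working out where each heading starts and ends by hand.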

SeanW

3:56 pm on May 7, 2005 (gmt 0)


/<h1\b[^>]*>(.*?)<\/h1>/is

Or, use HTML::TreeBuilder (OTTOMH):

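[perl]
use HTML::TreeBuilder;

# $file holds the path of the HTML file to read.
my $root = HTML::TreeBuilder->new;
$root->parse_file($file);

# Find every <h1> element and print its text.
my @h1s = $root->look_down('_tag', 'h1');
foreach my $h1 (@h1s) {
    print $h1->as_text, "\n";
}

$root->delete;    # release the tree when done
[/perl]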