replacing HREF tag

         

savvy

6:29 am on Aug 3, 2001 (gmt 0)



Hi all,

I am a newbie to Perl coding. I am struggling to delete the <a href="">TEXT</a> tag from an HTML page, i.e., I need to remove the hyperlink around a given text so that the text is shown without a hyperlink.

Could anyone please help me?

Thanks in advance.

Savs

Key_Master

7:15 am on Aug 3, 2001 (gmt 0)



If you are removing all of the <a href=""> and </a> tags in the file, the following will work:

$line =~ s/<a href="(.*?)">//g;
$line =~ s/<\/a>//g;

If the <a href=""> tags wrap from one line to another, they become trickier to remove.
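To illustrate the wrapping problem, here is a minimal sketch. The sample lines and the variable names (@lines, $per_line) are made up for the demonstration; the two substitutions are the ones above, applied one line at a time.

```perl
use strict;
use warnings;

# Hypothetical sample: the opening tag is split across two lines.
my @lines = (
    qq{See <a\n},
    qq{href="page.html">this page</a> for details.\n},
);

# Apply the two substitutions to each line separately.
my $per_line = '';
for my $line (@lines) {
    (my $copy = $line) =~ s/<a href="(.*?)">//g;  # never sees the whole tag
    $copy =~ s/<\/a>//g;                           # still removes the closing tag
    $per_line .= $copy;
}

print $per_line;
```

The closing </a> is removed, but the split opening tag is left behind, so the result is broken HTML.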

savvy

8:23 am on Aug 3, 2001 (gmt 0)



Thanks, Key_Master :)

Brett_Tabke

11:52 am on Aug 3, 2001 (gmt 0)



If you are doing it on a line-by-line basis, it's often easier to join the whole file into a single string first and then run the substitutions on that.

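# Read the whole file into an array, one line per element.
open(my $fh, '<', $filetoopen) or die "Cannot open $filetoopen: $!";
my @htmlfile = <$fh>;
close($fh);

# Join the lines into one string and drop the newlines,
# so a tag that wrapped across lines becomes contiguous.
my $line = join(" ", @htmlfile);
$line =~ s/\n//g;

# Strip the opening and closing anchor tags.
$line =~ s/<a href="(.*?)">//g;
$line =~ s/<\/a>//g;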
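As an alternative to join, Perl can also read the whole file in one go by undefining the input record separator $/ (often called "slurp mode"). This is a sketch only, not the method posted above: the sample HTML is invented, and an in-memory filehandle stands in for a real file so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample HTML; the in-memory filehandle replaces a real file here.
my $html = qq{<p>Go to <a href="a.html">page A</a>\nor <a href="b.html">page B</a>.</p>\n};
open(my $fh, '<', \$html) or die "Cannot open in-memory file: $!";

# With $/ undefined, <$fh> returns the entire contents as one string.
my $text = do { local $/; <$fh> };
close($fh);

# Same two substitutions as above, now applied to the whole page at once.
$text =~ s/<a href="(.*?)">//g;
$text =~ s/<\/a>//g;

print $text;
```

Unlike the join approach, this keeps the original newlines in the output.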

littleman

6:05 pm on Aug 3, 2001 (gmt 0)



You could also use the 's' modifier, which allows '.' to match newline characters ('\n'). This only helps if $line holds more than one line of the file.

$line =~ s/<a href=\"(.*?)\">//gis;
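A small sketch of the difference the 's' modifier makes. The input string and the names $without_s and $with_s are invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical input: the href value itself wraps onto a second line.
my $line = qq{<a href="long/\npath.html">link</a>};

# Without /s, '.' cannot cross the newline, so nothing matches.
(my $without_s = $line) =~ s/<a href="(.*?)">//gi;

# With /s, '.' also matches "\n" and the whole opening tag is removed.
(my $with_s = $line) =~ s/<a href="(.*?)">//gis;

print "without s: $without_s\n";
print "with s:    $with_s\n";
```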

Bolotomus

8:20 pm on Aug 3, 2001 (gmt 0)



I don't like this solution:

$line =~ s/<a href="(.*?)">//g;
$line =~ s/<\/a>//g;

What if the page contains <a name="dsfds">...</a> tags as well? It would strip out the </a> tags that don't belong to an href and leave you with a page that is no longer proper HTML. To strip out all <a href="">...</a> pairs, I would suggest first reading the entire page into a single scalar called $text and then running:

$text =~ s,<a\s+[^>]*href=[^>]*>(.*?)</a>,$1,gis;

That should work for all sorts of variations of <a href...>, such as <a target='window' HREF='/some/url.html'>. It handles single quotes, double quotes, no quotes, multi-line links, upper or lower case, attributes in any order, etc. But an <a name=...>...</a> tag should be passed over.
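Here is a quick check of that substitution against a few of those variations. The sample tags are invented; the regex is the one given above.

```perl
use strict;
use warnings;

# Hypothetical sample covering several tag variations.
my $text = join "\n",
    q{<a name="top">Top</a>},                                       # named anchor, no href
    q{<a target='window' HREF='/some/url.html'>single quotes</a>},  # extra attribute, upper case
    q{<A href=/no/quotes.html>no quotes</A>},                       # unquoted value
    qq{<a\n  href="multi.html">multi\nline</a>};                    # wraps across lines

# Replace each href link with just its link text (captured in $1).
$text =~ s,<a\s+[^>]*href=[^>]*>(.*?)</a>,$1,gis;

print "$text\n";
```

The named anchor keeps both its opening and closing tag, while the three href links are reduced to their text.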

Anyhow, this might all be moot, as the original poster said: "I need to remove the hyperlink of a GIVEN text, text will be shown without hyperlinks." That reads as if to say: don't clobber every single link, just the ones that link a GIVEN text phrase.

If that is what we're looking for, then I would suggest this:

$text =~ s,<a\s+[^>]*href=[^>]*>\Q$phrase\E</a>,$phrase,gi;

where $phrase is the phrase that you are delinking.
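For example, with a made-up phrase that contains regex metacharacters, which is where the \Q...\E quoting earns its keep:

```perl
use strict;
use warnings;

# Hypothetical phrase and page text; '+', '(' and ')' are regex metacharacters.
my $phrase = 'C++ (draft)';
my $text   = q{<a href="cpp.html">C++ (draft)</a> and <a href="perl.html">Perl</a>};

# \Q...\E makes the phrase match literally; only links wrapping it are removed.
$text =~ s,<a\s+[^>]*href=[^>]*>\Q$phrase\E</a>,$phrase,gi;

print "$text\n";
```

Only the first link is removed; the link around "Perl" is left alone.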

In my own work I've usually wanted to delink a certain URL, e.g. to make it so that no page has a link to itself.
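A rough sketch of how delinking by URL might look. This variation is an assumption built on the same pattern, not code from an actual project; $url and the sample text are made up.

```perl
use strict;
use warnings;

# Hypothetical URL to delink and sample page text.
my $url  = '/index.html';
my $text = q{<a href="/index.html">Home</a> | <a href="/about.html">About</a>};

# (["']?) captures an optional opening quote and \1 requires the same closing quote;
# \Q...\E matches the URL literally. $2 is the link text.
$text =~ s,<a\s+[^>]*href=(["']?)\Q$url\E\1[^>]*>(.*?)</a>,$2,gis;

print "$text\n";
```

One limitation of this sketch: for an unquoted href it matches by prefix, so a longer URL that starts with $url would be delinked too.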

savvy

5:14 am on Aug 6, 2001 (gmt 0)



Hi all,

This forum is really great.

Actually, I wanted to remove the link from every anchor tag that has an href. I have it working now with your help.

Thanks to everyone.

Savs