Regular Expressions help

Need to extract links and anchor text from HTML

10:27 pm on July 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 2, 2004
posts:130
votes: 0


OK,

I thought I was proficient in PHP until I came across regex.

In short, I have tonnes of HTML from which I need to extract "href" links, complete with anchor text (if any).

I need a preg_match_all regex that gives me an array of 1) the entire HTML <a> tag (e.g. <a href='http://www.example.com'>Example link</a>), 2) just the URL (the full URL, not just the domain), and 3) the anchor text.

I've only seen a few suggested expressions, and none of them has worked when I searched my HTML with it.

The regex needs to handle <a> tag anomalies where the <a is separated from the 'href' by other attributes, such as 'onclick'.
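A minimal sketch of the kind of pattern being asked for, assuming href values are double-quoted, single-quoted or unquoted, links are not nested, and no `>` appears inside an attribute value that comes before the href. The function name `extract_links` and the array keys `tag`, `url` and `text` are hypothetical. The lazy `[^>]*?` before `href` skips attributes such as `onclick`, and the PCRE branch-reset group `(?|...)` puts the URL in group 1 whichever quoting style is used:

```php
<?php
// Minimal sketch: pull <a> tags, URLs and anchor text out of HTML with preg_match_all.
// Assumptions: href is double-quoted, single-quoted or unquoted; links are not nested;
// no '>' appears inside an attribute value before the href.

function extract_links(string $html): array
{
    // <a\b[^>]*?       : "<a" followed by any other attributes (onclick, class, ...), lazily
    // \bhref\s*=\s*    : the href attribute, tolerating spaces around "="
    // (?|"..."|'...'|.): branch reset, so the URL is group 1 for every quoting style
    // [^>]*>(.*?)</a>  : rest of the opening tag, then the anchor HTML (group 2)
    // flags: i = case-insensitive (<A HREF=...>), s = anchor text may span lines
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))[^>]*>(.*?)</a>~is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [
                'tag'  => $m[0],                   // 1) the entire <a>...</a> element
                'url'  => $m[1],                   // 2) the full URL as written in the source
                'text' => trim(strip_tags($m[2])), // 3) anchor text with inner tags removed
            ];
        }
    }
    return $links;
}

$html = '<p><a onclick="track()" href="http://www.example.com/page.html">Example <b>link</b></a>'
      . ' and <A HREF=/about>About us</A></p>';

foreach (extract_links($html) as $link) {
    echo $link['url'], ' => ', $link['text'], "\n";
}
// Prints:
// http://www.example.com/page.html => Example link
// /about => About us
```

Even so, a regex stays fragile on real-world pages: a `>` inside an `onclick` handler, commented-out markup or nested tags can still throw it off.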

Thanks

10:53 pm on July 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 1, 2002
posts:1834
votes: 0


This is the place to start

[etext.lib.virginia.edu...]

WBF

9:14 am on July 17, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 2, 2004
posts:130
votes: 0


Unfortunately, I've already tried teaching myself regex but have got nowhere with this particular task. Any help is appreciated.

9:35 am on July 17, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 16, 2005
posts:456
votes: 0


robster,

[php-html.sourceforge.net] could do the job.
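In the same spirit of parsing the HTML rather than pattern-matching it, here is a sketch using PHP's bundled DOM extension (assumed to be enabled). It is a swapped-in alternative, not the library linked above, whose API isn't shown here. Because the parser understands attributes, an `onclick` before the `href`, or a `>` inside an attribute value, is not a problem. `extract_links_dom` and the array keys are hypothetical names:

```php
<?php
// Sketch using PHP's bundled DOM extension (ext-dom) instead of a regex.
// NOT the php-html.sourceforge.net library; a swapped-in alternative.

function extract_links_dom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Note: without a charset declaration, loadHTML() treats the input as ISO-8859-1.
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // skip named anchors such as <a name="top">
        }
        $links[] = [
            'tag'  => $doc->saveHTML($a),        // the <a> element as re-serialised by libxml
            'url'  => $a->getAttribute('href'),  // the full URL as written
            'text' => trim($a->textContent),     // anchor text with inner tags dropped
        ];
    }
    return $links;
}

$html = '<p><a onclick="track()" href="http://www.example.com/page.html">Example <b>link</b></a>'
      . ' <a name="top">Top</a></p>';

foreach (extract_links_dom($html) as $link) {
    echo $link['url'], ' => ', $link['text'], "\n";
}
```

The trade-off is that `tag` is libxml's re-serialisation of the element, so quote style and whitespace may differ from the original source.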

arran.

9:46 pm on July 17, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 2, 2004
posts:130
votes: 0


Thx

I'll give it a go

3:58 pm on July 18, 2005 (gmt 0)

New User

10+ Year Member

joined:Oct 1, 2004
posts:5
votes: 0


Google "RegexBuddy". The accompanying site is the best introduction to regex I've seen yet.

cheers,
Ivan