
Forum Moderators: coopster & jatar k


Regular Expressions help

Need to extract links and anchor text from HTML

     

robster124

10:27 pm on Jul 16, 2005 (gmt 0)

10+ Year Member



OK,

I thought I was proficient in PHP until I came across regex.

In short, I have tonnes of HTML which I need to extract "href" links from complete with anchor text (if applicable).

I need a regex for preg_match_all that gives me an array of: 1) the entire HTML <a> tag (e.g. <a href='http://www.example.com'>Example link</a>), 2) just the URL (the full URL, not just the domain), and 3) the anchor text.

I've only found a few suggested expressions in my searches, none of which have worked.

The regex needs to cope with <a> tag anomalies where the <a is separated from the href by other attributes such as onclick.

Thanks

willybfriendly

10:53 pm on Jul 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is the place to start

[etext.lib.virginia.edu...]

WBF

robster124

9:14 am on Jul 17, 2005 (gmt 0)

10+ Year Member



Unfortunately, I've already tried teaching myself regex but have got nowhere with this particular task. Any help is appreciated.

arran

9:35 am on Jul 17, 2005 (gmt 0)

10+ Year Member



robster,

[php-html.sourceforge.net] could do the job.

arran.
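For anyone who prefers a parser to a regex, here is a minimal sketch of the same idea. It uses PHP's bundled DOM extension, not the library linked above (whose API is not shown in this thread), so treat it as one possible approach. The function name extractLinksWithDom and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper: collect every <a> that has an href, using PHP's DOM extension.
function extractLinksWithDom(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // named anchors such as <a name="top"> carry no URL
        }
        $links[] = [
            'tag'    => $doc->saveHTML($a),        // re-serialised tag; may differ from the source markup
            'url'    => $a->getAttribute('href'),  // full attribute value, as written in the page
            'anchor' => trim($a->textContent),     // anchor text with nested tags removed
        ];
    }
    return $links;
}

// Made-up sample input, including an onclick placed before the href.
$html = <<<'HTML'
<p><a href='http://www.example.com/page.html'>Example link</a>
<a onclick="track()" class="nav" href="http://www.example.com/other">Other <b>link</b></a>
<a name="top"></a></p>
HTML;

print_r(extractLinksWithDom($html));
```

Because the parser handles attribute order and quoting itself, an onclick in front of the href makes no difference. The trade-off is that the 'tag' value is rebuilt by the parser, so it is not guaranteed to match the original markup character for character.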

robster124

9:46 pm on Jul 17, 2005 (gmt 0)

10+ Year Member



Thanks,

I'll give it a go.

ivanw

3:58 pm on Jul 18, 2005 (gmt 0)

10+ Year Member



Do a Google search for "RegexBuddy". The accompanying site is the best introduction to regex I've seen yet.

cheers,
Ivan
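As a starting point for the three items asked for in the first post (whole tag, URL, anchor text), here is a hedged sketch with preg_match_all. The pattern, the function name extractLinks and the sample HTML are illustrative assumptions, not something posted in this thread. It assumes the href value is quoted with single or double quotes and that no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper: pull the whole tag, the URL and the anchor text out of each link.
function extractLinks(string $html): array
{
    // <a\b          start of an <a> tag (but not <abbr>, <address>, ...)
    // [^>]*?        any other attributes before href, e.g. onclick or class
    // (["\'])(.*?)\1  the quoted href value; \1 requires the same closing quote
    // [^>]*>        any remaining attributes, then the end of the opening tag
    // (.*?)</a>     the anchor content, up to the first closing </a>
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'tag'    => $m[0],                     // 1) the entire <a ...>...</a> tag
            'url'    => $m[2],                     // 2) the full href value
            'anchor' => trim(strip_tags($m[3])),   // 3) anchor text without nested tags
        ];
    }
    return $links;
}

// Made-up sample input, including an onclick placed before the href.
$html = <<<'HTML'
<p><a href='http://www.example.com/page.html'>Example link</a>
<a onclick="track()" class="nav" href="http://www.example.com/other">Other <b>link</b></a>
<a name="top"></a></p>
HTML;

print_r(extractLinks($html));
```

The i modifier makes the match case-insensitive (so <A HREF=...> also matches) and s lets the anchor text span several lines. Known gaps in this sketch: unquoted href values are skipped, an attribute such as data-href would also be picked up, and markup inside comments or scripts is not excluded.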

 
