Regular Expressions help

Need to extract links and anchor text from HTML

10:27 pm on Jul 16, 2005 (gmt 0)

10+ Year Member


I thought I was proficient in PHP until I came across regex.

In short, I have tonnes of HTML from which I need to extract "href" links, complete with anchor text (if applicable).

I need a regex for preg_match_all that gives me an array of: 1) the entire HTML <a> tag (e.g. <a href='http://www.example.com'>Example link</a>), 2) just the URL (the full URL, not just the domain), and 3) the anchor text.
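A minimal preg_match_all sketch for those three pieces, assuming the href value is wrapped in double or single quotes and that no attribute value contains a stray `>`. The function name `extract_links` and the array keys are illustrative only. PREG_SET_ORDER groups each match's captures together, so every link comes back as one small array.

```php
<?php
// Sketch only: assumes quoted href values and no ">" inside attribute values.
function extract_links(string $html): array
{
    // <a, any other attributes (lazily), whitespace, href="..." or href='...',
    // the rest of the opening tag, then the anchor text up to </a>.
    // i = case-insensitive (<A HREF=...>), s = "." also matches newlines.
    $pattern = '~<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'tag'  => $m[0],                   // 1) the entire <a ...>...</a> element
            'url'  => $m[2],                   // 2) the href value, exactly as written
            'text' => trim(strip_tags($m[3])), // 3) anchor text with inner tags removed
        ];
    }
    return $links;
}

$html = '<p><a href="http://www.example.com/page.html">Example link</a></p>';
foreach (extract_links($html) as $link) {
    echo $link['url'], ' => ', $link['text'], PHP_EOL;
}
// prints: http://www.example.com/page.html => Example link
```

Relative URLs come back as written; turning them into full URLs would need the page's base URL.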

I've only seen a few suggested expressions, and none of them has worked when I tried it.

The regex also needs to cope with <a> tag anomalies where the <a is separated from the href attribute by onclick handlers, etc.
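One way to handle attributes sitting between <a and href, sketched under a few assumptions: the href value may be double-quoted, single-quoted or unquoted, and no attribute value contains `>` or the literal text ` href=`. Requiring whitespace right before `href` (`\s` rather than `\b`) stops the pattern from latching onto `location.href` inside an onclick handler. `extract_hrefs` is a hypothetical name, and PREG_UNMATCHED_AS_NULL needs PHP 7.2 or later.

```php
<?php
// Sketch: other attributes (onclick, class, ...) may appear before href,
// and the value may be "double", 'single' or unquoted.
function extract_hrefs(string $html): array
{
    $pattern = '~<a\b[^>]*?\shref\s*=\s*'
             . '(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))' // groups 1-3: the href value
             . '[^>]*>(.*?)</a>~is';                     // group 4: the anchor text
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $links = [];
    foreach ($matches as $m) {
        // Exactly one of groups 1-3 takes part in each match; the others are null,
        // so ?? picks the right one even when the value is an empty string.
        $links[] = [
            'url'  => $m[1] ?? $m[2] ?? $m[3],
            'text' => trim(strip_tags($m[4])),
        ];
    }
    return $links;
}

$html = '<a onclick="location.href=\'/bad\'" href="/good.html">Good</a> <a href=/plain.html>Plain</a>';
foreach (extract_hrefs($html) as $link) {
    echo $link['url'], ' => ', $link['text'], PHP_EOL;
}
// prints:
// /good.html => Good
// /plain.html => Plain
```

A value such as `title="a > b"` before href will still break it, which is where a real HTML parser earns its keep.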


10:53 pm on Jul 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

This is the place to start



9:14 am on Jul 17, 2005 (gmt 0)

10+ Year Member

Unfortunately, I've already tried teaching myself regex but have got nowhere with this particular task. Any help is appreciated.

9:35 am on Jul 17, 2005 (gmt 0)

10+ Year Member


php-html.sourceforge.net could do the job.
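If a parser is an option, PHP's bundled DOM extension (a different tool from the php-html library suggested above) sidesteps most of the regex edge cases. A rough sketch, assuming ext-dom is enabled; `extract_links_dom` is a hypothetical name. libxml's warnings about malformed markup are collected and discarded rather than printed. Without a charset declaration in the markup, loadHTML treats the input as ISO-8859-1, so non-ASCII UTF-8 text may need extra handling.

```php
<?php
// Sketch using PHP's built-in DOMDocument instead of a regex.
function extract_links_dom(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate tag soup quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // e.g. <a name="..."> targets that are not links
        }
        $links[] = [
            'tag'  => $doc->saveHTML($a),       // the whole element, re-serialised
            'url'  => $a->getAttribute('href'), // value as written, not resolved
            'text' => trim($a->textContent),    // anchor text, nested tags flattened
        ];
    }
    return $links;
}

$html = '<p><a onclick="track()" href="http://www.example.com/page.html">Example <b>link</b></a></p>';
foreach (extract_links_dom($html) as $link) {
    echo $link['url'], ' => ', $link['text'], PHP_EOL;
}
// prints: http://www.example.com/page.html => Example link
```

Attribute order, quoting style and onclick handlers stop mattering here, because the parser has already split the tag into attributes.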


9:46 pm on Jul 17, 2005 (gmt 0)

10+ Year Member


I'll give it a go.

3:58 pm on Jul 18, 2005 (gmt 0)

10+ Year Member

Do a Google search for "RegexBuddy". The accompanying site is the best introduction to regex I've seen yet.
