
Forum Moderators: coopster & jatar k


Regular Expressions help

Need to extract links and anchor text from HTML

     
10:27 pm on Jul 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 2, 2004
posts:129
votes: 0


OK,

I thought I was proficient in PHP until I came across regex.

In short, I have tonnes of HTML which I need to extract "href" links from complete with anchor text (if applicable).

I need a regex for preg_match_all that gives me an array of: 1) the entire HTML <a> tag (e.g. <a href='http://www.example.com'>Example link</a>), 2) just the URL (the full URL, not just the domain), and 3) the anchor text.

I've only found a few suggested expressions in my searches, none of which have worked.

The regex needs to cope with <a> tag anomalies where the <a is separated from the href by other attributes such as onclick.

Thanks

10:53 pm on July 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 1, 2002
posts:1834
votes: 0


This is the place to start

[etext.lib.virginia.edu...]

WBF

9:14 am on July 17, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 2, 2004
posts:129
votes: 0


Unfortunately, I've already tried teaching myself regex but have got nowhere with this particular task. Any help is appreciated.

9:35 am on July 17, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 16, 2005
posts:456
votes: 0


robster,

[php-html.sourceforge.net] could do the job.

arran.
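
For anyone who would rather parse than pattern-match: below is a rough sketch of the parser approach. Note that it does not use the library linked above (its API isn't shown in this thread); it swaps in PHP's bundled DOM extension instead, which is assumed to be enabled. The sample $html string and the 'tag' / 'url' / 'anchor' key names are made up for illustration.

```php
<?php
// Hypothetical sample input; in practice $html would hold the page source.
$html = '<p><a class="nav" href="http://www.example.com/a.html">First</a> '
      . '<a onclick="go()" href="/relative/path">Second <em>one</em></a> '
      . '<a name="top">No href</a></p>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect libxml warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if (!$a->hasAttribute('href')) {
        continue; // skip named anchors that have no link target
    }
    $links[] = [
        'tag'    => $doc->saveHTML($a),         // tag as re-serialized by the parser
        'url'    => $a->getAttribute('href'),   // full attribute value, not just the domain
        'anchor' => trim($a->textContent),      // anchor text with nested markup removed
    ];
}

print_r($links);
```

Attribute order doesn't matter here, so an onclick sitting before the href is no problem. One caveat: 'tag' is the parser's re-serialized version of the element, which may differ slightly from the original source text (quoting, attribute case).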

9:46 pm on July 17, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 2, 2004
posts:129
votes: 0


Thanks, I'll give it a go.

3:58 pm on July 18, 2005 (gmt 0)

New User

10+ Year Member

joined:Oct 1, 2004
posts:5
votes: 0


Search Google for "RegexBuddy". The accompanying site is the best introduction to regex I've seen yet.

cheers,
Ivan
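
For reference, here is a minimal sketch of the kind of preg_match_all call the original question asks for. It is one possible pattern, not a definitive one, and it makes some assumptions: the href value is quoted (single or double quotes), the sample $html is invented for illustration, and the 'tag' / 'url' / 'anchor' key names are arbitrary.

```php
<?php
// Hypothetical sample input; in practice $html would hold the page source.
$html = <<<HTML
<p><a href='http://www.example.com/page.html'>Example link</a>
<a onclick="track()" class="x" href="http://www.example.org/a?b=1">Second <b>link</b></a>
<a name="top"></a></p>
HTML;

// <a\s          opening tag followed by whitespace (so <abbr> is not matched)
// [^>]*?        any other attributes before href, e.g. onclick
// (["'])(.*?)\1 quoted href value; \1 requires the matching closing quote
// [^>]*>        any attributes after href, then the end of the opening tag
// (.*?)</a>     anchor content up to the first closing tag
// Flags: i = case-insensitive, s = let . match newlines.
$pattern = '~<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    $links[] = [
        'tag'    => $m[0],                     // 1) the entire <a>...</a> tag
        'url'    => $m[2],                     // 2) the full URL
        'anchor' => trim(strip_tags($m[3])),   // 3) anchor text without nested tags
    ];
}

print_r($links);
```

Known gaps in this sketch: unquoted href values are not matched, an attribute whose name merely ends in "href" (such as data-href) would be picked up, and a ">" inside an earlier attribute value will throw it off. For messy markup an HTML parser is usually the safer route.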

 
