Forum Moderators: coopster

Regular expression fun

         

bleak26

2:20 pm on Oct 2, 2005 (gmt 0)

10+ Year Member



I am trying to create a regular expression to find all the links in a page of HTML. I am using the expression shown below, but it does not seem to work. Can you see what the problem is?

^<html>(.*)$</a>

Thanks guys / girls

claus

3:20 pm on Oct 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



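It doesn't catch a link, because a link does not start with <html>, and the ^ forces the match to begin at the very start of the text.

Further, you have included a "stop sign" (the $, which anchors the end of the text) inside your expression, so the </a> that follows it can never match.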

I would think that this expression should match some links:

<a\shref.*/a> 

ergophobe

4:18 pm on Oct 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Regular expressions are "greedy" by default, so if you have


this is <a href="link1.htm">link1</a> and this is <a href="link2.html">link2</a>

In the regex that Claus gave, the .* will swallow everything between the first <a href and the last /a>, that is


="link1.htm">link1</a> and this is <a href="link2.html">link2<
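The greedy behaviour is easy to check with a few lines of PHP. This is only a minimal sketch: the # delimiters and the parentheses around .* are assumptions added here so that the swallowed text can be printed; they are not part of Claus's expression.

```php
<?php
// Sample HTML from the post above.
$html = 'this is <a href="link1.htm">link1</a> and this is <a href="link2.html">link2</a>';

// Claus's expression, wrapped in # delimiters so the / in /a> needs no escaping.
// The parentheses around .* are added only to expose what the greedy part matches.
preg_match('#<a\shref(.*)/a>#', $html, $m);

echo $m[0], "\n"; // whole match: from the first <a href through the last /a>
echo $m[1], "\n"; // the text matched by the greedy .*
```

Both links end up inside one match, which is why the quantifier has to be made ungreedy.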

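To match a single link, you need to make the quantifiers ungreedy, as in

$pattern = '/<a\s[^>]*?href=["\']?(.+?)[\'"\s>]/';

(The /U modifier flips every quantifier at once, including the optional quote, which then leaves the opening quote inside the capture, so it is safer to mark the ungreedy ones with a ?.)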

In English: find a tag that starts with '<a' followed by white space and zero or more other characters that do not close the tag, stopping at the first "href=" encountered, followed or not by a single or double quote. After that, grab anything you find up until you encounter either a closing quote, white space, or the '>' that closes the tag.

So in


this is <a href="link1.htm">link1</a> and this is <a href="link2.html">link2</a>

This will capture


link1.htm
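As a rough sketch, the capture can be tried out with preg_match. Note the assumption made here: the lazy quantifiers are written out as *? and +? instead of relying on the /U modifier, so that the optional quote stays greedy and does not leak into the captured value.

```php
<?php
// Sample HTML from the post above.
$html = 'this is <a href="link1.htm">link1</a> and this is <a href="link2.html">link2</a>';

// *? and +? are lazy: they stop at the first point where the rest of the pattern fits.
$pattern = '/<a\s[^>]*?href=["\']?(.+?)[\'"\s>]/';

if (preg_match($pattern, $html, $m)) {
    echo $m[1], "\n"; // link1.htm
}
```

preg_match stops after the first hit, so only the first link's target is returned.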

If you want all of them, plug that pattern into preg_match_all. If you want the whole thing (from <a> to </a>) you will need to extend the pattern through the closing </a> and enclose it all in parens for the capture.
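Put together, that could look something like the sketch below. The extended pattern is an assumption, not the one given earlier in the thread: it swaps the lazy (.+?) for a negated character class, adds the i and s modifiers, and continues through </a> so that the outer parens hold the whole link. It is meant for simple, well-formed markup and will not cope with every kind of HTML.

```php
<?php
// Sample HTML from the post above.
$html = 'this is <a href="link1.htm">link1</a> and this is <a href="link2.html">link2</a>';

// Group 1: the whole link, from <a through </a>.
// Group 2: the href value (anything up to a quote, white space, or >).
// i = ignore case, s = let . match line breaks inside the link text.
$pattern = '#(<a\s[^>]*?href=["\']?([^\'"\s>]+)[^>]*>.*?</a>)#is';

// PREG_SET_ORDER returns one array per link found.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match[2], ' => ', $match[1], "\n";
}
```

With the sample text this prints one line per link, the target first and the full tag after it.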