I have a long HTML page, and the chunk of code I am interested in looks like this:
Code:
<ul>
<li><strong><a href="http://www.example.com/string1/string2/string3/string4/#*$!-#*$!-#*$!x-xx-#*$!x.html" title="lin title here">link name here</a></strong>
<br /><address>
address here<br />
blah blah blah<br />
blah blah
</address>
</li>
</ul>
</div>
All the target URLs are inside <li></li> tags.
I want to extract just the URL from this HTML, i.e.
Code:
http://www.example.com/string1/string2/string3/string4/#*$!-#*$!-#*$!x-xx-#*$!x.html
For this I am using the following code:
PHP Code:
preg_match_all("/(http|https)?:\/\/?([a-zA-Z0-9\-\.]*\.[a-zA-Z]{2,5})(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9.-_]*\/)?([a-zA-Z0-9.-_?&=%+$]+)?/", $url , $arr );
This regex returns URLs of every pattern, but I am interested only in URLs that follow the pattern above. I have tried many regexes for that, and they all give empty results.
For example, something like this
Code:
preg_match_all("http\:\/\/www\.[a-zA-Z0-9-_.]\.com\/[a-zA-Z0-9-_.]\/[a-zA-Z0-9-_.]\/[a-zA-Z0-9-_.]\/[a-zA-Z0-9-_.]\/[a-zA-Z0-9-_.]\.html",$url,$arr);
gives me the following error:
Code:
Delimiter must not be alphanumeric or backslash in
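For what it's worth, PHP's preg_* functions expect the pattern to be wrapped in delimiters, and the pattern above begins with the letter "h", which would explain that message. Each character class also lacks a quantifier, so it matches only one character per path segment. Below is a minimal sketch of the same idea with both points addressed. It assumes the target URLs always have four directory segments followed by a .html file name; $html and the file name some-page-name.html are made-up sample data.

```php
<?php
// Hypothetical sample input standing in for the long HTML page.
$html = '<li><strong><a href="http://www.example.com/string1/string2/string3/string4/some-page-name.html" title="t">name</a></strong></li>'
      . '<a href="http://www.example.com/other/page.html">other</a>';

// "#" is the delimiter, so the slashes in the URL need no escaping.
// Each class is followed by "+" so it matches a whole path segment,
// and the hyphen is placed last in the class so it is taken literally.
$pattern = '#http://www\.[a-zA-Z0-9_.-]+\.com'
         . '/[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+'
         . '/[a-zA-Z0-9_.-]+\.html#';

preg_match_all($pattern, $html, $arr);
print_r($arr[0]); // full matches are collected in $arr[0]
```

The second link in the sample has only one directory segment, so it should be skipped. If the real file names contain characters outside letters, digits, "_", "." and "-", the last class would need widening.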
I wonder if someone could help me write a regex that matches only URLs of this scheme and no others, since the same long HTML output contains URLs of many other schemes too.
Thank you very much.
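Since the target links all sit inside <li></li> tags, one alternative to a single large regex is to let PHP's DOM extension find the links and use a regex only to check each href. This is a rough sketch of that approach, not a tested solution for the real page. It assumes the DOM extension is available and that the wanted URLs have four directory segments plus a .html file; $html and its file names are invented sample data.

```php
<?php
// Hypothetical sample input; the real page would be the long HTML output.
$html = '<ul><li><strong><a href="http://www.example.com/string1/string2/string3/string4/some-page-name.html" title="t">link name here</a></strong></li></ul>'
      . '<p><a href="http://www.example.com/string1/string2/string3/string4/not-in-list.html">outside</a></p>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Anchored pattern: exactly four directory segments, then a .html file name.
$pattern = '#^http://www\.[a-zA-Z0-9_.-]+\.com/(?:[a-zA-Z0-9_.-]+/){4}[a-zA-Z0-9_.-]+\.html$#';

$urls = array();
// Only <a> elements that are descendants of an <li> are considered.
foreach ($xpath->query('//li//a[@href]') as $a) {
    $href = $a->getAttribute('href');
    if (preg_match($pattern, $href)) {
        $urls[] = $href;
    }
}
print_r($urls);
```

Here the link in the <p> is ignored even though its URL has the right shape, because it is not inside an <li>.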
[edited by: eelixduppy at 12:40 pm (utc) on Aug. 27, 2007]
[edit reason] use example.com, thanks [/edit]