scraping image from piece of html

         

hanyaz

3:39 pm on Apr 5, 2009 (gmt 0)




Hello,
I am trying to extract a thumbnail URL from a piece of HTML, but
preg_match_all returns 0 every time. Could somebody advise me on how to do it?
Here is the code:

$chaine= preg_match_all('/<img.*src="(.*)"\s\/>/', '<a href=\"http://www.example.com/video/x8ncll_agression-sur-les-poles-emplois_news?from=rss\"><img align=\"right\" width=\"120\" height=\"90\" src=\"http://example.com/dyn/preview/160x120/14526345.jpg?20090313195207\" style=\"border: 2px solid #B9D3FE;\"></a><p>Content here</p><p>Author: <a href=\"http://www.example.com/Marianne2fr?from=rss\"><img src=\"http://example.com/dyn/avatar/80x80/14308393.jpg?20080915094417\" width=\"80\" height=\"80\" alt=\"avatar\"/>Marianne2fr</a><br />Tags: <a href=\"http://www.example.com/tag/pôles\">pôles</a> <a href=\"http://www.example.com/tag/emploi\">emploi</a> <a href=\"http://www.example.com/tag/chômeurs\">chômeurs</a> <a href=\"http://www.example.com/tag/agression\">agression</a> <br />Posted: 12 March 2009<br />Rating: 3.5<br />Votes: 8<br /></p>', $Texte);
echo $chaine;

I am trying to extract the following URL:
http://example.com/dyn/preview/160x120/14526345.jpg?20090313195207

Thanks in advance for your help,
hanyaz

[edited by: eelixduppy at 3:45 pm (utc) on April 5, 2009]
[edit reason] exemplified [/edit]

IanKelley

5:16 pm on Apr 7, 2009 (gmt 0)




Note: I have not tested the expression myself, before or after.

There are a couple things I would change about the expression... Try this:

$chaine = preg_match_all('/<img[^>]*?src=\\\\?"([^"\\\\]+)\\\\?"[^>]*>/', '<a href=\"http://www.example.com/video/x8ncll_agression-sur-les-poles-emplois_news?from=rss\"><img align=\"right\" width=\"120\" height=\"90\" src=\"http://example.com/dyn/preview/160x120/14526345.jpg?20090313195207\" style=\"border: 2px solid #B9D3FE;\"></a><p>Content here</p><p>Author: <a href=\"http://www.example.com/Marianne2fr?from=rss\"><img src=\"http://example.com/dyn/avatar/80x80/14308393.jpg?20080915094417\" width=\"80\" height=\"80\" alt=\"avatar\"/>Marianne2fr</a><br />Tags: <a href=\"http://www.example.com/tag/pôles\">pôles</a> <a href=\"http://www.example.com/tag/emploi\">emploi</a> <a href=\"http://www.example.com/tag/chômeurs\">chômeurs</a> <a href=\"http://www.example.com/tag/agression\">agression</a> <br />Posted: 12 March 2009<br />Rating: 3.5<br />Votes: 8<br /></p>', $Texte);
echo $chaine;

This will work when there are multiple image tags (a separate match for each) and has a safer match for the actual image URL. Most importantly, the pattern no longer demands whitespace and /> right after the src value (that's why your expression was failing): [^>]* skips over any attributes that follow src, such as style or width, so it works for both XHTML and non-XHTML tags. The \\\\? parts allow for the backslash that sits in front of every quote in your sample string, since inside single quotes PHP keeps \" as a literal backslash plus quote.
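
Note that echo $chaine only prints the number of matches. The URLs themselves end up in the array passed as the third argument. Here is a minimal sketch of reading them out, assuming PHP 7 or later. The helper name extract_img_srcs is made up for illustration, and the pattern is just one possible variant: it skips whatever attributes surround src and tolerates the backslash-escaped quotes seen in the sample string. It is not a full HTML parser.

```php
<?php
// Hypothetical helper (not from the thread): returns every <img> src URL in $html.
function extract_img_srcs(string $html): array
{
    // [^>]*? keeps the search inside a single tag. \\\\? (regex: \\?) allows an
    // optional literal backslash before the quote, as in the sample string.
    // The capture stops at the first quote or backslash after the URL.
    $pattern = '/<img\b[^>]*?\bsrc=\\\\?"([^"\\\\]+)/i';

    // preg_match_all returns the match count (or false on error);
    // the captured URLs are collected in $matches[1].
    $count = preg_match_all($pattern, $html, $matches);

    return $count ? $matches[1] : [];
}

// The two <img> tags from the sample, still with backslash-escaped quotes.
$html = '<img align=\"right\" width=\"120\" height=\"90\" src=\"http://example.com/dyn/preview/160x120/14526345.jpg?20090313195207\" style=\"border: 2px solid #B9D3FE;\">'
      . '<img src=\"http://example.com/dyn/avatar/80x80/14308393.jpg?20080915094417\" width=\"80\" height=\"80\" alt=\"avatar\"/>';

$urls = extract_img_srcs($html);

// The thumbnail is the first image in the markup.
echo $urls[0] ?? 'no image found', "\n";
```

With the sample markup, $urls[0] is the preview thumbnail and $urls[1] is the avatar. If the HTML comes from a feed you do not control, the order of the images may differ, so it may be safer to filter $urls for the /preview/ path than to rely on position.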