<?php
$link = 'http://www.example.com/';
$page = file_get_contents($link);
// Group 1 is the full file name, group 2 is the extension (.gif, .jpg or .jpeg).
preg_match_all('~([a-z0-9\.\_\-]+(\.gif|\.jpe?g))~i', $page, $matches);
for ($i = 0; $i < count($matches[0]); $i++) {
    $source = $link . $matches[0][$i];
    $filenames = $matches[1][$i];
    $ext = $matches[2][$i];
    echo "$source<br/><br/>";
    echo "$filenames<br/><br/>";
    echo "$ext<br/><br/>";
}
?>
Unfortunately, this does not grab the entire image URL, nor does it handle an image source such as <img src="../pic.jpg"> or other non-absolute paths.
Does anyone have a bulletproof script or a good regular expression to solve this problem?
Thanks heaps.
Unfortunately this does not grab their entire URL...
'~<img ....... >~i';
Restrict the search to image tags, because what you have at the moment will pick up not only image tags but also any text that looks like a file name, such as this_is_a_pic.jpg, so it is not going to give you correct results.
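To make the false positive concrete, here is a small sketch run against a made-up snippet. The $html string and the file names in it are hypothetical, and the tag-anchored pattern is just one possible way to write it, not the only one.

```php
<?php
// Hypothetical sample markup: one real image plus a file name mentioned in plain text.
$html = '<p>See this_is_a_pic.jpg for details.</p><img src="logo.gif" alt="Logo">';

// Bare file name pattern, as in the original post: matches anything that looks like a file name.
preg_match_all('~[a-z0-9._-]+\.(?:gif|jpe?g)~i', $html, $loose);

// Pattern anchored to <img> tags: only captures the value of the src attribute.
preg_match_all('~<img\b[^>]*\bsrc=["\']([^"\']+\.(?:gif|jpe?g))["\']~i', $html, $strict);

print_r($loose[0]);  // both this_is_a_pic.jpg and logo.gif
print_r($strict[1]); // only logo.gif
```

The second pattern uses [^>]* rather than .* so that a match cannot run past the end of the tag.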
nor does it account for if the site's image source is <img src="../pic.jpg"> or other non absolute paths.
So how about something like:
'~<img .*src="([a-z0-9\.\_\-/]+(\.gif|\.jpe?g))".*>~i';
// or you could split up your searches
'~<img([^>]+)>~i'; // find all image tags
'~src=[\'"]([\w\.\-/]+(\.gif|\.jpe?g))[\'"]~i'; // get the src attribute
You may find that the split regex runs faster, as it avoids the .* that I put in the first one to allow for spaces and other attributes. You are also only looking for GIF and JPEG images. While these are common, you are missing out on all of the other image types, such as PNG. This may or may not be something you are worried about.
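Putting the split approach together with the relative path problem might look something like the sketch below. It assumes PHP 7 or later. The function names resolve_url() and extract_image_urls(), the extra extensions and the sample markup are all my own inventions for illustration. The resolver is deliberately simplified: it ignores ports and <base> tags, and the src pattern will not match sources with a query string after the extension, so treat it as a starting point rather than a bulletproof solution.

```php
<?php
// Hypothetical helper: turn a src value into an absolute URL, relative to the page URL.
function resolve_url(string $base, string $src): string
{
    // Already absolute (http:// or https://): nothing to do.
    if (preg_match('~^https?://~i', $src)) {
        return $src;
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Protocol-relative, e.g. //cdn.example.com/pic.jpg
    if (strpos($src, '//') === 0) {
        return $parts['scheme'] . ':' . $src;
    }
    if (strpos($src, '/') === 0) {
        // Root-relative, e.g. /img/pic.jpg
        $path = $src;
    } else {
        // Relative to the directory of the page, e.g. ../pic.jpg
        $basePath = $parts['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $src;
    }
    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    return $origin . '/' . implode('/', $out);
}

// Hypothetical helper: the two-step search, with a wider list of extensions.
function extract_image_urls(string $html, string $base): array
{
    $urls = [];
    // Step 1: find all image tags.
    preg_match_all('~<img\b[^>]*>~i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Step 2: get the src attribute from each tag.
        if (preg_match('~\bsrc=["\']([^"\']+\.(?:gif|jpe?g|png|webp|svg))["\']~i', $tag, $m)) {
            $urls[] = resolve_url($base, $m[1]);
        }
    }
    return $urls;
}

// Made-up markup covering a relative, a root-relative and an absolute source.
$html = '<img src="../pic.jpg"> <img alt="x" src=\'/img/logo.PNG\'> <img src="http://cdn.example.org/a.gif">';
print_r(extract_image_urls($html, 'http://www.example.com/gallery/index.html'));
```

With the page URL above, ../pic.jpg should come out as http://www.example.com/pic.jpg, and the absolute URL is passed through unchanged.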