Regex help please!

     
11:08 am on Feb 28, 2011 (gmt 0)

Hello all,

src="http://subsub.sub.sub.adomain.com/path/to/123456.text.gif"

I would like to know what regex would capture the filename from a src attribute like the one above, whatever the name happens to be, along with its file type, so that I have a file name to work with. I can then feed that into another function and run a cron job to periodically 'check' the file.

If anyone has a better suggestion, please let me know.

Hope that makes sense.

Cheers,
MRb
8:36 pm on Feb 28, 2011 (gmt 0)

g1smd

The filename will always be after the final slash.

The extension will always be after the final period.

There will be no need for any
(.*)
patterns at all; stuff like
(([^/]+/)*)
will strip the folder structure.

Once you have isolated the filename together with its extension,
(([^.]+\.)+)([^./]+)
will split the filename and extension.
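
To make that concrete, here is a rough PHP sketch that chains the two fragments into a single anchored pattern. The function name split_url_filename is made up for illustration, the http(s) host prefix is an assumption, and the groups are written as non-capturing (?:...) with [^./] instead of [^.] so that only the name and the extension are captured. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper (not from the thread): splits the last path segment
// of a URL into name and extension using the fragments described above.
function split_url_filename(string $url): ?array
{
    // ^https?://[^/]+/   scheme and host (assumed form)
    // (?:[^/]+/)*        any number of folders, skipped without (.*)
    // ((?:[^./]+\.)+)    every "part." of the filename before the extension
    // ([^./]+)$          the extension: whatever follows the final period
    $pattern = '#^https?://[^/]+/(?:[^/]+/)*((?:[^./]+\.)+)([^./]+)$#';

    if (preg_match($pattern, $url, $m)) {
        return [
            'file' => $m[1] . $m[2],     // e.g. 123456.text.gif
            'name' => rtrim($m[1], '.'), // e.g. 123456.text
            'ext'  => $m[2],             // e.g. gif
        ];
    }

    return null; // no filename with an extension at the end of the URL
}

$parts = split_url_filename('http://subsub.sub.sub.adomain.com/path/to/123456.text.gif');
```

This only works on the bare URL, so the src="..." wrapper would need to be stripped first.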
11:48 pm on Feb 28, 2011 (gmt 0)

Just to be sure we are on the same page, I am assuming you want to match the full filename and also the file extension from an image tag that points to a specific directory on a specific domain. Also note, this solution will only match the first image tag found in the supplied string.

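// Using # as the delimiter avoids escaping every slash in the URL.
// preg_match() returns 1 on a match, so only read $matches inside the if.
if (preg_match(
    '#src="http://subsub\.sub\.sub\.adomain\.com/path/to/([\w\.]+\.(\w+))"#',
    $string,
    $matches
)) {
    $filename = $matches[1]; // first capture group: full filename, e.g. 123456.text.gif
    $filetype = $matches[2]; // second capture group: extension, e.g. gif
}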


Note the parentheses:

([\w\.]+\.(\w+))


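These define capture groups (loosely called “backreferences” here). Their contents are returned in the matches array starting at index 1, numbered by the position of each opening parenthesis; index 0 holds the entire matched text.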
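
Since preg_match() stops at the first image tag, here is a rough sketch of how every matching tag could be collected with preg_match_all() instead. The function name find_image_files and the sample $html string are made up for illustration; the pattern is the same one as above.

```php
<?php
// Hypothetical helper (not from the thread): returns the filename and
// file type of every matching src attribute found in $html.
function find_image_files(string $html): array
{
    $pattern = '#src="http://subsub\.sub\.sub\.adomain\.com/path/to/([\w\.]+\.(\w+))"#';
    $files = [];

    // PREG_SET_ORDER yields one array per match:
    // [0] = whole match, [1] = filename, [2] = extension
    if (preg_match_all($pattern, $html, $sets, PREG_SET_ORDER)) {
        foreach ($sets as $set) {
            $files[] = ['filename' => $set[1], 'filetype' => $set[2]];
        }
    }

    return $files; // empty array when nothing matched
}

// Made-up sample input with two image tags
$html = '<img src="http://subsub.sub.sub.adomain.com/path/to/123456.text.gif">'
      . '<img src="http://subsub.sub.sub.adomain.com/path/to/photo.jpg">';

$files = find_image_files($html);
```

Each entry in $files can then be handed to whatever function does the periodic check.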