Forum Moderators: coopster

preg match script src

         

FiRe

11:17 am on Apr 2, 2009 (gmt 0)

10+ Year Member



Can anyone suggest a regular expression to match the src in a script tag?

Possible variations are:

<script src="blah.js" type="text/javascript">
<script src='blah.js' type='text/javascript'>
<script type="text/javascript" src="blah.js" >
<script src=blah.js>

I need to extract the blah.js part.

astupidname

12:32 pm on Apr 2, 2009 (gmt 0)

10+ Year Member



This should do the job:
<?php
$str = '<script type="text/javascript" src="foo-foo number_23.js"></script>';
// The character class matches letters, digits, underscores, whitespace and dashes
// (the dash goes last in the class so it is read as a literal, not as a range).
// The literal ".js" ending is required, as it is what ties the match to the file name.
// preg_match() assigns the result to the $matches array
// (there should only be one match per script tag).
$rex = '/[\w\s-]+\.js/';
preg_match($rex, $str, $matches);
echo $matches[0]; // foo-foo number_23.js
?>
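
As a quick check, here is a sketch that runs a character class of this kind (written here as [\w\s-], with the dash last) against the four variations from the question. The array name $variations and the extra path example are assumptions added for illustration. Note that the pattern keys on the ".js" ending rather than on the src attribute, so it only returns the file name portion of a path.

```php
<?php
// The four tag variations listed in the original question.
$variations = array(
    '<script src="blah.js" type="text/javascript">',
    "<script src='blah.js' type='text/javascript'>",
    '<script type="text/javascript" src="blah.js" >',
    '<script src=blah.js>',
);

// Letters, digits, underscores, whitespace and dashes, ending in ".js".
$rex = '/[\w\s-]+\.js/';

$results = array();
foreach ($variations as $tag) {
    // preg_match() returns 1 on a match and fills $matches[0] with the matched text.
    $results[] = preg_match($rex, $tag, $matches) ? $matches[0] : null;
}
print_r($results); // each entry is "blah.js"

// Hypothetical example with a directory in the path: "/" is not in the
// character class, so only the file name itself is matched.
preg_match($rex, '<script src="js/lib/blah.js">', $m);
echo $m[0], "\n"; // blah.js
```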

FiRe

12:55 pm on Apr 2, 2009 (gmt 0)

10+ Year Member



What I forgot to mention was that it needs to search for any file within an HTML page (not just a string with the code in it). So it needs to be something like...

<script ? src=?>

Thanks

rocknbil

4:41 pm on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This will bear some experimentation, but try this. When working with regexps, it's good to map the pattern out first:

<script - starts with <script

[^>]* - followed by zero or more characters that are not >, so the match stays inside the tag

src= - followed by src=

['"]* - followed by zero or more of ' or " (zero is needed for unquoted attributes)

([^'">\s]+) - followed by one or more characters that are NOT ', ", > or whitespace, as designated by the leading ^ in the character class. The "()" saves the match in $1. The > and the whitespace are excluded so that unquoted values also stop in the right place.

[^>]* - followed by whatever is left in the tag (the closing quote and any other attributes)

> - followed by >

i - case-insensitive

m - treat as multiple lines. This only changes where ^ and $ match, and the pattern uses neither, so it can be left off.

g - apply globally. PHP does not have this modifier: preg_replace already replaces every match, and preg_match_all is the function to use for collecting every match.

As for your "in page," open the file and read it line by line, and add "$cleaned" to some array.

// the quotes inside the pattern are escaped because the PHP string is single-quoted
$pattern = '/<script[^>]*src=[\'"]*([^\'">\s]+)[^>]*>/i';
// "g" and "m" are left off: PHP has no "g" modifier and "m" has no effect on this pattern


$found_array = array();
foreach (file($file) as $line) { // $file holds the path to the HTML page
$cleaned='';
$cleaned = preg_replace($pattern,$line,'$1');
if ($cleaned != '') { array_push ($found_array,$cleaned); }
}

Or you just may wish to replace it in $line:

$line = preg_replace($pattern,$line,'$1');

Again, I haven't tested this and there's always a better way to do it, but break it down - the above might very well work . . .
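
For the "whole page" case, one possible alternative to going line by line is to run preg_match_all over the full HTML string, since PHP has no "g" modifier. The sketch below is only that, a sketch: the function name extract_script_srcs, the sample markup and the exact pattern are assumptions, and the pattern is a variant of the one mapped out above rather than the same expression.

```php
<?php
// Hypothetical helper: returns every src value found in <script> tags.
function extract_script_srcs($html)
{
    // <script, then anything inside the tag (lazily), whitespace, src=,
    // an optional opening quote, then the value up to a quote, space or >.
    $pattern = '/<script\b[^>]*?\ssrc\s*=\s*[\'"]?([^\'"\s>]+)/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // contents of the first capture group for each match
}

// Sample page (made up for this example) covering the variations in the thread.
$html = <<<'HTML'
<html><head>
<script src="one.js" type="text/javascript"></script>
<script src='two.js' type='text/javascript'></script>
<script type="text/javascript" src="three.js" ></script>
<SCRIPT SRC=four.js></SCRIPT>
<script type="text/javascript">var inline = true;</script>
</head></html>
HTML;

$srcs = extract_script_srcs($html);
print_r($srcs); // one.js, two.js, three.js, four.js
```

To use it on a real file, the page could be read with file_get_contents() and passed in as $html. The inline script without a src attribute is skipped because [^>]* cannot run past the end of its opening tag.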

rocknbil

3:38 pm on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



$line = preg_replace($pattern,$line,'$1');

DOH! :-( The arguments are backwards, sorry. They should be:

$cleaned = preg_replace($pattern,'$1',$line);
$line = preg_replace($pattern,'$1',$line);
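
To illustrate the argument order, preg_replace takes the pattern first, then the replacement, then the subject string. The short sketch below assumes a pattern along the lines discussed in this thread and two made-up input lines. It also shows two things worth knowing before relying on the result: only the matched tag is swapped for $1, and a line with no match comes back unchanged rather than empty.

```php
<?php
// Assumed pattern for this example; the src value is captured as $1.
$pattern = '/<script[^>]*src=[\'"]*([^\'">\s]+)[^>]*>/i';

// preg_replace($pattern, $replacement, $subject)
$line = '<script type="text/javascript" src="blah.js" ></script>';
$cleaned = preg_replace($pattern, '$1', $line);
echo $cleaned, "\n"; // blah.js</script>  (the closing tag is not part of the match)

// A line without a script tag is returned as-is, so a test against ''
// will not tell matching and non-matching lines apart.
$other = preg_replace($pattern, '$1', '<p>no script here</p>');
echo $other, "\n"; // <p>no script here</p>
```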