Possible variations are:
<script src="blah.js" type="text/javascript">
<script src='blah.js' type='text/javascript'>
<script type="text/javascript" src="blah.js" >
<script src=blah.js>
How would I extract the blah.js part?
<script - starts with
.* followed by zero or more of any character
src= followed by src=
['"]* followed by zero or more of any of these, ' or " (zero is needed for unquoted attributes)
([^'">]+) followed by one or more characters that are NOT ', ", or >, as designated by the leading ^ in the character class. The "()" saves the match in $1. The > is in the class so that an unquoted value stops at the end of the tag.
> followed by >
i - case-insensitive
m - treat as multiple lines
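g - apply globally (note, PHP's preg functions accept i and m but have no g modifier; call preg_match_all() instead of preg_match() to get every match.) Also, m only changes where ^ and $ match, so it would matter only if you slurp the whole file into a variable and anchor the pattern, which is what I think astupidname intended.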
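To make the breakdown above concrete, here is a rough sketch of the slurp-it-all-into-one-string approach, using preg_match_all() for the "global" part. The $html sample is made up from the four variations in the question, and the trailing ['"]*[^>]*> is my own addition (an assumption, not part of the pattern described above) so that a closing quote and any further attributes can sit between the value and the >. It also assumes one script tag per line, since .* is greedy.

```php
<?php
// Sample markup (an assumption for illustration): the four variations from the question.
$html = <<<'HTML'
<script src="blah.js" type="text/javascript">
<script src='blah.js' type='text/javascript'>
<script type="text/javascript" src="blah.js" >
<script src=blah.js>
HTML;

// Same pieces as the breakdown, plus ['"]*[^>]*> at the end so a closing
// quote and any remaining attributes may come before the >.
$pattern = '/<script.*src=[\'"]*([^\'">]+)[\'"]*[^>]*>/i';

// preg_match_all() collects every match in the string.
// $matches[1] holds what the first set of parentheses captured.
preg_match_all($pattern, $html, $matches);
$sources = $matches[1];

print_r($sources);
```

Each entry in $sources should be the bare file name, with no quotes around it.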
As for your "in page," open the file and read it line by line, and add "$cleaned" to some array.
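// ['"]* and [^>]* after the capture allow a closing quote and any later attributes before the >
$pattern = '/<script.*src=[\'"]*([^\'">]+)[\'"]*[^>]*>/i';
// note I left "m" off for this line by line approach ("g" is not a PHP modifier at all)
// $pattern = '/<script.*src=[\'"]*([^\'">]+)[\'"]*[^>]*>/im';
$found_array = array();
$handle = fopen('page.html', 'r'); // 'page.html' is a placeholder, use your own file
while (($line = fgets($handle)) !== false) {
$cleaned = '';
// preg_match() stores the parenthesised group in $matches[1]
if (preg_match($pattern, $line, $matches)) { $cleaned = $matches[1]; }
if ($cleaned != '') { array_push($found_array, $cleaned); }
}
fclose($handle);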
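Or you may just wish to replace it in $line:
// preg_replace() takes the pattern, then the replacement, then the subject; '$1' is the captured group
$line = preg_replace($pattern, '$1', $line);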
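A quick sketch of the replace variant, only tried against a made-up sample line. preg_replace() takes the pattern first, then the replacement, then the subject, and '$1' in the replacement string refers to the captured group. The sample $line and the trailing ['"]*[^>]*> in the pattern are my assumptions; the latter is there so the whole opening tag gets matched and therefore replaced.

```php
<?php
// Hypothetical input line, for illustration only.
$line = '<script type="text/javascript" src="blah.js" >';

// Matches the whole opening tag and captures the src value.
$pattern = '/<script.*src=[\'"]*([^\'">]+)[\'"]*[^>]*>/i';

// Swap the matched tag for just the captured file name.
$replaced = preg_replace($pattern, '$1', $line);

echo $replaced, "\n"; // prints blah.js
```

If the line holds other text besides the tag, that text is left in place and only the tag itself is swapped out.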
Again, I haven't tested this and there's always a better way to do it, but break it down - the above might very well work . . .