Forum Moderators: coopster
I would search and replace '>*<' with '><'
Next, import the file into Excel using '<' as the delimiter.
Then I would have another look, and another think.
Usually saving as text files and importing with new delimiters will sort out difficult problems.
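The same search-and-replace idea can be tried directly in PHP before reaching for Excel. This is only a rough sketch: the sample markup and the variable names ($html, $tagsOnly, $fields) are made up for illustration, and the '>*<' wildcard is approximated here with a regular expression.

```php
<?php
// Hypothetical sample markup, not taken from the thread.
$html = '<p>See <a href="a.html">A</a> and <img src="b.png"> here</p>';

// Drop the text between tags: '>anything<' becomes '><'.
$tagsOnly = preg_replace('/>[^<]*</', '><', $html);

// Split on '<' so each tag lands in its own field, much like a '<' delimited import.
// array_filter with 'strlen' removes the empty field before the first '<'.
$fields = array_values(array_filter(explode('<', $tagsOnly), 'strlen'));

print_r($fields);
```

Each element of $fields then holds one tag, which makes the href and src attributes easy to spot by eye.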
/.*((href¦src) *= *'.+').*/
Take care of possible spaces around the = sign.
Replace the broken pipe (¦) with a solid pipe (|).
I haven't tested it, but you can check a regular expression with one of the online testing tools.
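For anyone who wants to try the idea locally, here is a small sketch. It is a variation rather than the exact pattern above: the pipe is rekeyed as a solid '|', and the greedy '.+' is swapped for "[^']+" so that each value stops at its own closing quote. The sample string is made up.

```php
<?php
// Hypothetical sample markup with and without spaces around the = sign.
$html = "<a href = 'page.html'>x</a><img src='pic.png'>";

// ' *= *' allows any number of spaces on either side of the equals sign.
$pattern = "/(href|src) *= *'([^']+)'/";

preg_match_all($pattern, $html, $matches);

// $matches[2] holds the captured attribute values.
print_r($matches[2]);
```

Note that this variation only handles single-quoted values, just like the pattern it is based on.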
[edited by: coopster at 1:45 pm (utc) on Mar. 29, 2006]
[edit reason] removed url [/edit]
With a regular expression you need to locate the 'href' OR 'src' attribute followed by an equals sign. Then match any single or double quotation mark which may open the attribute value. Of course, in some older HTML the quote may not even be present, so we make it optional with a question mark. Next is the part we really want, the value. We know that the value will be everything up to the next single quotation mark, double quotation mark, space (because there just may be another attribute such as a class or something), or closing greater-than sign (because it just may be the end of the element). There has to be at least one character that is not one of these to constitute a match for us, and that is what the plus sign does. You'll also notice the character class begins with a caret, which negates the values inside: it says to find anything that DOES NOT match any of the following characters. Lastly, we use the same characters without negation to tell the regex engine what will mark the end of our pattern.
$pattern = '/(href¦src)=[\'"]?([^\'" >]+)[\'" >]/';
preg_match_all($pattern, $string, $matches);
// $matches[2] will contain the values:
print '<pre>'; print_r($matches[2]); print'</pre>';
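To see the pieces described above at work, here is a small self-contained run against a made-up string that mixes double-quoted, unquoted, and single-quoted values. The sample markup is an assumption for illustration, and the pipe has been rekeyed as a solid '|'.

```php
<?php
// Hypothetical sample markup covering the three quoting styles.
$string = '<a href="index.php" class="nav">Home</a> '
        . '<img src=logo.gif> '
        . '<a href=\'faq.html\'>FAQ</a>';

// Same pattern as above, with a real pipe character in the alternation.
$pattern = '/(href|src)=[\'"]?([^\'" >]+)[\'" >]/';

preg_match_all($pattern, $string, $matches);

// $matches[1] holds the attribute names, $matches[2] the values.
print_r($matches[2]);
```

The class attribute is skipped because only 'href' and 'src' are listed in the alternation.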
<added>Thanks, adb64. I was in the middle of posting when I got sidetracked. And good point, I often forget to remind folks to rekey that pipe character, as the forum breaks them.</added>
<?php
$str_text = file_get_contents("http://php.net");
$pattern = '/(href¦src)=[\'"]?([^\'" >]+)[\'" >]/';
preg_match_all($pattern, $str_text, $matches);
echo "<pre>";
var_dump($matches);
echo "</pre>";
?>
This code prints:
array(3) {
[0]=>
array(0) {
}
[1]=>
array(0) {
}
[2]=>
array(0) {
}
}
Would you mind posting your result here? My PHP version is 4.
<?php
error_reporting(E_ALL);
$str_text = file_get_contents("http://php.net");
$pattern = '/(href¦src)=[\'"]?([^\'" >]+)[\'" >]/';
preg_match_all($pattern, $str_text, $matches);
echo "<pre>";
var_dump($matches);
echo "</pre>";
?>
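One possible explanation for the empty arrays, assuming the pattern was pasted with the forum's broken pipe still in place: the regex engine treats '¦' as a literal character, not as alternation, so it looks for the text 'href¦src' and finds nothing. The sketch below compares the two on a made-up string; the names $broken, $fixed, $m1 and $m2 are for illustration only.

```php
<?php
// Hypothetical sample markup.
$html = '<a href="index.php">Home</a><img src="logo.gif">';

// Pattern as it appears when copied from the forum (broken pipe).
$broken = '/(href¦src)=[\'"]?([^\'" >]+)[\'" >]/';

// Same pattern with the broken pipe rekeyed as a solid pipe.
$fixed = str_replace('¦', '|', $broken);

$brokenCount = preg_match_all($broken, $html, $m1);
$fixedCount  = preg_match_all($fixed, $html, $m2);

echo "broken pipe: $brokenCount matches\n";
echo "solid pipe:  $fixedCount matches\n";
print_r($m2[2]);
```

If the broken pattern gives zero matches here too, rekeying the pipe in the original script is the first thing to try.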