Hi, can anyone help me with the code below? I want to open web pages, find all the meta elements that have a "name" attribute, and grab the meta description, i.e. the value of the content attribute when name="description" is present. The problem I am having is that the code below doesn't find the description when its meta tag is preceded by other <meta name=...> tags. For example:
It works for the following content:
<title>here is a title</title>
<meta NAME="DESCRIPTION" CONTENT="It works with
this description spread over
several lines">
But it doesn't work for this:
.
.
<meta name="robots" content="all" />
<meta name="author" content="Fred Smith" />
<meta name="Copyright" content="Copyright (c) 2005 ..." />
<title>page title goes here</title>
<meta name="description" content="this is the content I want,
but it fails to grab it">
Now, here is the code. I suspect the problem is either in the regex or in the way I am calling preg_match_all:
<?php
include_once("class.errorHandler.php");
$urlOpen = ''; // url for web page to open
$content = '';
ErrorHandler::set();
try
{
    if ($handle = fopen($urlOpen, "r"))
    {
        // read the whole page into $content, 8 KB at a time
        while (!feof($handle))
        {
            $content .= fread($handle, 8192);
        }
        fclose($handle); // only close a handle that was actually opened
    }
} catch (Exception $e)
{
    // "Couldn't open file";
}
if ($content != "")
{
    $pattern = '/<meta ([^>]*)name="([^"\'>]*)"([^>]*)/im';
    if (preg_match_all($pattern, $content, $matches))
    {
        for ($i = 0; $i < count($matches) && !isset($descr); $i++)
        // loop will terminate when $descr gets assigned the result, or when
        // all matches have been looped through
        {
            $str_matches = strtolower($matches[$i][0]);
            $pos = strpos($str_matches, 'name=');
            if ($pos !== false)
            {
                // take the quoted value after name= up to its closing quote
                $name = trim(substr($str_matches, $pos + 5), '"');
                $pos = strpos($name, '"');
                $name = substr($name, 0, $pos);
                if (strcasecmp("description", $name) == 0)
                {
                    $pos = strpos($str_matches, 'content=');
                    if ($pos !== false)
                        $descr = trim(substr($str_matches, $pos + 8), '"');
                }
            }
        }
    }
    else
    {
        $descr = 'no description found';
    }
}
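For reference, here is a small standalone sketch that runs the same pattern over the failing snippet and prints what preg_match_all puts in $matches with its default flag, PREG_PATTERN_ORDER. With that flag, $matches[$g] is the list of values of capture group $g across all matched tags, so count($matches) counts groups (the full match plus three captures) rather than tags; in this snippet both numbers happen to be 4. The loop above reads $matches[$i][0], which is group $i of the first tag only: the whole robots tag, its empty first group, the name "robots", and its tail. It never reaches the description tag, which would explain why the working example only works when the description is the first <meta name=...> tag on the page.

```php
<?php
// The failing snippet from the question, as a string.
$content = <<<HTML
<meta name="robots" content="all" />
<meta name="author" content="Fred Smith" />
<meta name="Copyright" content="Copyright (c) 2005 ..." />
<title>page title goes here</title>
<meta name="description" content="this is the content I want,
but it fails to grab it">
HTML;

$pattern = '/<meta ([^>]*)name="([^"\'>]*)"([^>]*)/im';
preg_match_all($pattern, $content, $matches); // default flag: PREG_PATTERN_ORDER

// Number of capture groups + 1 (the full match), not the number of tags.
echo 'count($matches)    = ', count($matches), "\n";
// Number of <meta ... name="..."> tags that matched.
echo 'count($matches[0]) = ', count($matches[0]), "\n";

// What the loop in the question inspects: group $i of the FIRST tag only.
for ($i = 0; $i < count($matches); $i++) {
    echo "\$matches[$i][0] = ", var_export($matches[$i][0], true), "\n";
}

// The name captured from every tag, in document order.
echo implode(', ', $matches[2]), "\n";
```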
----------------
Any help will be appreciated. Thanks in advance,
Phil
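A minimal sketch of one way to fix it, assuming double-quoted attribute values as in the original pattern; find_meta_description is a hypothetical helper name, not an existing function. It keeps the original regex (minus the m modifier, which has no effect without ^ or $) but passes PREG_SET_ORDER to preg_match_all, so each array entry is one matched tag. It then compares the captured name case-insensitively and pulls the content value out of the original, un-lowercased tag text, so the description keeps its case and a trailing " /" is not included.

```php
<?php
// Hypothetical helper: returns the content of the first <meta name="description">
// tag in $html, or null if there is none. Regex-based, so it assumes
// double-quoted attribute values, like the pattern in the question.
function find_meta_description(string $html): ?string
{
    $pattern = '/<meta ([^>]*)name="([^"\'>]*)"([^>]*)/i';
    // PREG_SET_ORDER: one entry per matched tag, each holding [full, g1, g2, g3].
    if (!preg_match_all($pattern, $html, $tags, PREG_SET_ORDER)) {
        return null;
    }
    foreach ($tags as $tag) {
        // $tag[2] is the name value; compare case-insensitively (NAME="DESCRIPTION").
        if (strcasecmp($tag[2], 'description') !== 0) {
            continue;
        }
        // Read content="..." from the original tag text so the case is preserved;
        // [^"]* also matches newlines, so multi-line descriptions work.
        if (preg_match('/content="([^"]*)"/i', $tag[0], $m)) {
            return $m[1];
        }
    }
    return null;
}

$html = <<<HTML
<meta name="robots" content="all" />
<meta name="author" content="Fred Smith" />
<meta name="Copyright" content="Copyright (c) 2005 ..." />
<title>page title goes here</title>
<meta name="description" content="this is the content I want,
but it fails to grab it">
HTML;

$descr = find_meta_description($html) ?? 'no description found';
echo $descr, "\n";
```

Looping over the tags rather than over the capture groups (here via PREG_SET_ORDER; looping up to count($matches[0]) and reading $matches[2][$i] would work too) is what lets the code reach a description that is not the first <meta name=...> tag. For a real page, $html could be the $content read by the fopen loop above. PHP also ships a built-in get_meta_tags(), which returns the name/content pairs as an array keyed by the lowercased name (so the description would be under 'description'); it may be worth trying before maintaining a hand-written regex.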