
How to read meta tags from a website?

         

toplisek

10:04 am on Jan 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How can I read a website's meta tags with a PHP script, using a form?

I have some code to read them, but it does not work. Does anyone have an idea why it is not detecting the meta tags?
[PHP]

<?php

/*
** Extracts and formats meta tag content
*/

function get_meta_data($url, $searchkey = '') {
    $str = '';                   // initialise the output string
    $data = get_meta_tags($url); // get the meta data in an array
    if ($data === false) {       // the URL could not be read
        return $str;
    }
    foreach ($data as $key => $value) {
        if (mb_detect_encoding($value, 'UTF-8, ISO-8859-1', true) != 'ISO-8859-1') { // check whether the content is UTF-8 or ISO-8859-1
            $value = utf8_decode($value); // if UTF-8 decode it
        }
        $value = strtr($value, get_html_translation_table(HTML_ENTITIES)); // mask the content
        if ($searchkey != '') { // if only one meta tag is in demand e.g. 'description'
            if ($key == $searchkey) {
                $str = $value; // just return the value
            }
        } else { // all meta tags
            $pattern = '/ ¦,/i'; // ' ' or ','
            $array = preg_split($pattern, $value, -1, PREG_SPLIT_NO_EMPTY); // split it in an array, so we have the count of words
            $str .= '<p><span style="display:block;color:#000000;font-weight:bold;">' . $key . ' <span style="font-weight:normal;">(' . count($array) . ' words ¦ ' . strlen($value) . ' chars)</span></span>' . $value . '</p>'; // format data with count of words and chars
        }
    }
    return $str;
}

$content = get_meta_data("http://www.amazon.com/");
?>

[/PHP]

Mahabub

11:42 am on Jan 2, 2009 (gmt 0)

10+ Year Member



Dear toplisek,

I read your code. Everything looks fine; when I tried it, it read the meta data. Did you print the result? Your code above doesn't. Please try the version below, and if you get any error, let us know.


<?php

/*
** Extracts and formats meta tag content
*/

function get_meta_data($url, $searchkey = '') {
    $str = '';                   // initialise the output string
    $data = get_meta_tags($url); // get the meta data in an array
    if ($data === false) {       // the URL could not be read
        return $str;
    }
    foreach ($data as $key => $value) {
        if (mb_detect_encoding($value, 'UTF-8, ISO-8859-1', true) != 'ISO-8859-1') { // check whether the content is UTF-8 or ISO-8859-1
            $value = utf8_decode($value); // if UTF-8 decode it
        }
        $value = strtr($value, get_html_translation_table(HTML_ENTITIES)); // mask the content
        if ($searchkey != '') { // if only one meta tag is in demand e.g. 'description'
            if ($key == $searchkey) {
                $str = $value; // just return the value
            }
        } else { // all meta tags
            $pattern = '/ ¦,/i'; // ' ' or ','
            $array = preg_split($pattern, $value, -1, PREG_SPLIT_NO_EMPTY); // split it in an array, so we have the count of words
            $str .= '<p><span style="display:block;color:#000000;font-weight:bold;">' . $key . ' <span style="font-weight:normal;">(' . count($array) . ' words ¦ ' . strlen($value) . ' chars)</span></span>' . $value . '</p>'; // format data with count of words and chars
        }
    }
    return $str;
}

$content = get_meta_data("http://www.amazon.com/");
echo $content; // print the result
?>

Thanks
Mahabub

toplisek

6:38 pm on Jan 2, 2009 (gmt 0)




Hi,
1. WORDS:
did you check 1 words for Amazon.
What is actually this value as there are many words...not one
2. TITLE:
There is also:
<title>Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs &amp; more</title>
But it does not read it. How to do it also this?
3. ENCODING VALUE
Currently there is checking whether the content is UTF-8 or ISO-8859-1.

If I would like to make it universal, how to do it as there is not detecting encoding as value like charset=windows-1252", charset=windows-1250"? Is this possible to be achieved?

4. Strange character: &#65533; within text(1 words &#65533; 188 chars)
I have put at the top of PHP also code not to be shown this:
header('Content-Type: text/html; charset=UTF-8');

Mahabub

7:19 pm on Jan 2, 2009 (gmt 0)




toplisek,

1.
Replace
$pattern = '/ ¦,/i';
with
$pattern = '/[¦,]/i';
and the word-counting problem will be solved.

2.
The title isn't a meta tag; it's the title element, so get_meta_tags() does not return it. To get its value, you can run the regex below on the page's HTML:
preg_match('/<title>([^>]*)<\/title>/si', $url_content, $match); // $url_content holds the page's HTML

3 and 4.

I still don't have any idea how to make it universal or how to handle the character encoding. I'll try later, and if I find something, I'll let you know.

Thanks
Mahabub

toplisek

7:50 pm on Jan 2, 2009 (gmt 0)




1. A single word and a two-word phrase are not the same thing. This script counts a two-word phrase between two commas as one word.

2. How do I display the title and its value?
Your code shows the text "Array" as the value of $match.

3. How can I store each meta tag and its values in separate variables?

Example: [amazon.com...] has the following meta tags:
description (18 words ¦ 344 chars)
keywords (41 words ¦ 455 chars)

I would like to store the number of words and characters for each meta tag.

toplisek

9:57 pm on Jan 2, 2009 (gmt 0)




4. How do I extract a single meta tag, such as description, from the array?
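One possible approach: get_meta_tags() returns an associative array keyed by each tag's name attribute, lower-cased, so a single tag can be read directly by its key instead of looping. A minimal offline sketch; the temporary file and its contents are hypothetical stand-ins for a real page (get_meta_tags() accepts a URL in the same way when allow_url_fopen is enabled):

```php
<?php
// Write a tiny hypothetical page to a temp file so the example runs offline.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, '<html><head>'
    . '<meta name="Description" content="Online shopping for books">'
    . '<meta name="keywords" content="books, music">'
    . '</head><body></body></html>');

$tags = get_meta_tags($file); // returns false if the file/URL cannot be read
unlink($file);

if ($tags === false) {
    die('Could not read meta tags');
}

// Keys are lower-cased, so name="Description" is available as 'description'.
$description = isset($tags['description']) ? $tags['description'] : '';
echo $description . "\n";
```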

toplisek

6:11 am on Jan 3, 2009 (gmt 0)




5. Is cURL the best approach, or is this one better?

Mahabub

9:54 am on Jan 3, 2009 (gmt 0)




Dear toplisek,

[1.] Here "word" doesn't mean a word in the traditional sense; it means a meta keyword, which can contain more than one word. There are also plenty of word-counting functions available on the net.

[2.] The title tag's value is in $match[1]. ($match[0] holds the whole match, including the <title> tags.)

[3.] You can store each meta tag's number of words, number of characters, and value in an array like this, and then use it however you like:

$output[$key]["words"] = count($array);
$output[$key]["character"] = strlen($value);
$output[$key]["value"] = $value;

[4.] Your function already has an option for extracting a single meta tag; pass the tag name as the second argument:

$content .= get_meta_data("http://www.amazon.com/","specify the tag name here");

You can also pass an array of meta tag names, like this:

<?php

/*
** Extracts and formats meta tag content
*/

function get_meta_data($url, $searchkey = '') {
    $str = '';                   // initialise the output string
    $data = get_meta_tags($url); // get the meta data in an array
    if ($data === false) {       // the URL could not be read
        return $str;
    }
    foreach ($data as $key => $value) {
        if (mb_detect_encoding($value, 'UTF-8, ISO-8859-1', true) != 'ISO-8859-1') { // check whether the content is UTF-8 or ISO-8859-1
            $value = utf8_decode($value); // if UTF-8 decode it
        }
        $value = strtr($value, get_html_translation_table(HTML_ENTITIES)); // mask the content
        if (!empty($searchkey)) { // only the requested meta tag(s), e.g. 'description'
            if (in_array($key, (array) $searchkey)) { // accepts one tag name or an array of names
                $str .= $value . "<br/>"; // collect the value
            }
        } else { // all meta tags
            $pattern = '/ ¦,/i'; // ' ' or ','
            $array = preg_split($pattern, $value, -1, PREG_SPLIT_NO_EMPTY); // split it in an array, so we have the count of words
            $str .= '<p><span style="display:block;color:#000000;font-weight:bold;">' . $key . ' <span style="font-weight:normal;">(' . count($array) . ' words ¦ ' . strlen($value) . ' chars)</span></span>' . $value . '</p>'; // format data with count of words and chars
        }
    }
    return $str;
}

$searchkey = array("description", "keywords");
$content = get_meta_data("http://www.amazon.com/", $searchkey);
echo $content;
?>

[5.] If you only need the meta tag data, I think this approach is better. But if you also want the value of the title tag, you need the full content of the page, and for that purpose I think cURL is the best option.
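A sketch for [2.] and [5.], assuming the cURL extension is enabled: fetch the page once with cURL, then pull the title out of the HTML. The names fetch_html() and extract_title() and the user-agent string are hypothetical, and a regex is only a rough way to parse HTML (DOMDocument is the more robust option for messy pages):

```php
<?php
/**
 * Fetch a page with cURL. Returns the HTML as a string, or null on failure.
 */
function fetch_html($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; MetaReader/0.1)', // hypothetical UA
    ));
    $html = curl_exec($ch); // the handle is released when $ch goes out of scope
    return $html === false ? null : $html;
}

/**
 * Return the text of the first <title> element, or null if there is none.
 */
function extract_title($html)
{
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $match)) {
        return null;
    }
    // $match[0] is the whole <title>...</title>; $match[1] is the text inside.
    return html_entity_decode(trim($match[1]), ENT_QUOTES, 'UTF-8');
}

// Usage (needs network access):
// $url_content = fetch_html('http://www.amazon.com/');
// if ($url_content !== null) {
//     echo extract_title($url_content);
// }
```

Because the title regex runs on a plain string, the same extract_title() call works whether the HTML came from cURL or from file_get_contents().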

Thanks
Mahabub

toplisek

8:18 pm on Jan 4, 2009 (gmt 0)




6. How do I display the array values, i.e. each value for $searchkey = array("description","keywords"), such as keywords?

7. And how do I display the values of this array:
$output[$key]["words"] = count($array);
$output[$key]["character"] = strlen($value);
$output[$key]["value"] = $value;

8. How do I store the encoding in a variable?
The problem is that this tag has no name attribute; it looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=..." />

9. I have added:
preg_match('/<title>([^>]*)<\/title>/si',(url_content), $match);
echo $match[1];

Please tell me how to set url_content and how to echo the correct value with echo $match[1];, because nothing is displayed.
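For point 8: the charset sits inside the content attribute (or, in HTML5, in a <meta charset> attribute), so get_meta_tags() has no name to key it by. A hedged sketch that reads it from the raw HTML with a regex instead; extract_charset() is a hypothetical helper, and it does not check the HTTP Content-Type header, which can also declare the charset:

```php
<?php
/**
 * Return the charset declared in the page's <meta> tags (lower-cased), or null.
 * Handles <meta http-equiv="Content-Type" content="text/html; charset=...">
 * and the HTML5 form <meta charset="...">.
 */
function extract_charset($html)
{
    $pattern = '/<meta[^>]+charset\s*=\s*["\']?\s*([a-z0-9_.:-]+)/i';
    if (preg_match($pattern, $html, $match)) {
        return strtolower($match[1]);
    }
    return null;
}

// In practice $html would be the page source, e.g. file_get_contents($url).
$html = '<head><meta http-equiv="Content-Type" content="text/html; charset=windows-1250" /></head>';
$encoding = extract_charset($html);
echo $encoding; // windows-1250
```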

toplisek

10:04 pm on Jan 5, 2009 (gmt 0)




I have added:
$output[$key]["words"] = count($array);
$output[$key]["character"] = strlen($value);
$output[$key]["value"] = $value;
foreach($output[$arraykey] as $key => $value){
echo $value;
}

Why does it not echo the keys and values?
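In the loop above, $arraykey is never assigned, so $output[$arraykey] is null and PHP raises a warning instead of looping. Since $output is two levels deep (tag name, then words/character/value), one way to print everything is a nested loop over $output itself. A minimal sketch; the $tags values are hypothetical stand-ins for get_meta_tags() output, and the `[\s,]+` split (runs of whitespace or commas) is an assumed pattern that counts every individual word rather than comma-separated keywords:

```php
<?php
// Hypothetical meta values standing in for get_meta_tags() output.
$tags = array(
    'description' => 'Online shopping for books',
    'keywords'    => 'books, dvd players, music',
);

// Build $output as in [3.]: one sub-array per meta tag.
$output = array();
foreach ($tags as $key => $value) {
    $array = preg_split('/[\s,]+/', $value, -1, PREG_SPLIT_NO_EMPTY);
    $output[$key]['words']     = count($array);
    $output[$key]['character'] = strlen($value);
    $output[$key]['value']     = $value;
}

// Fix: loop over $output itself, then over each tag's fields.
foreach ($output as $tag => $fields) {
    echo $tag . "\n";
    foreach ($fields as $name => $val) {
        echo '  ' . $name . ': ' . $val . "\n";
    }
}

// A single field can also be read directly:
echo $output['description']['words'] . "\n";
```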