Forum Moderators: coopster

data parse


moussa854

7:48 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



I am trying to get data from a website, but I get a "Please Enable Cookies" message instead of the page.

My code:
$url = 'http://mysite.com';

$data = LoadCURLPage($url);

$string_one = '<title>';
$string_two = '</title>';

$info = extract_unit($data, $string_one, $string_two);

echo 'title '.$info;

Any suggestions for getting past the cookie check?

janharders

7:51 pm on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Parse the cookies the server sent you on a previous request and pass them along with the next one.

moussa854

7:54 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



I need some help with this, please.

janharders

7:55 pm on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'll have to look at the HTTP headers in the response. Look for
Set-Cookie:
and work from there.
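
To make that advice concrete, here is a minimal sketch of pulling the cookies out of a raw header block and formatting them for the next request. The function names parse_set_cookies and build_cookie_header are made up for illustration, and the sketch assumes the headers were captured as one string (for example by fetching with CURLOPT_HEADER enabled). It ignores cookie attributes such as Path and Expires, so treat it as a starting point rather than a full cookie parser.

```php
<?php
// Hypothetical helper (not from the thread): collect name => value pairs
// from every Set-Cookie line in a raw HTTP header block.
function parse_set_cookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match with stripos().
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the leading "name=value"; attributes such as Path or
        // Expires follow the first semicolon and are ignored here.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        list($name, $value) = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

// Hypothetical helper: format the pairs as a Cookie request header value,
// suitable for curl_setopt($ch, CURLOPT_COOKIE, ...).
function build_cookie_header(array $cookies): string
{
    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}
```

With a header block containing "Set-Cookie: sid=abc123; Path=/", parse_set_cookies() would return an array with the key sid, and build_cookie_header() would turn that into a string you can send back on the next request.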

moussa854

8:05 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



Here is the full code. I would appreciate any help:

<?php
function LoadCURLPage($url, $agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)', $cookie = '', $referer = '', $post_fields = '', $return_transfer = 1, $follow_location = 1, $ssl = '', $curlopt_header = 0)
{
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);

if($ssl)
{
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
}

curl_setopt ($ch, CURLOPT_HEADER, $curlopt_header);

if($agent)
{
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
}

if($post_fields)
{
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
}

curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

if($referer)
{
curl_setopt($ch, CURLOPT_REFERER, $referer);
}

if($cookie)
{
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
}

$result = curl_exec ($ch);

curl_close ($ch);

return $result;
}

function extract_unit($string, $start, $end)
{
$pos = stripos($string, $start);

$str = substr($string, $pos);

$str_two = substr($str, strlen($start));

$second_pos = stripos($str_two, $end);

$str_three = substr($str_two, 0, $second_pos);

$unit = trim($str_three); // remove whitespaces

return $unit;
}

$url = 'http://www.mysite.com';

$data = LoadCURLPage($url);

$string_one = '<title>';
$string_two = '</title>';

$info = extract_unit($data, $string_one, $string_two);

echo 'title '.$info;
?>

punisa

9:38 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



OK, the code is here. Can you explain what its purpose is? From what I can piece together, you are trying to extract the TITLE tag from a site on a different domain, correct?

If that is the case, there might be a much easier way to accomplish this.

moussa854

10:17 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



Yes, that is what I need, but in addition to the title I will also pull some other data from the page source.

punisa

9:22 am on May 1, 2009 (gmt 0)

10+ Year Member



Aha, I get it. Ok, how about this bit of code:

<?php
function fetchsite($path){
// Open the URL as a read-only stream
$file = fopen($path, "r");
if (!$file){
exit("There was a connection error!");
}
$data = '';
while (!feof($file)){
$data .= fgets($file, 1024);
}
fclose($file);
return $data;
}
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
// Put the full URL you want to fetch
$url = "http://www.site-i-wanna-read.com/";
$fetched = fetchsite($url);
// Extract what data you want
$title = get_string_between($fetched,'<title>','</title>');
echo $title;
?>

You can use the improvised "get_string_between" function to extract whatever part of the page you want :)
Hope it helps!
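
On the cookie question from the start of the thread: a plain fopen() fetch like the one above does not keep cookies between requests on its own, so a site that checks for them may still answer "Please Enable Cookies". One option, sketched here under assumptions, is to let cURL handle cookies itself. The function name fetch_with_cookies and the example.com URL are placeholders, not anything from the posts above. Passing an empty string to CURLOPT_COOKIEFILE turns on cURL's in-memory cookie engine, so cookies set by one response (including a redirect) are sent on later requests made with the same handle.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL on an existing
// cURL handle with the in-memory cookie engine switched on.
function fetch_with_cookies($ch, string $url)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // An empty file name enables cookie handling without reading a file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');
    return curl_exec($ch); // string on success, false on failure
}

// Placeholder URL; reuse the same handle so cookies carry over.
$ch = curl_init();
$first = fetch_with_cookies($ch, 'http://www.example.com/');
$second = fetch_with_cookies($ch, 'http://www.example.com/');
if ($second === false) {
    echo 'Request failed: ' . curl_error($ch);
}
```

The returned page can then be passed to get_string_between() exactly as before. If the cookies need to survive between script runs, CURLOPT_COOKIEJAR with a writable file path (as in the LoadCURLPage function earlier in the thread) is the usual alternative.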