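<?php
// Check whether the URL can be opened, i.e. whether it exists.
function check_url($url)
{
    // fopen() returns a stream resource on success and false on failure;
    // the @ suppresses the warning PHP emits when the open fails.
    $check = @fopen($url, "r");
    if ($check === false) {
        return false;
    }
    fclose($check); // release the handle once we know the URL opened
    return true;
}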
//the following url works perfectly, notice the capital letters.
$url = "http://en.wikipedia.org/wiki/George_Washington";
//however, the following url comes back false, even though
//the resulting link is perfectly "clickable" and leads
//directly to the page that my script says doesn't exist.
//$url="http://en.wikipedia.org/wiki/george_washington";
if (check_url($url)) {
    echo "<a href=\"$url\">$url</a> is a <b>valid</b> URL";
} else {
    echo "<a href=\"$url\">$url</a> is an <b>invalid</b> URL";
}
?>
http://en.wikipedia.org/wiki/george_washington
GET /wiki/george_washington HTTP/1.1
Host: en.wikipedia.org
.
.
.
HTTP/1.0 301 Moved Permanently
Date: Thu, 24 Feb 2011 16:55:40 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Vary: Accept-Encoding,Cookie
Last-Modified: Thu, 24 Feb 2011 16:55:40 GMT
Location: http://en.wikipedia.org/wiki/George_washington
.
.
.
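The exchange above shows what the lowercase URL actually returns: a 301 with a Location header rather than the page itself. Here is a minimal sketch for pulling the status codes and redirect targets out of raw header lines, in the format used by $http_response_header and get_headers(). The helper name parse_redirect_chain is made up for illustration, and the sample lines are copied from the response above.

```php
<?php
// Hypothetical helper: summarise an array of raw response header lines.
// Each status line starts a new "hop"; a Location header is attached to
// the hop it belongs to.
function parse_redirect_chain(array $headerLines)
{
    $hops = array();
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $line, $m)) {
            // A status line marks the start of a new response.
            $hops[] = array('status' => (int) $m[1], 'location' => null);
        } elseif ($hops && preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            // Record the redirect target on the most recent response.
            $hops[count($hops) - 1]['location'] = trim($m[1]);
        }
    }
    return $hops;
}

// Header lines taken from the 301 response shown above.
$hops = parse_redirect_chain(array(
    'HTTP/1.0 301 Moved Permanently',
    'Server: Apache',
    'Location: http://en.wikipedia.org/wiki/George_washington',
));
print_r($hops);
```

When redirects are followed, $http_response_header holds the headers of every response in sequence, so the same helper should list each hop in order.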
<?php
//$url = 'http://en.wikipedia.org/wiki/George_Washington'; //correct character case url
$url = 'http://en.wikipedia.org/wiki/george_washington';
$headers = array(
    'User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
    'Accept: text/plain,text/html;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 115',
    'Connection: keep-alive',
    'Cache-Control: max-age=0'
);
$opts = array(
    'http' => array(
        'header' => implode("\r\n", $headers)
    )
);
$context = stream_context_create($opts);
$f = @file_get_contents($url, false, $context);
//$f = @file_get_contents($url);
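if ($f !== false) {
    echo $f;
} else {
    // $http_response_header is only set if the server actually responded.
    echo '<pre>';
    print_r(isset($http_response_header) ? $http_response_header : 'No response received');
    echo '</pre>';
}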
?>
...I assume that file_get_contents will follow 301's if received.
Yes, I think so, although I believe that is a feature of the URL open wrappers in general, not just file_get_contents(). But then it should work with fopen() as well, shouldn't it?
Is there a limit to how many times it redirects (as you can set with cURL)?
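For what it's worth, the HTTP stream wrapper documents two context options that seem to cover this: follow_location (whether Location redirects are followed at all) and max_redirects (documented default of 20, with a value of 1 or less meaning no redirects are followed). The sketch below only builds the context. The helper name build_redirect_options is made up for illustration, and follow_location needs a reasonably recent PHP 5.3, so treat it as an assumption on older installs.

```php
<?php
// Hypothetical helper: build HTTP context options that cap redirects.
// Per the PHP manual, max_redirects of 1 or less disables redirects.
function build_redirect_options($maxRedirects)
{
    return array(
        'http' => array(
            'follow_location' => $maxRedirects > 1 ? 1 : 0,
            'max_redirects'   => $maxRedirects,
        ),
    );
}

$opts = build_redirect_options(5);
$context = stream_context_create($opts);

// The context can then be passed to either function, for example:
// $f = @file_get_contents($url, false, $context);
// $h = @fopen($url, 'r', false, $context);
```

Passing the same context to fopen() and file_get_contents() should make both follow the same redirect policy, which makes it easier to compare how the two behave on the lowercase URL.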
<?php
if (!defined('PHP_VERSION_ID')) {
    $version = explode('.', PHP_VERSION);
    define('PHP_VERSION_ID', ($version[0] * 10000 + $version[1] * 100 + $version[2]));
}
function stream_notification_callback($notification_code, $severity, $message, $message_code, $bytes_transferred, $bytes_max) {
    switch ($notification_code) {
        case STREAM_NOTIFY_RESOLVE:
        case STREAM_NOTIFY_AUTH_REQUIRED:
        case STREAM_NOTIFY_COMPLETED:
        case STREAM_NOTIFY_FAILURE:
        case STREAM_NOTIFY_AUTH_RESULT:
            /* No dedicated message for these; dump the raw arguments instead */
            var_dump($notification_code, $severity, $message, $message_code, $bytes_transferred, $bytes_max);
            break;
        case STREAM_NOTIFY_REDIRECTED:
            echo "Being redirected to: ", $message;
            break;
        case STREAM_NOTIFY_CONNECT:
            echo "Connected...";
            break;
        case STREAM_NOTIFY_FILE_SIZE_IS:
            echo "Got the filesize: ", $bytes_max;
            break;
        case STREAM_NOTIFY_MIME_TYPE_IS:
            echo "Found the mime-type: ", $message;
            break;
        case STREAM_NOTIFY_PROGRESS:
            echo "Made some progress, downloaded ", $bytes_transferred, " so far";
            break;
    }
    echo "\n";
}
//$url = "http://en.wikipedia.org/wiki/George_Washington"; //correct character case url
$url="http://en.wikipedia.org/wiki/george_washington";
$headers = array(
    'User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
    'Accept: text/plain,text/html;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 115',
    'Connection: keep-alive',
    'Cache-Control: max-age=0'
);
$opts = array(
    'http' => array(
        'header' => implode("\r\n", $headers)
    )
);
$context = stream_context_create($opts);
if (PHP_VERSION_ID >= 50200) { // support for the "notification" callback option started with PHP 5.2
    stream_context_set_params($context, array("notification" => "stream_notification_callback"));
} else {
    echo "Not able to track stream progress, PHP version not >= 5.2\r\n";
}
$f = @file_get_contents($url, false, $context);
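if ($f !== false) {
    echo '<p>File size: '.strlen($f).'</p>';
    echo '<pre>'.htmlentities($f).'</pre>';
} else {
    // $http_response_header is only set if the server actually responded.
    echo '<pre>';
    print_r(isset($http_response_header) ? $http_response_header : 'No response received');
    echo '</pre>';
}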
?>
The odd thing is that it reports a file size of 408939, but the final "Made some progress" message says "downloaded 817878 so far", which is exactly double the file size. I don't understand that, but oh well...
http://en.wikipedia.org/wiki/George_washington
http://en.wikipedia.org/wiki/George_Washington
<link rel="canonical" href="/wiki/George_Washington" />
http://en.wikipedia.org/wiki/GEORGE_washington
Playing around a bit more, this one returns a 404 Not Found:
"http://en.wikipedia.org/wiki/GEORGE_washington"
Not for me ... File size: 408854. Can you reproduce that every time?
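Since the canonical link element quoted earlier in the thread is what names the "real" page, one way to compare these URL variants is to fetch each one and read that element out of the returned HTML. Here is a minimal sketch, assuming the DOM extension is available. The helper name find_canonical_href is made up for illustration, and the sample markup is just the canonical line from above wrapped in a bare page.

```php
<?php
// Hypothetical helper: return the href of <link rel="canonical">, or null
// if the document has no such element.
function find_canonical_href($html)
{
    $doc = new DOMDocument();
    // Real-world markup is rarely perfect, so suppress parser warnings.
    if (!@$doc->loadHTML($html)) {
        return null;
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        if (strtolower($link->getAttribute('rel')) === 'canonical') {
            return $link->getAttribute('href');
        }
    }
    return null;
}

$html = '<html><head><link rel="canonical" href="/wiki/George_Washington" /></head><body></body></html>';
echo find_canonical_href($html);
```

If two differently cased URLs come back with the same canonical href, they are presumably the same article served under different names.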