Forum Moderators: coopster


Wikipedia URLs

         

dashrockstone

3:08 am on Feb 24, 2011 (gmt 0)

10+ Year Member



Hi, I need some assistance with a little issue that has been driving me crazy for a couple of days now.

I am creating a "six degrees of separation" game that involves links from one page to another on Wikipedia. The script I'm writing is to verify that the links actually exist in the pages that the contestants post as their answers. The problem is, if there are capital letters in the Wikipedia URL, it doesn't work unless the contestant posts the exact matching case.

It's a simple copy-the-URL-from-the-address-bar, paste-it-in-an-input-box, click-a-button-to-verify scenario. But not all browsers behave the same: some don't include the capital letters in the URL in the address bar, and the capital letters seem to be required for the script to find the page.

I've actually run the script against other URLs that aren't Wikipedia, mixing the case of the letters, and the case seems to make no difference at all.

Here's what I have:


<?php
//check whether the url exists or not and validate it
function check_url($url)
{
    $check = @fopen($url, "r"); // open the url with fopen
    if ($check) {
        $status = true;
    } else {
        $status = false;
    }
    return $status;
}

//the following url works perfectly, notice the capital letters.
$url = "http://en.wikipedia.org/wiki/George_Washington";

//however the following url comes back false, even though the
//resulting link is perfectly "clickable" and leads directly
//to the page that my script says doesn't exist.
//$url = "http://en.wikipedia.org/wiki/george_washington";

if (check_url($url)) {
    echo "<a href=\"$url\">$url</a> is a <b>valid</b> URL";
} else {
    echo "<a href=\"$url\">$url</a> is an <b>invalid</b> URL";
}
?>



Obviously there's much more to the game than this; once it's able to open the file, there's other code to check that the anchor exists, etc. This code, however, is sufficient to show what's happening.

I'm sure it's something simple that I'm overlooking, but after two days, I have given up.

Any help would be appreciated.

Thanks.


P.S. - I can't use any add-on classes or anything that would require installing any packages (CAKE, PEAR), client says no to "extra stuff".

rocknbil

5:22 pm on Feb 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How good are you at regexps, or str_replace? The obvious solution would be to inspect the input and capitalize every first letter. This will take some examination of the Wiki's methods; they may not capitalize articles and such (a, the, is, etc.).
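A quick sketch of that idea (an illustration only, not a complete solution — as discussed below, Wikipedia's actual capitalization rules are not this simple):

```php
<?php
// Sketch: uppercase the first letter of each underscore-separated word
// in a Wikipedia-style page title. Articles like "of" or "the" are often
// lowercase in real titles, so this is only a first approximation.
function guess_title_case($title)
{
    return implode('_', array_map('ucfirst', explode('_', $title)));
}

echo guess_title_case('george_washington'); // George_Washington
?>
```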

dashrockstone

5:49 pm on Feb 24, 2011 (gmt 0)

10+ Year Member



I've actually considered that as an option; however, when I started looking into how Wikipedia titles its articles, there seems to be no rhyme or reason as to how they do it. In some cases it's every first letter after an underscore; in others it's "willy-nilly".

An interesting one I ran into last night was the page for Joe Penna, aka Mystery Guitar Man. In the url in the address bar his last name is not capitalized, however, the script fails to find the page unless it is sent as "Joe_Penna", which I found even more odd.

That said, I don't think regexp is going to work unless I figure out a way to try about a hundred different possibilities within a few seconds.

coopster

7:10 pm on Feb 24, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld, dashrockstone.

My first suspicion was that Wikipedia is using a 301 redirect based on a capitalization mapping. I was correct, as I verified with Live HTTP Headers:
http://en.wikipedia.org/wiki/george_washington 
GET /wiki/george_washington HTTP/1.1
Host: en.wikipedia.org
.
.
.
HTTP/1.0 301 Moved Permanently
Date: Thu, 24 Feb 2011 16:55:40 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Vary: Accept-Encoding,Cookie
Last-Modified: Thu, 24 Feb 2011 16:55:40 GMT
Location: http://en.wikipedia.org/wiki/George_washington
.
.
.

So if you can follow the redirect first using cURL, you should be able to retrieve the canonical URL.
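A minimal sketch of that approach, assuming the php-curl extension is available (the function name here is made up for illustration; the CURLOPT_* constants are standard cURL options):

```php
<?php
// Sketch: follow redirects with cURL and return the final (canonical) URL.
// Assumes the php-curl extension; network access is needed when called.
function resolve_canonical_url($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // safety limit
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD-style request, body not needed
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return false;
    }
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}

// e.g. resolve_canonical_url('http://en.wikipedia.org/wiki/george_washington')
// should return the redirected, properly capitalized URL (or false on failure).
?>
```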

dashrockstone

8:23 pm on Feb 24, 2011 (gmt 0)

10+ Year Member



Thanks, coopster. I'll run it past the boss and see if he's willing to install another library; his initial reaction to these things is usually filled with disdain.

I've never used cURL myself, so it will be something new for me, but I'm always willing to give it a try.


Thanks again.

astupidname

12:47 pm on Feb 25, 2011 (gmt 0)

10+ Year Member



...(edit)never mind...

astupidname

1:54 pm on Feb 25, 2011 (gmt 0)

10+ Year Member



Instead of fopen() try using file_get_contents($url) :
<?php

//$url = "http://en.wikipedia.org/wiki/George_Washington"; //correct character case url
$url="http://en.wikipedia.org/wiki/george_washington";

$f = @file_get_contents($url);
if ($f !== false) {
echo $f;
}

?>


dashrockstone

7:55 pm on Feb 26, 2011 (gmt 0)

10+ Year Member



"Instead of fopen() try using file_get_contents($url) : "

Actually, I did try that; same result. The problem is, the URL without the proper capitalization doesn't exist.

It works with the cURL library: find the redirect name, then read the file with the proper capitalization and find the item being sought. Now I just have to convince the client that he has to install the cURL library. :)

Thanks anyway though.

astupidname

3:48 am on Feb 27, 2011 (gmt 0)

10+ Year Member



Actually, since posting that I tried a few more times, and it does work with file_get_contents; it does access the correct George_Washington page when trying george_washington first, but only some of the time. It's very strange that it only works on occasion. Actually, it's strange it works at all; I assume that file_get_contents will follow 301s if received. But I had found earlier that Wikipedia requires some headers, such as User-Agent, which I'm not convinced the filesystem functions send. It seems Wikipedia may be set up to detect script access by checking the headers sent in a request: they issue a 403 if they are not convinced you are for real or can't find a matching page, and a 301 if they think you are for real and have a page which matches in a different character case. A little spoofing via the context parameter for some of the filesystem functions may be better:

<?php

//$url = 'http://en.wikipedia.org/wiki/George_Washington'; //correct character case url
$url = 'http://en.wikipedia.org/wiki/george_washington';

$headers = array(
    'User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
    'Accept: text/plain,text/html;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 115',
    'Connection: keep-alive',
    'Cache-Control: max-age=0'
);

$opts = array(
    'http' => array(
        'header' => implode("\r\n", $headers)
    )
);

$context = stream_context_create($opts);

$f = @file_get_contents($url, false, $context);
//$f = @file_get_contents($url);

if ($f !== false) {
    echo $f;
} else {
    echo '<pre>';
    print_r($http_response_header);
    echo '</pre>';
}

?>

penders

10:56 am on Feb 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...I assume that file_get_contents will follow 301's if received.


Yes, I think so, although I believe that's a feature of the URL open wrappers, not just file_get_contents(). But then it should work with fopen() as well, shouldn't it? Is there a limit to how many times it redirects (as you can set with cURL)?

See last comment on the Wrappers page...
[uk2.php.net...]

HTTP Wrappers
[uk2.php.net...]

astupidname

1:18 pm on Feb 27, 2011 (gmt 0)

10+ Year Member



Yes, I think so. Although I believe a feature of URL open wrappers, not just file_get_contents()? But then it should work with fopen() as well? Or should it?

Yeah, it does follow 301 redirects; I just confirmed this (see the example below). I suspect it would work with fopen as well (if you utilize the context parameter as I have done with file_get_contents), but I haven't bothered to check. We don't need the handle to the file in this case, just the contents, please. :)
Incidentally, using the context parameter you could actually set it to not follow redirects, by setting the 'http' context option 'follow_location' to false.
See http context options [us2.php.net]
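A minimal sketch of those two documented 'http' context options (the values chosen here are only examples):

```php
<?php
// Sketch: build an HTTP stream context that caps the redirect chain and
// could disable following redirects entirely. 'follow_location' and
// 'max_redirects' are documented 'http' context options; the values
// here are just examples.
$opts = array(
    'http' => array(
        'follow_location' => 1, // set to 0 to stop following redirects
        'max_redirects'   => 5, // default is 20
    )
);
$context = stream_context_create($opts);

// Inspect what the context actually holds:
$set = stream_context_get_options($context);
echo $set['http']['max_redirects']; // 5
?>
```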

Is there a limit to how many times it redirects (as you can set with cURL)?

In the link I posted above it shows the 'http' context options have a 'max_redirects' parameter which defaults to 20 and is changeable.

Here is an example which, using the context parameter, gives you a sort of "play by play" of what's going on. The odd thing is it will tell you the filesize is 408939, but the final "Made some progress" message says "downloaded 817878 so far", which is exactly double the file size. I don't understand that, but oh well...
Also, if you do not pass $opts into the call to stream_context_create, so those headers are not sent, then Wikipedia gives you a 403 instead. All of those headers may not be needed (this came from a plug-in for something else I had); probably just the User-Agent is required, but I have not tried without the others. The new example:

<?php

if (!defined('PHP_VERSION_ID')) {
    $version = explode('.', PHP_VERSION);
    define('PHP_VERSION_ID', ($version[0] * 10000 + $version[1] * 100 + $version[2]));
}

function stream_notification_callback($notification_code, $severity, $message, $message_code, $bytes_transferred, $bytes_max) {
    switch ($notification_code) {
        case STREAM_NOTIFY_RESOLVE:
        case STREAM_NOTIFY_AUTH_REQUIRED:
        case STREAM_NOTIFY_COMPLETED:
        case STREAM_NOTIFY_FAILURE:
        case STREAM_NOTIFY_AUTH_RESULT:
            var_dump($notification_code, $severity, $message, $message_code, $bytes_transferred, $bytes_max);
            /* Ignore */
            break;
        case STREAM_NOTIFY_REDIRECTED:
            echo "Being redirected to: ", $message;
            break;
        case STREAM_NOTIFY_CONNECT:
            echo "Connected...";
            break;
        case STREAM_NOTIFY_FILE_SIZE_IS:
            echo "Got the filesize: ", $bytes_max;
            break;
        case STREAM_NOTIFY_MIME_TYPE_IS:
            echo "Found the mime-type: ", $message;
            break;
        case STREAM_NOTIFY_PROGRESS:
            echo "Made some progress, downloaded ", $bytes_transferred, " so far";
            break;
    }
    echo "\n";
}

//$url = "http://en.wikipedia.org/wiki/George_Washington"; //correct character case url
$url = "http://en.wikipedia.org/wiki/george_washington";

$headers = array(
    'User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
    'Accept: text/plain,text/html;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 115',
    'Connection: keep-alive',
    'Cache-Control: max-age=0'
);

$opts = array(
    'http' => array(
        'header' => implode("\r\n", $headers)
    )
);

$context = stream_context_create($opts);

if (PHP_VERSION_ID >= 50200) { //support for the "notification" option callback started with PHP 5.2
    stream_context_set_params($context, array("notification" => "stream_notification_callback"));
} else {
    echo "Not able to track stream progress, php version not >= 5.2\r\n";
}

$f = @file_get_contents($url, false, $context);

if ($f !== false) {
    echo '<p>File size: '.strlen($f).'</p>';
    echo '<pre>'.htmlentities($f).'</pre>';
} else {
    echo '<pre>';
    print_r($http_response_header);
    echo '</pre>';
}

?>

dashrockstone

7:04 am on Mar 1, 2011 (gmt 0)

10+ Year Member



I think my issue when using file_get_contents was due to not setting the "user agent". That's why I was getting told it was forbidden.

Great code, astupidname. Thanks! It works great and does exactly what I was trying to do with very little modification. :)

coopster

3:48 pm on Mar 1, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Odd thing is it will tell you the filesize is 408939 but the final "Made some progress" message will be "downloaded 817878 so far" which would be actually double the file size. I don't understand that, but oh well...


Not for me ... File size: 408854. Can you reproduce that every time? Apache or IIS? Using your browser instead, what happens in your browser with LiveHttpHeaders? At this point I'm just curious what could possibly be the cause ;-)

BTW, both of the following urls return the same resource with a 200 OK status:
http://en.wikipedia.org/wiki/George_washington

http://en.wikipedia.org/wiki/George_Washington

It's just that the first one has an extra <link> element ...
<link rel="canonical" href="/wiki/George_Washington" />
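That canonical path could be pulled out of the response with a quick regex (a brittle sketch; a regex over HTML breaks easily, and the attribute order is assumed to match the markup shown above):

```php
<?php
// Sketch: extract the canonical path from a <link rel="canonical"> element.
// Assumes the attribute order shown above (rel before href); a real HTML
// parser would be more robust.
function extract_canonical($html)
{
    if (preg_match('#<link rel="canonical" href="([^"]+)"#', $html, $m)) {
        return $m[1];
    }
    return false;
}

$html = '<link rel="canonical" href="/wiki/George_Washington" />';
echo extract_canonical($html); // /wiki/George_Washington
?>
```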

Playing around a bit more, this one returns a 404 Not Found:
http://en.wikipedia.org/wiki/GEORGE_washington

dashrockstone

5:03 pm on Mar 1, 2011 (gmt 0)

10+ Year Member



Playing around a bit more, this one returns a 404 Not Found:
"http://en.wikipedia.org/wiki/GEORGE_washington"


If you copy and paste that into your address bar it comes up with "Wikipedia does not have an article with this exact name. Please search for GEORGE washington in Wikipedia to check for alternative titles or spellings."

So apparently Wikipedia didn't think of all caps as a possibility when someone was searching.

I took it even a bit further and used "http://en.wikipedia.org/wiki/GEORGE_WASHINGTON", which came up with "This page has been deleted. The deletion and move log for the page are provided below for reference." when pasted directly into the address bar.

Interesting, to say the least. It doesn't seem like it would be that difficult for a script on their end to catch these as easily as it catches "http://en.wikipedia.org/wiki/GEORGE_washington" and redirect from there.

astupidname

12:06 pm on Mar 2, 2011 (gmt 0)

10+ Year Member



Not for me ... File size: 408854. Can you reproduce that every time?

Yes, every time for me... I'm only testing on a local WAMP server, not on a remote server, so I don't know if that has anything to do with it (doubtful) or not.

@dashrockstone, (you're welcome!).

@coopster, Live HTTP Headers shows me Content-Length: 79787 when I attempt to access george_washington and am 301'd to George_washington.
The script is actually now showing me (Wikipedia must have made some changes to the page; the size is now smaller): "Made some progress, downloaded 710620 so far" as the last message, and "File size: 355310" from the echo '<p>File size: '.strlen($f).'</p>'; line.
So I still don't get it; none of the numbers seem to add up.
Eh, maybe one day I'll upload it to a live server and check it there; I don't really care that much right now.

I don't get why Wikipedia disrespects him with its failure to capitalize his last name. Are they too lazy? (Who am I to speak!) As if they capitalized the first name and said, "Yeah, good enough, I'm too tired now to properly capitalize a president's last name." That extra shift might have broken a pinky. :)