
Unable to check headers of some websites

Receiving 400 Bad Request



3:17 pm on May 11, 2007 (gmt 0)

10+ Year Member

I use the following script to check the headers of some URLs:

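define('HOST_NAME', 'www.example.com');
$test_url = 'http://' . HOST_NAME; // full URL, used in the request line below

// Open a TCP connection to port 80 (10-second timeout)
$socket = fsockopen(HOST_NAME, 80, $errno, $errstr, 10);
if (!$socket) {
    die("Connection failed: $errstr ($errno)\n");
}
fwrite($socket, "HEAD $test_url HTTP/1.1\r\nHost: " . HOST_NAME . "\r\nConnection: Close\r\n\r\n");

// Read response lines until the blank line that ends the header block
$header = '';
while (($s = fgets($socket, 4096)) !== false) {
    $header .= $s;
    if ($s === "\r\n" || $s === "\n") {
        break;
    }
}
fclose($socket);
echo $header;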

If I change www.example.com to <some_other_url.com>, $header contains "400 Bad Request", but the Server Headers tool [webmasterworld.com] returns 200 for the same site. What's wrong with my code?
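Rather than comparing the whole $header string, it can help to pull out just the numeric status code. A minimal sketch, assuming the header block starts with a standard status line such as "HTTP/1.1 400 Bad Request"; `parse_status_code()` is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: extract the numeric status code from a raw
// HTTP response header block. Returns null if no status line is found.
function parse_status_code(string $header): ?int
{
    // Only the first line (the status line) carries the code
    $firstLine = rtrim(explode("\n", $header, 2)[0]);
    // e.g. "HTTP/1.1 400 Bad Request" -> 400
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $firstLine, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

With the script above, `parse_status_code($header)` would give 400 or 200 directly, which is easier to compare than the full text.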


3:33 pm on May 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

The code works for me, but try setting $test_url = "/";
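Setting $test_url to "/" works for the home page. If the script has to handle full URLs, one way to derive the Host value and the server-relative path is PHP's parse_url(). A minimal sketch; `split_url()` is a hypothetical helper, and it assumes plain http:// URLs on port 80, ignoring any port, user info or fragment:

```php
<?php
// Hypothetical helper: split a full URL into the host (for the Host: header)
// and the server-relative path (for the request line).
function split_url(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }
    // A missing or empty path means the site root, which is sent as "/"
    $path = $parts['path'] ?? '/';
    if ($path === '') {
        $path = '/';
    }
    // Keep the query string, since it is part of the request target
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return [$parts['host'], $path];
}
```

For example, `[$host, $path] = split_url('http://www.example.com');` gives "www.example.com" and "/".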


3:41 pm on May 11, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

It looks like HOST_NAME and $test_url are incorrectly constructed: $test_url should include only the local URL-path. Specifically, the Host: header should contain only "www.example.com", and $test_url should contain only the server-relative path to the page, starting with at least "/". Example:

HEAD / HTTP/1.1
Host: www.example.com
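The same request can be assembled by a small function so the path and host never get mixed up. A minimal sketch; `build_head_request()` is a hypothetical helper, and the optional User-Agent parameter is an assumption added for illustration:

```php
<?php
// Hypothetical helper: build a HEAD request whose request line carries only
// the server-relative path and whose Host: header carries only the host name.
function build_head_request(string $host, string $path = '/', ?string $userAgent = null): string
{
    $lines = [
        "HEAD $path HTTP/1.1",
        "Host: $host",
    ];
    if ($userAgent !== null) {
        // Optional line identifying the client to the server
        $lines[] = "User-Agent: $userAgent";
    }
    $lines[] = 'Connection: close';
    // Each header line ends in CRLF, and an empty line ends the header block
    return implode("\r\n", $lines) . "\r\n\r\n";
}
```

In the original script, the fwrite() call could then send `build_head_request(HOST_NAME, '/')`.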

If you intend to use this code to access many Web sites, I'd like to ask that you add

User-agent: iProgramBot http://www.iProgramBot.com

and provide us with a Web page explaining why you're accessing our sites. Otherwise, I regret that on my sites you'll always get a 403 response, unless I check your Web page and decide to allow your user-agent. I'd also recommend that you read and follow robots.txt if you intend to fetch multiple URLs from other sites; it's the polite thing to do, and it saves you from being added to blacklists.
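Following robots.txt means fetching /robots.txt from the site and checking each path before requesting it. A deliberately simplified sketch, assuming only User-agent and Disallow lines matter, that Disallow values are plain path prefixes, and that a group applies when its User-agent token appears (case-insensitively) in our user-agent string. It ignores Allow, wildcards and other extensions, so it is not a complete robots.txt parser; `is_path_allowed()` is a hypothetical name:

```php
<?php
// Hypothetical, simplified robots.txt check (User-agent + Disallow only).
function is_path_allowed(string $robotsTxt, string $agent, string $path): bool
{
    $groups = [];        // each group: ['agents' => [...], 'disallow' => [...]]
    $current = null;
    $lastWasAgent = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            // Consecutive User-agent lines share one group of rules
            if (!$lastWasAgent) {
                $groups[] = ['agents' => [], 'disallow' => []];
                $current = count($groups) - 1;
            }
            $groups[$current]['agents'][] = strtolower($value);
            $lastWasAgent = true;
        } elseif ($field === 'disallow' && $current !== null) {
            $groups[$current]['disallow'][] = $value;
            $lastWasAgent = false;
        } else {
            $lastWasAgent = false;
        }
    }

    $agent = strtolower($agent);
    $matched = false;
    $specific = [];
    $wildcard = [];
    foreach ($groups as $g) {
        foreach ($g['agents'] as $a) {
            if ($a === '*') {
                $wildcard = array_merge($wildcard, $g['disallow']);
            } elseif ($a !== '' && strpos($agent, $a) !== false) {
                $matched = true;
                $specific = array_merge($specific, $g['disallow']);
            }
        }
    }
    // A group naming this agent overrides the catch-all "*" group
    $rules = $matched ? $specific : $wildcard;
    foreach ($rules as $rule) {
        // An empty Disallow value means nothing is disallowed
        if ($rule !== '' && strpos($path, $rule) === 0) {
            return false;
        }
    }
    return true;
}
```

A crawler would call this once per path with the robots.txt text it fetched earlier, and skip the request when it returns false.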



5:18 am on May 12, 2007 (gmt 0)

10+ Year Member

No, it's not a bot or anything that fetches site content without permission. (I'm very sensitive about this, please!) It's for a customer who keeps asking why I can't access *his* file and provide the service he bought.

Thank you for your help! It works now, and I will add the User-agent line.

