|Unable to check the headers of some websites|
Receive 400 Bad Request
| 3:17 pm on May 11, 2007 (gmt 0)|
I use the following script to check the headers of some URLs:
$test_url = 'http://' . HOST_NAME;
$socket = @fsockopen(@gethostbyname(HOST_NAME), 80);
fwrite($socket, "HEAD $test_url HTTP/1.1\r\nHost: " . HOST_NAME . "\r\nConnection: Close\r\n\r\n");
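$header = '';
// Read the response line by line until the blank line that ends the headers.
while (($s = fgets($socket, 4096)) !== false) {
    $header = $header . $s;
    if (strcmp($s, "\r\n") == 0 || strcmp($s, "\n") == 0) {
        break;
    }
}
fclose($socket);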
If I change example.com to <some_other_url.com>, $header contains "400 Bad Request". But
the Server Headers [webmasterworld.com] tool returns 200. What's wrong with my code?
| 3:33 pm on May 11, 2007 (gmt 0)|
The code works for me, but try setting $test_url = "/";
| 3:41 pm on May 11, 2007 (gmt 0)|
It looks like the HOST_NAME and "test_url" are incorrectly constructed. "test_url" should include only the local URL-path info. Specifically, the Host: header should contain only "www.example.com" and the test_url should contain only the server-relative path to the page, starting with at least "/". Example:
|HEAD / HTTP/1.1 |
If you intend to use this code to access many Web sites, I'd like to ask that you add
User-agent: iProgramBot http://www.iProgramBot.com
and provide us with a Web page to explain why you're accessing our sites. Otherwise, I regret that on my sites, you'll always get a 403 response, unless I check your Web page and decide to allow your user-agent. I'd also recommend that you read and follow robots.txt if you intend to fetch multiple URLs from other sites; it's the polite thing to do, and saves you getting added to blacklists.
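For anyone reading along, here is a rough sketch of the request format described above: only the server-relative path goes in the request line, the bare host name goes in the Host: header, and a User-Agent line is added when one is supplied. The function names build_head_request(), parse_status_code() and fetch_head(), their parameters, and the 10-second timeout are my own assumptions for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: build a HEAD request whose request line carries only
// the server-relative path; the host name belongs in the Host: header.
function build_head_request(string $host, string $path = '/', string $userAgent = ''): string
{
    // The path must start with at least "/".
    if ($path === '' || $path[0] !== '/') {
        $path = '/' . $path;
    }
    $request = "HEAD $path HTTP/1.1\r\n"
             . "Host: $host\r\n";
    if ($userAgent !== '') {
        $request .= "User-Agent: $userAgent\r\n";
    }
    return $request . "Connection: Close\r\n\r\n";
}

// Hypothetical helper: extract the numeric status code from a status line
// such as "HTTP/1.1 200 OK". Returns null if no status line is found.
function parse_status_code(string $header): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: send the request on port 80 and return the raw
// response headers, or null if the connection could not be opened.
function fetch_head(string $host, string $path = '/', string $userAgent = ''): ?string
{
    $socket = @fsockopen($host, 80, $errno, $errstr, 10); // 10 s timeout (assumed)
    if ($socket === false) {
        return null;
    }
    fwrite($socket, build_head_request($host, $path, $userAgent));
    $header = '';
    while (($line = fgets($socket, 4096)) !== false) {
        if ($line === "\r\n" || $line === "\n") {
            break; // a blank line ends the header section
        }
        $header .= $line;
    }
    fclose($socket);
    return $header;
}
```

Calling fetch_head('www.example.com', '/', 'iProgramBot http://www.iProgramBot.com') and passing the result to parse_status_code() should then give the status the server actually returns for that path, instead of the 400 caused by a full URL in the request line.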
| 5:18 am on May 12, 2007 (gmt 0)|
No, it's not a bot or something that gets site content without permission. (I'm very sensitive about this, please!) It's for a customer who keeps asking why I could not access *his* file and offer the service he bought.
And thank you for your help! It now works and I will add the User-agent line.