Unable to check headers of some websites: receive 400 Bad Request
iProgram

msg:3337075 | 3:17 pm on May 11, 2007 (gmt 0) | I use the following script to check the headers of some URLs:

    define('HOST_NAME', 'www.example.com');
    $test_url = 'http://' . HOST_NAME;
    $socket = @fsockopen(@gethostbyname(HOST_NAME), 80);
    fwrite($socket, "HEAD $test_url HTTP/1.1\r\nHost: " . HOST_NAME . "\r\nConnection: Close\r\n\r\n");
    $i = 0;
    $header = '';
    while ($i < 20) {
        $s = fgets($socket, 4096);
        $header = $header . $s;
        if (strcmp($s, "\r\n") == 0 || strcmp($s, "\n") == 0) {
            break;
        }
        $i++;
    }
    fclose($socket);
    echo $header;

If I change example.com to <some_other_url.com>, $header contains a 400 Bad Request response. But Server Headers [webmasterworld.com] returns 200. What's wrong with my code?
|
mcavic

msg:3337100 | 3:33 pm on May 11, 2007 (gmt 0) | The code works for me, but try setting $test_url = "/";
|
jdMorgan

msg:3337113 | 3:41 pm on May 11, 2007 (gmt 0) | It looks like the HOST_NAME and "test_url" are incorrectly constructed. "test_url" should include only the local URL-path info. Specifically, the Host: header should contain only "www.example.com", and the test_url should contain only the server-relative path to the page, starting with at least "/". Example:

    HEAD / HTTP/1.1
    Host: www.example.com

If you intend to use this code to access many Web sites, I'd like to ask that you add

    User-agent: iProgramBot http://www.iProgramBot.com

and provide us with a Web page to explain why you're accessing our sites. Otherwise, I regret that on my sites, you'll always get a 403 response, unless I check your Web page and decide to allow your user-agent. I'd also recommend that you read and follow robots.txt if you intend to fetch multiple URLs from other sites; it's the polite thing to do, and saves you getting added to blacklists.

Jim
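
To make the advice above concrete, here is a minimal sketch of how the request could be assembled so that the request line carries only the server-relative path and the Host: header carries only the host name. The helper names split_url() and build_head_request() are hypothetical (they are not in the original script), and the sketch assumes PHP 7.1 or later. It only builds the request string; opening the socket and reading the response would stay as in the original script.

```php
<?php
// Sketch only: split_url() and build_head_request() are hypothetical helper
// names introduced for illustration, not part of the original script.

// Split a full URL into the host (for the Host: header) and the
// server-relative path plus query string (for the request line).
function split_url(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Cannot parse URL: $url");
    }
    // A URL with no path component still needs at least "/".
    $path = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return [$parts['host'], $path];
}

// Build a HEAD request: path on the request line, host in the Host: header,
// and an optional User-Agent line.
function build_head_request(string $host, string $path = '/', string $userAgent = ''): string
{
    $request = "HEAD $path HTTP/1.1\r\n"
             . "Host: $host\r\n";
    if ($userAgent !== '') {
        $request .= "User-Agent: $userAgent\r\n";
    }
    return $request . "Connection: Close\r\n\r\n";
}

// Usage: the user-agent string is the one suggested in this thread.
[$host, $path] = split_url('http://www.example.com');
echo build_head_request($host, $path, 'iProgramBot http://www.iProgramBot.com');
```

The string returned by build_head_request() would be passed to fwrite($socket, ...) in place of the hand-built request in the original script.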
|
iProgram

msg:3337816 | 5:18 am on May 12, 2007 (gmt 0) | No, it's not a bot or anything that fetches site content without permission. (I'm very sensitive about this, please!) It's for a customer who keeps asking why I could not access *his* file and provide the service he bought. And thank you for your help! It now works, and I will add the User-agent line.
|