
PHP Server Side Scripting Forum

    
Unable to check the header of some websites
Receiving 400 Bad Request
iProgram
msg:3337075
3:17 pm on May 11, 2007 (gmt 0)

I use the following script to check the headers of some URLs:

define('HOST_NAME', 'www.example.com');
$test_url = 'http://' . HOST_NAME;

$socket = @fsockopen(@gethostbyname(HOST_NAME), 80);
fwrite($socket, "HEAD $test_url HTTP/1.1\r\nHost: " . HOST_NAME . "\r\nConnection: Close\r\n\r\n");

$i = 0;
$header = '';
while ($i < 20)
{
    $s = fgets($socket, 4096);
    $header = $header . $s;
    if (strcmp($s, "\r\n") == 0 || strcmp($s, "\n") == 0)
    {
        break;
    }
    $i++;
}
fclose($socket);
echo $header;

If I change example.com to <some_other_url.com>, $header comes back as 400 Bad Request. But the Server Headers tool [webmasterworld.com] returns 200. What's wrong with my code?

 

mcavic
msg:3337100
3:33 pm on May 11, 2007 (gmt 0)

The code works for me, but try setting $test_url = "/";

jdMorgan
msg:3337113
3:41 pm on May 11, 2007 (gmt 0)

It looks like the HOST_NAME and "test_url" are incorrectly constructed. "test_url" should include only the local URL-path info. Specifically, the Host: header should contain only "www.example.com" and the test_url should contain only the server-relative path to the page, starting with at least "/". Example:

HEAD / HTTP/1.1
Host: www.example.com
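
For reference, here's a minimal corrected sketch of the script with that change applied: the request line carries only the URL-path and the Host: header carries only the hostname. The $request_path variable and the timeout/error handling are illustrative additions, not from the original post:

define('HOST_NAME', 'www.example.com');
$request_path = '/'; // server-relative path only, not the full URL

// Connect by hostname and surface connection errors instead of suppressing them
$socket = fsockopen(HOST_NAME, 80, $errno, $errstr, 10);
if (!$socket)
{
    die("Connection failed: $errstr ($errno)");
}

fwrite($socket,
    "HEAD $request_path HTTP/1.1\r\n" .
    "Host: " . HOST_NAME . "\r\n" .
    "Connection: Close\r\n\r\n");

$header = '';
while (!feof($socket))
{
    $s = fgets($socket, 4096);
    $header .= $s;
    // a blank line marks the end of the response headers
    if ($s == "\r\n" || $s == "\n")
    {
        break;
    }
}
fclose($socket);
echo $header;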

If you intend to use this code to access many Web sites, I'd like to ask that you add

User-agent: iProgramBot http://www.iProgramBot.com

and provide us with a Web page to explain why you're accessing our sites. Otherwise, I regret that on my sites you'll always get a 403 response, unless I check your Web page and decide to allow your user-agent. I'd also recommend that you read and follow robots.txt if you intend to fetch multiple URLs from other sites; it's the polite thing to do, and it saves you from getting added to blacklists.
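
To sketch what that might look like in the same style (the bot name and the crude_robots_allows() helper here are hypothetical, and a real crawler should honor per-user-agent rules rather than only a blanket Disallow):

define('BOT_UA', 'iProgramBot (+http://www.iProgramBot.com)');

// Very rough check: skip a host whose robots.txt disallows everything.
// Requires allow_url_fopen; a real crawler needs a proper robots.txt parser.
function crude_robots_allows($host)
{
    $robots = @file_get_contents("http://$host/robots.txt");
    if ($robots === false)
    {
        return true; // no robots.txt reachable; assume allowed
    }
    return !preg_match('/^Disallow:\s*\/\s*$/mi', $robots);
}

if (crude_robots_allows('www.example.com'))
{
    $socket = fsockopen('www.example.com', 80, $errno, $errstr, 10);
    if ($socket)
    {
        fwrite($socket,
            "HEAD / HTTP/1.1\r\n" .
            "Host: www.example.com\r\n" .
            "User-Agent: " . BOT_UA . "\r\n" .
            "Connection: Close\r\n\r\n");
        // ... read the response headers as in the sketch above ...
        fclose($socket);
    }
}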

Jim

iProgram
msg:3337816
5:18 am on May 12, 2007 (gmt 0)

No, it's not a bot or anything that grabs site content without permission. (I'm very sensitive about this, please!) It's for a customer who keeps asking why I couldn't access *his* file and deliver the service he bought.

And thank you for your help! It now works and I will add the User-agent line.
