What HTTP USER AGENT does 'fopen' or 'fsockopen' produce

I want a script on one server to grab files on another server (mine)

tigertom

10:52 pm on Jul 7, 2007 (gmt 0)


What HTTP_USER_AGENT (if any) does a PHP script grabbing a file on a remote server using 'fopen' or 'fsockopen' produce?

Reason: I've banned bad bots. Now I want to 'scrape' my own pages on a server of mine, but I get a 403 error (lol).

I need to know what to look for. I could set up a special file/script to find this out, but if someone knows offhand ...
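A minimal sketch of such a "special script", assuming PHP on the server being fetched. The file name ua_echo.php and the function reportedUserAgent() are hypothetical. Request the file from the fetching script and it prints whatever User-Agent header actually arrived:

```php
<?php
// ua_echo.php (hypothetical name): put it on the server being fetched
// and request it from your fetching script. It prints the User-Agent
// header the request arrived with, or a note when none was sent.

function reportedUserAgent(array $server): ?string
{
    // PHP exposes the User-Agent request header as HTTP_USER_AGENT;
    // the key is absent when the client sent no such header.
    return $server['HTTP_USER_AGENT'] ?? null;
}

$ua = reportedUserAgent($_SERVER);
header('Content-Type: text/plain');
echo $ua === null ? "(no User-Agent header sent)\n" : $ua . "\n";
```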

phparion

5:31 am on Jul 9, 2007 (gmt 0)


What is HTTP_USER_AGENT? It is used to retrieve information about the user's browser and operating system, so you will always get your own browser and OS values there, as you are the user in this case.

Now, there are two cases:

1) You do not own the other server; in fact, you are writing a bot for someone else's website and they are blocking you based on your user agent, etc. In this case it is not easy to get past their checks, as they may also be tracking other information, e.g. your static IP, but there are workarounds that I can't tell you about :)

2) You own that server and it is your own website. Then it is a five-minute solution: instead of scraping directly, do it in a more sophisticated way. Since you own the server, you have database access on it, so write a script there that inserts values directly into the database. Then, from server B (the one doing the scraping), call that script directly and send it values, an XML packet or an RSS feed, and it will do the job.

If you own that server, you should not waste its resources on ordinary scraping. Do it directly with a script and a database interface; that will be many times faster and more efficient than scraping.
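A minimal sketch of the receiving script from case 2, assuming server B POSTs an XML packet made of <item> elements. The file name receive.php, the pages table and its columns, the XML layout, and the MySQL DSN and credentials are all hypothetical placeholders:

```php
<?php
// receive.php (hypothetical): runs on the server that owns the database.
// Expects a POSTed XML packet like:
//   <items><item><title>...</title><body>...</body></item></items>
// The "pages" table and its columns are assumptions for illustration.

function storeItems(PDO $db, string $xml): int
{
    libxml_use_internal_errors(true); // signal bad XML via the return value, not warnings
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new InvalidArgumentException('Malformed XML packet');
    }
    // Prepared statement: values from the packet are never spliced into SQL.
    $stmt = $db->prepare('INSERT INTO pages (title, body) VALUES (?, ?)');
    $count = 0;
    foreach ($doc->item as $item) {
        $stmt->execute([(string) $item->title, (string) $item->body]);
        $count++;
    }
    return $count;
}

if (PHP_SAPI !== 'cli') {
    // Placeholder DSN and credentials.
    $db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'secret');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $n = storeItems($db, file_get_contents('php://input'));
    header('Content-Type: text/plain');
    echo "Stored $n item(s)\n";
}
```

Because this script writes to your database, in practice it should also check that the request really comes from your own server (for example with a shared secret); that part is left out of the sketch.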

RonPK

8:49 am on Jul 9, 2007 (gmt 0)


phparion, there is no direct connection between the user agent (browser) that calls the script using fopen/fsockopen and the request that script makes for a page on the other server.

I ran a quick test using a phpinfo script on my remote server and the fopen and fsockopen example scripts from the PHP manual on my local Apache/Windows server:

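<?php
// Open a raw TCP connection to the web server (30-second timeout).
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    // Hand-built HTTP request: note that no User-Agent header is included.
    $out = "GET /phpinfo.php HTTP/1.1\r\n";
    $out .= "Host: www.example.com\r\n";
    $out .= "Connection: Close\r\n\r\n"; // the empty line ends the headers
    fwrite($fp, $out);
    // Print the raw response (status line, headers and body).
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}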

No user agent string is reported (or sent).

Now add a line like this to the request, before the Connection line (the empty line after it ends the headers):

$out .= "User-Agent: myBrowser 0.1\r\n";

phpinfo will then report myBrowser 0.1 as $_SERVER['HTTP_USER_AGENT'].
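With fsockopen you write the headers yourself. With fopen() the request is built by PHP's http:// stream wrapper, which sends a User-Agent only when one is configured (the user_agent setting in php.ini, or the user_agent option of a stream context), consistent with the test above. A minimal sketch, assuming allow_url_fopen is enabled; the helper names makeUaContext() and fetchWithUa() are hypothetical:

```php
<?php
// Sketch: set the User-Agent that fopen() sends through PHP's http:// wrapper.
// "myBrowser 0.1" is the example string from the thread; the URL comes
// from the command line (hypothetical usage: php fetch.php http://...).

function makeUaContext(string $userAgent)
{
    // The "user_agent" option of the http context becomes the
    // User-Agent request header.
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 30,
        ],
    ]);
}

function fetchWithUa(string $url, string $userAgent)
{
    $fp = @fopen($url, 'r', false, makeUaContext($userAgent));
    if ($fp === false) {
        // Connection failed, the server answered with an HTTP error
        // such as 403, or allow_url_fopen is off.
        return false;
    }
    $body = stream_get_contents($fp);
    fclose($fp);
    return $body;
}

// Alternative that applies to every http:// stream in the script:
// ini_set('user_agent', 'myBrowser 0.1');

if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $body = fetchWithUa($argv[1], 'myBrowser 0.1');
    echo $body === false ? "Request failed\n" : $body;
}
```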

phparion

12:40 pm on Jul 9, 2007 (gmt 0)


I already know this; in fact, this is what we do with php-cURL too: we specify the user agent in the remote call. And that was actually my point: he will always get whatever he specifies in his call to the remote site.
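For comparison, a minimal php-cURL sketch, assuming the curl extension is installed. As with fsockopen, libcurl sends no User-Agent header unless you set one, here with CURLOPT_USERAGENT. The helper name makeCurlHandle() is hypothetical, and "myBrowser 0.1" is the thread's example string:

```php
<?php
// Sketch: php-cURL request with an explicit User-Agent.
// Needs the curl extension; the URL comes from the command line.

function makeCurlHandle(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERAGENT      => $userAgent, // sent as the User-Agent header
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_TIMEOUT        => 30,         // give up after 30 seconds
    ]);
    return $ch;
}

if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $ch = makeCurlHandle($argv[1], 'myBrowser 0.1');
    $body = curl_exec($ch);
    echo $body === false ? 'cURL error: ' . curl_error($ch) . "\n" : $body;
}
```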

jatar_k

1:17 pm on Jul 9, 2007 (gmt 0)


Your best bet, tigertom, is to get your own real user agent string via a small PHP script and use that in your calls. Another option is to set up a pass-through for yourself in your bad bot blocker.

Use something unique and just let it through when it is found; that also makes it easier to track in your logs.
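A minimal sketch of the pass-through idea, assuming a PHP-based blocker that matches User-Agent substrings. The token, the bad-bot patterns and isBlocked() are hypothetical. It also assumes the blocker rejects requests with an empty User-Agent, which is one possible explanation for a 403 on a bare fsockopen request:

```php
<?php
// Hypothetical bad-bot check with a pass-through for your own scripts.
// Your fetcher sends a User-Agent containing MY_FETCHER_TOKEN.
const MY_FETCHER_TOKEN = 'tigertom-fetcher-7f3a'; // pick your own unique string

function isBlocked(string $userAgent, array $badBotPatterns): bool
{
    // Pass-through first, so your own requests are never caught below.
    if (strpos($userAgent, MY_FETCHER_TOKEN) !== false) {
        return false;
    }
    // Assumption: this blocker also rejects requests with no User-Agent.
    if ($userAgent === '') {
        return true;
    }
    // Case-insensitive substring match against known bad bots.
    foreach ($badBotPatterns as $pattern) {
        if (stripos($userAgent, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

$badBots = ['EmailSiphon', 'WebCopier']; // example patterns only
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (PHP_SAPI !== 'cli' && isBlocked($ua, $badBots)) {
    http_response_code(403);
    exit;
}
```

Your fetching script would then send something like "User-Agent: myBrowser 0.1 tigertom-fetcher-7f3a", and because the token is unique it is easy to search for in the access logs.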