Forum Moderators: coopster
Reason: I've banned bad bots. Now I want to 'scrape' my own pages on a server of mine, but I get a 403 error (lol).
I need to know what to look for. I could set up a special file/script to find this out, but if someone knows offhand ...
Now there are two cases:
1) You do not own the other server: you are writing a bot for someone else's website and they are blocking you based on your user agent etc. In this case it is not easy to get past their checks, since they may have tracked other details about you, e.g. a static IP. There are workarounds, but I can't tell you about them :)
2) You own the server and it is your own website: then it is a five-minute solution. Instead of scraping directly, do it in a more sophisticated way. Since you own the server, you must have database access on it, so write a script there that inserts values directly into the database. Then, from your server B (the one that would otherwise do the scraping), call that script directly and send it the values, an XML packet, or an RSS feed, and it will do the job.
If you own the server, you should not waste its resources on typical scraping. Doing it directly with a script and a DB interface will be many times faster and more efficient.
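For what it's worth, the receiving script described above might look roughly like the sketch below. Everything named here is my own assumption for illustration: the function store_page, the table "pages", and the fields "url" and "content" do not come from the thread, and the in-memory SQLite database only stands in for whatever database the site really uses (it needs the pdo_sqlite extension).

```php
<?php
// Hypothetical receiving script on the server that owns the database.
// The table "pages" and the fields "url"/"content" are illustrative only.

function store_page(PDO $db, array $input): bool
{
    // Reject requests that lack the expected fields.
    if (empty($input['url']) || !isset($input['content'])) {
        return false;
    }
    // A prepared statement keeps the posted values out of the SQL text.
    $stmt = $db->prepare('INSERT INTO pages (url, content) VALUES (?, ?)');
    return $stmt->execute([$input['url'], $input['content']]);
}

// Demo against an in-memory SQLite database (assumption: pdo_sqlite is enabled).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE pages (url TEXT, content TEXT)');

// In a real script, $_POST sent from server B would take the place of this array.
$ok = store_page($db, ['url' => '/about.html', 'content' => '<p>About us</p>']);
$count = (int) $db->query('SELECT COUNT(*) FROM pages')->fetchColumn();
```

Server B would then POST its values to this script instead of fetching and parsing rendered pages. You would also want some kind of access check in front of it (a shared secret, an IP allow-list, or similar) so that only your own server can call it.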
I ran a quick test using a phpinfo-script on my remote server and scripts from the PHP manual with fopen and fsockopen on my local Apache/Windows server.
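// Open a TCP connection to the web server (port 80, 30 second timeout).
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    // Build the raw request; the final "\r\n\r\n" ends the header block.
    $out = "GET /phpinfo.php HTTP/1.1\r\n";
    $out .= "Host: www.example.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // Echo the response (headers and body) until the server closes the connection.
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}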
No user agent string is reported (or sent).
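Now add a line like the following before the "Connection: Close" line, so that it lands ahead of the blank line that ends the headers:
$out .= "User-agent: myBrowser 0.1\r\n";
phpinfo will then report myBrowser 0.1 as $_SERVER['HTTP_USER_AGENT'].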
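To make the header placement explicit, here is a minimal sketch that builds the same request with the user agent included. The helper name build_request is made up for this example; the host, path, and agent string are the ones used in the test above.

```php
<?php
// Build a raw HTTP/1.1 GET request. The User-Agent header must come
// before the blank line ("\r\n\r\n") that ends the header block.
function build_request(string $host, string $path, string $userAgent): string
{
    $out  = "GET $path HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "User-Agent: $userAgent\r\n";
    $out .= "Connection: Close\r\n\r\n";
    return $out;
}

$request = build_request('www.example.com', '/phpinfo.php', 'myBrowser 0.1');
```

Passing $request to fwrite($fp, ...) in place of $out in the fsockopen script sends it. If the bot ban on your server keys on the user agent, a distinctive string like this is what you would let through.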