<?php
// Request the page starting at byte 1024 via an HTTP Range header.
$context = array('http' => array('header' => 'Range: bytes=1024-'));
$xcontext = stream_context_create($context);
$test = file_get_contents("http://babelfish.yahoo.com/_my_site_translation_stuff", false, $xcontext);
echo $test;
?>
However, this doesn't work. Any ideas on how to make this possible?
<?php
// Same Range request, this time against a Babelfish translate_url result page.
$context = array('http' => array('header' => 'Range: bytes=1024-'));
$xcontext = stream_context_create($context);
$test = file_get_contents("http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1&fr=bf-home&trurl=http%3A%2F%2Fgoogle.com&lp=en_fr&btnTrUrl=Translate", false, $xcontext);
echo $test;
?>
Because of the way Babelfish and other translators seem to work, file_get_contents doesn't work here; it just returns a 403 error.
Note that this scraper works for regular pages, just not for a page that has just been translated by another site. That's what I'm trying to do.
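For what it's worth, one common cause of a 403 with file_get_contents is that PHP sends no User-Agent header by default, and some servers reject such requests outright. A minimal sketch of the same call with a browser-like User-Agent added (the UA string itself is just an example):
<?php
// Same Range request as above, plus a browser-like User-Agent header;
// some servers refuse requests that arrive without one.
$context = stream_context_create(array(
    'http' => array(
        'header' => "Range: bytes=1024-\r\n"
                  . "User-Agent: Mozilla/5.0 (compatible; ExampleFetcher/1.0)\r\n",
    ),
));
$test = file_get_contents("http://babelfish.yahoo.com/_my_site_translation_stuff", false, $context);
echo $test;
?>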
However, the 403 error you are getting may be unavoidable, and it may still be there if you switch to curl. For those unaware, 403 means Forbidden [w3.org], and they *probably* have measures in place to stop you from scraping results unless you are actually on the site.
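If you do want to test with cURL, a rough equivalent of the snippet above might look like this; same caveat, the server may still refuse it:
<?php
// A rough cURL equivalent of the file_get_contents attempt above.
// The server may still answer 403 if it actively blocks scrapers.
$url = 'http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1'
     . '&fr=bf-home&trurl=http%3A%2F%2Fgoogle.com&lp=en_fr&btnTrUrl=Translate';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
curl_setopt($ch, CURLOPT_RANGE, '1024-');        // same partial-content request as before
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'); // example UA string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, if any
$body = curl_exec($ch);
if ($body === false) {
    echo 'cURL error: ' . curl_error($ch);
} else {
    echo $body;
}
curl_close($ch);
?>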
Essentially, whether you know it or not, you are stealing content from another site, and we all know where that leads...
If this turns out to be the case, just link to the site instead: a "Translate this page" link and you're done.
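For completeness, building such a link is straightforward; urlencode handles the target URL (the $target value here is just an example):
<?php
// Emit a "Translate this page" link instead of scraping the translated result.
$target = 'http://google.com'; // example page to translate
$href = 'http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1'
      . '&fr=bf-home&lp=en_fr&btnTrUrl=Translate'
      . '&trurl=' . urlencode($target);
echo '<a href="' . htmlspecialchars($href) . '">Translate this page</a>';
?>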