Msg#: 3501248 posted 2:55 am on Nov 10, 2007 (gmt 0)
When I try to read a page from Google with fopen() I get an error like this:
Warning: fopen(http://www.google.com/search?source=ig&hl=en&rlz=&q=whatever&btnG=Google+Search) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /home/foo/bar/functions.php on line 25
I'm working on a project that fetches pages and parses a few elements from them. It's not critically important that I be able to fetch pages from Google specifically; I was just doing some testing and noticed this error from their pages. That got me thinking that I may end up seeing this problem with other websites in the future.
Msg#: 3501248 posted 4:00 pm on Nov 10, 2007 (gmt 0)
Hello Srirangan, welcome to WebmasterWorld! [webmasterworld.com] Your method should work the same way as the fopen() method. The problem identified originally is correct: Google do not encourage automated querying of their search engine.
Just changing the user agent used by PHP will be enough, I believe, but do be aware that repeated or frequent automated access to Google will get your IP blocked.
ini_set('user_agent','Custom Script for example.com');
As I recall, it is only the default PHP user agent which is blocked; you don't need to pretend to be a browser, nor should you do so.
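If you would rather not change the setting for the whole script, the same user agent can be attached to a single request through a stream context. This is only a rough sketch: the helper names build_context() and fetch_page() are made up for illustration, and the 10-second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: build a stream context that carries a custom
// User-Agent header for HTTP requests made through it.
function build_context(string $userAgent, int $timeout = 10)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent instead of PHP's default
            'timeout'    => $timeout,   // seconds; arbitrary choice
        ],
    ]);
}

// Hypothetical helper: fetch a URL with fopen() and return the body,
// or null if the stream could not be opened (for example on a 403).
function fetch_page(string $url, string $userAgent): ?string
{
    $context = build_context($userAgent);

    // "@" suppresses the warning; the failure is reported via the return value.
    $handle = @fopen($url, 'r', false, $context);
    if ($handle === false) {
        return null;
    }

    $body = stream_get_contents($handle);
    fclose($handle);

    return $body === false ? null : $body;
}
```

You would call it along the lines of fetch_page('http://www.example.com/', 'Custom Script for example.com') and treat a null result as "the site refused or the request failed", rather than letting the warning reach the page.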
Do remember to pay attention to Google's robots.txt file, which explicitly bans your URL because its path starts with /search: [google.com...]
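To make the robots.txt point concrete, here is a very simplified sketch of checking a path against Disallow rules before fetching it. The function name is_disallowed() is hypothetical, and the logic is deliberately naive: it only looks at "User-agent: *" groups, does plain prefix matching, and ignores Allow lines and wildcards, so treat it as an illustration rather than a complete parser.

```php
<?php
// Hypothetical helper: returns true if $path starts with a Disallow value
// found in a "User-agent: *" group of the given robots.txt text.
// Simplified on purpose: no Allow handling, no wildcard support.
function is_disallowed(string $robotsTxt, string $path): bool
{
    $applies = false;      // are we inside a group that applies to "*"?
    $lastWasAgent = false; // was the previous rule line a User-agent line?

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }

        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            $applies = ($lastWasAgent && $applies) || $value === '*';
            $lastWasAgent = true;
            continue;
        }

        $lastWasAgent = false;
        if ($field === 'disallow' && $applies && $value !== '') {
            if (strpos($path, $value) === 0) {
                return true;
            }
        }
    }

    return false;
}
```

Given a robots.txt containing "User-agent: *" followed by "Disallow: /search", a path such as /search?q=whatever would be reported as disallowed, which is the situation described above.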
Google do offer a wide range of APIs: [code.google.com...] This is the recommended method for accessing the content and the one which Google allows.