PHP Server Side Scripting Forum

Reading a Google web page in PHP
They seem to be blocking fopen()

 2:55 am on Nov 10, 2007 (gmt 0)

When I try to read a page from Google with fopen() I get an error like this:

Warning: fopen(http://www.google.com/search?source=ig&hl=en&rlz=&q=whatever&btnG=Google+Search) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /home/foo/bar/functions.php on line 25

I'm working on a project that fetches pages and parses a few elements from them. It's not critically important that I be able to fetch pages from Google specifically; I was just doing some testing and noticed this error from their pages. That got me thinking that I may end up seeing this problem with other websites in the future.

Here's my code:

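$fp = fopen($url, 'r');
if ($fp === false) die("Could not open $url");
$contents = '';
// fread() returns at most 8192 bytes per call, so read until end of stream
while (!feof($fp)) $contents .= fread($fp, 8192);
fclose($fp);
// some parsing happens here...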

Is there another method I could use to read pages from Google or am I just doing something wrong?
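
A refusal like the 403 above can also be detected in code instead of being read off a warning. The following is only a sketch: `http_status_from_headers()` and `fetch_with_status()` are hypothetical helper names, and it assumes the HTTP stream wrapper is enabled (`allow_url_fopen`). With the `ignore_errors` context option, fopen() hands back a stream even for error responses, and the response headers are available through stream_get_meta_data().

```php
<?php
// Hypothetical helper: pull the status code out of raw HTTP response header lines.
function http_status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Redirects produce several status lines; keep the last one seen.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical wrapper around fopen(): returns [status code or null, body or null].
function fetch_with_status(string $url): array
{
    // ignore_errors makes the wrapper return the body even for 4xx/5xx responses.
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $fp = fopen($url, 'r', false, $context);
    if ($fp === false) {
        return [null, null]; // connection-level failure (DNS, refused, ...)
    }
    $meta = stream_get_meta_data($fp);
    $body = stream_get_contents($fp);
    fclose($fp);
    // For http:// streams, wrapper_data holds the response header lines.
    return [http_status_from_headers($meta['wrapper_data'] ?? []), $body];
}
```

A caller could then check `if ($status === 403)` and skip or log that site instead of parsing an empty string.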



 3:38 am on Nov 10, 2007 (gmt 0)

Try using cURL, and make the request look like it comes from a regular user.

Depending on what you are doing, watch the number of requests you send over time.
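
As a rough sketch of both suggestions: the code below fetches a page with cURL using an explicit User-Agent and spaces requests out. The helper names (`seconds_to_wait()`, `curl_fetch()`, `throttled_fetch()`), the five-second interval, and the User-Agent string passed in are all assumptions for illustration, and the cURL extension must be installed.

```php
<?php
// Hypothetical helper: how many seconds must still pass before the next request.
function seconds_to_wait(?float $lastRequestAt, float $now, float $minInterval): float
{
    if ($lastRequestAt === null) {
        return 0.0; // nothing sent yet, no need to wait
    }
    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// Hypothetical helper: fetch a URL with cURL; returns the body, or null on failure.
function curl_fetch(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_USERAGENT      => $userAgent, // replaces the default (empty) agent
        CURLOPT_TIMEOUT        => 15,         // seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: sleep if needed so requests are at least $minInterval apart.
function throttled_fetch(string $url, string $userAgent, ?float &$lastRequestAt, float $minInterval = 5.0): ?string
{
    $wait = seconds_to_wait($lastRequestAt, microtime(true), $minInterval);
    if ($wait > 0) {
        usleep((int) round($wait * 1000000));
    }
    $lastRequestAt = microtime(true);
    return curl_fetch($url, $userAgent);
}
```

Keeping `$lastRequestAt` in a variable only throttles a single script run; a cron job or multi-process crawler would need to store it somewhere shared.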


 3:46 pm on Nov 10, 2007 (gmt 0)

Try this:

// An example, get a web page into a string. See also file_get_contents().
$html = implode('', file('http://www.example.com/'));

Also, RTFM! :)


 4:00 pm on Nov 10, 2007 (gmt 0)

Hello Srirangan, welcome to WebmasterWorld! Your method should behave the same way as the fopen() method, since file() goes through the same HTTP stream wrapper. The problem identified originally is correct: Google do not encourage automated querying of their search engine.

Just changing the user agent used by PHP should be enough, I believe, but do be aware that repeated or frequent automated access to Google will mean your IP gets blocked.

ini_set('user_agent','Custom Script for example.com');

As I recall, it is only the default PHP user agent which is blocked; you don't need to pretend to be a browser, nor should you.
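
If changing the setting globally with ini_set() is not wanted, the same user agent can be supplied per request through a stream context. This is a minimal sketch; `user_agent_context()` is a hypothetical helper name and the 15-second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: build a stream context that sends a custom User-Agent.
function user_agent_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // takes precedence over the user_agent ini setting
            'timeout'    => 15,         // seconds; arbitrary value for this sketch
        ],
    ]);
}

$context = user_agent_context('Custom Script for example.com');

// The context can be passed to fopen(), file() or file_get_contents(), e.g.:
// $html = file_get_contents('http://www.example.com/', false, $context);
```

This keeps the identifying string next to the code that makes the request, which may be easier to follow in a larger script.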

Do remember to pay attention to Google's robots.txt file, which explicitly disallows your URL because it starts with /search.
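
A script can make this check itself before fetching. The sketch below is deliberately simplified and is not a full robots.txt parser: `is_disallowed()` is a hypothetical helper that only understands `User-agent` and `Disallow` lines, uses plain prefix matching, and ignores `Allow` rules and wildcards. The sample rules in the usage lines are an assumed excerpt, not a copy of Google's actual file.

```php
<?php
// Hypothetical, simplified check: is $path disallowed for $agent by $robotsTxt?
// Limitations: no Allow rules, no wildcards, and a group naming the agent does not
// override the "*" group as it would in a complete implementation.
function is_disallowed(string $robotsTxt, string $path, string $agent = '*'): bool
{
    $applies = false;      // does the current group apply to $agent?
    $inAgentLines = false; // was the previous rule line a User-agent line?
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            if (!$inAgentLines) {
                $applies = false; // a new group starts here
            }
            $inAgentLines = true;
            if ($value === '*' || strcasecmp($value, $agent) === 0) {
                $applies = true;
            }
        } else {
            $inAgentLines = false;
            if ($field === 'disallow' && $applies && $value !== '' && strpos($path, $value) === 0) {
                return true;
            }
        }
    }
    return false;
}

// Usage with an assumed excerpt of rules:
$robots = "User-agent: *\nDisallow: /search\n";
var_dump(is_disallowed($robots, '/search?q=whatever')); // bool(true)
```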

Google do offer a wide range of APIs. These are the recommended way to access the content, and the one which Google allows.


 4:11 pm on Nov 10, 2007 (gmt 0)

"As I recall, it is only the default PHP user agent which is blocked..."

This does indeed appear to be the case. I changed the user agent and my script is now working as expected.


 4:19 pm on Nov 10, 2007 (gmt 0)

Woops.. sorry missed that vince.. :o)
