Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
Reading a Google web page in PHP
They seem to be blocking fopen()
SixTimesEight




msg:3501250
 2:55 am on Nov 10, 2007 (gmt 0)

When I try to read a page from Google with fopen() I get an error like this:

Warning: fopen(http://www.google.com/search?source=ig&hl=en&rlz=&q=whatever&btnG=Google+Search) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /home/foo/bar/functions.php on line 25

I'm working on a project that fetches pages and parses a few elements from them. It's not critically important that I be able to fetch pages from Google specifically; I was just doing some testing and noticed this error from their pages. That got me thinking that I may end up seeing this problem from other websites in the future.

Here's my code:

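$fp = fopen($url, 'r');
if ($fp === false) {
    exit("Could not open $url\n");
}
$contents = '';
// Read the whole response in 8 KB chunks, not just the first chunk.
while (!feof($fp)) {
    $contents .= fread($fp, 8192);
}
// some parsing happens here...
fclose($fp);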

Is there another method I could use to read pages from Google or am I just doing something wrong?

 

jatar_k




msg:3501263
 3:38 am on Nov 10, 2007 (gmt 0)

Try using cURL, and set the request headers so it looks like a regular user.

[php.net...]

Depending on what you are doing, watch the number of requests you send over time.
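A rough sketch of the cURL approach with a pause between requests might look like the following. The function names, the user agent string, and the two-second delay are all hypothetical choices for illustration; it assumes the cURL extension is installed.

```php
<?php
// Hypothetical helper: cURL options for a simple GET request.
// The user agent is whatever string you pass in; pick one that fits your script.
function build_curl_options(string $userAgent, int $timeoutSeconds = 10): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ];
}

// Fetches $url; returns the body, or null on a transport error or an HTTP status >= 400.
function fetch_page(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_curl_options($userAgent));
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}

// Fetches several URLs, sleeping between requests to keep the request rate low.
function fetch_all(array $urls, string $userAgent, int $delaySeconds = 2): array
{
    $pages = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first) {
            sleep($delaySeconds);
        }
        $first = false;
        $pages[$url] = fetch_page($url, $userAgent);
    }
    return $pages;
}
```

Calling fetch_all(array('http://www.example.com/'), 'ExampleFetcher/1.0') would return an array keyed by URL, with null for any page that could not be fetched.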

Srirangan




msg:3501524
 3:46 pm on Nov 10, 2007 (gmt 0)

Try this:

<?php
// An example, get a web page into a string. See also file_get_contents().
$html = implode('', file('http://www.example.com/'));
?>

Also, RTFM! :)

vincevincevince




msg:3501530
 4:00 pm on Nov 10, 2007 (gmt 0)

Hello Srirangan, welcome to WebmasterWorld! [webmasterworld.com] Your method goes through the same HTTP wrapper as fopen(), so it should be refused in the same way. The problem identified originally is correct: Google do not encourage automated querying of their search engine.

Just changing the user agent used by PHP should be enough, I believe, but do be aware that repeated or frequent automated access to Google will get your IP blocked.

ini_set('user_agent','Custom Script for example.com');

As I recall, it is only the default PHP user agent which is blocked; you don't need to pretend to be a browser, nor should you do so.
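If you would rather not change the user agent for the whole script with ini_set(), a stream context can set it for a single request. This is only a sketch: the function names and the 10-second timeout are made up for the example.

```php
<?php
// Hypothetical helper: HTTP stream context options with a custom User-Agent.
function build_http_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10,  // seconds; an arbitrary value for this sketch
        ],
    ];
}

// Fetches $url using the context above. A failed request (including a 403)
// emits a warning and yields null here.
function fetch_with_context(string $url, string $userAgent): ?string
{
    $context = stream_context_create(build_http_context_options($userAgent));
    $body = file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

The same context can also be passed as the fourth argument to fopen() if you prefer to keep reading the stream in chunks.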

Do remember to pay attention to Google's robots.txt file, which explicitly disallows your URL because its path starts with /search:
[google.com...]
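To give an idea of how a script could check a path against robots.txt before fetching, here is a deliberately simplified sketch. The function name is hypothetical, and it only understands plain-prefix Disallow rules under "User-agent: *"; Allow lines, wildcards, and groups that list several user agents are not handled, so treat it as an illustration rather than a complete parser.

```php
<?php
// Simplified check: is $path disallowed for all user agents ("User-agent: *")?
// Only plain-prefix Disallow rules are handled; Allow lines and wildcards are ignored.
function is_disallowed(string $robotsTxt, string $path): bool
{
    $inWildcardGroup = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Track whether the rules that follow apply to every user agent.
            $inWildcardGroup = ($value === '*');
        } elseif ($field === 'disallow' && $inWildcardGroup && $value !== '') {
            // A rule matches when the path starts with the Disallow value.
            if (strpos($path, $value) === 0) {
                return true;
            }
        }
    }
    return false;
}
```

You would download the site's robots.txt once, keep the text, and call is_disallowed($robotsTxt, '/search?q=whatever') before requesting each URL.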

Google do offer a wide range of APIs:
[code.google.com...]
This is the recommended method for accessing the content and the one which Google allows.

SixTimesEight




msg:3501541
 4:11 pm on Nov 10, 2007 (gmt 0)

As I recall, it is only the default PHP user agent which is blocked...

This does indeed appear to be the case. I changed the user agent and my script is now working as expected.

Srirangan




msg:3501551
 4:19 pm on Nov 10, 2007 (gmt 0)

Woops.. sorry missed that vince.. :o)

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved