
PHP Server Side Scripting Forum

Grabbing 2 parts of a remote webpage

Msg#: 4577734 posted 2:32 pm on May 25, 2013 (gmt 0)

I created a Google CSE and did not like that it was limited to 100 results. So now I have figured out how to search multiple sites on Google directly, and I want to "rip" the results and the link to the next set of results from the search page. I have worked out which elements to rip: #pnnext and #rso.

So how do I do this?



WebmasterWorld Senior Member 5+ Year Member

Msg#: 4577734 posted 11:08 pm on May 26, 2013 (gmt 0)

You could use cURL to get the raw content, and then use a fairly simple regular expression to split out just the #pnnext and #rso elements.

Something like this should do you. Please bear in mind that this has been typed on the fly, and if I were sensible I would have been in bed a while ago, so I offer no guarantee that this will work out of the box.


// Get the contents of your Google search. Define $url as wherever you're querying
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => true, // return the page as a string instead of printing it
));
$page_contents = curl_exec($ch);

// Get the search results out of the ol#rso
// (the s modifier lets . match newlines; the closing </ol> needs its slash escaped)
preg_match('/<ol[^>]+id="rso"[^>]*>((?:<li.*?<\/li>)+)<\/ol>/ms', $page_contents, $results);
// echo $results[1]; // Should give you the content of #rso

// Get the URL from the a#pnnext (assumes the id attribute comes before href)
preg_match('/<a[^>]+id="pnnext"[^>]*href="([^"]+)"/', $page_contents, $next_url);
// echo $next_url[1]; // Should be the next page link
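
If the regular expressions turn out to be too brittle (attribute order, whitespace between the list items, nested lists), a different technique is to let PHP's DOM extension parse the page and look the two elements up by id. This is only a rough sketch under a few assumptions: extract_results() is a made-up helper name, and $sample is invented markup standing in for $page_contents, not real Google output.

```php
<?php
// Hypothetical helper: pulls the inner HTML of #rso and the href of #pnnext
// out of an HTML string using DOMDocument/DOMXPath instead of regex.
function extract_results($html)
{
    // Real-world pages are rarely valid HTML; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Concatenate the markup of every child node of #rso.
    $results = '';
    $rso = $xpath->query('//*[@id="rso"]')->item(0);
    if ($rso !== null) {
        foreach ($rso->childNodes as $child) {
            $results .= $doc->saveHTML($child);
        }
    }

    // getAttribute() returns the href with entities such as &amp; already decoded.
    $next = $xpath->query('//a[@id="pnnext"]')->item(0);
    $next_url = ($next !== null) ? $next->getAttribute('href') : null;

    return array('results' => $results, 'next_url' => $next_url);
}

// Invented sample markup standing in for $page_contents.
$sample = '<html><body><ol id="rso"><li>First</li><li>Second</li></ol>'
        . '<a id="pnnext" href="/search?q=test&amp;start=10">Next</a></body></html>';

$out = extract_results($sample);
echo $out['results'], "\n";   // markup of the list items inside #rso
echo $out['next_url'], "\n";  // link to the next page of results
```

The next-page link is likely to be relative, so it would need the host put in front of it before being passed back to cURL. When there is no #pnnext on the page, next_url comes back as null, which is a convenient signal to stop paging.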


