Grabbing 2 parts of a remote webpage

   
2:32 pm on May 25, 2013 (gmt 0)



I created a Google CSE (Custom Search Engine) and did not like that it was limited to 100 results. I have since figured out how to search multiple sites on Google directly, and I want to "rip" the results, plus the link to the next set of results, from the search page. I have worked out which elements to rip: #pnnext and #rso.

So how do I do this?
11:08 pm on May 26, 2013 (gmt 0)




You could use cURL to get the raw content, and then use a fairly simple regular expression to split out just the #pnnext and #rso elements.

Something like this should do. Please bear in mind that this was typed on the fly, and if I were sensible I would have been in bed a while ago, so I offer no guarantee that this will work out of the box.

<?php

// Get the contents of your Google search. Define $url as wherever you're querying
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
    CURLOPT_CONNECTTIMEOUT => 5     // give up connecting after 5 seconds
));
$page_contents = curl_exec($ch);
if ($page_contents === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

// Get the search results out of the ol#rso.
// Note: this stops at the first </ol>, so a nested <ol> would cut the match short.
if (preg_match('/<ol[^>]+id="rso"[^>]*>(.*?)<\/ol>/s', $page_contents, $results)) {
    // echo $results[1]; // Should give you the content of #rso
}

// Get the URL from the a#pnnext (assumes id comes before href in the tag)
if (preg_match('/<a[^>]+id="pnnext"[^>]*href="([^"]+)"/', $page_contents, $next_url)) {
    // echo html_entity_decode($next_url[1]); // Should be the next page link
}

?>
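
If the regular expressions turn out to be brittle (attribute order is not guaranteed, and nested lists trip up the #rso pattern), a DOM parser is another way to do the same job. What follows is only a rough sketch, assuming PHP's DOM extension is enabled and that the page really does use the ids rso and pnnext. The function name extract_results_and_next and the sample markup are made up for illustration; they are not from Google or from the code above.

```php
<?php

// Hypothetical helper: pull the #rso markup and the #pnnext href out of an
// HTML string using PHP's bundled DOM extension instead of regexes.
function extract_results_and_next($html)
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parse warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Look the elements up by id attribute; item(0) is null when nothing matches.
    $rso  = $xpath->query('//*[@id="rso"]')->item(0);
    $next = $xpath->query('//a[@id="pnnext"]')->item(0);

    return array(
        // saveHTML($node) serialises the element itself plus its children.
        'results'  => $rso ? $doc->saveHTML($rso) : null,
        // getAttribute() returns the value with entities such as &amp; decoded.
        'next_url' => $next ? $next->getAttribute('href') : null,
    );
}

// Usage with a tiny stand-in page (made-up markup, not real Google output).
$sample = '<html><body><ol id="rso"><li>First</li><li>Second</li></ol>'
        . '<a id="pnnext" href="/search?q=test&amp;start=10">Next</a></body></html>';

$parsed = extract_results_and_next($sample);
echo $parsed['next_url'], "\n";
```

In practice you would pass $page_contents from the cURL call in place of $sample. If the href comes back relative, as in the stand-in above, you still need to prepend the scheme and host before requesting the next page.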
 
