Msg#: 4577734 posted 2:32 pm on May 25, 2013 (gmt 0)
I created a Google CSE and did not like that it was limited to 100 results. I have since figured out how to search multiple sites on Google, and now I want to "rip" the results, plus the link to the next set of results, from the search page. I have figured out which elements to rip: #pnnext and #rso.
Msg#: 4577734 posted 11:08 pm on May 26, 2013 (gmt 0)
You could use cURL to get the raw content, and then use a fairly simple regular expression to split out just the #pnnext and #rso elements.
Something like this should do. Please bear in mind that I typed this on the fly, and if I were sensible I would have been in bed a while ago, so I offer no guarantee that it will work out of the box.
// Get the contents of your Google search.
// Define $url as wherever you're querying
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_CONNECTTIMEOUT => 5
));
$page_contents = curl_exec($ch);
curl_close($ch);
// Get the search results out of the ol#rso
preg_match('/<ol[^>]+id="rso"[^>]*>((?:<li.*?<\/li>)+)<\/ol>/s', $page_contents, $results);
// echo $results[1]; // Should give you the content of #rso
// Get the URL from the a#pnnext
preg_match('/<a[^>]+id="pnnext"[^>]*href="([^"]+)"/', $page_contents, $next_url);
// echo $next_url[1]; // Should be the next page link
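If you want to check the two patterns without hitting Google every time, here is a minimal sketch that runs them against a made-up HTML string. The function name extract_results and the sample markup are my own inventions for illustration. It also assumes the id attribute comes before href in the pnnext link and that the href is HTML-encoded, neither of which I have checked against what Google actually sends.

```php
<?php
// Hypothetical helper (not from the snippets above): pulls the #rso list
// and the #pnnext href out of an HTML string using the same two patterns.
function extract_results($html)
{
    $out = array('results' => null, 'next' => null);

    // Inner content of <ol id="rso">: one or more <li>...</li> blocks.
    // The "s" flag lets "." match newlines inside each list item.
    if (preg_match('/<ol[^>]+id="rso"[^>]*>\s*((?:<li.*?<\/li>\s*)+)<\/ol>/s', $html, $m)) {
        $out['results'] = trim($m[1]);
    }

    // href of <a id="pnnext">. Assumes id appears before href in the tag.
    if (preg_match('/<a[^>]+id="pnnext"[^>]*href="([^"]+)"/', $html, $m)) {
        // Turn &amp; back into & so the URL can be requested as-is.
        $out['next'] = html_entity_decode($m[1]);
    }

    return $out;
}

// Made-up sample markup standing in for a fetched results page.
$sample = '<ol class="x" id="rso"><li><a href="http://example.com/">One</a></li>'
        . '<li><a href="http://example.org/">Two</a></li></ol>'
        . '<a class="pn" id="pnnext" href="/search?q=test&amp;start=10">Next</a>';

$found = extract_results($sample);
echo $found['next'], "\n"; // prints /search?q=test&start=10
```

Both keys stay null when a pattern does not match, so you can stop paging once $found['next'] comes back empty. Regular expressions against HTML are brittle, so expect to adjust the patterns if the markup changes.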