Make a script break out of loop if not responding

         

dermotirl

10:40 am on Jul 26, 2006 (gmt 0)

10+ Year Member



I have a script that scrapes content from a number of partner sites. This works fine, but I have a problem if one of the sites does not respond or takes a very long time to respond.

The long response time causes the entire script to time out, but what I want it to do is just move on to the next site after 5 seconds.

What is the best way to do this? Is it possible to use set_time_limit(), or will this cause the entire script to time out? Any help would be appreciated.

coopster

11:01 pm on Jul 26, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



set_time_limit() is for script execution time, not the stream timeout. Maybe you should write up your own quick little function which opens its own socket so you can use stream_set_timeout() [php.net]?
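A minimal sketch of the socket approach suggested above, assuming plain HTTP on port 80. The function name fetch_via_socket() and its parameters are hypothetical, made up for illustration. fsockopen() takes its own connection timeout, and stream_set_timeout() then limits how long each read may block.

```php
<?php
// Hypothetical helper (not from the thread): fetch a page over a plain
// HTTP socket and give up if connecting or reading stalls.
function fetch_via_socket($host, $path = '/', $timeout = 5, $port = 80)
{
    // The fifth argument to fsockopen() caps the connection attempt.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null; // could not connect, so the caller can move on
    }

    // Cap how long each subsequent read on this stream may block.
    stream_set_timeout($fp, (int) $timeout);

    fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        $meta  = stream_get_meta_data($fp);
        if ($chunk === false || $meta['timed_out']) {
            fclose($fp);
            return null; // a read stalled past the timeout
        }
        $response .= $chunk;
    }
    fclose($fp);

    return $response; // raw response, HTTP headers included
}
```

Note that the stream timeout applies to each read, not to the request as a whole, so a server that trickles data slowly can still take longer than $timeout in total.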

dermotirl

11:38 am on Aug 3, 2006 (gmt 0)

10+ Year Member



Hi coopster

Thanks for pointing me in what I think is the right direction. I had not heard of stream_set_timeout() until now, but could you explain it a little further? There is not much information out there.

This is what I want to do. I have a function that gets the content of a number of webpages, one at a time.

// Execute this function 3 times; I have 3 $urls
function get_data ($url){
    $ch = curl_init();
    $agent = "Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0";
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt ($ch, CURLOPT_HEADER, 0);

    // curl_exec() prints the response, so capture it with output buffering
    ob_start();
    curl_exec ($ch);
    curl_close ($ch);
    $data = ob_get_contents();
    ob_end_clean();

    return $data;
}

Sometimes the webpage does not respond or takes a long time to respond, which I think happens at the line curl_exec ($ch);

If the first webpage does not respond after 5 seconds, I want it to stop trying to get the content and move on to the next URL. Can stream_set_timeout() do this, or will it stop executing the entire loop? And if so, should I put stream_set_timeout() before or after the line curl_exec ($ch); or somewhere else?

Thanks for your help so far and any more would be greatly appreciated.
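For reference, stream_set_timeout() works on stream resources such as sockets, not on cURL handles, so it would not apply to the function above. cURL has its own timeout options, CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT, which can be set before curl_exec(). The following is only a sketch under that assumption: the names get_data_with_timeout() and get_all() are hypothetical, and CURLOPT_RETURNTRANSFER is swapped in for the output buffering used above.

```php
<?php
// Hypothetical variant of get_data(): returns null instead of hanging
// when a site is slow or unreachable.
function get_data_with_timeout($url, $timeout = 5)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Seconds allowed for establishing the connection.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Seconds allowed for the whole transfer, connection included.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $data = curl_exec($ch); // false on timeout or any other error

    return $data === false ? null : $data;
}

// Hypothetical loop: a failed URL yields null and the loop carries on.
function get_all($urls, $timeout = 5)
{
    $results = array();
    foreach ($urls as $url) {
        $results[$url] = get_data_with_timeout($url, $timeout);
    }
    return $results;
}
```

A timeout here only makes curl_exec() return false for that one URL; it does not stop the script, so the loop simply continues with the next site.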

zCat

12:00 pm on Aug 3, 2006 (gmt 0)

10+ Year Member



Just wondering: why do you set the user-agent as "Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0" rather than something more informative such as "dermotirl's bot; http://example.com"? If nothing else this would help your "partners" who are being "scraped" to filter your hits from their statistics.

dermotirl

1:20 pm on Aug 3, 2006 (gmt 0)

10+ Year Member



I wasn't aware that you could set the user-agent to something such as "dermotirl's bot; http://example.com". I will change it. Thanks for the tip.
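A small sketch of that change, assuming the same cURL setup as before. The helper bot_user_agent() is hypothetical; the "name; URL" format is simply the one suggested above.

```php
<?php
// Hypothetical helper: build a descriptive user-agent string from a bot
// name and a contact URL, in the "name; URL" form suggested above.
function bot_user_agent($name, $contactUrl)
{
    return $name . '; ' . $contactUrl;
}

$ch = curl_init();
// Any string is accepted here; it is sent as the User-Agent request header.
curl_setopt($ch, CURLOPT_USERAGENT, bot_user_agent("dermotirl's bot", 'http://example.com'));
```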

zCat

1:50 pm on Aug 3, 2006 (gmt 0)

10+ Year Member



Pleasure :-). (I spend too much time looking at Apache logs weeding out fishy-looking user-agent names; the more information a name contains, the lower the risk that it will get blocked.)