
Why does my curl loop break after 100 cycles?

     
4:37 pm on Mar 20, 2014 (gmt 0)

Preferred Member from GB 

10+ Year Member Top Contributors Of The Month

joined:July 25, 2005
posts:389
votes: 10


Hi,

I've got a series of scripts, one of which communicates with an external source via cURL.

Let's say it has to download 1200 snippets of data every day. One script populates a database with the 1200 URLs that have to be visited that day.

Then a simple cURL function loops through those URLs and saves the data back into the database.

Because there are several services communicating with the data source, I've introduced a 10-second pause between requests.

Unfortunately, after looping through 70 - 100 URLs, the script suddenly stops. It's not a case of the source blocking my connections. So why is it breaking, and how do I make sure it loops through all 1200 URLs?


<?php
set_time_limit(0);
// a minimal sketch of the "simple curl function", with timeouts
// so a dead connection can't hang the loop
function get_data($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up connecting after 10 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // abort the whole request after 30 seconds
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$database = "***";
mysql_connect("***", "***", "***") or die("Error connecting to database: ".mysql_error());
mysql_select_db($database) or die(mysql_error());

$sql = mysql_query("SELECT * FROM `list`") or die(mysql_error());


// loop through every URL in the list
while ($row = mysql_fetch_array($sql))
{
$url = $row['url'];
$id = $row['id'];
$data = get_data($url);
$clean_data = mysql_real_escape_string($data);

$sql2 = "UPDATE `list` SET raw='$clean_data' WHERE id='$id'";
$result2 = mysql_query($sql2);
sleep(10); // 10-second pause between requests
flush();   // push any buffered output to the browser
}
?>

Maybe there's a way to detect that the loop has stopped and pick it up again from the next empty table row?
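
One way to sketch that, assuming the raw column stays NULL or empty until a row has been filled: select only the unfilled rows, so rerunning the script automatically resumes where the last run stopped.

$sql = mysql_query("SELECT * FROM `list` WHERE raw IS NULL OR raw = ''")
    or die(mysql_error());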
Thanks
5:35 pm on Mar 20, 2014 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:4842
votes: 1


Are you doing this via the command line? If you're not, the browser can time out too.

Also make sure you have a timeout in your get_data curl settings.

Your script doesn't show whether you use standalone curl or PHP's cURL library. If you're using standalone curl, make sure to pass the --globoff flag, as URLs like http://www.example.com/[1-1000].htm would otherwise make curl fetch the page 1000 times over.

Add some error checking and that may give you a better idea.
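
For example, inside get_data you could check the return value of curl_exec and log the cURL error before returning (a sketch; the log wording is just illustrative):

$data = curl_exec($ch);
if ($data === false) {
    // curl_error() names the failure: timeout, DNS lookup, connection reset, etc.
    error_log("curl failed for $url: " . curl_error($ch));
}
curl_close($ch);
return $data;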
12:02 pm on Mar 24, 2014 (gmt 0)

Preferred Member from GB 

10+ Year Member Top Contributors Of The Month

joined:July 25, 2005
posts:389
votes: 10


Thank you for the answer. I'm using a browser, you're right - it times out. I've tried to increase the timeout via regedit, but it hasn't changed anything, which makes you wonder why those values are in regedit if they don't make a difference :)
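
For what it's worth, a small guard at the top of the script makes the command-line requirement explicit (a sketch; the script path in the comment is hypothetical):

// run it from the shell instead, e.g.: php /path/to/fetch_snippets.php
if (php_sapi_name() !== 'cli') {
    die("Run this script from the command line to avoid browser timeouts.\n");
}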
 
