
if cURL doesn't respond in time?

     
6:14 pm on Nov 9, 2019 (gmt 0)

Senior Member


joined:Mar 15, 2013
posts: 1205
votes: 120


I'm having an unexpected issue, and I'd be interested in some feedback. This isn't a live page yet, so there's no traffic on any of these except for me.

I'm curling RSS feeds for news headlines. I run the same script on my homepage twice, once for national news and once for local. I use jQuery and Ajax to fetch them, like so:

<script>
// `home` (the site's base URL) is defined earlier in the page
var newsNationalPath = home + '/news.php?feed=national',
    newsLocalPath    = home + '/news.php',
    natNews, locNews;

$(function() {
    if ($('#news_national').length)
        natNews = setInterval(function() {
            $('#news_national').load(newsNationalPath);
        }, 900000); // 15 min; 15 * 60 * 1000

    if ($('#news_local').length)
        locNews = setInterval(function() {
            $('#news_local').load(newsLocalPath);
        }, 3420000); // 57 min
});
</script>

<div id="news_national"></div>
<script>
$('#news_national').load(newsNationalPath);
</script>

<div id="news_local"></div>
<script>
setTimeout(function() {
    $('#news_local').load(newsLocalPath);
}, 2500);
</script>


So you can see that on the first load I fetch the national news immediately (which is above the fold), and use setInterval to refresh it every 15 minutes.

Then I use setTimeout to fetch the local news 2.5 seconds after the national news, and setInterval to refresh it every 57 minutes.

The reasoning behind the 2.5s delay, and behind refreshing every 57 minutes instead of 60, is to keep news.php from being requested twice at exactly the same moment.
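(For what it's worth, arithmetic-wise: 15-minute and 57-minute timers only coincide at multiples of LCM(15, 57) = 285 minutes, so even running side by side they'd only line up roughly every 4 hours 45 minutes.)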

In the PHP script that's loaded via Ajax, I cURL like so:

function getFile($url, $getInfo = false) {
    $t = false;

    if (strpos($url, 'http') !== false) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

        $t = curl_exec($ch);

        // in this example I don't pass $getInfo, so this section shouldn't run;
        // I'm including it here, though, in case you see a problem
        if ($getInfo && $t) {
            $arr  = curl_getinfo($ch);
            $http = $arr['http_code'];

            if ($http == 200)
                $t = $arr[$getInfo];
            else
                $t = false;
        }

        curl_close($ch);
    }

    return $t;
}

// I'm leaving out the code that selects the list of news feeds from MySQL;
// the logic is that the script reads them in order, then shows the
// first one that responds

while (list($news_id, $news_feed, $courtesy) = mysqli_fetch_row($sth_feed)) {

    // reset these for each feed in the list
    $last_modified = false;
    $contents      = false;

    // I use a text file to store the last-updated timestamp
    $news_filename = $datapath . '/news/' . $news_id . '.dat';

    if (is_file($news_filename))
        $last_modified = time() - filemtime($news_filename);

    if (!$last_modified ||
        $last_modified > 900) // no cached file yet, or it's more than 15 minutes old
        $contents = getFile($news_feed);
    else
        break;

    if ($contents) break;
    else
        mail('hostmaster@example.com',
             "$courtesy isn't responding",
             'body of the email, not relevant');
}

// blah blah blah
}


The logic here is that I select my list of news feeds, then run through them in the while() loop. For each one it checks the filemtime of the cached file; if that file doesn't exist, or is more than 15 minutes old, it calls getFile() to fetch the feed. If the fetch comes back empty (e.g., $contents is still false), it emails me and moves on to the next feed (and ultimately prints data that's older than 15 minutes anyway).
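By the way, the caching side isn't shown above, but the idea inside the loop is roughly this (a simplified sketch, not my exact code): a successful fetch gets written to the .dat file, which refreshes its mtime, and when nothing responds I fall back to whatever cached copy is there, however old.

if ($contents)
    file_put_contents($news_filename, $contents); // refresh the cache and its mtime
elseif (is_file($news_filename))
    $contents = file_get_contents($news_filename); // stale, but better than nothing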

I've left the page open for the last 12 hours on my browser, so it should have fetched the national news feed 48 times (every 15 minutes for 12 hours) and the local news feed 12 times (every 57 minutes).

Of those, I had 10 emails that the top national news feed (Yahoo News) wasn't responding, 5 emails that the top local news feed wasn't responding, and 2 emails that the second local news feed wasn't responding.

That's a LOT more than I expected... 20% of the requests for Yahoo failed, 40% of the requests for local news failed, and 40% of the requests for the backup local news failed!

I can't figure out where the bottleneck is, though. Is something wrong with my intervals, so that I'm still fetching both feeds at the same time and one times out while the other is being fetched? Or is there something in my curl_setopt calls that should be changed... maybe CURLOPT_CONNECTTIMEOUT is set too low (at 3 seconds)?

Or is there something else wrong with the logic of how I'm doing it?
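In case it helps narrow things down, this is the kind of logging I could bolt onto getFile() to see what cURL actually reports when a fetch fails. Just a sketch, and the log file path is made up:

// right after the curl_exec() call in getFile()
$t = curl_exec($ch);

if ($t === false) {
    $info = curl_getinfo($ch);
    error_log(date('c')
        . ' ' . $url
        . ' errno=' . curl_errno($ch)  // 28 = operation timed out, 7 = couldn't connect, 6 = DNS failure
        . ' error=' . curl_error($ch)
        . ' connect=' . $info['connect_time'] . 's'
        . ' total=' . $info['total_time'] . 's'
        . "\n", 3, '/path/to/curl-failures.log'); // example path only
}

If the failures show errno 28 with connect_time sitting near 3 seconds, it's the connect timeout; if connect_time is small but total_time runs up to 15 seconds, the feed is connecting and then stalling on the transfer.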