

if cURL doesn't respond in time?

         

csdude55

6:14 pm on Nov 9, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm having an unexpected issue, and I'd be interested in some feedback. This isn't a live page yet, so there's no traffic on any of these except for me.

I'm curling RSS feeds for news headlines. I run the same script on my homepage twice, once for national news and once for local. I use jQuery and Ajax to fetch them, like so:

<script>
var newsNationalPath = home + '/news.php?feed=national',
    newsLocalPath = home + '/news.php',
    natNews, locNews;

$(function() {
  if ($('#news_national').length)
    natNews = setInterval(function() {
      $('#news_national').ajax(newsNationalPath);
    }, 900000); // 15 min; 15 * 60 * 1000

  if ($('#news_local').length)
    locNews = setInterval(function() {
      $('#news_local').ajax(newsLocalPath);
    }, 3420000); // 57 min
});
</script>

<div id="news_national"></div>
<script>
$('#news_national').ajax(newsNationalPath);
</script>

<div id="news_local"></div>
<script>
setTimeout(function() {
  $('#news_local').ajax(newsLocalPath);
}, 2500);
</script>


So you can see that on the first load I fetch the national news immediately (which is above the fold), and use setInterval to refresh it every 15 minutes.

Then I use setTimeout to fetch the local news 2.5s after fetching the national news, and use setInterval to refresh it every 57 minutes.

The reason for the 2.5s delay, and for refreshing every 57 minutes instead of 60, is to keep the two scripts from running at exactly the same time.
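The 57-minute period makes collisions rarer rather than impossible. Both setInterval calls are registered together inside `$(function() { ... })`, so the 2.5s setTimeout only offsets the first local fetch; after that, the timers line up whenever the elapsed time is a common multiple of 15 and 57 minutes. A small sketch of that arithmetic follows (the function names are just illustrative). It assumes the timers never drift, which is optimistic: browsers can throttle timers in background tabs, so real firing times will wander.

```javascript
// Both setInterval timers are registered at the same moment on DOM ready,
// so they fire together whenever elapsed time is a common multiple of both.
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Minutes (after page load) at which timers with the given periods coincide.
function coincidences(periodA, periodB, horizon) {
  const lcm = (periodA / gcd(periodA, periodB)) * periodB;
  const times = [];
  for (let t = lcm; t <= horizon; t += lcm) {
    times.push(t);
  }
  return times;
}

console.log(coincidences(15, 57, 720)); // [ 285, 570 ] over 12 hours
console.log(coincidences(15, 60, 720).length); // 12: a 60-minute timer would collide every hour
```

So over the 12-hour test, the two refreshes would have started at the same moment about twice, which is far too rare on its own to explain 17 failures.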

In the PHP script that's loaded via Ajax, I cURL like so:

function getFile($url, $getInfo = false) {
    $t = false;

    if (strpos($url, 'http') !== false) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3); // seconds allowed to establish the connection
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);       // seconds allowed for the whole request
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        $t = curl_exec($ch);

        // in this example I don't pass $getInfo, so this section shouldn't run
        // I'm including it here, though, in case you see a problem
        if ($getInfo && $t) {
            $arr = curl_getinfo($ch);
            $http = $arr['http_code'];

            if ($http == 200)
                $t = $arr[$getInfo];
            else
                $t = false;
        }

        curl_close($ch);
    }

    return $t;
}

// I'm leaving out code to select a list of news feeds from MySQL, but
// the logic is that the script reads them in order, then shows the
// first one that responds

while (list($news_id, $news_feed, $courtesy) = mysqli_fetch_row($sth_feed)) {

    // I use a text file to store the last updated timestamp
    $news_filename = $datapath . '/news/' . $news_id . '.dat';

    // reset for every feed so the previous feed's age doesn't carry over
    $last_modified = false;

    if (is_file($news_filename))
        $last_modified = time() - filemtime($news_filename);

    if ($last_modified === false ||
        $last_modified > 900) // file is missing or more than 15 minutes old
        $contents = getFile($news_feed);
    else
        break;

    if ($contents) break;
    else
        mail('hostmaster@example.com',
            "$courtesy isn't responding",
            'body of the email, not relevant');
}

// blah blah blah
}


The logic here is that I select my list of news feeds, then use the while() loop to run through them. It checks the filemtime of the first one, and if the file doesn't exist or is more than 15 minutes old, it tries to fetch the feed with getFile(). If the feed doesn't respond (i.e., $contents comes back empty), it sends me an email and moves on to the next feed; if none of them respond, it prints the data it already has, even though that's older than 15 minutes.
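The fallback logic described above can be modeled as a small pure function, which makes it testable without touching the network. This is a hypothetical JavaScript sketch of the same idea, not the PHP itself: `pickFeed`, the `cachedAt` field (a timestamp in seconds, or `null` when there is no .dat file yet) and the injected `fetchFeed` callback are illustrative names. The cache age is computed separately for each feed, so one feed's age can never leak into the next feed's check.

```javascript
// Hypothetical model of the feed loop: feeds are tried in order; a fresh
// cache entry or the first successful fetch wins, and failures are collected.
const MAX_AGE_SECONDS = 900; // 15 minutes, the same limit as the PHP check

function pickFeed(feeds, nowSeconds, fetchFeed) {
  const failures = [];
  for (const feed of feeds) {
    // Age is recomputed for every feed; null/undefined means "no cache file yet".
    const age = feed.cachedAt == null ? null : nowSeconds - feed.cachedAt;
    if (age !== null && age <= MAX_AGE_SECONDS) {
      return { source: "cache", id: feed.id, failures };
    }
    const contents = fetchFeed(feed.url);
    if (contents) {
      return { source: "fetch", id: feed.id, contents, failures };
    }
    failures.push(feed.id); // the PHP loop sends the "isn't responding" email here
  }
  // Nothing fresh and nothing fetched: the page falls back to older data.
  return { source: "stale", id: null, failures };
}

const feeds = [
  { id: "primary", url: "https://example.com/a.rss", cachedAt: 0 },
  { id: "backup", url: "https://example.com/b.rss", cachedAt: null },
];
// The primary cache is stale and its fetch fails; the backup has no cache and succeeds.
const result = pickFeed(feeds, 2000, (url) => (url.endsWith("b.rss") ? "<rss/>" : ""));
console.log(result.source, result.id, result.failures); // fetch backup [ 'primary' ]
```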

I've left the page open in my browser for the last 12 hours, so it should have fetched the national news feed 48 times (every 15 minutes for 12 hours) and the local news feed 12 times (every 57 minutes).

Of those, I had 10 emails that the top national news feed (Yahoo News) wasn't responding, 5 emails that the top local news feed wasn't responding, and 2 emails that the second local news feed wasn't responding.

That's a LOT more than I expected... 20% of the requests for Yahoo failed, 40% of the requests for local news failed, and 40% of the requests for the backup local news failed!
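The backup local feed is only tried after the primary local feed fails, so its denominator is those 5 failures rather than all 12 refreshes. A quick check of the three percentages (the helper name is only for illustration):

```javascript
// Percentage of attempts that failed, rounded to one decimal place.
function failureRate(failures, attempts) {
  return Math.round((failures / attempts) * 1000) / 10;
}

console.log(failureRate(10, 48)); // 20.8 (national, Yahoo News)
console.log(failureRate(5, 12));  // 41.7 (primary local feed)
console.log(failureRate(2, 5));   // 40   (backup local feed, tried only after the 5 primary failures)
```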

I can't figure out where the bottleneck is, though. Is something wrong with my intervals, so that I'm still fetching feeds at the same time and one times out while the other is being fetched? Or should something in my curl_setopt calls be changed... maybe CONNECTTIMEOUT is set too low (3 seconds)?

Or is there something else wrong with the logic of how I'm doing it?

phranque

11:58 am on Dec 28, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I don't see where you're reporting the status code of the response, so how would you know what "isn't responding" means, precisely?
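One way to make "isn't responding" precise is to record why each request failed. Below is a hypothetical JavaScript sketch of that classification; in the PHP script, the inputs would come from `curl_errno($ch)`, `curl_getinfo($ch, CURLINFO_HTTP_CODE)` and the return value of `curl_exec()`. The numeric codes are libcurl's own (6, 7 and 28). libcurl generally reports an expired CURLOPT_CONNECTTIMEOUT as code 28 too, so `curl_error($ch)` is where to look to tell the connect limit from the overall limit.

```javascript
// libcurl error numbers; PHP's curl_errno($ch) returns these same values.
const CURLE_COULDNT_RESOLVE_HOST = 6;
const CURLE_COULDNT_CONNECT = 7;
const CURLE_OPERATION_TIMEDOUT = 28;

// Turn one request's outcome into a short reason for the alert email.
// errno: curl_errno($ch); httpCode: curl_getinfo($ch, CURLINFO_HTTP_CODE);
// body: what curl_exec() returned (false or "" when nothing came back).
function describeFailure({ errno, httpCode, body }) {
  if (errno === CURLE_OPERATION_TIMEDOUT) return "timed out";
  if (errno === CURLE_COULDNT_CONNECT) return "could not connect";
  if (errno === CURLE_COULDNT_RESOLVE_HOST) return "DNS lookup failed";
  if (errno !== 0) return `curl error ${errno}`;
  if (httpCode !== 200) return `HTTP ${httpCode}`;
  if (!body) return "empty response body";
  return "ok";
}

console.log(describeFailure({ errno: 0, httpCode: 503, body: "" })); // HTTP 503
console.log(describeFailure({ errno: 28, httpCode: 0, body: false })); // timed out
```

Putting that string into the email subject would show at a glance whether a feed is timing out, refusing connections, or answering with a non-200 status.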

robzilla

7:54 pm on Dec 28, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Maybe. You'll have to do a bit more debugging to find out what's failing exactly. Not getting a 200 status code does not necessarily equal a timeout.

The CURLOPT_VERBOSE option could help.

I'd be wary of using something like this in production. Long-running scripts might hog your PHP workers and slow everything down. Depends on the traffic, of course.

Assuming the headlines aren't different for each user, maybe just have a single job running that fetches the headlines every X minutes and caches them somewhere?
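A minimal sketch of that idea, written in JavaScript (Node) to match the front-end code. The class name, the `fetchHeadlines` callback and the caller-supplied `now` timestamp are all hypothetical. One refresher owns the upstream request, concurrent refreshes share a single in-flight fetch, and page views only ever read the cached copy, so a slow feed never ties up a request. In the PHP setup above, the equivalent would be a scheduled job (cron, for example) that writes the .dat files, with news.php only reading them.

```javascript
// Hypothetical sketch: a single refresher fetches headlines at most once per
// maxAgeMs; page views call read() and never wait on the upstream feed.
class HeadlineCache {
  constructor(fetchHeadlines, maxAgeMs) {
    this.fetchHeadlines = fetchHeadlines; // assumed: () => Promise<string>
    this.maxAgeMs = maxAgeMs;
    this.value = null;          // last good copy of the headlines
    this.fetchedAt = -Infinity; // when this.value was fetched, in ms
    this.inFlight = null;       // shared promise while a fetch is running
  }

  // Called by the scheduled job; `now` is a timestamp in milliseconds.
  refresh(now) {
    if (now - this.fetchedAt < this.maxAgeMs) {
      return Promise.resolve(this.value); // still fresh, no upstream request
    }
    if (!this.inFlight) {
      this.inFlight = Promise.resolve()
        .then(() => this.fetchHeadlines())
        .then((fresh) => {
          if (fresh) {
            this.value = fresh;
            this.fetchedAt = now;
          }
          return this.value;
        })
        .catch(() => this.value) // upstream failed: keep serving the old copy
        .finally(() => {
          this.inFlight = null;
        });
    }
    return this.inFlight;
  }

  // Called on every page view.
  read() {
    return this.value;
  }
}
```

The job side would be something along the lines of `setInterval(() => cache.refresh(Date.now()), 15 * 60 * 1000)`, with the request handler calling only `cache.read()`.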