
cURL is making my RAM spike, big time!

     
4:45 am on Feb 7, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 774
votes: 69


The other thread about upgrading from PHP 5.3 to 5.6 has sort of led to this, but I think it's totally different now so I'm making a new thread. Sorry to the mods if you think they should be merged!

I'm currently running PHP 5.3.29. Here's my script:

$url = "http://www.whatever.com/rss.xml";
$filename = "/home/example/data/file.dat";

if (!is_file($filename)) $difference = 901;
else $difference = time() - filemtime($filename);

if ($difference > 900) {
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);

curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");

$contents = curl_exec($ch);
curl_close($ch);

//// This is the old script
// if (!$contents = file_get_contents($url)) {
// $error = error_get_last();
// echo "HTTP request failed: " . $error['message'];
// }

if ($contents) {
// print data from XML feed
}
}


Running the script with the file_get_contents() version works just fine. When I switch it to cURL it still works, but my server load spikes, HARD! It jumps from using about 3 GB of RAM to 12 GB, and my Apache process count jumps from 170 to 450!

Here's where it gets weird... I changed the script back to file_get_contents(), but the server load didn't go down. I even removed the script completely, and it still didn't go back down until I restarted Apache. So I guess curl_init() is opening something that never gets closed?

I just can't understand the problem. As far as I can tell, it should be reading the timestamp of the text file, and only running every 15 minutes. I definitely have curl_close($ch) in place, so it should be opening and closing within a second.

Any advice is greatly appreciated!
5:49 am on Feb 8, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15756
votes: 0


it may just be that the cURL call doesn't time out

file_get_contents() fails quickly because it gives up after 60 seconds by default, based on default_socket_timeout. cURL, on the other hand, has a tendency to hang on and try valiantly to complete the request, since its timeouts can be left unspecified. I seem to remember that unless you build it properly, the lowest it can go is 1 second, and only if you specify that in the call.

Since your check runs the fetch whenever the file doesn't exist or was last modified more than 900 seconds ago, I'm guessing you're running this on some kind of cron, or at least repeatedly, and the cURL calls are stacking up. Removing the script does nothing because the requests that already started haven't finished and have to be killed, which is why restarting Apache clears them.

but just a guess
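
For reference, a rough sketch of the knobs involved (the URL and the timeout values below are just placeholders): file_get_contents() is capped by default_socket_timeout, and can also be capped per request with a stream context, while cURL needs its limits set explicitly or it can hang more or less forever:

$url = "http://www.whatever.com/rss.xml";

// default_socket_timeout is what limits file_get_contents(); typically 60 seconds
echo ini_get('default_socket_timeout');

// a single file_get_contents() call can also be capped via a stream context
$context = stream_context_create(array(
    'http' => array('timeout' => 10),
));
$contents = file_get_contents($url, false, $context);

// cURL, by contrast, needs explicit ceilings set on the handle
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$contents = curl_exec($ch);
curl_close($ch);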
9:25 am on Feb 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 774
votes: 69


Thanks, Jatar! I modified the code to this:

$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 60);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");

$contents = curl_exec($ch);
curl_close($ch);


Now it's not crashing my server or causing a load spike, so that's good! But it's also not properly fetching the data, so I'm going to have to dig into that a little more. I suspect it needs more than 60 seconds? But that's weird, since file_get_contents() never fails at 60 seconds.

(I double checked, and default_socket_timeout is set to 60 in php.ini, so that's definitely the default timeout)

I'll have to play with that one more tomorrow during off-peak time, but for now, at least, CONNECTTIMEOUT and TIMEOUT seemed to have solved this particular problem :-) So thanks again, Jatar!
7:26 pm on Feb 9, 2017 (gmt 0)

Senior Member from MZ 

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 9, 2005
posts: 836
votes: 0


Your file is big. Download it in chunks to keep the load off your RAM/buffer.
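
For what it's worth, a rough sketch of that approach, in case the file really were big (the URL and cache path are only examples): let cURL write the response straight to a file handle, so the body never sits in RAM as one big string.

$url = "http://www.whatever.com/rss.xml";
$cache = "/home/example/data/feed.xml"; // example path only

// open the local file and have cURL stream the response into it in chunks
$fp = fopen($cache, 'w');

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_exec($ch);
curl_close($ch);
fclose($fp);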
8:57 pm on Feb 9, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 774
votes: 69


It's really not big at all, though, phparion. Here's the page I'm fetching:

[w1.weather.gov...]

You have to View Source to see the XML, but you'll see it's only 49 lines... around 2.4 KB. It definitely shouldn't take more than a second to download.
9:56 pm on Feb 9, 2017 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 15, 2011
posts:47
votes: 2


If you set CURLOPT_HEADER to 0 instead of 1, you will get only the XML file - is that what you want?

Right now the HTTP response headers are included in $contents, so it is not valid XML.

I ran your exact code with the URL above and got the result in the blink of an eye, using PHP 5.6.29.
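
In other words, a minimal sketch of fetching just the body and handing it to SimpleXML (the URL is a stand-in for the real feed):

$url = "http://www.whatever.com/rss.xml"; // stand-in for the real feed URL

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);         // don't prepend the HTTP headers
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);

// simplexml_load_string() returns false if the headers are still mixed in
$xml = simplexml_load_string($contents);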
11:07 pm on Feb 9, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1829
votes: 271


Try turning on verbose output:
curl_setopt($curl, CURLOPT_VERBOSE, true);
curl_setopt($curl, CURLOPT_STDERR, fopen('/path/to/log/file.txt', 'w'));

You will have to disable CURLINFO_HEADER_OUT temporarily.

Also, what happens when you run the script from the command line? Or when you curl or wget the XML file directly from the command line?

While it doesn't seem too likely for a government website, if they don't like your frequent requests, perhaps they're feeding you something bigger...
9:40 am on Feb 10, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 774
votes: 69


Awesome, guys, thanks! I think the issue was either the CURLOPT_HEADER setting (you're right, I just wanted the XML), or maybe the CURLINFO_HEADER_OUT setting (which, honestly, I don't understand). I removed those and it worked with no more problems :D

Rob, the code for an error log is going to be a lifesaver in the future, so thanks a lot for that one, too! Just for clarification, though, am I right that there's no need for an fwrite() statement anywhere? And, do I need to fclose('file.txt') after curl_close($ch), or is it automatically done with the curl_setopt()?

For any future readers who are keeping up with this, here's the code I'm running now. I use file.dat to run this when someone visits the page and it hasn't been updated in more than 15 minutes; I figured there's no need for a cron to run every 15 minutes if the page hasn't been visited in a few hours, and I run the same code on some very slow sites that might go a few days without any traffic :-( There might be a better way to do it, but I wrote this a long time ago and it's been working well enough for my purposes:

$url = '<http://www.whatever.com/rss.xml>';
$filename = '/home/example/data/file.dat';
$error_log = 'error.log';

if (!is_file($filename)) $difference = 901;
else $difference = time() - filemtime($filename);

if ($difference > 900) {
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);

    // Error log (keep the handle so it can be closed later)
    $log_handle = fopen($error_log, 'a');
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_STDERR, $log_handle);

    $contents = curl_exec($ch);
    curl_close($ch);

    // I think this is needed, unless it's implied with STDERR?
    // (fclose() needs the handle from fopen(), not the filename string)
    fclose($log_handle);

    // Backup; in case cURL fails, try file_get_contents()
    // I'll remove this once I update to PHP 5.6.29
    if (!$contents) {
        $contents = file_get_contents($url);
        $msg = "\n** Moved on to file_get_contents()...\n ";

        $error = error_get_last();
        if ($error) {
            $msg .= $error['message'];

            $fh = fopen($error_log, 'a');
            fwrite($fh, $msg);
            fclose($fh);
        }
    }

    if ($contents) {
        // print data from XML feed
    }
}
11:53 am on Feb 10, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1829
votes: 271


Just for clarification, though, am I right that there's no need for an fwrite() statement anywhere? And, do I need to fclose('file.txt') after curl_close($ch), or is it automatically done with the curl_setopt()?

Curl will write to the file if any errors occur, and you can close the file explicitly, as you do, but since it's a short-running script, all opened files will be closed upon exiting anyway.

I use the file.dat to run this when someone visits the page and it hasn't been updated in > 15 minutes

Note that, if it has been >15 minutes, you're increasing your users' load time that way. If the download gets stuck for some reason, they may even have to wait 60 seconds for the job to time out.
10:20 am on Feb 11, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 774
votes: 69


Curl will write to the file if any errors occur, and you can close the file explicitly, as you do, but since it's a short-running script, all opened files will be closed upon exiting anyway.


Cool, Rob, thanks!

Note that, if it has been >15 minutes, you're increasing your users' load time that way. If the download gets stuck for some reason, they may even have to wait 60 seconds for the job to time out.


True. In my case it's not a big deal; I'm loading the script via Ajax, so if it gets hung up the user doesn't really notice (unless it takes all of my server's RAM, of course, but that's a different story). But for anyone reading this in the future who might want to try what I did, that's a great point to note. A regular cron is probably wiser; I mainly went this route because I was grabbing a local news feed (with their permission), but their server kept blacklisting me for hitting it too often, so I created an alternative.
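
For anyone who does go the cron route, a rough sketch (the schedule, paths and URL are just examples): fetch and cache the feed on a timer, and have the page only ever read the local copy.

// example crontab entry:  */15 * * * *  php /home/example/cron/fetch_feed.php

$url = 'http://www.whatever.com/rss.xml';   // example feed URL
$filename = '/home/example/data/file.dat';  // example cache file

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$contents = curl_exec($ch);
curl_close($ch);

// only overwrite the cache when the fetch actually returned something
if ($contents) {
    file_put_contents($filename, $contents);
}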
 
