But now I'm wondering how much of a performance hit this is compared to using the database indicator method since the script would have to actually physically check the hard drive for a file. I'm sure the number of items per page would be a factor. At what point does doing a bunch of file_exists operations become cumbersome? I'm thinking the most I'd have per page is about 20 (that is, 20 products per page).
TIA
Write a script with a loop in it and have it check file_exists 40,000 times and see what you get for results. If you don't know how to benchmark a script, look for the benchmarking article in the forum library. And remember, report back with results when you do.
for ($i = 0; $i < 100000; $i++) { file_exists($_SERVER['PHP_SELF']); }
It took 10.5 seconds for 100,000 iterations.
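In case anyone wants to reproduce this without any extra tooling, a self-contained version of that loop timed with microtime() might look like this (the iteration count is arbitrary, and __FILE__ just stands in for whatever file you're checking):

```php
<?php
// Sketch of the benchmark above: time N calls to file_exists() with microtime().
// __FILE__ is used because it's a file guaranteed to exist; swap in any path.
$iterations = 100000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    file_exists(__FILE__);
}
$elapsed = microtime(true) - $start;

printf("%d calls to file_exists() took %.4f seconds\n", $iterations, $elapsed);
```

Absolute numbers will obviously vary with hardware and PHP version; only the relative comparisons below mean much.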
By comparison, I set it up to call about the simplest user-defined function that I could think of - one that doesn't do anything at all. So calling this instead
function testfunc()
{
return;
}
Took 3 seconds.
You might be right in wondering how it scales to different systems. The main point I was trying to make is that calling the file_exists() function was only about three times slower than calling a function that performs no function (sorry). I also did a quick test with one that simply assigned an integer to a variable and then quit; that was about 50% slower than the empty function, or roughly half the execution time of file_exists().
That was just running on the older computer I was surfing on with a testbed WAMP setup
- Athlon 750MHz
- ca. 640MB of RAM (can ergo add? 256 + 256 + 128 = 640)
- lots of other apps running, but no live web server, no concurrent requests.
I suppose the seeks might slow down a fair bit with many concurrent requests waiting for the read head to come around again. I suppose it might depend on how everything was running though. If there weren't a lot of files on the disk, might it just keep the FAT or part of it in memory and not recheck the disk every time? Of course I assume that file_exists does not actually check the file location, but just looks it up in the FAT.
I didn't try calling the function twice (you mean twice per iteration through the loop?)
Thanks!
yes, for a moment I thought a single call to that function took 3 seconds, my mind reeled a bit at the implications.
hehe, I should drink more coffee sometimes but I think this time less would have been better.
Yes, I was wondering about twice per, trying to figure if it would actually double or not. You wouldn't think it should matter but I wonder. Some benchmarking results over the years still don't make sense to me. I can't remember any off the top of my head but I know there are some every once in a while.
I will have to try the same test on my servers tomorrow and see the variations.
Fascinating. I always assumed that calls to the file system would always be slower than calls to the database.
You can think of it with this analogy:
If you were looking for someone in a city and you had their address, you would just go straight to the address - very fast (filesystem when you know the file name and don't have to search through it).
If you only knew the name and had to go around knocking on doors house-by-house until you found the right house - very slow (filesystem when you don't know which file the data is in or it's in a very large file and you have to read it line-by-line until you find it).
If you had a phone book and the name and you could use the name to look up the address and then go straight there - not as fast as just knowing the address, but still quite fast (database situation).
So if you want the whole file and you know where it is, or in your case you know the name and want to see whether the name is valid, it's fastest to use the filesystem. If that file has 86,000 rows of data and you want to find one row, you should use a DB.
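Applied to the original question (20 products per page), that "you know the address" case is just one stat call per product. A minimal sketch of the idea - the directory layout and naming scheme here are hypothetical:

```php
<?php
// Sketch of the original use case: for each product on a page, check whether
// an image file exists and fall back to a placeholder if not.
// The images/ directory and the "<id>.jpg" naming scheme are assumptions.
function productImagePath(int $productId, string $imageDir = 'images'): string
{
    $candidate = $imageDir . '/' . $productId . '.jpg';
    // One stat call per product; ~20 of these per page is negligible.
    return file_exists($candidate) ? $candidate : $imageDir . '/placeholder.jpg';
}

foreach ([101, 102, 103] as $id) {
    echo productImagePath($id), "\n";
}
```

No database round-trip, no "exists" flag to keep in sync - the filesystem is the source of truth.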
The results of file_exists() are cached. The idea is then, once it's checked to see if the file's there, for the remaining 99,999 times it pretends it's checking, it's really just taking a peek at where it hid this value in memory. Like your night watchman who does his round once and just ticks off the boxes each following hour since he remembers everything was there. The thought being that if PHP really sees fit to cache this kind of thing, there must be some sort of savings involved. If this were true, I could strut around like Mr. Smartypants.
Anyways, here's the results - first the benchmark code:
require '/somepath/PEAR/benchmark/Benchmark-1.2.1/Timer.php';

$timer = new Benchmark_Timer();
$timer->start();

for ($i = 0; $i < 100000; $i++) {
    file_exists($_SERVER['PHP_SELF']);
}
$timer->setMarker('Mark1');
echo "Elapsed time between Start and Mark1 (file_exists() only): " .
    $timer->timeElapsed('Start', 'Mark1') . "<br />\n";

for ($i = 0; $i < 100000; $i++) {
    clearstatcache();
    file_exists($_SERVER['PHP_SELF']);
}
$timer->setMarker('Mark2');
echo "Elapsed time between Mark1 and Mark2 (clearstatcache() + file_exists()): " .
    $timer->timeElapsed('Mark1', 'Mark2') . "<br />\n";

$timer->setMarker('Mark3');
for ($i = 0; $i < 100000; $i++) {
    clearstatcache();
}
$timer->setMarker('Mark4');
echo "Elapsed time between Mark3 and Mark4 (clearstatcache() only): " .
    $timer->timeElapsed('Mark3', 'Mark4') . "<br />\n";

$timer->stop();
$timer->display();
the relevant results:
Elapsed time between Start and Mark1 (file_exists() only): 0.479363
Elapsed time between Mark1 and Mark2 (clearstatcache() + file_exists()): 0.524402
Elapsed time between Mark3 and Mark4 (clearstatcache() only): 0.129451
Conclusion:
clearstatcache() adds no serious overhead when checking whether the same file exists 100,000 times in a row (only about 10%, and less than the cost of file_exists() and clearstatcache() run separately). Not really the kind of info you'd need, though, for a 'real life' application.
Benchmarked on Debian Linux, P4 2.4GHz, 1GB RAM, SATA hard disk w/8MB cache, various stuff open (in X) but no public server.
if (file_exists('my.file'))
{
unlink('my.file');
}
if (file_exists('my.file'))
{
echo "Sorry, delete failed";
}
If the results of file_exists() were cached, you would get the failure message every time.
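Worth spelling out why that delete example behaves: PHP clears the stat-cache entry for a file when unlink() succeeds, so the second file_exists() sees the truth without any clearstatcache() call. A quick runnable check, using a throwaway temp file:

```php
<?php
// unlink() clears PHP's stat cache entry for the file it removes,
// so file_exists() won't report a stale "true" afterwards.
$file = tempnam(sys_get_temp_dir(), 'fx');

var_dump(file_exists($file)); // bool(true)  -- tempnam() created the file
unlink($file);
var_dump(file_exists($file)); // bool(false) -- no clearstatcache() needed
```

The cache only bites when a file changes behind PHP's back (another process, another request), not when PHP itself deletes or touches it.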
And yes, I'm old school - sub 1GHz. For a desktop, it seems to do most things just fine until I get around to editing my cinematic masterpiece, but that's down the road.
I ran a bunch of things just for interest and came up with these numbers. Just using microtime() before and after and subtracting; all runs are 100K iterations.
file_exists($_SERVER['PHP_SELF']);
1.64718198776
1.68153810501
1.66256690025
file_exists($_SERVER['PHP_SELF']);
clearstatcache();
1.75823688507
1.76556396484
1.76278805733
testfunc(); // which does the big nada
0.361613035202
0.32157087326
0.342176914215
double testfunc()
0.528954029083
0.579276800156
0.52706694603
quad testfunc()
0.832072973251
0.829246044159
0.811228990555
Benchmarked on Sunfire V120, Solaris 8, UltraSparc IIi 550MHz, 512MB RAM, SCSI hard disk; dev server, runs massive amounts of crap, including Oracle server and client, MySQL, Apache; minimal concurrent connections.
>> my puter's faster than yours, ergophobe!
geez looks like on paper it's faster than mine too, but then again, I don't think so ;)
Yeah, this is exactly what I assumed, and I remember when I first read the note in the manual on the file_exists() page about this function's results being cached, and I thought, what the ***? For the same reasons you mention. Apparently, if you want to do what you mention, you do have to call clearstatcache() [be2.php.net] before checking on a file a second time. This has been so since PHP 3 - maybe it's a remnant from the days when people thought PHP would be used a lot for batch-like tasks à la Perl, or from when people depended more on the filesystem rather than sticking every imaginable thing in the database, like they seem to do today.
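For the record, the pattern the manual has in mind is roughly this: clear the cache before re-reading metadata that might have gone stale since the last stat call. A small sketch with a throwaway temp file:

```php
<?php
// clearstatcache() before re-checking metadata that may be stale.
$file = tempnam(sys_get_temp_dir(), 'cs');
file_put_contents($file, 'aa');

$before = filesize($file);        // populates the stat cache for $file
file_put_contents($file, 'aaaa'); // file grows to 4 bytes

clearstatcache();                 // drop any cached stat data
$after = filesize($file);         // guaranteed fresh now

printf("before: %d bytes, after: %d bytes\n", $before, $after);
// prints: before: 2 bytes, after: 4 bytes
unlink($file);
```

The affected functions are the stat-family ones (filesize(), filemtime(), file_exists(), etc.); without the clearstatcache() call you're trusting whatever PHP stashed on the previous check.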
I'm also not much of a currentest-stuff-in-your-box sorta guy, got this machine to replace a PII 330MHz 192MB jobber with a 14" CRT, and haven't even bothered yet getting the ATI graphics acceleration to work. You've got a nice amount of RAM, though - hardly ever use any more than that, and I think I'd take 750MHz with 640MB over 2.4GHz and 256MB any day.
Jatar: no contest here - sheesh, Solaris / SCSI and Oracle on yer desktop? You get the option of the 20-year nuclear UPS? You're way out-ubering all the ubergeeks I know.