My server is a Pentium 4 at 2.8 GHz, with about 40 MB of free RAM and a server load between 0.07 and 0.20 when I ran these tests.
File creation
This one isn't so relevant, because my app will never create 10,000 files all at once! Still, since I had to create the files to run the tests, I timed it. Creating the 100 and 1000 files for their respective directories took "0" seconds, and the 10,000 files took 7 seconds. (Perl's time() function normally measures in 1-second increments.) Incidentally, the files are empty, because all I really need to test is access time; I never have to touch the files' contents.
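A minimal sketch of a creation script along these lines (assuming directories named for their file counts, with files 1.txt through N.txt):

#!/usr/bin/perl
use strict;
use warnings;

# Create N empty files named 1.txt .. N.txt in a directory named for N.
for my $count (100, 1000, 10000) {
    mkdir $count unless -d $count;
    my $start = time;
    for my $i (1 .. $count) {
        open(my $fh, '>', "$count/$i.txt") || die("Couldn't create file $i. $!");
        close($fh);
    }
    print "$count: " . (time - $start) . " seconds\n";
}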
ls (directory listing) from the command line
One of the many (unhelpful) comments I found on a message board while researching how file counts affect performance was something like "with a lot of files in a directory, even an 'ls' command can bring a server to its knees." So I decided to test this one. My command line is Terminal in Mac OS X, and I'm logged into my server remotely. (I'm in Austin; the server is in California.) Anyway, the ls command took less than two seconds to run on the 10,000-file directory. The real bottleneck seemed to be printing all that output to the window.
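To take the terminal out of the measurement entirely, one quick check is to time ls with its output discarded; a minimal sketch, assuming the 10,000-file directory is named 10000/:

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Time 'ls' with output thrown away, so terminal rendering isn't part of the measurement.
my $start = [gettimeofday];
system('ls 10000/ > /dev/null');
printf "ls took %.3f seconds\n", tv_interval($start);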
Open 100 random files
Now we get down to business! I opened and closed 100 random filenames from each directory. For each directory, it took "0" seconds. I tried it again with 1000 filenames; still 0 seconds across the board. Once more with *10,000* filenames, and the results were 2, 1, and 2 seconds respectively. (I assume the 1000-file directory scoring best is due to rounding, since Perl's time() doesn't measure fractions of a second.)
Get 100 random files via HTTP
Okay, so UNIX clearly doesn't slow down with 10,000 files in a directory, but will Apache care? I doubt it, since Apache reads the files through the same filesystem, but I might as well be thorough. Anyway, fetching 100 random files from each directory via HTTP took all of 2 seconds per directory, even the 10,000-file one.
Conclusion
Man, you can easily have 10,000 files in a directory and the system just doesn't blink. No performance problems at all.
By the way, here's the code I used for "Open 100 to 10000 random files".
#!/usr/bin/perl
use strict;
use warnings;

print "\n";
testFiles(100);
testFiles(1000);
testFiles(10000);

# Open and close random files from the directory named for its file count.
sub testFiles {
    my ($count) = @_;
    my $start = time;
    # Adjust this range per run; I used 100, 1000, and 10,000 opens.
    for my $counter (1 .. 10_000) {
        my $x = int(rand($count)) + 1;
        open(my $fh, '<', "$count/$x.txt") || die("Couldn't open file $x. $!");
        close($fh);
    }
    print "$count: " . (time - $start) . " seconds\n";
}
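Since time() only resolves to whole seconds, the core Time::HiRes module can show the fractions the rounding hides. A minimal sketch of the same test with sub-second timing (fixed here at 10,000 opens per directory):

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Same open/close test, but timed with sub-second resolution.
sub testFiles {
    my ($count) = @_;
    my $start = [gettimeofday];
    for my $counter (1 .. 10_000) {
        my $x = int(rand($count)) + 1;
        open(my $fh, '<', "$count/$x.txt") || die("Couldn't open file $x. $!");
        close($fh);
    }
    printf "%d: %.3f seconds\n", $count, tv_interval($start);
}

testFiles(100);
testFiles(1000);
testFiles(10000);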
And here's the code for "Get 100 random files via HTTP".
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

print "\n";
testFiles(100);
testFiles(1000);
testFiles(10000);

# Fetch 100 random files over HTTP from the directory named for its file count.
sub testFiles {
    my ($count) = @_;
    my $start = time;
    for my $counter (1 .. 100) {
        my $x = int(rand($count)) + 1;
        my $content = get("http://example.com/$count/$x.txt");
        warn "Couldn't fetch $x.txt\n" unless defined $content;
    }
    print "$count: " . (time - $start) . " seconds\n";
}
This concern is one likely reason for the common approach of using URL-to-filespace mappings such as
/apples.html --> /a/apples.html
/apricots.html --> /a/apricots.html
/carrots.html --> /c/carrots.html
/cucumbers.html --> /c/cucumbers.html
and other similar partitioning approaches used to reduce physical directory size.
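A minimal sketch of such a mapping (partitioned_path is a hypothetical helper that partitions on the first letter of the filename):

#!/usr/bin/perl
use strict;
use warnings;

# Map a flat URL path to its partitioned location, using the
# first letter of the filename as a subdirectory (hypothetical helper).
sub partitioned_path {
    my ($name) = @_;
    my ($first) = lc($name) =~ /^(\w)/;
    return "/$first/$name";
}

print partitioned_path('apples.html'), "\n";    # prints /a/apples.html
print partitioned_path('carrots.html'), "\n";   # prints /c/carrots.html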
Jim