|Large number of files/directories in total and per-dir - problem?|
| 7:05 pm on Apr 24, 2005 (gmt 0)|
After a dedicated server that I rent showed serious performance problems not commensurate with the relatively low traffic its web sites get (most of the files on the server belong to other projects I am tinkering with), the hosting company investigated and notified me that the large number of files I use is the source of the problem.
Is this plausible? The filesystem details are:
100 GB hard disk, ext3 file system, RAID-1, ca. 20 % used
4.3 million files in the directory structure of my home directory
up to ca. 20,000 subdirectories per directory
up to ca. 7,000 non-directory files per directory
Has any of you encountered a similar problem? Am I running into a total-files or a files-per-directory limit here?
| 8:27 pm on Apr 24, 2005 (gmt 0)|
<i>Am I running against a total-files or a files-per-directory limit here?</i>
IIRC, directories in ext2/3 are organized as a linked list, so to get at a file in a huge directory takes time. I've run across this in both Linux and Solaris.
Reiserfs will probably help, since files are organized for quicker access. I suppose you could also think about moving the content to a database if that's feasible.
| 3:53 am on Apr 25, 2005 (gmt 0)|
The total number of files on a filesystem shouldn't be a problem with any filesystem I'm familiar with, unless you run out of inodes. But that wouldn't be a performance issue; it would give you errors.
Files per directory is a major issue for many OS and filesystem combinations. Some operating systems (like FreeBSD) provide hashing for quick access to large numbers of files on any filesystem. Others experience widely varying performance, depending on how directories are organized.
It sounds like you're probably using a system that stores directory entries in order of creation, requiring a linear search each time a file in that directory is accessed. You can solve this by switching filesystems, switching operating systems, or rethinking your approach to data storage in order to reduce the number of files.
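If it helps, here's a quick way to check both possibilities on an ext3 box. This is a sketch: `/dev/sda1` and `/home` are placeholders for your actual device and mount point, and the `dir_index` step should be tested on a non-production machine first.

```shell
# Rule out inode exhaustion first: an IUse% near 100% causes
# "No space left on device" errors, not slowness.
df -i /home

# ext3 can hash directory entries (the dir_index / htree feature)
# instead of scanning them linearly. See whether it's enabled:
tune2fs -l /dev/sda1 | grep 'features'

# If it isn't, turn it on. Only directories created afterwards
# benefit automatically:
tune2fs -O dir_index /dev/sda1

# To re-index existing directories, run e2fsck with -D on the
# UNMOUNTED filesystem (-f forces a full check):
e2fsck -Df /dev/sda1
```

That would give you the hashed directory lookups without switching filesystems or operating systems.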
| 8:31 am on Apr 26, 2005 (gmt 0)|
Thanks for the information. I'll probably go with hashed subdirectories, then (123456.txt -> 1/2/3/456.txt); that should be the solution that's most portable between servers.
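For anyone wanting to do the same, the mapping can be a one-liner helper. This is a minimal sketch (the function name and the padding convention for short names are my own choices, not anything standard):

```python
import os

def hashed_path(root, filename, depth=3):
    """Map a flat filename like '123456.txt' to a nested path like
    root/1/2/3/456.txt, so each directory level holds at most a few
    dozen entries instead of thousands."""
    stem, ext = os.path.splitext(filename)
    # Left-pad short names with '0' so every name yields the full
    # directory depth plus at least one character for the leaf name.
    stem = stem.rjust(depth + 1, '0')
    # First `depth` characters become directory levels; the rest is
    # the leaf filename.
    parts = list(stem[:depth]) + [stem[depth:] + ext]
    return os.path.join(root, *parts)
```

Used as `hashed_path('data', '123456.txt')`, this returns `data/1/2/3/456.txt` (with the platform's path separator); writing a file is then just `os.makedirs(os.path.dirname(path), exist_ok=True)` followed by a normal open.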
| 2:25 am on May 12, 2005 (gmt 0)|
I ran into a similar issue with my IMAP server. I'm using Courier-IMAP which saves emails and folders in the Maildir structure (each folder is a directory in the file system, and each message is a file in that directory).
I was using ext3 and encountered the same problems you describe. Several months back, I converted the partition to JFS and it's worked out well. JFS is IBM's journaling file system, which they've released to the open source community, and it's optimized for accessing large numbers of files (millions) on a file system. My CPU and I/O utilization have dropped to about a quarter of what they were, and performance on email operations has roughly tripled.