Is this plausible? The filesystem details are:
100 GB hard disk, ext3 filesystem, RAID-1, about 20% used
4.3 million files in the directory tree under my home directory
up to about 20,000 subdirectories per directory
up to about 7,000 non-directory files per directory
Has anyone here run into a similar problem? Am I running up against a total-files limit or a files-per-directory limit here?
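To tell the two cases apart, it can help to find out which directories actually hold the most entries. Here is a minimal sketch, assuming Python is available on the box; the function name `largest_directories` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import os

def largest_directories(root, top=10):
    """Return up to `top` (entry_count, path) pairs, largest first.

    entry_count is the number of direct children (subdirectories plus
    non-directory files) of each directory below `root`.
    """
    counts = []
    for dirpath, dirnames, filenames in os.walk(root):
        counts.append((len(dirnames) + len(filenames), dirpath))
    counts.sort(reverse=True)
    return counts[:top]

if __name__ == "__main__":
    # Assumption: the home directory is the tree of interest.
    for count, path in largest_directories(os.path.expanduser("~")):
        print(f"{count:>8}  {path}")
```

On a tree with millions of files the walk itself will be slow, so it is probably best run during a quiet period.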
IIRC, directories in ext2/3 are organized as a linear list of entries (unless hashed directory indexing is enabled), so getting at a file in a huge directory means scanning the list, and that takes time. I've run across this in both Linux and Solaris.
ReiserFS will probably help, since it organizes files for quicker access. I suppose you could also think about moving the content to a database if that's feasible.
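If the database route is feasible, the idea can be sketched roughly like this. It assumes the files are small enough to store as blobs and uses SQLite from Python's standard library purely as an example; the table name `blobs` and the helpers `open_store`, `put` and `get` are hypothetical.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a key/value store keyed by the former file name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn

def put(conn, name, data):
    """Insert or overwrite the content stored under `name`."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO blobs (name, data) VALUES (?, ?)",
            (name, data),
        )

def get(conn, name):
    """Return the stored bytes, or None if `name` is unknown."""
    row = conn.execute(
        "SELECT data FROM blobs WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]
```

The primary key gives an indexed lookup by name, so finding one item does not depend on scanning a directory. Whether this beats the filesystem for your workload is something you would have to measure.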
Files per directory is a major issue for many OS and filesystem combinations. Some operating systems (like FreeBSD) provide hashing for quick access to large numbers of files on any filesystem. Others show widely varying performance, depending on how directories are organized.
It sounds like you're probably using a system that stores directory entries in order of creation, requiring a linear search each time a file in that directory is accessed. You can solve this by switching filesystems, switching operating systems, or rethinking your approach to data storage in order to reduce the number of files.
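If you stay on the same filesystem, one common way to rethink the layout is to spread files over hashed subdirectories so that no single directory gets large. A minimal sketch, assuming the application can compute a file's path from its name; `sharded_path` and `store` are hypothetical helpers, and SHA-1 is used here only to spread names evenly, not for security.

```python
import hashlib
import os

def sharded_path(root, name, levels=2, width=2):
    """Map `name` to root/ab/cd/name using the leading hex digits of its hash."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)

def store(root, name, data):
    """Write `data` (bytes) to the sharded location and return the path."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With two levels of two hex digits there are 256 × 256 = 65,536 leaf directories and no directory holds more than 256 subdirectories. If the hash spreads names evenly, 4.3 million files would come to roughly 66 files per leaf directory.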
I was using ext3 and encountered the same problems you describe. Several months back, I converted the partition to JFS and it's worked out well. JFS is IBM's journalling file system, which they've released to the open source community. It's optimized for accessing a large number of files (millions) on a file system. My CPU and I/O utilization have gone down drastically (about 1/4), and my performance on email operations has gone up threefold.