A dedicated server that I rent has shown serious performance problems that are out of proportion to the relatively low traffic its web sites get (most of the files on the server belong to other projects that I am tinkering with). The hosting company investigated and told me that the large number of files I keep is the source of the problem.
Is this plausible? The data of the filesystem are:
- 100 GByte hard disk, ext3 file system, RAID-1, ca. 20 % used
- 4.3 million files in the directory structure of my home directory
- up to ca. 20,000 subdirectories per directory
- up to ca. 7,000 non-directory files per directory
Has any of you encountered a similar problem? Am I running up against a total-files or a files-per-directory limit here?
The total number of files on a filesystem shouldn't be a problem with any filesystem I'm familiar with, unless you run out of inodes. Even then it wouldn't be a performance issue; you'd get errors instead.
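If you want to rule out inode exhaustion quickly, something like the following should do. It is only a sketch: the function name `inode_usage` is made up for this post, it relies on `os.statvfs`, which is available on Unix-like systems only, and some filesystems report zero total inodes because they allocate them dynamically.

```python
import os


def inode_usage(path="/"):
    """Return inode counts for the filesystem that holds `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes on the filesystem
    free = st.f_ffree    # number of free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return {"total": total, "free": free, "used": used, "percent": percent}


if __name__ == "__main__":
    usage = inode_usage("/")
    print("inodes: {used} of {total} used ({percent:.1f} %)".format(**usage))
```

If the percentage is nowhere near 100, the total number of files is not what is hurting you.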
Files per directory is a major issue for many OS and filesystem combinations. Some operating systems (like FreeBSD) provide hashing for quick access to large numbers of files on any filesystem. Others experience widely varying performance, depending on how directories are organized.
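To see which directories are the big ones, a small survey script can help. This is a rough sketch, not a tuned tool: `largest_directories` is a name I made up, it walks the whole tree with `os.walk` (so expect it to take a while on millions of files), and it does not follow symlinked directories.

```python
import heapq
import os


def largest_directories(root, top=10):
    """Return the `top` directories under `root` with the most entries.

    Each result is a tuple (entries, subdirs, files, path).
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        entries = len(dirnames) + len(filenames)
        results.append((entries, len(dirnames), len(filenames), dirpath))
    # nlargest sorts by the first tuple element (total entries) first.
    return heapq.nlargest(top, results)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    for entries, subdirs, files, path in largest_directories(home):
        print("{:>8} entries ({} dirs, {} files)  {}".format(entries, subdirs, files, path))
```

Directories with tens of thousands of entries at the top of that list are the first candidates to look at.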
It sounds like you're probably using a system that stores directory entries in order of creation, requiring a linear search each time a file in that directory is accessed. You can solve this by switching filesystems, switching operating systems, or rethinking your approach to data storage in order to reduce the number of files.
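If rethinking the storage layout is an option, one common trick is to spread files over nested subdirectories chosen by a hash of the file name, so that no single directory grows large. Here is a minimal sketch, assuming your code decides where files go; the names `sharded_path` and `store` and the two-level, two-hex-character layout are just illustrative choices.

```python
import hashlib
import os


def sharded_path(root, name, levels=2, width=2):
    """Map a file name to root/<hash part>/<hash part>/name."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def store(root, name, data):
    """Write `data` (bytes) to the sharded location and return its path."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With the defaults, each level has at most 256 subdirectories, giving 65,536 leaf directories. Spread evenly, 4.3 million files would come to roughly 66 files per leaf directory.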
I ran into a similar issue with my IMAP server. I'm using Courier-IMAP, which saves emails and folders in the Maildir structure (each folder is a directory in the file system, and each message is a file in that directory).
I was using ext3 and encountered the same problems you describe. Several months back, I converted the partition to JFS and it's worked out well. JFS is IBM's journalling file system, which they've released to the open source community. It's optimized for accessing a large number of files (millions) on a file system. My CPU and I/O utilization have gone down drastically (about 1/4th), and the performance of email operations has gone up threefold.