


Linux ext2 filesystem and large numbers of files

Any issues?

   
11:15 am on Oct 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One solution to a project I'm working on could involve the creation of a large number of very small files - in the region of a million or so.

Are there any real or practical limits on the number of files in a Linux filesystem? Are there any other issues to bear in mind, e.g. storage space efficiency?

6:48 pm on Oct 21, 2002 (gmt 0)

10+ Year Member



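man mke2fs

The -i option sets the bytes-per-inode ratio, which determines how many inodes (and therefore how many files) the filesystem can hold.

I looked for an equivalent option in tune2fs but couldn't find one, so it looks like this has to be chosen when the filesystem is created.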
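To make the bytes-per-inode point concrete, here is a rough arithmetic sketch. The 10 GiB volume size and the list of ratios are assumptions chosen for illustration, and mke2fs rounds the real figure to fit its block groups, so treat the results as estimates only.

```python
def estimated_inodes(fs_bytes: int, bytes_per_inode: int) -> int:
    """Rough inode count: one inode per bytes_per_inode bytes of filesystem."""
    return fs_bytes // bytes_per_inode


GIB = 1024 ** 3
FS_SIZE = 10 * GIB  # hypothetical 10 GiB partition, not from the thread

# Hypothetical ratios to compare; each file needs one inode.
for ratio in (16384, 8192, 4096, 1024):
    print(f"-i {ratio:>5}: about {estimated_inodes(FS_SIZE, ratio):,} inodes")
```

On a volume of that size, a ratio of 16384 would give fewer than a million inodes, so a million small files would not fit regardless of how much free space remained.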

7:38 pm on Oct 21, 2002 (gmt 0)

10+ Year Member



It seems that once you get a couple of thousand files in a directory, it takes forever to ls it. I think ls has to stat every file, and that is where the time goes. I don't think the increase is even linear, since it gets really slow pretty quickly. I can't remember, though, whether the slowdown is in stat-ing a file by name, in stat-ing the directory itself, or somewhere else.
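One way to check where the time goes is to time the two steps separately. This is a minimal sketch, assuming a scratch directory on the filesystem you care about; the function name and the file counts are made up for illustration, and the numbers will depend heavily on the filesystem and on caching.

```python
import os
import tempfile
import time


def time_listing(n_files: int) -> tuple[float, float]:
    """Create n_files empty files in a fresh directory, then return
    (seconds to list the names, seconds to stat every file by name)."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, f"f{i:07d}"), "w").close()

        t0 = time.perf_counter()
        names = os.listdir(d)          # reads directory entries only
        t1 = time.perf_counter()
        for name in names:
            os.stat(os.path.join(d, name))  # one lookup by name per file
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    # Raise these counts toward 100,000 for a more realistic test.
    for n in (1_000, 10_000):
        list_s, stat_s = time_listing(n)
        print(f"{n:>6} files: list {list_s:.4f}s, stat all {stat_s:.4f}s")
```

If the stat column grows much faster than the file count, the per-name lookup is the likely culprit rather than reading the directory itself.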

I would suggest a couple of things: test it out first by making a directory with 100,000 or so files in it and see if it works very well. Also, think about making the directory structure hierarchical in some way. You will notice that machines with many users do this with the first two characters of the username for /home. Something like:

/lotsafiles/aa/aasgd
/lotsafiles/aa/aaasdf
/lotsafiles/ab/abasdf
/lotsafiles/bb/bbfoobar

may work well for you.
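A minimal sketch of that prefix scheme, assuming every name is at least two characters long. The helper name sharded_path is hypothetical; the root /lotsafiles comes from the example above.

```python
import posixpath


def sharded_path(root: str, name: str, prefix_len: int = 2) -> str:
    """Place name under a subdirectory named after its first prefix_len characters."""
    if len(name) < prefix_len:
        raise ValueError(f"name {name!r} is shorter than the prefix length")
    return posixpath.join(root, name[:prefix_len], name)


for name in ("aasgd", "aaasdf", "abasdf", "bbfoobar"):
    print(sharded_path("/lotsafiles", name))
```

This only spreads files evenly if the leading characters are themselves evenly distributed. Names that mostly share a prefix would still pile up in one directory.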

Finally, as martin suggested, you may want to adjust your block size if the files are all really small so that you don't use up extra disk space.
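To see why block size matters for tiny files, here is some back-of-the-envelope arithmetic. The 200-byte file size is an assumption for illustration, and the sketch counts data blocks only, ignoring inodes and directory entries.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Data-block bytes consumed by one file: its size rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


N_FILES = 1_000_000   # "a million or so", as in the original question
FILE_SIZE = 200       # hypothetical size of each small file, in bytes

for block_size in (1024, 2048, 4096):  # block sizes ext2 supports
    total = N_FILES * allocated_bytes(FILE_SIZE, block_size)
    print(f"block size {block_size}: about {total / 1024 ** 2:,.0f} MiB on disk")
```

Under those assumptions, the same million files take roughly four times as much space with 4096-byte blocks as with 1024-byte blocks.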

A relational database is probably starting to sound a little more attractive now, I bet.

8:07 pm on Oct 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> structure hierarchical in some way

Yep, I was planning to have directories 0-255, each with subdirs 0-255 (i.e. a 16-bit number range mapped onto the filesystem). Many directories would in fact be empty or contain only a few files; some might contain a few thousand.
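A minimal sketch of that 0-255/0-255 mapping. The function name, the /data root, and the choice to use the high byte for the outer directory are all assumptions, since the post doesn't say how the 16-bit number is produced or split.

```python
import posixpath


def path_for_id(root: str, ident: int, filename: str) -> str:
    """Map a 16-bit number to root/<high byte>/<low byte>/filename."""
    if not 0 <= ident <= 0xFFFF:
        raise ValueError("ident must fit in 16 bits")
    high = ident >> 8     # outer directory, 0-255 (assumed ordering)
    low = ident & 0xFF    # inner directory, 0-255
    return posixpath.join(root, str(high), str(low), filename)


print(path_for_id("/data", 0x1234, "record.txt"))
```

That gives 65,536 leaf directories in total, so a million files spread evenly would average about 15 per directory. The uneven spread described above is what would push some of them into the thousands.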

> A relational database is probably starting to sound a little more attractive

Hehe, actually I'm using one now, and it's starting to drag its feet a little. I think I can maybe cut down the search times by going this way instead... but my grand plan is a little hazy at the moment, I have to admit.

 
