Home / Forums Index / Hardware and OS Related Technologies / Linux, Unix, and *nix like Operating Systems

Linux, Unix, and *nix like Operating Systems Forum

    
How many files should one have in a directory?
linux file directory inode
iamvela
msg:3891771
2:09 am on Apr 14, 2009 (gmt 0)

I am saving user files to disk. There are hundreds of thousands of them now, and the count will probably explode into the tens of millions (maybe a few orders of magnitude more).

Obviously we are using a hierarchical directory structure, but we are wondering:
1. What limit should we impose on the number of files in any one sub-directory?
2. Does the total summed across sub-directories matter as well, or only the per-directory count?

Thanks!
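For file counts in the tens of millions, a common layout is a fixed fan-out keyed on a hash of the file's identifier, so no single directory grows unbounded. A minimal sketch (the `path_for` helper, the `/data/files` root, and the two-level md5 prefix are illustrative assumptions, not anything specified in this thread):

```python
import hashlib
import os

def path_for(file_id: str, root: str = "/data/files") -> str:
    """Fan files out across two hash-prefix levels (256 dirs each,
    65,536 leaf directories total) so that tens of millions of files
    average only a few hundred per directory."""
    digest = hashlib.md5(file_id.encode()).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], file_id)
```

With roughly 65,536 leaf directories, even 50 million files average under 800 per directory, which keeps per-directory operations cheap on any common filesystem.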

 

vincevincevince
msg:3891773
2:14 am on Apr 14, 2009 (gmt 0)

I don't know what these files are, but it may be better to use a database rather than individual files. Even small binary strings are fine in most databases, and these systems are designed to handle very large numbers of small records.

I don't think there is any limit on the newer Linux filesystems concerning the number of files per directory.

Have you looked into cloud storage solutions? This may be your answer as they are optimised for immense numbers of files, and scale beautifully with your growth.
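The database route suggested above can be sketched in miniature. SQLite here is purely a stand-in for whatever RDBMS is in use, and the `user_files` table and column names are made up for illustration; the point is that small binary records round-trip cleanly through a parameterized BLOB column:

```python
import sqlite3

# SQLite stands in for the RDBMS; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_files (id INTEGER PRIMARY KEY, data BLOB)")

payload = b"\x00\x01 small binary payload \xff"
conn.execute("INSERT INTO user_files (id, data) VALUES (?, ?)", (1, payload))

row = conn.execute("SELECT data FROM user_files WHERE id = ?", (1,)).fetchone()
assert row[0] == payload  # round-trips byte-for-byte
```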

iamvela
msg:3891825
5:16 am on Apr 14, 2009 (gmt 0)

We actually have the files in a DB right now, and MySQL is horrible once tables start getting into the gigabytes...

mcavic
msg:3892078
2:46 pm on Apr 14, 2009 (gmt 0)

It depends on the database structure and indexing. Partitioning helps a lot. I have a MySQL database with 35 GB of music in it, and it brings up a random song without any problem.

On another server I have a table that's only 14 GB but 105 million rows, and it does fine as long as I give it a well-indexed query; otherwise it will hang.

The real downside of the database, I think, is that if it gets corrupted, you could lose more data than you would with a file system. And, it can be more difficult to back up.

iamvela
msg:3892259
5:43 pm on Apr 14, 2009 (gmt 0)

Wow, thanks for sharing that mcavic.

In our case one of the offending tables has fewer than 20,000 rows but is about 2 GB in size. There is a MEDIUMBLOB column in this table that causes even a simple SELECT to take as much as a minute. I'm talking about a SELECT on the PRIMARY KEY taking over a minute, such as:
SELECT * FROM OFFENDING_TABLE where id='735'

We think we have the indexing right... regardless, why would such a simple query take so long?

We also ran another simple test where we copied the table (exported it) as another table, dropped the MEDIUMBLOB column, and ran the same queries. The queries then took a fraction of a second, including complex queries with large multiple joins.

So what could I be doing wrong? What can we do differently? I would appreciate any feedback anyone can offer.

Thanks much!

LifeinAsia
msg:3892270
5:57 pm on Apr 14, 2009 (gmt 0)

We think we have indexing right

For that query, unless you have an index on the id field, I'd say you don't have it right. :)

You may want to try rebuilding the index(es). Over time, indexes can become corrupted or fragmented (especially if there are a lot of inserts/updates to the indexed fields), so they may periodically need rebuilding/reindexing.

iamvela
msg:3892289
6:17 pm on Apr 14, 2009 (gmt 0)

Thanks for your input, LifeinAsia, but I think you missed the fact that I said id is the PRIMARY KEY. Let me add that it is also AUTO_INCREMENT.

And we run
mysqlcheck -A -c -a -s --use-frm
every day. Periodically we run ANALYZE TABLE as well.

Now what? :(

LifeinAsia
msg:3892373
7:38 pm on Apr 14, 2009 (gmt 0)

Oops, ya, missed that. :)

The only other thing I can think of is that the underlying file structure may be out of whack and need defragmenting (or whatever the *nix equivalent is).

Also, since the query ran so quickly on the second table without the MEDIUMBLOB column, try the SELECT on the original table but exclude the MEDIUMBLOB column. (In other words, explicitly list every column except the MEDIUMBLOB one.) Does that make any difference?

iamvela
msg:3892405
8:23 pm on Apr 14, 2009 (gmt 0)

Yeah, we're not even using the MEDIUMBLOB column in any of the queries... I just showed SELECT * to make the point. In actuality we do list the columns explicitly.

All operations on that table are HORRIBLY slow, even browsing to the last page of OFFENDING_TABLE in phpMyAdmin.

We have not done any partitioning (horizontal or vertical). That seems like the only other thing left to try, and I will spend some time educating myself on it.

Or we could just go to flat files and store all these MEDIUMBLOBs as files.

Thanks for your input. I would welcome any ideas.

Regards,
--

mcavic
msg:3892559
12:50 am on Apr 15, 2009 (gmt 0)

This is very strange. Partitioning shouldn't help much on a 2 GB table, but something interesting to try might be:

ALTER TABLE MyTable
PARTITION BY LINEAR KEY(MyPrimaryKey)
PARTITIONS 30;

That would cut the size of each data file down to about 67 MB. And if you select a specific record, it only has to open one data file and one index file to find it. You could then time it to see how much faster it is.

iamvela
msg:3892763
9:17 am on Apr 15, 2009 (gmt 0)

I will try this and report back ;) BUT we need to select more than one record; typically it's a "Browse" feature that the user can then paginate through... makes sense?

Thanks for your help & ideas. Hopefully we will resolve this soon :)
--

iamvela
msg:3893485
4:43 am on Apr 16, 2009 (gmt 0)

No matter whether I do it from the mysql> prompt or from phpMyAdmin, I get the error:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PARTITION BY LINEAR KEY(`id`)
PARTITIONS 20'

...and I did try it with and without the backticks around the table and field names...

Arrrgh! :)
--

iamvela
msg:3893487
4:45 am on Apr 16, 2009 (gmt 0)

P.S.
I tried ALTER TABLE as well as CREATE TABLE, and both spit out the same error.

We are running:
mysql Ver 14.12 Distrib 5.0.22, for redhat-linux-gnu (i686) using readline 5.0

Thanks again!
--

mcavic
msg:3893497
5:10 am on Apr 16, 2009 (gmt 0)

Sorry, I should have mentioned that partitioning isn't supported until MySQL 5.1.

iamvela
msg:3893505
5:27 am on Apr 16, 2009 (gmt 0)

OK, I suppose we can upgrade our MySQL. Any known gotchas with that?

But it still bothers me that this issue has eaten up days and days of my brain cycles and is still unresolved... especially given that you are doing the exact same thing without any problems!

You know what else we figured out: when doing a join with another table, instead of joining on OFFENDING_TABLE.id we joined on OFFENDING_TABLE.id+0, and it spits the results right back! Voila!

Basically what we did was force it not to use the index for this table, by applying a math operation (adding 0 in this example). Hopefully someone else can learn from and benefit by this too :)
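The mechanism behind the `id+0` trick can be reproduced outside MySQL. A sketch using SQLite (the `offending` table name and row counts are illustrative): once the key is wrapped in an arithmetic expression, the optimizer can no longer use the primary-key index and falls back to a full scan, which is exactly the forced behavior described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offending (id INTEGER PRIMARY KEY, payload BLOB)")
conn.executemany("INSERT INTO offending (id) VALUES (?)",
                 [(i,) for i in range(100)])

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows end with a human-readable detail string
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

indexed = plan("SELECT id FROM offending WHERE id = 42")       # PK lookup
defeated = plan("SELECT id FROM offending WHERE id + 0 = 42")  # full scan
```

`indexed` reports a SEARCH using the primary key, while `defeated` reports a SCAN of the table.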

What is the URL to your app? Rgds,
--

mcavic
msg:3893518
6:12 am on Apr 16, 2009 (gmt 0)

When doing a major upgrade, it's recommended to dump all of your databases and re-import them afterward, rather than carrying the binary data files over.

I know! It's bugging me too. Perhaps there's some sort of a bug in your installation that would be resolved by upgrading. If it was a hardware problem I assume there would be other symptoms.

The music database I mentioned is a streaming server that I'm writing in PHP for my own use. I'd be happy to send you a copy, but it's not really finished.

iamvela
msg:3893581
8:44 am on Apr 16, 2009 (gmt 0)

OK, so we upgraded to Server version 5.1.33-community and I was able to ALTER the table to PARTITION it.

Looks like performance is totally unaffected :(
--

vincevincevince
msg:3894431
9:41 am on Apr 17, 2009 (gmt 0)

What is the field type of the id column? I know it's a silly question, but...

Also, how does the performance of:
SELECT * FROM OFFENDING_TABLE where id='735'
compare to:
SELECT * FROM OFFENDING_TABLE where id=735

eeek
msg:3897390
11:10 pm on Apr 21, 2009 (gmt 0)

I don't think there is any limit on the newer linux filesystems concerning the number of files per directory.

There isn't a limit. But you'll find directory operations (open, create, delete, etc.) can get very slow if there are a lot of files. ReiserFS can help: it uses a b-tree directory structure. With ext3 you usually have to do a linear search through all the directory pages to find a file (or to find out that it isn't there during a create operation). Note: ext3 can do b-tree (htree) directories via the dir_index feature; it's just not the default.
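The linear-search cost described above can be probed empirically. A rough sketch, not a rigorous benchmark (the helper name and file counts are arbitrary, and kernel dentry caching on a modern system may hide the effect at small sizes): create many files in one directory and time stat() lookups, then compare across directory sizes.

```python
import os
import tempfile
import time

def avg_lookup_seconds(n_files: int, probes: int = 100) -> float:
    """Create n_files empty files in one directory, then time stat() lookups.
    On a filesystem that searches directories linearly (e.g. ext3 without
    dir_index), per-lookup cost grows with n_files; with b-tree/htree
    directories it stays roughly flat."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, "f%d" % i), "w").close()
        start = time.perf_counter()
        for i in range(probes):
            os.stat(os.path.join(d, "f%d" % (i % n_files)))
        return (time.perf_counter() - start) / probes
```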

phranque
msg:3900317
10:55 am on Apr 25, 2009 (gmt 0)

Another thing to consider, if you go with files, is whether you are using FTP. I've seen significant performance problems with some FTP clients and large directories. I'm sure you can find a working solution, but it's something to consider if you have software preferences or limitations...

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved