Forum Moderators: phranque
I have read a lot about it and know that database storage would be slow with a large number of images so I am going with a file system approach.
What is the best system for organising lots of images?
I will have usernames and I could store some information in a database.
File system and directory access will be much slower than a DB call.
I use mysql to search over a million records and I get a result in 0.4 seconds.
Does anyone else have thoughts on this? What is the best way to store 1,000 to 10,000 images?
FWIW, I set my per-directory file count to about 3500 and I have no performance issues on a mid-level server getting about 4.5 million views a month, peaking at about 60/sec.
Let's say the maximum number of users you are ever going to have is 100000. To be on the safe side let's assume 10 times that much. So you need a directory scheme that holds a million files. It is not a good idea to have more than 10000 files in one directory (OTOH, this really depends on the file system). I think one thousand files per directory is a good size for every modern file system. That means you need a two-level hierarchy. Your image directory should contain 1000 subdirectories and each of those should contain 1000 image files. 1000 times 1000 is 1 million.
I would name the subdirectories from 000 to 999 and the files 000000.jpeg to 999999.jpeg. So the files 000000.jpeg to 000999.jpeg go into the 000 folder. The files 001000.jpeg to 001999.jpeg go into the 001 folder.
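A minimal sketch of the numbering scheme above, in Python. The function name `image_path`, the `images` root folder and the fixed `jpeg` extension are assumptions for illustration, not part of anyone's actual setup.

```python
from pathlib import PurePosixPath

FILES_PER_DIR = 1000                 # files per subdirectory, as suggested above
MAX_IMAGES = 1000 * FILES_PER_DIR    # 1000 subdirectories x 1000 files = 1 million


def image_path(image_id: int, root: str = "images", ext: str = "jpeg") -> PurePosixPath:
    """Map an integer image id to root/NNN/NNNNNN.ext (hypothetical helper)."""
    if not 0 <= image_id < MAX_IMAGES:
        raise ValueError(f"image_id must be between 0 and {MAX_IMAGES - 1}")
    # Integer division picks the subdirectory: ids 0-999 -> 000, 1000-1999 -> 001, ...
    subdir = image_id // FILES_PER_DIR
    return PurePosixPath(root) / f"{subdir:03d}" / f"{image_id:06d}.{ext}"


if __name__ == "__main__":
    for example_id in (0, 999, 1000, 999999):
        print(example_id, "->", image_path(example_id))
```

Because the path is derived purely from the integer, only the number (and perhaps the extension) has to live in the database.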
You don't need to store the entire path to the file in the database. The above 6 digits and maybe the extension are sufficient. Everything else is implicit. If your users already have some kind of unique integer id, you could use that integer to determine the name of the image. That way you don't need to store the name of the image file at all. But: if users are deleted, their id is usually never assigned again, which means that the spot for the image file can never be reused.
This scheme will scale very well. It's also important to keep the images on a separate hard disk, otherwise the disk seeks will slow everything down dramatically.
Thank you both for your input. I was already planning on using a system similar to what you have both suggested, I just wasn't sure about the folder structure.
Users will have unique IDs so I should be able to create filenames based on them and maybe a date or time. The folder structure will probably be very similar to what you suggested.
Again, thank you.
One addition to this that I use is that I also store a machine address in the database. I have a cluster of machines, and every file I want to store goes to 2 machines in the cluster. I can set availability of these machines so I can bring them down for maintenance without causing downtime for users. This also has the advantage of making the system very scalable - I am currently running about 20 TB of physical storage, or about 10 TB after redundancy.
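A rough sketch of how the two-copies-per-file idea could look in Python. Everything here is hypothetical (the `Machine` class, `choose_replicas`, `pick_reader`, the round-robin placement and the example addresses); the post does not say how machines are actually chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    address: str            # the machine address that would be stored in the database
    available: bool = True  # set to False while the machine is down for maintenance


def choose_replicas(machines: List[Machine], file_id: int, copies: int = 2) -> List[str]:
    """Pick `copies` distinct available machines for a new file (assumed round-robin)."""
    up = [m for m in machines if m.available]
    if len(up) < copies:
        raise RuntimeError("not enough available machines for the requested copies")
    start = file_id % len(up)
    return [up[(start + i) % len(up)].address for i in range(copies)]


def pick_reader(stored_addresses: List[str], machines: List[Machine]) -> Optional[str]:
    """Return the first recorded address whose machine is currently available."""
    status = {m.address: m.available for m in machines}
    for address in stored_addresses:
        if status.get(address, False):
            return address
    return None  # every copy is on a machine that is currently down


if __name__ == "__main__":
    cluster = [Machine("10.0.0.1"), Machine("10.0.0.2"), Machine("10.0.0.3")]
    replicas = choose_replicas(cluster, file_id=4)
    cluster[1].available = False  # take one machine down for maintenance
    print(replicas, "-> read from", pick_reader(replicas, cluster))
```

With both addresses recorded per file, one machine can be marked unavailable and reads fall through to the other copy.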
I store ~20,000 images for my main site.
To agree with some points above:
1) Store the data (ie the images) in a filesystem: in my experience trying to handle the data in the DB may be 1000x slower (and a lot harder to back up)!
2) Use a proper tree-based directory structure, partly to help keep your sanity intact, but also because large flat directories may well prove much less efficient for one reason or another.
3) Do keep the metadata (eg file paths, image dimensions, etc) in a DB, preferably entirely in memory if performance becomes an issue. (I use a hand-crafted DB in Java, but the point remains valid.)
I was just going to use folders named 00001 etc and place 1000 images in each named using the username and the number of images they had added.
An alternative would be to use a folder for each username but this could mean lots of folders with fewer than 10 images each.
Another alternative would be to use several folders deep something like /year/month/day/001/imagefilename.jpg
Any thoughts on a system for this?
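For what it's worth, the /year/month/day/001/ layout could be generated along these lines in Python. The function name `dated_path` is made up, and the meaning of the `001` folder is an assumption: here it is treated as a counter that moves to `002` after 1000 images have been added on the same day.

```python
from datetime import date

IMAGES_PER_BUCKET = 1000  # assumed size of each numbered folder within a day


def dated_path(upload_date: date, seq_in_day: int, filename: str) -> str:
    """Build /year/month/day/NNN/filename for the seq_in_day-th upload (0-based) of a day."""
    if seq_in_day < 0:
        raise ValueError("seq_in_day must not be negative")
    # Bucket numbering starts at 001 to match the example path above.
    bucket = seq_in_day // IMAGES_PER_BUCKET + 1
    return (
        f"/{upload_date.year:04d}/{upload_date.month:02d}/{upload_date.day:02d}"
        f"/{bucket:03d}/{filename}"
    )


if __name__ == "__main__":
    print(dated_path(date(2008, 3, 7), 0, "imagefilename.jpg"))
    print(dated_path(date(2008, 3, 7), 1000, "imagefilename.jpg"))
```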
Something based on date and username is clearly easy to program and is better than nothing, though you may be able to do better.
Try also to make it easy for the SEs to parse your directory/file names/structure if you want hits from organic search or intend to use AdSense or similar, since the SEs may be able to gain clues from your directory and file names/structure.
If related stuff is close in the hierarchy that may help, so, probably:
is better than:
unless your stuff is strongly time-related, and in any case better than:
which does not organise stuff well by time or anything else.
In my case the stuff is sorted first by a major category, eg "baby" or "food" or "places" and then structured by year and month directories, and then with long, unique, meaningful-names-that-SEs-and-humans-can-read.jpg.
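As an illustration only, a category/year/month layout with readable file names might be built like this in Python. The helper names `slugify` and `categorised_path` and the sample title are invented for the example, and the code does not check that names are unique, which would still need handling separately.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def categorised_path(category: str, taken: date, title: str, ext: str = "jpg") -> str:
    """Build category/year/month/readable-name.ext (hypothetical helper)."""
    return f"{slugify(category)}/{taken.year:04d}/{taken.month:02d}/{slugify(title)}.{ext}"


if __name__ == "__main__":
    print(categorised_path("food", date(2007, 11, 3), "Roast Chicken with Lemon & Thyme"))
```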