A database will do just fine; file system and directory access will be much slower than a DB call. I use MySQL to search over a million records and I get a result in 0.4 seconds.
|file system and directory access will be much slower than a DB call.|
I had heard that a DB call is much slower than a straight file call.
|I use mysql to search over a million records and I get a result in 0.4 seconds. |
You mean the images are actually stored in the DB?
You don't just store the path in the DB?
It takes 0.4 seconds to search a million records and retrieve an image from a BLOB?
Does anyone else have thoughts on this, i.e. the best way to store 1,000 to 10,000 images?
Use the file system and add a new folder any time there are a couple hundred images in the current folder. Save the path in a DB and you'll be set until you run out of drive space. Storing the images in a DB will get slower and slower the more images you add. Access via the file system will stay consistent if you split the folders out properly.
FWIW, I cap each folder at about 3,500 files, and I have no performance issues on a mid-level server getting about 4.5 million views a month, peaking at about 60 per second.
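A minimal sketch of the "start a new folder when the current one is full" approach, assuming a single writer, zero-padded numeric folder names (00000, 00001, ...), and a hypothetical helper name `current_folder`. The capacity default of 3,500 comes from the post above; counting files on every call is only for illustration, and a real system would more likely track the count in the DB and lock around concurrent uploads.

```python
from pathlib import Path


def current_folder(root: Path, capacity: int = 3500) -> Path:
    """Return the folder new images should go into (hypothetical helper).

    Folders are named with zero-padded sequence numbers (00000, 00001, ...),
    so sorting by name also sorts them by creation order. When the newest
    folder already holds `capacity` files, a new folder is created.
    """
    root.mkdir(parents=True, exist_ok=True)
    folders = sorted(p for p in root.iterdir() if p.is_dir() and p.name.isdigit())
    if folders:
        newest = folders[-1]
        # Count regular files only; image folders are not expected to nest.
        if sum(1 for f in newest.iterdir() if f.is_file()) < capacity:
            return newest
        next_index = int(newest.name) + 1
    else:
        next_index = 0
    folder = root / f"{next_index:05d}"
    folder.mkdir()
    return folder
```

The path that gets stored in the DB would then be something like `current_folder(root) / filename`, relative to `root`.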
I second carguy84's suggestion. Store the images as files on a file system with enough inodes (the number of inodes can be specified at file system creation time) and keep the path to each file in the database.
Let's say the maximum number of users you are ever going to have is 100,000. To be on the safe side, let's assume 10 times that many. So you need a directory scheme that holds a million files. It is not a good idea to have more than 10,000 files in one directory (though this really depends on the file system). I think 1,000 files per directory is a good size for every modern file system. That means you need a two-level hierarchy: your image directory should contain 1,000 subdirectories, and each of those should contain 1,000 image files. 1,000 times 1,000 is 1 million.
I would name the subdirectories from 000 to 999 and the files 000000.jpeg to 999999.jpeg. So the files 000000.jpeg to 000999.jpeg go into the 000 folder. The files 001000.jpeg to 001999.jpeg go into the 001 folder.
You don't need to store the entire path to the file in the database. The 6 digits above, and maybe the extension, are sufficient; everything else is implicit. If your users already have some kind of unique integer ID, you could use that integer to determine the name of the image. That way you don't need to store the name of the image file at all. But if users are deleted, their IDs are usually never assigned again, which means the slot for their image file can never be reused.
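A minimal sketch of the two-level naming scheme described above: the image number is the only thing stored, and the folder and file name are derived from it. The function names and the `FILES_PER_DIR`/`MAX_IMAGES` constants are hypothetical; the returned path is relative to whatever image root directory you use.

```python
from pathlib import PurePosixPath

FILES_PER_DIR = 1000      # 1,000 files in each subdirectory
MAX_IMAGES = 1_000_000    # 1,000 subdirectories x 1,000 files


def image_path(image_id: int, ext: str = "jpeg") -> PurePosixPath:
    """Map an integer ID to e.g. 001/001999.jpeg (relative to the image root)."""
    if not 0 <= image_id < MAX_IMAGES:
        raise ValueError(f"image_id {image_id} outside 0..{MAX_IMAGES - 1}")
    name = f"{image_id:06d}"                        # six digits, zero-padded
    folder = f"{image_id // FILES_PER_DIR:03d}"     # same as the first three digits
    return PurePosixPath(folder) / f"{name}.{ext}"


def image_id_from_path(path: PurePosixPath) -> int:
    """Inverse mapping: recover the integer ID from the file name."""
    return int(path.stem)
```

Because the mapping is reversible, the database column can be a plain integer (for example, the user ID), as the post suggests.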
This scheme will scale very well. It's also important to keep the images on a separate hard disk; otherwise the disk seeks will slow everything down dramatically.
carguy84 and Hanu,
Thank you both for your input. I was already planning on using a system similar to what you have both suggested, I just wasn't sure about the folder structure.
Users will have unique IDs, so I should be able to create filenames based on them and maybe a date or time. The folder structure will probably be very similar to what you suggested.
Again, thank you.
Like others have said, use a database to store locations and filesystem for the files themselves, and use subdirectories.
One addition I use is that I also store a machine address in the database. I have a cluster of machines, and every file I want to store goes to 2 machines in the cluster. I can set the availability of each machine, so I can take one down for maintenance without causing downtime for users. This also has the advantage of making the system very scalable: I am currently running about 20 TB of physical storage, or about 10 TB after redundancy.
I store ~20,000 images for my main site.
To agree with some points above:
1) Store the data (i.e. the images) in a file system: in my experience, trying to handle the data in the DB may be 1000x slower (and a lot harder to back up)!
2) Use a proper tree-based directory structure, partly to help keep your sanity intact, but also because large flat directories may well prove much less efficient for one reason or another.
3) Do keep the metadata (e.g. file paths, image dimensions, etc.) in a DB, preferably entirely in memory if performance becomes an issue. (I use a hand-crafted DB in Java, but the point remains valid.)
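As a stand-in for the metadata store in point 3, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database (`":memory:"`). This is a swapped-in illustration, not the poster's hand-crafted Java DB or the MySQL setup mentioned earlier; the table, column, and function names are hypothetical. Only the path and a few attributes live in the DB, while the image bytes stay on disk.

```python
import sqlite3


def open_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a metadata table: paths and dimensions, never image bytes."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id         INTEGER PRIMARY KEY,
               user_id    INTEGER NOT NULL,
               rel_path   TEXT    NOT NULL UNIQUE,  -- relative to the image root
               width      INTEGER NOT NULL,
               height     INTEGER NOT NULL,
               size_bytes INTEGER NOT NULL
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS images_by_user ON images(user_id)")
    return conn


def add_image(conn, user_id, rel_path, width, height, size_bytes) -> int:
    """Insert one metadata row and return its ID."""
    cur = conn.execute(
        "INSERT INTO images (user_id, rel_path, width, height, size_bytes) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, rel_path, width, height, size_bytes),
    )
    conn.commit()
    return cur.lastrowid


def paths_for_user(conn, user_id) -> list[str]:
    """List a user's image paths in insertion order."""
    rows = conn.execute(
        "SELECT rel_path FROM images WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]
```

Passing a file name instead of `":memory:"` gives the same schema on disk, which is easier to back up.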
DamonHD, you mention using a proper structure for the folders. Could you give an example or two of tree structures you have used or would suggest?
I was just going to use folders named 00001 etc. and place 1,000 images in each, naming each image with the username and the number of images that user had added.
An alternative would be to use a folder for each username, but this could mean lots of folders with fewer than 10 images in them.
Another alternative would be to use several levels of folders, something like /year/month/day/001/imagefilename.jpg
Any thoughts on a system for this?
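A minimal sketch of how the /year/month/day/001/imagefilename.jpg layout might be computed, assuming a hypothetical per-day upload counter `seq_in_day` (0 for the first image of the day, tracked in the DB) and 1,000 images per numbered bucket folder. The function name and the bucket size are assumptions, not part of the question.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_path(upload_day: date, seq_in_day: int, filename: str,
               per_folder: int = 1000) -> PurePosixPath:
    """Build year/month/day/NNN/filename, starting a new NNN bucket every per_folder images."""
    if seq_in_day < 0:
        raise ValueError("seq_in_day must be >= 0")
    bucket = seq_in_day // per_folder + 1   # buckets are numbered 001, 002, ...
    return PurePosixPath(
        f"{upload_day.year:04d}",
        f"{upload_day.month:02d}",
        f"{upload_day.day:02d}",
        f"{bucket:03d}",
        filename,
    )
```

Zero-padding the month, day, and bucket keeps directory listings sorted chronologically.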
Something based on date and username is clearly easy to program and is better than nothing, though you may be able to do better.
Also try to make your directory and file names easy for the search engines (SEs) to parse if you want hits from organic search or intend to use AdSense or similar, since the SEs may be able to gain clues from your directory and file naming and structure.
If related stuff is close in the hierarchy that may help, so, probably:
is better than:
unless your stuff is strongly time-related, and in any case better than:
which does not organise stuff well by time or anything else.
In my case the stuff is sorted first by a major category, e.g. "baby" or "food" or "places", then by year and month directories, and then given long, unique, meaningful-names-that-SEs-and-humans-can-read.jpg.
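A minimal sketch of the category/year/month/readable-name layout described above. The slug rules (lowercase ASCII, runs of other characters collapsed to hyphens) and the `unique_suffix` argument used to guarantee uniqueness are assumptions for illustration; the post only says the names are long, unique, and readable. Function names are hypothetical.

```python
import re
import unicodedata
from datetime import date
from pathlib import PurePosixPath


def slugify(text: str) -> str:
    """Turn free text into a lowercase, hyphen-separated ASCII name."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "image"


def seo_path(category: str, taken: date, title: str, unique_suffix: str) -> PurePosixPath:
    """Build category/YYYY/MM/readable-title-suffix.jpg (suffix keeps names unique)."""
    name = f"{slugify(title)}-{unique_suffix}.jpg"
    return PurePosixPath(slugify(category), f"{taken.year:04d}", f"{taken.month:02d}", name)
```

For example, `seo_path("Food", date(2007, 11, 3), "Roast Chestnuts in Borough Market", "a1")` gives `food/2007/11/roast-chestnuts-in-borough-market-a1.jpg`.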