Forum Library, Charter, Moderators: phranque

Website Technology Issues Forum

    
help storing 1000's of images in a file system
what system to use to ensure organised storage
proper_bo




msg:659971
 11:12 pm on May 2, 2006 (gmt 0)

I am going to be storing user images as part of a new site. The number will be in the thousands and may eventually reach five or even six figures.

I have read a lot about it and know that database storage would be slow with a large number of images so I am going with a file system approach.

What is the best system for organising lots of images?
I will have usernames and I could store some information in a database.

 

legallyBlind




msg:659972
 10:44 pm on May 10, 2006 (gmt 0)

A database will do just fine; file system and directory access will be much slower than a db call. I use MySQL to search over a million records and I get a result in 0.4 seconds.

proper_bo




msg:659973
 12:01 am on May 12, 2006 (gmt 0)

"file system and directory access will be much slower than a db call."

I heard the opposite: that a db call is much slower than a straight file read.
Opinions?

"I use mysql to search over a million records and I get a result in 0.4 seconds."

You mean the images are actually stored in the db?
You don't just store the path in the db?
It takes 0.4 seconds to search a million records and retrieve an image from a blob?

Does anyone else have thoughts on the best way to store 1,000 to 10,000 images?

carguy84




msg:659974
 6:02 am on May 12, 2006 (gmt 0)

Use the file system and add a new folder any time there are a couple of hundred images in the current folder. Save the path in a DB and you'll be set until you run out of drive space. Storing the images in a DB will get slower and slower the more images you add. Access via the file system will always remain linear if you folder it out right.

FWIW, I set my per-folder file count to about 3500 and I have no performance issues on a mid-level server getting about 4.5 million views a month, peaking at about 60/sec.
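
The rollover idea above could be sketched roughly like this in Python. This is only an illustration under assumptions: the function name `allocate_path`, the five-digit folder names and the limit of 200 are all made up for the example, and the relative path it returns is what you would save in the DB.

```python
import os

MAX_PER_FOLDER = 200  # assumed limit ("a couple of hundred"); tune to taste


def allocate_path(root, filename, max_per_folder=MAX_PER_FOLDER):
    """Return a relative path for a new image under root.

    Uses the highest-numbered folder, and opens the next numbered
    folder once the current one holds max_per_folder entries.
    The caller is expected to write the file and store the path in the DB.
    """
    folders = [d for d in os.listdir(root)
               if d.isdigit() and os.path.isdir(os.path.join(root, d))]
    current = max(folders, key=int) if folders else "00000"
    current_dir = os.path.join(root, current)
    os.makedirs(current_dir, exist_ok=True)

    if len(os.listdir(current_dir)) >= max_per_folder:
        # Current folder is full: roll over to the next number.
        current = f"{int(current) + 1:05d}"
        os.makedirs(os.path.join(root, current), exist_ok=True)

    return os.path.join(current, filename)
```

Note that this sketch is not safe if several processes upload at once; a real site would probably let the DB hand out the folder number instead of counting directory entries.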

Chip-

Hanu




msg:659975
 6:56 am on May 12, 2006 (gmt 0)

I second carguy84's suggestion. Store the images in files on a file system with enough inodes (the number of inodes can be specified at file system creation) and keep the path to the file in the database.
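
To see how many inodes an existing file system has left, something like the following Python sketch may help. It relies on `os.statvfs`, which is only available on Unix-like systems; the helper name `inode_usage` is made up for the example, and some file systems report 0 for both values.

```python
import os


def inode_usage(path):
    """Return (total, free) inode counts for the file system holding path.

    Relies on os.statvfs, so this only works on Unix-like systems.
    """
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


total, free = inode_usage(".")
print(f"inodes: {free} free of {total}")
```

Each stored image uses one inode, as does each directory, so the free count is a rough upper bound on how many more images the volume can hold regardless of free space.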

Let's say the maximum number of users you are ever going to have is 100000. To be on the safe side let's assume 10 times that much. So you need a directory scheme that holds a million files. It is not a good idea to have more than 10000 files in one directory (OTOH, this really depends on the file system). I think one thousand files per directory is a good size for every modern file system. That means you need a two-level hierarchy. Your image directory should contain 1000 subdirectories and each of those should contain 1000 image files. 1000 times 1000 is 1 million.

I would name the subdirectories from 000 to 999 and the files 000000.jpeg to 999999.jpeg. So the files 000000.jpeg to 000999.jpeg go into the 000 folder. The files 001000.jpeg to 001999.jpeg go into the 001 folder.

You don't need to store the entire path to the file in the database. The above 6 digits and maybe the extension are sufficient. Everything else is implicit. If your users already have some kind of unique integer id, you could use that integer to determine the name of the image. That way you don't need to store the name of the image file at all. But: if users are deleted, their ids are usually never assigned again, which means that the spot for the image file can never be reused.

This scheme will scale very well. It's also important to keep the images on a separate hard disk, otherwise the disk seeks will slow everything down dramatically.
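
As a rough Python sketch of the scheme described above (the function name `image_path` and the constants are illustrative only), the path can be derived from the integer alone:

```python
FILES_PER_DIR = 1000                 # files per subdirectory, as suggested above
MAX_FILES = FILES_PER_DIR * 1000     # 1000 subdirectories x 1000 files


def image_path(image_id, ext="jpeg"):
    """Map an integer id to 'NNN/NNNNNN.ext' in the two-level layout."""
    if not 0 <= image_id < MAX_FILES:
        raise ValueError(f"image_id must be between 0 and {MAX_FILES - 1}")
    subdir = image_id // FILES_PER_DIR   # 0..999 picks the folder
    return f"{subdir:03d}/{image_id:06d}.{ext}"


print(image_path(0))       # 000/000000.jpeg
print(image_path(1999))    # 001/001999.jpeg
```

Because the folder is just the id divided by 1000, the database only needs the id and perhaps the extension.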

proper_bo




msg:659976
 9:52 am on May 15, 2006 (gmt 0)

carguy84 and Hanu,

Thank you both for your input. I was already planning on using a system similar to what you have both suggested; I just wasn't sure about the folder structure.

Users will have unique IDs so I should be able to create filenames based on them and maybe a date or time. The folder structure will probably be very similar to what you suggested.

Again, thank you.

sja65




msg:659977
 3:14 pm on May 17, 2006 (gmt 0)

Like others have said, use a database to store locations and filesystem for the files themselves, and use subdirectories.

One addition to this that I use is that I also store a machine address in the database. I have a cluster of machines, and every file I want to store goes to 2 machines in the cluster. I can set availability of these machines so I can bring them down for maintenance without causing downtime for users. This also has the advantage of making the system very scalable - I am currently running about 20 TB of physical storage, or about 10 TB after redundancy.
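
A very small Python sketch of that idea follows. The dictionaries stand in for DB tables, and the machine names, `place_file` and `find_copy` are all hypothetical; it is not a description of the actual cluster, only of the "two copies plus an availability flag" principle.

```python
import random

# Hypothetical stand-ins for two DB tables.
machines = {"img01": True, "img02": True, "img03": True}  # address -> available?
locations = {}                                            # file id -> addresses


def place_file(file_id, copies=2):
    """Pick `copies` distinct available machines and record them."""
    up = [addr for addr, ok in machines.items() if ok]
    if len(up) < copies:
        raise RuntimeError("not enough machines available")
    locations[file_id] = random.sample(up, copies)
    return locations[file_id]


def find_copy(file_id):
    """Return the first machine that holds the file and is available."""
    for addr in locations[file_id]:
        if machines[addr]:
            return addr
    raise LookupError(f"no available copy of {file_id}")
```

Marking a machine as unavailable (for example `machines["img01"] = False`) makes `find_copy` fall through to the other copy, which is what allows maintenance without downtime.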

DamonHD




msg:659978
 3:35 pm on May 17, 2006 (gmt 0)

Hi,

I store ~20,000 images for my main site.

To agree with some points above:

1) Store the data (ie the images) in a filesystem: in my experience trying to handle the data in the DB may be 1000x slower (and a lot harder to back up)!

2) Use a proper tree-based directory structure, partly to help keep your sanity intact, but also because large flat directories may well prove much less efficient for one reason or another.

3) Do keep the metadata (eg file paths, image dimensions, etc) in a DB, preferably entirely in memory if performance becomes an issue. (I use a hand-crafted DB in Java, but the point remains valid.)
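
For point 3, here is a minimal Python sketch of keeping metadata in an in-memory database. It uses SQLite purely as a stand-in (the post describes a hand-crafted Java DB, not this); the table layout and the helper names `add_image` and `get_image` are assumptions for illustration.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; the data is lost on exit.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE images (
           id     INTEGER PRIMARY KEY,
           path   TEXT NOT NULL UNIQUE,  -- location on the file system
           width  INTEGER,
           height INTEGER
       )"""
)


def add_image(path, width, height):
    """Record one image's metadata and return its row id."""
    cur = conn.execute(
        "INSERT INTO images (path, width, height) VALUES (?, ?, ?)",
        (path, width, height),
    )
    conn.commit()
    return cur.lastrowid


def get_image(image_id):
    """Return (path, width, height) or None if the id is unknown."""
    return conn.execute(
        "SELECT path, width, height FROM images WHERE id = ?", (image_id,)
    ).fetchone()
```

The image bytes themselves never pass through the database; only the small metadata row does.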

Rgds

Damon

proper_bo




msg:659979
 9:57 pm on May 18, 2006 (gmt 0)

DamonHD, you mention using a proper structure for the folders. Could you give an example or examples of tree structures you have used or would suggest using?

I was just going to use folders named 00001 etc and place 1000 images in each, named using the username and the number of images that user had added.

An alternative would be to use a folder for each username but this could mean lots of folders with fewer than 10 images each.

Another alternative would be to use a structure several folders deep, something like /year/month/day/001/imagefilename.jpg

Any thoughts on a system for this?

DamonHD




msg:659980
 8:23 am on May 19, 2006 (gmt 0)

Hi,

Something based on date and username is clearly easy to program and is better than nothing, though you may be able to do better.

Try also to make your directory and file names easy for the search engines (SEs) to parse if you want hits from organic search or intend to use AdSense or similar, since the SEs may be able to gain clues from those names and from the structure.

If related stuff is close in the hierarchy that may help, so, probably:

.../username/yyyy/mm/dd/img.jpg

is better than:

.../yyyy/mm/dd/username/img.jpg

unless your stuff is strongly time-related, and in any case better than:

.../mm/dd/yyyy/username/img.jpg

which does not organise stuff well by time or anything else.

In my case the stuff is sorted first by a major category, eg "baby" or "food" or "places", and then structured by year and month directories, and then with long, unique, meaningful-names-that-SEs-and-humans-can-read.jpg.
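
A category/year/month layout with readable file names could be generated along these lines in Python. This is only a sketch: `slugify` and `readable_path` are made-up helper names, and the code does nothing to guarantee that names are unique.

```python
import re
from datetime import date


def slugify(title):
    """Lower-case the title and join its words with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "image"  # fall back if nothing usable is left


def readable_path(category, taken, title, ext="jpg"):
    """Build 'category/yyyy/mm/readable-name.ext' from the parts."""
    return f"{slugify(category)}/{taken:%Y}/{taken:%m}/{slugify(title)}.{ext}"


print(readable_path("places", date(2006, 5, 19), "Sunset over Brighton Pier!"))
# places/2006/05/sunset-over-brighton-pier.jpg
```

If two images could end up with the same title in the same month, appending a short id to the slug would keep the names unique.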

Rgds

Damon


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved