|How do you Store Millions of Images on your Server?|
| 4:24 pm on Mar 9, 2011 (gmt 0)|
I want to be able to scale to millions of user profile pics on my Server.
I currently store all images in one folder, which is a big no-no, so I want to spread them out into many folders and sub-folders (e.g. aa/bb/ etc...).
What is the best and most efficient way of doing that, especially if I do not want to have to call the DB to get the filename/path for that user's profile pic?
I'm thinking of hashing the username and using the first 4 characters of that hash to generate/locate the path for that user's profile pic. That way I wouldn't need anything extra from the DB, since I will always have the user's username. For example, if the first 4 characters of the user's username hash were "aabb", I would store that user's profile pic under aa/bb/username/profile.jpg. In theory that should let me scale to millions of users without adding anything to the DB, while spreading the pics evenly across the folder structure (00/00/ through ff/ff/ if the hash is written in hex).
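To make the idea concrete, here is a minimal sketch in Python. The function name profile_pic_path, the "avatars" root folder, the choice of SHA-256 and the username filter are all assumptions made for illustration; any stable hash would do, as long as the same one is used when writing and when reading.

```python
import hashlib
import re
from pathlib import PurePosixPath

# Hypothetical rule for what counts as a safe username in a path (an assumption).
SAFE_USERNAME = re.compile(r"[a-z0-9_-]+")


def profile_pic_path(username: str, root: str = "avatars") -> PurePosixPath:
    """Derive aa/bb/username/profile.jpg from the username alone, no DB lookup."""
    # Normalize so "Alice" and "alice" land in the same place (an assumption).
    key = username.strip().lower()
    if not SAFE_USERNAME.fullmatch(key):
        # Refuse anything that could escape the folder tree, e.g. "../x".
        raise ValueError(f"unsafe username for a path: {username!r}")
    # The hex digest only contains 0-9 and a-f, so each level has 256 folders.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return PurePosixPath(root, digest[:2], digest[2:4], key, "profile.jpg")


if __name__ == "__main__":
    print(profile_pic_path("Alice"))
```

The path is rebuilt from the username every time it is needed, so nothing about the image has to be stored in the DB. The catch is that a renamed user hashes to a different folder, so the pic would have to be moved on rename.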
| 6:45 pm on Mar 9, 2011 (gmt 0)|
Every user has a unique ID and also a unique nickname. You could use either of those to identify the images. If the ID is too long, I'd go with the nickname, after running it through some filters to strip problematic characters or symbols. I think it would also help on the SEO side of naming profile images, as nicknames won't change.
I'm using that on a product database where every image is named after the product's unique ID.
| 1:20 am on Mar 10, 2011 (gmt 0)|
Hi explorador, thanks for your input, but how would you organize the folder structure based on the username such that the millions of profile pics are evenly-spread without overloading any one folder?
Does anyone else have any other ideas/input?
| 1:45 am on Mar 11, 2011 (gmt 0)|
| 6:05 pm on Mar 15, 2011 (gmt 0)|
I think your hash idea is best actually. With a hex hash you're looking at 16^2 = 256 top-level directories, each with 16^2 = 256 subdirs, so 65,536 leaf directories in total. I think I read that when *nix systems perform certain directory/file ops, they have to read the whole dir into memory, so spreading them out makes sense.
I wonder though if simply using 16^3 directories (first three chars) would be enough. 4 million users would give you 4096 directories with roughly 1000 entries each, which seems more manageable to me, though you'd have to look around for any performance implications.
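The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch: the helper name avg_entries_per_bucket is made up, and it assumes the hash spreads users evenly, so real folders will vary around the average.

```python
def avg_entries_per_bucket(users: int, hex_chars: int) -> float:
    """Average entries per leaf folder when the first hex_chars of a hex hash pick the folder."""
    buckets = 16 ** hex_chars  # each hex character has 16 possible values
    return users / buckets


if __name__ == "__main__":
    for hex_chars in (2, 3, 4):
        buckets = 16 ** hex_chars
        average = avg_entries_per_bucket(4_000_000, hex_chars)
        print(f"{hex_chars} chars: {buckets} folders, about {round(average)} entries each")
```

For 4 million users that works out to about 977 entries per folder with three characters, and about 61 per folder with four.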
| 5:37 pm on Mar 17, 2011 (gmt 0)|
Probably something that could be answered or asked at High Scalability.
|brotherhood of LAN|
| 5:55 pm on Mar 17, 2011 (gmt 0)|
If it's your own server I'd consider having a separate partition, and choosing a filesystem format that best suits your needs. I don't have any particular filesystem recommendations, but there are enough differences between them to make it a proper consideration. One example is block size: each file's space on disk is 'rounded up' to a multiple of the filesystem's block size, so with lots of small files a smaller block size wastes less space.
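A rough way to see the 'rounding up' effect is sketched below. The 3000-byte file size and the two block sizes are made-up example figures, and the model is deliberately simple: it ignores inode and directory overhead, and some filesystems can pack small files more tightly than this.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Space a file takes on disk if it always occupies whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    file_size = 3000  # hypothetical small profile pic, in bytes
    for block_size in (1024, 4096):
        used = allocated_bytes(file_size, block_size)
        print(f"block {block_size}: uses {used} bytes, wastes {used - file_size}")
```

Under these assumptions the same pic wastes 72 bytes with 1024-byte blocks and 1096 bytes with 4096-byte blocks, a difference that adds up over millions of files.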
| 5:10 pm on Mar 18, 2011 (gmt 0)|
|how would you organize the folder structure based on the username such that the millions of profile pics are evenly-spread without overloading any one folder? |
Hi, kinda late, but I would use alphabetic organization there.