
uploading an image and then processing it later

lots of image uploads to process similar to flickr

         

proper_bo

11:13 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



I have an upload form that allows up to 6 images of up to 5MB each to be uploaded.
The script uploads them, resizes the images (similar to flickr), and then inserts rows for them and their tags into the database.
The whole thing can take upwards of 30 seconds depending on the image sizes etc.
I was wondering if a better way would be to upload the (up to) 6 original images, do the database stuff, and then do the resizing with a separate script afterwards (similar to flickr).
Does anyone have any ideas about how I should go about this?

My ideas:
Upload the original image and do the database stuff.
Stick a line in a 'to be processed' table.
The next bit is where I am not sure.
* Run a cron job every second, or every few seconds, to grab a new line from the database and process the resizing?
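One wrinkle with the cron idea, for what it's worth: standard cron can't fire more often than once per minute, so "every second" would need either a small daemon or a loop inside a once-a-minute script. A rough sketch of a crontab entry (the paths are placeholders, not real ones), with a lock so overlapping runs can't happen:

```
# Illustrative crontab line - paths are made up for the example.
# flock -n skips this run if the previous one still holds the lock,
# so a slow queue never piles up overlapping workers.
* * * * * /usr/bin/flock -n /tmp/img_queue.lock /usr/bin/php /path/to/process_queue.php
```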

I can't think of a good way to get a script to check for images that need resizing when sometimes there will be a queue waiting and sometimes there will be nothing waiting.

Ideas?

Thank you.

siMKin

11:55 pm on Jun 29, 2006 (gmt 0)


Before starting any kind of optimization, I think it would be wise to investigate where the bottleneck is. If the upload itself takes 29.5 seconds and the processing of the data (once it has been uploaded) only 0.5 seconds, you gain very little by optimizing the processing part.
To investigate this you would need to measure how long each of these steps takes.
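To put numbers on that: wrapping each stage with microtime() gives a quick read on where the 30 seconds actually go. A minimal sketch — the stage body here is a stand-in (a short sleep), not the real resize code:

```php
<?php
// Hypothetical timing harness: run a stage and report how long it took.
function elapsed_ms(callable $stage): float {
    $start = microtime(true);        // high-resolution timestamp, in seconds
    $stage();
    return (microtime(true) - $start) * 1000.0;
}

// Example: time a stand-in for the resize step (usleep fakes ~20 ms of work).
$ms = elapsed_ms(function () {
    usleep(20000);
});
printf("resize stage took %.1f ms\n", $ms);
```

The same wrapper can go around the database inserts and the resize loop separately, so each stage gets its own number.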

Why do you store the images in the database, btw? There are some exceptions where this can be beneficial, but usually storing images in the database is not a good idea; storing only a reference to where they are located on the server is much better.

avant_garde

11:58 pm on Jun 29, 2006 (gmt 0)


How heavy a server load do you expect to have? That is definitely important. If it's not going to be all that heavy, I think processing the images at upload time would be best.

If you want to queue the images for later processing, I'd recommend storing the image files in their own directory and creating a record for each in a database table that is solely for queued images. You'd store data such as the file path, the user id the image is associated with, etc.

Then have cron call a script once or twice a day at less busy times. Have this script process the images, delete them from the queue directory and table, and store their data in the "real" image table and the image files in the "real" image directory.
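As a sketch of the file-moving half of that (the database half is omitted, and the directory names are invented for the example):

```php
<?php
// Move every queued file out of the "queue" directory into the "real"
// image directory; returns how many files were moved.
function drain_queue_dir(string $queueDir, string $realDir): int {
    $moved = 0;
    foreach (glob($queueDir . '/*') as $file) {
        if (is_file($file)) {
            rename($file, $realDir . '/' . basename($file));
            $moved++;
        }
    }
    return $moved;
}

// Usage with throwaway temp directories standing in for the real ones.
$base  = sys_get_temp_dir() . '/imgdemo_' . uniqid();
$queue = $base . '/queue';
$real  = $base . '/real';
mkdir($queue, 0777, true);
mkdir($real, 0777, true);
file_put_contents($queue . '/photo1.jpg', 'fake image data');

$moved = drain_queue_dir($queue, $real);
// $moved is 1, and photo1.jpg now lives in $real
```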

As always, just experiment some and see what works best for you.

proper_bo

7:54 am on Jun 30, 2006 (gmt 0)


siMKin - the images themselves are not stored in the database (I don't think I made that very clear). The image information is stored in the database; the images are stored on the file system.

avant_garde - I agree about the server load. If I never expected it to become large I would do all the processing during upload. I fear, though, that it may get to the point where the load is high and the code is already old, so I am planning for the future.

The reason I want to separate the two processes is that I already know the upload takes quite a long time if there are 6 large images, BUT there is nothing that can be done about that. An upload is an upload, and the user must sit and wait for it to complete.
The processing, however, does not have to be done while the user waits; it can be done in the background.

I thought about this a lot last night and realised that flickr shows the user a page during upload and then, when the upload is done, switches to a new page. The new page tells the user they can navigate away, because as soon as that page has loaded, a PHP script has already been called in the background that has begun processing the images. The script will have a 'no timeout' rule in it so that at busy times it can take as long as it needs to complete.
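For reference, the 'no timeout' part comes down to two stock PHP calls; the actual processing loop is elided here:

```php
<?php
// Both of these are standard PHP functions.
ignore_user_abort(true); // keep running even if the user closes the page
set_time_limit(0);       // 0 = no max_execution_time, i.e. 'no timeout'

// ... fetch this user's queued rows and resize each image here ...
echo "background worker configured\n";
```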

So my plan is to do the uploads and stick a 'to process' line into a new table for each image that was uploaded. When the upload completes, the user is sent to a new page where a script is called that gets (and processes) all the images in the 'to process' queue for that user in the background and then deletes the lines from the table. If there is a problem processing, the line will remain and a cron job can pick the missed lines up every hour or so.
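The leave-failures-for-cron behaviour can be modelled without a database, just to check the logic; here the queue is a plain array and "processing" is a stand-in callback (in the real thing it would be a table and the resize code):

```php
<?php
// Toy model of the plan: process each queued item; items whose processing
// throws stay in the queue for a later cron sweep.
function process_queue(array $queue, callable $process): array {
    $remaining = [];
    foreach ($queue as $item) {
        try {
            $process($item);      // e.g. resize the image on disk
        } catch (Exception $e) {
            $remaining[] = $item; // leave the row for the hourly cron job
        }
    }
    return $remaining;
}

// Simulate one image failing to resize.
$queue = ['a.jpg', 'b.jpg', 'c.jpg'];
$left = process_queue($queue, function ($img) {
    if ($img === 'b.jpg') {
        throw new Exception('resize failed');
    }
});
// $left is ['b.jpg']: only the failed item stays queued
```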

Thoughts?