I've got a massive amount of data to process in a MySQL database. I'd like to run it from the CLI so that it can run until complete (if it breaks then it can be restarted from where it got to).
My previous data manipulation has all been triggered by browsers as the scripts were just being tested and it was easier that way. I'm sure they are solid and now just need time to run.
Are there any problems that I may encounter by running a script that will take days to complete? I'm thinking of resource issues, what happens if I need to stop the script...
The DB is likely to reach around 5GB, with around 100 million selects/updates required to get there (DB maintenance is another issue, but let's assume that everything is OK there).
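One way to get the "restart from where it got to" behaviour is to process rows in id order and save the last id handled. The following is only a minimal sketch under that assumption. The names runResumable, $fetchBatch, $processRow and the checkpoint file path are hypothetical, and an in-memory array stands in for the MySQL table; in a real script $fetchBatch would wrap a query along the lines of SELECT ... WHERE id > ? ORDER BY id LIMIT ?.

```php
<?php
// Hypothetical sketch: process rows in id order and remember the last id
// handled, so that a restarted run skips everything already done.

function runResumable(callable $fetchBatch, callable $processRow, string $checkpointFile, int $batchSize = 1000): int
{
    // Resume from the saved id, or start from 0 on the very first run.
    $lastId = is_file($checkpointFile) ? (int) file_get_contents($checkpointFile) : 0;
    $processed = 0;

    while (true) {
        // $fetchBatch must return rows with id > $lastId, ordered by id ascending.
        $rows = $fetchBatch($lastId, $batchSize);
        if (count($rows) === 0) {
            break; // nothing left to do
        }
        foreach ($rows as $row) {
            $processRow($row);
            $lastId = $row['id'];
            $processed++;
        }
        // Save progress once per batch; after a crash at most one batch is repeated.
        file_put_contents($checkpointFile, (string) $lastId);
    }
    return $processed;
}

// Stand-in data source (an array instead of a MySQL table), for demonstration only.
$table = [];
for ($i = 1; $i <= 25; $i++) {
    $table[] = ['id' => $i, 'value' => $i * 2];
}
$fetchBatch = function (int $afterId, int $limit) use ($table): array {
    $rows = array_filter($table, function (array $r) use ($afterId) {
        return $r['id'] > $afterId;
    });
    return array_slice(array_values($rows), 0, $limit);
};

$checkpoint = sys_get_temp_dir() . '/resumable_demo.checkpoint';
if (is_file($checkpoint)) {
    unlink($checkpoint); // start the demo from a clean state
}

$seen = [];
$count = runResumable($fetchBatch, function (array $row) use (&$seen) {
    $seen[] = $row['id'];
}, $checkpoint, 10);

echo "Processed $count rows, checkpoint at " . file_get_contents($checkpoint) . "\n";
```

Because the checkpoint is written per batch rather than per row, the per-row work needs to be safe to repeat if the script dies mid-batch. Running the script a second time with the checkpoint file in place finds nothing left to process.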
How do you know whether the procedure is working, has completed, or has just hung? Are you doing some kind of checkpoint logging?
I don't know exactly what you are doing, but I'd recommend splitting the procedure into distinct steps and logging after each one.
For example, one of my database update routines runs something like this:
initialise.php (if ok log initialised)
import.php (if ok log import_success)
index.php (if ok log indexes_created)
magic_numbers.php (if ok log magic_numbers_generated)
consistency_check.php (if ok log database_consistent)
Logging can be as simple as piping the output to a log.txt file.
That's a simplistic view, but I really think you need to work out whether, and why, your routine should take 5 days to complete. Something is not right with that.
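The step-and-log approach above could be sketched roughly as follows. This is only an illustration: runSteps and the log file path are hypothetical names, and the closures are placeholders for whatever initialise.php, import.php and the other scripts actually do. One step is made to fail on purpose to show that the run stops there.

```php
<?php
// Hypothetical sketch: run named steps in order, append one log line per
// step, and stop at the first failure so later steps never run on bad data.

function runSteps(array $steps, string $logFile): bool
{
    foreach ($steps as $name => $step) {
        try {
            $ok = (bool) $step();
        } catch (Throwable $e) {
            $ok = false; // treat an exception as a failed step
        }
        $status = $ok ? 'ok' : 'FAILED';
        file_put_contents($logFile, date('c') . " $name $status\n", FILE_APPEND);
        if (!$ok) {
            return false;
        }
    }
    return true;
}

$logFile = sys_get_temp_dir() . '/steps_demo.log';
if (is_file($logFile)) {
    unlink($logFile); // start the demo with an empty log
}

// Placeholder steps; each returns true on success.
$steps = [
    'initialised'             => function () { return true; },
    'import_success'          => function () { return true; },
    'indexes_created'         => function () { return false; }, // simulated failure
    'magic_numbers_generated' => function () { return true; },
];

$finished = runSteps($steps, $logFile);
echo file_get_contents($logFile);
echo $finished ? "All steps completed\n" : "Stopped early, see log\n";
```

The last line of the log then shows which step the run reached, which also answers the "is it working, finished, or hung" question at a glance.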
As for seeing whether the process is working correctly, I can manually check the DB to see that the correct type of data is being created. Thankfully it's not too difficult to verify that the process is working; it's just a lot of processing.
I'm interested to hear if people use PHP scripts that continuously run or if they just use cron jobs to trigger regular updates etc.
A few ways I've run long-running scripts:
1. Run from a command window and output the progress to the screen.
2. Run from batch files and run behind the scenes.
3. Store each script in a DB and have an executable come along and fire them up based on a time-to-start field in the DB.
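Whichever of these is used, a job that is started on a schedule can be launched again while the previous run is still going. A minimal sketch of one guard against that, using PHP's flock() with a lock file, is shown below. The function name acquireJobLock and the lock file path are hypothetical.

```php
<?php
// Hypothetical sketch: take an exclusive, non-blocking lock so that a second
// copy of the job exits immediately instead of overlapping the first.

function acquireJobLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // create if missing, do not truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null; // another run holds the lock
    }
    return $handle; // keep this open for the life of the job
}

$lockFile = sys_get_temp_dir() . '/long_job_demo.lock';
$lock = acquireJobLock($lockFile);
if ($lock === null) {
    echo "Another run is in progress, exiting.\n";
    exit(0);
}

echo "Lock acquired, doing work...\n";
// ... the long-running work would go here ...

// Release explicitly; the lock is also dropped when the process ends.
flock($lock, LOCK_UN);
fclose($lock);
```

Since the lock goes away when the process exits, a crashed run does not leave the job permanently blocked the way a plain "running" flag file could.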
JAG