Progress meter

xml feed and subsequent mysql insert

         

martymac

3:06 pm on Oct 19, 2009 (gmt 0)

10+ Year Member



I have a script that queries an API which returns an XML feed (using SimpleXML). One of two responses can happen.

1. API returns all data in the XML feed -> data is inserted into the DB
2. API queues larger responses and builds a .xml.gz data file, which is then FTPed to my server, opened, and inserted into the DB.

I'm trying to build a progress meter into the script for scenario 2, the larger uploads, but I can't wrap my mind around how to design it. Is there a way to count the number of records inserted so far and return that to the browser without interrupting the reading of the .xml.gz file? Or could I launch a new window that displays the position of the file pointer as the script reads to the end of the .xml.gz file?

This is a bit above my level of knowledge. Anyone have any ideas they can give me? Thanks!

rocknbil

5:24 pm on Oct 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



HTTP requests are "stateless": you make a request, the server hands it to a process, the process sends the response, and its work for that request is over. Nothing carries over to the next request. If the script takes too long, the browser times out without ever seeing a response, but, as in your case with a lot of records to insert, the script itself can keep running on the server.

The way I do these is to

- submit or start the time-consuming process from your script

- spawn a child process using fork() (PHP equivalent: pcntl_fork() [us.php.net].) In the child process, you do your time-consuming task.

- The parent process captures the process id of the spawned child and immediately returns a response. In that response is an iFrame. The iFrame calls a second script that merely returns a count of inserted records. You set the http refresh of the frame's content to five seconds, so the iFrame calls the script every five seconds and returns "x records inserted." You also pass the child process id along, like

<meta http-equiv="refresh" content="5; url=yourscript.php?pid=12345">

- The counter script also checks for the process id of the child spawned by the initial script. If that process has died, you can stop refreshing the iFrame and display "process done." The iFrame can also contain a link to kill the process if you want to abort.

Obviously none of these are static pages. The iFrame content itself is dynamically created output: if the process is still alive, print the http refresh meta; if not, don't print it.
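
To make the "print the refresh meta only while the process is alive" idea concrete, here is a minimal sketch of the iFrame output. The function name render_status_page() and its parameters are made up for illustration, the URL just follows the yourscript.php?pid=12345 example above, and it assumes PHP 7 or later for the type hints. How you get the count and the alive flag is left out here.

```php
<?php
// Hypothetical helper: builds the HTML shown inside the iFrame.
// $inserted and $alive are assumed to come from your own count query
// and process check; neither is implemented here.
function render_status_page(int $pid, int $inserted, bool $alive): string
{
    $head = '';
    if ($alive) {
        // Only a live process gets the refresh tag, so the polling
        // stops by itself once the child is gone.
        $url  = 'yourscript.php?pid=' . $pid;
        $head = '<meta http-equiv="refresh" content="5; url='
              . htmlspecialchars($url) . '">';
    }

    $body = $alive
        ? $inserted . ' records inserted.'
        : 'Process done. ' . $inserted . ' records inserted.';

    return "<html><head>$head</head><body>$body</body></html>";
}

// Still running: the page asks the browser to reload in five seconds.
echo render_status_page(12345, 250, true), "\n";
// Finished: no refresh tag, so the iFrame stays on the final count.
echo render_status_page(12345, 900, false), "\n";
```

Keeping the markup in one function like this means the alive/dead decision is made in exactly one place.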

Seems to work out well for large intensive processes and prevents them from timing out in the browser.

martymac

4:11 pm on Oct 23, 2009 (gmt 0)

10+ Year Member



I'm not sure I quite understand...

So say the script that pulls the XML and inserts records into the DB is called xmlpull.php and the second script that returns the count of inserted records is called count.php. Which is the parent and which is the child?

rocknbil

5:18 pm on Oct 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, parent and child are within one script; see the link above and play with it. The "counter" is an external script, but it must "know" the process ID of the child spawned by the originating script.

Briefly, something like this.


if (isset($_POST['do_big_function'])) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        // fork failed, output an error page and stop
    }
    elseif ($pid) {
        // PARENT - HERE you output a page "performing update"
        // This page contains an IFRAME calling script #2
        // $pid is the CHILD id you are passing to the second
        // script, something like
        // <iframe src="counter-script.php?p=$pid"></iframe>
        // pcntl_wait() blocks until the child exits, so send
        // (and flush) the page before you get here
        pcntl_wait($status); // Protect against zombie children
    }
    else {
        // CHILD - HERE you are performing your
        // time-intensive update
    }
}
else {
    // on first load, output form - maybe just a
    // link to start process, or maybe you can
    // upload or select to/from data here
}

Then in counter-script.php,


$pid = isset($_GET['p']) ? (int) $_GET['p'] : 0;
if ($pid > 0) {
    if (isset($_GET['kill']) and ($_GET['kill'] == $pid)) {
        // if you wish to abort this process and kill is requested
        // execute the kill command via system(), see docs
        // output that the process has been killed
        // do NOT output the meta-refresh header
        // if kill has been requested, you might also
        // want to execute "DELETE FROM table" to clean up
    }
    else {
        // if the process is still alive, output this page with
        // the meta-refresh header that requests
        // counter-script.php?p=$pid (if it has died, leave the
        // header out and output "process done")
        // SELECT COUNT(*) from the insert table, display it
        // Output a "kill" link, like
        // <a href="counter-script.php?p=$pid&kill=$pid">
    }
}
else {
    // output "no process found, are you calling this
    // directly? load the first script first"
}
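
For the "is the process still alive" check in the counter script, one possible approach is signal 0 via posix_kill(). This is only a sketch: the helper name process_is_alive() is made up, and it assumes the posix extension is loaded and that the counter script runs as the same user as the child.

```php
<?php
// Hypothetical helper: true if a process with this id can be signalled.
function process_is_alive(int $pid): bool
{
    // Reject 0 and negative ids: to posix_kill() those mean
    // "a process group", not a single process.
    if ($pid <= 0 || !function_exists('posix_kill')) {
        return false;
    }
    // Signal 0 delivers nothing; it only tests whether the target exists
    // and may be signalled. It also returns false for a process owned
    // by another user.
    return posix_kill($pid, 0);
}

// The current process is certainly alive, so this prints "alive"
// whenever the posix extension is available.
echo process_is_alive(getmypid()) ? "alive\n" : "not alive (or posix missing)\n";
```

A child that has exited but has not yet been waited on still counts as existing, which is one more reason to keep the wait call in the parent.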

The above is not working code; it is for logic only.

So, in effect, you initiate the time-intensive work in the child process, which "runs in the background." The parent immediately returns a response, and that response calls an external script to check up on whether the child(ren) is/are behaving themselves. :-)
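
If you want to watch the parent/child split happen before wiring it into a page, the following is a rough command-line sketch, not the web version. It assumes the pcntl extension is loaded (the PHP manual advises against using it inside a web server), and it swaps the database count for a temp file so it runs without MySQL. run_demo() and the progress file are made-up stand-ins.

```php
<?php
// Sketch only: run from the command line with the pcntl extension loaded.
// run_demo() and the temp progress file are hypothetical stand-ins for
// the real insert loop and the "select count" query.
function run_demo(int $total): int
{
    $progressFile = tempnam(sys_get_temp_dir(), 'progress_');

    $pid = pcntl_fork();
    if ($pid === -1) {
        unlink($progressFile);
        return -1; // fork failed
    }

    if ($pid === 0) {
        // CHILD: pretends to insert records and writes the running count.
        for ($i = 1; $i <= $total; $i++) {
            usleep(50000); // stand-in for one insert
            file_put_contents($progressFile, (string) $i, LOCK_EX);
        }
        exit(0);
    }

    // PARENT: polls without blocking, much as the counter script would.
    do {
        usleep(80000);
        // With WNOHANG, pcntl_waitpid() returns 0 while the child is
        // still running and the child's pid once it has exited.
        $result = pcntl_waitpid($pid, $status, WNOHANG);
        $done   = $result !== 0;
        $count  = (int) file_get_contents($progressFile);
        echo "$count of $total records inserted\n";
    } while (!$done);

    unlink($progressFile);
    return $count; // last count read after the child exited
}

if (function_exists('pcntl_fork')) {
    run_demo(5);
    echo "process done\n";
}
```

The non-blocking wait is the main difference from the outline above: the parent can keep reporting while the child works, and the child is still reaped when it finishes.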