This takes a lot of time, and I really have no idea how I can simultaneously fetch, say, 4 URLs at the same time, just like a browser can fetch a few pictures at once.
Is that multitasking? How can I do it with PHP? I really have no idea.
I'm not sure if PHP is capable of multi-tasking, so the following is just a possibility. I don't know what you're trying to get out of the URLs that you're requesting, but this should give you an idea.
First, create one small script which fetches a specified web page, and then writes the contents to a specified file.
Next, create your main script. This script, instead of fetching the web pages itself, calls the first script with a few parameters (the URL to request and the file to write the contents to). Set it up in a loop so it calls the first script for each URL, and you'll be able to start the next call without waiting for the previous one to finish, giving you multiple instances of the first script running concurrently.
Once the second script has made it through the loop, it sits and waits until all of the files that were supposed to be created exist, and then it goes through each file and pulls what it needs. Your script will be able to access the files in series from the local file system much faster than it can in series from a remote server.
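For the waiting step, a rough sketch could look something like the following. The filenames and the simple polling loop are just assumptions; a real script would also want a timeout:

<?php
// Rough sketch of the waiting step, assuming the first script writes 0.html, 1.html, ...
$expected = array('0.html', '1.html', '2.html', '3.html');

// Poll until every expected file exists (a real script should also enforce a timeout)
do {
    clearstatcache();
    $missing = 0;
    foreach ($expected as $file) {
        if (!file_exists($file)) {
            $missing++;
        }
    }
    if ($missing > 0) {
        sleep(1);
    }
} while ($missing > 0);

// Everything is on the local disk now, so read it in series
foreach ($expected as $file) {
    $contents = file_get_contents($file);
    // ... pull whatever you need out of $contents ...
}
?>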
You understand exactly what I need.
Are you suggesting I do this?
for($i = 0; $i < sizeof($url); $i++)
{
FetchURL($url[$i]);
}
If this is the case, then the loop will only continue after the FetchURL function has finished executing. Thus, it still fetches one URL at a time.
What I need to know is whether it is possible to fetch a few URLs and write them to the hard disk concurrently, just like programs such as FlashGet and GetRight do, but in PHP.
[php.net...]
and read [php.net...]
Process Control support in PHP is not enabled by default. You have to compile the CGI or CLI version of PHP with the --enable-pcntl configuration option when compiling PHP to enable Process Control support. Note: Currently, this module will not function on non-Unix platforms (Windows).
So assuming you can meet the prerequisites, read the user comments; they should set you on the right track.
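For example, a minimal fork-per-URL sketch might look like the following, assuming pcntl is compiled in and allow_url_fopen is enabled. fetch_to_file() is a made-up helper here, not a built-in function:

<?php
// Minimal sketch, assuming the pcntl extension is compiled in (CLI only, non-Windows)
// and allow_url_fopen is enabled. fetch_to_file() is a made-up helper, not a built-in.
function fetch_to_file($url, $file)
{
    file_put_contents($file, file_get_contents($url));
}

$urls = array('http://www.example.com/a', 'http://www.example.com/b');
$pids = array();

foreach ($urls as $i => $url) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("could not fork\n");
    } elseif ($pid == 0) {
        // Child process: fetch one URL, write it to a file, then exit
        fetch_to_file($url, "$i.html");
        exit(0);
    }
    // Parent process: remember the child's PID and start the next one
    $pids[] = $pid;
}

// Parent waits for every child to finish before reading the files
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
?>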
I've had more than 10 "browsers" crawling simultaneously on my shared hosting account with no problems. (You can use 4 framesets instead of multiple browser windows; it's the same principle.)
Assuming you can use MySQL,
make a table with the URLs. When you "crawl" a URL, just delete that URL from the table, so the next "browser" gets the next URL.
Hope that makes some sense.
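A rough sketch of that idea follows. The table name (urls), its single url column, and the connection details are all made up for illustration:

<?php
// Rough sketch of a shared URL queue. Table name, column, and credentials are assumptions.
$db = new mysqli('localhost', 'user', 'password', 'crawler');

while (true) {
    // Look at the next URL in the queue
    $result = $db->query("SELECT url FROM urls LIMIT 1");
    if ($result === false || $result->num_rows == 0) {
        break; // queue is empty, this "browser" is done
    }
    $row = $result->fetch_assoc();
    $url = $row['url'];

    // Try to claim it by deleting it; if another "browser" beat us to it,
    // affected_rows is 0 and we simply try again
    $db->query("DELETE FROM urls WHERE url = '" . $db->real_escape_string($url) . "'");
    if ($db->affected_rows == 0) {
        continue;
    }

    // This URL is ours now, so fetch it and save the page
    file_put_contents(md5($url) . '.html', file_get_contents($url));
}
?>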
Close.. I'm suggesting this:
for($i = 0; $i < sizeof($url); $i++)
{
    // Redirect the output so exec() returns immediately instead of waiting for the child
    exec("./fetchurl.php $url[$i] $i.html > /dev/null 2>&1 &");
}
The way I'm passing the parameters is probably incorrect, because we're not calling a URL, we're doing it on the command-line (this should be identical to using backticks). The idea here is that we execute the script, telling it to request the URL $url[$i] and write the results to $i.html, and the & tells it to run the process in the background so the foreground becomes available again and your for() loop can continue without having to wait. This is how & works on the command-line.
I don't know if it works this way with exec().
This also requires the command-line version of PHP be installed.
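For what it's worth, here is a minimal guess at what fetchurl.php could look like, reading the URL and output filename from $argv. It's just a sketch, with error handling kept to a minimum:

#!/usr/bin/php
<?php
// Rough guess at fetchurl.php: read a URL and an output filename from the command line,
// fetch the page, and write the contents to the file. Needs the CLI version of PHP
// and allow_url_fopen enabled.
if ($argc < 3) {
    fwrite(STDERR, "Usage: fetchurl.php <url> <outfile>\n");
    exit(1);
}

$url  = $argv[1];
$file = $argv[2];

$contents = file_get_contents($url);
if ($contents === false) {
    exit(1);
}

file_put_contents($file, $contents);
?>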