Forum Moderators: coopster
I'm having some difficulty with my PHP crawler, and was hoping some of you might have optimization suggestions :)
- I have a PHP crawler which crawls about 30 websites (each with about 100-200 subpages) in a single-threaded environment. The crawler is started by a cron job each night.
- The crawler retrieves the webpages using:
$page = implode('', file($url));
The content found on each page is then saved to a MySQL database as it is found.
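As a side note on the fetch itself: file() gives no per-request timeout control, so a single slow host can stall the whole run. One common alternative is the cURL extension, which lets you cap connect and transfer times per page. A minimal sketch (assuming the cURL extension is installed; fetch_page is just an illustrative name, and the timeout values are arbitrary):

```php
<?php
// Sketch only: fetch one page with cURL instead of file(),
// with explicit timeouts so a slow server can't hang the crawl.
function fetch_page($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up connecting after 10s
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up on the transfer after 30s
    $page = curl_exec($ch);
    curl_close($ch);
    return ($page === false) ? '' : $page;
}

$page = fetch_page($url); // drop-in for the implode('', file($url)) line
```

curl_exec() returns false on failure (including a timeout), so you can log the bad URL and move on to the next page instead of aborting the night's run.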
The problem is that some nights this operation times out and only half of the pages get crawled, so I have to restart the crawler the next morning to have all the sites crawled.
So I was hoping some of you could help me figure out a way to optimize my crawler so it doesn't time out all the time.
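One approach that might help regardless of where the timeout comes from: lift PHP's execution limit for the cron run, and record progress in MySQL so a restarted run skips pages that were already crawled. A rough sketch under assumptions (a hypothetical crawl_queue table with url and crawled columns, and placeholder connection credentials; adapt to your real schema):

```php
<?php
// Sketch only: make the nightly crawl resumable.
// Assumes a hypothetical table crawl_queue (url VARCHAR, crawled TINYINT).
set_time_limit(0); // remove PHP's execution-time limit for the CLI/cron run

$db = new mysqli('localhost', 'user', 'pass', 'crawler'); // placeholder credentials

// Only fetch URLs not yet marked as crawled, so a restart
// picks up where the previous run stopped.
$result = $db->query("SELECT url FROM crawl_queue WHERE crawled = 0");
while ($row = $result->fetch_assoc()) {
    $page = implode('', file($row['url'])); // your existing fetch
    // ... parse $page and save the content to MySQL as you do now ...

    // Mark this page done immediately after saving its content.
    $stmt = $db->prepare("UPDATE crawl_queue SET crawled = 1 WHERE url = ?");
    $stmt->bind_param('s', $row['url']);
    $stmt->execute();
}
```

Reset the crawled flags at the start of each night's cron job; then if the run dies halfway, a restart only costs you the remaining pages instead of the whole crawl.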
Best Regards
It'll save you a lot of trouble and you'll have much more control.