Have a look at the Snoopy PHP class. It simulates a browser and lets you do many of the things a browser does from within PHP. It has some very useful functions, one of which fetches all of the links from a given URL.
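For example, something like this should pull the links from a page (a minimal sketch, assuming Snoopy.class.php is on your include path and example.com stands in for the real site):

<?php
// Minimal sketch: Snoopy.class.php is assumed to be on the include path,
// and example.com stands in for the site you are crawling.
require_once "Snoopy.class.php";

$snoopy = new Snoopy;

// fetchlinks() requests the page and puts the URLs it finds
// into $snoopy->results.
if ($snoopy->fetchlinks("http://www.example.com/")) {
    print_r($snoopy->results);
} else {
    echo "Request failed: " . $snoopy->error . "\n";
}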
I've used it in the past to create a simple spider.
The basic process I used was:
grab the index page -> store the list of sub-page URLs from it -> loop through the stored list, saving the page content from each sub page.
Once this is done you can clean up the data you have collected and extract what you want from it without having to re-spider the site. Also, I used PHP's sleep() function to pause the spider between requests, as in the sketch below.
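Putting it together, a rough sketch of that loop might look like this (the URL, the in-memory array, and the 2-second pause are placeholders, not a definitive implementation):

<?php
// Rough sketch of the index -> sub-page loop described above.
require_once "Snoopy.class.php";

$snoopy = new Snoopy;
$pages  = array();

// grab the index page and store its links
if (!$snoopy->fetchlinks("http://www.example.com/index.html")) {
    die("Could not fetch index: " . $snoopy->error);
}
$subPages = $snoopy->results;

// loop through the sub pages, storing each page's raw content
foreach ($subPages as $url) {
    if ($snoopy->fetch($url)) {
        $pages[$url] = $snoopy->results; // clean up / extract later
    }
    sleep(2); // pause between requests
}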
Bear in mind that I had permission to do this; it's not good practice to spider thousands of pages without it.
Hope that helps.