Developing Spiders in PHP

         

Dinesh

8:45 am on Jun 15, 2004 (gmt 0)

How do I develop a spider using PHP scripts?
I need to build a database by extracting data from the Election Commission of India website. Can anyone help, as soon as possible?

carneddau

9:26 am on Jun 15, 2004 (gmt 0)

Hi,

Have a look at the Snoopy PHP class. It simulates a browser and lets you do many of the things a browser does from PHP. It has some very useful functions, one of which gets all of the links from a given URL.
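
To give a rough idea of what "getting all of the links from a page" involves, here is a minimal sketch. It does not use Snoopy; it assumes PHP's built-in DOM extension is available, and the function name extract_links is made up for illustration.

```php
<?php
// Hypothetical helper (not part of Snoopy): return the href value of
// every <a> tag found in an HTML string, without duplicates.
function extract_links(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}
```

The links come back exactly as written in the page, so relative ones still need to be resolved against the page's own URL before they can be requested.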

I've used it in the past to create a simple spider.

The basic process I used was:
grab the index page -> store the list of sub-page URLs found on it -> loop through the stored list, saving the page content from each.

Once this is done, you can clean up the data you have collected and extract what you want from it without having to re-spider the site. Also, I used PHP's sleep() function to pause the spider between requests.
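
The loop-and-pause part of that process might look something like the sketch below. It is only an illustration, not the code I actually ran: the function name spider_subpages and the $fetch callable are assumptions, and with Snoopy you would wrap its fetch call inside that callable.

```php
<?php
// Sketch of the "loop through the stored list" step. $fetch is any
// callable that takes a URL and returns the page body as a string,
// or false on failure (a hypothetical interface chosen for this example).
function spider_subpages(array $urls, callable $fetch, int $delaySeconds = 2): array
{
    $pages = [];
    $remaining = count($urls);

    foreach ($urls as $url) {
        $body = $fetch($url);
        if ($body !== false) {
            // Keep the raw content so it can be cleaned up later
            // without requesting the page again.
            $pages[$url] = $body;
        }

        // Pause between requests, but not after the last one.
        $remaining--;
        if ($remaining > 0 && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    return $pages;
}
```

If allow_url_fopen is enabled, something like spider_subpages($subPageUrls, 'file_get_contents') would work as a quick test, since file_get_contents() returns false when a request fails. The two-second default delay is an arbitrary choice.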

Bear in mind that I had permission to do this; it's not good practice to spider thousands of pages without permission.

Hope that helps.