Extracting new content from a website

turbohost

11:37 am on Nov 5, 2003 (gmt 0)

Hi,

I'm looking for a script that:
- spiders a few pages for new links and follows those links 2 levels deep;
- saves some of the content of these pages (I only need a few chunks of text, which can be easily parsed) in some format;
- compares the saved file with an existing MySQL database.

I think PHP is the best language for writing this script, but I was wondering whether any existing scripts already do this. If not, can anyone help me develop it?

Turbohost

mogwai

12:18 pm on Nov 5, 2003 (gmt 0)

Hi,

I haven't seen anything available that does this. However, the Snoopy PHP class [snoopy.sourceforge.net...] would be a good place to start this project.

It simulates a web browser and has a method for fetching links.
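As a rough starting point, the "follow links 2 levels deep, then compare with what is already stored" part could be sketched as below. This is only a sketch under assumptions: the function names crawl_links and new_links are made up for illustration, the link fetcher is passed in as a callable so the crawl logic can be tried without touching the network, and the list of known URLs is a plain array standing in for whatever a MySQL query would return. The type hints assume PHP 7 or later.

```php
<?php
// Breadth-first crawl, limited to $maxDepth levels below the start pages.
// $fetchLinks is any callable that takes a URL and returns an array of link URLs.
// Returns an array of url => depth at which the URL was first seen.
function crawl_links(array $startUrls, callable $fetchLinks, int $maxDepth = 2): array
{
    $seen  = array_fill_keys($startUrls, 0);
    $queue = $startUrls;

    while ($queue) {
        $url   = array_shift($queue);
        $depth = $seen[$url];
        if ($depth >= $maxDepth) {
            continue; // already at the depth limit, do not follow further
        }
        foreach ($fetchLinks($url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = $depth + 1; // remember it so it is fetched only once
                $queue[]     = $link;
            }
        }
    }
    return $seen;
}

// URLs that were found by the crawl but are not in the known list.
// In practice $known would come from the existing MySQL table (assumption).
function new_links(array $found, array $known): array
{
    return array_values(array_diff(array_keys($found), $known));
}

// Stand-in for a real site: page => links on that page (made-up data).
$site = [
    'index' => ['a', 'b'],
    'a'     => ['c'],
    'c'     => ['d'],
];
$fetch = function (string $url) use ($site): array {
    return $site[$url] ?? [];
};

$found = crawl_links(['index'], $fetch, 2);
$fresh = new_links($found, ['index', 'a']);

// With Snoopy, the fetcher might look like this instead (untested here,
// and it assumes Snoopy.class.php is on the include path):
//   $fetch = function (string $url): array {
//       $snoopy = new Snoopy;
//       return ($snoopy->fetchlinks($url) && is_array($snoopy->results))
//           ? $snoopy->results : [];
//   };
```

Keeping the fetcher separate from the crawl loop means the depth limit and the "what is new" comparison can be checked against made-up data first, and the real HTTP fetching plugged in afterwards.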

Hope this helps.