Spider site to check for updates

How to write a spider to check a website for updates

topkas20

2:03 pm on Aug 31, 2007 (gmt 0)

10+ Year Member



I am looking to create a spider, preferably in PHP (if this cannot be done in PHP, then any other language), to check an entire website for updates. I want to have something set up to check the site whenever an update is made. That is mostly what I need, just so I know when files are updated and which ones.
I would also like to then scrape the site and compare each page that has changed to one that I define. I have a site that I monitor and a global site that is very slightly different. I can match the pages up to compare them with one another, so that when something changes on the global site I know to make the same change on my site.
Thanks ahead of time.

vincevincevince

2:46 pm on Aug 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a start...
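... send an If-Modified-Since request header if you can manage it, so unchanged pages come back as 304 Not Modified
... check the Last-Modified: response header, which you can get if you read the page using cURL
... calculate the md5() of the fetched pages and compare those against the values from the previous run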
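
The three steps above can be sketched in PHP roughly like this. It is only a starting point, assuming the cURL extension is installed and that the list of URLs to check is already known (crawling links is not covered). The function names fetch_page, has_changed and check_site, and the JSON state file, are made up for illustration.

```php
<?php
// Hypothetical helper: fetch one URL with cURL. If a Last-Modified value
// from an earlier run is known, send it back as If-Modified-Since.
function fetch_page(string $url, ?string $lastModified = null): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Collect response headers, keyed by lower-cased header name.
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$headers) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
        return strlen($line); // cURL expects the number of bytes handled
    });
    if ($lastModified !== null) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, ['If-Modified-Since: ' . $lastModified]);
    }
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return [
        'status'        => $status,
        'last_modified' => $headers['last-modified'] ?? null,
        'body'          => $body === false ? null : $body,
    ];
}

// Hypothetical helper: decide whether a response differs from what was
// recorded last time. Only a 200 with a different md5 counts as changed;
// a 304 means unchanged, and errors are treated as "cannot tell".
function has_changed(array $response, ?array $previous): bool
{
    if ($response['status'] !== 200 || $response['body'] === null) {
        return false;
    }
    if ($previous === null) {
        return true; // first time this URL has been seen
    }
    return md5($response['body']) !== $previous['md5'];
}

// Hypothetical driver: check every URL, remember Last-Modified and md5
// in a JSON file, and return the URLs that changed since the last run.
function check_site(array $urls, string $stateFile): array
{
    $state = is_file($stateFile) ? json_decode(file_get_contents($stateFile), true) : [];
    if (!is_array($state)) {
        $state = [];
    }
    $changed = [];
    foreach ($urls as $url) {
        $previous = $state[$url] ?? null;
        $response = fetch_page($url, $previous['last_modified'] ?? null);
        if (has_changed($response, $previous)) {
            $changed[] = $url;
        }
        if ($response['status'] === 200 && $response['body'] !== null) {
            $state[$url] = [
                'last_modified' => $response['last_modified'],
                'md5'           => md5($response['body']),
            ];
        }
    }
    file_put_contents($stateFile, json_encode($state));
    return $changed;
}
```

Run check_site() from cron with your own list of URLs and a writable state file, and mail yourself whatever it returns. Keep in mind that many dynamic pages send no Last-Modified header at all, and that anything which changes on every request (timestamps, session IDs, rotating ads) will change the md5 even when the real content has not, so you may need to strip those parts out before hashing.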