Page Comparison software?

Is it available, is it possible?

         

KMxRetro

10:03 am on Aug 10, 2002 (gmt 0)

10+ Year Member



Hi there,
I run a videogaming website that lists release dates for UK console games, and it's getting harder and harder to keep track of release date changes.

There are a couple of good web-based resources that contain the information I need, but when they update, there is no way of telling WHICH dates have changed.

What I need (and I could swear that I've used before!) is a piece of software that monitors the URL - say, checks it every 24 hours - and tells me which parts of the data have changed.

If I could define the sections that I want monitored, that would be handy, as I don't want to be notified when a banner rotates or when new news is posted.

Does anyone have any ideas? (Oh, and free would be best as I'm broke :))

RBuzz

1:33 pm on Aug 10, 2002 (gmt 0)

10+ Year Member



KMxRetro,

It's not free but I really like WebSite Watcher. You can specify which parts of the page you want monitored, and the changed places are highlighted in yellow.

KMxRetro

4:08 pm on Aug 10, 2002 (gmt 0)

10+ Year Member



Thanks RBuzz, I'll give it a go. I love 30-day trials :-) $29 isn't too bad.

martin

12:16 am on Aug 11, 2002 (gmt 0)

10+ Year Member



If you know PHP, Python, or any other scripting language, it can be done in 10 minutes.

The benefits for you:
- You don't have to spend money on something when you're not sure whether it works, or how.
- You learn something by doing this.
- You can customize it when the need arises.

KMxRetro

12:35 am on Aug 11, 2002 (gmt 0)

10+ Year Member



You make a good point there.

I know PHP, I think I'll give it a go.

If I can, I'll make it easily configurable and release it for all. :)

Jack_Straw

4:31 am on Aug 11, 2002 (gmt 0)

10+ Year Member



I understand your motivations, KMxRetro. What you are looking for makes sense.

But you should be aware that there are issues with this.

There is a whole group of webmasters who communicate here who are trying to limit crawls by "nuisance" spiders and such.

When a lot of people do something similar to what you suggest, the combined traffic can amount to a kind of denial-of-service attack on web sites. That is why, for example, Google goes to such great lengths to stop automatic ranking checkers.

I would suggest the following:

1. Clearly identify your spider in the user agent field. Please include your email address, and, if possible, the address of a web site that explains your purpose.

2. Read, parse and obey robots.txt.

3. If you read multiple pages on a site, do so slowly, with at least 30 or 40 seconds between each request, so that you spread out your load and don't overwhelm their resources.

4. Don't, under any circumstances, try to be sneaky and fake your user agent to make your spider look like a browser. We, and others, look for these kinds of spiders and ban them from our sites. This is not just paranoia on our part; faking a user agent causes real harm. For example, we charge some of our clients on a per-visitor basis. If you don't identify yourself, your requests unfairly inflate the visitor count our clients pay for and reduce our conversion rates.

If you do all this, webmasters will view your bot as "friendly", and will accommodate and welcome it.
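
The four points above can be sketched in Python, one of the languages suggested earlier in the thread. This is only a rough sketch under stated assumptions, not a tested crawler: the bot name, the info page and email address in the user agent, and the 40-second default delay are placeholders, and treating a missing robots.txt (HTTP 404) as "no restrictions" is an assumption you may want to tighten.

```python
import time
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical identity: replace with your own bot name, info page and email.
BOT_NAME = "ReleaseDateWatcher"
USER_AGENT = BOT_NAME + "/0.1 (+http://www.example.com/bot.html; webmaster@example.com)"
DELAY_SECONDS = 40  # point 3: pause between requests


def robots_url_for(page_url):
    """Return the robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def parse_robots(lines):
    """Build a robots.txt parser from rules already in memory (no network)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(lines)
    return parser


def fetch_text(url):
    """Download url with an honest User-Agent header (points 1 and 4)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def load_robots(page_url):
    """Fetch and parse the site's robots.txt (point 2)."""
    try:
        lines = fetch_text(robots_url_for(page_url)).splitlines()
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        lines = []  # assumption: no robots.txt file means no restrictions
    return parse_robots(lines)


def polite_fetch_all(urls, delay=DELAY_SECONDS):
    """Fetch each permitted URL, pausing before every URL after the first."""
    robots_cache = {}  # one parsed robots.txt per site
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)
        robots_url = robots_url_for(url)
        if robots_url not in robots_cache:
            robots_cache[robots_url] = load_robots(url)
        if robots_cache[robots_url].can_fetch(BOT_NAME, url):
            pages[url] = fetch_text(url)
    return pages
```

Calling `polite_fetch_all` with a list of page URLs returns a dictionary of the pages that robots.txt allowed; disallowed URLs are simply skipped.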

KMxRetro

9:13 am on Aug 11, 2002 (gmt 0)

10+ Year Member



No offence, Jack, but it isn't going to be anywhere near as complicated as that.

The script will download one page from a site (specified in the script) and store a copy of it on my server (non-viewable by visitors to my site).

A week later, it downloads another copy and compares it to the original, highlighting any changes.

There is no spidering involved and no DoS risk. It'll just look like I've browsed there and read the page.
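
A download-store-compare script like the one described here might look roughly like this in Python. It is an illustration rather than the script that was actually written: the function names and the snapshot file are hypothetical, the page text is assumed to have been downloaded already, and changes are reported as plain `+`/`-` lines instead of being highlighted.

```python
import difflib
from pathlib import Path


def changed_lines(old_text, new_text):
    """Return the lines that were removed ("- ") or added ("+ ")."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


def check_page(new_text, snapshot_path):
    """Compare new_text with the stored copy, then store new_text."""
    path = Path(snapshot_path)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    path.write_text(new_text, encoding="utf-8")
    if old_text is None:
        return []  # first run: nothing to compare against yet
    return changed_lines(old_text, new_text)
```

Run once a week (from cron, for instance) with the freshly downloaded page text, `check_page` returns an empty list when nothing has changed and the added and removed lines otherwise.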