Forum Moderators: phranque


Ethics of monitoring sites automatically


musicales

2:47 pm on Aug 27, 2004 (gmt 0)

10+ Year Member



I'm interested in setting up an automatic monitor that could potentially be used to keep an eye on any website and report changes. It would only pull in one page per monitor, but I'm wondering whether there is any concern about this - would it be reasonable to pull down a page for monitoring purposes every hour?
every 15 mins?
every minute?

Is it OK to do it at all without the permission of the webmaster? Or should I perhaps treat it like a bot and just obey robots.txt files?

Any input gratefully received.

Lord Majestic

2:53 pm on Aug 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anything more frequent than once an hour might be treated as a hostile action unless they specifically requested it (like those site downtime monitors). Obeying robots.txt and requesting the page with an If-Modified-Since header is also advisable.

Doing all this makes the action perfectly ethical in my view. Particular uses of the data might not be ethical, but that is a separate question and does not affect the ethics of doing the very same thing Google, Yahoo and every other search engine does.
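The polite-fetch approach described above can be sketched with Python's standard library. This is only an illustration of the idea, not anyone's actual monitor; the function names and the "ExampleMonitor" agent string are invented:

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_fetch_timestamp=None,
                              agent="ExampleMonitor/0.1"):
    """Build a GET request that lets the server skip unchanged pages.

    last_fetch_timestamp is a Unix time; if given, it is sent as an
    If-Modified-Since header so a well-behaved server can answer
    304 Not Modified instead of resending the whole page.
    """
    req = urllib.request.Request(url, headers={"User-Agent": agent})
    if last_fetch_timestamp is not None:
        req.add_header("If-Modified-Since",
                       formatdate(last_fetch_timestamp, usegmt=True))
    return req


def fetch_if_modified(req):
    """Return the page body, or None when the server answers 304."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None
        raise
```

Run once an hour (or less often), this downloads the full page only when the server reports it has changed, which keeps the load on the target site close to zero.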

musicales

3:08 pm on Aug 27, 2004 (gmt 0)

10+ Year Member



Thanks. If I'm just storing a page to note changes, does that constitute an 'unethical' use of data in your view?

Thanks for the If-Modified-Since tip; that's a good idea.
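For what it's worth, "storing a page to note changes" doesn't even require keeping full copies: storing only a content digest is enough to detect a change. A minimal sketch (the function names are my own):

```python
import hashlib


def page_fingerprint(body: bytes) -> str:
    """Hash the page body so only a short digest needs to be stored."""
    return hashlib.sha256(body).hexdigest()


def has_changed(stored_digest: str, body: bytes) -> bool:
    """Compare the stored digest against a freshly fetched body."""
    return page_fingerprint(body) != stored_digest
```

Keeping just the digest between polls sidesteps most of the republishing question, since nothing of the page itself is retained.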

Lord Majestic

3:13 pm on Aug 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If I'm just storing a page to note changes, does that constitute an 'unethical' use of data in your view?

In my view, what major search engines such as Google do constitutes activity that is ethical, and legal from a fair-use point of view.

They crawl and store pages, check for updates regularly, publish them in the form of a "cached" copy and make lots of money in the process, and this is deemed perfectly acceptable by the majority of people, including those who publish content on the open web.

I therefore see no reason why anyone else should not be allowed the same freedoms without being accused of doing anything unethical. If anyone has a different point of view, I'd like to hear your arguments.

mincklerstraat

12:14 pm on Sep 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FWIW, I've seen monitoring services I haven't asked for hitting pages as often as every five minutes. I don't know whether they request the whole page or just the headers.

A problem with robots.txt is that initially nobody's gonna know your robot's name, so you might want to make it obey the robots.txt entries for similar robots like internetseer.

Your enterprise will come off a lot better if you can think of a very, very good reason for doing this - there are many companies out there doing similar things, some of the yikkiest just doing it to leave referrer entries.
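Since an unknown robot name falls back to the wildcard group anyway, Python's standard robotparser already handles the "nobody knows your robot's name" case. A small sketch, with invented sample rules of the kind a site might publish:

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration: this site blocks internetseer
# entirely and keeps every other robot out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: internetseer
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent, url, robots_txt=SAMPLE_ROBOTS_TXT):
    """Check whether `agent` may fetch `url` under the given rules.

    An agent with no group of its own is matched against the
    "User-agent: *" group, so a brand-new monitor still gets
    sensible answers.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

An unlisted agent like "MyMonitor" is governed by the wildcard group here, while anything named internetseer is refused everywhere, which is the kind of per-robot distinction the post above is describing.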