I do understand the technical difficulties involved in identifying the crawler.
I am more interested in the legal side than in the other issues that may arise.
[edited by: Habtom at 9:56 am (utc) on April 24, 2008]
The robots.txt RFC [robotstxt.org] says in point 2:
[...] The technique specified in this memo allows Web site administrators to indicate to visiting robots which parts of the site should be avoided. It is solely up to the visiting robot to consult this information and act accordingly. Blocking parts of the Web site regardless of a robot's compliance with this method are outside the scope of this memo. [...]
IIRC, RFCs are the de facto 'laws' of the internet: they define protocols and the like. And as long as this one says 'It is solely up to the visiting robot to consult this information and act accordingly', you can't sue anybody for ignoring robots.txt.
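To make the "solely up to the visiting robot" point concrete, here is a minimal sketch of what a well-behaved crawler does before fetching a page, using Python's standard urllib.robotparser. The robots.txt content, the bot name "ExampleBot" and the example.com URLs are made up for illustration. Nothing forces a robot to run a check like this; it is entirely voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant robot asks first and skips the URL when the answer is False.
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, may_fetch("ExampleBot", url))
```

A robot that never calls anything like may_fetch() simply requests the page, and the web server will serve it unless it is blocked by some other means.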
Or maybe Australia, Belize, Canada, Denmark, Ethiopia, Finland, Guyana, Honduras, India, Jamaica, Kuwait, Latvia or Maldives, as examples from the first half of the alphabet. In Habtom's case, the United Arab Emirates.
Now admittedly, English is not the primary language of a number of countries I mentioned, but when the government speaks in English, they do say "honour" or "colour" or "neighbour" or...
Has anyone even tried suing Google, MSN or Yahoo when their 'bots have disobeyed robots.txt?
It is solely up to the visiting robot to consult this information and act accordingly.
So it is optional, then. It would probably have been better to let site owners restrict access to their own sites by some method that everybody else is required to respect.
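For comparison, blocking "regardless of a robot's compliance" has to happen on the server. The following is only a rough sketch under assumed names: the bot name "badbot", the /private/ prefix and the should_refuse() helper are all hypothetical and not taken from any real server or framework. It also relies on the User-Agent header, which a crawler can fake, so it runs straight into the identification difficulty mentioned at the top of the thread.

```python
# Hypothetical block list and protected paths; both are illustrative assumptions.
BLOCKED_AGENT_SUBSTRINGS = ("badbot",)
PROTECTED_PREFIXES = ("/private/",)


def should_refuse(user_agent: str, path: str) -> bool:
    """Return True if the server should refuse (e.g. answer 403) this request."""
    agent = user_agent.lower()
    # Match on a substring because User-Agent strings carry versions and URLs.
    is_blocked_agent = any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS)
    is_protected = any(path.startswith(p) for p in PROTECTED_PREFIXES)
    return is_blocked_agent and is_protected
```

Unlike robots.txt, a check like this is applied by the site itself, so it does not depend on the robot choosing to cooperate, only on the robot identifying itself honestly.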
balam, I have lived in the UAE for the last 3 years now, hopefully this being my last.