
Webmaster General Forum

This 51 message thread spans 2 pages.
Colorado Woman Sues To Hold Web Crawlers To Contracts
Brett_Tabke

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 6:21 pm on Mar 17, 2007 (gmt 0)

[informationweek.com...]

Shell's site states, "IF YOU COPY OR DISTRIBUTE ANYTHING ON THIS WEB SITE, YOU ARE ENTERING INTO A CONTRACT," at the bottom of the main page, and refers readers to a more detailed copyright notice and agreement. Her suit asserts that the Internet Archive's programmatic visitation of her site constitutes acceptance of her terms, despite the obvious inability of a Web crawler to understand those terms and the absence of a robots.txt file to warn crawlers away.

A court ruling last month granted the Internet Archive's motion to dismiss the charges, except for the breach of contract claim.

This could be really interesting if she has the legal resources to take on the case. I have always felt that this issue with archive.org was a very important legal precedent waiting to be set.

 

cameraman

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3284897 posted 6:28 pm on Mar 17, 2007 (gmt 0)

It's not entirely reasonable to expect lawmakers to understand technology so it could go either way, but I think she'd have a much stronger case if she'd had a robots.txt.

Brett_Tabke

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 6:36 pm on Mar 17, 2007 (gmt 0)

What I find most interesting about this case is that the judge felt there was enough there to merit a further look.

robots.txt has never been adopted by any standards organization on the web. It has never been allowed as an argument in any case involving these issues. It is not universally accepted or interpreted by any of the major crawlers on the web today. The search engines have extended, modified, and corrupted the proposed standard to fit their own needs. robots.txt is antiquated and obsolete - it is time to bury it. Even Google's robots.txt [google.com] will not pass a validation test against the proposed robots.txt standard. I would be surprised if the court remotely allowed the issue to be admissible as evidence in this case.
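Whatever its legal standing, the way crawlers actually consume robots.txt can be sketched with Python's standard-library parser (a rough illustration only, not Archive.org's actual code; the bot names are made up). Parsing is deliberately lenient - unknown or malformed directives are simply ignored, which is one reason files that would fail strict validation still "work" in practice:

```python
# Sketch: how a typical crawler interprets robots.txt, using Python's
# standard-library parser. Bot names here are illustrative.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A generic crawler may fetch public pages, but not /private/
print(parser.can_fetch("GenericBot", "/index.html"))   # True
print(parser.can_fetch("GenericBot", "/private/x"))    # False
# The Internet Archive's crawler identifier is excluded entirely
print(parser.can_fetch("ia_archiver", "/index.html"))  # False
```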

trillianjedi

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 6:37 pm on Mar 17, 2007 (gmt 0)

but I think she'd have a much stronger case if she'd had a robots.txt

Good point. But to my mind that would only have a bearing in mitigation of loss (or the claimant's lack of it), not the fundamental issue of whether or not a contract existed or whether copyright had or hadn't been breached.

TJ

kaz

10+ Year Member



 
Msg#: 3284897 posted 6:44 pm on Mar 17, 2007 (gmt 0)

Interesting scenario: the contract is for $5,000 per printed page of content copied. I'd quote it here, but I don't want to be liable for copying text from their contract. If you Google the person's name, go to the first result, and view the contract at the bottom, you can see for yourself.

Google has about 180 pages indexed, and I'm not going to break it down to printed pages ... but that works out to ... roughly a million dollars. lol ... off to add my own contract to my sites.

inbound

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3284897 posted 6:44 pm on Mar 17, 2007 (gmt 0)

Alexa/Amazon's case may be weakened by the fact they are offering paid services that use the archive information. If I was the one suing I'd be making as much of this fact as possible.

Whether the absence of robots.txt is deemed important will be very interesting. I'd not be surprised if the lawyers concentrate on copyright and the need to seek explicit permission to use anything more than 'fair use'.

Robots.txt has a different function from asserting copyright; any lawyer worth their salt will be able to argue that. The intended use of robots.txt is to show what can be crawled for fair use, not what can be crawled, stored, and sold on through web services...

ggrot

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3284897 posted 7:17 pm on Mar 17, 2007 (gmt 0)

Isn't viewing the page "copying" it? A digital copy now exists in RAM, on disk (if you have history enabled), on the screen, in your ISP's cache, etc. I could easily be wrong, but it seems like this was obviously baiting Archive.org for publicity/a lawsuit; by the same logic she could sue every ISP that accessed that page.

ytswy

10+ Year Member



 
Msg#: 3284897 posted 7:26 pm on Mar 17, 2007 (gmt 0)

robots.txt is antiquated and obsolete - it is time to bury it. Even Google's robots.txt will not pass a validation test against the proposed robots.txt standard. I would be surprised if the court remotely allowed the issue to be admissible as evidence in this case.

But what is the alternative? robots.txt is a machine readable statement of what content the site owner allows crawlers to access, and what the crawler is allowed to do with the content that it can access.

If robots.txt or something like it isn't the solution, then we have to fall back on a default deny policy. I would argue that this would kill some of the most useful features of the internet, and I find it hard to see what benefits society at large would gain in exchange.

At the end of the day intellectual property is not an absolute right. It is a bargain between the content creator/owner and the society they live in. The owner gains the right to use the legal system to enforce a limited monopoly, everyone else gains the content that would not have been created if authors could not protect and monetise their works.

Compared to the current status quo, what benefits does society gain by increasing the powers of content owners in this way?

cameraman

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3284897 posted 8:01 pm on Mar 17, 2007 (gmt 0)

robots.txt has never been adopted...evidence in this case.

I understand what you're saying, but I would think the point is that she didn't take any steps whatsoever to protect her works from automated visits that she should, by any stretch of the imagination, have known would occur.

The day I decide I'm just completely fed up with search engine x's inexplicable antics, I could .htaccess them right off my site, even turn away visitors 'referred' by them. I could make the whole thing a completely bot-free zone by requiring human interaction at the top.
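A minimal sketch of that .htaccess approach, assuming Apache with mod_rewrite enabled; the bot name and referrer domain are placeholders, not any real engine:

```apache
RewriteEngine On

# Refuse requests whose User-Agent contains "BadBot" (placeholder name)
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F]

# Turn away visitors referred from a particular engine (placeholder domain)
RewriteCond %{HTTP_REFERER} example-engine\.com [NC]
RewriteRule .* - [F]
```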

Members-only pages and sign-ins aside, a web site is a public place. If I leave my MP3 player sitting on a bench in Grand Central Station while I go for my cup of designer coffee, do I really have any recourse if someone comes along and whisks it away? If I say I'm from Podunk where stuff like that just doesn't happen does it make a difference? I understand that intellectual property is different than physical, but still, without reasonable precautions...

I really don't know so, like you, I'll be interested in seeing the result. If she hires the same lawyers that MPAA and RIAA use, she should by all rights get her million. Otherwise, I don't really see it happening.

blaze

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3284897 posted 8:34 pm on Mar 17, 2007 (gmt 0)

[blog.ericgoldman.org...]

I think the judge is pretty sure what he's going to conclude, but felt he couldn't yet because certain facts (e.g., that a human had not read the contract) had not been proven.

"While Internet Archive may be correct that the absence of human consent to this contract dooms Shell’s claims, Shell has not had the opportunity to develop a factual record on this point. Shell has alleged the existence of a contract, breach and damages, which is sufficient to make out a claim for breach of contract."

The question remains, I suppose: does IA have to prove over and over again that no human read the website? That seems a tad silly.

I guess the judge isn't tech savvy enough to realise this.

Webwork

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 12:12 am on Mar 18, 2007 (gmt 0)

I guess the judge isn't tech savvy enough to realise this.

Perhaps webfolk aren't savvy enough to realize that a judge does not dismiss a case "on the pleadings" except in rare circumstances. She has pleaded a cause of action, setting forth sufficient factual allegations, to sustain a challenge at this stage. The judge is allowing her a bit of leeway to further develop a factual record, upon which he will rule.

A judge, again in the right case, can craft an order allowing for a time limited period of "discovery" (search for admissible evidence) - during which time the parties have to answer questions and produce documents - which discovery may also be limited as to what is to be explored, all to serve the purpose of building a record upon which a ruling can be crafted that will stand up to an appeal.

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3284897 posted 12:58 am on Mar 18, 2007 (gmt 0)

Hey, we found some links pointing to your site from another website (ZZ) on the internet that suggest that your web space on the net describes the following subject: BLAH BLAH BLAH.
If you don't mind, we would like to link to your site from ours and perhaps use some of the intellectual property as described per your nobots.txt document...

Hhhmmmmmm. The day I get that email from a major content delivery org, I ... to .APE my entire network.

farmboy

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3284897 posted 4:48 am on Mar 18, 2007 (gmt 0)

but that works out to ... roughly a million dollars. lol ... off to add my own contract to my sites.

I had a conversation a while back with a man who owns a number of large sites. He had an attorney who specializes in copyright/trademark law write him "copyright" text for his sites that basically established a monetary value for each page.

He claimed he pursued legal action against people who copied some of his pages and won a substantial amount of money.

I have no idea if what he was telling me was truthful, but if it is, there is a goldmine just waiting to be harvested for an attorney who develops a system to set this up for webmasters and then litigates as necessary.

I'd sign up tomorrow.

rohitj

5+ Year Member



 
Msg#: 3284897 posted 5:22 am on Mar 18, 2007 (gmt 0)

People pay all the time when they've been caught infringing on copyrights, whether it's the RIAA or a NYTimes article. The issue here is a bit different because a robot may not be programmed to realize when it may be infringing upon copyrights -- or entering a contract.

To some extent this has already been addressed with Google cache lawsuits.

born2drv

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3284897 posted 6:56 am on Mar 18, 2007 (gmt 0)

So let me understand this.... programmer writes a robot to copy content... robot is programmed to copy material... programmer knows that his robot will not be able to sufficiently tell the difference between what is copyright infringement and what is not, but sets it loose to copy content anyway.

And this is legal and he is not liable for damages his robot causes?

That's like me buying a robot security guard and programming it to shoot anyone that assaults me. Then someone taps me on the shoulder to tell me I dropped my wallet and bam, he's got a bullet in his head. Who gets charged with murder, me or the robot?

Is this going to be some new legal defense like insanity? "It wasn't me your honor, it was the robot."

gibbergibber

10+ Year Member



 
Msg#: 3284897 posted 9:28 am on Mar 18, 2007 (gmt 0)

--It's not entirely reasonable to expect lawmakers to understand technology so it could go either way, but I think she'd have a much stronger case if she'd had a robots.txt. --

This is the problem with these cases, they all ignore robots.txt.

Robots.txt is a well-established way of saying that you don't want your content archived or indexed, and almost all professional webmasters know about it.

If she genuinely didn't want her site archived, why the heck didn't she use robots.txt?

She can still have her site removed if she puts up a robots.txt file right now, Archive.org removes any sites which do so even if they've already been archived.
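For reference, the exclusion Archive.org documented at the time is a plain robots.txt at the site root aimed at its crawler's user-agent, ia_archiver; it both stops future crawling and suppresses already-archived snapshots:

```
User-agent: ia_archiver
Disallow: /
```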

It's like inviting people into your home and then arresting them for trespassing.

If she'd tried to use robots.txt and it had failed to work, then she would have a very strong case, but she didn't.

--programmer knows that his robot will not be able to sufficiently tell the difference what is copyright infringement and what is not but sets it loose to copy content anyway.--

No. The robot IS able to tell the difference between infringement and non-infringement, that's what the robots.txt system is there for.

If you don't want your site archived or indexed, put up a robots.txt file saying you want to exclude some or all robots. Every professional webmaster knows this.

If robots.txt fails to stop the archiving, then obviously the archiving is against your will, but if you fail to even try then it's questionable whether you really want to stop the archiving.

And in the vast majority of cases, Archive.org does remove pages which have been excluded by robots.txt.

Webwork

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 2:23 pm on Mar 18, 2007 (gmt 0)

Robots.txt is a well-established way of saying that you don't want your content archived or indexed, and almost all professional webmasters know about it.

Perhaps soon there will be lawmaking bodies writing laws that dictate a standard that says "You may only send a non-human agent to a website if that website embeds an invitation"? Might work.

Robots.archive? Your agent may visit and archive our content but only if you do not derive a direct or indirect profit from the archived material? Okay, that knocks out Google.

Robots.search? Your agent may visit and archive our content but only if you . . .

The laws will come.

farmboy

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3284897 posted 2:46 pm on Mar 18, 2007 (gmt 0)

People pay all the time when they've been caught infringing on copyrights, whether it's the RIAA or a NYTimes article. The issue here is a bit different because a robot may not be programmed to...

The issue here is different in another way also, as I mentioned previously.

This woman is actually establishing a value for her pages, a substantial value in this case, so that if someone does have to pay as a result of infringement, it's not a token amount with little deterrence effect.

I like that approach.

ytswy

10+ Year Member



 
Msg#: 3284897 posted 3:26 pm on Mar 18, 2007 (gmt 0)

Perhaps soon there will be lawmaking bodies writing laws that dictate a standard that says "You may only send a non-human agent to a website if that website embeds an invitation"? Might work.

But why would that be a better system than having an opt-out method like robots.txt?

If the internet had started out with this system would the first search engines have ever got off the ground? First they would have had to convince a sizable percentage of site owners that this new idea was worthy enough to allow them to crawl their sites.

What if there's a similarly mutually beneficial idea that no one has thought of yet that involves crawling sites? How will it ever happen if whoever thinks of it has to convince millions of webmasters without any proven results?

jecasc

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3284897 posted 3:33 pm on Mar 18, 2007 (gmt 0)

Now that's just stupid. When you enter a contract is defined by law: it requires offer and acceptance and an intention to create legal relations. You cannot simply define your own rules for when a contract is concluded. It may be that content is used without permission - then it's a breach of copyright, but not the conclusion of a contract.

However, for the unlikely case that the lawsuit goes through: if you read the following text you have entered a contract with me and owe me $10,000:

You now owe me $10,000.

Thank you very much, please sticky me for my bank details.

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3284897 posted 6:34 pm on Mar 18, 2007 (gmt 0)

I think she'd have a much stronger case if she'd had a robots.txt

Nope, you're 100% wrong IMO.

If you don't have a robots.txt file it's obvious you're ignorant of internet spiders and the technology to control them, so an ABSENCE of robots.txt should by default block the crawler.

If there's no robots.txt, I think it's also valid to check the index page for meta tags that control crawler activity; if these don't exist either, which they don't on her site, the crawler should keep out.

Basically, the crawler wasn't invited in, but it also wasn't blocked, so there's no defined permission granted either way. Politeness alone would dictate erring on the side of caution until the webmaster figures out they have no traffic and learns to install a robots.txt file with permissions that explicitly grant spiders access.
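The opt-in policy being argued for here is easy to state in code. A sketch only (this is the opposite of how real crawlers behave - the actual convention defaults to allow):

```python
# Sketch of a "no robots.txt means keep out" policy - the reverse of the
# Robot Exclusion Standard's opt-out default.
from typing import Optional
from urllib.robotparser import RobotFileParser

def may_crawl(robots_txt: Optional[str], agent: str, path: str) -> bool:
    """Default-deny: crawl only if robots.txt exists and permits it."""
    if robots_txt is None:          # site never published one -> stay out
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# No robots.txt at all: under this policy the crawler keeps out
print(may_crawl(None, "SomeBot", "/page.html"))                        # False
# Explicit blanket permission: crawling allowed
print(may_crawl("User-agent: *\nDisallow:", "SomeBot", "/page.html"))  # True
```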

Think about how this works in the real world. Just because you leave your front door open on a hot day isn't an open invitation for someone to walk in your house and sit down on your sofa, it's still trespassing.

jomaxx

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 6:53 pm on Mar 18, 2007 (gmt 0)

an ABSENCE of robots.txt should by default block the crawler

Should, maybe, but does not. The standard that exists is called the Robot Exclusion Standard, and by convention anything that is not specifically blocked is spiderable.

It's also a trivially easy thing to implement, and anyone with the ability to create a website also has the ability to create a robots.txt file if they so desire.

P.S. By your analogy, you've already opened your house to any person who cares to come in and look wherever they like. Personally I would be delighted to have a robot walk in and sit down on the sofa.

ytswy

10+ Year Member



 
Msg#: 3284897 posted 7:09 pm on Mar 18, 2007 (gmt 0)

If there's no robots.txt, I think it's also valid to check the index page for meta tags that control crawler activity; if these don't exist either, which they don't on her site, the crawler should keep out.

Basically, the crawler wasn't invited in, but it also wasn't blocked, so there's no defined permission granted either way. Politeness alone would dictate erring on the side of caution until the webmaster figures out they have no traffic and learns to install a robots.txt file with permissions that explicitly grant spiders access.

Goodbye Google et al. It was nice while it lasted, but unfortunately we've decided that copyright owners interests trump any benefits you've provided the web over the last ten years.

I come back again to the fact that copyright is not an absolute right. It is an incentive to create, based upon the reasoning that society at large gains more in terms of the additional content created than it loses from enforcing monopolies. I simply fail to see why any sane society would want to enforce these draconian measures on crawlers, especially when a widely accepted opt-out system already exists with robots.txt.

JAB Creations

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 8:04 pm on Mar 18, 2007 (gmt 0)

It's as simple as this: if you don't want your content copied, don't post it. If you want your content viewed only by humans and not non-humans, then post it in a section of your site that requires registration.

('big companies', 'spammers', 'web surfers', 'webmasters')

In this array regardless of whoever (or whatever) else you add the ultimate power is in the hands of the webmaster. Whether they are aware of their potential power or not is an entirely different issue.

- John

cameraman

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3284897 posted 9:04 pm on Mar 18, 2007 (gmt 0)

Just because you leave your front door open...isn't an open invitation...

With this analogy I can see where you're coming from, but I believe the internet is widely regarded as a public place, not a collection of private [cyber] domiciles. Take pr0n for example - if web sites were regarded as the latter, there would be no restrictions. When you open a website, it's more akin to setting up a little kiosk at the mall or an information booth at a fair, and you are extending an open invitation via index.whatever or homepage.whatever.

Given that a web site is a public place, the onus is back on the webmaster to be aware of all the lurkers (good or bad intentioned though they be) and to take measures to select out the undesirables. My point is that the tools are there; she chose not to use any of them, instead opting for the equivalent of a white sign with simple black lettering proclaiming 'all trespassers will be shot' - which won't reduce your jail time any in many municipalities.

With that (and my previous two posts) said, I do agree that spidering should be a bit more proactive. I go into this thing knowing full well about the googles and yahoos and asks, but then get blind-sided by picsearch (and probably 30 more that I haven't happened to run across in my server logs). I have to admit though, that "I didun know" is usually followed closely by "well now ya do".

Brett_Tabke

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 9:35 pm on Mar 18, 2007 (gmt 0)

A better analogy is just because you don't have a sign on your front door (robots.txt), doesn't mean that coming inside and stealing your property and giving it away is ok.

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3284897 posted 10:07 pm on Mar 18, 2007 (gmt 0)

A better analogy is just because you don't have a sign on your front door (robots.txt), doesn't mean that coming inside and stealing your property and giving it away is ok.

Bingo!

I agree that in the early days when there were no SE's they had to start somewhere, but that ship sailed about 10 years ago and people SHOULD know they need a robots.txt file by now.

It should state right on the "SUBMIT URL" page for the search engines that the only way they can include your site is if you either include a robots.txt or the appropriate meta tags, to prove you want it spidered and that someone isn't submitting on your behalf.

How the SE's get away without any type of confirmation of ownership blows my mind.

ytswy

10+ Year Member



 
Msg#: 3284897 posted 10:58 pm on Mar 18, 2007 (gmt 0)

A better analogy is just because you don't have a sign on your front door (robots.txt), doesn't mean that coming inside and stealing your property and giving it away is ok.

Coming inside your property without permission would be trespassing. You can hardly make that claim about making requests to your publicly accessible webserver.

It's more like you open your property to the public, and then try to sue me because I took photos of something. Maybe you'd have the right to enforce a no photographs policy (no idea about that to be honest), but I am sure you would have to tell me about this policy first.

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3284897 posted 11:55 pm on Mar 18, 2007 (gmt 0)

It's more like you open your property to the public, and then try to sue me because I took photos of something.

Putting something in public VIEW for people to access is NOT a license for free-for-all copying to use for any purpose other than your PERSONAL use on your PERSONAL computer. This has been the same for books, CDs, records, and tapes for ages: make a copy for yourself and nobody cares. Sell those copies or profit from them in any other way and a SWAT team of lawyers will descend upon you.

The radio waves are public, and so is TV, just like the internet. Do people get away with copying songs off the radio or movies off TV and selling them? Nope - heck, the FBI will bust you if you sell copies of movies, and I know a video rental store guy who got first-hand experience with jail time over that nonsense.

So what part of USE but don't ABUSE are people missing here?

What part of COPYRIGHT doesn't apply just because it's in PUBLIC?

Imagine if you walked into a public library and started making copies of all the books on the Xerox while you sit there ripping their entire CD collection on your laptop.

I'm sure the librarian would grab a baseball bat and make a dent in your skull.

Me thinks those that protest too much are scrapers...

FWIW, if you truly think they do nothing wrong just sticky me your URL and I'll post hundreds of copies of your site all over the internet so you vanish on Google in a sea of dupe content and then we can discuss how I've done nothing wrong because it's PUBLICLY available when your AdSense or eCommerce sales all go POOF! Better yet, I'll use it to cloak your site to porn sites so every time anyone types you or your company name in Google it comes up with links to barnyard fun instead.

Hey, it's a public service, try me! ;)

[edited by: incrediBILL at 12:03 am (utc) on Mar. 19, 2007]

kaled

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3284897 posted 12:46 am on Mar 19, 2007 (gmt 0)

If she genuinely didn't want her site archived, why the heck didn't she use robots.txt?

The last time I looked, there's no instruction in robots.txt to stop archiving, only to stop crawling. Assuming she wanted the site indexed by search engines then there was nothing to be done with robots.txt (other than exclude specific robots).
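For search engines there was at least a partial answer even then: the robots meta tag. A sketch; Google honored "noarchive" (no cached copy while staying indexed), though whether Archive.org honored the tag is a separate question - its documented mechanism was robots.txt:

```html
<!-- Stays crawlable and indexable, but engines honoring "noarchive"
     will not serve a cached copy of the page -->
<meta name="robots" content="index, follow, noarchive">
```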

She'll lose the contract argument, but she may win an argument based on copyright; any such victory is likely to be Pyrrhic.

Kaled.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved