This 51 message thread spans 2 pages.
|Colorado Woman Sues To Hold Web Crawlers To Contracts|
|Shell's site states, "IF YOU COPY OR DISTRIBUTE ANYTHING ON THIS WEB SITE, YOU ARE ENTERING INTO A CONTRACT," at the bottom of the main page, and refers readers to a more detailed copyright notice and agreement. Her suit asserts that the Internet Archive's programmatic visitation of her site constitutes acceptance of her terms, despite the obvious inability of a Web crawler to understand those terms and the absence of a robots.txt file to warn crawlers away. |
A court ruling last month granted the Internet Archive's motion to dismiss the charges, except for the breach of contract claim.
This could be really interesting if she has the legal resources to take on the case. I have always felt that this issue with archive.org was a very important legal precedent waiting to be set.
I've read the whole thread and I don't see anyone protesting the issue of copyright. I think we're all in agreement that your unique content on your website is your intellectual material.
The point that I'm trying to make is that it is in public view, whereas you're drawing analogies to private property as in a domicile.
Correct me if I'm wrong but you contradicted yourself:
|Nope, you're 100% wrong IMO. |
If you don't have a robots.txt file it's obvious you're ignorant of internet spiders and the technology to control them, so an ABSENCE of robots.txt should by default block the crawler.
|that ship sailed about 10 years ago and people SHOULD know they need a robots.txt file by now. |
Now let's look at this (and please keep in mind that I'm not attacking; I'm only discussing this, and there's a sparkle in my eye, ok?):
|Putting something in public VIEW for people to access is NOT a license for free-for-all copying to use for any purpose other than your PERSONAL use on your PERSONAL computer. |
By that definition, I should be able to sue every search engine that has visited my web sites.
- I have never once ever in any way shape or form submitted any of my sites to a search engine (a couple months ago I signed up for a webmaster tools account but they've been visiting me for years)
- I have a meta tag, revisit-after 21 days
- My sitemap.xml file lists 'monthly' for each entry
Google drops by every day. Yahoo drops by every day. MSN drops by every 2-3 days. Ask drops by once every week or two. picsearch has dropped by at least once - my original photos are indexed on their site and I didn't know it until I happened to see their visit in my server log.
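For anyone following along, the opt-out mechanics being debated here can be sketched with Python's standard `urllib.robotparser` (a minimal sketch; the user-agent strings and paths are just examples). Under the de facto robots.txt convention, a missing or empty file means "crawling allowed" by default, which is exactly the point of contention:

```python
# Sketch of how a well-behaved crawler decides whether it may fetch a page,
# using Python's standard urllib.robotparser. An absent or empty robots.txt
# means "crawling allowed" under the de facto convention.
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `user_agent` is allowed to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# An empty robots.txt (equivalent, to a parser, to having none at all)
# permits everything:
print(may_fetch("", "ia_archiver", "/photos/original.jpg"))  # True

# Opting out requires an explicit Disallow rule:
blocking = "User-agent: ia_archiver\nDisallow: /\n"
print(may_fetch(blocking, "ia_archiver", "/photos/original.jpg"))  # False
print(may_fetch(blocking, "Googlebot", "/photos/original.jpg"))    # True
```

In other words, the burden under the current scheme is entirely on the site owner to publish the Disallow rule; silence is consent as far as the parser is concerned.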
Search engines make money by selling ad space. They attract customers to see those ads by collecting up content. Ergo, they are making money by using my original content (in a collection) for profit, not for personal use. I have never given my consent, explicit or implicit, for them to spider any of my sites. I'm not in the top 20 so there's no symbiosis - I get nothing out of it in any way. Worse yet, in addition to stealing my content they are ignoring my signs and stealing my bandwidth - and that's real money, not theoretical intellectual income potential.
And if I apply your earlier sentiment (what I inferred anyway), then I've also got them for trespassing - I never invited them into my house, they just came barging in through the door.
So where's that baseball bat?
Now let's say you take your brand new spiffy laptop computer down to the bus stop. You set it on the bench and walk away. You come back the next day and it's gone. Do you check the bus company's lost & found? If it's not there, do you report the theft to the police? How hard do you think they should look to try to round up the perpetrators?
Is it wrong that someone removed it from the bench? Yes. What would your dad say - something like "well why the h*** did you leave it in the yard?!?" That's the sum total of my point.
Personally, I'd love it if she won, and won big. I'd like it if she won so big that every search engine shut down until they figured out how to gather up everyone's permission slips. I just don't think it will happen - she's the one that left her stuff in her yard. She left a sign for people, but not for robots. Is it right? No. Is it reality? Yup.
I'm really not arguing against copyright law. I'm simply saying that I think there should be a degree of responsibility on the owner to indicate their preferences regarding robotic access on the internet, since there are many highly beneficial services that rely on this.
Making this opt-in is not going to make the problems of copyright infringement go away - scrapers will still scrape etc. all that will change is that the legitimate services that rely on bots will be crippled - you'll still have to take action against those who infringe your copyrights in exactly the same way as you have to now.
So where is the benefit in changing?
|Me thinks those that protest too much are scrapers... |
FWIW, if you truly think they do nothing wrong just sticky me your URL and I'll post hundreds of copies of your site all over the internet so you vanish on Google in a sea of dupe content and then we can discuss how I've done nothing wrong because it's PUBLICLY available when your AdSense or eCommerce sales all go POOF! Better yet, I'll use it to cloak your site to porn sites so every time anyone types you or your company name in Google it comes up with links to barnyard fun instead.
Not a scraper. And if you did that to one of my sites I would sue you and recover damages, which is as it should be. I completely fail to understand how abandoning robots.txt would change this in the slightest. It's not like it would become any more illegal to do than it is already.
|I have a meta tag, revisit-after 21 days |
Slightly off topic, but WHY?
The only SE that *MAY* have ever used that meta to the best of my knowledge was AltaVista before they were replaced with Inktomi results.
On the robots.txt front, I didn't actually contradict myself; those statements actually support each other in that A) people have been using robots.txt a LONG time so this is nothing new, B) if you don't have one you probably don't want robots, or, last but not least, C) you don't know what you're doing and don't deserve to be indexed until you do.
I think she has a contractual leg to stand on: photographers have successfully sued for licensing rates for images that show up on other websites, regardless of whether you knew about those licensing fees or how the image got there (a rogue web developer ripping off images, etc.), although admittedly it starts with copyright and then moves into the contractual issues.
|Google drops by every day. Yahoo drops by every day. MSN drops by every 2-3 days. Ask drops by once every week or two. |
But do you ALLOW them or not?
I allow them, there's a huge difference, as I have my site security built around OPT-IN so the only way in the front door is to be permitted inside.
Now if you try to sneak your automated tools in disguised as MSIE or Firefox, I have bear traps set to stop you as well, and then there are laws on the books which an attorney has said could be used against you in a major way for trying to bypass my site security to access pages.
This is where I drew the trespass analogy:
"Computer Hacking and Unauthorized Access Laws" [ncsl.org]
My fave is the content of the California law, particularly the definitions in 502.c:
|(c) Except as provided in subdivision (h), any person who commits any of the following acts is guilty of a public offense: |
(1) Knowingly accesses and without permission alters, damages, deletes, destroys, or otherwise uses any data, computer, computer system, or computer network in order to either (A) devise or execute any scheme or artifice to defraud, deceive, or extort, or (B) wrongfully control or obtain money, property, or data.
(3) Knowingly and without permission uses or causes to be used computer services.
The trick is those laws don't say PRIVATE server, so anything publicly displayed is not fair game if someone deliberately bypasses your robots.txt and .htaccess files to get the data, or at least that's what I'm being told.
However, she didn't install any security so they don't have this leg to stand on, and copyright seems to be the only defense at this point. The only reason I mentioned it is that people who lock the front doors of their web site and then get ripped off by people deliberately bypassing those locks MIGHT have a legal leg to stand on besides copyright, as you can prove intent to bypass security.
Anyway, that's a whole new thread some day ;)
cameraman, I agree there's a point to be made about copyright. Google and Yahoo do the same thing by caching web pages, and at some point there may be some real fallout.
I think what everyone is reacting to is the obviously spurious nature of the "contract" the plaintiff is pretending has been entered into. I might as well enter into the same kind of agreement with my toaster and sue General Electric for damages.
born2drv has it right. The robot may run independently but it is only acting on behalf of the programmers and the company. If it enters into a contract, it is clearly the company which has entered into that contract.
|When you open a website, it's more akin to setting up a little kiosk at the mall or an information booth at a fair, and you are extending an open invitation via index.whatever or homepage.whatever |
You're extending an open invitation to view, not to copy and take, freely. At a retail type kiosk, you can view all you want. If you desire to actually take something from the kiosk, you must pay.
|I agree that in the early days when there were no SE's they had to start somewhere, but that ship sailed about 10 years ago and people SHOULD know they need a robots.txt file by now. |
I agree with your overall stance Bill, but you lost me on this statement -- which is similar to other statements in this thread which claim that people should know to use robots.txt.
You guys and ladies have to remember that most of us on WW are geeky, techy folks who know how to construct the inner bowels of a website. But the Internet is a medium where almost anyone can start a web page or website. In fact, there are probably millions more web pages created by non-technical folks, than by technical ones.
With that, there is absolutely no way we should claim "people should know to use robots.txt". That's like saying: "People should know not to wear green-colored clothing in that neighborhood. If they do, certain gangs will beat them up."
Whether you know or not, you don't deserve to have your property taken and you certainly don't deserve to be beat up.
Granted, if anyone wishes to operate in this medium (the Internet), there may be certain rules they should know. But again, most web pages are not created by web professionals, so I think we have to account for them in some way.
|... just because you don't have a sign on your front door (robots.txt), doesn't mean that coming inside and stealing your property and giving it away is ok. |
Is that really a fair analogy? By publishing on the net you are taking the content and offering it to the world at large. No walls by default is, in a way, the definition of the Internet.
Anyone is free to protect their content with passwords, etc... which would be the only kind of website you could reasonably compare to a private home.
|people SHOULD know they need a robots.txt file by now. |
I don't personally think it would be good for the net's future if knowledge of arcane practices was required for participation. One of the reasons for the current diversity is that anyone can still contribute regardless of their level of expertise.
It seems to me precedent forcing search engines (because that's clearly who it would hurt the most) to switch to opt in would be a disaster for the availability of information online.
The worst part, though, is that any precedent set by this case would affect only the US. Consider what that would do to the US' position in the online economy.
OK, this seems pretty simple to me.
(1) Given: Anyone who knows enough to understand the opt-[in/out]-by-default issue, knows enough to opt--whether in or out--under either conceivable system.
(2) Given: The internet is currently opt-in-by-default.
Question: So, who's getting hurt by the current scheme? Nobody in THIS thread, that's for sure -- by discussing the matter, you've all proven you are capable of acting on your own best interest, as things are now.
So, who's getting hurt? The people who are suing? No, they also know enough to understand how to opt out, in the current system.
So, who's getting hurt? The people who put web pages up that they don't want to be found by search engines? Um, like, and just WHO would that be?
So, who's getting hurt?
Don't everyone speak at once.
I think most people here missed the point. This is not about copyright infringement; the woman claims the company has entered into a contract to pay $5000 per page for copying. As far as I know, fair use allows non-profit online archives to copy content, like for example real-world newspaper archives and libraries.
A corresponding case would be like this: a small magazine puts a small note on every page saying that anyone copying content enters into a contract to pay $5000 per copied page. Afterwards it sues libraries and archives for fulfillment of contract because they have copied the text for archiving purposes.
Simply put: pure nonsense. In my opinion we have here a not very tech-savvy judge who let himself be confused by all the talk about robots and computers and totally missed the point. And the point is simple in my opinion: there is no contract at all, and there would not be a contract even if the content had been copied by humans and not by robots.
If it were otherwise, everybody could put such disclaimers on his page demanding unreasonable sums of money and, if some individual copied some text, sue him for the money. However, this won't work because there would be no contract. You could sue for copyright infringement but not for fulfillment of a contract.
In the general case, signatures, etc. are required for a contract. Legislation exists for special cases, e.g. when the hammer comes down at an auction. In this case, simply stating that by performing an action you are deemed to have accepted the terms of a contract does not mean that a legal contract has been entered into.
Confusion exists in part because software companies have been using "I agree" buttons on installers for years, but the legality of such things remains largely untested in court.
|Slightly off topic, but WHY? |
I have web pages that old. For many years my html editor was notepad, so I tended to copy and paste headers from one page to the next. I threw it in to this discussion because it is, however dated, a 'sign' which is being ignored.
|But do you ALLOW them or not? |
Allow? Hey, it's a hot summer night in the desert. I allow them to view, but that doesn't give them the right to snatch up my content... Actually I have an area cordoned off by signin, I have .htaccess doing all sorts of wonderful things (most of which I learned here), and the search engines are doing a beautiful, thorough job of showing me where my oversights are.
|she didn't install any security so they don't have this leg to stand on |
Or in other words she'd have a stronger case if she had.
It sure appears to me that you have taken a fair amount of time educating yourself, then applied that knowledge to protect what is yours instead of relying on the virtue of others [human and not] not to wander in your door and wag it off into the sunset.
|At a retail type kiosk, you can view all you want. If you desire to actually take something from the kiosk, you must pay |
Not if the kiosk owner left all the goods unlocked out in the open and walked away. Many people would take a gander at the goods and think 'hmm, guess I'll come back later to buy one' and leave. Others would take what they want with the rationalization "well if he didn't want his stuff taken then he should have locked it up" which of course doesn't change the fact that it's theft, but also doesn't change the fact that the stuff is being taken.
celgin, I agree that in an ideal world of wonderment and light, my grandmother should be able to set up a web site with pictures of me as a grandbaby and her vacation schedule so that all of us relatives know where she is at any given time. But the world has lots of very dark spots in it so it's no surprise that she'd come home from vacation to find all of her heirlooms long gone - the thief knows just exactly when to head over since her vacation schedule is in public view and knows where to go by doing a whois on her domain.
Of course no one deserves to get beat up, but all of our actions have consequences. You can either blunder through and pay those consequences on occasion or you can educate yourself and keep yourself in the clear.
Ms. Shell did not, in my opinion as a non-lawyer non-judge armchair quarterback, give adequate notice to the robotic agent that stole her material. However, given the current uproar over DRM and all the nonsense which has spewed forth from MPAA and RIAA, I think her case has some merit. I really don't have anything against search engines coming by my site (I'm practically begging them to come) but maybe they should be accountable for their agents' actions. And I really don't think it would spell doom for the internet - if they did something like incrediBILL suggested, then it would amount to "every webmaster has a year to either add a robots.txt if s/he wants the site indexed, or remove it if not".
One bad thing about opt-in: if everyone was required to put up a robots.txt file to be online or simply be ignored by the SEs, who would benefit?
Well, the big players would benefit most in a way. Google, Yahoo, MSN, maybe ASK, etc, will all be at the top of those opt-in lists. You won't want to allow all spiders because that would be like inviting all scrapers. So all the little players, and those developing new search engines, would suffer.
If there was an opt-in type policy it would have to be something in respect to what can be done with the content on the website. For example:
- Is indexing in a non-profit search engine allowed?
- Indexing in a for-profit search engine allowed?
- How often to spider the page for indexing?
- Are text snippets in SERPs allowed?
- Are graphical snippets in SERPs allowed?
- Are cached pages of recently spidered content allowed?
- Are cached pages of previously indexed pages allowed?
- How long can cached pages be archived/saved for?
- Can images be indexed?
- Movie clips?
- Can you mooch off my content and scrape my site and slap AdSense on yours instead? (reworded with legalese of course) :)
- etc, etc.
A clearly defined robots.txt that is universal and does not favor any search engine is needed. And this robots.txt MUST be OPT-IN only. If someone puts up a web page and gets zero traffic, they'll realize very quickly why and put up a robots.txt file. Or they'll pay godaddy or whatever $5 to generate it for them. But something like this is clearly needed to level the playing field for all search engines, and to empower webmasters to clearly define what is and what is not allowed for use of their content.
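Purely as a thought experiment (none of this syntax exists in any real robots.txt standard, and every directive name below is invented for illustration), such an opt-in permissions file could be as simple as key/value lines, with anything not explicitly granted treated as denied:

```python
# Hypothetical "opt-in robots.txt" sketch -- all directive names here are
# invented; no search engine recognizes this syntax. It only illustrates
# the "absent means denied" semantics an opt-in scheme would imply.
OPT_IN_FILE = """\
allow-agent: googlebot
index-for-profit: yes
text-snippets: yes
image-snippets: no
cached-pages: yes
cache-max-age-days: 30
"""

def parse_opt_in(text):
    """Parse 'key: value' lines into a permissions dict."""
    perms = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        perms[key.strip()] = value.strip()
    return perms

def allowed(perms, use):
    """Opt-in semantics: a use is permitted only if explicitly set to 'yes'."""
    return perms.get(use) == "yes"

perms = parse_opt_in(OPT_IN_FILE)
print(allowed(perms, "text-snippets"))   # True
print(allowed(perms, "image-snippets"))  # False
print(allowed(perms, "movie-clips"))     # False (absent => denied)
```

The key design difference from today's robots.txt is the last line: an unlisted use defaults to "no" instead of "yes".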
|Of course no one deserves to get beat up, but all of our actions have consequences. You can either blunder through and pay those consequences on occasion or you can educate yourself and keep yourself in the clear. |
Maybe. But my point is, the Internet will never be a place where "average" people upload and configure perfectly designed and perfectly secured websites. They're not informed enough to do that and it sounds like everyone is saying: "If you want to play on our playground, you need to learn all of the rules... no matter how difficult the rules are."
Not everyone knows the rules of the road, and they're still allowed to drive. Not everyone knows which medications negatively interact with each other, but you can still purchase them over-the-counter.
Sure, if you took one of these cases into court, you would be laughed at for being so uninformed.
But I don't think securing your site on the web will ever be held in the same regards, simply because there are more unsecured, "average" sites - than there are secure, well-coded sites.
One thing that comes out of this is that if the contract was not valid then there needs to be a way of presenting a contract to a robot in a way that will be valid.
Her site is down now - just a holding page saying the page is temporarily unavailable. Luckily I can still see it in google's cache :-)
|Question: So, who's getting hurt by the current scheme? Nobody in THIS thread, that's for sure |
Ah ha, would you like an actual list?
There are hundreds of documented NUTCH crawlers alone, so I would say there are probably thousands of crawlers on the web at this time which is insane.
Only about 10 of them are really useful to most people. The rest are just noise that chews up bandwidth and CPU and gives no ROI. At times they run amok and crawl so fast your server can appear down for hours, or they run up bandwidth costs that come out of pocket, or they flat-out knock you offline, like what happened on WebmasterWorld itself.
Who gets hurt?
The webmaster, the visitors, hosting companies, legitimate crawlers that get blocked with the backlash from all the crap crawlers.
In other words, most of the web.
IIRC, a valid and enforceable contract exists when one party offers a conditional benefit to another party, and the other party accepts the conditional benefit. Signatures are *not* required, and never have been. All that one must show is the acceptance of the conditional benefit by the receiving party.
Conditional benefit = a web site's content, offered conditionally upon acceptance of some terms.
The fact that SE's use automated means to read and process the content may or may not get them off the hook in this case, but the real question is: should it? The robot's owner/creator/master is indeed legally and morally responsible for the activities of the robot.
After all is said and done, I don't expect the plaintiff to prevail - even if he/she does at this level this will just move up the chain where someone with the right connections will kill it in favor of the moneyed interest, and you know who/what that is.
The case is totally idiotic. What makes it worse is that here, on WebmasterWorld, that very thing - automagically charging for copies - had been advised.
|Putting something in public VIEW for people to access is NOT a license for free-for-all copying... |
For search engines, yes it is. The purpose of search engines is to allow people to search *all* of the *publicly available* (i.e., not password-protected) content on the Web. If search engines were opt-in, most people with non-commercial sites would not bother to opt in, and thus the search engines would not be able to achieve their purpose.
|For search engines, yes it is. The purpose of search engines is to allow people to search *all* of the *publicly available* (i.e., not password-protected) content on the Web. If search engines were opt-in, most people with non-commercial sites would not bother to opt in, and thus the search engines would not be able to achieve their purpose. |
I beg to differ as I block many search engines and only let a few good ones on my site, those that give me ROI for allowing them to crawl.
Why are search engines special?
If I get nothing in return, they can go pound sand and index 403 errors.
Besides, copyright is NOT opt-out, never has been, and the minute the search engines started displaying CACHE pages instead of just snippets is when they started overstepping their boundaries. In the case of the IA that she's suing, they flat out copy your site without permission and need to be slapped upside the head.
I hope she rips 'em a new one.
One thing that has been overlooked is that the Internet Archive robot doesn't truly obey the robots.txt file. While it's true that cached copies of pages will not be made available if the IA robot is banned via robots.txt, the robot will still crawl all pages on a regular basis. The only way to truly stop IA's robot from crawling and archiving pages is to totally block it via the .htaccess file.
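For completeness, that .htaccess approach might look something like this, assuming Apache with mod_rewrite enabled and the "ia_archiver" user-agent string the IA crawler has historically identified itself with (a sketch, not a tested ruleset):

```apache
# Refuse the Internet Archive's crawler at the server level instead of
# relying on robots.txt. Assumes Apache with mod_rewrite enabled;
# "ia_archiver" is the user-agent the IA crawler has historically sent.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC]
RewriteRule .* - [F,L]
```

Any request whose User-Agent header contains "ia_archiver" (case-insensitive) gets a 403 Forbidden before the page is ever served, so nothing is left to the robot's good behavior.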
Oh, and just because the Internet Archive is non-profit doesn't mean what it is doing falls under the fair use exception of copyright law. As has been referenced many times in these forums, the Chilling Effects website has a great primer on copyright law and fair use.
My personal feeling on this matter is that the failure to use a robots.txt file should weaken the lady's case; however, had she used a robots.txt file and the IA robot continued to crawl her site after it was explicitly banned, she should have been able to prevail. The robots.txt file and meta tags may not be perfect, but they are the customary means to express our desires as web publishers in a manner that can be understood by robots. A robot that disregards the robots.txt file and meta tags should be considered to be in legal violation of contractual terms.