| This 156 message thread spans 6 pages: < < 156 ( 1 2  4 5 6 ) > > || |
|Google cache raises copyright concerns |
Everyone loves to write about Google:
Is anyone who has commented so far actually a lawyer? I tend to give the most credit on the legal issue to Brett, since he actually spoke to lawyers about it and they seem unanimous.
On a practical level of webmasters and Google, let's face it, who wants to make waves with Google? The little bit they're "taking" with the cache is nothing compared to what they give (free traffic). Complain about the cache on your site and they will remove it. But they may remove your site altogether too. You would be a fool to make a stink.
As to why they haven't been sued I think it is a little more complicated than that. I think with a copyright you need to write a cease and desist letter first. Give them a "reasonable" amount of time to take it down. If you do, Google will certainly take it down quickly.
In order to sue them successfully, you would have to skip that step and say that they caused you damages by their cache. How do you prove that?
The above is my understanding of the law. However, I am not a lawyer and don't even play one on TV.
I agree with Josk
Google points out that they are not affiliated with page or web site etc.
It's a big shame that everyone is forgetting what a great service Google is providing.
I remember when the Internet used to be the place to find information, and that's what Google wants to do.
And they do it well.
If you don't want it seen on the web don't publish it
If you don't want any search engines to grab it, use a robots.txt file; that's why it's there.
Sorry Brett, but I too disagree with you about Google being finished without the cache.
Google is fast, it's quick, it's easy and, best of all, it's up to date.
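For anyone who wants to see what the robots.txt suggestion above amounts to, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `OtherBot` name and the example.com URLs are made up for illustration; they are not taken from any real site, and a real crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: Googlebot may crawl everything except /private/,
# and every other crawler is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Note that this only controls crawling. Whether a page that is crawled also gets a cached copy is a separate question, which is what the meta tag discussion in this thread is about.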
I don't mind them caching my pages but I wish they would connect to my site when someone looks at the cache and cause an entry in my logs so I would know I had another visitor. When you only get a couple hundred a day, everyone counts. LOL
If the legal case is so clear, how come no-one is suing, or has sued, Google? Lawyers (so the stereotype goes) are money-grabbing so-and-so's. I'd have thought that if there was a case against Google then there would have been one by now.
How wealthy is Google? That must intimidate a lot of companies. Right and wrong are irrelevant if you don't have the cash.
I would like to see an opt-in meta tag, but not a general one. Something that specifies the domain that is allowed to create a cached copy, so that you aren't just giving a blanket permission for everyone to republish your stuff.
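To make the opt-in idea concrete, here is a rough sketch of how a crawler might honour such a tag. No such tag exists: the name `cache-allow` and the comma-separated domain list are purely hypothetical, invented here to illustrate the proposal.

```python
from html.parser import HTMLParser


class CacheOptInParser(HTMLParser):
    """Collects domains from a HYPOTHETICAL <meta name="cache-allow"> tag."""

    def __init__(self):
        super().__init__()
        self.allowed = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "cache-allow":
            return
        # Assumed format: a comma-separated list of domains.
        for domain in (attr.get("content") or "").split(","):
            domain = domain.strip().lower()
            if domain:
                self.allowed.add(domain)


def may_cache(html: str, crawler_domain: str) -> bool:
    """Opt-in semantics: no tag, or domain not listed, means no cached copy."""
    parser = CacheOptInParser()
    parser.feed(html)
    return crawler_domain.lower() in parser.allowed
```

The point of the design is the default: a page that says nothing gives nobody permission, which is the opposite of how an opt-out tag works.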
>I'm not aware of any archives from the NY Times that Google has crawled. But the article didn't mention any specifics.
Given the NY Times' recent history, the reporter probably made it up!~ :eek:
If the cache case goes to court, then there are others such as archive.org that would also be subject to litigation. It is a shame that web sites that try to legitimately display cached content come under fire, whilst there are numerous other sites copying and displaying content (sometimes presented as their own) that really need to be addressed.
|It's a big shame that everyone is forgetting what a great service Google is providing. |
I don't think people are forgetting it. In fact I mentioned it in my last post. The cache doesn't bother me personally. But after Digital Host brought up the copyright issue I realized he was right. Then I started to think about whether as a webmaster I would prefer to have it on Google or not (assuming no penalty) and realized I don't especially like it.
The law is there to protect people. If something benefits 1 million people a tiny bit but hurts 1 person by infringing on his rights guaranteed to him by the constitution, the law may protect the individual.
For example, if someone took 25 million dollars out of Bill Gates's pockets and gave it evenly to all WW members, most of us wouldn't complain, nor would we cry for Bill; he can spare the cash. But it wouldn't be fair to him.
By the same token, copyright law is there for a reason and protects people. If even 1% of those whose content was cached suffered because of it, that would be enough to be worthy of protection.
Do you see what I'm getting at? We're not trying to dump on Google or be unreasonable to Google or say they are bad. The cache is technologically a great thing. Love to have it as a user. But as someone with copyrighted content, I'd prefer it wasn't there.
All this said with all due respect to Google and the understanding that Google is not doing this in order to harm webmasters, but in a desire to provide data to end users. Unfortunately, it might be illegal and it may harm people.
I'll bet some big class-action attorneys are salivating over this one and just waiting for the damages to accrue for those millions of websites. I'll also bet they're planning to throw down their trump card when Google tries an IPO.
I can't imagine that the potential liabilities with their caching will ever make it through due diligence with public investors and the FTC.
There is a technical issue with the cache that has not been discussed - some cached pages just don't work.
Google's policy in this regard is therefore going to cost me some precious time that could be better spent on other problems.
|Why should G expect that person to even know about Google, let alone how to format a robots.txt to protect their content? Not having heard of Google is no reason for their copyright not to be protected. |
Everyone who publishes on the web should know what they're doing. Especially if they have a major site. They should know what search engines do, what robots.txt does, and what meta tags do. That's like saying "why should I have to know how to drive just because I bought a car?"
I may not be with the flow of discussion at this moment but I am reacting to the title of the post.
From an old article (Update Sept 14, 2000) http://www.searchengineworld.com/engine/rom5.htm
>>>What is Google doing by caching documents? They are taking YOUR web page and delivering it to THEIR users! hmmm. This gets into a gray area that borders on legal discussion, but I feel as though I've been a victim of a theft. That Google is in essence STEALING my page - nothing short of a blatant copyright infringement. This is ok if I've requested they index my page, but if they just Crawled onto it from somewhere else, then they should have Zero right to my intellectual copyrighted material. You can't record Monday Night football and show it in your sports bar on Tuesday Night. You can't record my website and serve it up to others.
I believe that to date, i.e. July 10th 2003, Google has followed in principle what it stated. I don't see the Google cache raising copyright issues. If you don't want to be indexed, you can add a noindex meta tag.
> Everyone who publishes on the web should know what they're
> doing. Especially if they have a major site. They should
> know what search engines do, what robots.txt does, and what
> meta tags do. That's like saying "why should I have to know
> how to drive just because I bought a car?"
Well said! At the moment there are way too many Webmastering, and programming, cowboys out there. People who aren't professionals, and who bring down the industry for those of us who are, as well as taking valuable jobs from those who take the time to learn, to understand, and to be professional.
[edited by: Josk at 3:29 pm (utc) on July 10, 2003]
|Everyone who publishes on the web should know what they're doing. Especially if they have a major site. They should know what search engines do, what robots.txt does, and what meta tags do. That's like saying "why should I have to know how to drive just because I bought a car?" |
With all due respect, that's a narrow view of publishing on the net. You just published content in this very forum. All you needed was a username, email address and password.
Why should you have to know about meta tags? Not everyone is a technologist. Some people would rather go out in nature than read up on meta tags.
|You just published content in this very forum. All you needed was a username, email address and password. |
All I did was post a comment on a forum. Now, if this was a poetry forum, for example, and I was publishing an original work, I should have at least realized that since it's on the web, it would be read, cached, and republished. We're not talking about bloggers here, we're talking about people like NY Times.
Great. The law isn't trying to protect you since you don't mind. It is there to protect the poet who publishes his work, doesn't know it will be cached and DOES mind if Google caches it and republishes it WITHOUT his permission regardless of their disclaimer. Just because it doesn't harm YOU doesn't mean it's ok.
For an individual poet, you may have a point. But a newspaper should have a real webmaster, who when asked "how can we publish on the web but still protect our copyrights?" should be able to come up with noarchive as a good idea.
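For reference, the noarchive idea mentioned above is a robots meta tag in the page head, e.g. `<meta name="robots" content="noarchive">`. Below is a minimal sketch of how a well-behaved crawler could check for it before keeping a cached copy. The function and class names are made up for illustration, and real crawlers may handle more variants (crawler-specific tag names, HTTP headers) than this does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gathers the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        for directive in (attr.get("content") or "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def allows_cached_copy(html: str) -> bool:
    """Opt-out semantics: caching is assumed allowed unless noarchive is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives
```

This is exactly the opt-out default that several posters here object to: a page with no tag at all is treated as fair game.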
The best quote of the NY Times story, imo, is:
|... as Google, which draws millions of visitors to its site daily and redirects them to others through secretive search formulas, ... |
Only the NY Times can make the simple concept of a search engine sound evil!
First of all, the popular car driving analogy falls flat on the fact that you can hurt people with a car directly and severely.
Secondly, who is to decide on the proper rules for the net? Should we know how to configure Apache? Should we know how to program an OS? Should we know about META tags? Should we know about the <KILLROY> tags that I just invented and use on all my pages? Since when has the net ceased to be in the open market and become regulated so severely? I must have missed it.
Further, what are your qualifications to make that comment? Do you know meta tags? Maybe you need to know corn farming to post on this forum, you might be a criminal! I think you get the point that that is as nonsensical as expecting people to be thrown in jail for violating social etiquette in a pub.
Copyright law is quite straightforward (more or less) (I've been reading case law for 3 months now as I'm currently getting sued over IP too), and Google is in clear violation.
The real question is whether somebody can make a case for real damages to make it worthwhile. Hey, there might even be a business in creating a site specifically for being monetarily damaged by its existence in the Google cache, wait a few months and then sue.
I don't mind the cache, never really used it as a user, don't mind it as a webmaster, and while I've had my own share of minor copyright violations, I certainly understand that there is a law, and that there is a reason for that law.
COPYRIGHT VIOLATION: I think Google's caching scheme is a copyright violation. It may be a benign violation (even one with good intentions), but that's irrelevant to the question of whether it is or isn't a violation of copyright.
NOCACHE TAG: The availability of the "nocache" tag doesn't absolve Google of its obligations under the copyright law. To use an analogy, if I'm a book publisher, it isn't my duty to print the book in light blue type or with a pattern superimposed over the text to prevent illegal photocopying. It's the photocopier's duty to refrain from illegal copying and distribution.
LAWSUITS: It's true that a class-action lawsuit might benefit a law firm, even if the damages awarded to each cached Web site were inconsequential. Still, the challenge would be to prove and calculate damages. For every site that was hurt by the presence of the Google cache, another might be helped. And for every Web site whose copyrighted content is registered (thereby entitling the owner to statutory damages), there are probably a thousand Web sites whose copyrights aren't registered (and who therefore are entitled only to real damages). Proving and calculating damages would be far more complicated than, say, proving that a phone company overcharged its 10 million customers by 50 million dollars or an average of five bucks each.
MY PREDICTION: Google may well eliminate its cache at some point to avoid future liability (but without admitting guilt), and that will be the end of that--partly because no law firm will want to risk investing huge sums on a pig in a poke, and partly because a challenge to the legality of caching isn't in the public interest.
Thank you Europe. Are you a lawyer? You sound like you know your stuff. In all this one thing no one brought up is that Google must have a team of lawyers and they must have discussed this question. They ain't no fools. And they've been doing this for a long time, so they calculated correctly. If they dump it or change the rules it would probably be prior to IPO to avoid future litigation as you say, on a substantially larger company.
|there might even be a business in creating a site specifically for being monetarily damaged by its existence in the Google cache |
Now that's an idea :)
[edited by: gopi at 5:32 pm (utc) on July 10, 2003]
I disagree with you Brett:
|...It was the Google branding cache that built google. No branding cache page - No Google.... |
I believe Google became "grand" because of
* the quality of their search results
* responsiveness to the target market,
* and finally branding.
Y! had a branding campaign. With the yodeling commercials. That was clear branding. I don't believe Google ever did.
To find Google, I had to have a technical "relationship" or connection with something that had Google associated with it, such as Mozilla, the Google toolbar, etc. Nowhere do I find Google advertising other than on their own site. (I could be wrong on this of course; I didn't do complete research.)
Their tiny bar on top of 'cached' pages clearly indicates that -
|Google's cache is the snapshot that we took of the page as we crawled the web. The page may have changed since that time. Click here for the current page without highlighting. |
Here are some further questions, which I think would have to be answered first -
* Are individual pages separate publications, or is a whole web site considered one publication? Careful on the answer, because it has other implications.
* Is the web site a published or unpublished work?
* When is the web site published? (When user views it or when it was created on the server?)
* Are web sites factual or artistic works?
* When is the web site considered 'out of print'? Clearly not the life of the author...
* Will the cached pages negatively affect the web author's economic gain?
* Can the cached page display be considered to contain anything original? (possibly - the search?)
These are just some of my thoughts on Google's snapshot taking, or caching.
Well, first of all, all print arguments are irrelevant as we've had legislation for electronic works for a significant time now. Websites are obviously artistic, as creativity has gone into their creation. The case law applies where facts quoted on a website remain facts but their arrangement may be original, possibly not sufficiently to warrant protection (like alphabetic listings, see the Feist case).
Publishing is the process of making it available. If nobody buys an issue of a newspaper, or even if a particular individual doesn't purchase an issue, that can obviously have no bearing on the published status of the document. Websites have the same out-of-print status as any other electronic document. Case law supports a comparison with programs (PS: I don't know the details of the out-of-print specifics).
Economic gain or loss is different for each site/page and clearly the crux of any legal attack on copyright.
On the other hand, I just had my accounts garnished, as did some other individuals not connected to my site or business, and my site was barred by court order from being modified in any way, without there being any kind of violation. Of course I'll have the next 8-10 years to prove my innocence... hurray for the legal system...
Also, the continuous quotation of the disclaimer on the cache page is completely irrelevant as it has no contractual binding. I.e. people are very well capable of viewing the cache without having read or noticed the disclaimer. If I pirate some software and sell it for profit, I simply cannot protect myself legally by scribbling on the CD that it is pirated. That's simply nonsensical.
Personally I do not agree that the cache is a major factor of Google. I believe you would be mighty surprised how few "usual/common" users use it significantly. Only Google, of course, has the answer to that.
Some great points made by killroy, europe and DG.
|>>not only do they make abundantly clear the location of the original, they also leave the copyright information intact<< |
I don't know how to make this any clearer, but I'll certainly give it a shot. It doesn't matter if the copyright information is left intact or not and referencing the original doesn't give anyone permission to reproduce the content. Permission must be sought after and received.
A few points here:
The fact that they leave the copyright intact hurts Google. In the US, that copyright actually means something. It means NOARCHIVE, and Google demonstrates that they ignore it. So requiring a NOARCHIVE, NOCACHE tag is redundant.
And what about sites that specifically state their copyright position? Is Google then exempt because their spiders don't understand the meaning of the text? Some smart lawyer will have a field day with Larry and Sergey's quotes about Applied Semantics and the ability to understand text.
On the flip side of this issue, because the copyright and dates were left intact, Google's cache and the WBM allowed me to print past versions of my site that are now on file with the USPTO to help me defend my ownership of a domain that a competitor has been trying to steal. Ironic, isn't it, that Google violating my copyright is helping me protect my copyright? Of course now I can't collect for damages under the class action suit.
Let's suppose we grant that Google is violating copyright.
So why hasn't anyone sued them?
OK, your attack dog calls up Google's guard dogs. Your dog barks, "you guys are violating our copyright, sit, stay!"
Their dogs bark back, "OK, no problem. You wanna fix it or shall we?"
Your dog, if stupid, snarls, "We can't fix it! You gotta fix it!" And their dogs roll over and play dead. "Just a sec, we'll fix it, no problem! ... tick...tock ... There, your site is gone from all Google servers. Sorry for all the trouble, and have a nice day! See you 'round the fire hydrants!"
Your dog, if slightly more intelligent, snarls, "Whadda you mean?" Their dogs say, "you don't have to call out the mange patrol. It's a lot faster to just put the robots.txt file in. Or we can pull your site from Google. Your choice, but make up your mind quick, we've got sites to ban." And your dog rolls over and plays dead, then gives you a whacking-big bill for setting him up for public humiliation.
Your _trained_ attack dog tells you up front, "you know, this is making a big deal over nothing. Pay your webmaster $1.50 overtime for the coupla minutes it takes to add the robots.txt file. Here's my bill for, lessee, 30 seconds of concentrated thought at the usual rate, 5 minutes of blather at the usual rate but 1 hour minimum billing period, $200.02. See you 'round the fire hydrants!" Expensive advice, but it's cheaper in the long run.
|Your _trained_ attack dog tells you up front, "you know, this is making a big deal over nothing. Pay your webmaster $1.50 overtime for the coupla minutes it takes to add the robots.txt file... |
There was a case back in the late '90s where a site called TotalNews.com was framing articles from THE NEW YORK TIMES, THE WASHINGTON POST, and other major papers. To add insult to injury, TotalNews was running ads in the frames.
A consortium of about half a dozen newspapers (including the TIMES and the POST) sued, and TotalNews stopped its practice of framing--but only for the publications that sued and eventually settled. It still frames other Web sites' pages, and some of the sites are big names. Why haven't those sites sued? I'd guess it's because they're willing to tolerate the frames in return for the traffic that TotalNews sends them--and if they didn't want their pages framed, it would be cheaper to use "framebreaking" code than to file a lawsuit.
About.com is another company that frames third-party Web sites, and it's never been sued (although it has received "cease and desist" demands from at least some Website owners). Ask Jeeves is another site that--in the past, at least--has used frames with banner ads. Why have About.com and Ask Jeeves gotten away with framing? Again, it's probably because most Webmasters would rather have more traffic with frames than less traffic without--and those who object to such frames find it more effective to use framebreaking code than to hire lawyers.
Therefore, not only remove our page but pay us $$$ damages which are all documented here: [document]"
Pity most folks aren't as neat online as with their accounts, or we might have seen a few of these...
There might even have been hush hush out of court settlements. We all think of Google as squeaky clean, but who really knows? It's not like they've been forthcoming with info.
Mind you, I'm the biggest Google fan there is, just playing Devil's Advocate, learned that from my lawyer...
i think the whole idea of putting information free to view on the internet ... then later making you have to pay for it ... is somewhat flawed
either the information needs payment to view, or it doesn't
Vince...some had the idea of taking their articles in totality and publishing them, but with a twist. They randomly jumble the word order so that the search engines see the same keywords and word density, but charge you a fee to view the content in the correct order :)
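Just for fun, the jumbling trick described above is easy to sketch. This is only an illustration of the idea, assuming "keywords and word density" simply means the same words with the same counts; the function names are made up, and nothing here says how any real site or search engine actually does it.

```python
import random
from collections import Counter


def jumble(text: str, seed: int = 0) -> str:
    """Shuffle the word order of `text`, keeping every word exactly once."""
    words = text.split()
    random.Random(seed).shuffle(words)  # seeded so the result is repeatable
    return " ".join(words)


def same_keyword_profile(a: str, b: str) -> bool:
    """True if both texts contain the same words with the same counts."""
    return Counter(a.lower().split()) == Counter(b.lower().split())
```

So `same_keyword_profile(article, jumble(article))` holds for any article, while the jumbled version is of little use to a human reader.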
I find the cache just awesome - it's really one of the features I love about google, and I use it a lot. It happens pretty often to me that I was looking for a particular, hard-to-find piece of information, then it seems like I found it, the snippet in the SERPS suggests that I'm almost there... and then it turns out that the page I was longing for has simply disappeared, (re)moved without replacement by a careless webmaster. But with the Google's cache, it's still there. I LOVE IT! :)
Btw. if the cache is a copyright violation, then the site snippets in the SERPS are as well. To me, the cache is really nothing more than a huge snippet, and it says very clearly at the top of the page that it's not Google's page or copyright, so I really fail to understand all the fuss about it.
I may be missing the point but what about Google Images?
Surely caching images off your site and making it available to their customers via google image search is a greater violation.
- are frames and cache the same issue? wisenut does frames.
- how's the NYT policy towards news aggregators?
- quotation rights? common use? is there such a thing? It is the press after all.
- is the displayed copy legally different from indexing or are the two the same? (copy vs. display of it)
- laws differ worldwide, here you don't need (c) on a page to have copyright. I think it's 50 or 70 years.
- most anything raises copyright concerns these days it seems
- what made google, imho, was "I'm feeling lucky". Big hit with techies, and the techies told the less savvy.
"There are also some people who do not know about the robots exclusion protocol, and think their page should be protected from indexing by a statement like, "This page is copyrighted and should not be indexed", which needless to say is difficult for web crawlers to understand." (...) "Since large complex systems such as crawlers will invariably cause problems, there needs to be significant resources devoted to reading the email and solving these problems as they come up" Anatomy of a Large-Scale Hypertextual Web SE
- relates to indexing, not caching, but imho, an index is also some kind of copy, it is just not displayed in its entirety.