Talk to a lawyer, and then report your results back here.
And they aren't the only ones doing so. Search Hippo and several others do it as well. This question has got to be addressed and resolved.
Considering that all major search engines make cache copies available, I doubt there is anything illegal in doing so. It is important to realize that keeping caches of pages is fundamental to SE operation. They exist, and will exist. I understand that some people dislike having someone make these copies publicly available. OTOH, those very people made the page available first.
If cache copies were not publicly accessible, we would hear legal rumblings about how SE's were using secret copies of their pages. Webmasters would demand to know the content of the data the SE's were using.
Someone is always going to b*tch. Best to keep things open, IMO. Lord help the paranoid if they find out about archive.org...
To my mind, images are at least the same level of issue as visible textual / page code caches. In many cases images are clearly copyrighted graphics or photographs, yet they are copied nonetheless, and only a minor passing reference to "these may be copyright" is made. Mind you, even that was not present on the G page cache when I last looked.
The reason, AffiliateDreamer, that google (as one example) is allowed to do this is the same reason that a few supermarkets get away with making enormous margins while thousands of farmers make tiny ones: there is now an imbalance of power in the structure of the marketplace, and there is no union of webmasters (or perhaps of content generators and providers) that can retain its own lawyers of sufficient quality to fight for its own corner and its members' interests.
No individual farmer can make much headway against the supermarkets until he joins with lots of others and gets some leverage; it's the same with webmasters / content providers.
So it raises a question .. how many webmasters would be prepared to pay something to join a union?
So basically, if you don't like the cached pages, you request that they be removed, and blam-o, you now get 0 referrals from said search engine...
You are free to set a no-cache directive to instruct Google not to show a cached copy. The how-to is in the Google FAQ.
They still use one for their SE work, but they don't display it to end users. You are still indexed and get referrals.
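For reference, the directive being discussed is the noarchive value of the robots meta tag, which Google documents as suppressing the cached link without affecting crawling or indexing. A minimal sketch (the surrounding page markup is just illustrative):

```html
<head>
  <title>Example page</title>
  <!-- Ask compliant engines not to show a cached copy of this page;
       the page is still crawled and indexed normally. -->
  <meta name="robots" content="noarchive">
</head>
```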
So unless you are cloaking or don't get fresh crawls, the cache is not a problem (?)
I am not sure if this link is allowed, but someone is taking legal action.
"The company said it has sent 27 formal requests to the Mountain View, Calif.-based Google to remove the offending Web sites from its index and stop displaying the photographs in its search results, but was not satisfied with Google's response."
If they win, will Google stop displaying cached pages altogether, or will they introduce some way to opt in? Sometimes the cache is quite handy, but in some ways you can take it or leave it. What you gain in knowing about how Google sees your page, you lose in visitors statistics for those people who view the cached page.
It's for the courts to decide whether or not the cache is legal, but I wouldn't assume that because it has been done for so long, it is necessarily something that's all right for search engines to keep on doing.
Interesting post / link cabbie, I was not aware that there had been a US case about thumbnails which was referred to in the text.
I've said it before but it's worth repeating, a standard needs to be set to disallow caching from a single, central file. The obvious solution is an enhanced robots.txt specification.
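To make the suggestion concrete, an enhanced robots.txt might carry a caching directive alongside the existing ones. Note that the "Cache" line below is purely hypothetical; no engine recognizes it, and it is only a sketch of what a central opt-out could look like:

```
# Hypothetical robots.txt extension -- the "Cache" directive is NOT
# part of any standard, just an illustration of the proposal.
User-agent: *
Disallow: /private/
Cache: none            # ask engines not to serve cached copies of any file

User-agent: Googlebot
Cache: /articles/      # hypothetically permit caching only under /articles/
```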
I have no problem with an opt-out system provided opting out is easy. However, the only way you can opt out is by using a robots meta tag - not much use for non-html files.
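As an aside, the non-HTML gap can be closed at the HTTP level rather than in page markup, by sending the directive as a response header. Assuming an Apache server with mod_headers enabled, a sketch:

```apache
# Ask engines not to show cached copies of PDF files.
# The X-Robots-Tag header is honored by Google; support elsewhere varies.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noarchive"
</FilesMatch>
```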
Frankly, I think it's pitiful that Google etc. have not addressed this problem.
Wow! That is a lousy description of whatever the heck that case is about.
If the description of the case is accurate, I would have to assume that all the errors are on the part of the copyright holder, and that it is a garbage case.
Just think about it. They are complaining that google is indexing a site that contains copyrighted images (it is not clear whether it is their own site or the sites of others), but the complaint is not that the websites in the index are copyrighted; it is that the pictures on the websites are, and that people can get to the images for free.
If it is their own pay site, then how in the hell did googlebot get there for free? How do the surfers follow that link and get to the images for free?
If it is images on some other site, then google is not indexing the images, only the web pages, in the main index.
If it is a complaint about google images, they are pretty securely covered by the Arriba ruling if it is to images on their own servers. It also goes back to the question of how googlebot and surfers can get to that paid content.
Images on someone else's server should be dealt with in the normal way with a DMCA complaint, which Google always seems to comply with.
As for the original question about cached copies, I have never read an opinion by a lawyer that did not allow for the possibility that it is in fact legal. There is a general consensus that it is highly risky behaviour for a corporation, but there is no case law that settles the question.
I have read several opinions of lawyers that suggest that you should fully understand the risks involved to your business before filing such a suit. It is a fun theoretical discussion, but it is a lawsuit that should not be entered into rashly.
As far as images go, Google is in the clear because (a) it's already been determined that using a thumbnail is legal as long as it's small enough to not detract from the value of the original image on the original host's site and (b) Google doesn't seem to be caching the full-sized image.
As far as websites go, I can't see how it's legal to cache a website and host that cached page on your own site. This goes back to the debate over whether you should have to ask someone not to do something before it counts as illegal (via robots.txt or via specific requests that your cached pages be removed). In the US, anyway, it's legal until the courts or the laws say it's illegal.
And by the way, a lot of other search engines doing it does not make it legal.
|As far as websites go, I can't see how it's legal to cache a website and host that cached page on your own site. This goes back to the debate over whether you have to ask for someone to not do something illegal or not (via robots.txt or asking that your cached pages be removed via specific requests). In the US, anyway, it's legal until the courts or the laws say it's illegal. |
Fair Use is how it *might* (notice that I said "might", I don't want to go through that whole "BigDave said that it was Fair Use" garbage again) be legal.
You have to understand that in the United States, copyright does not exist to protect the creator of the work. The protection is only the means, not the goal. The goal is given in Article I, Section 8, Clause 8 of the United States Constitution:
"To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;"
It is to promote the base of public knowledge. You are granted copyright for the public good, not for your own good.
The public's ownership interest in the copyrighted work was made clear in court cases for well over a century before it was ever codified into law.
If having the cached pages available is a great public good, there is a very good chance that the courts will find a way to rule on the side of the public good.
And believe it or not, the fact that there is a way to keep Google from showing the cached version will play into any court case, given the nature of web publishing.
I'm not saying that it is absolutely legal for them to do, but it is far from being as cut and dried as the average content publisher would have you believe.
I personally think that the wayback machine has a much greater claim to public good than the google cache pages, but I don't know all the arguments that would be made.
I would not be surprised if google were to welcome a lawsuit on this issue to help create caselaw in their favor.
OK BigDave, I recently purchased a book about automobiles that contains hundreds of thousands of references to production data, specs, options, etc. for these vehicles. I feel this is great public knowledge! If what you are saying is true, I can copy this information to digital form and publish it on a site for public knowledge, even though the book clearly states "Copyrighted".
It would benefit others! Same way the engines cache pages, to benefit others! I think I have a whole new game plan :)
Not in the long run, since you would remove the incentive for the authors/publishers to create new material. Are you saying that Google's cache feature (which you can easily opt out of) has removed the incentive for you to publish new creative material?
Ok, what about the Wayback machine?
personally I use the cached pages more :)
And google is willing to go to court for their actions. Would you be willing to go to court for yours?
As a matter of fact, the example you give has a huge volume of information that you are quite free to reproduce on your website, as it is not covered by the copyright.
As has been pointed out hundreds of times on WW, fair use is a huge gray area in the law, and it is that way intentionally. The courts are to weigh the value of the use against the rights granted to the copyright holder.
Google's actions tell me that they have carefully considered their legal position, and that it is not as flippant as one put forth by someone on WebmasterWorld with 30 seconds of thought.
To start with, you will notice that they do not serve up ads on that page to reduce their exposure under the first factor.
They only serve up one page at a time and do not change the links or images to those served up by their own site. That is limiting the quantity that they serve up.
In your example, that would be like serving up an image of one page of a book, which is quite often covered under fair use.
Like I said, I am not saying that they are definitely covered under fair use. I am saying that it is open to the *possibility* that the ruling could go that way, and that I am absolutely certain that they have spent far more time thinking about it than anyone here on webmaster world.
None of us is a US District Court judge. And any such judge will not rule on any opinion or hunch of anyone posting here. They will be ruling on the actions and arguments put forth in the case. And you can be certain that Google will put forth a good case if it should ever come up.
I think that because you have the ability to block Google from caching your pages and your images, it is up to the site owner to be aware of that technique.
Ignorance of the law is not a defence, and neither is creating a site without full knowledge of what you're getting into.
In the UK you have to put up a "Private Land" sign for people to be culpable for trespassing.
|If what you are saying is true I can copy this information to Digital and produce it on a site for public knowledge even though the book clearly states "Copyrighted" It would benefit others! Same way the engines cache pages, to benefit others! I think I have a whole new game plan :) |
Or you can "cache" a few real copies at your house and sign them out to people at no charge. Of course, you could do it for hundreds of books. I mean, who cares that dozens of people will get to read each book without any benefit whatsoever to the original creator? Sure they got paid for the first copy, but what about lost revenue from all the potential sales...
Or does someone do that already? ;)
Apologies in advance if my language is too tongue-in-cheek. I seriously couldn't think of a less flippant way to say it. :(
Google and other engines will get sued and eventually be forced to stop that unless permission is given. One site somewhere will make a real good case about how they got hurt by this.
You can ban G from caching, but it's an extra step... SEs will eventually require (IMO) that you put a Cache directive in the meta instead of the other way around. You give Google permission to index by posting it online, but I don't know if that extends to caching, even though you know they do it.
Think about this: many newspapers and magazines charge money for older articles. A lot of money is lost because of this. You need an article about John Doe, and if you didn't get it for free from Google's cache, you'd pay the NY Times $2.50. Multiply that by how many NY Times-type publications and info-seeking people are out there. To make matters worse, Google is making money from this (a better site because it includes articles for free when they normally cost money, so more people visit it, click on ads, etc.); indirectly, but still making money.
|Google and other engines will get sued and eventually be forced to stop that unless permission is given. One site somewhere will make a real good case about how they got hurt by this. |
While I see this posted a lot, I never see any analysis of why from anyone knowledgeable about copyright and fair use.
The only thing close that I have seen by attorneys has suggested that while it is a tough fight, there is a very real possibility that Google could win.
If you have a lawyer that tells you that there is no way for google to win, then you better get yourself another lawyer that knows the ins and outs of copyright, fair use and DMCA safe harbor provisions. Because the good lawyers are still arguing about ways that it could come out in Google's favor, and Google has good lawyers.
|SEs will eventually require (IMO) you to put Cache in the meta instead of the other way around. You give Google permission to index by posting it online but I don't know if that extends to caching..even though you know they do it. |
This makes the most sense to me. We started with the G no cache tag and then as other engines started caching we switched to the robots no cache tag which was a pain.
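For anyone following along, the switch described above is from the Google-only tag name to the generic one, roughly:

```html
<!-- Google-specific: only Googlebot honors this name -->
<meta name="googlebot" content="noarchive">

<!-- Generic: any engine that supports noarchive should honor this -->
<meta name="robots" content="noarchive">
```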
"Copyright and Fair Use"
Entire pages and articles are NOT fair use. Snippets like those used to display search results and G News are.
...and we're not lawyers (at least I'm not); we're just giving opinions. But if Google can afford good lawyers, so can another 10,000 companies out there.
|Entire pages and articles are NOT fair use. Snippets like those used to display search results and G News are. |
Can you show me this either in USC or case law?
The quantity is a consideration in a Fair Use determination, but it is only one factor.
Fair use could be as little as none, or as much as all. It is rarely at either extreme.
An example of where it is ALL, is personal use copying for the purpose of time shifting or backup copies. How about libraries putting newspapers on microfilm?
As for entire pages, they are quite often well within fair use limits because they are only a part of the whole work.
Do you want me to quote case law? We can start with the Betamax decision. I have plenty more if you want. Reading them is quite educational.
The point, I think, for most webmasters who are against being cached is that they shouldn't have to opt out of this; you should have to opt in, if anything.
According to laws of the USA...
|Internet is not considered public domain simply because it was posted on the Internet and free for anyone to download and copy. |
|entire pages and articles are NOT fair use. |
Actually, you can write a hundred-page book that is about someone else's thousand-word article and include the entire text of the article as part of the book, and that would be considered fair use.
Now, in the case of cached pages on search engines, this probably doesn't apply -- but it's important to a discussion of fair use to understand there are situations where an original work in its entirety can be used by someone other than the owner.
Now, is each page of a website its own original work, or is it like a page in a book, only a part of the overall original work? I can think of sites which work one way and others which work the other way.
Amazon.com shows sample pages from books it sells without obtaining specific permission from the original author. Amazon must do it because it helps them sell books -- the authors (or publisher) must not complain because they consider the net effect of what Amazon does to be more beneficial than trying to stop Amazon.
This is not a discussion about public domain, and no decent lawyer would ever claim that it is. You can be certain that Google would not be stupid enough to ever try and claim that in court or in public.
But the courts do treat information differently depending on where, how and in what form it is made publicly available. And the fact that there are affirmative steps you can take to stop the caching *will* play a part in any possible court cases. As will the safe harbor caching provisions and fair use analysis.
You can be bothered by it all you want, and you can declare that it should be done a different way, but the court will decide based on the law and the facts of the case as presented by the counsel.
And you had better pray that whoever files suit the first time has the best possible counsel, because a bad job of lawyering can lose a winnable case, and that can set precedent against you.
|According to laws of the USA... |
Internet is not considered public domain simply because it was posted on the Internet and free for anyone to download and copy.
I happen to agree that statement is correct (although I doubt that SE's cache because they don't understand public domain). But the source of that quote can only be considered another person's opinion, not a definitive interpretation of US law.
(I'd post a link but because of the nature of the site I think it would be against the TOS here, sticky me if you want it.)
BigDave, my point about the Internet not being considered public domain is that just because something is on the Internet doesn't mean that I or anyone else am giving Google permission to download my work and then to post it online.