Appeals court decision on image search engines
Partly "fair use," partly illegal
The Ninth Circuit federal appeals court has issued a decision on fair use and image searches:
The Ninth Circuit says that image thumbnails are "fair use" but the use of the full-size image in a search engine's frame is not fair use. One reason thumbnails are okay is that if you try to enlarge the thumbnail itself, you lose all the important resolution. The reason a full-size import of the image in a search engine's frame is not fair use is that the user may not be aware that the full-size image was in-line linked from the original site, and the SE's frames have their own content surrounding the image.
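The resolution point can be illustrated with a toy example. This is a minimal sketch, not anything taken from the opinion or from Arriba's software: it assumes a tiny grayscale image stored as rows of brightness values (0.0 to 1.0), shrinks it by averaging blocks of pixels, then blows the thumbnail back up by repeating pixels. The fine detail (here, a checkerboard) is simply gone.

```python
# Toy illustration: shrinking averages detail away, and enlarging
# the thumbnail cannot bring it back. Image = list of rows of floats.

def downscale(img, factor):
    """Shrink by averaging each factor x factor block into one pixel.
    Assumes width and height are multiples of factor."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / (factor * factor)
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

def upscale(img, factor):
    """Enlarge by repeating each pixel factor times in each direction."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]

# A 4x4 checkerboard: the "detail" is the alternating light/dark pixels.
original = [[(x + y) % 2 * 1.0 for x in range(4)] for y in range(4)]
thumb = downscale(original, 2)
restored = upscale(thumb, 2)

print(thumb)                 # [[0.5, 0.5], [0.5, 0.5]]
print(restored == original)  # False: every pixel is now flat gray
```

The names `downscale` and `upscale` are made up for this illustration. Real thumbnailers use better filters, but enlarging a thumbnail can only guess at detail that was averaged away.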
This is a pretty good decision, it seems to me. It's the last nail in the coffin of framing of off-site copyrighted material.
Google's image search, in my unprofessional opinion, can live with this decision. Their click-through page that shows the full-size image doesn't frame it with extraneous material from Google itself. Also, the original page in its original context is available on the bottom half of the screen.
> CONCLUSION
> We hold that Arriba's reproduction of Kelly's images for use
> as thumbnails in Arriba's search engine is a fair use under the Copyright
> Act. We also hold that Arriba's display of Kelly's full-sized images is
> not a fair use and thus violates Kelly's exclusive right to publicly
> display his copyrighted works. The district court's opinion is affirmed as
> to the thumbnails and reversed as to the display of the full-sized images.
> We remand with instructions to determine damages for the copyright
> infringement and the necessity for an injunction. Each party shall bear
> its own costs and fees on appeal.
So does that mean that Google's 'cached' version of a web page is a 'full sized' image?
By the standards of the legal system, I doubt a cached web page counts as any kind of image. But the ruling does open doors for a challenge of the Google cache.
I just finished reading the opinion, and I'd have to think that it indeed has some implications for the Google cache.
One could easily argue that a traditional SERP listing is the equivalent of the thumbnail, while the cached page is the equivalent of the full-size image.
Absolutely. But the ruling in and of itself doesn't cover the caching issue. It just opens a great big bomb-bay door for anyone to drop a lawsuit out of. I'm sure the precedent could be expanded to include the Google cache rather easily by anyone with enough $$$ to out-legal Google.
> I'm sure the precedent could be expanded to include the Google cache.
I would have thought common sense implies the following, although this is an optimistic view where the law is concerned :-)
1. When one creates a website on a server which is available to people on the internet, one in effect may be giving a limited licence to any user who finds the website to view the materials on their computer screen.
2. Because of the way computers operate, you are in effect granting them a limited right to copy, similar to the one you grant someone to whom you give a photographic print (to display this one print at home or at their place of work, but not to replicate it or sell replicas of it for commercial gain), because their computer must obtain one copy of the data (the cache) in order to display the file.
3. This is not the same as granting other parties a licence to use or replicate your content for their commercial gain, yet this is what the Google cache and search engine page summaries actually do.
3.1. However, if you have submitted your site to these search engines, then in doing so you have granted them such a limited licence, because such use is implied and is a quite clear result of submission. Note that the Open Directory terms and conditions take some reading and could be cause for concern if you read them closely.
3.2. If you did not submit your site to search engines, but they found your content and allowed users to browse it in a way that gave the impression your content was their content, this would clearly be a copyright infringement.
However, as Google and Vivisimo (which may be a closer parallel to the lawsuit mentioned) simply show what YOUR page looks like, or looked like the last time they checked it, I think "fair use" would almost certainly be granted and the lawsuit would fail.
Why would you put your computer files on an open server rather than your hard disk unless you want people to find them?
Why would you put them out there with no password access system unless you wanted people to find them?
Why would you submit the URL to search engines unless you wanted people to find them?
Why would you put photographs on your site unless you wanted people to see them?
The search engine is in effect the photographer standing in the public area of the street and taking photographs of what it sees, which the photographer is allowed to do in most countries of the world.
What the photographer is not allowed to do is take photos on private property without the owner's consent. Nor, in some cases, may he commercially sell or use photographs whose subject is the copyrighted work of another, when permission has not been obtained, or commercially exploit identifiable images of people who have not given their consent, unless the image is newsworthy and is used to publicise a news story.
Clear as mud? It seems not to me :0(
> Why would you put your computer files on an open server rather than your hard disk unless you want people to find them?
Of course you want them to be found, but in the case of many sites, the Google cache will not correctly display their site's images, or advertising (if any exists). You want visitors at your site, and people viewing it through Google's cache are not coming to your site, they are staying at Google's site because of the "stickiness" of YOUR content.
By putting up a website, one does indeed grant limited use to site visitors to download the content to their browser for the purpose of viewing it. That doesn't mean we grant other websites permission to present our content on their own server, or under their own branding. And I think use of content in such a fashion should require an opt-in permission scheme, rather than opt-out (with the no-cache metas).
Another issue is the extent to which Google's page cache is operated for profit. The court said:
> There is no dispute that Arriba operates its web site for commercial
> purposes and that Kelly's images were part of Arriba's search engine
> database. As the district court found, while such use of Kelly's images
> was commercial, it was more incidental and less exploitative in nature
> than more traditional types of commercial use. Arriba was neither using
> Kelly's images to directly promote its web site nor trying to profit by
> selling Kelly's images. Instead, Kelly's images were among thousands of
> images in Arriba's search engine database. Because the use of Kelly's
> images was not highly exploitative, the commercial nature of the use only
> slightly weighs against a finding of fair use.
Google's caching of pages has two consequences: a) a heavy-handed branding of Google on top of the original page, and b) the attractive, colorful highlighting of search terms.
Arguably, both of these promote Google more than Arriba's image search promoted Arriba. One could show that the technical resources needed for an on-site, colorfully highlighted, full-text search of all of a site's pages are beyond the resources of the average website owner. Google therefore exploits this state of affairs for commercial gain by offering a value-added approach to viewing websites. Such an approach competes with the copyright holder for site traffic. It keeps surfers on Google's site, enhancing Google's ad views, while it deprives the copyright holder of potential ad revenue and "sticky" traffic. Most websites are trying to convey an entire experience over a succession of related pages; Google detracts from this experience.
The question is not whether we owe Google a debt of gratitude for referral traffic. Of course we do. The question is whether they should have taken the extra step of caching our pages.
I think it's a strong argument for an "opt-in" requirement for the Google cache, rather than the present "opt-out" situation. But then again, it's always a question of money and lawyers.
>I think it's a strong argument for an "opt-in" requirement for the Google cache, rather than the present "opt-out" situation. But then again, it's always a question of money and lawyers.
Everyman, I think you are probably right, especially as the cached content is often way, way out of date if Google is not visiting often. Opting in, and Google accepting opt-ins, would require the cache to be kept reasonably up to date. But this is a slippery slope, because you are almost offering them a licence to display your complete website within theirs, processing the content (highlighting etc.) as they wish.
You only have to take a look at what they are doing with PDF files with their "View as HTML" option: something that one puts in PDF format for control and aesthetic reasons, Google's engineers convert into a pile of ugly rubbish. Well done indeed.
Hrm. So if I stole an image from another site, but reduced it by 1 pixel in either direction, it would be legal? That's what it sounds like to me.
Very good point. Now there's a greater gray area: what counts as a thumbnail? Is it a certain size, or any image that is a smaller representation of a larger image?
What a can of worms.
I spent 2 uncompensated, miserable hours of my time this morning discussing various copyright issues with three different people, including our resident legal advisor, Mr. Idiotgirl, who tends to answer my questions with the same vagueness I would expect from any other attorney. (Mr. Idiotgirl is in the doghouse when he comes home today, BTW)
In cases like this - the only people who make it out unscathed are the attorneys. Until you (plaintiff) are rendered a favorable judgment, you are basically ice skating uphill - no matter how "right" you may be. These gray areas only make it more slippery.
Google could drop the cache and regular users wouldn't even notice. They don't use it; only webmasters do.
The entire scope and purpose of the cache, along with Google's other "toys," is spam detection and prevention. Regular users couldn't care less what a page looked like a month ago. Think about it.
So what are you all saying? From this post and one I did in the Google forum about some site that was stealing content, the bottom line appears to be this:
It's the Wild West, folks. Here come da judge once every two years, but he don't matter at all. In two years you can steal all of the cattle, shoot those who don't shoot you first, and put enough money in the bank to buy off the judge!
Is this an accurate assessment of where we are?
'Perzactly. Plunder, steal, and take your chances with Johnny Law. And Johnny Law has some mighty short arms.
In short, Google's cache and, indeed, full-size images are not the property of Google, and this is what makes it an issue.
But I've seen in other posts that Google's mission is almost "real-time searching."
By the time the law cuts through all the red tape to get at Google (who cares about the other search engines!), the search engine won't need to host the cached version of anything on its server.
Thank you for the comments Mark. You brought up several points of minutia I'd never seen referenced in relation to caching before.
>The search engine is in effect the photographer
That's an interesting analogy, but I think a better one is a pirate radio station. A search engine that redistributes your content is, in effect, rebroadcasting it.
We hear it during almost all sporting events:
"The use of this telecast is for the sole enjoyment of our viewing audience. Any other use...."
Just as the television stations put their signal on the air for anyone to tune into, I put this website online for you to tune into. That does not give you the right to redistribute or rebroadcast it from your site.
>exploits this state of affairs for commercial gain
Sure they do. Far more people use the cache than is generally believed. Get a JS counter and compare for yourself. The usage is quite high. The benefit is the highlighting and the speed.
Regarding whether Google would ever adopt opt-in vs. opt-out permissions for page caching, I think the choice between the two might vary by the world region being served.
At a recent "Legal Pitfalls of eCommerce" discussion I attended here in Burlington, Vermont (USA), Bill Schubart (head of major ecommerce/fulfillment company Resolution) mentioned a privacy directive recently issued by the European Union that would, in effect, prevent direct marketing companies from selling a person's contact info unless they received his/her explicit written permission each time that info was being sold or traded. His point was that this differed greatly from the US, where apparently opt-in permission is required only once, and as long as the contract spells out that the company has the right to share that info with third parties, from that point on the person never knows which firm has been trading in his/her personal data.
My point is that, given differing business traditions in Europe and the US, Google might wind up with restrictions on page caching in one market (say, Europe), but free rein to cache pages in another (say, the US).
...Another interesting point made by Mr. Schubart (I don't work for him, honest! :)) was that the rate of technological innovation (e.g., page caching, thumbnail display) is far outpacing the rate of statute creation. This was reinforced by a business lawyer I spoke with at the same discussion, who recalled once asking another corporate lawyer how the threat of regulation affected his company's practices regarding new products and marketing tactics. "We don't even worry about it," he supposedly said. "They're [i.e., legislators] so far behind us that we can't afford to guess what rules might come. We just do what we have to, and we'll deal with the regulation if it ever catches up to us."
"something that one puts in a pdf format for control and aesthetic reasons google engineers convert into a pile of ugly rubbish"
But the point of that is so an information seeker can see the information they were looking for quickly and easily. It's the same with the cached pages: how many times do we find a 404 or changed content when we click on a search result? The cached page helps us quickly find the info we were looking for in the first place.
However, I do believe it should be an opt-in rather than an opt-out system. It's certainly a grey area, not forgetting that it's only the search engines that obey the robots.txt file that will "allow you" to opt in or out.
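On that last point: honoring robots.txt is entirely voluntary on the crawler's side. Here's a minimal sketch of the check a polite crawler makes before fetching, using Python's standard-library `urllib.robotparser`. The robots.txt content, the bot name `ExampleBot`, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A polite crawler asks first; nothing forces an impolite one to.
print(allowed("ExampleBot", "http://www.example.com/index.html"))      # True
print(allowed("ExampleBot", "http://www.example.com/private/a.html"))  # False
```

The whole mechanism lives in the crawler: a bot that never calls a check like this never sees the opt-out at all.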
>> Google could drop the cache and regular users wouldn't even notice....they don't use it, only webmasters do.
That simply isn't true. As a surfer myself, not in a webmaster role, I've used the cache a number of times. It makes it easier to find information on a page with the highlighting, it lets you skirt around slow or overloaded servers, it blocks popups (or so it seems), and it lets you get at the content after it has been deleted or changed (in the case of some forum systems or such nonsense). I've even talked to people who hardly ever click on anything *but* the cache. Do you think Google would leave it running if webmasters were the only ones using it? I don't. It costs too much in bandwidth.
Re PDF "but the point of that is so an information seeker can see the information they were looking for quickly and easily."
Jammy, the thing is that one used to be able to put things in PDF files knowing they would not be indexed by search engines.
Google suddenly changed that by starting to index their textual contents. A number of people were a little surprised, because they did NOT want or expect the contents of their PDF files appearing in search engines.
It comes back to the items mentioned before.
Quite apart from the legal point of view:
If you don't want anyone to find it,
don't put it on an internet server.
Maintaining old, out-of-date, or removed content à la "archive" is also very dodgy.
Imagine someone puts up a libel and then, realising this, removes it from their server: people will still be able to find it, and it may never go away.
>>This is a pretty good decision, it seems to me. It's the last nail in the coffin of framing of off-site copyrighted material.<<
Maybe, maybe not. There are substantial differences between displaying an offsite image within a page and framing a third-party page that has all of its identifying logos, navigation links, ads, etc. intact.
IMHO, the issue of framing offsite pages (as opposed to images) has faded in importance for two reasons:
- A site that objects to being framed can add framebreaking code to its pages. (That's much cheaper than filing a lawsuit.)
- Sites that earn revenues from e-commerce, pay-per-click ads, or affiliate sales may welcome the traffic that comes from being framed by About.com, Ask Jeeves, and other large sites that use "outbound ad frames."
Interestingly, the ability of a website owner to "opt out" with a NOARCHIVE meta or a robots.txt is of absolutely zero interest to the Court. The facts of this case as specified by the Court indicate that Arriba complied with Kelly's request to opt out. But these facts, while mentioned here in the factual summary, were not mentioned at all in the rest of the decision, and had no bearing on Arriba's liability:
> In January 1999, Arriba's crawler visited web sites that contained
> Kelly's photographs. The crawler copied thirty-five of Kelly's
> images to the Arriba database. Kelly had never given permission to
> Arriba to copy his images and objected when he found out that
> Arriba was using them. Arriba deleted the thumbnails of images that
> came from Kelly's own web sites and placed those sites on a list of
> sites that it would not crawl in the future. Several months later,
> Arriba received Kelly's complaint of copyright infringement, which
> identified other images of his that came from third-party web
> sites. Arriba subsequently deleted those thumbnails and placed
> those third-party sites on a list of sites that it would not crawl
> in the future.
The opportunity to "opt out" doesn't seem to cut it under copyright law at all. Google's various robots.txt options and meta options may serve to placate webmasters and keep Google out of court longer. But should Google ever find itself in court, these options apparently would not help it. This strengthens the case against Google's cache copy of textual material. The entire discussion of the "transformative" nature of copying in the decision would also seem to go against Google in the case of text caches, even though it went in favor of Arriba in the case of thumbnail images.
Finally, the Ninth Circuit's jurisdiction includes California, which means this decision is binding in Mountain View. All we need now is to get someone on the West Coast (so that the jurisdiction cannot be disputed), with deep pockets and a strong set of facts, to file suit against Google. That would be the end of Google's cache.
I'll even go so far as to say that Google will not file for an IPO until it either 1) gets a favorable legal decision from the Ninth Circuit or Supreme Court that clearly makes its cache practices legal, or 2) phases out the cache on its site.
The first option, a favorable legal decision, does not seem likely anytime soon, so I would expect to see changes in their caching in the very near future.
The risk of getting sued is a serious one for Google. If everyone is watching your stock price every day, it becomes more than serious. Losing a copyright suit opens up all sorts of liability issues, and these can have a dramatic effect on a company's stock.
I can't understand what the fuss is about...
First of all, those caches are barely used by ordinary internet users. They want up-to-date information, not an older version of a website (the cache won't be up to date).
Second, Google gives webmasters the choice of whether to have their sites displayed in the cache. So if you don't want a cached version, use the non-cache code.
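For reference, the "non-cache code" is the NOARCHIVE robots meta tag, e.g. `<meta name="robots" content="noarchive">` in the page's head. Below is a minimal sketch of how a crawler that chooses to honor it might check a page, using Python's standard-library `html.parser`. The class and function names are hypothetical, and only the generic `name="robots"` form is handled.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def may_cache(html):
    """True unless the page carries a robots NOARCHIVE directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" not in parser.directives

page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOARCHIVE"></head></html>'
print(may_cache(page))                                           # False
print(may_cache("<html><head><title>Hi</title></head></html>"))  # True
```

As with robots.txt, the check happens on the crawler's side; the tag only works against engines that look for it.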
Third, I like the cache version. When a website seems to be down, I can still browse through its information with Google's cache.
Going back to the original topic, I find it very interesting that "framing" was specifically mentioned.
I wonder if About.com and Ask Jeeves will stop framing now?
PGSBS, in fact people do use those caches quite a lot.
As my site has grown, the contents of two folders have diversified into six, and the previous pages of the two folders redirect to the page offering links to all six folders.
Some people, rather than finding an exact page, don't care whether it's in the cache or was freshly made yesterday; they just know that the cache contains the results they were looking for.
Personally, I would get rid of the cache. Any URLs of mine in the search engine that are outdated are always redirected to a page very relevant to the old listing.
So, basically I dont like the cache, I say get rid of it :)
This discussion is not about whether you like the cache or not, or whether it makes more money for you or not, or whether almost no one or almost everyone uses the cache copy, or whether the webmaster has adequate control with the meta tag and robots.txt.
The issue is whether Google's text cache (which, by the way, includes the HTML and/or text versions of PDF and other documents that are available) is legal in light of this Ninth Circuit ruling.
This ruling is the most definitive on the topic, by the highest court that has ruled on it. Chances are that it will not be appealed.
My observation is this: while the "opt out" options offered by Google make me as a website owner somewhat satisfied, they don't matter under the Copyright Act of 1976. I'm no lawyer, but it appears that a Court decides first whether there was a copyright violation, and then decides on damages, if any. The "opt out" options have no bearing on the first point, but may have a bearing on the second -- if the website owner knew about them and failed to take advantage of them.
That still leaves us with the Google text cache as a copyright infringement. I think they could tweak their image search to make it legal under this ruling, but I don't see how they can tweak their text cache under this ruling.
Since copyright law is a civil matter (except for that recent criminal provision in the DMCA, which doesn't apply here), you have to challenge the text cache by suing Google. This could be a class action or an individual suit.
The perfect case from an individual would be something like this:
1) A website owner somewhere in the Ninth Circuit puts up a website with textual content. There are HTML, plain text, PDF, and some Word documents.
2) There is no robots.txt -- the website owner is not aware of bots or robots.txt protocol.
3) The website owner realizes that he has no control over those who link to his content, but charges a fee for those who copy and paste his content directly onto their own website. (This is not so bizarre; most online journals and newspapers have such a copyright policy -- linking is okay but not copying.)
4) Google comes along and sucks up the site.
5) Google also sucks up a few third-party sites that have pasted this content onto their own sites. (Whether the third-party site did it with or without permission isn't relevant insofar as Google's action is concerned).
6) The website owner notifies Google that their cache copy is in violation of his copyright. Google responds with a reference to their robots.txt and NOARCHIVE pages. To be safe, Google also bans the site entirely.
7) But Google doesn't ban the third-party sites, because Google doesn't know about them. It continues to crawl these sites. The original website owner still has legal standing over Google's copyright violation on these third-party sites. To put it another way, Google may be smart enough to take corrective action in the first instance, but a bot is a bot, and Google doesn't have systems in place to take remedial action on third-party sites.
8) The website owner files suit for copyright infringement. The U.S. District Court judge is required to follow the Ninth Circuit's lead and find that there was a violation. The case goes to the damages phase.
9) The damages may be slight, but the precedent is worrisome. With millions of sites out there, Google's attorneys are forced to recommend that they stop caching documents because it's not worth it for Google.
There was an issue that came up a while back with free hosts making it a condition of their usage agreement that they claimed copyright over any content put on the sites. A number of people, especially graphic artists, moved to other hosting providers. If that condition of usage was ever effectively challenged in court, I'm not aware of it.
Using that logic, what's to stop Google from making inclusion in the cache a condition of inclusion in their database, and eliminating the choice about caching altogether? It would certainly eliminate a lot of the spam problems. As long as it were consistent with this ruling as a guideline, on what grounds could the condition be challenged? It seems that would depend on whether inclusion in their index constituted a right or a privilege subject to their terms of service.
If Google were to make inclusion of the cache a condition of inclusion in their indexing, it would make the entire Google system opt-in instead of opt-out.
Yes, I think this would be perfectly legal. But it would also mean that the total number of pages in Google would drop from two billion to a few thousand, and slowly rise from there as word gets out that you need a special meta or a special opt-in equivalent to robots.txt.
It's not an option for Google at this point, because it would hurt them much more than simply dumping the cache copy. Dumping the cache copy would hurt them a little, but they could recover. Making their entire system opt-in would open the door to a competitor that crawls the entire web and doesn't use a cache copy.