Google cache raises copyright concerns


Clark

9:09 pm on Jul 9, 2003 (gmt 0)

Everyone loves to write about Google:
[news.com.com...]

aravindgp

12:04 am on Jul 12, 2003 (gmt 0)

[news.com.com...]
CNET news July 7, 2003, 8:40 PM PT

Search engines' display of miniature images is fair use under copyright law, a federal appeals court ruled Monday, but the legality of presenting full-size renditions of visual works is yet to be determined.

europeforvisitors wrote:

>>BTW, the caching and serving (redistribution) of entire pages is a whole different kettle of fish from indexing a page or quoting snippets of text on a SERP.

When we look at the above article and what europeforvisitors wrote, it directly states that the cache is perfectly fine under federal law if it's just a snippet of the whole website. Is the cache a snippet of the whole website? I have been reading this forum over and over again to determine this.

I would love to hear arguments on whether the cache is a snapshot of the whole website or a reproduction of a webpage.

kaled

12:14 am on Jul 12, 2003 (gmt 0)

europeforvisitors said

the fact that Google honors the "nocache" tag means that a technical remedy is available--and the technical remedy is obviously more practical for most people than a legal remedy

There may be automated tools available that will add this meta tag to all the pages in a site, but I for one have no such tools. That means that I have to manually add this tag to every page. Google should devise a global solution to this problem (that applies to directories). Meta tags could then be used for individual pages or better still, a single text file that contains a list of all exception pages could be used. Perhaps this could be called robots_ex.txt

If there is some sneaky trick that can already be used to achieve this, I would certainly like to know what it is.

Kaled.
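
For anyone in the same position, the manual tagging can be scripted. What follows is only a minimal sketch, assuming a site made of static .htm/.html files on disk. The posts above call it the "nocache" tag; the value Google documents for opting out of the cache is noarchive, so that is what the sketch inserts. The function names (add_noarchive, tag_site) are made up for illustration.

```python
import re
from pathlib import Path

# The robots meta tag form that opts a page out of Google's cache.
NOARCHIVE_TAG = '<meta name="robots" content="noarchive">'

# Matches the opening <head> tag, with or without attributes (but not <header>).
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_noarchive(html: str) -> str:
    """Return html with the noarchive meta tag inserted right after <head>."""
    # Crude check: any existing mention of "noarchive" means we leave the page alone.
    if "noarchive" in html.lower():
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        return html  # no <head> element found; skip rather than guess
    return html[:match.end()] + "\n" + NOARCHIVE_TAG + html[match.end():]


def tag_site(root: str) -> int:
    """Rewrite every .htm/.html file under root; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.htm*"):
        # latin-1 maps every byte to a character, so unknown encodings survive
        # the read/write round trip unchanged (an assumption of this sketch).
        original = path.read_text(encoding="latin-1")
        updated = add_noarchive(original)
        if updated != original:
            path.write_text(updated, encoding="latin-1")
            changed += 1
    return changed
```

Calling tag_site("path/to/site") on a copy of the site first would be the cautious way to try it, since the script rewrites files in place.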

kaled

12:45 am on Jul 12, 2003 (gmt 0)

Question for GoogleGuy (I know I'm bending the TOS here).

Pages from my website don't work in the cache because of my use of javascript. My script files appear to be cached correctly (to my surprise) but subsequent files that are called (framesets) are not. So, is there a simple javascript method I can use to detect the Google window and thereby abort all further actions? I could look for search?q=cache in the url, but is this guaranteed to work in the future?

I don't mind Google caching my pages, but I would like them to work if they are cached.

Kaled.

taxpod

12:50 am on Jul 12, 2003 (gmt 0)

Be careful how you extrapolate rulings such as this. They are hardly ever dispositive. Extending rulings without a thorough reading of the fact pattern is dangerous business. Also rulings by the 9th U.S. Circuit can be handled differently by the circuit in your area which would be controlling on you until overturned by the Supremes. The thing to do is to follow developments but don't be convinced by one decision unless that is by the big guys and gals.

mcavic

12:55 am on Jul 12, 2003 (gmt 0)

the fact that Google honors the "nocache" tag means that a technical remedy is available--and the technical remedy is obviously more practical for most people than a legal remedy

I certainly agree with that.

Google should devise a global solution to this problem (that applies to directories). Meta tags could then be used for individual pages or better still, a single text file that contains a list of all exception pages could be used.

I agree, it would be nice to have a global solution. Maybe the robots.txt definition could be expanded to have more functionality. I don't see why robots.txt couldn't have more tags than just Disallow:.

europeforvisitors

1:19 am on Jul 12, 2003 (gmt 0)

I would love to hear arguments on whether the cache is a snapshot of the whole website or a reproduction of a webpage.

It is the Web page, source code and all, with Google's annotations tacked on.

For the equivalent of a "snippet," enter a site's URL at www.alexa.com and see how Alexa does it. (Alexa shows a small snapshot of the site's home page.)

BTW, I personally have no objection to Google's caching of my pages, since the cached versions include the pages' AdSense ads and affiliate links (which are the sources of revenue for my site). Nor do I think the caching itself is copyright infringement. It's the serving or distribution of the cached pages that constitutes copyright infringement--although the issue may be somewhat academic because of the difficulty of proving damages and the availability of the "nocache" tag.

Side note: In my opinion, Google's display of cached pages is much more benign than the practice of running ads in frames above third-party pages (which About.com, to name just one major site, has been doing for several years). Google, to its credit, doesn't run ads on cached pages; its intentions appear to be altruistic or at least benign.

havarian

11:32 am on Jul 12, 2003 (gmt 0)

phpproject, you asked for some examples of how the caching from Google could cost you money. I think one example would be a deep link to a specific page, where the site was built so that the surfer sees the ads before reaching that page.

grifter

2:51 pm on Jul 12, 2003 (gmt 0)

The NYT probably has better issues to address, like the easily hackable URLs by adding parameters, stuff like '&partner=GOOGLE' when combined with others to get around registration. Maybe they should start there first.

Kackle

4:57 pm on Jul 12, 2003 (gmt 0)

There are a lot of strange opinions in this thread.

1. Most search engines cache the entire page in compressed form. This is how they retrieve the snippet that contains the search terms. Publishing the snippet would almost certainly be considered "fair use" under U.S. copyright law, just as the Ninth Circuit considers thumbnail images to be "fair use." That's because the thumbnail has dramatically reduced the resolution of the original image, and cannot really be considered a substitute for the original. It's very similar to the snippet situation.

2. The issue isn't the snippet, or the non-published cache. The issue is the fact that Google publishes the entire cache copy unless the copyright owner takes the trouble to opt out.

3. It hasn't come up in court because you need a very well-crafted set of facts surrounding the lawsuit before you are in a position to address issues of law. If you don't have the right set of facts, courts will sidestep the issues of law. The facts of the case determine which points of law are relevant. Many judges would admit factual arguments such as these:

a) Was the plaintiff aware that a robots.txt protocol is available?

b) Was the plaintiff aware that a NOARCHIVE meta is available?

c) Was the plaintiff aware of Google's page removal options?

d) If damages are claimed, and plaintiff was aware of any of the above three options, then did the plaintiff fail to exhaust available remedies? If so, is the claim to damages moot? If the claim to damages is moot, does the plaintiff have legal standing?

That's why it hasn't come up in court. In my opinion, it is very unlikely that a case will arise from a single webmaster that will get very far.

I think the class action option is the most likely method that a legal challenge would present Google with something to worry about. And I think it would have to be based on the argument of unfair competition. Accordingly, I believe that three groups are in a position to think about class action. One would be a collection of webmasters, another would be a collection of publishers, and the third would be other search engines.

The first is not an option because webmasters are not organized, and a lawyer would have a hard time putting together a class action consisting of webmasters. Also, webmasters are the most likely people to be fully aware of the three remedial options mentioned above.

The second might work because publishers are increasingly hiding archived content behind a registration screen or pay-per-view option. The New York Times moves their stuff into a pay-per-view archive after seven days, I believe. If you had a group of publishers filing a class action, a lawyer could make the case that Google's cache is a violation of their copyright. I don't believe that any single publisher could pull it off, but in a class action, you would have enough variety within your factual circumstances to add grist to the plaintiff's general case.

The third would be other search engines. Unfair competition would be a strong argument here. In fact, I would imagine that Overture and Yahoo and Microsoft are already mulling over the Google cache situation. What do you do if you want to compete with Google? How can you compete with Google if you don't address the cache issue? Do you challenge Google over it, or do you compete with Google by adding your own cache even if you think it is illegal?

I agree with Brett that the cache made Google. And I don't see how any engine can compete with Google unless they come to terms with the Google cache, one way or another.

Brad

5:09 pm on Jul 12, 2003 (gmt 0)

>>In fact, I would imagine that Overture and Yahoo and Microsoft are already mulling over the Google cache situation. What do you do if you want to compete with Google? How can you compete with Google if you don't address the cache issue? Do you challenge Google over it, or do you compete with Google by adding your own cache even if you think it is illegal?

Bingbingbing! Kackle you get the door prize! :)

This is exactly what I think is going on and will happen. M$ plays for keeps -- if I were Bill Gates and about to launch my own SE, I would be asking, "Where is Google vulnerable?" and "What advantages does Google have?"

Now if the answer is the cache and if filing a lawsuit will exploit a vulnerability _and_ take away a Google advantage, plus you have cash to burn, what would you do?

M$ plays for keeps.

kaled

6:19 pm on Jul 12, 2003 (gmt 0)

If you add the meta NOCACHE to each web page manually, this would average, say 15 seconds per page. If you have a hundred (static) pages on a website, that's, say, half an hour's work.

Can we bill Google for this time? I don't think so. However, the class action approach might allow a few thousand webmasters to get together and sue Google for requiring them to waste their time.

This is not something I would get involved in myself, but I would certainly watch any such case with great interest.

Kaled.

John_Creed

7:01 pm on Jul 12, 2003 (gmt 0)

The Google cache is a copyright violation, no question about it. However - I LIKE the Google cache feature and I use it all the time. How many times have you used Google and clicked on a website and the server was either having temporary problems or the site was offline completely?

I think Google's reasons for using the cache are pretty innocent (aside from the fact that Google wants to "take over" the web). It's not like they're removing copyright notices or adding their paid banners to our sites. All they're trying to do is help and provide a service for their users.

Getting all worked up over this is pretty silly considering:

(1) Google supplies most of us with a lot of our traffic. We WANT a relationship with Google.

and

(2) There is an option available for those who don't want their sites included. Yeah I know it's a hassle adding it to every page if you have a big site, but do you know what's easy to do and only takes a few seconds? Blocking Google altogether using a robots.txt file.

But having said that, it does appear to be a copyright violation and just because I don't mind having my sites cached doesn't mean everyone else should follow suit.

Personally I hate it much more when a crappy search engine like Ask.com puts my site behind a frame.
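
For reference, the robots.txt route mentioned above might look like the sketch below. It is a minimal Python example that uses the standard library's robots.txt parser to check what a given set of rules allows. The rules block Googlebot from the whole site and leave other crawlers alone; the helper name (allowed) and the example.com URLs are illustrative only. Note that this removes the site from Google's index as well as its cache, which is a much bigger step than the per-page tag.

```python
from urllib.robotparser import RobotFileParser

# Example rules: keep Googlebot out entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"
    print("Googlebot:", allowed(ROBOTS_TXT, "Googlebot", page))
    print("SomeOtherBot:", allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

The same helper can be pointed at a draft robots.txt before uploading it, to confirm the rules do what was intended.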

Kirby

7:13 pm on Jul 12, 2003 (gmt 0)

One of the benefits of a class action is that it does organize plaintiffs that wouldn't otherwise organize themselves. It could move right along with a lawsuit from a publisher and suits from other companies and individuals (see below).

Bill Gates holds the digital copyrights to numerous images that I'm sure Google has cached.

M$ does play for keeps, and more importantly, they win in court. Time is also on the side of M$, and a lawsuit from M$ is more a matter of the right timing and what serves their interests. Lawsuits are often about more than collecting damages.

As for the argument that publishers won't sue for fear of disappearing from Google, this hurts Google more than the likes of the NY Times.

Another weak argument is that the burden is on the publisher. I doubt this would fly. Copyright requires notice, not much more. Google also makes an argument that hurts themselves as they have put themselves forward more as the library reference desk, and not an unabridged encyclopedia. The courts may tend to agree.

hutcheson

7:54 pm on Jul 12, 2003 (gmt 0)

>If you add the meta NOCACHE to each web page manually, this would average, say 15 seconds per page. If you have a hundred (static) pages on a website, that's, say, half an hour's work.

Yes, but how much is it WORTH? Half an hour's handwork from a so-called web developer who's too stupid to know how to use a computer? Pretty close to worthless, I'd say. Maybe you should pay Google for this simple test to detect totally clueless contractors!

What would you think about a mason who took longer to, say, repair your house because he didn't believe in metal chisels -- he just used his teeth? Would you expect the insurance company to pay for that extra time?

rtiainen

9:04 pm on Jul 12, 2003 (gmt 0)

Copyright requires notice, not much more.

Actually, in most countries, copyright does not require any notice. It is an immaterial right that you get when you produce something original. I have a feeling that this is the situation in the States as well, but don't take my word on it.

Anyway, about the subject - instead of the current noarchive meta tag, I think a more practical way to deny archiving is needed for those of us who do not want their sites archived by Google, Wayback Machine or any other similar service. Perhaps something like robots.txt, some sort of "archivers.txt" that is fetched by the crawler and then processed accordingly.

Regards,
Sami

XtendScott

11:05 pm on Jul 12, 2003 (gmt 0)

OK, so if G loses the cache to copyright then I could see the same thing happening to our "internet cache". After viewing a page with my browser, I have a complete copy of the page or even "site" on my computer. Are they going to have to change browsers not to cache unless specifically allowed?

I feel that with Robots.txt and <NOCACHE> copyright is not being abused. As said earlier they are not advertising on them, they are specifically saying that the content is not for their site, they don't cache the images (they are pointed to the site).

ISPs are caching websites to improve bandwidth. I feel that this is either spurred on by M$ or the NY Times is too lazy to put a robots.txt on their site and make sure it is correct. That should take 5min for a slow webmaster to accomplish rather than the <nocache> on each page.

On the other side I really see very little benefit to the average user that has not a clue that the site is cached on G or even what a cache is.

All content in this message is just My Opinion.

Scott

killroy

12:01 am on Jul 13, 2003 (gmt 0)

First of all, the browser cache is there for the sole purpose of letting you view the page. Google's cache is not there for Google to view the page, but for Google to show the page to somebody else. If you'd package your cache onto a CD and sell it to third parties, I sure think there would be an issue.

Also, copyright is a right. Not something you have to ask for, apply for or work for. It's yours by the simple act of creating original work. Saying "it is simple to add NOARCHIVE" is completely beside the point. You could write a book, and by the "simple" method of locking it in a safe and never reading it to anybody, nobody could copy it. But you don't need any rights for that. Copyright gives you the explicit right to present and publish and broadcast your own work, WITHOUT having anybody else do so as well.

Google is clearly republishing the work of other authors and breaking copyright. There is simply no argument about it.

That it possibly benefits the webmaster is beside the point. And since nobody has sued them yet and got away with it, they're alright. But that doesn't mean that authors suddenly have no right over their work anymore, just because Google says so.

But I feel we're going in circles, as many have already stated the facts and applicable laws in this thread, and there is really no argument about the legality of the Google cache.

shurlee

12:06 am on Jul 13, 2003 (gmt 0)

>Yes, but how much is it WORTH? Half an hour's handwork from a so-called web developer who's too stupid to know how to use a computer? Pretty close to worthless, I'd say.<

Pretty funny stuff there. For someone who seems to see themselves as pretty smart, or at least sees themselves as smarter than who they are addressing, Hutcheson appears to fail to see the contradictory point he is making in support of the so-called developer who's too stupid to know how to use a computer, as he has chosen to so describe.

He then goes on to make totally irrelevant analogies about teeth-wielding masons as though that were some off-topic evidence to support his pointless contribution to the Google caching debate.

If, as he so charmingly stated, it is "pretty close to worthless", then Shurlee even he sees it as having worth. How much value that worth has is completely subjective. It has value and the loss of that value could constitute damages and if those damages were caused by an illegal act of a third party, there could be liability.

I enjoy reading thoughtful posts from intelligent people even when they disagree or create conflict. But sometimes I read his too.

Seriously, I think the entire community would be better served if he could back off of all that "web developer that is too stupid" talk. It's rude, pompous and gives the wrong impression. He may think saying things like that makes the "other" poster look bad and makes people reading it see him as witty or having the guts to call a spade a spade, but the fact is, to me at least, it makes him look like a middle aged, over-weight, balding man who is more than a little bitter. It doesn't make him look intelligent. It only makes him look like the one who is not smart enough to simply carry on a civil conversation or debate an issue without resorting to petty name calling to make a point.

rfgdxm1

12:29 am on Jul 13, 2003 (gmt 0)

>But I feel we're going in circles, as many have already stated the facts and applicable laws in this thread, and there is really no argument about the legality of the Google cache.

When it comes to cyberspace, copyright law is fuzzy. The basic purpose of copyright law was to protect the financial interests of intellectual property creators. Given the Google cache isn't driving large numbers of webmasters into poverty, I would question if the courts would find it an infringement.

kaled

1:59 am on Jul 13, 2003 (gmt 0)

Hutcheson said

Yes, but how much is it WORTH? Half an hour's handwork from a so-called web developer who's too stupid to know how to use a computer? Pretty close to worthless, I'd say.

Not everyone who publishes material on the internet is an internet professional. I have picked up a degree of knowledge on the subject out of necessity. I certainly don't sell my web skills to others, nor do I intend doing so in the future. Even if my web skills are so limited as to be worth only $5.00 an hour, my skills in other subjects are much greater. If I have to take half an hour to accomplish a task set me by Google, I should have the right to bill them according to my higher rates.

As for being too stupid to know how to use a computer, I have battled the inner workings of Windows and, most of the time, I've won. I've been working with computers on and off for more than twenty years. However, I'm not someone who sees the need to acquire computer knowledge for the sake of it. I simply learn what I need to to get the job done. I doubt that I am unique in this regard.

Kaled.

PS
Apologies for going off-topic, but thinly-veiled insults that contain no logical argument sometimes require a reply.

rainborick

2:34 am on Jul 13, 2003 (gmt 0)

No, I am not a lawyer, but I once self-published a book about a celebrity who founded a very large corporate empire that I depend on for many favors, and so I was being careful not to tweak their corporate noses. So between that and dabbling in software publishing I have actually studied copyright law to a point where I am confident of several aspects.

First, Google's cache system may be judged a copyright violation and it may not. As others have mentioned, it will depend on how they fare with fair-use and other similar arguments that allow copyrighted materials to be used by others. God knows, courts will differ on these things. But the value of the work they copy or the direct financial benefit they gain from having made the copy is completely irrelevant to the determination of whether or not a copyright has actually been violated. The issues of value and profit are for determining damages and compensation only.

The fact that web authors could, in effect, opt out of the cache by inserting the proper <meta> tags or using robots.txt is also irrelevant. The right is the authors' to exercise and not Google's to ignore, and the authors have no obligation to abide by rules Google set to prevent unauthorized use of their work. On the contrary, it would be more correct to argue that Google has no right to cache any page that does not have a <meta> tag that explicitly grants permission to "robots" to "archive" them. In fact, I would not be surprised if that ends up being the state of things if this does ever go to trial.

I have very mixed feelings about this issue. In the first place, I do believe Google's cache to clearly be a copyright violation. I believe each webpage is a work unto itself, regardless of its dependence or relation to other pages on a given website and that by duplicating pages in whole, Google is violating the authors' copyrights for no other purpose than to present the material as a part of their service. And for those who dwell on the profit and loss issues, don't overlook the fact that since having the cache enhances their service, it means more users are likely to use their service and thus, their advertising space is more valuable. On the other hand, I think it's an enormously valuable tool and I would wish that the whole world would just agree to ignore it.

XtendScott

5:28 am on Jul 13, 2003 (gmt 0)

Well here is a start of the circle again.

I have access to many copyrighted works of literature, art, music and videos in my local town. It is called a Library. I can go there and pick up a book and read it there or check it out and take it home with me for a period of time. Do I buy it, no. Do I go and buy a different copy for myself, maybe, but not usually.

When someone writes a book do they specifically sign something that says it's OK for a Library to have their Copyrighted book?

I can read magazines, watch videos or listen to music. If I am looking for something specific they have a computer system that tells me where it is in the building and then I go get the book I was looking for. I didn't just get a single page of the book, I got the whole book. Are Libraries violating the copyright? Stealing profits from authors who could be selling more if it was not in the Library?

For Google, I have to ask, "Is Google Not Like a Library of Information?", handing out snippets of information for people to be able to find the "Book" they are looking for. "How big is a snippet?" For a large site 100 pages could be considered a snippet of information. Small site a paragraph. If a user stayed on Google's "Cache" while clicking links, and never reached the original site, that would be a concern for me on Copyright issues, but to me a "snippet" of information could easily be just a single page of a website.

Still Just My Opinion.

Scott

PS Google is much easier than the Dewey Decimal system.

Dpeper

5:38 am on Jul 13, 2003 (gmt 0)

I think Scott summed it up very well... and libraries do create revenue, i.e. late fees, out of area residents. Just in case someone was gonna take up that avenue.

John_Creed

6:50 am on Jul 13, 2003 (gmt 0)

The difference is that libraries don't get those books for free. They pay in bulk to own those books. They don't rent the books out to users, they allow users to borrow them. The same way I can allow you to borrow one of my books. And so on and so on...

The libraries have a right to charge a late fee because they paid for the book and you didn't bring their property back.

However if the libraries were printing copies of those books and allowing users to use those copies (because the original copies were missing or already borrowed by others), then they'd be in trouble. And it's pretty much what Google is doing.

But like I already said in a previous post: it's not that big of a deal. Google's intention appears innocent and the cache feature is a GREAT service that their users love.

Either be accommodating and take advantage of the no-cache option, or block Google with robots.txt.

kaled

10:31 am on Jul 13, 2003 (gmt 0)

Lending libraries are (normally) non-profit organisations either run or subsidised by local government. They exist, in part, to give financially disadvantaged people access to books and information.

Libraries do not photocopy books and place a disclaimer at the top stating that they are not responsible for the contents.

If Google were to ever rely on a "library" defence in court, they would undoubtedly lose. However, the "reasonable use" defence might possibly work. My guess is that the only real winners of a court case would be the lawyers.

If Google were to change the policy from opt-out to opt-in, the cache would die immediately. That is not something I would like to see. However, they should provide a system of opting-out that is easier to manage. If they do not, eventually, someone will initiate legal proceedings. Such proceedings could last up to several years with appeals. During this time, a court order would almost certainly apply either banning the cache altogether or forcing Google to switch to an opt-in policy which would amount to the same thing.

Kaled.

dgdclynx

4:53 pm on Jul 13, 2003 (gmt 0)

Just to say that amongst my 300+ files I have a number of my books of poetry all of which have ISBNs and thus are copyright protected. So I am not bothered about Google taking copies for their own purposes. It even gets me more readers!

Kackle

6:55 pm on Jul 13, 2003 (gmt 0)

The cache copy allows Google to highlight the search terms even when the user doesn't have a toolbar. The cache copy is almost always delivered faster than the original copy on the original site, because Google can afford lots of bandwidth. Finally, the cache copy never hangs on DNS lookup, and never shows a 404.

How can any search engine compete with this?

Google has recently begun parsing text files with a .txt extension, so that they can present them the same way they present their other cache copies. They put their Google disclaimer in a table on top, and they stick the rest of the page inside of "PRE" and "/PRE" tags. They highlight the search terms. And get this -- they turn anything that starts with "http://" into an actual live anchor.

Why is this going much too far? Because there is no place to stick a NOARCHIVE meta in a text file. I'm going to insert a clear GIF counter URL in my text files, on the expectation that Google will turn it into a live URL. It won't trip when a browser sees it, because it's a text file. But it will trip if Google turns it into a live URL and someone clicks on the cache copy.

Then I'm going to keep count of how many times the cache copy gets accessed.

I don't know about file extensions other than .txt. Does Google parse out the URLs in other file types, where there is no place to insert a NOARCHIVE?
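
To make the described behaviour concrete, here is a rough Python sketch of that kind of transformation: escape the text, turn anything starting with "http://" into an anchor, and wrap the result in PRE tags. This is an illustration of what the post describes, not Google's actual code. The function name and the URL pattern are assumptions, and the disclaimer table and search-term highlighting are left out.

```python
import html
import re

# Assumed pattern: "http://" followed by anything up to whitespace, <, > or a quote.
URL_PATTERN = re.compile(r'http://[^\s<>"]+')


def render_text_as_cached_page(text: str) -> str:
    """Wrap plain text in <pre> tags, turning http:// URLs into live anchors."""
    # Escape &, < and > first so the original text cannot inject markup.
    escaped = html.escape(text, quote=False)
    linked = URL_PATTERN.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )
    return "<pre>" + linked + "</pre>"


if __name__ == "__main__":
    print(render_text_as_cached_page("counter: http://www.example.com/count.gif"))
```

A rewrite like this only produces anchor tags, so a URL embedded in the text is fetched only if a reader clicks it; nothing in the output makes the browser request it automatically.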

grifter

7:04 pm on Jul 13, 2003 (gmt 0)

Kackle, I think the Google cached copy of a .txt file just adds an <a> tag for link clickability, but not <img> tags for the browser to proactively grab your gif. The user still has to voluntarily click.

Kackle

8:05 pm on Jul 13, 2003 (gmt 0)

Thanks grifter, I just thought of that myself after I posted, and further research shows it to be the case. Maybe I'll take out the NOARCHIVE on some of my html files, just so I can get a reliable count. I want to know what percentage of all my Google referrals came from clicking Google's cache copy, as opposed to clicking directly from the SERP to my page.

Does anyone have any stats on this?
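
One way to get that number from ordinary server logs is to classify the referrer of each hit. The Python sketch below assumes the referrer strings have already been pulled out of the log, that a click from the cache copy carries a referrer whose q parameter starts with "cache:" (the search?q=cache form mentioned earlier in the thread), and that Google referrers have "google." in the host name. Cache pages served from a bare IP address would be missed, and the function names are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def classify_google_referrer(referrer: str) -> str:
    """Return 'cache', 'serp' or 'other' for one referrer URL."""
    parts = urlsplit(referrer)
    if "google." not in parts.netloc.lower():
        return "other"
    queries = parse_qs(parts.query).get("q", [])
    if any(q.lower().startswith("cache:") for q in queries):
        return "cache"  # visitor clicked a link inside the cached copy
    return "serp" if queries else "other"


def cache_share(referrers) -> float:
    """Fraction of Google referrals that came from the cache copy."""
    counts = {"cache": 0, "serp": 0, "other": 0}
    for referrer in referrers:
        counts[classify_google_referrer(referrer)] += 1
    google_total = counts["cache"] + counts["serp"]
    return counts["cache"] / google_total if google_total else 0.0
```

This only counts visitors who clicked through from the cached page to the live site; readers who view the cache copy and leave never show up in the site's own logs.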

Kirby

5:33 am on Jul 14, 2003 (gmt 0)

Excellent points, kaled.

My guess is that the only real winners of a court case would be the lawyers.

Winning doesn't always refer to money.

If Google were to change the policy from opt-out to opt-in, the cache would die immediately. That is not something I would like to see. However, they should provide a system of opting-out that is easier to manage. If they do not, eventually, someone will initiate legal proceedings. Such proceedings could last up to several years with appeals. During this time, a court order would almost certainly apply either banning the cache altogether or forcing Google to switch to an opt-in policy which would amount to the same thing.

The winner in this scenario is the competition.

What fascinates me about this thread is the number of posters with situational ethics.

This 156 message thread spans 6 pages.