Will Google sell their data?

         

getvisibleuk

8:21 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



Looking at the Alexa page they're selling our data. They've crawled the net and the data they've captured is up for sale on a disk (mighty big one at that).

[pages.alexa.com...]

Would Google go down this route?

Giacomo

11:44 pm on Aug 6, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



> Would Google go down this route?
What evidence do you have that they have not done this already?

A comment by GoogleGuy would be appropriate and much appreciated.

Legally, I don't see much difference between Alexa and Google here.

Main differences between Alexa's and Google's use of crawled content:

1) Alexa's archived content is available for sale online [pages.alexa.com]; Google's is not.

2) Google provides instructions on how to remove content from their web index [google.com]; Alexa does not [webmasterworld.com].

IITian

11:52 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



Kackle,

I am with you. Google is a very competent company with a nice "born in campus" image. However, it is funded by the same VCs who funded other commercial companies, and run by the same professional executives who ran other commercial companies. Their job is to increase revenues and profits and to maximize share prices. It is their fiduciary duty.

They will do whatever it takes to fulfil their duties.

Giacomo

11:55 pm on Aug 6, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



<added>
I see that the thread title has been changed from "Will Google go the same way as Alexa?" to "Will Google sell their data?".

The new title is utterly inappropriate, incorrect, and misleading. In a word, wrong. :)

I suggest changing it to "Will Google sell our information?".

There are a number of substantial differences between "data" and "information" (from a semiotic point of view), and between "their data" and "our information" (from a legal point of view).

Those distinctions are essential.
</added>

GoogleGuy

1:09 am on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think our privacy policy covers pretty much all this. From [google.com...]

(Brett, I think this is totally fine to quote--we want people to be informed about Google's privacy policies at all times. If you're worried, feel free to cut the quote and point people to the page.)

With Whom Does Google Share Information?
Google may share information about you with advertisers, business partners, sponsors, and other third parties. However, we only divulge aggregate information about our users and will not share personally identifiable information with any third party without your express consent. For example, we may disclose how frequently the average Google user visits Google, or which other query words are most often used with the query word "Linux." Please be aware, however, that we will release specific personal information about you if required to do so in order to comply with any valid legal process such as a search warrant, subpoena, statute, or court order.

I think that says it as well as I could. I highly encourage everyone to read our privacy policy--it's much more readable than you'd expect (only 8 short paragraphs!) and gives examples and everything.

Giacomo

1:19 am on Aug 7, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



GoogleGuy, by "our information" I did not mean "personal information about users collected through cookies, etc."; I mean "our intellectual property", i.e. our (copyrighted) web content, collected through Googlebot and archived in Google's cache. Google's privacy policy doesn't seem to cover that topic at all.

The right question is therefore:

Does Google plan to sell archived web content?

Suggested reading [news.com.com]

[edited by: Giacomo at 1:22 am (utc) on Aug. 7, 2003]

Visi

1:22 am on Aug 7, 2003 (gmt 0)

10+ Year Member



What do you think their partners are buying today? Am I missing something in this thread?

chiyo

1:26 am on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Personally I have no problem with the Google cache.

But I do have a problem with anyone setting up a service where they copy websites, put them on a disk, and send it to you, for free or for a fee. Just because something is published on the web may mean the author meant it to be viewed for free, but this does not extend to copying beyond "fair use" as defined by the Berne Convention.

This is analogous to photocopying or scanning a book (or groups of books) and selling it. You can review a book, or quote long sections for review or satirical purposes, but you just can't straight out copy it.

Why is this illegal?

1. The copied version can "replace" the original, meaning revenue goes to the copier and not the author.

2. The person who created the original work will not create any more, as they did not get compensation.

This is why there are strict laws on how much of a book or article you can copy or reproduce and for what reasons (personal use, educational use, not for multiple copies, not to be redistributed or re-packaged etc)

I understand that to many people with say commerce sites this is no problem. But to people like us who have websites made up of original articles and guides it is extremely important. - not to mention artists with original artwork, etc. etc.

If we wanted to make a compendium of our best articles and sell it on CD, that's fine. We created it and hold the copyright and distribution rights. It's creative content that we must be able to get a return on, not something for someone else to make money out of without our permission.

[edited by: chiyo at 1:31 am (utc) on Aug. 7, 2003]

Giacomo

1:26 am on Aug 7, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



> what do you think their partners are buying today?

Quality search results.

> Am I missing something in this thread?

Definitely yes. The concept of intellectual property [google.com] and definition of copyrighted works [google.com].

Visi

1:29 am on Aug 7, 2003 (gmt 0)

10+ Year Member



So where is the yahoo cache being served from then?

Giacomo

1:38 am on Aug 7, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



Visi, there's a huge difference between serving cached content through partners' SERPs, and selling it online the way Alexa does!

Visi

1:45 am on Aug 7, 2003 (gmt 0)

10+ Year Member



Don't agree with that statement. We are only talking about who it is sold to, not the principle of selling it. That is the point that I am trying to make here. Who the buyer is, is not the issue. The data has been sold... or partnered for profit. Is that not the issue that others have been raising in this thread? Personally I do not take issue with this, as I can stop it from happening by opting out of the search and cache if I desire. But if I allow this, I recognize that the search engines are selling this data (not the personal information that the Google privacy policy covers). I think we need to take off the rose-colored glasses here and recognize that it is the database in its entirety that is sold as a service, not just the search functions. Just a different view of the question posed.

dmorison

7:16 am on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I think our privacy policy covers pretty much all this

Sorry GG, it doesn't cover it at all. We're talking about website content, not personal information.

plumsauce

7:42 am on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BTW, just who is paying the freight, bandwidth-wise?
(hint: look in the mirror and feel your wallet)

With all these crawlers running around, could they
not just put up a joint venture crawler farm?

After all, a page is a page. It's their post-crawl
algorithms and presentation that are their market
differentiator.

Bottom line is they're livin' large on your nickel.

+++

trillianjedi

8:04 am on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 1. The copied version can "replace" the original, meaning revenue goes to the copier and not the author.

Can you explain how?

> 2. The person who created the original work will not create any more, as they did not get compensation.

Compensation for what though? Lost money? From where?

I'm damn sure I want my sites on this "disc". I'm not Yahoo! or Coca-Cola, so I need my content everywhere. That's why I put it on the web, in the public domain, in the first place.

TJ

Giacomo

11:24 am on Aug 7, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



Visi, do you honestly believe that Google sold (i.e., physically transferred possession of) their web index (including the Google cache) to Yahoo!?

Google's index is one of its most valuable assets. I don't think they will let other companies get their hands on it so easily.

My interpretation of the G/Y partnership is that Google have licensed (i.e., leased use of) their search services to Yahoo!.

chiyo

11:45 am on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi TJ

>>1. The copied version can "replace" the original, meaning revenue goes to the copier and not the author.
Can you explain how?<<

That the work can be fully experienced as a copy, meaning people can use the copy and not the original. Therefore they do not have to purchase, rent, license (or see the ads in) the original from the author.

>>2. The person who created the original work will not create any more, as they did not get compensation.
Compensation for what though? Lost money? From where? <<

Compensation that they should expect from the "use" of their work. This could be royalties, licence fees, subscription fees, etc. etc.

>>I'm damn sure I want my sites on this "disc". I'm not Yahoo! or Coca-Cola, so I need my content everywhere. That's why I put it on the web, in the public domain, in the first place.<<

I think we may be arguing about different things. It depends on the site, or the information. Coca-Cola don't mind their message being spread as much as possible, but use their trademark or expect to BUY a Coke for nothing, and the attitude will be different. I am talking about intellectual property, say original works of art or writing, or a database of information, rather than advertising copy or promotional material.

This is no different from publishing a book. Even if we give the book away, we are not telling people it's OK to copy it, repackage it, and sell it. The author retains those rights. All we are saying is that it's OK to read it and use it within "fair-use" guidelines.

Same with the web. By putting it on the Web we are not saying it is free and therefore worthless. We provide the ability for people to view or read it or even copy it for personal and non-commercial use, not to copy, repackage and distribute it for free or for profit.

Without these rules, scientific and artistic endeavour will come to a halt and everybody will become marketers!

merlin30

11:50 am on Aug 7, 2003 (gmt 0)

10+ Year Member



Check out #72 on this thread [webmasterworld.com] for my ramblings on this subject.

Kackle

12:36 pm on Aug 7, 2003 (gmt 0)



I continue to believe that legally speaking, a judge looking at the Google cache, and then at the Alexa "web on disks," both of which are offered without the "express consent" of opt-in, would have a hard time making a distinction between these two modes after studying U.S. copyright law.

However, there is a difference here for the average webmaster. Comments from several who don't object to Google's cache copy but do object to what Alexa is doing, reinforce this difference.

The difference between Google's cache copy and Alexa's "web on disks" is the extent to which the webmaster loses control over distribution. With Google you have some control before the fact -- you can use a NOARCHIVE meta. Fine and dandy, as far as this goes, although Google doesn't make this very easy because you need one on every page. Also, for non-HTML documents that don't have headers, there's no place to stick a NOARCHIVE meta.
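
Because the NOARCHIVE directive has to appear on every page, it can help to check pages mechanically. The following is a minimal Python sketch, assuming the directive is written in the common form `<meta name="robots" content="noarchive">` (or with `name="googlebot"`). The names `RobotsMetaParser` and `has_noarchive` are hypothetical helpers invented for this illustration, not part of any Google tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noarchive(html):
    """Return True if the page carries a noarchive robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head></html>'
    print(has_noarchive(page))
```

Running a check like this over every HTML file on a site would show which pages are missing the tag. It does nothing for PDFs and other non-HTML documents, which is the gap described above.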

With Alexa you can use robots.txt.
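
As an illustration of the robots.txt route, here is a minimal Python sketch that uses the standard library's `urllib.robotparser` to confirm that a rule set blocks one crawler while leaving others alone. It assumes Alexa's crawler identifies itself as `ia_archiver` (worth confirming against your own server logs); the example URL and the `can_fetch` wrapper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the crawler assumed to be Alexa's,
# allow everyone else (an empty Disallow means "nothing is disallowed").
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/articles/1.html"
    print("ia_archiver allowed:", can_fetch("ia_archiver", url))
    print("Googlebot allowed:", can_fetch("Googlebot", url))
```

Note that robots.txt only works before the fact: it asks a well-behaved crawler to stay away, and does nothing about copies that have already been collected.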

Although googlers are fond of saying that if you don't like Google, you can always disallow Google in robots.txt, this is not realistic. Most sites depend on Google for traffic. Assuming that the site provides some income, it is not an option to tell Google to stay off of the site. It's like saying if you don't like air pollution, you can always stop breathing.

The real difference between these two modes is the amount of control you have after the fact. With Google you can have a page removed. (But I know of no way to have just the cache removed while keeping the page, except to stick in the NOARCHIVE and wait about 90 days.) However, with Alexa selling data on disks, you lose all control. You have no idea who bought your content, or what they're using it for, or how much money they're making off of it. There's no way to correct errors or retract pages. Someday it may even be used against you.

I predict that the Alexa move will bring the copyright issue into play sooner than we might have expected, and that any ruling from this will have implications for Google's cache copy as well. That's the real reason Google probably won't hype the "sale on disks" option -- not because they're ethically superior to Alexa, but because too many powerful people would notice if Google started doing the same thing as Alexa, and too many lawyers would start salivating. Alexa is low-profile, and they don't have all that much to lose. Google is high-profile, and they could lose their cache copy.

Brett_Tabke

1:17 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> Google may share information about you with advertisers, business partners, sponsors, and other third parties.

In other words - Google admits they may do it. No big deal - everyone does it - end of story.

trillianjedi

1:44 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey Chiyo,

Interesting reply and an interesting thread!

> That the work can be fully experienced as a copy, meaning people can use the copy and not the original. Therefore they do not have to purchase, rent, license (or see the ads in) the original from the author.

The worst-case scenario there is that a user, on a "local" copy of the internet, clicks on an advert on your site and is taken to the advertiser's site, but the advertiser pays you no revenue while possibly still getting the benefit of that advert.

I can see an argument to say that is wrong, or that says "OK, the click-through business model does not work now", but I still don't follow the main thrust of your argument - no-one is being forced to buy a copy of the internet from Alexa - people can still view the "original", for free, by connecting to the internet (as most will of course, for the "freshest" copy).

If someone wants to pay Alexa to shove it on a disc for them, for convenience, then so be it. Alexa are legally able to charge for that *service* I think as they are not charging for *your content* (which as a freely available resource, republished in original form, I'm thinking is actually lawful). Now I say that because that's my gut instinct, but I have not yet seen any cases which deal with this issue (although I have seen cases which deal with the distribution of software which is "shareware" or "freeware" on CD with the fabricators of that CD charging for the "service"). Yes I'm a lawyer by the way, but this is not my specialist field as you can probably also guess!

I think there is a subtle distinction here to be made between "charging for your content" and "charging for a service that provides someone with your content".

I wonder whether a judge who effectively states that ".... it is unlawful, or a breach of copyright, for party A to provide access to party B's website without their consent" would put every single ISP in the country at risk of being sued for charging people to connect to the internet, resolve a DNS name and view your site?

ISPs are doing the same thing - they're just doing it online, rather than locally.

My bottom dollar says, at a guess, a future judgment will state that a website owner who publishes a freely accessible site effectively puts that site in the public domain, with the caveat that, if it is reproduced, it is reproduced in its original form.

But for the moment, with the exception of your point about advertising, which I accept is a damn good point (and in fact the only one I've seen so far on this thread) I still do not see any loss being caused:-

> That the work can be fully experienced as a copy, meaning people can use the copy and not the original. Therefore they do not have to purchase, rent, license (or see the ads in) the original from the author.

Ignoring the ads point as I have accepted that one - who currently "purchases, rents or licenses" your original website content in its original form?

> Compensation that they should expect from the "use" of their work. This could be royalties, licence fees, subscription fees, etc. etc.

Who currently pays you such royalties, licence fees or subscription fees? Your web host?!

I'm not sure that you follow my point, which is, in a nutshell, what is your loss? Your publicly available, free to view website does not generate "royalties, licence fees, subscription fees or rental" income.

The user's ISP earns the money for providing users with your "original content in its original form".

TJ

chiyo

2:08 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi TJ.

Thanks for the very detailed and welcome response!

I'm sorry I can't respond in the same detail, but in reply to your various questions on what we are "losing":

1. We will for this year and for the future be providing a CD with the "Best of the ...." for sale. (In fact we already tried this in 1997. Disaster. Sold around 500. But we are bigger now, and have the contacts... and wiser!)

2. Some of our web published articles are joint published with a leading business journal publisher as articles in their journals. They have print rights. We have electronic distribution rights.

3. We expect in 2004 to turn the CD referred to in (1) into a print "glossy" volume covering the last 5 years, to be published by a book publisher and distributed and sold in bookstores. We may well remove those articles from the site and replace them with a link to how to buy the book. Too late. Alexa has already sold it. And possibly for much cheaper than we would have, because all they pay for is a copying machine; they haven't had to pay for the author, copyediting, promotion, coding, and so on...

4. Some PDF articles are articles from the print journal publisher referred to above. We have an agreement to use selected articles on the website under certain conditions. They have a disclaimer either right in the PDF or on the only links to it, reminding readers that it is for personal use only.

Now my point (and yes I'm not a lawyer) is that if Alexa can sell our site contents to others they are depriving us of future income from the selling of it, as it can already be purchased from Alexa (whose contribution to research, writing, formatting, copy-editing, reputation of writers is limited to copying it on to a disk, similar to a pirate VCD disk seller in a Singapore market).

It also is against our agreement with the print publisher that suddenly finds their articles secondarily published. We both lose control of (and the ability to profit from) redistribution.

It also devalues the content should we at some stage decide to take it offline and make it available from a subscriber-only service. Much material does not date.

Anyway, I hope that answers some of your questions and that you see my concerns. I'm sure the same or similar would apply to photographers, graphic artists, video artists, and other authors who decide to make some of their work available on the web for personal use only.

The very principle that displaying your work on the Web immediately makes it "worthless", or available for any entrepreneur to copy without paying you for its development, is frankly very disturbing to any creative type!

Kirby

2:37 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I think there is a subtle distinction here to be made between "charging for your content" and "charging for a service that provides someone with your content".

Except that they charge for a product with your content. The sale of the service (the labor involved in copying) is only part of the issue. The real issue is the sale of a copy of the content. Period.

> That's the real reason Google probably won't hype the "sale on disks" option -- not because they're ethically superior to Alexa, but because too many powerful people would notice if Google started doing the same thing as Alexa, and too many lawyers would start salivating. Alexa is low-profile, and they don't have all that much to lose. Google is high-profile, and they could lose their cache copy.

I'm sure Google is thrilled to let Alexa test the waters for them.

I am not going to modify a robots.txt file to keep out everyone able to build a bot. I will notify Alexa that they do not have my permission to distribute my content offline in any form. It will set the groundwork for a class action. My domains have been under attack, now my content. Regardless of the perceived value (or lack thereof), I will protect my property.

For those who think it is a lost cause, in California, two law students just got a judge to rule that ladies night at a bar is discrimination and they received $125,000 from the 8 San Diego bars they sued.

trillianjedi

2:43 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Chiyo,

A few things have just blown up at work, so I will come back to you later, but a couple of quick points:

> Now my point (and yes I'm not a lawyer) is that if Alexa can sell our site contents to others they are depriving us of future income from the selling of it, as it can already be purchased from Alexa (whose contribution to research, writing, formatting, copy-editing, reputation of writers is limited to copying it on to a disk, similar to a pirate VCD disk seller in a Singapore market).

They are not depriving you of anything that you are not already depriving yourselves of by having your site contents already freely available.

If your site ever becomes "pay per view", then yes, I'm sure you would be able to ask Alexa to remove it, and it won't be able to index it in future anyway.

> It also is against our agreement with the print publisher that suddenly finds their articles secondarily published. We both lose control, (and the ability to profit from) redistribution.

You don't lose control exactly, if Alexa redistribute exactly as you have presented it (which they will by nature of the fact that they are grabbing your site).

You are currently unable to realistically profit from re-distribution anyway, as your "product" is available for free.

> It also devalues the content should we at some stage decide to take it offline and make it available from a subscriber-only service. Much material does not date.

As and when you do that, you will need to write to Alexa. At the moment, your material is something only just shy of being public domain and free game.

Let me give you a quick example. If I take a free daily newspaper, spend time and money and use my expertise to scan it and put it on a CD-ROM and sell it, then I am able to sell that CD without a breach in copyright. The charge for the CD represents a charge for my "services" in scanning the paper in and not a charge for "content". In essence, I am just redistributing, for free, the free content (and in fact the paper would be very happy I suspect as they could charge more for their advertising as they would be reaching a wider audience).

Now, where I may be in breach is in respect of the publishing rights, and that's something where the law in relation to the internet, gets very complex, unsettled and muddled.

But without getting into that muddle, you have to consider, from a legal point of view, what has the newspaper in that above example actually *lost* that would give due consideration to a legal action?

Now you've gone and raised a whole load of new points though and that's maybe where it could get interesting and Alexa could just get into trouble.....

TJ

chiyo

2:55 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



TJ thanks...

Given what you say, the smartest thing for us would be just to take all main content down, replace it with small abstracts, and make the full articles subscriber-only.

We like letting people read the whole article for free, but if this means that legally we lose all rights to it, we of course would take it down and use a more respectful medium than the Web!

We may just find it easier to block the Alexa spider. I guess that may solve our problem with them, but I'm sure there are other thieves like Alexa around.

Kirby

3:15 pm on Aug 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If I take a free daily newspaper, spend time and money and use my expertise to scan it and put it on a CD-ROM and sell it, then I am able to sell that CD without a breach in copyright.

In the US this is still a copyright violation. It is about reproducing the content, not the value of the content.

The yellow pages are delivered free on my doorstep once a year. If I scan them, put them on CD and then sell it, I have done the same thing you are talking about with the free paper. However, the publisher of the phone book is still going to sue me for violating their copyright.

Regardless of the exchange on WW, Alexa and G's cache are issues that will ultimately be argued in court.

wkitty42

4:01 pm on Aug 7, 2003 (gmt 0)

10+ Year Member



kirby,

you are correct if you do nothing to alter the presentation... however, you could scan and OCR that phonebook and format it in any way you like and you'd be ok...

FWIW: i've seen this discussion many times over the years and the conclusion is what i just described above when it comes to publicly accessible data like that in a phonebook... "over the years" is over 20...

Kackle

4:33 pm on Aug 7, 2003 (gmt 0)



The issue was resolved only 12 years ago.

I think you're referring to Feist Publications, Inc. v. Rural Telephone Service Co., 499 U. S. 340, 358 (1991).

This decision said that originality is a prerequisite to copyright, and that the white pages of a telephone book do not contain sufficient originality. The facts (names, addresses, and telephone numbers) are not themselves copyrightable, even if a lot of work was required to compile them. However, the bar is set pretty low for the definition of "originality," so that a new rearrangement of the facts may indeed be copyrightable.

wkitty42

4:43 pm on Aug 7, 2003 (gmt 0)

10+ Year Member



kackle,

yes, that issue was... however i have seen the discussion for longer... that one case finally came up and made the point...

as you say, it's the presentation that is copyrightable... not the data...

driesie

4:48 pm on Aug 7, 2003 (gmt 0)

10+ Year Member



but they are caching the presentation aren't they? I don't understand that argument.

wkitty42

4:50 pm on Aug 7, 2003 (gmt 0)

10+ Year Member



first i should have stated that IANAL (i am not a lawyer)...

however, in all my reading of this stuff, i understand that caching the content is not the same as publishing the content... that particular snafu is still being discussed in many venues...

This 124-message thread spans 5 pages.