Will Google sell their data?

getvisibleuk

8:21 pm on Aug 5, 2003 (gmt 0)

Looking at the Alexa page they're selling our data. They've crawled the net and the data they've captured is up for sale on a disk (mighty big one at that).

[pages.alexa.com...]

Would Google go down this route?

Marcia

10:16 am on Aug 6, 2003 (gmt 0)

I can't see Google ever going that route.

edit_g

10:18 am on Aug 6, 2003 (gmt 0)

No chance. I can't see how they would want to either...

driesie

10:22 am on Aug 6, 2003 (gmt 0)

Isn't that a Copyright Infringement anyway?
I remember a discussion a while ago about whether google's cache was or not, but this surely must be?

trillianjedi

10:26 am on Aug 6, 2003 (gmt 0)

Even if it was copyright infringement, would you email Alexa and ask for your website to be removed from this index?

dmorison

10:41 am on Aug 6, 2003 (gmt 0)

I can see Google going down that route.

Their archive is a hugely valuable asset, and under the right licensing conditions I can easily conceive of Google "selling" access to their crawl.

People seem to talk as if Google is a charity, not a company.

driesie

10:41 am on Aug 6, 2003 (gmt 0)

I can imagine some people would unless they get paid royalties.

Yidaki

11:09 am on Aug 6, 2003 (gmt 0)

>Even if it was copyright infringement, would you email Alexa
>and ask for your website to be removed from this index?

Damn, YES! Gonna check the link now ...

edit_g

11:14 am on Aug 6, 2003 (gmt 0)

Their archive is a hugely valuable asset

Which they do not own... It would be the equivalent of painting a giant red target on their foreheads - they would be caught in a media <snip>storm before they could even think about removing the links to the press release.

Why should a profitable and private company like Google take such a silly risk for a return, which even on a best case scenario forecast, would be minimal? Who is seriously interested in buying the Alexa data? I don't think anyone is breaking down their doors to buy it...

[edited]period replaced w/ question mark...[/edited]

dmorison

11:21 am on Aug 6, 2003 (gmt 0)

Which they do not own...

No different to the Alexa situation.

edit_g

11:25 am on Aug 6, 2003 (gmt 0)

I would say that the difference is that the general public, media and most webmasters do not know what Alexa is (or care), but Google is very much in the public eye.

chiyo

11:31 am on Aug 6, 2003 (gmt 0)

If either Alexa or Google or Fred down the road copies my content onto disk and distributes/sells it, it IS copyright infringement. The Google cache I think is different as it is not designed to be reproduced, but Alexa looks like it is doing nothing different in practice than copying a site onto disk and selling it. There are some PDFs licensed to us by a second party for reproduction on our web site only, on condition that people read a copyright note saying they can be downloaded and printed for personal use only. Does this mean we can be sued, as Alexa's use is certainly not personal?

This to me must be sorted fast. It demeans any information on the web and basically says that when you publish on the web you lose reproduction and distribution rights to any smart cookie with a writable CD and a marketing network.

Same as copying MP3s on the net. It may be easy to do. You may not like recording companies. You may think artists are paid too much. But in the end, it's plain old robbery and thievery, no matter what excuses you use.

trillianjedi

11:35 am on Aug 6, 2003 (gmt 0)

Damn, YES! Gonna check the link now ...

Out of interest Yidaki, why?

trillianjedi

11:49 am on Aug 6, 2003 (gmt 0)

This to me must be sorted fast. It demeans any information on the web and basically says that when you publish on the web you lose reproduction and distribution rights to any smart cookie with a writable CD and a marketing network.

I'm not sure that 600 terabytes of data can quite be compared to that!

I'm just playing devil's advocate, and in truth, really don't see what the fuss is about. Websites are available to view on the internet, for free, by the public, by going to a URL.

It seems to me that Alexa is offering that ability to "view" on disk.

The analogy to MP3s and audio doesn't stack up to me. It's different: people make CDs to distribute for sale in a shop. Someone takes that product and, against the terms of the licence under which the user is entitled to use that media, copies it to MP3 and puts it out for public consumption for free. It's not the same. The original intention of the author was that the product was for sale in one form of media, not available for free in another. It's the same with "paid for" MP3s: the intention was to charge.

How can we actually claim that we have *lost* anything? We have not, as far as I can tell, lost anything. To have a reasonable chance of success for an action in tort (certainly in the UK anyway) you have to have suffered loss.

All that's happening is these websites are being distributed on a different form of media for a charge.

I liken it more to a CD manufacturer's ability to distribute freeware on a CD and charge for that CD.

But at the end of the day, why do you feel you are being harmed?

TJ
(I don't work for Alexa(!) and I'm really just curious. Sorry if I'm being stupid and completely missing the point but I just don't see it!)

chiyo

11:57 am on Aug 6, 2003 (gmt 0)

>>Websites are available to view on the internet, for free, by the public, by going to a URL. <<

Correct. They are not available for copying and redistributing, for profit or otherwise, as per the general terms of the Berne Convention for any published work, and as is made clear in copyright statements on millions of websites.

>>60 terabytes?

No, Alexa will kindly make you up a custom disk of anything from a few websites to millions, according to its spin.

This is theft. No two ways about it. People can read my website till kingdom come. They can print or download it for personal use. But copy it and give it to all your mates on disk? No way. To allow any other privileges will be the end of free, useful info on the web as we know it.

driesie

12:03 pm on Aug 6, 2003 (gmt 0)

What about this comparison:

You go to a free concert, you tape it, you sell the tapes.

I think most people would agree that's not very legal?
Just because somebody offers "free to view" content doesn't mean somebody else can take the right to sell it; that content can still be copyrighted.

Giacomo

12:08 pm on Aug 6, 2003 (gmt 0)

Hmm... I'm not sure this is 100% legal.

I mean, webmasters can prevent their content from being indexed by Alexa (just as the <META NAME="ROBOTS" CONTENT="NOARCHIVE"> tag prevents caching by Google), but how can a webmaster remove existing content from Alexa's index?

One more good reason to disallow ia_archiver [pages.alexa.com] IMHO.
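
For anyone wanting to act on that, here is a minimal sketch (Python, standard library only) of what such a robots.txt rule could look like and how to check it before uploading. The rule text and the example.com URL are assumptions for illustration, and whether ia_archiver actually honours the rule is up to Alexa, not something this script can confirm.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block only the ia_archiver user agent
# from the whole site and leave every other crawler unaffected.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a real site from this thread.
page = "http://www.example.com/index.html"
alexa_allowed = is_allowed(ROBOTS_TXT, "ia_archiver", page)
google_allowed = is_allowed(ROBOTS_TXT, "Googlebot", page)
print("ia_archiver allowed:", alexa_allowed)
print("Googlebot allowed:", google_allowed)
```

Note that this only affects future crawls; it says nothing about content already sitting in the archive, which is the question raised above.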

trillianjedi

12:11 pm on Aug 6, 2003 (gmt 0)

OK, well the concert analogy relates to performance/publishing rights - so I guess you're saying that you want to protect the publishing rights in your works published on the web.

OK, I can see that.

I'm still confused as to what the loss actually is?

If you feel there is one, then maybe we should all be suing Google for charging Yahoo! a fee for our websites' content in its search engine?

Or are google allowed to do it because they're google?

Staffa

12:35 pm on Aug 6, 2003 (gmt 0)

Something puzzles me here. Alexa search is powered by Google, so why, apart from use at the archive site, is Alexa crawling our sites in the first place?
Or has the article mentioned in the initial post here become the only reason?

Any ideas?

suing google for charging Yahoo! a fee for our websites content

Not at all, google charges yahoo! for work google has done from which yahoo! benefits.
Alexa will charge for 'a service' ie handing out work that we have done and with no likely direct benefit to us.

It's the principle.
My site is free to view to anybody and those interested can download a compact version in ebook format without charge.
Alexa has no right to charge for my content.

[edited by: Staffa at 12:48 pm (utc) on Aug. 6, 2003]

Perplexed

12:37 pm on Aug 6, 2003 (gmt 0)

Who is legally right or wrong doesn't seem all that relevant to me. If I choose to do hundreds of hours of work and pay out for a website that I then choose to let people view for free... that's my business. Why the hell should somebody else be able to take all that work, copy it onto a disk and then make a profit out of it?

Visit Thailand

12:42 pm on Aug 6, 2003 (gmt 0)

I'll admit I have not fully read all the posts above but I found this interesting on the page on Alexa:

over 3.5 billion unique URLs, 3 billion unique pages, all updated every 60 days

60 days is an eternity on the web. Why would anyone be interested in this?

although they do add this:

Special collections may be created on request and updated as often as needed.

They say historians etc may be interested but why when you have Google at your fingertips?

ADD IN

Can someone advise what the bot is that collects info for the Wayback machine?

edit_g

12:51 pm on Aug 6, 2003 (gmt 0)

There are also other issues, such as:

When you take a graphic down, they won't have it anymore either (I don't know if they save graphics as well, I'd think not), and lots of little red x's may make people perceive your website as sloppy or broken if they're viewing it through the Alexa archive.

Your "money pages" won't work! Most of us have websites so that people can come and buy stuff - the payment pages, dynamic product pages and all the "money pages" won't work - so they are taking away the ethos of most people's sites.

If you've updated your webpage with new logos, pricing information or taken something down which had incorrect information - then you won't want people to see the old stuff - but now they can...

Just a few issues.

Giacomo

4:27 pm on Aug 6, 2003 (gmt 0)

If you feel there is one, then maybe we should all be suing Google for charging Yahoo! a fee for our websites' content in its search engine?

The huge difference is that Google does not sell content; just search results.

If you don't want your files to be archived by Google, you can disallow Googlebot and/or Googlebot-Image and/or use the NOARCHIVE meta tag attribute. You can even ask Google to remove one or more URLs from their index [google.com]. The latter simply doesn't seem possible with Alexa.
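
To check whether a given page really carries that attribute, something along these lines could be used. This is a rough sketch in Python using only the standard library; the class and function names are made up for illustration, and it only inspects the HTML you feed it, not how any search engine interprets it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noarchive(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


sample = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
print(has_noarchive(sample))
```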

Copying or otherwise reproducing copyrighted material for profit requires written permission from the copyright owner. You don't have to ask permission if you're reproducing something you have access to for private personal use (e.g., printing or saving a web page to your hard drive). But to put that web page on a diskette and charge someone for it with no authorization from the copyright owner is simply illegal according to the current international IP protection legislation.

Something puzzles me here. Alexa search is powered by Google, so why, apart from use at the archive site, is Alexa crawling our sites in the first place?

So that they can (legally) store and (illegally?) reproduce and sell our content.

60 days is an eternity on the web. Why would anyone be interested in this?
although they do add this:
Special collections may be created on request and updated as often as needed.

They say historians etc may be interested but why when you have Google at your fingertips?

Data mining is the answer.

Once you have a 3.5 billion URL database, you can extract just about any sort of valuable (=marketable) information from it: stats, correlations, etc.

Example of what a "special collection" request might look like:

"Please extract 20,000,000 UK corporate e-mail addresses from your web archive and burn them on a CD. I will use them for unsolicited commercial emailing (since those addresses are freely available on the Web). I'll pay big bucks for that stuff."

And you know, e-mail addresses don't change every 60 days. ;)

msgraph

4:45 pm on Aug 6, 2003 (gmt 0)

...and why people download their toolbar and follow their stats still amazes me.

Giacomo

4:56 pm on Aug 6, 2003 (gmt 0)

Can someone advise what the bot is that collects info for the Wayback machine?

ia_archiver [pages.alexa.com]

It's on WebmasterWorld's banned bots list [WebmasterWorld.com].

...and why people download their toolbar and follow their stats still amazes me.

I would bet 90% of them are webmasters/SEOs checking their competition.

HayMeadows

6:06 pm on Aug 6, 2003 (gmt 0)

Personal opinion here but probably in the vocal minority:

We have put together our website for others to view. The more people that view it the better, through whatever medium, as long as we still get credit. Why the fuss that Alexa or even Google's cache offer the services they do is beyond me. It's the internet for crying out loud.

Both Alexa and Google still give credit to the owner of the website. I can understand about being upset with people who steal pictures, audio, text, etc. without giving credit to the original but these complaints about someone's property being copied in the way that Alexa and Google services do, sound like a bunch of people crying to me.

I'm sick of reading about it, but had to give my opinion at least once <g>.

Giacomo

9:25 pm on Aug 6, 2003 (gmt 0)

these complaints about someone's property being copied in the way that Alexa and Google services do, sound like a bunch of people crying to me.

I'm not crying at all, HayMeadows, nor am I complaining. Just telling facts as they are: there are copyright laws, and there are copyright infringements.

If you don't care that Alexa may (illegally?) sell whatever information it can extract from your web site, that's fine.

The quality of information available on the Web varies greatly. Most of it is free. But even free information may have strategic value when extracted and processed on a very large scale. Many of us are just concerned about what a third party like Alexa may or may not do (legally speaking) with the information that we publish on our web sites.

That said, everyone is free to allow ia_archiver to crawl their site if they so wish.

Brett_Tabke

9:37 pm on Aug 6, 2003 (gmt 0)

> Would Google go down this route?

What evidence do you have that they have not done this already? There is none to suggest they haven't sold the entire set of databases.

Kackle

11:06 pm on Aug 6, 2003 (gmt 0)

The data mining and cross-correlation possibilities of having the web off-line on fast computers would be of interest to high-end customers. These could range from marketers, to email address scrapers, to antiterrorism "Total Information" bureaucrats. You can't do this sort of analysis online, because crawling the web is painfully slow compared to grabbing the data from a CD or from RAM. To put it bluntly, we're talking about a new level of copyright theft and/or invasion of privacy that's one step beyond what either the Wayback Machine or Google's cache copy are doing.

In terms of the law, I believe that Google's cache copy is fundamentally the same as what Alexa is selling here. Google is redistributing content by displaying the cache link and making such links available independently of the original website, at the rate of hundreds of millions of links a day. They're doing it to make money. Legally, I don't see much difference between Alexa and Google here. This latest move by Alexa simply makes it more essential to get the copyright issue into court.

If there's a market for the web on disks, you can bet Google will get into it. Everyone says, "Google would never do that." Next thing you know, Google has grabbed all of your images (June 2001) and only later tells you that they're doing this. Everyone scrambles to move their images to disallowed directories. "Google would never sell out." Now Google is the world's largest ad agency. "Google would never tamper with PageRank." Then Google tells the judge that PageRank is just their opinion of a page, and they have First Amendment rights to do whatever they want with a site's ranking, and you better believe we zapped SearchKing, Your Honor.

Google is insensitive to privacy issues. They'd do it in a heartbeat if there was money in it. And I think there could be big money in it. Intelligence agencies, advertising agencies, trend-spotting gurus, etc., would love to have the web on disk combined with cool data mining software, and maybe some data visualization toys also. Lots of fun for those who can afford it.

Visi

11:41 pm on Aug 6, 2003 (gmt 0)

Some fine hair splitting here IMHO. Google has already sold their database to companies such as Yahoo. Let's not try to hide behind some technical definition. That also includes the cache copy. No different from what Alexa is doing. They have the information, might as well make some money from it.
