|Ownership Of Content. And Google's Rights To Use|
I've been thinking for a while that this topic needs its own thread so we can get some broader, focused input and differing points of view on the subject.
Some comments are surfacing in other threads that I feel are part and parcel of a contentious issue for most site owners.... Google's obvious intention to draw from the work performed by others, however and whenever they choose. Knowledge Graph is a case in point.
|You may own your content now, but who profits from that labor in the future may be a different story. |
I suspect that comment goes to the heart of the issue for many people. I understand the concept of "fair use" (I assume most countries have similar, but differing, laws on this) but the idea that one party can take, repurpose and profit from the endeavours and intellectual property of another, with total impunity, does not fit within my concept of "fair use". Is this really how the law actually works in your country?
|if your business model depends [on] click-throughs to retrieve publicly available information, I would rethink the business model. |
I understand the point... but if the information becomes publicly available (on the internet) because my time, expertise and funding put it there, then surely another party does not have the right to appropriate the information for its own purposes?
Google can only regurgitate data from its index (i.e. internet data). It does not have access to every fact or bit of information that exists in the public domain. It learns new facts/information only when someone publishes them.
When that new information surfaces, why should it automatically become Google's to use at the possible expense of the poor sod who did all the work in creating it?
|brotherhood of LAN|
I remember a well-known media tycoon also taking exception to Google gathering and regurgitating its content.
The boilerplate answer is that robots.txt would be respected if you wanted to prevent content getting indexed. The fact that a scraper could end up getting it listed (and perhaps considered the originator of the content) makes that argument a bit murkier.
>It does not have access to every fact or bit of information
Indeed. All of the additional 'enhanced' SERPs have basically been 'manually' bolted onto the search results, but in the near future I think their knowledge graph and the increasingly structured/semantic web will have a bigger role.
It truly is a great topic to think about. Unfortunately it is personal to a lot of members because it involves livelihoods here too. What I'm finding most interesting is breaking down the idea of a fact, and how someone can come to the conclusion that they're the owner of that piece of information.
FWIW I think Google pushes the boundaries, sometimes a little too far but ultimately someone else would be pushing the envelope if it wasn't them.
|...but the idea that one party can take, repurpose and profit from the endeavours and intellectual property of another, with total impunity, does not fit within my concept of "fair use". |
Google asserts that the SERPs are its legally protected free speech, a product of editorial opinion.
|What I'm finding most interesting is breaking down the idea of a fact, and how someone can come to the conclusion that they're the owner of that piece of information. |
I don't think webmasters think they own the facts. To me it seems the complaints are that the presentation of these facts was taken straight from their websites, word for word. Take the example from the screenshot previously published in the New look "Google Knowledge" replaces results with content [webmasterworld.com] thread:
If Google had "learned" about these facts, then composed the text in its own words and published that text as its Knowledge Graph, the complaints about Google using someone else's hard work would not stand.
On the other hand, Google could easily address Knowledge Graph text complaints by introducing a new robots meta tag (e.g. something like "noknowledgegraph") and ask webmasters to put it on their site if they do not want their text to appear in Knowledge Graph.
Whilst webmasters would then have a choice to "opt out" of the Google Knowledge Graph, in reality this new tag would not change anything with regard to Knowledge Graph SERPs. There would be enough sites that would either not be aware of the new meta tag or would see it as an opportunity to appear in Knowledge Graph because other sites may be blocking it.
But brotherhood_of_LAN's comment about a scraper potentially being included in Knowledge Graph would then also be a very valid one.
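For anyone wondering what honouring such an opt-out would look like on the crawler side, here is a rough sketch in Python. To be clear, "noknowledgegraph" is the hypothetical directive suggested above, not something Google supports, and the class and function names are made up for illustration. It just reads the robots meta tag and keeps "may index" separate from "may reuse the text".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in ("robots", "googlebot"):
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that allows indexing but opts out of text reuse.
PAGE = (
    '<html><head>'
    '<meta name="robots" content="index, follow, noknowledgegraph">'
    '</head><body>...</body></html>'
)

directives = robots_directives(PAGE)
may_index = "noindex" not in directives                # real, widely honoured directive
may_reuse_text = "noknowledgegraph" not in directives  # HYPOTHETICAL directive from this thread
```

With this page, indexing is still allowed while text reuse is not, which is exactly the separation being asked for. A scraped copy without the tag would of course pass both checks.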
I think people worry about this stuff too much. Google can't possibly provide a better level of info than other sites, because there will always be a limit as to how much of the scraped info they can show.
They can get away with doing short answers, stuff like "when was Elvis born" etc, but there is another thread on webmasterworld at the moment which shows screenshots of medical queries. No patient is ever going to be satisfied with Google's five lines about their ailment. Would you be, if you were ill? So they are bound to click on a result. Google will never be able to provide a complete answer for those users, because they can't print a whole page of scraped info without getting into legal trouble.
And let's be serious about it... people do not regard Google as an expert on medicine. They know full well that Google does not employ doctors. So they are unlikely to accept "Google's" diagnosis. They are much more likely to be satisfied when they read the same thing on a medical site. That is not something that Google can ever fix.
|Google asserts that the SERPs are its legally protected free speech, a product of editorial opinion |
As a search engine Google will obviously generate SERPs and it is solely their business how they decide the order of ranking for the sites that appear in those SERPs. No argument.
It's when they go beyond that function and start siphoning data from the indexed websites for their own vested self interest that, for most people I suspect, a line has been crossed. That has nothing to do with generating SERPs.
Sorry…. but I have a problem with the argument that if a robots.txt file allows Google to index a site, then by default, it is an indicator they can do whatever they want with the indexed data.
Robots.txt is simply a set of instructions that defines which robots can/cannot access which parts of the website. It is a statement that defines what can be indexed…. that's it. Period.
|Google's answer to "What Is Robots.txt" |
The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention for advising cooperating web crawlers and other web robots about accessing all or part of a website which is otherwise publicly viewable.
Robots.txt is not a permission statement which says "once you have indexed the data you can take whatever you want and use that data for your own purposes"
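To illustrate how narrow the question is that robots.txt answers, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are made up for illustration. All the parser can tell a robot is whether a given user agent may fetch a given URL; nothing in the file says anything about what may be done with the content afterwards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from any real site).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The only question the file answers: may this agent fetch this URL?
googlebot_public = parser.can_fetch(
    "Googlebot", "https://example.com/articles/widgets.html"
)
googlebot_private = parser.can_fetch(
    "Googlebot", "https://example.com/private/notes.html"
)
other_public = parser.can_fetch(
    "SomeOtherBot", "https://example.com/articles/widgets.html"
)
```

Here Googlebot is allowed into the articles, kept out of /private/, and every other robot is asked to stay away entirely. There is no line you could add to this file that says "fetch it, but don't republish it".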
@austtr - I was about to make essentially the same reply but you put it far better than I would.
Yes, austtr and piatkow.
Also, if we're speaking of Google's Search Engine here, it's a fact that the engine is a database - it indexes content for search but doesn't own it.
It's 'fair use' on Google's part if they just order and categorize the content the way they think suits their users' needs better, or get anonymous statistics based on the most clicked or linked content, but selling the content without a webmaster's permission doesn't fall under 'fair use'.
|brotherhood of LAN|
>then by default, it is an indicator they can do whatever they want with the indexed data.
I meant robots.txt allows you to opt-out, to circumvent any issues you have with what Google does with data from your site. Indeed it doesn't say what people can and can't do with your data. I don't think there's much middle ground for "you can download the data but are only limited to do X Y and Z with it".
|Robots.txt is simply a set of instructions that defines which robots can/cannot access which parts of the website. It is a statement that defines what can be indexed…. that's it. Period. |
Crawl. Not index.