
Google Finance, Govt, Policy and Business Issues Forum


This 32 message thread spans 2 pages.
Google is building a new Knowledge Vault

 10:41 am on Aug 29, 2014 (gmt 0)

Google is building a new database of information called Knowledge Vault.
Here are two quotes from the article:

Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it....

As well as the ability to analyse text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages

Read the full article at [newscientist.com]
The article speculates that Google will use this Knowledge Vault to provide quick answers right on the search results page, but from a much larger database than the current Knowledge Graph.



 5:48 am on Aug 30, 2014 (gmt 0)

Anyone remember Asimov and "Foundation"? The early times are upon us.


 12:37 am on Aug 31, 2014 (gmt 0)

Everyone, be sure to mark up your site with schema.org markup to help accelerate Google's learning of all the facts and information in the universe we call the web, and to help hasten the demise of your own sites.

Before long they won't need organic listings. Google will be able to answer any question that is posed without them having to refer users to your sites. Freebase... Knowledge graph... now Knowledge Vault. The plot thickens. They have a sinister plan... and "all your site's information are be [Google's]".


 5:56 am on Aug 31, 2014 (gmt 0)

@ZydSEO, Google's plan is not going to work. Schema.org markup is going to be abused so heavily that it will not be reliable enough for this, and not everyone will use it. Look at how well the authorship markup experiment worked.

Even with manually integrated sources, Knowledge Graph is far from reliable. For a long time it listed the National Museum of the US Army as one of the places in Galle, Sri Lanka (because the US Army museum is at Fort something, and there is something with "national museum" in its name in Galle Fort).

Wolfram Alpha does a far better job than Knowledge Graph, and Knowledge Vault is trying to do something a lot more complex than either.


 7:37 am on Aug 31, 2014 (gmt 0)

Sounds like the best of Wikipedia.


 12:07 pm on Aug 31, 2014 (gmt 0)

Is this a fair use of intellectual property or just plain theft?

An example scenario:

If a publisher receives attribution from Google when the answer to a user's question is displayed on Google Search, what benefit will that publisher receive if fewer than 5% of users click through to visit their website? The display advertising revenue model is an important revenue stream for many websites and is not exclusive to Google. If Google is going to shave off such a large percentage of click-throughs to websites with a knowledge graph/vault, the information it displays should, under intellectual property law, be required to be licensed.

Google moves in small steps and conditions publishers to accept these types of changes. For example, for a while now Google has overlooked/ignored page titles and meta descriptions in order to display what it wants. Even those not using schema markup have lost control over how their works are presented to others. Those using schema in the hope that Google will send them more traffic will find out, too, that their content can be assimilated if Google deems it worthy enough.

brotherhood of LAN

 12:27 pm on Aug 31, 2014 (gmt 0)

It seems like Google has always treated copyright as a "sum of its parts". The cache was always unpopular, but it's no longer visible in the SERPs (the link is still there, though).

Knowledge Graph, based on Freebase, has a large number of structured datasets to work with, most notably Wikipedia; this image [wiki.freebase.com] shows a bunch of the sources. Their own dataset is larger than the one offered for download [developers.google.com]. A large proportion of the time a "knowledge box" is shown, it's from this open-source structured data.

We've seen a number of reports of knowledge boxes being shown from unstructured markup too. It seems to me that they're only displayed when there's a kind of corroboration between data sources, i.e. the statement(s) of fact come from multiple sources, and that results in a knowledge box showing a snippet from one of the sites.

If I say something like "there's a Scottish referendum on independence on the 18th of September", I imagine Google could, drawing on other sources as well, make a knowledge box or answer out of that. If they used my exact words, I don't see it as a big deal.

I suspect that's how their argument goes: they're almost taking "quanta" of information from the words we use, comparing them with similar statements of fact, and not lifting and redisplaying your entire work.

From the content creator's PoV, I understand the concern. It could be like a death from a thousand paper cuts, with your content dissected into fragmented parts. Copyright (and privacy) has had its boundaries pushed for years by the big web companies, and it seems like something will have to change. If only there were a way to compensate people for divulging new and interesting information... at the minute it's at the whim of online marketing and ranking algos. Still, make hay while the sun shines.


 1:22 pm on Aug 31, 2014 (gmt 0)

Is this a fair use of intellectual property or just plain theft?

Facts are not intellectual property, they are public domain.

They cannot be copyrighted or stolen.



 1:31 pm on Aug 31, 2014 (gmt 0)

Google is building a new database of information called Knowledge Vault

I bet the next gig will be called Knowledge Bunker, where they slap ads on pages filled with the info they've gathered about a specific person.


 1:51 pm on Aug 31, 2014 (gmt 0)

Facts are not intellectual property, they are public domain.

Though factual information is not copyrighted, how those facts are documented, assembled and presented to users on a private domain is. If Google employed staff to do the labor of documenting all the facts in the world, instead of lifting the data, then it would have a good argument that would change my opinion. Taking someone's research and labor for your own profit is criminal and well beyond the scope of the legal protections given to search engines, in my opinion.


 4:19 pm on Aug 31, 2014 (gmt 0)

Facts are not intellectual property, they are public domain.
They cannot be copyrighted or stolen.

I tried including football schedules on my site once: the dates of upcoming Premier League games. It was just very simple stuff like this:

10 Jan - Arsenal v Liverpool
11 Jan - Man Utd v Tottenham
12 Jan - Brazil v Wycombe Wanderers

Nothing more than that.
To any regular person they would seem like facts, but I got an email from the people who own the copyright, telling me to take them down.
It sounds totally bizarre, but there is a body that actually owns the copyright to the Premier League and lower league schedules, and you have to stump up a couple of hundred quid to reproduce just one team's schedule.

I'm guessing Google didn't get the same email.

If you can copyright that, then why can't you copyright event listings too? Just simple stuff like this:

10 Jan - Cliff Richard at Wembley Stadium
11 Jan - Cliff Richard at Hollywood Bowl
12 Jan - Cliff Richard in Wandsworth Prison

I can't see any difference at all, but that is exactly the kind of thing that Google scrapes en masse.


 5:08 pm on Aug 31, 2014 (gmt 0)

It seems relevant to this discussion to mention another concurrent thread here regarding Google's "dropping" of "authorship support" [webmasterworld.com...].
As someone who has watched Google with interest, I suspect the two things may well be related.

After all, if Google intends to expand its use of material/content from other people's websites via a "knowledge" "whatever" above the organic SERPs, without sending the visitor directly to the website(s) from which Google "gathered" the data/information/content, then Google would hardly want to acknowledge the original source author with a thumbnail photo.

To do so might lead many to say that Google knows the author is not Google, and that Google might well be considered in breach of copyright (depending on one's legislation, and judge). Witness how Image search now "works": each image presented by Google (on a Google page) says "may be subject to copyright". If the image comes from a third-party website, the copyright is certainly not Google's, nor is showing an entire image "fair use". Almost all images are subject to copyright, except those the original author(s) have expressly marked "not copyright" or similarly indicated as "copyright free". Google can argue (and has done so, IMO highly cynically) that it cannot know whether the site on which it found an image was the true copyright holder, hence the "may be".

By removing support for "authorship", they prepare the terrain to do the same with webmasters' written content as they have done with images. And since blocking Google from content via robots.txt only prevents it taking the content "directly", and does not prevent it "acquiring" that content via other sites that have copied and pasted, screenshotted, pinned etc., "opting out" via robots.txt is not effective against "may be copyright".
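The robots.txt point can be checked with Python's standard `urllib.robotparser` module. This is only an illustrative sketch: both robots.txt files and the hostnames `original.example` and `copy.example` are hypothetical. A robots.txt file governs only the host it is served from, so an opt-out on the original site places no restriction on a copy of the same text hosted elsewhere.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt on the original author's site: Googlebot opted out.
ORIGINAL_SITE_ROBOTS = [
    "User-agent: Googlebot",
    "Disallow: /",
]

# Hypothetical robots.txt on a third-party site that republished the same text.
SCRAPER_SITE_ROBOTS = [
    "User-agent: *",
    "Allow: /",
]


def allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The opt-out is honoured on the original host...
    print(allowed(ORIGINAL_SITE_ROBOTS, "Googlebot", "https://original.example/article"))
    # ...but it says nothing about the copy on another host.
    print(allowed(SCRAPER_SITE_ROBOTS, "Googlebot", "https://copy.example/article"))
```

The first call prints `False` and the second prints `True`: the same article is blocked at its source and freely crawlable at the copy.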

If, as blend27 puts forward, Google decides to put ads around the text in its knowledge graph/vault "results" (it currently puts ads very close to its "knowledge graph" "insert", and I expect it soon to add ads to the interstitial results pages in image search, in countries where it can get away with it; if it does, will it only do so with images from copyright holders in those countries, I wonder?), how would (does) someone living elsewhere know what Google (or any other SE, for that matter) is doing with their copyright material?

Bear in mind: if Google were able to use your content (images, text etc.) in the way blend27 posits, would it need to keep AdSense going? After all, if it is going to do things (with accompanying ads) that would drastically reduce your SE traffic, why would it place AdSense on sites it was "accidentally" starving of visitors?

It could be like a death from a thousand paper cuts

Or perhaps like "slowly boiling frogs". Some of us have been saying for a long time now that Google in particular, but also Bing, Facebook, Pinterest et al., were "heating the water".

The final sound a boiled frog hears may well be Google*

*Other search engines etc. are trying to do the same, but Google is more "onomatopoeic" in this context.

I suspect that within 5 years, if you have an "informational" site, the SEs will be using your content (via "knowledge vault" or similar) and not sending you the visitor/traffic. And if you have a commercial site, the SEs will require you to buy ad space to appear anywhere in their results. "Organic results" as we now think of them will have disappeared, as will AdSense for all but a very small number of sites.


 6:25 pm on Aug 31, 2014 (gmt 0)

I tried including football schedules on my site once

Football League fixture lists had copyright protection under English law from 1959 until the European Court of Justice lifted it for the whole of the EU in 2012.

12 Jan - Brazil v Wycombe Wanderers

I'm not sure that one would qualify as a fact.



 8:28 pm on Aug 31, 2014 (gmt 0)

To me, the Google book scanning project got snuffed thanks to the courts. This appears to be the same philosophy as the book project, except that the courts won't have a say in who "owns" the information. The internet as a whole is a much less labor-intensive scrape, as it were. If you think about it, those of us who uploaded info/pages to the internet did Google a big service. In a sense, we all scanned the books for Google, and it can publish all the information. Of course, the arguing about right, wrong, and ownership is just background noise. Authors? As we all know, our internet contributions can be scraped with ease and used by anyone. The actual defense against taking information is FEEBLE at best.


 8:33 pm on Aug 31, 2014 (gmt 0)

Another point to add: the ads aspect is rubbish. Even without ads or commercialization, it doesn't fly with me, and the lack of ads doesn't make it any more right. In fact, if they keep people on their site, they are denying me possible ad clicks, because perhaps I'm putting ads on MY SITE. If they essentially take over that information for themselves? I'm out the revenue. I don't think they'd dare put ads on any of this. It will be "for the good of mankind". The webmasters losing traffic and potential ad clicks will be less significant than background noise.


 8:57 pm on Aug 31, 2014 (gmt 0)

In an ideal world, each website/domain should have a unique 'Paid Content ID' (similar to Adsense Publisher ID). Whenever a search engine uses/displays content from the site, the owner is paid $X.

A 'content' would be defined as at least X direct words, an image, audio, video, etc.

brotherhood of LAN

 9:22 pm on Aug 31, 2014 (gmt 0)

I like that idea, Selen; I think it's a great approach for our "knowledge based" market.


 9:47 pm on Aug 31, 2014 (gmt 0)

In an ideal world, each website/domain should have a unique 'Paid Content ID' (similar to Adsense Publisher ID). Whenever a search engine uses/displays content from the site, the owner is paid $X.

A 'content' would be defined as at least X direct words, an image, audio, video, etc.

I like that idea, Selen; I think it's a great approach for our "knowledge based" market.

That "approach" would not prevent scrapers, who would then have the scraped content shown (and the "showing" paid for) by search engine(s) in place of the original authors. Blogger*, anyone?

*Owned by ?...

Nor would it deal with crowd-sourced scraping, such as that by Pinterest users, or "spinning" such as is done by Wikipedia or eHow etc., or by other entities whose owners or backers have ties with the search engines.

It would be very easy to arrange to give monies (in exchange for "content use/display", or even for ads clicked on) to yourself, your friends or backers, or via "proxies". Such "deals" already exist (AdSense, amongst other ways), involving some of the entities I mentioned in the previous paragraph.

brotherhood of LAN

 10:01 pm on Aug 31, 2014 (gmt 0)

Yes, Leo', that's the inescapable problem in a nutshell: if you don't want to give your content to the bots, someone else can.

Any more ethically pleasing SE startup would also have the problem of deciding the authoritative source of content that already exists.

What a system of content/payment micro-transactions would need is (unfortunately) a somewhat centralised, authoritative source able to discern the originator of content, via checksums or signatures of the unique knowledge. Kind of like how HTTPS works at the moment, with its certificate authorities.
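A minimal Python sketch of the checksum half of that idea, assuming a hypothetical central `ContentRegistry` that records whoever registers a piece of text first. Nothing here is an existing service or API: `ContentRegistry`, `fingerprint` and the example hostnames are made up for illustration, signatures (proving *who* registered) are left out, and "first registration wins" is an assumption standing in for the authoritative source described above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Registration:
    publisher: str
    registered_at: datetime


def fingerprint(text: str) -> str:
    """SHA-256 checksum of the text after lower-casing and collapsing
    whitespace, so trivial reformatting does not yield a new fingerprint."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class ContentRegistry:
    """Hypothetical central registry: the first publisher to register a
    fingerprint is treated as the originator of that content."""

    def __init__(self) -> None:
        self._entries: Dict[str, Registration] = {}

    def register(self, publisher: str, text: str,
                 when: Optional[datetime] = None) -> Registration:
        entry = Registration(publisher, when or datetime.now(timezone.utc))
        # setdefault keeps the first registration; a later claim on the same
        # fingerprint gets the existing (original) entry back instead.
        return self._entries.setdefault(fingerprint(text), entry)

    def originator(self, text: str) -> Optional[str]:
        entry = self._entries.get(fingerprint(text))
        return entry.publisher if entry else None


if __name__ == "__main__":
    registry = ContentRegistry()
    fact = "There's a Scottish referendum on independence on the 18th of September."
    registry.register("original.example", fact)
    # A scraped copy with different spacing and case maps to the same fingerprint.
    registry.register("scraper.example",
                      "  THERE'S a Scottish referendum on independence on the 18th of September. ")
    print(registry.originator(fact))  # prints: original.example
```

An exact hash like this is easy to defeat: changing a single word produces a completely different fingerprint, so "spinning" would slip straight past it.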

You'd either work inside that sphere of influence or outside it, but I think that if there were mass adoption of the idea, crawlers/engines would have a hard time ignoring it. If they did adopt it, scraped content could simply be ignored.

Anyway, a bit OT, but I think there's something workable here, though it obviously involves a huge change in the way things are done.


 12:01 am on Sep 1, 2014 (gmt 0)

I doubt Goo will pay a dime to anyone for anything. If they can shove out AdSense publishers, they will. The only thing that matters to Goo is Goo. Same old, same old: if you don't want your content, photos, whatever scraped, don't put it on the internet.

This isn't a leap forward--it is just an inevitable leap. Adapt or die.


 1:37 am on Sep 1, 2014 (gmt 0)

I've really focused on a quote from Cutts regarding the dilemma of knowledge graph situations and whether content owners should in fact see money from the use of their content. Paying for content? That's a tough sell to shareholders, isn't it? The fact is, this method of utilizing information from the internet is far superior to book scanning. It's easy to work collectively, isn't it? If Bing is doing it, why can't I? If they do, then so can I. If this is unethical, then aren't we all? I'm no worse than those other guys. The debate we're having here is about who owns what. Who put it up? Who published those scores? Who came up with that weather forecast?

Perhaps the answer box was the soft launch? I think it's going to take a trend of a great number of information scrapers before Google really goes out on a limb with this one. Remember, this might have started with the greedy image search. Imagine that: both Bing and Google came up with the same type of system at right around the same time. A great number of webmasters were snuffed out in that transition, myself included. Scraping information will be next on the agenda. I would expect the same outrage as there was over image search. In other words, those trampled underfoot will not be remembered or worried about.

Doesn't anyone agree that this concept is basically the book scanning project 2.0, only more cost-effective? Same moral issues, except this time there aren't a bunch of authors crying about it. The idea truly rubs me the wrong way, but at least I'm a realist about how things are eventually going to shake out. I can still exist, but it's certainly a nasty and unethical move.


 2:02 am on Sep 1, 2014 (gmt 0)

Lots of people who write fiction put up the occasional short story or flash fiction on their websites. If it's good and they're well known, lots of people might link to it, so we're talking about trustworthy and interesting content. How is Google planning to differentiate that from regular facts? And if readers liked it and created their own fanfic based on it, thus spreading the "facts" in the story even more widely, what then? Alternate history authors are going to have a field day.

I sense the next evolution of Google bombing around the corner.


 5:15 am on Sep 1, 2014 (gmt 0)

It raises the question, "Why did they buy out that click fraud detection company?" <-- read that again... They're slowly losing search share to mobile apps, and the knowledge vault seems like another effort to keep the money flowing for as long as they can. Read that part in quotes above a third time.


 11:07 am on Sep 1, 2014 (gmt 0)

Implementing your own site-wide authorship annotation is one solution. I have done that in many parts of my sites and am continuing the process. Adding "date published" information will also help protect your copyright if need be.
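One way to carry author and publication-date information in a page is schema.org's `Article` type with its `author` and `datePublished` properties, embedded as JSON-LD. The post does not say which annotation scheme it uses, so this is an assumption; the helper name `article_jsonld` and the example values are hypothetical, and a date you publish in your own markup is self-asserted rather than independent proof.

```python
import json
from datetime import date


def article_jsonld(headline: str, author_name: str, published: date, url: str) -> str:
    """Return a <script> block with schema.org Article JSON-LD carrying
    the author and the publication date."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),  # ISO 8601, e.g. "2014-09-01"
        "url": url,
    }
    # Escape "</" so text such as "</script>" inside a value cannot end the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(article_jsonld(
        headline="Google is building a new Knowledge Vault",
        author_name="Example Author",                   # hypothetical
        published=date(2014, 9, 1),
        url="https://www.example.com/knowledge-vault",  # hypothetical
    ))
```

The `<\/` escape is valid JSON for `/`, so the payload still parses back to exactly the original text while a stray `</script>` in a headline cannot close the tag early.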

creative craig

 8:43 pm on Sep 1, 2014 (gmt 0)

The paid ID could be locked to an IP address and a domain. It doesn't fix the duplicate content issue, but it could add a thin layer of security.


 12:03 am on Sep 2, 2014 (gmt 0)

The more I think about it, the more interesting this becomes, particularly the monetization aspect. What if, for example, I could get an article or a portion of an article published on/in the biggest single source of information? That obviously goes well beyond the joy of Google picking my site for page 1, or near the top of page 1. If people can't make it to my content, but Google, for example, wants that content, then I'm more than happy to provide it. I mean, if this is how things are going, then let's talk profit sharing. If my site is getting sparsed, but the content isn't, then let's talk about a payment system, which in turn would likely create a darn accurate record of who wrote what and when. Couldn't a system of submissions actually work going forward? Before I publish something on my site, I submit it to Google, for example. There would have to be a method of date stamping. That's beyond my pay scale to figure out, but if truly everyone, like Apple, Google and Microsoft, is going to start lifting all the good stuff off the internet for their own use, then perhaps a system of payments has to be set up and considered first. I can always dream. And consider how hard page 1 results are to get. Only one chosen result for a given answer? That has to be like a gold medal at the Olympics, and thus should have lottery-type winnings associated with it.


 1:23 am on Sep 2, 2014 (gmt 0)

This has the potential to be another huge deterrent to posting good material on websites; it could add to the thread with the satirical title "let's all quit", as success with Google is too tough now.
[I post a lot less on my own sites nowadays; a lot to Facebook for quick responses]

Thanks to Leosghost for the good laugh re the Cliff Richard venues.


 3:29 pm on Sep 3, 2014 (gmt 0)

Though factual information is not copyrighted, how those facts are documented, assembled and presented to users on a private domain is.

I once attended a seminar on copyright and intellectual property where the IP attorney for a major corporation (one of Google's competitors, by the way) had a simple prescription for avoiding infringement and fair-use issues:



 8:10 pm on Sep 3, 2014 (gmt 0)


and that is all Goo would have to do--just enough--swap a word or two here and there and spit it out that way. I'm pretty sure they are smart enough to write that up, then that's it--all over.

When Goo and Bing started swiping my photography, I put my name and website watermark on every one. It has worked out pretty well for me in the long run. If clients want a high-res copy, they find me.

Content is a different challenge. I believe all content will turn into this gray mass of mediocrity one day. My solution: since Goo has to back off book scanning, go back to traditional publishing. That should filter out the weaklings and hold off the search engines, maybe long enough for me to profit from my own original work (content).

The web has been a lot of fun and money for me, however, it is dying in regard to individualism. I'll stick around until it is all the way dead too. There is a better, more satisfying life back on the other side, where I started--in the real world--maintaining what I can of my privacy and creativity. My 'real book' library has been my best investment.

The air is much better outside of the 'vault.'



 1:52 am on Sep 5, 2014 (gmt 0)

and that is all Goo would have to do--just enough--swap a word or two here and there and spit it out that way.

That kind of "spinning" isn't rewriting. But in any case, I see no evidence that Google or the other big search engines want to rewrite, repackage, and republish everything of value on the Web. Facts? Those are easy. In-depth articles, papers, or entire sites? Not so much.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved