This 47-message thread spans 2 pages.
|Publishers Aim For Some Control Of Search Results|
|Global publishers, fearing that Web search engines such as Google Inc. are encroaching on their ability to generate revenue, plan to launch an automated system for granting permission on how to use their content. |
Buoyed by a Belgian court ruling this week that Google was infringing on the copyright of French and German language newspapers by reproducing article snippets in search results, the publishers said on Friday they plan to start testing the service before the end of the year.
"This industry-wide initiative positively answers the growing frustration of publishers, who continue to invest heavily in generating content for online dissemination and use," said Gavin O'Reilly, chairman of the World Association of Newspapers, which is spearheading the initiative.
Publishers aim for some control of search results [today.reuters.co.uk]
Belgian Courts tell Google - no cached pages allowed [webmasterworld.com]
|WAN is a Paris-based umbrella organisation encompassing 72 national newspaper associations |
Convulsions of a dying corpse.
I cannot understand what these publishers are fighting for. The truth is, nobody would have found them searching on Google if there were no such snippets of their articles in the SERPs.
If you do not want to be indexed, you simply add a noindex tag.
You cannot have Google sending visitors to your site without showing at least a small snippet of what your site is about! That is technically impossible.
You can have:
- Your site is indexed and Google sends traffic to your site.
- Your site is not indexed and Google cannot send traffic to your site.
- Your site is indexed but Google sends no traffic, as your content is of poor quality and you do not rank well in the SERPs.
You cannot have:
- Your site is not indexed but Google sends visitors to your site.
Obviously these publishers want exactly that, which is certainly ridiculous.
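For reference, the opt-out mentioned above is a one-line addition to a page's head section (this is the standard robots meta tag; whether a given engine honors it is up to that engine):

```html
<!-- Ask search engines not to include this page in their index at all -->
<meta name="robots" content="noindex">
```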
Read the articles: the problem isn't indexing, it is about CACHING, one thing this site doesn't allow through its use of the no-cache command.
The main argument is that the engines are caching without permission. It is no use saying they can use the "no cache" tag, as that is rather like me saying anyone can take my iPod from my desk because I don't have a big sign saying not to.
|...like me saying anyone can take my ipod from my desk as i don't have a big sign saying not to. |
Actually, your iPod is not on your desk. Your iPod is on the street! And since it is on the street, yes, you have to have a sign saying not to.
You have configured your servers for anonymous access, right? You PUBLISH your site on the Web, and the Web is certainly NOT your desk. It is a public place. And yes, you have to place signs there that everybody who goes by can understand!
[edited by: wildbest at 2:56 pm (utc) on Sep. 22, 2006]
|The main argument is that the engines are caching without permission. It is no use saying they can use the "no cache" tag as it is rather like me saying anyone can take my ipod from my desk as i don't have a big sign saying not to |
There's one flaw in that analogy: Stealing an iPod is clearly and indisputably illegal; caching a page in a search engine is not (except possibly in Belgium).
Still, if we're going to use analogies, let's try this one on for size: Would a business leave its offices, warehouse, or plant unlocked and unattended? If it did, wouldn't that be regarded as foolish and irresponsible? And would it get much sympathy from its insurance company or the general public if it defended its failure to lock the doors by saying "It isn't our duty to secure our premises"?
|You can not have both Google sending visitors to your site and not at least a small snippet of what is your site about! |
...err, yes Google CAN do this - there's nothing "impossible" about it.
Some sites ban Google with robots.txt. If there are enough inbound links, Google sometimes lists the URI in the SERPs even though Googlebot has never visited.
Also have a look at what Matt Cutts has to say about noindex [mattcutts.com].
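For anyone unfamiliar with the mechanism being discussed, a site-wide ban looks like this (note that, as described above, blocking the crawler does not guarantee the URL itself disappears from results):

```text
# robots.txt at the site root: tells Googlebot not to fetch any page.
# The URL may still appear in the SERPs, listed from inbound links alone.
User-agent: Googlebot
Disallow: /
```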
I don't understand the Reuters article. Have they completely missed the point?
The ruling was about caching of pages, not snippets in the SERPS, unless I've misread the ruling?
|a Belgian court ruling this week that Google was infringing on the copyright of French and German language newspapers by reproducing article snippets in search results |
If that is indeed correct, then that throws a whole new light on the ruling.
[edited by: trillianjedi at 3:11 pm (utc) on Sep. 22, 2006]
SEs should call their bluff; a simple solution would be to create a meta tag (cache = yes please), and then, after six months, all other sites (or those with cache = no thanks) would no longer be cached.
This would not affect day-to-day indexing at all, but it would make caching an 'opt-in' procedure, subject to SE terms and conditions, ending these frivolous lawsuits forever.
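As a sketch, the proposed opt-in might look something like this. The tag name and values are purely hypothetical; no such tag exists, and the standard robots values only support opting *out* of caching:

```html
<!-- Hypothetical opt-in syntax for the proposal above; not a real standard -->
<meta name="cache" content="yes">

<!-- Sites carrying this value, or no tag at all, would no longer be cached -->
<meta name="cache" content="no">
```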
I think that this is very much a case of biting the hand that feeds (or cutting your nose off to spite your face!).
For some years publishers have been affected by downturns in advertising revenues - partially this is economic, partially it's because publishers have failed to deliver value for money/return on investment to advertisers.
The claim now - in a "why didn't we think of that" kind of way - is that search engines, Google in particular, are able to make money from publishers' content and thus must be depriving publishers of their much-needed revenues. Sadly, as traditional advertising revenues continue to decline, publishers believe it must be someone else's fault but their own.
Newspaper readerships are on the decline. Publishers are falling over themselves to produce free newspapers in a last-gasp attempt to get eyeballs. Citizen journalism undermines the need for mainstream media - it doesn't supplement it. Where, then, are news publishers to turn if their very raison d'être is being eroded?
|...err, yes Google CAN do this - there's nothing "impossible" about it. |
...err, just trying to imagine what serps would look like?
Certainly, visitors to such a "search engine" would be left entirely in the dark, as they would have absolutely no idea what link they were clicking on: there would be no information AT ALL, only a list of meaningless URLs. Is that your idea of how search engines should look? Or are we supposed to make these snippets part of the URL to make it easier for humans to choose what they click on?
This is all about testing legal boundaries -- what rights do copyright owners have? This battle/debate is similar to ones that took place WRT books (and xeroxing pages in libraries), videotapes (recording copies for personal use or commercial distribution), and music (file sharing, etc.).
It is costly to create content, including journalistic content. That provides an economic incentive for newspapers to try to keep as much legal control as possible over the content they produce.
Sure, most small content owners are grateful to Google for sending them visitors, but that is because Google has so much market power. Might doesn't always make right, especially from the perspective of newspapers and other offline content producers whose livelihood is deeply threatened by what Google is doing.
There is a very important difference between snippets as they are presented in typical SERPs, and news summaries. Many users are satisfied reading the headline and first sentence of a news story, and have no desire to read anything more. Or, they prefer a short synopsis of the news, rather than the real thing. Think CNN Headline News.
To pursue the analogy a bit further, the iPod isn't actually out on the street -- it has been placed in the newspaper's front yard, right next to the sidewalk, so that pedestrians can easily walk up, enjoy the music, and read some ads posted next to the iPod.
The owner has noticed that a very powerful, rich corporation and potential competitor has taken their iPod and replicated it down the street, on a table along with a lot of other iPods. Many people are crowded around the table, listening to the music without bothering to walk down the sidewalk to the original iPod, where they might see the ads posted in the owner's front yard.
It isn't unreasonable for content owners (in this case newspapers) to probe the legalities of the situation to find out what is allowed and what is not allowed without explicit permission.
Sure, the iPod owner could discourage this activity by putting up a polite sign asking Google not to take their iPod, but the presence or absence of such a sign may or may not have any impact on the legalities of the situation. Even if Google respects the sign, some other content scraper/aggregator may not, so it makes perfect sense for them to try to get the police to protect their iPod for them.
Of course, copyright law only protects specific creative expressions. It doesn't protect the underlying data, or the actual news itself. Newspapers have been rewriting each others' reporting for more than 200 years.
So, even if Google isn't allowed to do what it has been doing, the newspapers won't necessarily be able to fully protect their economic interests, because Google and other websites always have the option of presenting summaries of the news that have been rewritten to avoid violation of the copyright laws.
|This is all about testing legal boundaries -- what rights do copyright owners have? |
It's also about what happens when corporations try to exploit a new medium without fully understanding or embracing the medium. IMHO, it's comparable to the mentality exhibited by corporate-owned sites that try to defeat "deep linking" or that forbid linking at all without their written permission.
I'm not really a big fan of the IPod analogy.
It was mentioned above, and I think it's true: Google should just follow the no-cache rule and also treat it as a noindex tag. What? You don't want us to cache your pages? Fine. Have fun not showing up in the SERPs then!
I think Google at this point can still deliver pretty good SERPs without needing 600,000,000 results for a keyword phrase. Hell, it's not like anyone goes past page 2 or 3 anyway!
The publishers are being big babies about this. It's not like Google is USING the cached pages to make money. It's just a tool to deliver the search results faster. You're in the SERPs? You're getting traffic? That traffic is generating money due to the advertising you get, and you get the advertising by convincing someone to buy it based on the amount of traffic you receive! It's a symbiotic relationship, and now they want their iPod and eat it too....errr...wait.
|The owner has noticed that a very powerful, rich corporation and potential competitor has taken their Ipod and replicated it down the street, on a table along with a lot of other Ipods. Many people are crowded around the table, listening to the music without bothering to walk down the sidewalk to the original Ipod, where they might see the ads posted in the owner's front yard. |
You are missing the point!
If you put something in a public place, you should also place clear instructions about what is allowed and what is not. Otherwise I may just piss on your iPod, thinking it is a new kind of public utility for that purpose.
Humans can see the copyright sign and can decide which part of your content is protected and to what extent. Robots cannot understand this. They can see the © sign, but they cannot determine which part of your content it relates to or under what conditions copying is allowed. That is why a special sign for robots must be used - the NOCACHE tag. Yes, it is that simple.
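For the record, the directive that major engines actually support for this is spelled noarchive rather than NOCACHE; it suppresses the cached copy while leaving indexing untouched:

```html
<!-- Stay in the index, but ask engines not to serve a stored copy -->
<meta name="robots" content="noarchive">

<!-- The same request, addressed to Google's crawler only -->
<meta name="googlebot" content="noarchive">
```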
|SEs should call their bluff; a simple solution would be to create a meta tag (cache = yes please), then after six months, all other sites (or those with cache = no thanks) would no longer be cached. |
I agree this is the simplest and most effective solution.
However, we should honor the main principle that drives progress forward: everything that is not explicitly forbidden is allowed! Just imagine where humankind would be if people had only ever done the things listed on a things-allowed-to-be-done list.
That is why, if you publish something on the Web, just tell people and machines passing by what they are not allowed to use your content for.
I don't know if I'm reading a different article, but the one I see doesn't say the newspapers are trying to ban Google from their sites collectively. They want to be able to have some options other than the take-it-or-leave-it ones that have been listed in this thread, and to set up a permissions system that bots can understand:
|Global publishers, fearing that Web search engines such as Google Inc. are encroaching on their ability to generate revenue, plan to launch an automated system for granting permission on how to use their content....... In one example of how ACAP would work, a newspaper publisher could grant search engines permission to index its site, but specify that only select ones display articles for a limited time after paying a royalty. |
I'll agree that the statement about "granting permission to index its site" is rather ingenuous, but limiting the time an article can be displayed, or asking for a royalty, doesn't seem too outlandish if a publisher wants to follow that option. As I understand it, each publisher would decide how to use the system. For example, small ones that depend on search engines for traffic could leave themselves as open as they want, while larger ones that are "destination sites" could impose limits if they want; the search engines could decide if a large "go to" newspaper is important enough to list that they're willing to pay a royalty.
The idea seems to acknowledge that when it comes to symbiotic relationships with search engines, one size doesn't fit all. Maybe I'm missing something, but I don't see why a site owner - and copyright holder - shouldn't be able to have some choices in how its material is used, besides all or nothing. If an individual site makes a bad choice, it would pay the consequences.
And if the whole idea falls apart, the consortium of newspapers behind it will be the ones losing the money.
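To make the "permissions system that bots can understand" concrete, here is a crawler-side sketch of how such machine-readable grants might be interpreted. Every field name ("index", "cache", "display-days") is invented for illustration; the actual ACAP format had not been published at the time of this thread.

```python
# Hypothetical sketch: a crawler interpreting an ACAP-style permissions
# record. Field names are invented; this is not the real ACAP syntax.

from dataclasses import dataclass


@dataclass
class Permission:
    allow_index: bool   # may the engine list the page at all?
    allow_cache: bool   # may it serve a stored copy?
    display_days: int   # how long snippets may be shown (0 = no limit)


def parse_policy(record: str) -> Permission:
    """Parse a record like 'index=yes; cache=no; display-days=30'."""
    fields = {}
    for part in record.split(";"):
        key, _, value = part.strip().partition("=")
        fields[key] = value
    return Permission(
        allow_index=fields.get("index", "yes") == "yes",
        allow_cache=fields.get("cache", "yes") == "yes",
        display_days=int(fields.get("display-days", "0") or "0"),
    )


policy = parse_policy("index=yes; cache=no; display-days=30")
print(policy)  # Permission(allow_index=True, allow_cache=False, display_days=30)
```

The point of the design is the one made above: each publisher sets its own record, and an engine that cannot or will not honor a field simply skips the site, so small publishers can stay wide open while destination sites impose limits.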
|It's not like Google is USING the cached pages to make money. It's just a tool to deliver the search results faster. |
And Google isn't competing with any other search engines, so providing a faster search has no monetary advantage for them, right? ;)
[edited by: Beagle at 5:03 pm (utc) on Sep. 22, 2006]
|I'll agree that the statement about "granting permission to index its site" is rather ingenuous, but limiting the time an article can be displayed, or asking for a royalty, doesn't seem too outlandish if a publisher wants to follow that option. |
Some publishers do this already by posting part of an article on one page and then linking to a members-only section that one must pay for in order to read the rest. I think this is fair; it's like a subscription process.
A lot of people say that Google has grown into an entity that the public relies on so much that it should be regulated as such. I say poppycock! They are not breaking any anti-trust rules. They have competition. Look at Microsoft for example. You can still choose to not buy and/or install Microsoft products on your computer, albeit for most this is a tedious affair. You can choose to not associate yourself with Google as well.
I don't think they should be putting Google on trial for this. It's Google's search engine, follow the rules or don't include yourself (or exclude yourself actually I guess). You have many options: noindex tag, disallow Googlebot, etc. Now if Google would just get down to following these tags like they should...
I guess what I'm saying is that if Google is liable at all here it is for ignoring tags they should be respecting...
I still get the feeling that two different articles are being confused here. The one in the title of this thread: "Publishers Aim for Some Control of Search Results," has nothing to do with putting Google on trial. What the newspaper consortium is trying to do is set up a better (from their POV, anyway) system of tags, giving more information to the bots (and more choices to the websites) than just "come in" or "stay away." That's it. It seems quite reasonable to me.
ETA: Looking back through the thread, Quadrille's solution of creating a new tag is exactly the kind of thing they want to do - but with a few more choices. I'd love to read a discussion of what people think about the system that's actually being proposed.
[edited by: Beagle at 5:22 pm (utc) on Sep. 22, 2006]
If it were that simple, it would be reasonable. But several reports have confirmed that the real agenda is to get Google to pay them for the right to cache.
That may or may not be reasonable; but caches do not currently earn money for Google (no adsense, for example), so the publishers are seeking to further monetize the web - whichever way you look at it, if Google gives in, sooner or later the user will end up paying.
Hence my suggestion (and I wasn't the first!):
|SEs should call their bluff; a simple solution would be to create a meta tag (cache = yes please), then after six months, all other sites (or those with cache = no thanks) would no longer be cached. |
In my view, publishers currently gain more than they 'lose' by SE listings; so this attack on Google stinks of self-serving greed, not any high-falutin' protection of copyright.
Gale's relationship with LookSmart through FindArticles.com is one model that serves the publishers and SEs well.
I haven't seen the "several reports," just the Reuters article. Are you saying this is a "slippery slope" issue? Something that all publishers would be forced to participate in whether they wanted to or not? If you're saying that all publishers would be forced to "cut off their nose to spite their face," no, I wouldn't be for it. That wouldn't leave any more choices than publishers have now - or would actually decrease them.
(And, of course it's for monetization. That doesn't necessarily mean it's bad. In fact, that's what copyright is all about.)
[edited by: Beagle at 5:36 pm (utc) on Sep. 22, 2006]
Please stop with the ipod analogies!
Comparing copyright infringement (or fair use) to any form of theft just shows that you have no concept of copyright.
Copyright is violated, not stolen. When someone takes your ipod, you no longer have that ipod. When someone violates your copyright, you still own the copyright. The analogy is broken, please move on.
What the newspapers want is for Google to pay them a royalty for the right to include their content in Google News.
It is not about caching the article in Google news, nor is it about advertising showing up on Google news, because neither of those things happen. The caching question was just a side issue to help support their point.
Due to the case in Belgium, the complaining periodicals have been totally removed from Google. It is not what they asked for, but it was the easiest solution for Google.
It is also unlikely that the newspapers would win that case in the English-speaking world. Google News has far more protection under copyright law than the regular SERPs. I can only think of one point where Google might get into a tiny bit of trouble, and that would have to be judged on a case-by-case basis, and they are still unlikely to lose.
I expect that Google will push their Fair use rights in the countries where that applies, and simply remove those sites that win cases against them. If a country gives a blanket ruling, Google will include those that want the free listing, and not those that want royalties.
I just can't see google paying for the right to send you traffic.
Other threads on this issue linked to articles quoting publishers admitting that they wanted a slice of the cake (even though it's currently cake=0).
I'm not saying that monetization is necessarily bad, simply pointing out that this particular one will have consequences wider than most commenters may have considered.
On the face of it, this could all be Good News For Publishers; personally, I doubt that. I think it's a sad, restrictive step with zero consideration for the Internet as a whole.
I hope Google has realised that this is quite possibly more than a little local difficulty, and acts accordingly.
|ETA: Looking back through the thread, Quadrille's solution of creating a new tag is exactly the kind of thing they want to do - but with a few more choices. I'd love to read a discussion of what people think about the system that's actually being proposed. |
There is nothing wrong with the system, as long as they realize that it is not binding on the SE or anyone else.
Having that tag does not make it a 2-way contract unless the other side actually agrees. It cannot override the actual copyright law.
If the other party does not agree, that doesn't mean they cannot use the content. It only means that they are reverting back to copyright law rather than contract law.
That's quite right - I'm proposing a contractual system of SE caches.
Having the tag signals assent to SEs keeping a cache (on their terms); the keeping of the cache by an SE completes the contract.
I'm further suggesting that after a fixed period (I suggested six months), SEs would cease to cache any non-assenting sites.
That would make it an entirely opt-in system.
I hope that clarifies things.
[edited by: Quadrille at 5:57 pm (utc) on Sep. 22, 2006]
|Please stop with the ipod analogies! |
Now that's good advice.
How about a more appropriate analogy? If you publish a book, should you have a page detailing the copyrights protecting the intellectual property in it, and which rights you reserve? Sure you do.
How is a no-cache tag more onerous than that?
Maybe instead of a cache tag, a full copyright tag:
copyright 2006 = index, cache, follow
copyright 2006 = index, nocache, follow
Simplest solutions are so often the best. :)
... and that's enough from me! ;)
[edited by: Quadrille at 6:20 pm (utc) on Sep. 22, 2006]
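Rendered as HTML, the copyright tag proposed above might look like this. The name and values are hypothetical; engines today only recognize the standard robots values such as noindex, nofollow, and noarchive:

```html
<!-- Hypothetical: allow indexing and caching, declare the copyright year -->
<meta name="copyright-2006" content="index, cache, follow">

<!-- Hypothetical: allow indexing but forbid caching -->
<meta name="copyright-2006" content="index, nocache, follow">
```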
|If you put something in a public place you should place also a clear instruction what is allowed and what is not? Because I may just piss on your ipod thinking it is a new kind public utility for that purpose. |
Now I know what Webmasters really mean when they say "my Google rankings have been hosed." :-)
|Having the tag is signalling assent to SEs keeping a cache (and on their terms); the keeping of the cache by an SE completes the contract. |
Then you missed my point completely.
If Google chooses NOT to accept the terms of your contract, in the United States they are still allowed to cache your site. They do not have to accept your terms to do that.
They would only have to accept them in order to do something they would not otherwise have a right to do, like serving up your content with their ads replacing yours.
You, as the publisher, cannot unilaterally give yourself more rights than copyright law gives you.
For example, let's take a look at the noarchive and noindex meta tags. There is a VERY good chance that search engines would not be required to honor those tags under US copyright law. The same goes for robots.txt.
The reason they have to follow those is that they say they follow them. By saying that they follow them, and you taking them at their word, it forms a weak implied contract.
It does give them a little firmer footing in the copyright infringement cases involving the cache, but in the Nevada case they won on ALL points, not just that there was this industry standard that they followed. They have a DMCA and Fair Use right to cache your pages in the way that they currently do it.
They are simply not going to accept a system that another industry group tries to force on them, that would reduce their rights under copyright law.
Would your 6 month plan be from the time that they first crawl the page, or the last time?
Do you understand the burden this puts on the SE in regards to record keeping on 20 billion pages stored on thousands of servers in hundreds of data centers?
It isn't unworkable, but it isn't simple. And it certainly is not legally binding on the SE.