Forum Moderators: not2easy
The billionaire told Sky News Australia he will explore ways to remove stories from Google's search indexes, including Google News.
Mr Murdoch's News Corp had previously said it would start charging online customers across all its websites. He believes that search engines cannot legally use headlines and paragraphs of news stories as search results.
"There's a doctrine called 'fair use', which we believe to be challenged in the courts and would bar it altogether," Mr Murdoch told the TV channel. "But we'll take that slowly."
Ok, so we all know that he could simply prevent indexing, but I'm sure he's aware of that too. Seems the wily old fox wants to go much further than that and seeks to change the entire landscape of the web by making it illegal for anyone or anything to use text from copyrighted stories as search result snippets.
The ramifications are huge. How very, very interesting...
Syzygy
Are you suggesting that Google News is scraping the "entire content" of Mr. Murdoch's newspapers every day and reprinting it under its own logo?
I think you are misinterpreting this. More or less the entire content of Google news is scraped from elsewhere. What would be the problem in someone rescraping it?
seeing as they haven't blocked my bot (...i've had a look at google's robots.txt file, and i'm not there)
[google.com...]
Whoops, I guess you don't know how to read a robots.txt file
It starts off like this....
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
<snip>
and so on........
Maybe you weren't aware but * is meant to indicate all. So yes, you have been disallowed from spidering pages in Google.
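For anyone who wants to check this rather than argue about it, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses only the lines quoted above (the real file is longer and was snipped), and the bot name `LondrumBot` is made up for illustration. It shows that a `User-agent: *` group applies to a bot that is not named anywhere in the file.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the excerpt quoted in this thread; the real file is longer.
RULES = """\
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the quoted rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "LondrumBot" is a hypothetical name; it is not listed in the rules,
    # so it falls under the "*" group like every other unnamed crawler.
    for path in ("/news", "/search", "/intl/en/about.html"):
        url = "http://www.google.com" + path
        print(path, is_allowed("LondrumBot", url))
```

Note that this only says what the quoted rules ask of crawlers on the host that serves them. It says nothing about other hostnames, and nothing about what is legally permitted.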
Did you really look, Londrum? I understand you don't agree with Google, but you are spouting things that aren't even true and, worse, you are claiming to have confirmed them.
Worse than scraping sites is intentionally misinforming people about "facts" that aren't true.
This is a respected site that you are a senior member of, if you are going to say things like "My spider isn't mentioned in Google's robots.txt, I checked"... then perhaps you should actually check.
[edited by: Demaestro at 10:49 pm (utc) on Nov. 12, 2009]
if they want our content then they should pay for it, like everyone else. sending me a few thousand people who won't buy anything is no recompense for scraping my stuff
Then if you find that objectionable mate - block Google and any other SE. Not hard. Do you intend to block them because "they want our content then they should pay for it"? I bet you don't and I'll also bet Murdoch won't.
@mack
I don't think robots.txt is the answer. The problem isn't with serps, but how news is presented within Google news
Do you believe Google would fail to honour anyone's robots.txt? I believe they would honour it.
@londrum
so presumably that means we are all well within our rights to scrape the entire content of google news every day and reprint it under our own logo?
Any AdSense publisher will tell you that this is rife with MFA sites masquerading as pseudo-search directories or whatever. Absolutely no difference.
Thinking about it, why doesn't Murdoch file a DMCA notice if he's remotely fair dinkum with his complaints? If he continues to bleat but not act, his case weakens.
We had a legal precedent of that in Oz over using the expression "Claytons" for a variety of things and Claytons lost their case for failing to act to protect their trade mark which had by then entered into public domain as an expression to be used by all.
First, he/his team wants to go after the concept of "fair use" and change it. Well, I don't think they'll be able to, as it would impact too many industries and products. For example, back in the day the TV stations sued newspapers because they published TV guides, and the stations alleged that this was copyright infringement. The courts ruled it wasn't. The idea of "fair use" permeates our lives entirely, from papers that cite others, to books, movies, videos and pretty much any media out there.
Second, he doesn't understand the value of organic SE traffic. If Murdoch's online empire isn't indexed, well no one will ever visit it.
Third, when your industry is bleeding, you don't build walls to stop the attack: you counterattack by finding new markets, new ideas, new businesses. Look at how the record industry got killed, and Apple made it (BTW, Apple is a bigger gorilla than G, both in profit/revenue and market size)
Fourth, the real killer for newspapers isn't online readers on G News, it's Craigslist. I know because I've been involved in the newspaper publishing industry. Calculate how much money a full page of classified ads makes in your newspaper, and you will fall on the ground. In the last 10 years the number of classified ad pages has melted to almost nothing.
This is an attempt to scare G, that's all. I hope they don't bite, because if they do, the repercussions will be terrible.
I believe this is correct. Classified ads were subsidizing everything else in the paper. So what happens when classified-ads consumers decide they don't need to subsidize sports-scores-and-blather consumers?
Craigslist, that's what happens.
And suppose your sports-scores-and-blather consumers have the blather piped in through cable TV?
(So, for that matter, do the political-scandals-and-blather consumers--as well as consumers of most of the other newspaper sections. As for consumers of international news, the newspaper abandoned them already.)
You have one news aggregator dinosaur with a texas-sized meteorite aimed at its vital organs. Its head can bleat or groan or whimper--it doesn't matter any more. The meteorite may miss its body, but the ecosystem that provided its food is minutes away from being coal-seams.
so that's what he does now: in recent weeks he released "nachrichten.de", a news aggregator, exactly what google news is about. scraping other people's content and republishing the snippets, framed in big fat ads (btw google news doesn't show ads yet in germany).
need another example of print publisher hypocrisy? axel springer on the same line: "google is stealing our content".
what did he do in the last days? he's "enriching" his newspaper articles with content from social networks. scraping copies of written text from users of facebook, twitter and the like. without asking anyone of course.
you understand now? ranting here and doing worse things there. it's all a big deception. i don't believe these guys one word. "fair use" my a$$.
so that's what he does now: in recent weeks he released "nachrichten.de", a news aggregator, exactly what google news is about. scraping other people's content and republishing the snippets, framed in big fat ads (btw google news doesn't show ads yet in germany). need another example of print publisher hypocrisy?
@moTi: this is not really hypocrisy. After complaining for a little bit and finding no sympathy from anybody (neither the courts nor consumers), the publisher decides to do *something* to compete with Google.
Makes total sense and should be encouraged. What are you expecting them to do? Wait until their business dies completely, play clean and not hit back when the other guy is pummeling them on the nose, with his gloves off?
I already posted in this thread and I am happy to reiterate again my firm belief that *the only* way for news publishers to pressure Google is to collectively (or individually, although that's less likely to succeed) launch another news aggregator.
That's what I think Murdoch's next move should be.
I already posted in this thread and I am happy to reiterate again my firm belief that *the only* way for news publishers to pressure Google is to collectively (or individually, although that's less likely to succeed) launch another news aggregator.
I wonder if a "collective" launch of a news aggregator by a cabal of news organizations (with the intent of quashing competition from Google and other search engines) mightn't run afoul of antitrust laws?
If Rupert Murdoch wanted to launch a house-brand aggregator site for his own properties, that would be a different story, but he'd need a way to attract readers. In the U.S., he might be able to create a kind of "Fox News" aggregator that would appeal to a certain segment of the political spectrum (i.e., the kind of people who get steamed whenever they see New York Times or Washington Post stories on Google News), but his news empire is global--and so is the Web.
Whoops, I guess you don't know how to read a robots.txt file
...I understand you don't agree with Google, but you are spouting things that aren't even true and, worse, you are claiming to have confirmed them.
...This is a respected site that you are a senior member of, if you are going to say things like "My spider isn't mentioned in Google's robots.txt, I checked"... then perhaps you should actually check.
i did check.
maybe you checked http://www.google.com/news, and that is why you're telling me off, but i'm talking about http://news.google.co.uk/. if you look at the robots.txt file then the only lines that refer to news are these...
User-agent: *
Disallow: /news
Allow: /news/directory
neither of those lines block us scraping stuff from http://news.google.co.uk/, only from http://google.co.uk/news. if they want to block us scraping stuff from the subdomain, then they need a robots.txt file which can be found there saying
User-agent: *
Disallow: /
i'm pretty sure i'm right about that.
...so until they block our bot, we're all well within our rights to scrape and republish their stuff. that is the argument that google are using against murdoch.
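On the narrow technical point about subdomains: under the usual robots exclusion convention, a crawler looks for the file at the root of whichever host it is about to fetch from, so `news.google.co.uk` and `google.co.uk` are handled separately. A small sketch of that lookup in Python follows. The helper name `robots_url` is invented here, and this shows only where a well-behaved crawler would look, not what either file actually contained at the time.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs `page_url`.

    The file is resolved per scheme and host, so a subdomain has its
    own robots.txt, independent of the parent domain's file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for page in ("http://news.google.co.uk/", "http://google.co.uk/news"):
        print(page, "->", robots_url(page))
```

Whether the absence of a blocking rule puts anyone "within their rights" to republish is a separate question that robots.txt does not answer.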
Many have said simply block Google from indexing using robots.txt. This isn't the answer. Google is a search engine; why would any site want to drop out of the standard serps?
Some may see the "allow" on robots as opt in. I see your point, but this is for Google search.
The problem is not with Google indexing news and making it searchable. The problem is Google creating a service exclusive to news search, where they push the limits of fair use. Why would Murdoch block Google from spidering, when they can get free organic traffic? Google is a search engine, remember; that's what they do: provide search. But the real argument here is Google taking things too far.
Mack.
Get off the free willy web lowest denominator class gimme cause I do it for all net neutrality ride... none of it is free, kiddies. Wondering what planet we're on (I'm here, too!).
Also when Google wants to do something new it seems that they just go ahead and do it. They then face any consequences if they arise and more often than not get their own way.
The Google news thing is another example of them making the rules to suit themselves. When they do stuff nowadays they are seldom challenged. When does one company become too powerful? How long until we all live in Google world?
Go Rupert! Go Bing! :)
edit:spelling
I like to think of G as a maximum/minimum thermometer. Don't know how often you see those things around these days but we used to use one in the physics lab at my school in the late 70s. There was some sort of metal marker that would get pushed higher and higher up the thermometer as each new maximum was reached...and it wouldn't come back down or move at all until the mercury pushed it further up. This was obviously to measure the maximum...so similar wizardry was on display at the other end of the tube to record the minimum.
Google pushes and pushes, the metal always going upwards. Occasionally, they suffer a minor setback but you know they'll be back pressing onwards sooner or later. And the little measuring marker keeps advancing. They've done it with everything. They've even told us what their goal is, they're not coy about it. They want to organise the world's information.
Oh, BTW, that's YOUR information. They are the shelves of a supermarket, nothing else. The shelves of the store are going to be getting more money than the products themselves.
The problem is not with Google indexing news and making it searchable. The problem is Google creating a service exclusive to news search, where they push the limits of fair use. Why would Murdoch block Google from spidering, when they can get free organic traffic? Google is a search engine, remember; that's what they do: provide search. But the real argument here is Google taking things too far.
So what would you suggest--that Google dump everything in one giant, undifferentiated index? That doesn't make much sense from a curatorial point of view or, for that matter, from a searcher's point of view. And can you imagine the cries of pain at Webmaster World if wallys-widgets-for-sale.com had to compete with the latest Silicon Valley widget headlines in Dinty Moore Stew-style SERPs? (We already hear enough complaints about Universal Search!)
if people need some information, then google should tell them where to find it. that's what an index is. they shouldn't try to give them the information themselves (which they've taken from us).
so until they block our bot, we're all well within our rights to scrape and republish their stuff. that is the argument that google are using against murdoch.
Yes based on what you are saying you will be allowed as long as you don't republish it word for word.... would you use snippets of snippets in that case?
I say go for it. You won't be the first, you might even make an income from it.
Google doesn't get to play by any special rules that you don't. If they can do it, you can do it.
That is why they aren't the only search engine out there.
I don't understand your point about scraping Google, do you think they would be bothered by it? If they were they would just add that page to their robots entry and request that you don't index their site. Something Murdoch will never ask Google to do, because he knows that Google would comply.
Google is a search engine remember, that's what they do.
This is a common misperception. Google is an ad server. That is their core business and has been for some time.
Everything else they do is to provide a platform for their ads. Until/unless one grasps this then arguments over scraping/aggregating/fair use/etc. will go on and on.
I don't understand your point about scraping Google, do you think they would be bothered by it? If they were they would just add that page to their robots entry and request that you don't index their site.
From Google TOS
You may only display the content of the Service for your own personal use (i.e., non-commercial use) and may not otherwise copy, reproduce, alter, modify, create derivative works, or publicly display any content. For example, you may not use the Service to sell a product or service; use the Service to increase traffic to your Web site for commercial reasons, such as advertising sales; take the results from the Service and reformat and display them, or use any robot, spider, other device or manual process to monitor or copy any content from the Service. If you are uncertain whether your intended use of the Service is permissible, please contact us. In addition, Google shall have the right in its sole discretion to suspend or terminate the Service or your access to it...Last updated January 17, 2006.
Scraping G-News is easy enough. I must admit that I had a site a couple of years ago with a page "widgets in the news" that scraped [news.google.com...] and repackaged/reformatted it, complete with adsense ads. Even had it ranking #1 for the phrase 'widgets in the news' for awhile. I dropped the page when the site went through a major redesign - variety of reasons, but in the end I just saw it as too risky for what was overall a good solid and well ranking site. The news page added too little.
Considered using the RSS feed, but the Google branding looked too cheesy...
Now, what do you suppose would happen if I used a similar script to "aggregate" articles from a few European tourist info type sites and slapped a bunch of ads on them. Would that be fair use? Shucks, I'd be providing free traffic wouldn't I?
So what would you suggest--that Google dump everything in one giant, undifferentiated index?
@ signor_john - No, the honorable thing for Google to do would be to offer a way to opt out of all "Google-branded" services while staying in SERPs. In fact, I am of the opinion that they should be legislatively forced to do so. (please don't reply with another "you want to have your cake and eat it too!")
This affects not only newspapers, this affects everybody. I can see about 95% of all content-based sites potentially decimated when Google finally launches the right product to do so. Right now it's newspapers squealing. Soon all the travel sites will cry (have you seen where Google Places is going?). Then, comparison shopping sites. After that, all real estate sites. The list will go on and on. I think Google has adopted the philosophy that no site has the right to be a destination in itself, only Google can be a destination. Everybody else will be chopped in pieces and served under the Google brand to eliminate the need to go there directly, with SERPs as the leash that keeps them from closing off their content. This needs to be changed, urgently.
Now, what do you suppose would happen if I used a similar script to "aggregate" articles from a few European tourist info type sites and slapped a bunch of ads on them. Would that be fair use?
If you follow the same rules as Google... yes it would
Do you think that there is a special set of rules for Google that doesn't apply to you?
The rules of automated site indexing and directory listing sites are laid out. You don't have to pay off a politician, you don't have to be a multi-billion dollar company. You just have to play by the same rules as Google, Yahoo, Bing, Cuil, Dogpile, Ask, altavista, metacrawler, lycos, alltheweb, and many many many more.
There isn't a set of rules for Google and another for the rest of us. We can all create online directories, it has been a part of the web for a long time.
Does anyone remember what it was like before directories were commonplace? If you didn't know a site's URL then you wouldn't ever find it. If your site wasn't linked from a site people went to then no-one found your site unless you ran print ad campaigns that advertised your URL. Directories are our friends.
Google is a search engine remember, that's what they do.
This is a common misperception. Google is an ad server. That is their core business and has been for some time.
Google is a search engine. Asserting that it isn't is silly. That is like saying NBC isn't a tv station it is an ad server... it is a tv station that is why people view it. Google is a search engine that is why people view it.
People don't change their channel to NBC to watch ads.
People don't go to Google to view ads
NBC runs ads because they have something (tv shows and news) that people view and others find those eyeballs worth something.
Google runs ads because they have something (Internet search results) that people view and others find those eyeballs worth something.
Google doesn't run search results because people come for ads.
NBC doesn't run TV programs because people are watching commercials.
How silly does this sound?
NBC isn't a Tv Network that is a common misconception. They are an ad server.
Now, what do you suppose would happen if I used a similar script to "aggregate" articles from a few European tourist info type sites and slapped a bunch of ads on them. Would that be fair use? Shucks, I'd be providing free traffic wouldn't I?
Again, Google News doesn't aggregate entire articles. We all know that, so why waste bandwidth or your own credibility on false analogies?
Getting back to the topic of Rupert Murdoch, I find it ironic that at least one of his own newspapers (the NEW YORK POST) is a client of Daylife, an OEM news aggregator that "collects huge amounts of the web’s best content, deeply analyzes and parses it, and produces oceans of data to be reused in an infinite number of ways by any publisher."
According to Daylife's FAQ, "Daylife is an intelligent content services platform that harvests and deeply analyzes high-quality Web-based news and information sources in real time.....Daylife clients work with our highly flexible, easy-to-use platform to complement their own editorial expertise by instantly creating new pages, sections, and sites with little or no staffing."
Daylife clients include the aforementioned NEW YORK POST, Telegraph.co.uk, National Public Radio, USA TODAY, Guardian.co.uk, NEWSWEEK, Sky News, and other major news organizations.
Bottom line: News aggregation goes far beyond Google News. But then, news aggregation existed long before Google News was invented; Google News and Daylife.com are just high-tech versions of what newspaper "rewrite men" have been doing for generations.
Google is a search engine. Asserting that it isn't is silly. That is like saying NBC isn't a tv station it is an ad server... it is a tv station that is why people view it. Google is a search engine that is why people view it.
These days commercial TV has pretty much become little more than an advertising medium. I must admit, I don't have TV and only watch it now and then on the road, but what do you get, 7 minutes of programming with 3 minutes of commercials - roughly 1/3rd ad content to programming? There is a difference though, NBC pays for their content.
Part of Google's business is search. Most of their business is ads. AdSense, AdWords, etc. Then there are "free" services by which they acquire information about how to better serve ads. G-Mail, Chrome, Analytics, etc., etc.
Google is no more a search engine than is Microsoft. That both offer search hardly points to their raison d'être.
Again, Google News doesn't aggregate entire articles. We all know that, so why waste bandwidth or your own credibility on false analogies?
Nor did I suggest aggregating tourist sites would involve entire articles. a headline and opening paragraph would be sufficient:
Where to Sleep in Venice
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ipsum urna, hendrerit quis lacinia a, pretium sit amet nibh. In sollicitudin blandit felis, sit amet adipiscing lacus mollis sed. Curabitur dictum diam sit amet ligula tempor rutrum. Lorem ipsum dolor sit amet, consectetur adipiscing elit...read more from example.com
View all 1250 Articles from Around the Web
<adsense code here>
Are you OK with something like that? That is really a pretty simple script to put up.
It is quite different from a directory, à la DMOZ/Yahoo, where the title and accompanying text are either written by the editor or provided by the site owner or other submitting user.