Forum Moderators: martinibuster
What if I add my own features on top of scraped data? For instance, if some site shows a product catalog and I implement a rating system on my site for that data, will it still be illegal?
I don't know much about this, so I want to learn more about it.
Copyright infringement violates the AdSense T&Cs:
AdSense publishers may not display Google ads on webpages with content protected by copyright law unless they have the necessary legal rights to display that content. Please see our DMCA policy for more information.
How is a site exposing data in the form of RSS feeds different from scraping?
Even RSS feeds by default are *NOT* allowed to be republished unless the author allows it.
They are intended for feed readers only, not other publishers, so check their policy before using them.
What is your definition of "stolen"?
How is a site exposing data in the form of RSS feeds different from scraping?
I don't know much about this, so I want to learn more about it.
Take some time to learn about copyrights, DMCA, and the penalties (besides just AdSense) of using copyrighted material without proper permission.
What is the difference between what the OP is asking and search engines?
They take publishers' content and then display it how THEY deem fit and how THEY deem it to be relevant and advertise to all and sundry.
</devil's advocate & don't shoot me down, one rule for THEM and NOT for competitors?>
I do appreciate that the OP asked about AdSense and they have THEIR rules attempting to negate ANY competition :-)
What is the difference between what the OP is asking and search engines?
The search engines use content with CONSENT (controlled by robots.txt) and send traffic so we can make tons of money.
The OP wants to use content WITHOUT CONSENT by scraping the search engine in order to make his own tons of money.
Huge difference.
[edited by: martinibuster at 3:12 am (utc) on Aug. 13, 2009]
[edit reason] [webmasterworld.com...] [/edit]
The search engines use content with CONSENT (controlled by robots.txt)
Doesn't matter, SE's have permission, he doesn't.
The OP has not suggested he would violate robots.txt, and anyway the robots exclusion standard does not imply consent.
The issue is not necessarily with copyright infringement (if the OP respects fair use - admittedly a big "if"), but the requirement for "substantial, original content".
The standard disclaimer "We ain't no steekin' lawyers, even if you pays us!" applies.
Me, I say: "Scrape at your own peril. Might come back to bite you in the a$$ when you least expect it."
There are several blogs showing RSS feed data from various sources. I don't get it. RSS is public syndication. If someone does not want their feeds shown, why would they expose the data via RSS?
Is there a list of what kinds of data I can publish from other sources on my site? I see various sites (especially blogs) showing data from other feeds and earning from it.
<Devil's Advocate>
What is the difference between what the OP is asking and search engines? They take publishers' content and then display it how THEY deem fit and how THEY deem it to be relevant and advertise to all and sundry.
</devil's advocate & don't shoot me down, one rule for THEM and NOT for competitors?>
<Devil's Advocate>
So I can scrape your content then and use it for myself? No, I will not be linking back to you.
</Devil's Advocate>
So I can scrape your content then and use it for myself? No, I will not be linking back to you.
Nowhere did I say I would not reveal the source of the content on my page. I had better elaborate on my scenario so that you guys have a better idea.
I want to make a sort of "Web 2.0" app that shows the product catalogs of different sites on a single page. The user will enter a term, and I will query different sites and then scrape the results into my own format, which will be displayed on my site. The user will compare the results from the different sources and "rate" a particular product catalog on my site. Something like what was given here:
<link removed, see TOS #13>
Now, I want to show ads relevant to the search results, and I will also record user behaviour, i.e. what kind of stuff the user searches for most, and show ads accordingly (AdSense plus my own ad system). Obviously the user will be shown the main source URL for further navigation after voting on the result. I will also record things like "Top Search of the day", where I will show the main site URLs.
Now, in this scenario, how would implementing AdSense be illegal?
Thanks
The search engines use content with CONSENT (controlled by robots.txt)
There are many webmasters (copyright holders) who don't even know that something like robots.txt exists. No search engine asked them for their consent to their copyrighted content being stored in SE caches and displayed on SE servers/URLs, etc.
You need to ask for permission to use a copyrighted work, not assume it is implicitly granted by the non-existence of a robots.txt file.
(For the snippets, SE's actually rely on the Fair Use doctrine, and for the cache, questionably, on exceptions granted to libraries, archives, and similar entities.)
BTW, I'm not defending scrapers.
There are several blogs showing RSS feed data from various sources.
Yes that's true. There are people who go into banks and leave with money. Some do it legally and some not so legally. Just because some do it illegally doesn't mean it's OK for others to duplicate their practices.
RSS is public syndication.
"Public syndication" doesn't mean it's available to anyone to use any way they choose.
RSS feeds usually have terms associated with them - and often those terms prohibit using the feed contents for commercial purposes. If someone is using RSS in violation of those terms that doesn't make it OK, it just means their time to get caught and shut down hasn't yet arrived.
FarmBoy
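As a rough illustration of the point about feed terms: an RSS 2.0 channel can carry an optional `<copyright>` element, and a script can at least look for it before doing anything with the items. The sketch below is only that, a sketch. The sample feed, its wording, and the helper name `feed_copyright` are made up for illustration, and a missing or vague notice does not mean the content is free to republish; the publisher's own policy page is still the thing to check.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Catalog</title>
    <link>https://example.com/</link>
    <description>Sample feed</description>
    <copyright>Copyright 2009 Example Inc. Personal, non-commercial use only.</copyright>
    <item><title>Widget</title><link>https://example.com/widget</link></item>
  </channel>
</rss>
"""


def feed_copyright(feed_xml: str):
    """Return the channel-level <copyright> text of an RSS 2.0 feed, or None."""
    root = ET.fromstring(feed_xml)  # root element is <rss>
    text = root.findtext("channel/copyright")
    return text.strip() if text else None


if __name__ == "__main__":
    notice = feed_copyright(SAMPLE_FEED)
    if notice is None:
        print("No copyright notice in the feed; check the site's policy page.")
    else:
        print("Feed states:", notice)
```

Even when the element is present it is free-form text, so a human still has to read it and decide whether republishing is allowed.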
You need to ask for permission to use a copyrighted work, not assume it is implicitly granted by the non-existence of a robots.txt file.
Sadly, whether you think it's nonsense or not, the rules got reversed on the web.
I'd love to see a real case based on absence of robots.txt hit the courts and win, I truly would.
the rules got reversed on the web
On the web, if you don't say no you are effectively saying yes. But people only give when they think they are going to be able to take, or when they don't see content harvesting as theft.
In the case of search engines, they generally abide by robots.txt; the ones that don't generally end up in bot traps and get barred from servers. The SEs that follow the "rules" are generally accepted because they have the potential to send us traffic and basically make us money.
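For anyone unsure what "abiding by robots.txt" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names and the example.com URLs are all hypothetical, and the rules are inlined so nothing is fetched. Note that passing this check only means the crawler is not explicitly excluded; as others in this thread argue, it is not the same thing as permission to republish.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GoodBot", "https://example.com/catalog/"),
        ("GoodBot", "https://example.com/private/x"),
        ("BadBot", "https://example.com/catalog/"),
    ]:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

A well-behaved crawler would run a check like this before every request and skip anything disallowed.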
Content theft is very different. Screen scraping is just as bad as going to a website and doing a copy paste. The content is stolen and will almost certainly be copyright of the person who spent time writing it.
If you spend a lot of time writing content, then discover someone has ripped you off and stolen it, you aren't going to hesitate to send a DMCA notice to the offender's web host and every major search engine, and if the theft costs the site owner a penny, the scraper will probably get sued.
My suggestion...
Take the time you would spend learning about content scraping, finding tools, locating sources and putting all this together,
and just write some content :)
Mack.
Obviously the user will be shown the main source URL for further navigation after voting on the result.
Why after voting?
Moot point though, it's still illegal. I don't think you realize that when you scrape sites and paste their words on your page, search engines will penalize you both for duplicate content. It doesn't benefit the other site to have their content stolen, even if you are linking to them. Nobody likes it when others make a profit off of their work.
I've noticed you never mentioned asking the sites in question if you can scrape their content. Then again, I think you already know what they would say to that.
That's utter nonsense. Obviously.
There are many webmasters (copyright holders) who don't even know that something like robots.txt exists. No search engine asked them for their consent to their copyrighted content being stored in SE caches and displayed on SE servers/URLs, etc.
I think most of us generally dislike any kind of "have to opt out" permissions. I do. However, since Google puts serious money in my pocket I'll let the courts decide what is and is not legal in society.
But one comment. You talk about webmasters who don't know about robots.txt. Too bad, so sad. Once, when most webmasters were amateurs you could get away with that excuse.
Now, webmastering is a profession. If you are one, and don't know the basics, then you get what you deserve. Sorry if that sounds harsh, but running a website now requires people to learn, or suffer the consequences. Incompetence and ignorance are no excuse in any occupation.
...and it's good the OP asked. That IS how we all learn, and hopefully become competent, and learn about ethical online behavior, yes?
the rules got reversed on the web
In this respect, there's no difference between the Internet and, say, TV or the press. Copyright still works on the web (Fair Use is an exception, but that's not just on the web).
The fact that a TV channel broadcasts a movie and lets everyone (legally) record it using their home VCR does not mean that the copyright holder waives the copyright.
The same goes for publishing copyrighted content on the web. Uploading content to a website is not an implicit waiver of copyright.
The same goes for publishing copyrighted content on the web. Uploading content to a website is not an implicit waiver of copyright.
Go tell Google and their fancy lawyers all about it!
Like I said, they somehow managed to reverse it on the web, and they have smart lawyers.
Every cache page on every search engine is a clear copyright violation, especially when they cache and display your entire site.
So why aren't they being sued upside down?
So why aren't they being sued upside down?
They have been sued for this, and they won. If you want the answer to the question, I'd suggest you hunt down the court decision. If you object to the decision, then lobby legislators to enact better law. If it's not worth your effort, well, then maybe it's not that important to you. (it's not to me, because the practice lines my pockets)
Precedent exists, so why bother suing?
I want to make a sort of "Web 2.0" app that shows the product catalogs of different sites on a single page. The user will enter a term, and I will query different sites and then scrape the results into my own format, which will be displayed on my site. The user will compare the results from the different sources and "rate" a particular product catalog on my site.
even if you forget all the legal and moral arguments, scraping stuff over and over like that is just asking for trouble.
if your site gets busy and the people you're scraping from notice that you keep coming back for more, they might block you out (which is a pretty trivial thing for them to do). then your entire site goes down the swanney, because with no data you'll have no site.
if you really want to make it a success then you need to ask permission beforehand so you don't end up with a dodo down the line.
Like I said, they somehow managed to reverse it on the web
Anyhow, there have been many flat out wrong decisions made by crazy low-court judges. What really matters in the US is what the Supreme Court says. Has this been judged by the Supreme Court?
Besides, it's still just the US. I'm not aware of any authoritative decision or law saying "rules got reversed on the web" here in Europe. The US is not the world.
Like I wrote, they may have "applied for" the protection that is granted (by law) to archives and libraries.
They are neither an archive nor a library, and even libraries and archives have authorized copies. Plus libraries and archives don't do it for profit, they do it for the public, which this multi-billion money-making company can't say with a straight face.
Thank you guys for helping me learn the ins and outs of AdSense usage.