AdSense and Screen-Scraping - OK? - Google AdSense - Display Ads forum at WebmasterWorld

Forum Moderators: martinibuster

AdSense and Screen-Scraping - OK?

kadnan

9:14 pm on Aug 12, 2009 (gmt 0)

I want to know what the legal issues are with using AdSense on content that results from "screen scraping" via YQL (Yahoo! Query Language) and typical screen-scraping techniques. Can I show the processed data from different sites in my own layout and apply AdSense? Is it OK?

LifeinAsia

9:19 pm on Aug 12, 2009 (gmt 0)

So, you want to know if there is a legal way to monetize STOLEN content? Or am I misunderstanding something?

If you have permission to use the content (in which case, you should be able to get it without scraping), that's a different matter.

kadnan

9:29 pm on Aug 12, 2009 (gmt 0)

I don't know what your definition of "stolen" is. How is a site exposing data in the form of RSS feeds different from scraping?

What if I add my own features to the scraped data? For instance, if some site shows a product catalog and I implement a rating system on my site for that data, will it still be illegal?

I don't know much about this, so I want to learn more about it.

incrediBILL

9:33 pm on Aug 12, 2009 (gmt 0)

So you want to know how to get a DMCA complaint filed against your site and removed from all search engines, is that what you're asking?

AdSense publishers may not display Google ads on webpages with content protected by copyright law unless they have the necessary legal rights to display that content. Please see our DMCA policy for more information.

How is a site exposing data in the form of RSS feeds different from scraping?

Even RSS feeds by default are *NOT* allowed to be republished unless the author allows it.

They are intended for feed readers only, not other publishers, so check their policy before using them.

StoutFiles

9:36 pm on Aug 12, 2009 (gmt 0)

Can I show the processed data from different sites in my own layout and apply AdSense?

Of course you can!

Is it OK?

It's perfectly ok if you enjoy losing your Adsense account, acquiring DMCA complaints, and possibly getting sued by the victims. Go for it!

LifeinAsia

9:45 pm on Aug 12, 2009 (gmt 0)

what your definition of "stolen" is

Using something without the express permission of the person you "scraped" the data from.

How is a site exposing data in the form of RSS feeds different from scraping?

If you look at the ToS for most RSS feeds, you are allowed to use the data for your own, personal use. Most explicitly disallow re-use or at least disallow commercial use of it.

I don't know much about this, so I want to learn more about it.

Well, it's good that you want to learn about it instead of just going out and doing something illegal (and getting banned for doing it).

Take some time to learn about copyrights, DMCA, and the penalties (besides just AdSense) of using copyrighted material without proper permission.

HuskyPup

12:35 am on Aug 13, 2009 (gmt 0)

<Devil's Advocate>

What is the difference between what the OP is asking and search engines?

They take publishers' content and then display it how THEY deem fit and how THEY deem it to be relevant and advertise to all and sundry.

</devil's advocate & dont shoot me down, one rule for THEM and NOT for competitors?>

I do appreciate that the OP asked about AdSense and they have THEIR rules attempting to negate ANY competition:-)

incrediBILL

12:44 am on Aug 13, 2009 (gmt 0)

What is the difference between what the OP is asking and search engines?

The search engines use content with CONSENT (controlled by robots.txt) and send traffic so we can make tons of money.

The OP wants to use content WITHOUT CONSENT by scraping the search engine in order to make his own tons of money.

Huge difference.

farmboy

1:14 am on Aug 13, 2009 (gmt 0)

What is the difference between what the OP is asking and search engines?

I wonder if the OP plans to link to the original source of the content? Search engines do.

FarmBoy

incrediBILL

1:24 am on Aug 13, 2009 (gmt 0)

Doesn't matter, SE's have permission, he doesn't.

johnnie

2:25 am on Aug 13, 2009 (gmt 0)

What's up with these threads lately? First we have a guy talking about dubious Chinese traffic, now this.

[edited by: martinibuster at 3:12 am (utc) on Aug. 13, 2009]
[edit reason] [webmasterworld.com...] [/edit]

encyclo

2:51 am on Aug 13, 2009 (gmt 0)

The search engines use content with CONSENT (controlled by robots.txt)

Doesn't matter, SE's have permission, he doesn't.

The OP has not suggested he would violate robots.txt, and anyway the robots exclusion standard does not imply consent.

The issue is not necessarily with copyright infringement (if the OP respects fair use - admittedly a big "if"), but the requirement for "substantial, original content".

tangor

3:37 am on Aug 13, 2009 (gmt 0)

Back to OP: Scraping is scraping, i.e., theft, even just an itsy bitsy little bit. Fair Use is something else altogether (visit a courtroom to find out). Long and short: as described, this seems to scream for a DMCA complaint, which will shut down the AdSense account and maybe the website, too. Then again, WW is not a legal forum and AT BEST all one can get is a lay OPINION from NON-LEGAL folks.

The standard disclaimer "We ain't no steekin' lawyers, even if you pays us!" applies.

Me, I say: "Scrape at your own peril. Might come back to bite you in the a$$ when you least expect it."

kadnan

4:38 am on Aug 13, 2009 (gmt 0)

Is it not ironic that crawlers like Google's bot and others are allowed to poke into my site, grab the data, and sell it to others, and I am not? If I don't have a robots.txt, does that mean I am allowing the data to be given away? If that's true, then why am I not allowed to capture data that is not protected by robots.txt?

There are several blogs which show RSS feed data from various sources. I don't get it. RSS is public syndication. If someone does not want to show feeds, then why would he expose data via RSS?

Is there any list of what kind of data I can publish from other sources on my site? I see various sites (esp. blogs) showing data from other feeds and earning from it.

StoutFiles

4:45 am on Aug 13, 2009 (gmt 0)

<Devil's Advocate>
What is the difference between what the OP is asking and search engines?
They take publishers' content and then display it how THEY deem fit and how THEY deem it to be relevant and advertise to all and sundry.
</devil's advocate & dont shoot me down, one rule for THEM and NOT for competitors?>

<Devil's Advocate>

So I can scrape your content then and use it for myself? No, I will not be linking back to you.

</Devil's Advocate>

kadnan

6:21 am on Aug 13, 2009 (gmt 0)

IMO crawling == scraping, but what I think is that:

if (_searchEngine)
    legal = true;
else
    legal = false;

[edited by: kadnan at 6:32 am (utc) on Aug. 13, 2009]

kadnan

6:31 am on Aug 13, 2009 (gmt 0)

So I can scrape your content then and use it for myself? No, I will not be linking back to you.

Nowhere did I say that I will not be revealing the source of the content on my page. I'd better elaborate my scenario so that you guys have a better idea.

I want to make a sort of "Web 2.0" app which will show the product catalogs of different sites on a single page. The user will enter a term, and I will query different sites and then scrape the results into my own format, which will be displayed on my site. The user will compare the results from different sources and will "rate" a particular product catalog on my site. Something like what was given here:

Now, what I want is to show ads relevant to the search results, plus I will be recording user behaviour (what kind of stuff he searches for most) and will show ads accordingly (AdSense + my own ad system). Obviously the user will be shown the main source URL for further navigation after voting on the result. I will also be recording things like "Top Search of the day" etc., where I will be showing the main site URLs.

Now, in this scenario, how would using AdSense be illegal?

Thanks

[edited by: incrediBILL at 8:43 am (utc) on Aug. 13, 2009]
[edit reason] link removed, see TOS #13 [/edit]

true_INFP

11:29 am on Aug 13, 2009 (gmt 0)

The search engines use content with CONSENT (controlled by robots.txt)

There are many webmasters (copyright holders) who don't even know that something like robots.txt exists. No search engine asked them for their consent to their copyrighted content being stored in SE caches and displayed on SE servers/URLs, etc.

You need to ask for permission to use a copyrighted work, not assume it is implicitly granted by the non-existence of a robots.txt file.

(For the snippets, SE's actually rely on the Fair Use doctrine, and for the cache, questionably, on exceptions granted to libraries, archives, and similar entities.)

BTW, I'm not defending scrapers.

[edited by: incrediBILL at 5:11 pm (utc) on Aug. 13, 2009]
[edit reason] tos #4 [/edit]

farmboy

12:30 pm on Aug 13, 2009 (gmt 0)

There are several blogs which show RSS feed data from various sources.

Yes that's true. There are people who go into banks and leave with money. Some do it legally and some not so legally. Just because some do it illegally doesn't mean it's OK for others to duplicate their practices.

RSS is public syndication.

"Public syndication" doesn't mean it's available to anyone to use any way they choose.

RSS feeds usually have terms associated with them - and often those terms prohibit using the feed contents for commercial purposes. If someone is using RSS in violation of those terms that doesn't make it OK, it just means their time to get caught and shut down hasn't yet arrived.

FarmBoy
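
As a side note on where feed terms can live: the RSS 2.0 format has an optional channel-level copyright element, so one place to start looking is the feed itself. The following is only a minimal sketch. The feed text, the company name, and the helper name feed_copyright are made up for illustration, and many real feeds publish their terms on a separate page instead. A missing element says nothing about permission either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, for illustration only.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Catalog</title>
    <link>http://example.com/</link>
    <description>A made-up product feed</description>
    <copyright>Copyright 2009 Example Co. Personal, non-commercial use only.</copyright>
  </channel>
</rss>"""


def feed_copyright(xml_text):
    """Return the channel-level copyright notice, or None if the feed has none."""
    root = ET.fromstring(xml_text)
    node = root.find("./channel/copyright")
    if node is None or not node.text:
        return None
    return node.text.strip()


if __name__ == "__main__":
    # Prints the notice embedded in the sample feed above.
    print(feed_copyright(FEED))
```

Reading the notice is only a first step: the actual terms of use still have to be checked with the publisher before republishing anything.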

incrediBILL

12:59 pm on Aug 13, 2009 (gmt 0)

You need to ask for permission to use a copyrighted work, not assume it is implicitly granted by the non-existence of a robots.txt file.

Sadly, whether you think it's nonsense or not, the rules got reversed on the web.

I'd love to see a real case based on absence of robots.txt hit the courts and win, I truly would.

mack

1:53 pm on Aug 13, 2009 (gmt 0)

incrediBILL has hit the nail right on the head.

the rules got reversed on the web

On the web, if you don't say no, you are effectively saying yes. But people only give when they think they are going to be able to take, or when they don't see content harvesting as theft.

In the case of search engines, they generally abide by robots.txt; the ones that don't generally end up in bot traps and get barred from servers. The SEs that follow the "rules" are generally accepted because they have the potential to send us traffic and basically make us money.

Content theft is very different. Screen scraping is just as bad as going to a website and doing a copy paste. The content is stolen and will almost certainly be copyright of the person who spent time writing it.

If you spend a lot of time writing content, then discover someone has ripped you off and stolen your content, you aren't going to hesitate to send a DMCA notice to the offender's web host and every major search engine, and if it costs the site owner a penny, the offender will probably get sued.

My suggestion...

The time you would spend learning about content scraping, finding tools, locating sources and putting all this together.

Just write some content :)

Mack.
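
For anyone wondering what "abiding by robots.txt" looks like in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt text, the bot names, and the helper name is_allowed are all hypothetical. It only shows what a site's robots policy says to a crawler; it does not answer the copyright or AdSense policy questions argued over in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadScraper
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/catalog/widgets"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", page))   # True: only /private/ is off limits
    print(is_allowed(ROBOTS_TXT, "BadScraper", page))   # False: this bot is barred entirely
    print(is_allowed("", "ExampleBot", page))           # True: an empty file disallows nothing
```

The last call is the case debated above: an empty or missing robots.txt blocks nothing at the protocol level, which is not the same thing as the site owner granting permission to republish.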

StoutFiles

3:12 pm on Aug 13, 2009 (gmt 0)

Obviously the user will be shown the main source URL for further navigation after voting on the result.

Why after voting?

Moot point though, it's still illegal. I don't think you realize that when you scrape sites and paste their words on your page, search engines will penalize you both for duplicate content. It doesn't benefit the other site to have their content stolen, even if you are linking to them. Nobody likes it when others make a profit off of their work.

I've noticed you never mentioned asking the sites in question if you can scrape their content. Then again, I think you already know what they would say to that.

coachm

3:13 pm on Aug 13, 2009 (gmt 0)

That's utter nonsense. Obviously.
There are many webmasters (copyright holders) who don't even know that something like robots.txt exists. No search engine asked them for their consent to their copyrighted content being stored in SE caches and displayed on SE servers/URLs, etc.

I think most of us generally dislike any kind of "have to opt out" permissions. I do. However, since Google puts serious money in my pocket, I'll let the courts decide what is and is not legal in society.

But one comment. You talk about webmasters who don't know about robots.txt. Too bad, so sad. Once, when most webmasters were amateurs, you could get away with that excuse.

Now, webmastering is a profession. If you are one, and don't know the basics, then you get what you deserve. Sorry if that sounds harsh, but running a website now requires people to learn, or suffer the consequences. Incompetence and ignorance are no excuse in any occupation.

...and it's good the OP asked. That IS how we all learn, and hopefully become competent, and learn about ethical online behavior, yes?

[edited by: incrediBILL at 5:05 pm (utc) on Aug. 13, 2009]
[edit reason] formatting [/edit]

true_INFP

4:28 pm on Aug 13, 2009 (gmt 0)

the rules got reversed on the web

In this respect, there's no difference between the Internet and, say, TV or the press. Copyright still works on the web (Fair Use is an exception, but that's not just on the web).

The fact that a TV channel broadcasts a movie and lets everyone (legally) record it using their home VCR does not mean that the copyright holder waives the copyright.

The same goes for publishing copyrighted content on the web. Uploading content to a website is not an implicit waiver of copyright.

[edited by: incrediBILL at 5:12 pm (utc) on Aug. 13, 2009]
[edit reason] TOS #4 [/edit]

incrediBILL

5:08 pm on Aug 13, 2009 (gmt 0)

The same goes for publishing copyrighted content on the web. Uploading content to a website is not an implicit waiver of copyright.

Go tell Google and their fancy lawyers all about it!

Like I said, they somehow managed to reverse it on the web, and they have smart lawyers.

Every cache page on every search engine is a clear copyright violation, especially when they cache and display your entire site.

So why aren't they being sued upside down?

coachm

5:20 pm on Aug 13, 2009 (gmt 0)

So why aren't they being sued upside down?

They have been sued for this, and they won. If you want the answer to the question, I'd suggest you hunt down the court decision. If you object to the decision, then lobby legislators to enact better law. If it's not worth your effort, well, then maybe it's not that important to you. (it's not to me, because the practice lines my pockets)

Precedent exists, so why bother suing?

londrum

5:41 pm on Aug 13, 2009 (gmt 0)

I want to make a sort of "Web 2.0" app which will show the product catalogs of different sites on a single page. The user will enter a term, and I will query different sites and then scrape the results into my own format, which will be displayed on my site. The user will compare the results from different sources and will "rate" a particular product catalog on my site.

even if you forget all the legal and moral arguments, scraping stuff over and over like that is just asking for trouble.
if your site gets busy and the people you're scraping from notice that you keep coming back for more, they might block you out (which is a pretty trivial thing for them to do). then your entire site goes down the swanee, because with no data you'll have no site.
if you really want to make it a success then you need to ask permission beforehand so you don't end up with a dodo down the line.

true_INFP

6:05 pm on Aug 13, 2009 (gmt 0)

Like I said, they somehow managed to reverse it on the web

Like I wrote, they may have "applied for" the protection that is granted (by law) to archives and libraries.

Anyhow, there have been many flat out wrong decisions made by crazy low-court judges. What really matters in the US is what the Supreme Court says. Has this been judged by the Supreme Court?

Besides, it's still just the US. I'm not aware of any authoritative decision or law saying "rules got reversed on the web" here in Europe. The US is not the world.

incrediBILL

8:04 pm on Aug 13, 2009 (gmt 0)

Like I wrote, they may have "applied for" the protection that is granted (by law) to archives and libraries.

They are neither an archive nor a library, and even libraries and archives have authorized copies. Plus libraries and archives don't do it for profit, they do it for the public, which this multi-billion-dollar money-making company can't say with a straight face.

kadnan

9:17 pm on Aug 13, 2009 (gmt 0)

The sites in question were B2B sites like <removed site names>

Thank you guys for helping me learn the ins and outs of AdSense usage.

[edited by: incrediBILL at 1:52 am (utc) on Aug. 14, 2009]
[edit reason] See tos #13 [webmasterworld.com...] [/edit]
