Moderators: incrediBILL & jatar k & martinibuster

Google AdSense Forum

This 352 message thread spans 12 pages: < < 352 ( 1 2 3 4 5 6 7 8 [9] 10 11 12 > >     
Why does Google AdSense sponsor "scraper" spam sites
zeus




msg:1367916
 11:17 pm on Mar 22, 2005 (gmt 0)

I remember when I signed up for Google AdSense I was a little nervous about how professional a site must be to be accepted, but I did not have any trouble.

I hope we agree that a site full of links and Google search results is a pure spam site. If so, WHY does Google AdSense sponsor such sites? There are 10000 sites like that which are sponsored by AdSense. Don't they want good search results any more? Because the more they support those sites, the more of them there will be in the SERPs.

I refuse to believe that it's just because of the money.

 

StupidScript




msg:1368156
 10:30 pm on Mar 29, 2005 (gmt 0)

Copywriting = Writing content.
Copyrights = Ownership of content.

The Contractor




msg:1368157
 10:40 pm on Mar 29, 2005 (gmt 0)

fischermx msg #:238 you are correct, but I didn't catch it in time and the system won't allow me to edit it ;)

aleksl




msg:1368158
 10:45 pm on Mar 29, 2005 (gmt 0)

TheContractor: You state that your sites disappeared from Google...

as well as to Atticus: You don't listen.
1) I am not a scraper, although I am seriously considering becoming one.
2) Can you defend YOUR argument instead of attacking a person? I told you, you already lost the debate by doing this.
3) None of the reasons I gave you why OUR sites are out of Google$ came directly from Google$. This is what we THINK happened to them. Google$ just kicks you out and refuses to comment on why it did so.

to blend27, post#227:
Yes, We've advertised for our largest site recently. 4 other sites have AdSense on them.

---

So besides "Google brings me 1000s of visitors" and "you don't understand squat", I don't see anyone having a legitimate argument besides "trust" on the topic of "how Google is different from {some} scraper sites".

HughMungus




msg:1368159
 10:53 pm on Mar 29, 2005 (gmt 0)

Contractor, I posted an example of how Google could devalue someone's content just by showing snippets. Would you like to explain to me the difference between that and a scraper site displaying snippets that supposedly devalue the original site's content?

StupidScript




msg:1368160
 10:54 pm on Mar 29, 2005 (gmt 0)

Yes, please. Remember, scraper sites have LESS value than Google does (feel free to argue against that, too), so pick a good baseline for your argument.

(Still waiting for your argument for how scraper sites do NOT devalue the target site's content.)

[edited by: StupidScript at 10:57 pm (utc) on Mar. 29, 2005]

aleksl




msg:1368161
 10:57 pm on Mar 29, 2005 (gmt 0)

Reply to StupidScript, post#230:

Very good analysis, indeed. Except for the fact that it's been studied (and I am sure I can find you sources, maybe even Jakob Nielsen wrote about it) that search engines actually change users' behaviour, encouraging more "hunting". So when a regular user comes to your content page, and your content doesn't fit what she thought she was looking for, she is greatly encouraged to go back to the SE and keep hunting, instead of searching YOUR content site.

You also missed one point in your movie analogy. Movie company producing a trailer actually OWNS a trailer and the movie. Google doesn't own anything, it scrapes content just like every other scraper.

Also, given $50 million in startup capital, one could attempt to build a scraper site into a search engine of their own. Too high a barrier to entry, but that doesn't mean it is undoable.

StupidScript




msg:1368162
 11:04 pm on Mar 29, 2005 (gmt 0)

aleksl: Yes .. search engines do indeed modify the standard "meatspace" behaviors AND modify online behaviors when compared to surfing instead of searching. This is yet another element in the "trust" cycle. I would be very interested to see any work done to analyse how visiting a scraper site affects behavior.

You also missed one point in your movie analogy. Movie company producing a trailer actually OWNS a trailer and the movie. Google doesn't own anything, it scrapes content just like every other scraper.

The movie company owns the trailer, and they encourage movie theaters and television stations (or pay them) to show the trailer (title plus brief description) of the movie in order to make people aware of it and get them to the theaters to experience it and reap its value.

I own my intellectual property and I encourage Google and Overture (or pay them) to spider my sites to show a link (title plus brief description) to my site in order to make people aware of it and get them to the site to experience it and reap its value.

Google = theater owner authorized to show my trailer
Scraper = punk with a video camera between his knees

[edited by: StupidScript at 11:10 pm (utc) on Mar. 29, 2005]

aleksl




msg:1368163
 11:07 pm on Mar 29, 2005 (gmt 0)

StupidScript, let's compare Apples to Apples.

There are many search engines besides Google, and most of them give you neither value nor visitors. I have a site that on a good day can receive more visitors from the top 3 country-specific search engines than from Google. So what?

So let's not argue for all scraper sites either, nor bring pr0n into this discussion. Lots of scrapers are bad, but some of them do bring visitors as well as add link popularity, which Google doesn't. So I am correcting myself:

How is a search engine different from a "good" scraper site?

<edit>I did accept your argument as difference number one - "trust". But that is too emotional to fly in court.</edit>

[edited by: aleksl at 11:11 pm (utc) on Mar. 29, 2005]

HughMungus




msg:1368164
 11:08 pm on Mar 29, 2005 (gmt 0)

EF Cultural Travel BV v. Explorica, Inc:

Our basis for this view is not, as some have urged, that there is a "presumption" of open access to Internet information. The CFAA, after all, is primarily a statute imposing limits on access and enhancing control by information providers. Instead, we think that the public website provider can easily spell out explicitly what is forbidden and, consonantly, that nothing justifies putting users at the mercy of a highly imprecise, litigation-spawning standard like "reasonable expectations." If EF wants to ban scrapers, let it say so on the webpage or a link clearly marked as containing restrictions.

[laws.lp.findlaw.com...]

The Kelly case DOES indicate that a representative portion of the work IS allowed as long as it doesn't devalue the original. That points back to my comment about how any search engine's snippets CAN devalue the content from which they are extracted...but none of the anti-scraper crowd has chosen to address it (just as they have ignored caching).

HughMungus




msg:1368165
 11:15 pm on Mar 29, 2005 (gmt 0)

And regarding trust: I thought we were talking about snippets being lifted, not whole websites. If a whole website is being copied by a scraper, the webmaster who owns the copyright can complain to Google and the scraper's host under DMCA. Pretty simple.

fischermx




msg:1368166
 11:16 pm on Mar 29, 2005 (gmt 0)

I have a crazy idea.

- Get a spare domain name and enough bandwidth.
- Set up a website where you submit scraper sites. Each submitted scraper site then gets fully copied, so a copy of all their great listings is on this site. Such a script is not hard to write.
- Add some 302s back to them, just in case.
- Ensure a good PR for the anti-scraper site by linking to it from our content sites.
- Wouldn't that bury them?

What do you think?

StupidScript




msg:1368167
 11:22 pm on Mar 29, 2005 (gmt 0)

Thank you, Mr. Mungus!

Our basis for this view is not, as some have urged, that there is a "presumption" of open access to Internet information.

As I urged in my initial comment ... about 40 pages ago.

The CFAA, after all, is primarily a statute imposing limits on access and enhancing control by information providers.

I.e. Who has the control? It's supposed to be the "information providers," not the "information scrapers."

we think that the public website provider can easily spell out explicitly what is forbidden

and, as a corollary:

If EF wants to ban scrapers, let it say so on the webpage or a link clearly marked as containing restrictions.

Especially sweet. Okay, all you scrapers ... all we ask is that you honor our robots.txt files and other similarly explicit access conditions, and use user-agents that we can recognize and control, like legitimate bots do.

I say it now for the world to read: NO SCRAPERS ALLOWED ON MY SITE ... ALL SITES ... FOREVER.

That ought to do it! ;)

That points back to my comment about how any search engine's snippets CAN devalue the content from which they are extracted

"Devalue" means to diminish the perceived worth. If a visitor arrives at a website directly or via a "trustworthy" source, the perception of the worth of the copyrighted original content is DIFFERENT than the perception of the worth of that content when coming from a scraper site.

Are you saying this just is not true? That the behavior of people who come from a scraper site is exactly the same as from a Google or an Overture or an ESpotting? That there is no discernible distinction between their reactions to the junk they find after clicking on an algorithm result and landing on a scraper site and the original content they find on a "real" site? Are you honestly arguing that there is NO difference?

(just as they have ignored caching)

ibid. That's a specious argument. Do scrapers cache?

aleksl




msg:1368168
 11:23 pm on Mar 29, 2005 (gmt 0)

I apologize for hijacking this thread a bit; hopefully the argument we had was related to the topic.

aleksl




msg:1368169
 11:32 pm on Mar 29, 2005 (gmt 0)

If a visitor arrives at a website directly or via a "trustworthy" source...

Person A walked into your store from the main street, coming from center city, passing Town Hall and all center attractions. Person B came from a side street that goes directly into a bad neighborhood.

* Which one of the two is likely to buy?
* Does a brick-and-mortar store owner care which street her visitors came from?

I don't buy it as a legitimate argument.
Besides, the internet is a world-wide phenomenon. There are many, many countries besides the US where buyers can come from, and fortunately you can't lock it up the way some people would like to.

Atticus




msg:1368170
 11:38 pm on Mar 29, 2005 (gmt 0)

Part of the criteria for determining if copyright infringement has taken place is the possible negative effect of the copy on the commercial value of the original work.

Only the owner of the original work is in a position to determine if use of his content elsewhere either helps or hurts his bottom line.

So, HughMungus, it doesn't matter if a third party thinks that G is hurting the telephone directory you cite in your example. It is up to the owner of that site to complain to G, block G, or to take the matter to federal court. You are in no position to sue G because you contend that they are hurting the commercial interests of a telephone directory you do not own.

aleksl,

I don't recall posting that you owned or operated scraper sites; I do apologize if you feel that I have made such an inference and if you feel that such an association is damaging to your reputation. I do admit that I sometimes become confused between those who admit they are breaking the law and those who merely advocate for, and consider joining in, the breaking of the law.

Is it possible to define as an ad hominem attack, negative opinions expressed about thieves, thievery and comments from those who think that theft is a fine and glorious thing?

After all, "Stop! Thief!" pretty much justifies itself, doesn't it? We've had one scraper publisher admit in this thread that he scrapes G (in violation of its TOS), extracts other publishers' content, and links it in a way that misleads the reader into clicking on paying ads on his own site.

These are clearly examples of copyright infringement. You cannot defend them.

StupidScript




msg:1368171
 11:45 pm on Mar 29, 2005 (gmt 0)

All us "legitimate" site builders would like is the ability to control the harvesting of our content by people who do not operate genuine "search engines".

Do any of you scrapers have a problem with using a distinguishable user-agent so I can block you from scraping my sites?
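For illustration only, here is a minimal Python sketch of what "blocking by user-agent" could look like on the server side. The bot names in `BLOCKED_AGENT_TOKENS` and the helper names `is_blocked` and `status_for` are hypothetical, not taken from any real scraper or server. The check only helps when the bot identifies itself honestly, which is exactly what is being asked for here.

```python
from typing import Optional

# Hypothetical user-agent tokens a site owner has decided to refuse.
BLOCKED_AGENT_TOKENS = ("scrapebot", "contentgrabber")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    if not user_agent:
        # No header at all: this sketch lets the request through.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def status_for(user_agent: Optional[str]) -> int:
    """HTTP status the site would answer with: 403 for blocked bots, else 200."""
    return 403 if is_blocked(user_agent) else 200
```

A bot announcing itself as `ScrapeBot/1.0` would get a 403 here, while an ordinary browser string would get a 200. A bot that fakes a browser string sails straight through, so this is a courtesy mechanism, not a lock.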

I wrote the content. I gave G and O and whomever permission to market my site as part of their search engines.

If I decide I no longer want these business relationships, I can simply modify my control mechanisms to inform them of that decision, and they will honor it by (a) no longer spidering my sites, (b) no longer including my content in their services and, if I do the code, (c) removing me entirely from their indexes and search results.
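As a rough sketch of such a control mechanism, the snippet below uses Python's standard `urllib.robotparser` to show how a well-behaved bot would interpret a robots.txt file. The file contents in `ROBOTS_TXT`, the `scrapebot` name, and the `allowed` helper are made up for the example; only the `RobotFileParser` calls are the real library API.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything,
# every other bot is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask whether a polite bot with this user-agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, `allowed("Googlebot", "http://example.com/page.html")` is true and `allowed("scrapebot", "http://example.com/page.html")` is false. Nothing enforces the answer: the file states the owner's wishes, and it works only for bots that bother to read and honor it.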

As long as scrapers defy my access restrictions and give me no mechanism for indicating my preferences with regard to them taking my content, they are outside of the business relationship, and are not acting under the owner of the intellectual property's authority. They are stealing what I would gladly give them if they could demonstrate their value to my business, as Google and the other "real" engines have done for many years. In the absence of a demonstration of that value, I do not approve of their appropriation, and in fact object to it and demand such stealing of my content without my authorization be stopped.

The problem is, scrapers don't honor things, they defile them. They take my beautiful picture of my wife and put it on their glory-hole. I can hear them laughing all the way to the bank. It's outrageous.

(Apples to apples would be Google to Overture or one scraper site to another scraper site, not Google to scraper sites.)

Atticus




msg:1368172
 11:53 pm on Mar 29, 2005 (gmt 0)

StupidScript,

I like the way you process information and the clarity with which you express your views. I, for one, will take care to read your comments in this and other threads.

aleksl




msg:1368173
 11:55 pm on Mar 29, 2005 (gmt 0)

I guess I need to rest my case here...I don't seem to be getting through, or folks are having difficulty making the step from part to whole.

* Googlebot comes to your site.
* Googlebot scrapes ALL content and ALL images
* Google places a cached version of your site on their server (A COPY OF EACH PAGE OF YOUR SITE). A user can spend all day browsing your site in Google and never actually hit your site once.
* Google places snippets of pictures from your site into Google Images. It then FRAMES your site to encourage the user to stay in Google

Replace Googlebot with "scrapebot" and Google with "scraper site" in this definition, and then I want to see you argue that Google is not a scraper site. PER "SCRAPER SITE" DEFINITION.

Let's define terms here
Search Engine: a website that collects snippets of content from other websites and displays it in search results. FOR PROFIT (AdWords).
Scraper Site: a website that collects snippets of content from other websites and displays it for a given keyword. For Profit (AdSense).

Err?

<edit>Before search engines clearly separated ads from free SERPs, they were a constant target for lawsuits</edit>

Also, and I hope I clarified this before, I do not defend scraper scumbags. My argument was originally that "Why does Google AdSense sponsor 'scraper' sites: because Google is essentially a scraper site".

StupidScript




msg:1368174
 12:02 am on Mar 30, 2005 (gmt 0)

Thanks, Atticus ... likewise.

aleksl: You naughty imp. :)

What this thread has brought out is that there is a great difference of opinion about the differences (if any) between G and "those sites".

If I may, I think this thread has also pointed up that G should take a good look at their policies concerning how it interacts with site operators, and how "those sites" should be integrated with the search results.

After all, if they can penalize someone for cross-linking between two of their own sites, surely they can be more aggressive and transparent on this issue.

Atticus




msg:1368175
 12:10 am on Mar 30, 2005 (gmt 0)

aleksl,

If you don't want G to list or cache your site, use the proper meta tags and robots.txt and they won't.
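To make "the proper meta tags" concrete, here is a small Python sketch of how a crawler might read a page's robots meta tag before indexing or caching it. The `PAGE` markup and the names `RobotsMetaParser` and `robots_directives` are invented for the example; `noindex` and `noarchive` are standard robots meta values, and how any given engine treats them is up to that engine.

```python
from html.parser import HTMLParser

# Hypothetical page whose owner asks not to be indexed or cached.
PAGE = (
    '<html><head><title>My reviews</title>'
    '<meta name="robots" content="noindex, noarchive">'
    '</head><body>Original content here.</body></html>'
)


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

For the sample page, `robots_directives(PAGE)` yields the two directives `noindex` and `noarchive`, which a cooperating engine would take as "do not list this page" and "do not show a cached copy".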

But you don't refer to sites you own, you use the second person when you make repeated reference to what G does to "your site."

Well, as I said before, it is not up to you to defend my site from G. You have no justification for advocating for poor, pitiful me against the all powerful G.

That wily ol' G, sometimes they rip me off so bad that I make five figures a month. Those bastards! And I'm just too damned dumb to do anything about it...

[edited by: Atticus at 12:13 am (utc) on Mar. 30, 2005]

The Contractor




msg:1368176
 12:11 am on Mar 30, 2005 (gmt 0)

Ok, let me try to answer this.

HughMungus wrote:
Contractor, I posted an example of how Google could devalue someone's content just by showing snippets. Would you like to explain to me the difference between that and a scraper site displaying snippets that supposedly devalue the original site's content?

Because I want them to
Because I invite them to
Because I can stop them very easily if I wish and they will drop all pages from my site
Because they send the bulk of traffic to internet websites
Because through their traffic I make $1000's of dollars per month
Because I can control where they go and what they take from a site
Because they provide tools for removal of one/many pages of my site
Because they provide contact info and you don't have to look for it in whois data

Enough reason? How many scrapers provide the above? Quit making fools of yourselves and comparing yourself with Google. If you are a scraper and making money from it – go get it. I do not report sites, but won't hesitate to file DMCA's, contact hosting company, or the NOC/Datacenter for copyright infringement if I cannot contact the offending site or they do not respond within 24 hours.

Again, you guys can argue law all you want. I can have any site that takes content of mine (whether it be a paragraph or a page), provides no contact info on the site, or refuses or fails to respond to a removal request banished from their hosting and from every "real" search engine out there. You do not fall under fair use, provide removal specs, or respond to DMCAs as the SEs do, among probably 100 other items, so quit putting yourself in their league... it's embarrassing. Why do you think Google responds almost immediately when asked to remove content? Because they have to under the law.

Contact your lawyer that specializes in copyright law and DMCA if you doubt me.

Again, face the facts you are no more like Google than I would be if I copied the top 10 posts from each forum on WebmasterWorld into http:/www.search-engine-seo-web-design-help-information-forums.com
Yep, that would make my site just like WebmasterWorld. (Please respond to this if you believe it would be OK to do this in the example I gave with WebmasterWorld.)

Aleksl wrote:
as well as to Atticus: You don't listen.
1) I am not a scraper, although I am seriously considering becoming one.
2) Can you defend YOUR argument instead of attacking a person? I told you, you already lost the debate by doing this.
3) None of the reasons I gave you why OUR sites are out of Google$ came directly from Google$. This is what we THINK happened to them. Google$ just kicks you out and refuses to comment on why it did so.

I think it may be more the way you explain things. Google never has and never will give you a reason for your sites being penalized (if they are). Why would they tell someone how they can circumvent the system? Wouldn't be too smart would it? I am not attacking, telling it like I see it.

[edited by: The_Contractor at 12:18 am (utc) on Mar. 30, 2005]

aleksl




msg:1368177
 12:12 am on Mar 30, 2005 (gmt 0)

:) StupidScript - no war intended. I did, however, stretch my own imagination - and hopefully yours as well - a bit to the left in here.

Here's actually a difference #2, and it came from these lengthy legal posts that I didn't bother to read through.

It is a "Fair Use of Content". A search engine that brings traffic actually increases the value of your business (site). Especially if these are 1000s of users per day. A scraper site is unlikely to do so.

What I was trying to point out though, is that "X many users from search engine Y" is really an unpredictable measure of site value, since in Google's case one's site can be excluded for an unexplained reason at any given moment - just like it happened to our sites.

aleksl




msg:1368178
 12:28 am on Mar 30, 2005 (gmt 0)

...deleted my own blah...

HughMungus




msg:1368179
 2:06 am on Mar 30, 2005 (gmt 0)

SS, my point regarding "devaluing" content by using snippets (in scrapers and scrapeengines, er, search engines) is that a search engine ALSO sometimes devalues the content on a website. Therefore you can't use the argument that "a search engine is different from a scraper because it doesn't devalue my content".

HughMungus




msg:1368180
 2:08 am on Mar 30, 2005 (gmt 0)

"Devalue" means to diminish the perceived worth. If a visitor arrives at a website directly or via a "trustworthy" source, the perception of the worth of the copyrighted original content is DIFFERENT than the perception of the worth of that content when coming from a scraper site.

Are you saying this just is not true? That the behavior of people who come from a scraper site is exactly the same as from a Google or an Overture or an ESpotting? That there is no discernible distinction between their reactions to the junk they find after clicking on an algorithm result and landing on a scraper site and the original content they find on a "real" site? Are you honestly arguing that there is NO difference?

To most surfers who have no idea what a scraper site is, no, there is no difference. And, again, I'm talking about when just parts of the site are lifted, not when the whole website is copied (which I can see would devalue the original website's perceived value).

HughMungus




msg:1368181
 2:14 am on Mar 30, 2005 (gmt 0)

So, HughMungus, it doesn't matter if a third party thinks that G is hurting the telephone directory you cite in your example. It is up to the owner of that site to complain to G, block G, or to take the matter to federal court. You are in no position to sue G because you contend that they are hurting the commercial interests of a telephone directory you do not own.

My point (again) is that you can't use the "search engine good, scraper bad" argument because search engines showing snippets (as scraper sites do) is not always good for some types of sites and that it's hypocritical to complain about scrapers without complaining about search engines. In other words, if you put the information out there, you have to expect that others are going to use it in ways that you don't want (and I don't mean wholesale copying of sites; I mean snippets). If "fair use" applies to search engines, it also applies to scraper sites.

Atticus




msg:1368182
 2:17 am on Mar 30, 2005 (gmt 0)

If a junkie accompanied you on job interviews against your wishes just so he could say nice things about you, would you encourage it?

--Brought to you by The Grossly Exaggerated Analogies Inspired by Arguing About the Difference Between Black and White With a Blind Man Department.

yosemite




msg:1368183
 2:18 am on Mar 30, 2005 (gmt 0)

Scrapers do devalue my site's content when they scrape a long paragraph or more (my entire review for a particular book or movie, for instance). What incentive does the site visitor have to visit my site when they've just read the entire review on the scraper site?

Like I said way back many pages ago, it wouldn't be so bad if the scraper just scraped a sentence or two, like Google does. But they scrape more than that—sometimes taking enough to take the entire gist of what my site was trying to say, thus making it pointless for the visitor to click on my site.

Also, Google's caching sites is in no way related to scraping. Who here prefers to visit the cached site (with broken links, graphics that won't load) over the "real" one? The caching is only there just in case the original page is missing or the site is down. I can conceive of no "normal" web surfer who spends all of his time wandering around a cached Google version of a site when the proper one is up and running.

StupidScript




msg:1368184
 2:21 am on Mar 30, 2005 (gmt 0)

Mr. Mungus, my initial reaction was, "Fair enough."

My subsequent response is:

My argument is/was that genuine search ... er .. scrape-engines* do not violate copyright's "fair use" doctrine, at the very least, not based on whether they devalue the copyrighted original content on a website that they cache and/or keep a title and description of in their public index. And that scraper sites do ... see above.

*To which a website operator/owner subscribes or has a reasonable amount of control over the data gathered and posted by the engine's 'bot ... and can be made aware of.

I just want an honest buck for an honest day's work, and not to be undercut by folks who manipulate the system without creativity or elegance. That's all.

Onward I go to the next eye-poking match! :)

(LOL, Atticus! Brilliant.)

[edited by: StupidScript at 2:24 am (utc) on Mar. 30, 2005]

HughMungus




msg:1368185
 2:24 am on Mar 30, 2005 (gmt 0)

Enough reason?

I didn't ask you why you think it's bad. I asked you what's the difference between Google's snippets devaluing a site and a scraper's snippets devaluing a site.

HughMungus




msg:1368186
 2:30 am on Mar 30, 2005 (gmt 0)

My argument is/was that genuine search ... er .. scrape-engines* do not violate copyright's "fair use" doctrine, at the very least, not based on whether they devalue the copyrighted original content on a website that they cache and/or keep a title and description of in their public index. And that scraper sites do ... see above.

And I gave you a great example of how they do (the phone number lookup). Too bad you don't see it because it kicks the legs out from under your fair use argument.

And again: I'm not a scraper.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved