Forum Library, Charter, Moderators: incrediBILL & jatar k & martinibuster

Google AdSense Forum

This 223 message thread spans 8 pages; this is page 6.
What is a scraper site?
sunzfan

10+ Year Member



 
Msg#: 7116 posted 4:11 pm on Jun 2, 2005 (gmt 0)

Okay - people keep referring to scraper sites and I'm not sure exactly what that is - could someone quickly give me a definition?

It's different than spam pages?

 

deano6410

5+ Year Member



 
Msg#: 7116 posted 11:58 am on Jun 7, 2005 (gmt 0)

hyperkik,

you clearly don't know what you are talking about. They scrape the top 10 sites' links from Google/Yahoo etc., so saying they are not relevant is like saying the top 10 for any keyword phrase on the planet is NOT relevant.

You are 100% wrong.

However, scraper sites are immoral, cheap, ugly and a pain... but they are relevant.

spaceylacie

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 7116 posted 12:55 pm on Jun 7, 2005 (gmt 0)

Okay, so I do a search, now I'm looking at the top 10 for that keyword, how would it help me to click on their site, scroll past their ads, then view the same results? That's relevant?

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 7116 posted 1:05 pm on Jun 7, 2005 (gmt 0)

There are two distinct questions. The question of whether they are useful or not is something that everybody seems largely to agree on.

now I'm looking at the top 10 for that keyword, how would it help me to click on their site...

It wouldn't help you but here's the crunch: You shouldn't be getting their site in the top ten results. That's something Google is going to have to work on. What's pathetic is that they've had lots of time to clean their SERPs of the scraper rubbish but seem to have chosen not to do so.

Why? The obvious answer is "because Google makes money from ads". While that is the typical Google basher answer the truth lies deeper. You think they like scrapers in the top ten? No, they don't. But they can't justify removing a site that provides exactly what they do i.e. snippets of other sites.

Webmasters can do whatever they want with their sites, like you and I do whatever we want with our sites. If we choose to make scrapers of them that's entirely up to us. If Google then crawls that site and figures it's a top ten... blame that on the poor algo. Google also gets tripped up by cloaking, redirections, inflated links, keyword stuffing. Does that make it immoral, stupid, deceptive or unkind of you to use those tricks? No. They are all legal. What about having a desc or keyword meta tag that's too long? It's your site dammit. Do what you want with it. And it's up to Google to work out the good results for its visitors and, more importantly, to detect the cr*p because people will always try to get cr*p in.

Atticus



 
Msg#: 7116 posted 1:36 pm on Jun 7, 2005 (gmt 0)

The toxic mold scraper site that links to my educational resources site with a porn description is so on topic I can hardly stand it.

hyperkik

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 3:51 pm on Jun 7, 2005 (gmt 0)

you clearly don't know what you are talking about. They scrape the top 10 sites' links from Google/Yahoo etc., so saying they are not relevant is like saying the top 10 for any keyword phrase on the planet is NOT relevant.

I didn't say that, did I? So you're spouting even more nonsense than a typical purveyor of scrapers. In any event, even pretending that all scrapers operate in the way you describe (which, simply, is untrue), regurgitating somebody else's SERPs isn't helpful to searchers - if they want Yahoo's results or Google's results, they can go to Yahoo or Google and get them. N'est-ce pas?

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 10:30 pm on Jun 7, 2005 (gmt 0)

Nope. My site has ads, but it doesn't exist for the ads.

If I inherited ten million bucks tomorrow and decided to maintain the site purely as a labor of love, I could pull the ads and the site would still have a reason to exist. That can't be said of scraper sites.

Right, so, we're back to the age-old and well-worn concept of INTENT regarding this issue which cannot be judged. Therefore, saying that Adsense may not go on pages published for the purpose of showing ads is meaningless.

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 10:32 pm on Jun 7, 2005 (gmt 0)

HughMungus, you can see in the Internet Archive how the mentioned travel hub/webguide was operating for some years on the Internet without any AdSense on it. :-)

Yes, but, how do you know that he wasn't publishing all the information with the intent of someday monetizing it? Again, intent is impossible to determine in these cases (except after the fact).

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 10:34 pm on Jun 7, 2005 (gmt 0)

OK, try to think of it this way: If you were to take the ads off a site (whether it's a content site, directory, or search engine), what would be left? Would the site have a reason to exist? Would it have intrinsic value for the user? And if it didn't consist of original content, would it provide what GoogleGuy referred to as the "value add" in a discussion of affiliate sites?

If it had outbound links organized by topic, sure.

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 10:36 pm on Jun 7, 2005 (gmt 0)

But they can't justify removing a site that provides exactly what they do i.e. snippets of other sites.

True.

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 10:40 pm on Jun 7, 2005 (gmt 0)

regurgitating somebody else's SERPs isn't helpful to searchers

Round and round.

Is creating a few pages of "content" a day that is basically the same information repackaged by you helpful to searchers? Is a new website with the same products as some other websites using the same datafeed as some other website helpful to surfers?

hyperkik

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 10:47 pm on Jun 7, 2005 (gmt 0)

We're only going around in circles, in the sense that defenders of scrapers are chasing their own tails. I don't think anybody is being fooled by all the hand-waving.

europeforvisitors



 
Msg#: 7116 posted 12:04 am on Jun 8, 2005 (gmt 0)

HughMungus wrote:

Right, so, we're back to the age-old and well-worn concept of INTENT regarding this issue which cannot be judged. Therefore, saying that Adsense may not go on pages published for the purpose of showing ads is meaningless.

Well, you're the guy who judged my intent by suggesting that my site existed "solely for the purpose of displaying ads." Your current assertion that intent can't be judged seems a bit disingenuous after your earlier post.

In any case, you're wrong when you say that the issue can't be judged, because it most definitely can be judged by Google. If Google decides that a site violates the TOS for any reason (invalid clicks, "made for AdSense," the owner's posting disparaging remarks about Google, or whatever), then the Webmaster is left with no account and no recourse except to gnash his teeth and wail on this forum.

badtigger

5+ Year Member



 
Msg#: 7116 posted 12:32 am on Jun 8, 2005 (gmt 0)

I'm still laughing at how many people actually call em "scrapper sites"

sailorjwd

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 1:05 am on Jun 8, 2005 (gmt 0)

badtigger,

I wish I had an amusement threshold as low as yours. I wouldn't be in such a bad mood all the time since the update :)

europeforvisitors



 
Msg#: 7116 posted 1:18 am on Jun 8, 2005 (gmt 0)

I'm still laughing at how many people actually call em "scrapper sites"

Well, they do shred and recycle search results, and their defenders appear to be a scrappy lot. :-)

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 2:22 am on Jun 8, 2005 (gmt 0)

Well, you're the guy who judged my intent by suggesting that my site existed "solely for the purpose of displaying ads." Your current assertion that intent can't be judged seems a bit disingenuous after your earlier post.

Prove that your website doesn't exist only to serve ads. See my point?

In any case, you're wrong when you say that the issue can't be judged, because it most definitely can be judged by Google. If Google decides that a site violates the TOS for any reason (invalid clicks, "made for AdSense," the owner's posting disparaging remarks about Google, or whatever), then the Webmaster is left with no account and no recourse except to gnash his teeth and wail on this forum.

Right, Google can do whatever it wants. But you can't knock scraper sites by saying that they violate that specific wording in the TOS because that specific wording relies on judging someone's intent. I can't judge yours and Google can't judge theirs so it's dumb to even use the "pages published specifically to show ads" argument. But Google is giving tacit approval to scraper sites.

HughMungus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 2:23 am on Jun 8, 2005 (gmt 0)

defenders of scrapers are chasing their own tails

Yeah, that's why the people who think scraper sites are OK are starting these threads.

totter

10+ Year Member



 
Msg#: 7116 posted 6:25 am on Jun 8, 2005 (gmt 0)

Through all 6 million messages in this thread no one has been able to define a scraper site without using words and phrases like "intent" or stuff like "I know one when I see one". I bet someone could replace the word "scraper" with "porn" and the entire thread would make perfect sense.

I would think that if you use words like "intent" or something like that, it would require some sort of human judge to decide what is and isn't a scraper, and as we all know that isn't Google's way.

Since scraper sites do keep the ad prices for content sites cheap and attractive to potential and new advertisers, I don't think Google will be in any rush to fix this problem any time soon, but if they did decide to fix the problem of scraper sites, my guess is they would do so in one of a couple of ways:

1. hand judge all sites -- very unlikely or
2. (feel free to argue this) Require that all AdSense sites use a linking structure that a spider can follow, and then use an algorithm to determine whether the outgoing links on any given page match what the page is about. If a certain number of links on a page don't go with what the page is about, then the site can be labeled a scraper site.

My reasoning - If someone hand picks their outgoing links I would think the links on that page usually would follow a certain theme. If the links aren't hand picked then the words of the links will follow a certain theme, but the links may or may not follow a certain theme.
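A rough sketch of what suggestion #2 might look like, purely as an illustration. The post does not say how "match" would be measured, so everything here is an assumption: the function names are made up, "match" is taken to mean sharing at least one non-stopword keyword, and the 0.5 threshold is arbitrary.

```python
import re

# Tiny illustrative stopword list (an assumption, not a real SE list).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "with"}


def keywords(text):
    """Return the set of lowercase non-stopword tokens in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def off_topic_ratio(page_text, linked_page_texts, min_shared=1):
    """Fraction of outgoing links whose target page shares fewer than
    min_shared keywords with the linking page."""
    if not linked_page_texts:
        return 0.0
    page_kw = keywords(page_text)
    off_topic = sum(
        1 for target in linked_page_texts
        if len(page_kw & keywords(target)) < min_shared
    )
    return off_topic / len(linked_page_texts)


def looks_like_scraper(page_text, linked_page_texts, threshold=0.5):
    """Flag the page when more than `threshold` of its links are off topic."""
    return off_topic_ratio(page_text, linked_page_texts) > threshold
```

For example, a page about "toxic mold removal guide" that links to one mold page and two unrelated pages would get an off-topic ratio of 2/3 and be flagged, while a page whose hand-picked links all share its theme would score 0. A real system would need something far better than keyword overlap, but the shape of the check is the same.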

badtigger

5+ Year Member



 
Msg#: 7116 posted 7:26 am on Jun 8, 2005 (gmt 0)

totter: those are good ideas, and much better than what we have now. My question is with suggestion #2:
why aren't they doing this already?

The answer can only be (as you implicitly suggested) that scraper sites actually control inflation of advertisers' costs to a substantial degree.

So then, economically, they are good for the Adsense business model; ethically, they are bad for the consumer. (Users annoyed with all the dupe, spammy or robotic content might/will migrate to Yahoo, or another SE which doesn't have this problem.)

mzanzig

10+ Year Member



 
Msg#: 7116 posted 8:26 am on Jun 8, 2005 (gmt 0)

Apparently, the definition of scraper site has two aspects, that are constantly mixed and blurred. Maybe intentionally, maybe unintentionally.

#1 - Technical Aspects

Technically, a scraper site is indeed somewhat similar to a SE. They grab other people's content (or snippets thereof) and display them in a way that the content appears to be relevant to the visitor (be it a SE or a human visitor). They place a bunch of ads around the 'content' and earn money. Maybe the scraper does not care about robots.txt, but then again, in many cases the scraper does not even visit the original source of the content. Sometimes the scraper eats up bandwidth just like a real SE.

#2 - Moral Aspects

Here it's getting darker, because the scrapermasters argue that if the system (i.e. Google) does nothing to stop them (and surely G could if they wanted to), their behaviour must be okay or even welcome. That's why they usually point out that G should take care of the problem (if there is any), not the advertisers, and not the competing publishers. But this is where most white hat webmasters - me included - start to bark. Loudly. And the reasons are clear and obvious:

a) Scrapers are working off content that was put up on a website without any intention to add value, e.g. to make the surfer's quest for information easier. (Big difference with true SEs - they were built with the intent to offer a valuable service: making life easier for users!) - Yes, they both put ads on their page to earn money, but it's still a huge difference whether you want to provide something useful to users or not. Just imagine a magazine with nothing but blind copy and ads. Compare this to the yellow pages.

b) Scrapers did not ask the owners of the content for permission to display the snippets, which makes them basically thieves. (Again, big difference with true SEs. We usually can lock them out by using robots.txt!)
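To make the robots.txt point concrete, here is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the sample rules and the `allowed` helper are all made up for illustration; a scraper that never requests the original site simply never runs this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: lock out one named bot entirely,
# and keep every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/page.html")` is False, while any other bot may fetch that page but not anything under `/private/`. The file is only a request, of course: it works on crawlers that choose to honour it.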

Oh yes, I can hear the scrapermaster chorus already - "but if Google allows all this it must be okay!". No. It's not. If you steal from a shop, it's illegal in most countries, EVEN IF you get away with it. It does not make it any better, if you say with a smiling face "well nobody stopped me on my way out". It's still theft and it's still illegal.

And this makes any discussions with scrapermasters as useless as the sites they run. They believe that it's okay because they get away with it. We whitehats, however, see the moral aspects behind it, and we see what efforts go into our sites to create unique useful content.

No doubt this will be an infinite discussion.

-- M.

bbcarter

5+ Year Member



 
Msg#: 7116 posted 9:28 am on Jun 8, 2005 (gmt 0)

No doubt this will be an infinite discussion.

And a hugely profitable one, too :-/

birdstuff

10+ Year Member



 
Msg#: 7116 posted 12:05 pm on Jun 8, 2005 (gmt 0)

mzanzig:

You articulated in one post what I was apparently unable to get across in this entire thread. Search engines and scraper sites both get their data the same way: they "scrape" content from other sites. That's the technical side of this argument, and it's undeniable.

The moral side is indeed where white-hat webmasters have problems with what I call "Made for Adsense scrapers", and rightly so. Search engines provide a valuable service to the web community at large and webmasters in general. I love Google, Yahoo & MSN for a selfish reason - they help visitors find my sites. But even if my sites didn't exist they still serve the web community at large.

"Made for AdSense" scrapers on the other hand provide no valuable service to the web community at large whatsoever. The only people who benefit from them are the webmasters who build them.

It's beyond me why some people have trouble separating the technical reality of scrapers from the moral reality of scrapers. From a technical standpoint the search engines are scrapers, and acknowledging that fact in no way implies that "Made for AdSense" scrapers are legitimate - they aren't. Why is that so difficult for some to understand?

Here's an analogy: Search engines use robots to crawl web pages and extract snippets and URLs for inclusion in their databases. Most webmasters consider that to be a good thing.

Spammers use robots to crawl web pages and extract email addresses so they can send out spam emails by the millions. Most webmasters consider that to be a bad thing.

Technically, search engine robots and spam harvesting robots do more or less the exact same thing: they extract and store information from web pages. Acknowledging that fact doesn't in any way imply the endorsement or legitimacy of email harvesting robots.

In short, it's possible for a clear thinking person to separate the technical aspects of scraper sites from the moral aspects.

hyperkik

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 12:39 pm on Jun 8, 2005 (gmt 0)

Search engines and scraper sites both get their data the same way: they "scrape" content from other sites. That's the technical side of this argument, and it's undeniable.

There's that lame, dishonest parallel between search engines and scrapers again. Perhaps you think if you repeat the same tired nonsense over and over again, or express abject nonsense with a sense of conviction, somebody will be fooled into believing it?

First, a typical scraper operates in a very different manner than a search engine. That is, the scraper typically produces static pages which are served to users, whereas a search engine produces pages from a database in response to a specific user inquiry.
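As a toy illustration of that difference, and nothing more, the sketch below contrasts answering an arbitrary query at request time with generating a fixed set of pages ahead of time. The documents, URLs and function names are invented for the example, and real search engines and scrapers are obviously far more involved.

```python
from collections import defaultdict

# Invented sample data standing in for crawled pages.
DOCS = {
    "http://example.com/a": "toxic mold removal guide",
    "http://example.com/b": "ansel adams landscape photography",
}


def build_index(docs):
    """Build a word -> set-of-URLs inverted index."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Search-engine style: compute results for whatever the user asks."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def prerender(index, keyword_list):
    """Scraper style: build static result pages for pre-chosen keywords only."""
    return {kw: search(index, kw) for kw in keyword_list}
```

Here `search(index, "mold removal")` answers a query nobody planned for, whereas `prerender(index, ["mold"])` yields a single fixed page for "mold" and has nothing at all for a visitor interested in "photography".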

Beyond that, the question is whether the use by the search engine or scraper is "fair use". See, e.g., Kelly v. Arriba Soft Corp., 336 F.3d 811, 822 (9th Cir. 2003) (discussing whether, under the "fair use" doctrine, a photograph search engine may present thumbnails of images owned by others). Compare EF Cultural Travel BV v. Explorica, Inc., 274 F.3d 577 (1st Cir. 2001) (suggesting that a "scraper" designed to collect specific pricing information from a target website for the purpose of creating a competing price structure was unlawful).

A scraper in the sense under discussion here has a very weak argument for fair use. Under the four elements of fair use, as discussed in Kelly, those factors which weigh in favor of the search engine weigh against the scraper. All four factors must be applied to any infringing use claiming to be "fair use". Here's a preliminary analysis, limited in scope by my available time:

1. Purpose and character of the use.

In Kelly, it was noted that the use of the copyrighted material was incidental, and thus weighed only slightly against fair use. ("Arriba was neither using Kelly's images to directly promote its web site nor trying to profit by selling Kelly's images. Instead, Kelly's images were among thousands of images in Arriba's search engine database.")

The same does not hold true of scraper sites, which use the excerpts gleaned from other sites in order to promote their sites in bona fide search engines. While scrapers don't seek to then profit through the sale of the copyrighted material, they do seek to profit indirectly through their use by diverting the Internet user to an ad or affiliate link instead of to the copyright holder.

The question posed in Arriba of whether or not the infringing use is transformative depends upon the scraper site, the manner in which copyrighted work is reproduced, and the amount reproduced. However, as the scraper seeks to supersede the copyright owner's use by diverting traffic to the scraper site, with the result "that people could use both types of transmissions for the same purpose", and given that the scraper is most certainly not about "improving access to information on the internet" by leading surfers to the original content, the scraper's case for transformative use is also very weak.

2. Nature of the copyrighted work.

The fact that the copyrighted material at issue is already published on the Internet will weigh "slightly in favor" of a fair use argument. (Materials not yet published are given a bit more protection under "fair use doctrine", as publication of excerpts can substantially affect their future market value.)

3. Amount and substantiality of portion used.

The implications of this factor will vary depending upon the nature of the original work, and the purpose of the reproduction. In Arriba, a thumbnail of the entire original work was deemed proper, because a search engine of pictures has little value if its users cannot identify linked content from the thumbnails.

The amount of material varies depending upon the scraper site, but the harder test for scrapers to pass is substantiality. The fact that scrapers attempt to glean out those portions of a page which are of the greatest value, whether in terms of attracting Internet users or generating advertising revenue, as opposed to those passages most conducive to directing Internet traffic to the copyright holder's site, would weigh against them. Consider, e.g., Harper & Row, Publishers, Inc. v. Nation Enterprises 471 U.S. 539 (1985) (Holding that the publisher's use of the most valuable portions of a work weighed against its claim of "fair use".)

4. Effect of the use upon the potential market for or value of the copyrighted work.

As the Kelly decision explains, this "factor requires courts to consider 'not only the extent of market harm caused by the particular actions of the alleged infringer, but also 'whether unrestricted and wide-spread conduct of the sort engaged in by the defendant . . . would result in a substantially adverse impact on the potential market for the original.''"

The Kelly court found that this factor weighed in favor of fair use, because the images search engine would ultimately guide Internet users to the original work, and the infringing use would not substitute for the original. It also noted that the search engine was not in financial competition with the copyright owner, for example, by selling licenses to the original work.

This factor seems to weigh heavily against scraper sites. The scraper seeks to divert traffic from the original copyright holder to the scraper's own site. Widespread use of scraper sites will significantly impair the market for the original in a variety of ways, including making it more difficult for potential users to find the original, and possibly by triggering "duplicate content" penalties in search engines. The scraper is often in competition with the copyright holder, and seeks to divert Internet users to its own advertisers instead of any products or advertisements which might be offered by the copyright holder. Scrapers do not wish their users to find the original copyright holder, and many design their content, omit key information, and set up link structures which make it difficult for surfers to actually get to the original material.


birdstuff

10+ Year Member



 
Msg#: 7116 posted 12:48 pm on Jun 8, 2005 (gmt 0)

Facts aren't lame, they're facts. Your inability to understand reality is beyond me. What part of

Search engines and scraper sites both get their data the same way: they "scrape" content from other sites. That's the technical side of this argument, and it's undeniable.

is so difficult to understand, and exactly which part of it do you disagree with? Your lengthy quote is accurate, but completely irrelevant to my quote above.

birdstuff

10+ Year Member



 
Msg#: 7116 posted 1:00 pm on Jun 8, 2005 (gmt 0)

It's interesting that you have spent this entire thread nitpicking my posts when it's clear that we agree on the main issue: scraper sites are bad. They are morally illegitimate and they are of no benefit to anyone except the webmasters who make them.

Why you choose to be so combative over a minor point is beyond me, especially a point that is simply undeniable (search engines are technically scrapers). I could understand your reactions if we disagreed about scraper sites in general, but we actually see eye to eye on the gist of the topic - most scraper sites are scum.

Craig_F

10+ Year Member



 
Msg#: 7116 posted 1:24 pm on Jun 8, 2005 (gmt 0)

Search engines and scraper sites both get their data the same way: they "scrape" content from other sites. That's the technical side of this argument, and it's undeniable.

Exactly.

Your inability to understand reality is beyond me

Unreal, isn't it? It's shocking really to see people at WebmasterWorld of all places entirely ignoring the facts.

I can only imagine that this is continuing for 1 of 3 reasons:

1) they don't understand the technology at all
2) they are just too upset with scrapers to see reality
3) they just like to argue for argument's sake

spaceylacie

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 7116 posted 1:32 pm on Jun 8, 2005 (gmt 0)

Okay, I give in. I blame the folks who titled them "scrapers" instead of "SERP spammers".

See what happens when you make up/change the meaning of a word?

Atticus



 
Msg#: 7116 posted 1:38 pm on Jun 8, 2005 (gmt 0)

Search engines spider the entire web and try to organize the data.

A scraper just copies links and snippets from search engines.

It's the same difference as a student who researches a subject and prepares a well thought out paper as opposed to another student who copies the paper his brother turned in a few years ago.

Scrapers and search engines do not get their content in the same way.

If you listen to the scraper people, Ansel Adams is a scraper human because he didn't invent trees and mountains, he just copies them. See, Ansel Adams and a Xerox machine are exactly the same thing! See how if I say that over and over again it becomes truer and truer!

spaceylacie

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 7116 posted 1:47 pm on Jun 8, 2005 (gmt 0)

Ansel Adams and a Xerox machine are exactly the same thing!

Ansel Adams and a Xerox machine are exactly the same thing!

Ansel Adams and a Xerox machine are exactly the same thing!

Ansel Adams and a Xerox machine are exactly the same thing!

Ansel Adams and a Xerox machine are exactly the same thing!

Ansel Adams and a Xerox machine are exactly the same thing!

Ansel Adams and a Xerox machine are exactly the same thing!

... yes, it's starting to make sense...

MrSpeed

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 7116 posted 3:07 pm on Jun 8, 2005 (gmt 0)

Would scraper sites disappear if Google weighted pagerank more heavily again? I think they have risen in the serps based on their "on page optimization".

Very few sites would link to a scraper site but would link to a site with a "scraper like" footprint.

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 7116 posted 3:15 pm on Jun 8, 2005 (gmt 0)

the definition of scraper site has two aspects, that are constantly mixed and blurred. Maybe intentionally, maybe unintentionally.

because the scrapermasters argue that if the system (i.e. Google) does nothing to stop them (and surely G could if they wanted to), their behaviour must be okay or even welcome

Disingenuous, my friend. If there is any "mixing and blurring" it's happening by those trying to associate scraper creators with the Google detractors. The assumption that there is a connection is itself what's blinkering you from seeing the truth.

Pause for a second to consider the possibility that I don't have a scraper site. No, really, pause and consider it. What's glaringly obvious is that people like birdstuff and I have expressed clearly our distaste for scrapers but also our explanation for where the problem lies. You choose to ignore the former and offer distractions from the latter. No matter how badly you feel about scrapers your attacking the concept of scraping - or the people who indulge in it - won't make scraping go away. You know who it is who needs to take action to make scrapers extinct but you target all your energies in the fruitless barking at the wrong place - up the "scraper creator" tree. We all seem to agree that scrapers should go, we seem to differ only on the "how". I subscribe to the theory that Google should do something about it, many of you seem to think the responsibility for not creating scrapers should reside with individual webmasters. It is frustrating that at least some people seem to persist in thinking that the latter is even a remotely possible solution.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved