What is a scraper site?

Forum Moderators: martinibuster

sunzfan

4:11 pm on Jun 2, 2005 (gmt 0)

Okay - people keep referring to scraper sites and I'm not sure exactly what that is - could someone quickly give me a definition?

It's different than spam pages?

weela

12:56 pm on Jun 3, 2005 (gmt 0)

Qur1uS

“don't get caught up in the hype of "scraper sites are bad"”

Laughable.

Google is a search engine (among other things), Yahoo is a search engine, and directory
CNN is a news network, and a "website"

You’re misinformed as to what a scraper site is. A scraper site steals content and offers nothing back in return; Google, Yahoo and CNN all do, so your assessment is comical at best.

glengara

1:11 pm on Jun 3, 2005 (gmt 0)

So what are these scrapers doing for links, cross-linking domain farms?

jeffb

1:31 pm on Jun 3, 2005 (gmt 0)

>>So what are these scrapers doing for links, cross-linking domain farms?<<

I get link exchange requests from scraper sites all the time. I wouldn't be surprised if scraper sites now make up the majority of the link exchange requests I receive.

Some of those who scrape long lists of SERPs apparently then spider the sites they link to, harvest an e-mail address and send link exchange requests to as many sites that they link to as they can find e-mail addresses for. They count on getting links from those who are link-crazy enough to accept every link exchange request they receive.

europeforvisitors

2:32 pm on Jun 3, 2005 (gmt 0)

Google is not a "scraper site." That's a self-justifying lie promulgated by scrapers who'd prefer that the issue of scraper sites not be discussed. (It's on a par with the nonsensical claims that "spam" means "sites positioned ahead of mine" and that users benefit from blackhat SEO techniques).

Search engines add value to the Web by crawling pages, indexing the text, and delivering answers to search queries in the form of ranked search results. If you were to remove the ads from a SERP, the page would still have a reason to exist (though it might not be profitable for the search engine).

A scraper site, on the other hand, adds no value of its own; it merely steals or borrows results from another source and uses them as filler for an ad page (where, more often than not, AdSense ads are disguised as search results with the scraped search results being hidden "below the fold"). If you were to remove the ads from a scraper page, the page would have no reason to exist, because it was conceived and designed solely as a platform for ads.

oddsod

3:05 pm on Jun 3, 2005 (gmt 0)

A scraper site, on the other hand, adds no value of its own

Depends on how you use them :) They're pretty useful to see what Google's SERPs were before Bourbon... whatever your favourite search term.

I'd like to see scrapers out of SERPs but let's be fair - there is very little difference between them and Google

1. Google respects robots.txt (well, sort of, they will read and index blocked pages but just not show them in SERPs), scrapers don't
2. Google eats your bandwidth - scrapers don't. They tend to eat SEs' bandwidth. Yes, those snippets they stole from your site were actually taken without visiting your site. They "stole" them from the copy the SE took earlier.
3. Google attempts to provide the user with relevant results. Scrapers don't. And as long as scrapers appear in Google SERPs then it could be argued that Google isn't really providing relevant results.
4. Google will remove you if you ask, scrapers won't
5. Google sends traffic, scrapers send traffic (most of them anyway). Proportionate to their size scrapers may send you more traffic than Google does.
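
To make point 1 concrete, here is a minimal Python sketch of what "respecting robots.txt" looks like from the crawler's side. It uses the standard library's urllib.robotparser; the robots.txt content, the BadScraper agent name, and the example.com URLs are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadScraper
Disallow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every request and skips anything disallowed.
for agent, url in [
    ("Googlebot", "http://example.com/index.html"),
    ("Googlebot", "http://example.com/private/page.html"),
    ("BadScraper", "http://example.com/index.html"),
]:
    print(agent, url, may_fetch(parser, agent, url))
```

A scraper simply never runs a check like this, which is why robots.txt only binds crawlers that choose to honor it.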

So far in the analysis Google is slightly ahead. But then scrapers don't leave 40 year cookies, track your movements with a toolbar, add links on your content to take people away to Amazon....

I'm not trolling you EFV. I'd like to see them gone. But, for lasting effect, they need to go for the right reason and via the right tools (via improving the algo ... not the shortcut of closing Adsense accounts).

security56

3:39 pm on Jun 3, 2005 (gmt 0)

The biggest difference between Google and a scraper site, in my opinion, is that internet users actually go to Google with the intent of finding links that contain what they are looking for. Scraper sites, on the other hand, are forced on people; the internet user had no intent to go from Google to another site that has more links for them to search.

Craig_F

4:14 pm on Jun 3, 2005 (gmt 0)

By that operating definition Google is not a scraper, as nobody has to be listed in Google, and Google will respect anybody's wish not to be indexed or cached.

Oh? My TOS has very specific language about not scraping the content, but Google does daily.

Robots.txt you say? Ok, let me get this straight...I'm supposed to follow the rules of the scraper!?

Just playing devil's advocate a bit, but it makes sense to me. All engines are scrapers, just like the spammy scrapers, only difference between the two is that the engines are generally liked.

andrea99

4:15 pm on Jun 3, 2005 (gmt 0)

If a scraper site were edited to remove all the junk it would be more valuable than the Google page it scraped.

hyperkik

5:41 pm on Jun 3, 2005 (gmt 0)

Craig_F, you say you're playing "devil's advocate", but sometimes it seems that everybody here who plays that role (even if protesting to the contrary) operates scrapers.

hunderdown

5:52 pm on Jun 3, 2005 (gmt 0)

Hypothetical test:

Show 100 average Internet users a choice of using Google or a scraper site to find useful information. The ones who aren't drunk or asleep will choose Google.

So, play semantic games all you want. Find clever ways to demonstrate that Google is a scraper. Go right ahead.

But in the end what matters is that there are real differences, ones which even average users are aware of, on some level.

Qur1uS

6:22 pm on Jun 3, 2005 (gmt 0)

Firstly, I want to make it clear I do NOT operate a scraper site...I spend my time building sites that will be around LONG TERM that add value and add real experiences to my visitors.

Secondly, Scraper sites do not compete with me on SERPs...

As I stated earlier... I can't think of a scraper site that is in the top 10 for any keyword that is worth anything...other than the ones I've stated...btw...those sites I stated do NOT create all the content that you see on their pages....

Thirdly, I feel (my personal OPINION) SOME of those scrapers are a lot more useful than the results you get when searching for a less searched key phrase...

Lastly, instead of making blanket statements and broad-stroke judgements we should try to debate intelligently.
Reading some of the posts I get a sense of great anger...I appreciate that you may have a reason to be angry, but if you would explain why these scraper sites have affected you so negatively it would add to the debate...

Have a wonderful day! ;)

hunderdown

6:38 pm on Jun 3, 2005 (gmt 0)

Qur1uS, for what it's worth, so far as I can tell scrapers have had no effect on my site. As a web user, though, I don't like them. They don't help me find information. I have yet to see one that has any obvious reason for existence other than as a place to put pages and pages full of AdSense ads.

jim_w

6:44 pm on Jun 3, 2005 (gmt 0)

It really kills me. Search engines do NOT disguise themselves as browsers when they crawl a site. Now using common sense, why would a legitimate search engine, directory, et al. need to disguise itself? And they also don’t use broadband or dynamic IPs.
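
As a rough illustration of that point, here is a sketch in Python of how a site owner might sort logged User-Agent strings into self-declared crawlers and everything else. The token list and the helper name declares_crawler are assumptions for illustration; a real setup would need a longer list and, since the header can be faked, some other check as well.

```python
# Substrings that some well-known crawlers include in their User-Agent header.
# This short list is illustrative only, not exhaustive.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")

def declares_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string openly identifies a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

# A crawler that announces itself, and a string that looks like a desktop browser.
honest_bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_like = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(declares_crawler(honest_bot))
print(declares_crawler(browser_like))
```

A scraper that sends the second kind of string cannot be told apart from a real browser by this check alone, which is exactly the disguise being complained about.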

jeffb

7:02 pm on Jun 3, 2005 (gmt 0)

I think people are arguing from totally different definitions of a scraper here.

If you define what constitutes a scraper site in terms of technology used (namely, using a spider to gather information from other sites and repackage it on their own site), then, yes, BY THAT DEFINITION Google and all the other search engines are scraper sites.

If you define a scraper site in terms of the builder's intent (namely, repackaging information gathered from other sites for the purpose of convincing searchers that they will find relevant content there when in fact the site contains nothing but the same ads they saw in the search engine results they came from), then, no, BY THAT DEFINITION Google and all other major search engines are not scraper sites.

One definition is strictly technological, the other is predominantly ethical. And the definition you choose to use is a matter of preference.

Craig_F

7:02 pm on Jun 3, 2005 (gmt 0)

hyperkik, thanks for the compliment! But, I can't code my way out of a paper bag :) I suppose I could build a scraper site by hand but that would kind of defeat the purpose.

play semantic games all you want

hunderdown, all I can really say to that is *you* seem to be playing semantic games if you think search engines are anything but large, well financed, highly polished, scrapers. and the engines are just the beginning, many other major sites scrape too.

the OP's question was about what a scraper site is, and I think it's important to call out that there are a variety of scraper sites, both good and bad. That's where we get to the meat of it -- which is which when the process, output, and ultimate goal is essentially identical?

hunderdown

7:25 pm on Jun 3, 2005 (gmt 0)

Craig_F, you're doing it again.

But I noticed you DIDN'T respond to my assertion that the average user sees a difference, even if you can define it away, as you just did once more.

This is a pointless discussion, really, because the two camps here just don't agree on the value or lack thereof of scraper sites, and so can't even agree on the definition of a scraper site.

oddsod

7:32 pm on Jun 3, 2005 (gmt 0)

jeffb has put it well.

The thing is there seems to be a feeling that anybody associating Google with scrapers has got to be operating a scraper site i.e. nobody would equate Google with a scraper unless they were running a scraper themselves. The logic is flawed.

I'm quite happy to say that I own no scrapers and would be happy to see them out of the SERPs. I'm also satisfied in my mind that Google is a scraper (albeit a welcome one in most cases).

>> Google doesn't pretend to be a browser
Not overtly like the scrapers do. No. It's done covertly ... like via the toolbar ... to collect information that the bot didn't get.

bbcarter

7:40 pm on Jun 3, 2005 (gmt 0)

Scraper pirates are getting more clever.

The latest is that they use randomization and thesauri in combination with scraped content to produce pages that look quite normal. (check out articlebot)

Having seen some of this, it's hard to see how Google's bots could counter it. These scraper guys are one step ahead. And I'm not certain they have consciences.

It won't be by content, but by the quality of backlinks that you're known.

Probably, content quality will always have to be judged by humans.

mzanzig

8:03 pm on Jun 3, 2005 (gmt 0)

Reading some of the posts I get a sense of great anger... I appreciate that you may have a reason to be angry, but if you would explain why these scraper sites have affected you so negatively it would add to the debate...

First of all, I cannot =prove= that these sites have affected me negatively, at least not with figures ("I lost XXX due to scrapers").

But after having seen countless scrapers as a web user and also as an alert webmaster, I came to the following conclusion:

1) Scrapers do =not= provide any useful service (which is the difference from real search engines). They are so utterly ugly/bad/useless that any user who sees one either hits the back button or is tricked into clicking one of the numerous ads. This reveals the real intention of the scrapermasters: Earn money (which in principle is fine, BTW).

2) They do it off stolen content, i.e. without adding any content or service of their own. Software creates thousands of useless pages containing just gibberish, fine-tuned to show up in SERPs, fine-tuned to trick users into clicking the ads.

3) So we know they do it for the money, and they use stolen content. But, you see, advertisers have limited budgets. Every single click for a scraper pulls money out of the advertiser's pocket. Money that otherwise would be spent on ads on sites belonging to the real content owners. So I assume that the community of honest AS publishers is being hurt =collectively= by the community of AS scrapermasters with their zillions of useless pages. And that's why every honest webmaster gets emotional when it comes to scrapers...

We can not prove it, but we know that they hurt us.

-- Mark

bbcarter

8:25 pm on Jun 3, 2005 (gmt 0)

You've got to be kidding.

Of course scraper spam sites are injurious and steal money from us.

They also degrade the quality of the internet, which gets in the way of our customers reaching us, and decreases their trust in the internet as an information source.

Get real.

Let's move on to how to respond to them, please.

spaceylacie

8:29 pm on Jun 3, 2005 (gmt 0)

What is a scraper site?

A site that uses a program or programs to generate Internet pages with no human intervention.

Google uses plenty of man power, so not technically a scraper.

Surely, we can all agree on this much.

Loki99

8:35 pm on Jun 3, 2005 (gmt 0)

Europeforvisitors said:

“A scraper site, on the other hand, adds no value of its own; it merely steals or borrows results from another source and uses them as filler for an ad page”

It's funny you said that, I was just searching for “keyword + location” and your site was number 1.

Did it have the content I was looking for? No. It had links to other people's content with Adsense ads on top.

If I didn't know it was you, I would have thought it was a scraper site.

spaceylacie

8:48 pm on Jun 3, 2005 (gmt 0)

Selecting good resource links, based on your knowledge of the subject, is not the same thing as a scraper site.

Scraper Site = No Human Intervention

asianguy

8:52 pm on Jun 3, 2005 (gmt 0)

Based on all your definitions of scraper sites, all of your sites are scraper sites because you have linked to other sites to provide other information one way or the other.

eCommando

8:53 pm on Jun 3, 2005 (gmt 0)

What about all the sites using rss feeds?
Are they considered scraper sites?

SuperSeo

9:14 pm on Jun 3, 2005 (gmt 0)

Are Google Directory and Google or Yahoo News considered scrapers?

Google Directory uses the same data as DMOZ.
News uses feeds from news sites.

I don't believe those are original content.
Also, I believe it's all automatic.

spaceylacie

9:19 pm on Jun 3, 2005 (gmt 0)

I started to respond, but then realized... I'd just be continuing this talking in circles.

I give up.

AffiliateDreamer

9:19 pm on Jun 3, 2005 (gmt 0)

>If you were to remove the ads from a scraper page, the page would have no reason to exist, because it was conceived and designed solely as a platform for ads.

i don't have any scrapers btw BUT, i think EFV's statement here does not hold much weight. if you were to remove all ads from Yahoo & Google, do you think they would be around much longer? Isn't the goal of all search engines to profit from their scraped listings? i.e. "conceived and designed solely as a platform for ads". Tell me if i am wrong, but no search engine was founded to make the internet a better place now was it? hehe Also take into account that 5/6 surfers don't know when they are clicking on an advertisement or an organic listing.

spaceylacie

9:22 pm on Jun 3, 2005 (gmt 0)

Yeah, everyone's a scraper...

bbcarter

9:24 pm on Jun 3, 2005 (gmt 0)

You guys are too philosophical. ;-)

The definition of scraper sites that is useful if your goal is to make money online is:

"a site that steals content from other sites without adding any utility etc."

the normal rules of plagiarism and fair usage should apply.

the pirate scrapers we're concerned with are the ones who don't want to do any real work, just want to steal content and make money with it.

now, what do you do about them?

This 223 message thread spans 8 pages: 223