Forum Moderators: incrediBILL & martinibuster

What is a scraper site?

     
4:11 pm on June 2, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 7, 2004
posts:81
votes: 0


Okay - people keep referring to scraper sites and I'm not sure exactly what that is - could someone quickly give me a definition?

It's different than spam pages?

4:14 pm on June 2, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 14, 2003
posts:4281
votes: 25


Google is a scraper site. A scraper site is any site that does not make its own content and uses a bot to crawl the web and publish snippets of other sites. When it is referred to here, though, people mean some guy who bought some scraper software and put up several thousand pages with AdSense on them.
4:43 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 24, 2005
posts:965
votes: 0


"scraper site" is an abbreviation of "screen scraper site"

Screen scraping is a technique where automated tools download a web page and extract (scrape) some of the information on that page in order to place it on another web page.

Sample uses of screen scraping include:

Obtaining stock quotes from another site and displaying the data on your own site.

Grabbing a page from dmoz.org, and reformatting it on your own site to create your own web directory.

Creating a search engine and showing snippets in your SERPs (like Google does)
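To make the download-and-extract step described above concrete, here is a minimal sketch using only Python's standard library. The sample page, the `price` class name and the `scrape_prices` helper are assumptions made up for illustration; a real scraper would fetch the HTML over HTTP first, which is left out here.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collects the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def scrape_prices(html):
    """Hypothetical helper: return all price strings found in the page."""
    parser = PriceScraper()
    parser.feed(html)
    return parser.prices


# Stand-in for a downloaded stock-quote page (invented markup).
SAMPLE_PAGE = """
<html><body>
<h1>Quotes</h1>
<p>ACME: <span class="price">12.34</span></p>
<p>WIDG: <span class="price">56.78</span></p>
</body></html>
"""

print(scrape_prices(SAMPLE_PAGE))
```

The extracted values could then be placed into your own page template, which is the "display it on your own site" half of the technique.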

A scraper site on this forum usually refers to a site something like this...

Someone sets up a web site to show AdSense ads for a particular high-performing keyword.

They then take a look at sites that perform well in the search engines, and extract some of the text from those sites (maybe a paragraph from each site) and display it on a web page.

Note: frequently, rather than grabbing the text from the web site directly, they scrape the content from Yahoo SERP snippets.

They then have a page with a load of related snippets from various sites, highly targeted to a specific (usually high-paying AdSense) keyword, but containing no useful information.

The search engines have a habit of ranking these pages high in the SERPs. A user types in a search term, visits the scraper site and finds that it's useless. Quite often the user will see the AdSense ads, which are likely to contain useful information relevant to the search term they entered, so they click on an ad, and the owner of the scraper site makes some money.

4:49 pm on June 2, 2005 (gmt 0)

Junior Member

joined:May 10, 2005
posts:135
votes: 0


I would like to point out that there is a large gray area as well. Many legitimate sites begin with scraped data and add value by adding to it, editing it and rearranging it.

As was pointed out, Google itself is a scraper site. Scraper abuse is the issue here, not scraping itself.

A.

4:53 pm on June 2, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Apr 27, 2005
posts:333
votes: 0


Let's say I did a bunch of research on a particular topic and am presenting the research I feel is "the best" in a manner similar to Google search results. Even though I have not used a bot, I am not using my own content, but I link directly to the original source. Is that considered bad practice or scrapping?
4:56 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts:2259
votes: 0


It's scraping, not scrapping. Automation is considered a key element in the scraper game.
5:36 pm on June 2, 2005 (gmt 0)

Junior Member

joined:May 10, 2005
posts:135
votes: 0


Even more than automation, UTILITY is key. When Google SERPs become useless, Google falls into the same category as the abusers and, while not an abuser itself, it is a scraper victim.

A broad definition could be utility itself which is similar to the classic definition of spam as "anything the recipient doesn't want."

A.

6:14 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 12, 2003
posts:851
votes: 0


As used in case law to date, a scraper is a site which makes unauthorized use of another site's copyrighted content. By that operating definition Google is not a scraper, as nobody has to be listed in Google, and Google will respect anybody's wish not to be indexed or cached.

Scrapers offer no such respect for the intellectual property of others, or couch their "respect" in terms of "if you want us to delist you, you have to block all spiders (and be delisted from every legitimate search engine along with our garbage sites)."
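For reference, robots.txt does allow shutting out one named bot without blocking every spider, provided the bot actually reads and honors the file. A minimal sketch using Python's standard `urllib.robotparser`; the bot name `ExampleScraperBot`, the URLs and the `allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("ExampleScraperBot", "http://example.com/article.html"))
print(allowed("Googlebot", "http://example.com/article.html"))
```

This only helps against crawlers that choose to comply, which is exactly the respect the post above says scrapers do not offer.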

6:17 pm on June 2, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 31, 2005
posts:144
votes: 0


And please, never, ever refer to them as "scrapper" sites like half the folks do!

:)

Edit: Oops, I just noticed oddsod's entry above. Sorry.

6:23 pm on June 2, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Apr 27, 2005
posts:333
votes: 0


Just a typo, I assure you :P
7:43 pm on June 2, 2005 (gmt 0)

Full Member

10+ Year Member

joined:May 19, 2005
posts:203
votes: 0


MediaSpree wrote:

Let's say I did a bunch of research on a particular topic and am presenting the research I feel is "the best" in a manner similar to Google search results. Even though I have not used a bot, I am not using my own content, but I link directly to the original source. Is that considered bad practice or scrapping?


That's not a scraper, that's a hub.
7:49 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 22, 2004
posts:1082
votes: 0


Does Google disable a publisher if I report the websites? I know many websites that are doing the same to my website.
8:00 pm on June 2, 2005 (gmt 0)

New User

10+ Year Member

joined:May 27, 2005
posts:29
votes: 0


About being a hub...

If most of your info is from other places already on the net, even if correctly edited for typos and with pages inserted for long articles, won't that demolish your ability to get good page ranking?

8:05 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 9, 2004
posts:1152
votes: 12


Does Google disable a publisher if I report the websites? I know many websites that are doing the same to my website.

I have seen some that I reported lose AdSense. It doesn't mean that my act had them removed but over a month after I reported them they no longer had AdSense.

I investigated and found that this publisher had many sites using the same layout for each one. All scraper sites. It may take a while but Google will get to them.

8:29 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member beedeedubbleu is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 3, 2004
posts:6113
votes: 8


Sunzfan, are you from Dumbarton?
9:11 pm on June 2, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Sept 17, 2004
posts:273
votes: 0


I wish I could put a link to one of the scrapers so you could see it yourself :), but what they do is use software which is kind of like an add-on to a link SQL: it searches Google for a keyword, then extracts all the top links related to the keyword and stores them in the database. Now they have a site full of links to other sites. lol, man, I hope Google does something about it.

You guys must hear this funny story I would like to share.

This funny thing happened to me: I emailed Google about a scraper site that is always #1 in my niche (if anyone wants to see which site I am talking about, just PM me and you can see it's a full-blown scraper site). Now this is the funny part: I checked 4 days later, lol, and not only was the scraper site still on top, but my site, which used to always be in 2nd place, was gone completely. I mean it wasn't even in the top 50 :)

Now I don't want to say it happened because I emailed Google, because I want people to email Google about these types of sites until Google does something about it. But it's just a funny story, I think :)

9:27 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2004
posts:1425
votes: 0


The definitive sign of a 'scraper' site is simply that it scrapes (copies) content from other sources.
Scraping is often automated, especially with larger sites where manual copying is too much work. All those phony DMOZ directories are a fine example.
BUT automation is not necessary for the definition. There are manual scrape jobs, tailor made to specific situations.

Scrapers most often do so for adsense or other ad revenue, but even this is not strictly necessary.

You will see silly accusations that Google etc. are scrapers, as if this justifies scraping in general.
Draw your own conclusions why people would do so. -Larry

9:31 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2005
posts:1161
votes: 0


I've seen some terrible scraper sites. One I saw was taking one article, then whatever the keyword searched for, this word/phrase got plugged into the article, replacing a certain noun used throughout the article, like an ad-lib.
9:43 pm on June 2, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 31, 2005
posts:144
votes: 0


spacey,

I have seen those many times. Funny thing is, they are probably rolling in it (the dough). Ugh!

9:43 pm on June 2, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 8, 2004
posts:109
votes: 0


I really don't see how google can ever stop these "horrible scraper sites"....simply saying they are useless is not really looking at what they do....for the most part there are "good" scrapers and "bad" scrapers....there are some people that have no idea what they are doing with these scraper tools and put out whatever the default settings are....on the other hand I've seen a few good examples of what you can build with these sites...what is really bugging me is the blog and ping sites that use rss feeds....try going to blogger and go through 10 sites....you will see at least 2/10 scraper blogs....they are doing this because blog pages are getting amazing rankings....reallllllly fast....(sometimes hours)

I think a great idea for getting these sites would be to get users to rank a site relative to their search term...so the user types in green widgets....and then clicks on a SERP....if it's a site THEY deem is crap then they grade it accordingly...and this somehow ties into the SERPs algorithm....I am certain this type of system is around the corner....as "awesome" as google engineers think their algorithm is working...it may need a human element....dmoz is way tooooooo slow...

9:48 pm on June 2, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 25, 2004
posts:84
votes: 0


I'd love to see an example of a "good" scraper.

Anyone that takes verbatim text off a site without at the very least linking back isn't "good" in my book.

9:52 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2005
posts:1161
votes: 0


I think visitors do vote, indirectly. This is the question I want to ask the Google Guys in New Orleans. I want to know how much of their algo is based on the human element... where people click, how long they stay... I think this should be given greater priority when determining serps.
10:08 pm on June 2, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 8, 2004
posts:109
votes: 0


weela,

There are so many examples....
google, yahoo, cnn, etc....

don't get caught up in the hype of "scraper sites are bad"

Those "bad" ones will disappear with time....just give it time...if they are screwing up your business report them to the SERPs you are competing in....

SpacieLacie,

I thought I read something about Yahoo doing something like I suggested somewhere but it's a very faint memory...

In the end Content is King....if you have a great site full of great information you will get long visits, lots of incoming links, and with SEO lots of traffic...usually "bad" scraper sites can't compete with that....

10:18 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2005
posts:1161
votes: 0


Hey, I suggested to Google, when I first started, that they should allow 3 ad positions on each page and I attached a sample page for them to look at... with 3 ad positions. A month later, they announced that 3 ad units were permitted. Coincidence? Maybe. But.... you never know!

Yes, content is king. Visitors won't stay on a scraper site; if this was figured more into the algos, scraper sites could be history... or at least it would be a step in the right direction.

10:24 pm on June 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2005
posts:1161
votes: 0


P.S. Google, if you are listening, set a max time into that algo too, otherwise forum/member sites will take over the results.
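The idea floated in the posts above, counting how long visitors stay but capping it, could be sketched like this. The function name, the 300-second cap and the linear scale are all assumptions for illustration; nothing here reflects how any search engine actually scores pages.

```python
def engagement_score(dwell_seconds, max_seconds=300.0):
    """Hypothetical signal in [0, 1]: time on page, capped at max_seconds.

    The cap keeps forum/member sites, where visits can last for hours,
    from outscoring everything else.
    """
    capped = min(max(dwell_seconds, 0.0), max_seconds)
    return capped / max_seconds


# A quick bounce, a normal read, and a multi-hour forum session.
for seconds in (5, 120, 7200):
    print(seconds, engagement_score(seconds))
```

With the cap in place, the multi-hour session scores the same as any visit that reaches the maximum, while the five-second bounce typical of a scraper page stays near zero.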
4:52 am on June 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 8, 2004
posts:109
votes: 0


Look what I found on slashdot

Philipp Lenssen writes "Google registered a trademark for the word "TrustRank", as Search Engine Watch reveals. Is this a sign we can expect a follow-up to Google's PageRank? An earlier, possibly related paper on TrustRank is available; it proposes techniques to semi-automatically separate good pages from spam by the use of a small selection of reputable seed pages."
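A rough sketch of what "trust from a small set of seed pages" can look like: trust starts at hand-picked seeds and flows along outgoing links, so pages nobody trusted links to end up with none. This is only an illustration in the spirit of the summary above, not the paper's exact method and certainly not Google's implementation; the link graph, the page names and the damping value are invented.

```python
def trust_rank(links, seeds, damping=0.85, iterations=20):
    """Propagate trust from seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of manually vetted, reputable pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # Each round, part of the trust is re-injected at the seeds...
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        # ...and the rest is split evenly among each page's outgoing links.
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * trust[src] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Invented graph: a vetted directory links to a real site; a scraper
# links out to that site too, but no trusted page links to the scraper.
LINKS = {
    "directory": ["real-site"],
    "real-site": ["directory"],
    "scraper": ["real-site"],
}
scores = trust_rank(LINKS, seeds={"directory"})
print(sorted(scores.items()))
```

The point of the design is that linking out to good pages earns nothing; only being linked to from trusted pages does.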

6:24 am on June 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 15, 2004
posts:612
votes: 0


Qur1uS:

I really don't see how google can ever stop these "horrible scraper sites"....simply saying they are useless is not really looking at what they do....for the most part there are "good" scrapers and "bad" scrapers....

First, I share the opinion that scrapers cannot be good. They are all bad, just making a living off stolen content, messing up the web, cluttering SERPs, annoying web users, and finally making the web a whole lot less important. Just think of it - how cool it would be to enter whatever term into G and see a meaningful, relevant, high-quality site show up as #1. Followed by other equally relevant sites on #2 to #10. Now, that would be fan-tas-tique!

As to how to stop scrapers: It's simple - stop the money flow towards scrapers by tightening the quality guidelines for publishers:

1) Manually check new domains/sites rather than new publishers.
2) Introduce a reporting system for each G user (including advertisers and publishers - their reports should have higher priority).
3) Manually check reported sites: if a certain number of reports has been reached for one publisher, *immediately* check his pages/sites.
4) Whenever there are no higher priorities (see #3) manually check each page/site that has been reported, beginning with those having the highest number of reports.
5) Whenever there are no higher priorities (see #4) manually check each site again, beginning with sites from the highest earning publishers.
6) If there is no useful content up there ('made for AS') ban the respective publisher - forever!
7) On Google Search, penalize *all* sites run by publishers who were thrown out of AS. No matter whether they actually carry AS or not. No matter whether they are 'useful' or not. This could be done by correlating the info to the domain owners on WHOIS.
8) Make all these measures and their consequences very clear to publishers. They should understand that by running scrapers THEY can damage their whole relationship with G.
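The review order in steps 3 to 5 can be sketched as a simple sort. The field names, the sample domains and the threshold of 10 reports are assumptions for illustration only.

```python
def is_urgent(site, urgent_reports=10):
    """Step 3: enough reports to trigger an immediate check (threshold assumed)."""
    return site["reports"] >= urgent_reports


def review_order(sites):
    """Steps 4 and 5: most-reported first, then highest earners."""
    ranked = sorted(sites, key=lambda s: (-s["reports"], -s["earnings"]))
    return [s["domain"] for s in ranked]


# Invented sample data.
SITES = [
    {"domain": "quiet-blog.example", "reports": 0, "earnings": 40.0},
    {"domain": "big-earner.example", "reports": 0, "earnings": 900.0},
    {"domain": "some-reports.example", "reports": 3, "earnings": 10.0},
    {"domain": "many-reports.example", "reports": 25, "earnings": 500.0},
]

print([s["domain"] for s in SITES if is_urgent(s)])
print(review_order(SITES))
```

Sites with any reports come before unreported ones, and among unreported sites the highest earners are checked first.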

The consequences -

1) AS quickly becomes very unattractive (financially) for scrapers who rely on AS.
2) AS quickly becomes very unattractive for webmasters who want to do anything with their web skills in the future. (Again, once out you'll never get back into the SERPs, not even with 'useful' sites.)
3) Google SERPs will automatically clean up once the scrapers/useless sites are gone.

Of course, as mentioned many times before, we have to ask whether this is in the best interest of the AS team (who have high revenue/profit targets, I believe). Removing scrapers will remove a good share of the revenue as well.

I just hope that GG or ASA are listening here as well.

-- Mark

6:28 am on June 3, 2005 (gmt 0)

Preferred Member

joined:July 8, 2002
posts:584
votes: 0


geez Mark, shouldn't they be taken out back and shot just to be sure?
6:36 am on June 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 15, 2004
posts:612
votes: 0


andrea99:

Well, well, the answer is - no, obviously. ;-)

But I am convinced that by putting out higher stakes, G can easily increase content quality almost overnight. Just think - if your future as a webmaster running your own sites (listed on Google SERPs) depends on whether you run a scraper site or not, would you do it? Would you *really* do it?

-- M.

10:07 am on June 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts:2259
votes: 0


As to how to stop scrapers: It's simple - stop the money flow

What if the scrapers evolve and find an alternate way to monetise? How about cloaking to redirect visitors to pr0n?

People getting to scrapers via SERPs is a SERPs issue, not an AdSense one. And if Google is working on a solution it will be a SERPs solution (despite GG's noises about feedback on the "Ads by Google" button).

But, this is all off topic.

This 223 message thread spans 8 pages.