Screen scraping is a technique in which automated tools download a web page and extract (scrape) some of the information on that page, usually in order to place it on another web page.
Sample uses of screen scraping include:
Obtaining stock quotes from another site and displaying the data on your own site.
Grabbing a page from dmoz.org, and reformatting it on your own site to create your own web directory.
Creating a search engine and showing snippets in your SERPs (as Google does).
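To make the definition above concrete, here is a minimal sketch of the "extract" step using only Python's standard library. The page markup, the `quote` class, the `data-symbol` attribute and the ticker names are all made up for illustration; a real scraper would first download the page over HTTP and would have to match whatever markup the target site actually uses.

```python
from html.parser import HTMLParser

# Hypothetical page content, standing in for a downloaded web page.
SAMPLE_PAGE = """
<html><body>
  <h1>Quotes</h1>
  <span class="quote" data-symbol="ABC">12.34</span>
  <span class="quote" data-symbol="XYZ">56.78</span>
  <span class="note">delayed 20 minutes</span>
</body></html>
"""


class QuoteScraper(HTMLParser):
    """Collects the text of every <span class="quote"> element."""

    def __init__(self):
        super().__init__()
        self.quotes = {}      # symbol -> price text
        self._symbol = None   # set only while inside a quote span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "quote":
            self._symbol = attrs.get("data-symbol")

    def handle_data(self, data):
        # Text is kept only when it sits inside a quote span.
        if self._symbol is not None:
            self.quotes[self._symbol] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._symbol = None


def scrape_quotes(html):
    """Return a dict of symbol -> price text found in the given HTML."""
    parser = QuoteScraper()
    parser.feed(html)
    return parser.quotes


if __name__ == "__main__":
    print(scrape_quotes(SAMPLE_PAGE))
```

Everything outside the targeted elements (the heading, the note) is ignored, which is the whole point of scraping: only the piece you want is pulled out of the page.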
A scraper site on this forum usually refers to a site something like this...
Someone sets up a web site to show adsense ads for a particular high performing keyword.
They then take a look at sites that perform well in the search engines, and extract some of the text from those sites (maybe a paragraph from each site) and display it on a web page.
Note: frequently, rather than grabbing the text from the web sites directly, they scrape the content from Yahoo SERP snippets.
They then have a page with a load of related snippets from various sites, which is highly targeted to a specific (usually high-paying adsense) keyword but does not contain any useful information.
The search engines have a habit of ranking these pages high in the SERPs. A user types in a search term, visits the scraper site and finds that it's useless. Quite often the user will see the adsense ads, which are likely to contain useful information relevant to the search term they entered, so they click on an ad, and the owner of the scraper site makes some money.
joined:May 10, 2005
As was pointed out, Google itself is a scraper site. Scraper abuse is the issue here, not scraping itself.
A broad definition could be based on utility itself, similar to the classic definition of spam as "anything the recipient doesn't want."
Scrapers offer no such respect for the intellectual property of others, or couch their "respect" in terms of "if you want us to delist you, you have to block all spiders (and be delisted from every legitimate search engine along with our garbage sites)."
Let's say I did a bunch of research on a particular topic and am presenting the research I feel is "the best" in a manner similar to Google search results. I have not used a bot and I am not using my own content, but I link directly to the original source. Is that considered bad practice or scraping?
Does Google disable that publisher if I report the websites? I know many websites that are doing the same to my website.
I investigated and found that this publisher had many sites using the same layout for each one. All scraper sites. It may take a while but Google will get to them.
You guys must hear this funny story I would like to share.
This funny thing happened to me. I emailed Google about a site which is a scraper site. This site is always #1 in my niche, and if anyone wants to see which site I am talking about, just PM me so you can see it's a full-blown scraper site. Now this is the funny part: I checked 4 days later, lol, and not only was the scraper site still on top, but my site, which used to always be in 2nd place, was gone completely. I mean it wasn't even in the top 50 :)
Now, I don't want to say it happened because I emailed Google, because I want people to email Google about these types of sites until Google does something about it. But it's just a funny story, I think :)
Scrapers most often do so for adsense or other ad revenue, but even this is not strictly necessary.
You will see silly accusations that Google etc. are scrapers, as if this justifies scraping in general.
Draw your own conclusions why people would do so. -Larry
I think a great idea for catching these sites would be to get users to rank a site relative to their search term... so the user types in green widgets... and then clicks on a SERP result... if it's a site THEY deem is crap then they grade it accordingly... and this somehow ties into the SERPs algorithm... I am certain this type of system is around the corner... as "awesome" as Google engineers think their algorithm is, it may need a human element... dmoz is way tooooooo slow...
There are so many examples....
google, yahoo, cnn, etc....
don't get caught up in the hype of "scraper sites are bad"
Those "bad" ones will disappear with time....just give it time...if they are screwing up your business report them to the search engines whose SERPs you are competing in....
I thought I read something about Yahoo doing something like I suggested somewhere but its a very faint memory...
In the end Content is King....if you have a great site full of great information you will get long visits, lots of incoming links, and with SEO lots of traffic...usually "bad" scraper sites can't compete with that....
Yes, content is king, and visitors won't stay on a scraper site. If this was figured more into the algos, scraper sites could be history... or at least it would be a step in the right direction.
Philipp Lenssen writes "Google registered a trademark for the word "TrustRank", as Search Engine Watch reveals. Is this a sign we can expect a follow-up to Google's PageRank? An earlier, possibly related paper on TrustRank is available; it proposes techniques to semi-automatically separate good pages from spam by the use of a small selection of reputable seed pages."
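For anyone wondering what "separating good pages from spam by the use of seed pages" could look like in practice, here is a rough sketch of seed-based trust propagation over a link graph. It is not taken from the paper or from anything Google has published about its own systems; the function name, the damping value, the iteration count and the toy graph are all assumptions made for illustration.

```python
def propagate_trust(links, seeds, damping=0.85, iterations=20):
    """Spread trust from hand-picked seed pages along outgoing links.

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages a human reviewer has marked as reputable.
    Returns a dict of page -> trust score.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Only seed pages receive trust "for free"; everything else must earn it
    # through links that can be traced back to a seed.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)

    for _ in range(iterations):
        new_trust = {p: (1 - damping) * seed_share[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # A page splits its damped trust evenly among the pages it links to.
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


if __name__ == "__main__":
    # Toy graph: two reputable pages linking to each other, and two spam
    # pages that only link to each other.
    toy_links = {
        "seed": ["good"],
        "good": ["seed"],
        "spam": ["spam2"],
        "spam2": ["spam"],
    }
    for page, score in sorted(propagate_trust(toy_links, {"seed"}).items()):
        print(page, round(score, 3))
```

In this toy graph the spam pages never receive any trust because no trusted page links to them. The weak point is obvious: as soon as a trusted site links to a scraper, some trust leaks through.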
I really don't see how google can ever stop these "horrible scraper sites"....simply saying they are useless is not really looking at what they do....for the most part there are "good" scrapers and "bad" scrapers....
First, I share the opinion that scrapers cannot be good. They are all bad, just making a living off stolen content, messing up the web, cluttering SERPs, annoying web users, and finally making the web a whole lot less important. Just think of it - how cool it would be to enter whatever term into G and see a meaningful relevant high-quality site show up as #1. Followed by other equally relevant sites on #2 to #10. Now, that would be fan-tas-tique!
As to how to stop scrapers: It's simple - stop the money flow towards scrapers by tightening the quality guidelines for publishers:
1) Manually check new domains/sites rather than new publishers.
2) Introduce a reporting system for each G user (including advertisers and publishers - their reports should have higher priority).
3) Manually check reported sites: if a certain number of reports has been reached for one publisher, *immediately* check his pages/sites.
4) Whenever there are no higher priorities (see #3) manually check each page/site that has been reported, beginning with those having the highest number of reports.
5) Whenever there are no higher priorities (see #4) manually check each site again, beginning with sites from the highest earning publishers.
6) If there is no useful content up there ('made for AS') ban the respective publisher - forever!
7) On Google Search, penalize *all* sites run by publishers who were thrown out of AS. No matter whether they actually carry AS or not. No matter whether they are 'useful' or not. This could be done by correlating the info to the domain owners on WHOIS.
8) Make all these measures and their consequences very clear to publishers. They should understand that by running scrapers THEY can damage their whole relationship with G.
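The review priorities in steps 3 to 5 could be sketched roughly as follows. This is only an illustration of the ordering proposed above, not anything Google is known to run; the `Site` fields, the threshold of 10 reports and the example domains are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    domain: str
    publisher: str
    reports: int               # number of user reports against this site
    publisher_earnings: float  # earnings of the owning publisher


def review_order(sites, publisher_threshold=10):
    """Return sites in the order a human reviewer should check them."""
    # Total reports per publisher, for the "immediately check" rule (step 3).
    totals = {}
    for site in sites:
        totals[site.publisher] = totals.get(site.publisher, 0) + site.reports

    def priority(site):
        if totals[site.publisher] >= publisher_threshold:
            tier = 0   # step 3: publisher reached the report threshold
        elif site.reports > 0:
            tier = 1   # step 4: reported sites, most reports first
        else:
            tier = 2   # step 5: everything else, highest earners first
        return (tier, -site.reports, -site.publisher_earnings)

    return sorted(sites, key=priority)


if __name__ == "__main__":
    queue = review_order([
        Site("a.example", "p1", 2, 100.0),
        Site("b.example", "p2", 12, 50.0),
        Site("c.example", "p3", 0, 900.0),
        Site("d.example", "p4", 0, 10.0),
        Site("e.example", "p2", 0, 50.0),
    ])
    print([site.domain for site in queue])
```

Note that an unreported site (e.example here) still jumps to the front of the queue when its publisher has crossed the threshold through another site, which matches the idea of checking all of a flagged publisher's pages at once.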
The consequences -
1) AS quickly becomes very unattractive (financially) for scrapers who rely on AS.
2) AS quickly becomes very unattractive for webmasters who want to do anything with their web skills in the future. (Again, once out you'll never get back into the SERPs again, not even with 'useful' sites.)
3) Google SERPs will automatically clean up once the scrapers/useless sites are gone.
Of course, as mentioned many times before, we have to ask whether this is in the best interest of the AS team (who have high revenue/profit targets, I believe). Removing scrapers will remove a good share of the revenue as well.
I just hope that GG or ASA are listening here as well.
joined:July 8, 2002
Well, well, the answer is - no, obviously. ;-)
But I am convinced that by putting out higher stakes, G can easily increase content quality almost over night. Just think - if your future as webmaster running your own sites (listed on Google SERPs) depends on whether you run a scraper site or not, would you do it? Would you *really* do it?
As to how to stop scrapers: It's simple - stop the money flow
What if the scrapers evolve and find an alternate way to monetise? How about cloaking to redirect visitors to pr0n?
People getting to scrapers via SERPs is a SERPs issue, not an Adsense one. And if Google is working on a solution it will be a SERPs solution (despite GG's noises about feedback on the "Ads by Google" button).
But, this is all off topic.