There are many sites out there that use articles with permission of the author. I've used articles written by professionals, with their permission. There are other articles that are offered as "free to use" if the author is given credit or if a link is returned to the author's site.
How does Google tell the difference between legitimate article usage and scraper sites?
Why would Google care? Why would they want your illegal duplicate content or your legal duplicate content? It doesn't really help a user if the top 10 results are all the same article just hosted on different sites. If you ever use someone else's content, make sure to change it (if possible), add to it, and/or wrap unique content around it.
It's not that Google should recognize these copied articles and put them in the SERPs; it's that an entire site should not be penalized because of them. I do not think it's right to edit another person's article or manipulate it to the point that it doesn't seem like theirs. That doesn't seem legal to me.
There are hundreds of pages on other sites that have taken my articles and republished them in return for a resource box that links back to me. All of the articles are posted on my site, and were published there first.
So I am curious about this scraper site problem. I don't know how they could tell the difference.
I've never had a problem with duplication of content. And my suspicion is that all of those one-way inbound links from my articles on other sites are helping me in the SERPs.
I'm not sure this helps this discussion except to add some relevant though tangential information.
I have been using (on my banned site) the meta description from various links as the description. I don't use the entire meta description if it exceeds 256 characters; I edit long descriptions down. On many pages, but not all, I add my own comment to the link's own description. So no more than 256 characters from any one site appear on any of my pages. Actually, my pages are superior to any Google result page for that particular keyword; that may be the problem.
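For anyone who wants to see what that 256-character rule looks like in practice, here is a rough Python sketch. It is only an illustration: the function names `trim_description` and `build_listing`, the word-boundary cut, the trailing "..." and the "Comment:" label are all my assumptions, not a description of how the site above actually does its editing (which sounds like it is done by hand).

```python
MAX_DESCRIPTION_CHARS = 256  # the limit mentioned in the post above


def trim_description(meta_description: str, limit: int = MAX_DESCRIPTION_CHARS) -> str:
    """Return the description, shortened so it never exceeds `limit` characters.

    Hypothetical helper: cuts at the last word boundary and appends "...".
    """
    # Collapse runs of whitespace so the character count is meaningful.
    text = " ".join(meta_description.split())
    if len(text) <= limit:
        return text
    # Leave room for the three-character ellipsis.
    cut = text[: limit - 3]
    # Prefer cutting at a space so we do not split a word in half.
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + "..."


def build_listing(meta_description: str, comment: str = "") -> str:
    """Combine the trimmed third-party description with an optional own comment."""
    quoted = trim_description(meta_description)
    if comment:
        # The comment is the site's own text and is not counted against the limit.
        return f"{quoted} Comment: {comment}"
    return quoted
```

Calling `build_listing(some_long_description, "Good beginner resource.")` would give at most 256 characters of the quoted description followed by the site's own comment.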
Now I've always felt this is fair use because the sites themselves provide these descriptions for others to use. But no, it seems it is dup content. I have never received a complaint from any website for using their description and I was getting 8,000 visitors/day before the ban.
Now I am STILL getting a lot of return and unreferred visitors in spite of the ban, so I can't eliminate these thousands of descriptions and make the site useless just to please the picky Google engineers.
I am reconstituting the site, but I don't expect to be back in GOOG with any substantial number of pages for a year or more... Fortunately I don't really need all those thousands of extra AdSense dollars (it was nice, though...).