Here is why I think they can rank higher: Google says that its robots revisit sites more frequently when content is added quickly. If all I had to do was scrape content, I could get tons of content up quickly. And if Google believes a site was the first to post the content, it assumes that site must have created it.
I agree with this. As aggressively as Google might crawl a site, many scrapers are more aggressive. After all, scrapers don't care if they crash your server (Google does) and they don't have to crawl the entire web like Google does. So, when you post a new page of content, there's an extremely high likelihood that a scraper will get your content before Google sees it.
From that point, the question is whether Google crawls the scraper site's page (with your content on it) before they crawl yours. If they do, they may erroneously assume that the scraper site wrote the content and you copied it from them (!).
To get their scraped page crawled before yours, the scraper site just has to be a little more sophisticated than you -- e.g. they submit the scraped page to Google via an RSS feed, an XML sitemap, a tweet, etc.
In contrast, if you are just hoping that Google will deep crawl your site and find your original content before it gets to the scraper site's page, that's not a good bet.
For me, a takeaway from Panda is that I need to get my original content in front of Googlebot as fast as possible in order to make the record clear and stake a claim that it's my content, and doesn't originate from the many scrapers that can quickly grab it.
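For anyone who wants to try the XML sitemap route on their own pages, here is a rough sketch in Python that builds a sitemap entry with a `lastmod` date for a newly published page. The URL, the date, and the `build_sitemap` helper are made up for illustration; the `urlset`/`url`/`loc`/`lastmod` elements and the namespace come from the sitemaps.org protocol. This only produces the file. It does not guarantee that Google crawls your page before a scraper's copy.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) tuples.

    build_sitemap is a hypothetical helper name, not part of any library.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod accepts a W3C date; YYYY-MM-DD is the simplest form.
        ET.SubElement(entry, "lastmod").text = last_modified.strftime("%Y-%m-%d")
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical new page and publication date, for illustration only.
new_page = (
    "https://www.example.com/articles/new-article.html",
    datetime(2011, 5, 1, tzinfo=timezone.utc),
)
sitemap_xml = build_sitemap([new_page])
print(sitemap_xml)
```

The idea would be to regenerate this file the moment a page is published, so the new URL is listed as soon as it exists rather than waiting for a deep crawl to find it.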