Agree completely with you, MadScientist. Nice post.
A website's content can be unique and the site can still resemble a spammy one. If a similarity attribute is, for example, the ratio of ad space to text space on your site, and your ratio is close to that of a spam site, it raises the probability that you are a spam site. There are many "tells" that suggest a site is spammy, and even if your site is unique and good, it may inadvertently show enough of those tells to convince an algorithm that it is spammy. That has nothing to do with whether your content is unique, or whether Google crawled your unique content before it crawled the same content on a scraper site. It's similar to a Bayesian spam filter: your email can be unique, but it can still get flagged as spam based on its characteristics.
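To make the Bayesian-filter analogy concrete, here is a minimal naive Bayes sketch in Python. The tell names and the probabilities attached to them are invented purely for illustration; this is not how Google, or any particular spam filter, actually scores sites. Note that "is the content unique?" is not even an input: the score depends only on which tells are present.

```python
import math

# Hypothetical tells with made-up likelihoods:
# (P(tell present | spam), P(tell present | not spam)).
TELLS = {
    "high_ad_to_text_ratio": (0.80, 0.15),
    "keyword_stuffed_titles": (0.60, 0.10),
    "thin_outbound_links": (0.50, 0.30),
}

def spam_probability(observed, prior_spam=0.5):
    """Naive Bayes posterior P(spam | observed tells).

    `observed` maps a tell name to True/False. Tells are assumed to be
    conditionally independent given the class (the "naive" assumption).
    """
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for name, present in observed.items():
        p_spam, p_ham = TELLS[name]
        if present:
            log_spam += math.log(p_spam)
            log_ham += math.log(p_ham)
        else:
            log_spam += math.log(1.0 - p_spam)
            log_ham += math.log(1.0 - p_ham)
    # Turn the two log-likelihoods back into a probability of spam.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# A site with original content but a heavy ad-to-text ratio.
site = {
    "high_ad_to_text_ratio": True,
    "keyword_stuffed_titles": False,
    "thin_outbound_links": False,
}
print(round(spam_probability(site), 3))
```

With these invented numbers, that single tell moves the site from an even prior to roughly 0.63, and adding a second tell (keyword-stuffed titles) pushes it above 0.95, even though nothing about the content itself was examined.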