diberry - 12:15 am on May 16, 2011 (gmt 0)
Oh, I thought that by "small" and "large" we were talking about traffic or some kind of authority ranking.
Most blogs would have 5k pages or more, just by the nature of how they dynamically generate pages. That would include many scraper sites, since automatically pulling blog feeds and running them through your own blog is a common form of scraping. So if that's what you mean by large sites, then most blogs should have been processed by Panda already.
I'm thinking Panda was designed mainly to make it harder for sites to gain ranking authority, and also to separate overall site authority from the assessment of each individual page for each search query, so that low-quality pages would not go to the top just because they were on high-ranking domains. I can see that effort inadvertently letting spammers and smaller sites float to the top, hence the appearance that the small sites haven't been processed yet (and maybe indeed they haven't).