Tedster, what I've discovered is there is one piece of data, already collected in Google Analytics, which has a direct correlation with page quality. I think it really is that simple.
I started comparing our good and bad pages (as defined by this statistic) side by side and it just smacked me between the eyes. I could suddenly see very clearly WHY one page is better than another and I could immediately see what to fix. The more I did this, the more I realised how undefinable quality is, how subtle things can make a huge difference... and that this is how Google are doing it. This is what Panda is all about.
I think this single piece of information, available to everyone, is what Google is using to determine page quality. They then assess the ratio of bad pages to good, how bad the bad ones really are, and where they sit in your site. If there are too many, either throughout your site or in one particular area, they hit your traffic to the bad areas and, to a lesser extent, to any good areas from which users can quickly reach bad ones (I'd guess within one click). The result is that Google visitors are protected from your bad pages.
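Just to make the mechanics I'm imagining concrete, here's a rough sketch in Python of the kind of logic I mean. To be absolutely clear, this is pure speculation on my part, not anything Google has published: the metric, the thresholds and every name in it (quality_signal, BAD_THRESHOLD, BAD_RATIO_LIMIT) are invented for illustration, and I've simplified the "areas" idea down to the whole site.

```python
# Pure speculation -- a toy model of the demotion logic described above.
# quality_signal, BAD_THRESHOLD and BAD_RATIO_LIMIT are all invented
# names and values; Google has published none of this.

BAD_THRESHOLD = 0.3     # hypothetical: a signal below this marks a page "bad"
BAD_RATIO_LIMIT = 0.2   # hypothetical: too many bad pages trips the filter


def demoted_pages(pages, links, quality_signal):
    """Return the set of pages whose Google traffic would be hit.

    pages          -- iterable of page URLs
    links          -- dict mapping a URL to the URLs it links to
    quality_signal -- dict mapping a URL to its per-page statistic (0..1)
    """
    pages = set(pages)
    bad = {url for url in pages if quality_signal[url] < BAD_THRESHOLD}

    # If the site's overall ratio of bad pages is acceptable, nothing happens.
    if not pages or len(bad) / len(pages) <= BAD_RATIO_LIMIT:
        return set()

    # Otherwise hit the bad pages, plus any good page one click away from one.
    near_bad = {url for url in pages
                if any(target in bad for target in links.get(url, ()))}
    return bad | near_bad
```

The point of the sketch is simply that nothing in it requires analysing your content: one number per page, plus the link graph, is enough.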
The beauty is Google don't even need to analyse your site: they just collect this statistic from users via their browser, and it tells them everything they need to know about every page people visit on your site without looking at anything else.
I am now working through our entire site trying to find an example that destroys my theory, but it just works every time. I've also been working through Google's guidelines on Panda and it works for every point. It tells you if there's something about a page people don't like, whether they trust your site, everything. You then just look at the page armed with that knowledge and it's so obvious it makes you laugh. It's genius.
I think when Google sat people down and asked them to compare good and bad pages (not telling them which is which) they had a feeling this statistic told them everything they needed to know about page quality, and when their human research confirmed it they must have been wetting themselves with excitement.
It also explains why Panda is 'run' each month. Anyone looking at their own site's stats for trends would look at data over an extended period; Analytics defaults to one month because that's a good period over which to make a judgement. Google would need a month's worth of stats on your site to do the same.
What I've found really amazing about using this method, though, is it reveals how vast the range of things that cause a bad experience can be, and how each page is an individual case. There are no rules. We're an ecommerce site and we're finding that price, image, product description, a fact about a product, reviews, etc. can all have positive or negative effects on how people react to a page. In some cases where we worked on a page to improve the content, it actually made things worse; that's how hard quality is to judge. Looking at those pages again, knowing they are not so good, it's obvious what we did wrong.
Absolutely everything about Panda now makes sense to me. There may be other factors in the mix, but I have a suspicion it could be all about this one simple statistic because it is all-encompassing. If people react badly to a page or to your site overall, there is a reason - it may be isolated to your site, or it could be because they've already seen that content on another site. So this brings in unique content. It's not essential, but if lots of sites have the same content as you, people could react badly to your site if they saw the other sites first (which is perhaps why getting rid of sites that scrape your content can improve your own quality signals and get you out of Panda).
I have an ecommerce site, and when Panda hit I thought it was the obvious ecommerce duplicate content issues, but now I realise it wasn't, not really. People didn't like a large proportion of our pages, for whatever reason. One reason could be that they saw similar products on another site, but equally it could just be down to price, our description, the way our site looks, anything. What I now realise is we have many, many good pages, and most of the bad ones are actually bad products that don't sell or generate search engine traffic, but people see them and react badly. Removing them won't cost us traffic or sales, but it will bring down the bad signals and maybe even improve conversions. Some we have to keep, so we'll address what we think is causing the bad experience and see how that affects things. This, I believe, will get us out of Panda. Understanding this will then keep us out of Panda and make our site much, much better.
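For anyone wanting to run the same triage on their own catalogue, this is roughly the decision rule we're applying. Again, it's only a hypothetical sketch: page_signal, sells and gets_search_traffic are placeholders for whatever your own analytics and sales data actually give you, and the cut-off is made up.

```python
# Rough sketch of our triage rule -- every name here is a placeholder
# for whatever your own analytics/sales data actually provides.

def triage(page, page_signal, sells, gets_search_traffic,
           bad_threshold=0.3):              # hypothetical cut-off
    """Decide what to do with a page based on the quality statistic."""
    if page_signal(page) >= bad_threshold:
        return "keep"      # users react well enough; leave it alone
    if not sells(page) and not gets_search_traffic(page):
        return "remove"    # dead weight: cutting it costs us nothing
    return "fix"           # we have to keep it, so address the bad experience
```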
If only Google could have told me this so I could have made my site better a year ago. Instead I've been paying out for professional content, adding articles, rewriting product descriptions, rebranding and redesigning the site, link building, you name it. Looking at our stats from before Panda until now, I can see all that made no difference (and in some cases created new 'bad' pages which increased my Panda demotion!).
It just explains everything. Why small sites go under the radar (not enough traffic to produce reliable stats over a month), why bigger sites might recover faster, why duplicate content can harm your site, why linking out to bad sites can harm your site. It's all about the user experience, as Google and many others keep saying. Think about your users. Well, yes Google, I do, but I didn't know how to judge what they like... until now.
It won't take me long to fix this, now I know what I'm looking for, so hopefully I'll be able to report back in a month or two with some good news!