This problem has opened my eyes to the need to improve the navigation and the organization of a site that has been evolving online for 11 years. I feel like my site is better for visitors now and hopefully for Google as well. But it's taken more individual tweaking to get my 950ed pages back and a few are still out.
Wikipedia is a great example of a site that is doing well. Its pages are moving up like crazy, even ones with just a paragraph or so that are marked as stubs. I don't know if Google is giving less weight to sites that address one topic in depth or if it's that Wikipedia just gets up there on other factors. About is another one that gets up there on the strength of being a massive site.
In terms of validated code, that couldn't have been a factor in my case, as I'd lost just a small fraction of a few hundred pages. The coding is the same on all of them.
I think our time is better spent studying the phrase-based patents. Start here:
Detecting spam documents in a phrase based information retrieval system [appft1.uspto.gov]