@dickbaker 10-4. I wasn't sure how you'd respond since I didn't know your numbers or situation, but it sounds like you and I may be in the "same group."
So, you decided 300 pages were thin content and dumped them. Are there more to go? You said you noindexed some, and you are down to 2400. Anyway, 300 out of 2700 is already just over 10%. That's a lot. (Not being mean ;), I had more.) And before that, you already had 500 pages that Google said "no thanks" to.
On the dupe content, it can be tricky. I have some awesome pages that truly beat the crap out of everyone, and I used to be #1/#2/#3. Now, few of them are above #15. That's not quite true, I still have some #1s, but those are bottom-of-the-barrel 3-word phrases. So I am able to say "some of my pages are better than anything else, and I lost ranking!", which tells me this is more than a page-by-page thing. Over the years, I've had numerous issues with dupe content, and it has been the same every time: the whole site suffers, I deal with the issues (usually by dumping tons of pages that shouldn't have gotten into the index in the first place), and in a couple of weeks, ranking is back. It usually takes 1-3 weeks, with 10 days being the average to recovery. It's a repeated experience, I don't know how many times, over the past 6 years.
Over the past few years, more and more people out there have been gaming the system, scraping content to drop onto MLA sites. Over time, I have been scraped A LOT, more than I realized. That's hard to discover when you have thousands of pages like we do and everything seems fine otherwise (when all is well, you don't spend much time looking for problems).
What I see now is massive dupe content. I've been massively scraped. And, I must confess, I've done my fair share of scraping, back in the early 2000s when scraping didn't even have a name; it was simply a reasonable way to get manufacturers' info, etc. onto your site. As I describe this, I always envision a scale, with my original content on one side and my off-site/on-site dupe content on the other. It gets to a point where the dupe content tips the scale. It's not about your awesome unique pages. It's about your dupe content. That's my operating premise right now. And I compare this to some of my other sites, where this is not an issue, and they are dominating the SERPs right now.
I understand your issue about 6" long is 6" long. We have the same issue. Not too many ways to throw a thesaurus at that and rewrite it. Some thin content, and some dupe content you will simply have to live with. But I am willing - to - bet - anything, that it has to do with the ratio of dupe:original content.
Honestly, I don't think working on your home page will do it. I think you need to look under the rocks on your site, those #*$!ty back-room pages you've almost forgotten about. I bet your home page is/was fine. (In fact, screwing around with it too much may send the wrong signal. Dunno, that's just a thought, don't put much weight on this remark, but I can imagine it doing harm, and I can't imagine it doing much good. I bet your home page was fine before and after. I may be wrong, but that would be my first bet.)
So, I'm back to your dupe content, and this is where I get excited on my own behalf, because I wonder how much we share the same situation... We already know you've had some really thin content (100 pages of thin content are pretty much the same as 100 pages of dupe content, as they all look the same to a bot... ;) ). In fact, your numbers are probably close to 25% thin content (read: dupe content), right? 800/3200. For me, thin content = dupe content.
Soooo... The rest of the pages, the 2400 that remain. Take a small cross-section, as representative as possible: popular pages, pages you forgot about, etc. Maybe take 10-20. Count the number of sentences on each page, then search each sentence on Google to see if it is a dupe. And do you come up at the top of the list or at the end of the list for each sentence you search? (Top, good; bottom, bad.) Anyway, I am curious to know what kind of numbers you end up with. Of 20 representative pages, how many had no dupe content, how many had some, and how many were all dupe? And if you took a simple one-dimensional stat: Page 1, 10 sentences, 6 dupe, 4 original -> Page 1 original score 40%, etc., and averaged the 20 pages, where would you sit? I know I am boiling this down to one variable, but I bet the result may be startling.
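If it helps to keep the tally straight, here's a minimal sketch of that scoring arithmetic in Python. It only does the math; it assumes you've already checked each sentence on Google by hand, and the page names and counts below are made up:

# Per-page originality = original sentences / total sentences,
# then a simple average across the sample.
pages = {
    # "page": (total_sentences, duplicated_sentences) -- hypothetical numbers
    "page-1": (10, 6),
    "page-2": (14, 2),
    "page-3": (8, 8),
}

scores = {}
for name, (total, dupes) in pages.items():
    original = total - dupes
    scores[name] = 100.0 * original / total  # percent original

for name, score in scores.items():
    print(f"{name}: {score:.0f}% original")

print(f"Sample average: {sum(scores.values()) / len(scores):.0f}% original")

Run that over your 20-page sample and you get one number per page plus a sample average, which is exactly the "where would you sit?" figure I'm curious about.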
For me, this is the simplistic analysis I have been able to use so far to easily separate sites into two groups, affected and not affected. That's notwithstanding all sorts of other issues / variables / whitelisting etc. for which I can quickly give an arm-waving, case-by-case explanation of why they are extraneous. But when I compare sites that seem to be all at the "same level", the above hypothesis seems to hold.
And, when I am pragmatic about it, I can see why Google wants dupe content out ($), and why they may even "penalize" a site with a wake-up shot across the bow, to get your attention and help them stamp out dupe content. And if a site doesn't get the message and slowly withers away and dies, well, sorry, but good riddance, we're cleanin' up the SERPs. ; )