Rowtc, my site has a lot of time series fluctuations, and a good chunk of its traffic is driven by external events, so it is very hard to analyze with basic period comparisons. I'm looking for more sophisticated techniques. I suspect that Segment Analysis could do the trick, but I am still gaining experience with it, and it's all hunt and peck for me right now.
Ideally, I'd like a button that said "your traffic is down specifically because of a 75% dropoff in Canadian mobile users who are arriving via Google search on your team pages." I know that's not going to happen automatically, but I bet there are techniques to tease that information out of Analytics.
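As a starting point for that kind of drill-down, here is a minimal sketch in Python. It assumes you have exported sessions per segment for two comparable periods; the field names (`country`, `device`, `source`, `sessions`), the `rank_segment_changes` helper, and the sample numbers are all hypothetical and not part of Analytics itself. It simply ranks segment combinations by how many sessions they lost between the two periods.

```python
from collections import defaultdict


def rank_segment_changes(before, after, dims, metric="sessions"):
    """Rank segment combinations by change in `metric`, biggest loss first."""

    def totals(rows):
        # Sum the metric for every distinct combination of dimension values.
        out = defaultdict(int)
        for row in rows:
            out[tuple(row[d] for d in dims)] += row[metric]
        return dict(out)

    b, a = totals(before), totals(after)
    ranked = []
    for key in set(b) | set(a):
        old, new = b.get(key, 0), a.get(key, 0)
        # Relative change is undefined for segments with no traffic before.
        pct = (new - old) / old if old else None
        ranked.append({"segment": key, "before": old, "after": new,
                       "change": new - old, "pct": pct})
    ranked.sort(key=lambda r: r["change"])
    return ranked


# Made-up numbers, for illustration only.
dims = ("country", "device", "source")
before = [
    {"country": "Canada", "device": "mobile", "source": "google", "sessions": 400},
    {"country": "Canada", "device": "desktop", "source": "google", "sessions": 300},
    {"country": "US", "device": "mobile", "source": "google", "sessions": 500},
]
after = [
    {"country": "Canada", "device": "mobile", "source": "google", "sessions": 100},
    {"country": "Canada", "device": "desktop", "source": "google", "sessions": 290},
    {"country": "US", "device": "mobile", "source": "google", "sessions": 510},
]

for row in rank_segment_changes(before, after, dims)[:3]:
    print(row)
```

Sorting by absolute change rather than percentage keeps tiny segments with wild percentage swings from crowding out the segments that actually moved the total. Adding a landing-page dimension (for example "team pages") would only mean adding another key to `dims`.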
I don't know why my site was penalized but I have zero doubt that the penalty hit on April 24 and the penalty was lifted on Oct 12. The search impression graphs on WMT are absolutely clear on that. The dates suggest Penguin but other things suggest Panda.
I did the following things:
Introduced better site navigation with breadcrumb links on most pages.
Consolidated some pages to create fatter pages.
Did original research and added textual information to various league and team pages, roughly 24 to 36 pages that now have the most complete information on their topic of any page on the internet (I focused on more obscure teams and leagues since trying to out-rank Wikipedia in a mainstream topic is futile).
Changed page titles to be more informative and changed H1 elements to be more descriptive and to remove internal links (that was part of the old site navigation).
Removed significant advertising so that there are now no more than 2 ads per page, in many cases just one and sometimes zero (if the content is thinner).
Put "noindex" on some pages such as search results or stub pages that had little or no content.
Solved a bunch of duplicate title/description problems, such as when two players or teams have the same name.
Added more text to player search results with a “suggested [internal] links” section.
Put in removal requests for thousands of infrastructure-type pages that shouldn’t have been indexed.
Fixed a situation where a calendar-type link allowed Google to infinitely crawl non-existent pages which still returned a 200 code (like requesting a season for the year 1492).
Fixed a site architecture bug whereby valid content was returned even though the site gave a 404 code.
Included a canonical tag on pages so that a slightly different query string would not make it appear as though there were multiple pages.
Put rel=nofollow on the 10-12 links on my links page even though the links were organic, and not the result of paid placement or link exchanges.
Added noindex, nofollow meta tags to VBulletin section of site except for the index and post pages, blocking duplicate content such as archive pages or printable pages.
Upgraded VBulletin to latest version so that all embedded links from users are rel=nofollow.
Removed hidden spam that users had put on VBulletin (in user private messages, an area I didn’t even know existed). Also put in moderation of new users to keep spammers out.
Completely blocked a development domain which had some (but not many) pages indexed in Google - duplicating pages on my main server. Removed the development domain from Google via WMT.
Asked some forum sites that link to my site blogroll-style to remove the links. Those links had resulted in link profiles such as 25,000 links to a single page on my site. Not all of them have removed the links, though.
As always, added plenty more content, focusing on biographical information on hundreds of players so that the information about them is more than just a name (it now includes birthdate, birthplace, height, weight, and position). Site is constantly being updated with new information, for example, the stats from teams from about 30 different leagues for the current season, updated daily (something Wikipedia doesn’t have – very current data).
Added rich snippets to the player page, so that each player is marked up as a “Person” from schema.org.
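Two of the fixes in the list above, the canonical tag and the calendar links that returned 200 for impossible seasons, can be sketched in a few lines of Python. This is only an illustration under assumptions: the parameter names in `CONTENT_PARAMS`, the example URL, and the helper names are hypothetical, and the real site may be built very differently.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only query parameters that actually select content.
CONTENT_PARAMS = {"team", "season", "player"}


def canonical_url(url):
    """Drop non-content parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def canonical_tag(url):
    """Build the <link rel="canonical"> element for a page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


def season_status(year, first_season, current_season):
    """Return 200 only for seasons that can exist, 404 for anything else."""
    return 200 if first_season <= year <= current_season else 404


print(canonical_tag("http://Example.com/team.php?sort=asc&season=2012&team=42"))
print(season_status(1492, first_season=1900, current_season=2013))
```

With this approach, two URLs that differ only in a sort order or tracking parameter map to the same canonical address, and a crawler following "previous season" links hits a 404 as soon as it walks past the first season on record. The `first_season` and `current_season` values here are placeholders.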
Of course, now that my traffic has returned, I'm overly paranoid about it disappearing again. Since Panda is now a "baked-in" algorithm that runs all the time, it is hard to detect when you've been hit by it. After being +30% for March, with every day somewhere between +12% and +65%, I'm getting nervous because in the past 7 days my numbers have ranged from -6% to +11%, averaging 0% compared to last year.
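For keeping an eye on those year-over-year numbers without eyeballing them every day, a rough sketch in Python follows. It assumes two equal-length lists of daily visits (this year and the matching days last year, all non-zero); the function names and sample figures are hypothetical.

```python
from statistics import mean


def yoy_pct_changes(this_year, last_year):
    """Per-day percent change versus the matching day last year."""
    if len(this_year) != len(last_year):
        raise ValueError("both series must cover the same number of days")
    # Assumes last year's daily counts are never zero.
    return [(t - l) * 100 / l for t, l in zip(this_year, last_year)]


def summarize(changes):
    """Minimum, maximum, and mean of a list of percent changes."""
    return {"min": min(changes), "max": max(changes), "mean": mean(changes)}


def days_below_baseline(recent, baseline):
    """Indexes of recent days that fall below the worst day of the baseline."""
    floor = min(baseline)
    return [i for i, change in enumerate(recent) if change < floor]


# Made-up daily visit counts, for illustration only.
march = yoy_pct_changes([112, 165, 130], [100, 100, 100])
last_week = yoy_pct_changes([94, 111, 100], [100, 100, 100])

print(summarize(march))
print(summarize(last_week))
print(days_below_baseline(last_week, march))
```

Matching each day to the same weekday 52 weeks earlier, rather than the same calendar date, is one common way to keep weekend and weekday traffic from distorting the comparison. Note also that the mean of daily percentages is not the same as the percent change in total visits when daily volumes differ a lot.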