rish3 - 3:28 am on Feb 27, 2013 (gmt 0)
I think that the nature of Panda, being a machine learning algorithm, means that all of these things are probably ranking factors, some more than others. Machine learning implies that the factors aren't known beforehand, but rather are dynamically discovered.
My high-level synopsis of how I understood it to work: human raters judged some sampling of sites and rated them on some scale from trustworthy/good to spammy/bad. The algorithm then crawled those sites and tried to establish correlations with the raw crawl data (and perhaps some other information, like backlink profiles).
If that's indeed the high level, then anything that correlates strongly to either "good" or "bad" is probably used, even if it seems trivial. Strong correlations, I imagine, get more weight than weak ones.
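The correlation idea above can be sketched in a few lines. This is a toy illustration, not Google's actual pipeline: the feature names and data are invented, and the "weight" is just a Pearson correlation between a binary crawl feature and the human rater labels.

```python
# Toy sketch: weight each feature by how strongly it correlates with
# human "good"/"bad" ratings across a sample of sites.
# (Hypothetical feature names and made-up data, for illustration only.)
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# 1 = rater judged the site "good", 0 = "bad"
labels = [1, 1, 1, 0, 0, 0]

# Binary features pulled from a crawl of the same six sites (all invented)
features = {
    "has_contact_page": [1, 1, 1, 0, 1, 0],
    "stale_copyright":  [0, 0, 1, 1, 1, 1],
    "uses_https":       [1, 0, 1, 0, 1, 0],
}

# Strong correlations (positive or negative) would get more weight;
# weak ones would be closer to zero and matter less.
weights = {name: pearson(vals, labels) for name, vals in features.items()}
```

With these toy numbers, `has_contact_page` comes out strongly positive, `stale_copyright` strongly negative, and `uses_https` weakly positive — which is the "strong correlations get more weight" intuition in miniature.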
Since it's a machine learning algorithm, I don't think anything is entirely off the table. Things that might go into the "good", "bad" or "neutral" buckets:
- html comments and/or tags injected by "SEO" plugins
- <meta generator=""> tags that indicate an out of date CMS
- old copyright dates
- outbound links to specific, trusted resources
- outbound links to "bad neighborhoods"
- existence, and uniqueness, of common pages
- phone numbers, addresses, people's names, etc.
- counts of broken links, unmatched tags
- use of unique images that don't appear on other sites
- other pages that hint at legitimacy (a "jobs" page, for example...or perhaps a "refund policy" page for ecom-looking sites)
- no doubt thousands of other factors
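A few of the signals above are easy enough to pull out of raw HTML. Here's a hedged sketch using plain regexes — illustrative only; a real crawler would use a proper HTML parser, classify link destinations, and extract far more than these three:

```python
# Sketch: extract a few of the listed signals from raw HTML.
# Regex-based and deliberately crude; real systems would parse properly.
import datetime
import re

def crawl_features(html: str) -> dict:
    feats = {}

    # <meta name="generator"> can reveal the CMS and its version
    m = re.search(
        r'<meta\s+name=["\']generator["\']\s+content=["\']([^"\']+)',
        html, re.I)
    feats["generator"] = m.group(1) if m else None

    # Old copyright dates as a staleness hint
    years = [int(y) for y in
             re.findall(r'(?:©|&copy;|copyright)\s*(\d{4})', html, re.I)]
    feats["stale_copyright"] = bool(years) and \
        max(years) < datetime.date.today().year - 1

    # Raw count of outbound links (a real system would then check whether
    # they point to trusted resources or "bad neighborhoods")
    feats["outbound_links"] = len(
        re.findall(r'<a\s[^>]*href=["\']https?://', html, re.I))

    return feats
```

Each of these would just be one more column fed into a correlation/learning step like the one described above.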