I applaud you for trying to test and to learn from the results firsthand.
There is a wide range of quality signals that Google is looking at. Not all quality signals are calculated into the formula at the same time. The algorithm is not a simple equation where a+b+c=ranking score. Even if the algorithm were that simple, your test would still have some flaws. For one, it does not take into account the level of competition, which can vary greatly from keyword to keyword.
Also, your test does not take into account Google's Rank-Modifying Spammers Patent. I am not saying they are using it, but webmasters should remember that it could be used and would skew test results.
When I build a content page, here are some things I like to include:
-H1 tags (not for spamming but usability)
-Synonyms (Google probably isn't using LSI, but synonyms help users and manual reviewers)
-Relevant content links (I don't rely on run-of-site links; I favor links within content)
-Inbound links from a wide range of relevant sites on different IPs going to relevant deep content
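
If you want to spot-check a page against the on-page items in that list, here is a rough sketch in Python using only the standard library. It counts H1 tags and separates links inside the main content from links elsewhere on the page (navigation, footer, and so on). The names `PageChecklist` and `audit` are made up for this example, and it assumes the main content sits inside an `<article>` element, which may not match your templates. It cannot say anything about synonyms or inbound links, since those can't be judged from the page markup alone.

```python
from html.parser import HTMLParser


class PageChecklist(HTMLParser):
    """Counts H1 tags and links inside vs. outside the content container."""

    def __init__(self, content_tag="article"):
        super().__init__()
        # Assumption: the main content is wrapped in this tag.
        self.content_tag = content_tag
        self.depth = 0          # nesting depth inside the content container
        self.h1_count = 0
        self.content_links = 0  # links placed within the content
        self.other_links = 0    # links elsewhere (nav, sidebar, footer)

    def handle_starttag(self, tag, attrs):
        if tag == self.content_tag:
            self.depth += 1
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "a" and dict(attrs).get("href"):
            if self.depth > 0:
                self.content_links += 1
            else:
                self.other_links += 1

    def handle_endtag(self, tag):
        if tag == self.content_tag and self.depth > 0:
            self.depth -= 1


def audit(html, content_tag="article"):
    """Return simple counts for a single page's HTML."""
    parser = PageChecklist(content_tag)
    parser.feed(html)
    parser.close()
    return {
        "h1": parser.h1_count,
        "content_links": parser.content_links,
        "other_links": parser.other_links,
    }


if __name__ == "__main__":
    sample = (
        "<nav><a href='/'>Home</a></nav>"
        "<article><h1>Title</h1>"
        "<p>See the <a href='/guides/deep-page'>guide</a>.</p></article>"
    )
    print(audit(sample))
```

These counts are only a sanity check on page structure, not a ranking signal; a page with one H1 and a few in-content links can still be a poor page.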