Natural language analysis, combined with statistical analysis of (as you said) grammar, spelling, and vocabulary, should be able to make a pretty good guess about any article in that regard, given a pool of similar articles written at various skill levels.
An algo can see HOW a subject is discussed
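A minimal sketch of what that kind of statistical scoring could look like, using only two simple surface features (average sentence length and vocabulary richness). These particular features and the scoring idea are illustrative assumptions on my part, not anything a search engine is confirmed to use:

```python
import re

def skill_features(text):
    """Compute rough surface-level proxies for writing skill:
    average sentence length and type-token ratio (share of
    distinct words), plus total word count."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "word_count": len(words),
    }

# Repetitive, low-vocabulary text vs. a more varied sentence.
simple = "The cat sat. The cat ran. The cat sat."
richer = ("Statistical analysis of vocabulary, combined with grammatical "
          "cues, lets an algorithm estimate how skillfully a subject is "
          "discussed rather than merely whether it is mentioned.")

print(skill_features(simple))
print(skill_features(richer))
```

Given a pool of articles already labeled by skill level, features like these could feed an ordinary classifier; the point is only that "how a subject is discussed" leaves measurable statistical traces.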
Natural language analysis is a great concept. But if the algorithm ranks a page simply because it is smaller (perhaps on the theory that small pages are "user friendly"), then the language hasn't truly been analyzed, has it? On extremely small pages (blurbs, sales copy, etc.) there may not be any substance to analyze at all. I am not a statistician, nor do I purport to know anything about language analysis. I am simply trying to understand why preference may be given to smaller documents, all other things being equal. Again, this has been my observation over the past year, and I have seen it work when tested. Was the test worth it? No.