Scott_F - 11:34 pm on Jan 24, 2011 (gmt 0)
The attributes scoring model proposed by Wheel is bound to be inefficient because it examines only the symptoms of spammy content: features of the content itself and its backlink structure. That's analogous to identifying a ship by examining its wake.
I imagine the most accurate and efficient test would come from evaluating the content itself: its semantics, grammar, and originality. That's likely where Google is headed in the short term.
So how to detect this? Perhaps by identifying word patterns that are symptomatic of this kind of writing. Maybe that's progress, but it still doesn't get at the meaning of the content, which is how we humans evaluate. Here's an example:
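Just to make the "word patterns" idea concrete: here's a toy sketch that flags text whose vocabulary is unusually repetitive for its length. The feature and the threshold are my own illustrative assumptions; a real ranking system would combine many signals, and this is obviously not anything Google has published.

```python
# Toy "word pattern" signal: how much of the text is repeated vocabulary?
# The threshold below is an illustrative assumption, not a known spam cutoff.
from collections import Counter

def repetition_score(text: str) -> float:
    """Ratio of repeated-word occurrences to total words (0.0 = all unique)."""
    words = [w.lower().strip(".,;:!?") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    counts = Counter(words)
    repeated = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return repeated / len(words)

def looks_spammy(text: str, threshold: float = 0.5) -> bool:
    # Purely illustrative cutoff; genuine prose can also repeat words.
    return repetition_score(text) > threshold

sample = "cheap widgets cheap widgets buy cheap widgets now buy widgets"
print(round(repetition_score(sample), 2))  # prints 0.6
print(looks_spammy(sample))                # prints True
```

Even this crude measure separates keyword-stuffed boilerplate from ordinary prose, but it illustrates the limit I'm describing: it measures surface patterns, not whether the document actually says anything.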
There is a kind of content that is grammatically correct and on topic but lacks any original point of view: it simply rambles through a list of facts. You know it when you see it. The purpose could be (and often is) to manipulate rank, but it could just as well have been written with genuine intent by an immature writer. Either way, this kind of document adds nothing unique to the human knowledge pool and therefore would not be highly valued.
Meaning is subjective, purely human, derived at a moment in time from the conditions present in that moment. And here's the rub: I don't believe computers can determine human value with any accuracy, because value constantly changes with the workings of the market. And internet links and Google's rankings are indeed a market.
Google has attempted to use links as a sort of measurable currency, but they are too easily gamed. Systems like eBay and various social media platforms operate on reputation and seem to work pretty well, but they are all closed or restricted in some way. Can a reputation model be applied to the open internet in a scalable, workable way while remaining relatively free from manipulation?