jmccormac - 10:11 pm on Sep 22, 2012 (gmt 0)
Some types of problem websites are simple to identify, and a general solution can be put together easily. As for spam in the comments sections (typically user-generated spam), that is even simpler to deal with from a search engine developer's POV. Most of the sites that carry this spam with live (fanged) links are abandoned or haven't been updated in ages, and if the comments are open, the junk comments will outnumber any real comments. Blogs have a very high rate of abandonment because most people stop updating them once they discover how hard it is to maintain a writing discipline. And of course Facebook has captured many of the people who would otherwise have been bloggers.
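The "abandoned blog drowning in junk comments" pattern described above is mechanical enough to sketch. This is a purely illustrative heuristic, not anything a search engine is known to use: the function name, thresholds (two years stale, link counts, keyword list) and input shape are all my own assumptions.

```python
from datetime import datetime, timezone

# Illustrative keyword list only; a real filter would be far larger
SPAM_KEYWORDS = {"viagra", "casino", "payday", "replica"}

def looks_like_abandoned_spam_magnet(last_post_date, comments):
    """Flag a blog page as an abandoned spam magnet (hypothetical heuristic).

    last_post_date: datetime (tz-aware) of the most recent author post
    comments: list of dicts with 'text' (str) and 'links' (outbound link count)
    """
    years_stale = (datetime.now(timezone.utc) - last_post_date).days / 365.0
    spammy = sum(
        1 for c in comments
        if c["links"] > 2
        or any(k in c["text"].lower() for k in SPAM_KEYWORDS)
    )
    # Abandoned for 2+ years AND junk comments outnumber the real ones
    return years_stale >= 2 and spammy > len(comments) - spammy
```

A page last updated in 2008 whose comment thread is mostly link-stuffed junk would trip both conditions; an active blog with a few stray spam comments would not.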
As for Google's AI, it all sounds very buzzwordy and attractive to clueless "technology" journalists, but the reality of search is that simple solutions work well by not allowing rubbish into the index in the first place. Once it is in the index, it becomes a far bigger problem, and this is exactly what Google's Blind Crawling approach to search has created. Much of Google's approach seems to be that of people who don't really understand what they are dealing with (in terms of the web's diversity and half-finished nature) and who hope their algorithm can sort out the mess that results from spidering everything. While some level of human intervention in search quality is necessary, Google would be better served by strategies that stop the junk getting into the index at all.