pontifex - 10:12 pm on Jan 23, 2011 (gmt 0)
Great - one of the threads that make WebmasterWorld the best!
There is little to add to the responses here, just a remark from where I sit:
IMHO there is a big difference between "determining the quality level of a page" and "separating SPAM from legit".
I know that most of you disregard my opinion that the technology behind a large-scale search engine is not rocket science, just pretty complicated.
Still, following my own opinion, I believe that Google is not performing miracles but running a series of programmatic steps that ultimately produce a SERP for a keyword.
"... new classifier ... on-page content..." reads, in my world, as a new pattern-matching algorithm that will run during the ranking calculations.
That is IMHO primarily a scraper-fighting approach: to identify something on a page as JUNK and filter it out, you need a pattern to look for.
So my 2 cents on that:
Clearly identified JUNK will be used to filter out NEW JUNK! That sounds OK to me!
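To illustrate the "known junk catches new junk" idea, here is a minimal sketch assuming a simple word-shingle overlap test. This is a generic near-duplicate detection technique, NOT Google's classifier, and the function names, the 0.5 threshold and the sample texts are all hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_like_known_junk(page: str, junk_pages: list[str], threshold: float = 0.5) -> bool:
    """Flag a page whose shingles overlap heavily with any known junk page.

    The threshold is a hypothetical value chosen for illustration only.
    """
    page_sh = shingles(page)
    return any(jaccard(page_sh, shingles(j)) >= threshold for j in junk_pages)


# Hypothetical sample data: one "clearly identified" junk page.
known_junk = [
    "cheap widgets best cheap widgets buy cheap widgets online today",
]
new_page = "best cheap widgets buy cheap widgets online today free shipping"
original = "our photo gallery of alpine lakes taken in summer 2010"

print(looks_like_known_junk(new_page, known_junk))  # True (similarity 0.6)
print(looks_like_known_junk(original, known_junk))  # False (no shared shingles)
```

The point is only that a pattern learned from already identified junk can flag a slightly reworded copy, while an unrelated original page shares nothing with it.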
internetheaven: So now I HAVE TO WRITE a junk piece of SEO nonsense on my front pages to stay in the index
Good one, but for photo-driven sites it was ALWAYS a good idea to give photos an extensive info section in their header (EXIF), and that is as important as it ever was. The source of these photos will rank for the EXIF info, at least in photo search.
Beyond that: ranking for a search term with a page that contains only a few words and a bunch of photos without EXIF info, is that even possible? ;-)