> It's pretty obvious that SE have a lot of checks to detect cloaking.
I assume you mean spam there, not cloaking.
> Now, the easiest to detect cloaking is actually to "see" the page.
The best way for a search engine to detect cloaking is to compare the copy cached by a "known" spider with the copy cached by an "unknown" spider. By known and unknown, I mean relative to the cloaker: whether or not the cloaker recognizes the visitor as a spider (by IP or user agent) and serves it the doctored page.
I suspect the major search engines are running spiders under browser user agents from IPs not registered to them. Those spiders would have to be programmed to act just like browsers (requesting images, sending HTTP_REFERER headers, etc.) in order to "fly below the radar".
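To make "fly below the radar" concrete, here is a minimal sketch of what such a stealth fetch might look like at the HTTP level. The header values and URLs are purely illustrative, not any engine's actual fingerprint, and a real stealth spider would also fetch images, honor cookies, and pace its requests like a human:

```python
import urllib.request

# Illustrative browser-like headers; a real stealth spider would mimic
# a specific browser build exactly, down to header order and quirks.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://www.example.com/",  # the HTTP_REFERER a browser would send
}

def build_stealth_request(url: str) -> urllib.request.Request:
    """Build a request that looks like it came from a browser, not a bot."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = build_stealth_request("https://www.example.com/page.html")
# urllib stores header keys in Capitalized-first-word form:
print(req.get_header("User-agent"))
```

The point is that nothing in the request itself distinguishes this from a real visitor; the cloaker would have to rely on IP reputation, and an engine crawling from unregistered IPs defeats that too.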
I also suspect Google uses information collected from its Toolbar and Web Accelerator. The major engines may also have deals with Alexa and/or other companies that spider heavily but aren't considered search engines.
I believe they use algorithms that compare the text content of the two copies; I doubt they compare actual screenshots.
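One plausible (entirely hypothetical) form such a text comparison could take: reduce each cached copy of a URL to word shingles and measure their overlap, flagging the page when the "known" and "unknown" fetches differ too much. The shingle size and threshold below are made-up values for illustration:

```python
def shingles(text: str, k: int = 3) -> set:
    """k-word shingles of a page's visible text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_cloaked(known_copy: str, unknown_copy: str,
                  threshold: float = 0.5) -> bool:
    """Flag the URL if two fetches of it differ too much textually."""
    return jaccard(shingles(known_copy), shingles(unknown_copy)) < threshold

honest = "widgets for sale cheap widgets best widgets online"
cloaked = "buy pills now cheap pills best pills online today"
print(looks_cloaked(honest, honest))   # False: same page served to both
print(looks_cloaked(honest, cloaked))  # True: content swapped for the spider
```

Shingle-style comparison tolerates small legitimate changes (rotating ads, timestamps) while still catching a wholesale content swap, which is why something in this family seems more likely than pixel-level screenshot matching.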