Now, the easiest way to detect cloaking is actually to "see" the page: look at the HTML code fetched by Google and compare it to what a browser displays.
Do you think SEs have the technology to, let's say, grab a screenshot of a page, run it through text-recognition (OCR) software, and compare the result to the HTML?
I know it looks pretty complex, but is it really so complex for Google, for example?
They might even use the Google Toolbar to do that.
> Now, the easiest to detect cloaking is actually to "see" the page.
I assume you actually mean spam in this sentence and not cloaking.
The best way for a search engine to detect cloaking is to compare the cached copy fetched by a "known" spider to the cached copy fetched by an "unknown" spider. By known and unknown, I mean known or unknown to the cloaker.
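To make the comparison concrete, here is a minimal sketch of how two copies of the same URL might be compared. It is only an illustration, not a description of what any search engine actually runs. The function names and the 0.5 threshold are hypothetical, and the two copies are assumed to have been fetched already.

```python
from difflib import SequenceMatcher


def similarity(known_copy: str, unknown_copy: str) -> float:
    """Return a 0..1 similarity ratio between two copies of the same URL.

    known_copy   -- page as served to a spider the cloaker recognizes
    unknown_copy -- page as served to a visitor the cloaker does not recognize
    """
    # Compare word sequences, so differences in whitespace alone are ignored.
    known_words = known_copy.split()
    unknown_words = unknown_copy.split()
    return SequenceMatcher(None, known_words, unknown_words).ratio()


def looks_cloaked(known_copy: str, unknown_copy: str, threshold: float = 0.5) -> bool:
    """Flag the page when the two copies differ more than the (arbitrary) threshold allows."""
    return similarity(known_copy, unknown_copy) < threshold
```

A real system would also have to tolerate legitimate differences between two fetches, such as rotating ads or timestamps, so a low score is a reason to look closer rather than proof of cloaking.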
We suspect that the major search engines are running spiders under browser user agents from IPs not registered to them. The spiders would have to be programmed to act just like browsers, requesting images, sending HTTP_REFERER headers, etc., in order to "fly below the radar".
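As a rough illustration of what "acting like a browser" means at the HTTP level, the sketch below builds a request carrying browser-style headers, including a Referer. The header values and the `browser_like_request` name are made up for the example, and nothing is actually sent over the network.

```python
from urllib.request import Request

# Hypothetical header values chosen to resemble a desktop browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def browser_like_request(url, referer=None):
    """Build (but do not send) a request that carries browser-style headers."""
    headers = dict(BROWSER_HEADERS)
    if referer:
        # A browser following a link normally reports the page it came from.
        headers["Referer"] = referer
    return Request(url, headers=headers)
```

Headers are only part of the picture. As noted above, a convincing spider would also have to request images and other page resources, and come from an IP address the cloaker does not associate with a search engine.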
I also suspect Google uses information collected from its toolbar and accelerator. The major engines may also have deals with Alexa and/or other companies that spider a lot but aren't considered search engines.
I believe they use algorithms that analyze the text content of the page in their comparisons. I doubt they use actual screen shots for their comparisons.
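A text-based comparison of that kind could look something like the following sketch. It strips the markup, ignores script and style blocks, and compares the remaining word sets with a Jaccard score. This is my own guess at the general idea, not a known search engine algorithm, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the words of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.lower().split())


def page_words(html: str) -> set:
    """Return the set of lowercase words found in the page text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return set(parser.words)


def jaccard(html_a: str, html_b: str) -> float:
    """Share of words the two pages have in common (1.0 means identical word sets)."""
    a, b = page_words(html_a), page_words(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Note that this only reads the markup. Text hidden with CSS or revealed by JavaScript looks the same to it as visible text, which is exactly the gap the screenshot idea is trying to close.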
But then again, why can't SEs "understand" what is being displayed when there are open-source browsers such as Mozilla? That is to say, aren't SEs just browsers that index the content that they see?
I think there is something much, much deeper at play here...