
Cloaking Forum

    
Search Engines to detect cloaking?
benja
msg:675537 · 11:12 am on May 3, 2006 (gmt 0)

It's pretty obvious that search engines have a lot of checks to detect cloaking, such as JavaScript redirects, weird CSS, and so on...

Now, the easiest way to detect cloaking is actually to "see" the page: look at the HTML code fetched by Google and compare it to what a browser displays.
Do you think SEs have the technology to, say, grab a screenshot of a page, analyze it the way OCR software scans text, and compare the result to the HTML?

I know it sounds pretty complex, but is it really so complex for Google, for example?
They might even use the Google Toolbar to do that.
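[Editor's note: a minimal, purely illustrative sketch of the comparison benja describes. Rather than an actual screenshot and OCR, it approximates "what the browser shows" by extracting text while skipping elements hidden with inline `display:none` CSS; the class and function names are invented for this example, and a real engine would need a full rendering pipeline.]

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect page text, skipping subtrees hidden via inline CSS.

    A toy stand-in for 'seeing' the rendered page; real detection
    would require actual rendering (or OCR on a screenshot).
    """

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # >0 while inside a hidden element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "")
        if self.hidden_depth or "display:none" in style.replace(" ", ""):
            self.hidden_depth += 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags (<br/> etc.) don't change nesting

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data.strip())


def visible_text(html):
    """Return the text a browser user would plausibly see."""
    p = VisibleTextExtractor()
    p.feed(html)
    return " ".join(t for t in p.parts if t)
```

Comparing `visible_text(html)` against the raw text of the fetched HTML would flag hidden-text spam without any screenshot at all, which is one reason image analysis may be unnecessary.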

 

volatilegx
msg:675538 · 2:03 pm on May 3, 2006 (gmt 0)

> It's pretty obvious that SE have a lot of checks to detect cloaking.

I assume you actually mean spam in this sentence and not cloaking.

> Now, the easiest to detect cloaking is actually to "see" the page.

The best way for a search engine to detect cloaking is to compare the cache from a "known" spider to the cache from an "unknown" spider. By known and unknown, I mean relative to the cloaker.

We suspect that the major search engines are running spiders under browser user agents from IPs not registered to them. The spiders would have to be programmed to act just like browsers, requesting images, sending HTTP_REFERER headers, etc., in order to "fly below the radar".
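[Editor's note: for illustration only, here is roughly what such a browser-impersonating request might look like, sketched with Python's `urllib`. The header values are plausible examples for 2006-era Firefox, not anything the engines are actually known to send.]

```python
import urllib.request


def browser_like_request(url, referer):
    """Build a request that resembles a normal browser visit:
    a browser User-Agent, a Referer, and typical Accept headers."""
    return urllib.request.Request(url, headers={
        "User-Agent": ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                       "rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3"),
        "Referer": referer,
        "Accept": "text/html,text/plain;q=0.8,image/png,*/*;q=0.5",
        "Accept-Language": "en-us,en;q=0.5",
    })


req = browser_like_request("http://example.com/page.html",
                           "http://www.google.com/search?q=widgets")
```

A cloaking script keyed only on the user agent or on known spider IPs would serve this request the "browser" version of the page, handing the engine a second copy to compare against.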

I also suspect Google uses information collected from its toolbar and accelerator. The major engines may also have deals with Alexa and/or other companies that spider a lot but aren't considered search engines.

I believe they use algorithms that analyze the text content of the page in their comparisons. I doubt they use actual screen shots for their comparisons.
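[Editor's note: a hypothetical version of such a text comparison, assuming the engine already holds both fetched copies. It scores the pages by Jaccard similarity of word shingles, a standard near-duplicate technique; nothing here is confirmed to be what any engine runs.]

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(text_a, text_b, k=3):
    """Jaccard similarity of the two pages' shingle sets (0.0 to 1.0).

    Identical pages score 1.0; a page cloaked for the spider would
    score far lower against the browser-fetched copy.
    """
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

A threshold on this score (tolerant of ads, dates, and other legitimate variation between fetches) is cheaper and more robust than pixel-level screenshot analysis.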

steveconnerie
msg:675539 · 7:15 pm on May 3, 2006 (gmt 0)

Benja - I have to agree that it wouldn't be rocket science to run text recognition on a screenshot of a website to detect cloaking, or even spam as you mention, such as doorway pages and hidden text.

But then again, why couldn't SEs "understand" what is being displayed when there are open-source browsers such as Mozilla to build on? That is to say, aren't SEs essentially just browsers that index the content they see?

I think there is something much much deeper at play here ....

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved