Cloaking has taken on so many new meanings and styles over the last few years that we are left scratching our heads as to what cloaking really means. Getting two people to agree on a definition is nearly impossible with all the agent, language, geo-targeting, and device-specific page generation going on today. It is so prevalent that it is difficult to find a site in the Alexa top 500 that isn't cloaking in one form or another.
This all came up for us in mid-December when, right at the height of the Christmas ecommerce season, a friend's European site was banned or penalized by a search engine. After numerous inquiries, it was learned that the surprising reason for it was cloaking. I got asked to take a look at the site and figure out where there was a problem. The site owner didn't even know what cloaking was, let alone practice it.
I determined that his off-the-shelf server language and browser content delivery program was classifying search engines as a text browser and delivering them a text version of the page. In its default configuration, this five-figure, enterprise-level package classified anything that wasn't IE, Opera, or Netscape as a text browser and generated a printer-friendly version of the page that was pure text.
We explained to the SE just what the situation was, and they agreed and took off the penalty after we said we'd figure out a way around the agent part. Unfortunately, the package had all but compiled in the agent support and they were surprised when we informed them about it. What was even better was looking around some Fortune 500 companies that run the same software to find three entire sites that were in effect "cloaked" - they didn't have a clue.
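To make that default concrete, here is a rough sketch of the kind of rule involved. The package's real logic isn't known to me, so the token list and the function names (classify_agent, render) are purely illustrative assumptions.

```python
# Hypothetical sketch of a "known browsers, else text" default.
# The token list and names are assumptions, not the package's real rules.
GRAPHICAL_TOKENS = ("msie", "opera", "netscape")


def classify_agent(user_agent: str) -> str:
    """Return "graphical" for a recognised browser, otherwise "text"."""
    ua = user_agent.lower()
    if any(token in ua for token in GRAPHICAL_TOKENS):
        return "graphical"
    # Anything unrecognised, search engine spiders included, lands here.
    return "text"


def render(user_agent: str) -> str:
    """Pick the page variant this visitor would be sent."""
    if classify_agent(user_agent) == "graphical":
        return "full page"
    return "printer-friendly text page"


if __name__ == "__main__":
    for ua in (
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    ):
        print(ua, "->", render(ua))
```

Nothing in a rule like this mentions search engines at all, which is how a site can end up "cloaking" without anyone having decided to.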
In the end we solved the problem with another piece of software that would exchange the agent that the site delivery program was seeing. Yep, we installed cloaking software.
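The idea of exchanging the agent can be sketched roughly as below. This is not the software we installed; the spider tokens, the stand-in browser agent, and the exchange_agent name are all assumptions made for illustration.

```python
# Hypothetical sketch: swap a spider's agent for a browser agent before
# the delivery program sees the request. All names here are illustrative.
SPIDER_TOKENS = ("googlebot", "slurp", "scooter")  # assumed example list
BROWSER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def exchange_agent(headers: dict) -> dict:
    """Return a copy of the headers with spider agents replaced."""
    rewritten = dict(headers)  # leave the caller's dict untouched
    ua = headers.get("User-Agent", "").lower()
    if any(token in ua for token in SPIDER_TOKENS):
        rewritten["User-Agent"] = BROWSER_AGENT
    return rewritten


if __name__ == "__main__":
    request = {"User-Agent": "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"}
    print(exchange_agent(request)["User-Agent"])
```

With something like this sitting in front of the delivery program, the spider is handed the same full page an IE visitor gets.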
So let's have a little rundown of the current state of cloaking in its various forms:
We've talked a bit about agent based cloaking recently [webmasterworld.com].
Search Engines Endorse Web Services Cloaking:
Cloaking has become just varying shades of gray. We now have instances where search engines themselves endorse cloaking (xml feeds) and in some instances are giving out cloaking software to deliver those xml feeds.
That has resulted in pages intended (cloaked) for one search engine being indexed by another search engine. There have been occasions where this endorsed content has been banned or penalized by another search engine.
Geographic IP Delivery:
Language translations have been a hot topic for the last year. Most major sites now geographically deliver content in one form or another. Hardly a month goes by when someone doesn't scream, "I can't get to Google.com," because they are transparently redirected to a local TLD. You will also find those same search engines custom tailoring results for that IP address (eg: personalized content generation). You can see the effect yourself by changing your language preferences on a few search engines that offer the feature.
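A minimal sketch of IP based delivery, assuming a tiny hand-made lookup table in place of a real geo database. The networks are reserved documentation ranges and the hostnames are placeholders; none of this is taken from any actual search engine.

```python
import ipaddress

# Hypothetical lookup table: documentation-only networks standing in
# for a real IP-to-country database.
NETWORK_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "de",
    ipaddress.ip_network("198.51.100.0/24"): "fr",
}
# Placeholder hostnames for the local versions of a site.
COUNTRY_TO_HOST = {"de": "www.example.de", "fr": "www.example.fr"}
DEFAULT_HOST = "www.example.com"


def host_for_ip(visitor_ip: str) -> str:
    """Return the host a visitor from this IP would be redirected to."""
    address = ipaddress.ip_address(visitor_ip)
    for network, country in NETWORK_TO_COUNTRY.items():
        if address in network:
            return COUNTRY_TO_HOST.get(country, DEFAULT_HOST)
    return DEFAULT_HOST


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.5"):
        print(ip, "->", host_for_ip(ip))
```

The visitor never asks for the local site; the address alone decides it, which is why the redirect feels transparent.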
One Browser Web:
The recent history of major browsers is summed up in IE4-6 and Netscape 3-7. There is also a large 2nd tier of browsers: Opera, Lynx, iCab, and Mozilla.
All of these agents support different levels of code and standards. They also have inherent bugs related to page display. If you are a web designer, you could get a degree in the various browser differences of CSS and HTML alone.
Just when we are starting to think in terms of a one browser web, along comes a whole new set of browsers to consider: Set Top Boxes, Cell Phones, PDAs, and other Mobile Devices. These all have varying degrees of support for XML, XHTML, CSS2/3, and the web services protocol blizzard (eg: .net, soap, et al).
We've not even begun to talk about IE7, which is rumored to be in final internal beta testing. Then there is Apple's new browser and the growing horde of Mozilla based clones. When you put it in those terms, our one browser web seems like a distant dream.
Delivering different content to these devices is a mission critical operation on many sites. Generating content for mobile devices is a vastly different proposition than delivering an xml feed to a search engine, or a css tricked out page for a leading edge browser.
Given that the combination of visitor IP and user agent can run into hundreds of possibilities, the only valid response is agent and IP cloaking.
Off the shelf cloaking goes mainstream.
There are many off-the-shelf packages available today that include cloaking in one form or another. The perplexing part is that many sites are cloaked in ways you wouldn't even know about. There are several major forum packages that cloak in some form or another.
I was at a forum this morning that was agent cloaking, and another that was language cloaking. In both cases, the webmasters don't even know that it is taking place - let alone have the tech knowledge to correct it.
Welcome to 2003 - Modern Era Of Search Engines.
This isn't the web of 98-99 where people would routinely get whisked away to some irrelevant site unrelated to their query. Today's search engines are vastly improved, with most engine algorithms putting Q&A tests on every page they include. Those range from directory inclusion requirements, inbound link count and quality, to contextual sensitivity and even a page's reputation.
In this modern era where search engines now routinely talk about their latest off-the-page criteria algo advancements, it's clear that traditional SE cloaking has little effect. It comes down to one simple fact: those that complain about SE cloaking are simply overlooking how search engines work. The search engines have done a fantastic job at cleaning up their results programmatically and by hand.
The most fascinating thing about this new mainstream cloaking is the situation where a site just classifies a search engine as a graphically challenged browser. In that case, cloaking becomes mostly an agent based proposition. The trouble starts when you throw language delivery into the equation, or even delivering specific content as part of a search engine program.
All of these wide ranging factors combine to result in about 10 to the 4th power page generation possibilities. In that situation, it almost becomes a necessity to put spiders into the all text browser category and deliver the same page to the SEs that you deliver to cell phones or the Lynx browser.
Thus, we've come full circle on search engine cloaking. We no longer cloak to deliver custom content to search engines; we now cloak for the search engines to keep them from getting at our cloaked content for visitors.
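As a back-of-the-envelope illustration of that arithmetic, the sketch below multiplies out some per-factor counts and then collapses spiders into the text category. The counts are made up so that the product lands at 10 to the 4th; only the rough total comes from the paragraph above, and the names are hypothetical.

```python
from math import prod

# Assumed counts per factor, chosen only so the product is 10**4.
DIMENSIONS = {"agent": 10, "language": 10, "region": 10, "device": 10}

# Visitor kinds that all get the same all-text page (illustrative set).
TEXT_ONLY = {"spider", "lynx", "cell phone"}


def variant_count(dimensions: dict) -> int:
    """Number of distinct pages if every combination got its own page."""
    return prod(dimensions.values())


def page_category(visitor_kind: str) -> str:
    """Collapse visitors into the page category they are served."""
    return "text" if visitor_kind in TEXT_ONLY else "graphical"


if __name__ == "__main__":
    print(variant_count(DIMENSIONS))
    for kind in ("spider", "lynx", "cell phone", "ie6"):
        print(kind, "->", page_category(kind))
```

Treating the spider as just another text browser means it gets one predictable page instead of an arbitrary one of the thousands of combinations.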
<edit> cleaned up some typos and syntax errors</edit>
[edited by: Brett_Tabke at 6:15 am (utc) on Feb. 3, 2003]