I have been asked a number of times to explain how cloaking works, most recently in a sticky mail from one of our members. The difference in this request was that it asked for a beginner's primer: no over-the-top techniques, just a straightforward look at how it all fits together. OK then, here it is.
At its most basic (and even at its most sophisticated) there is really no magic in cloaking, and coming to terms with that will make it much easier to "get" cloaking. IMO the biggest stumbling block for beginners is that they fail to relate cloaking to things they already do every day with their site, and to the way their site works. Making it too much of a mystical exercise is also a stumbling block. Cloaking is simple, it uses simple techniques, and it is not over anyone's head. Keep saying that to yourself as you read on, and you will see there is nothing to it.
Forget about cloaking completely for a minute. Let's just look at how a regular site works.
Let's say your domain name is www.foo.com, and you created a page named page1.html.
When you have finished this page, you know that you must upload it to your web hosting space so that others can see it when they enter its URL. The URL for the above example would be www.foo.com/page1.html. Let's dive deeper into the server's setup to understand what really happens when your ISP sets up your web hosting space and you upload a page there.
Basically, here's what the ISP will do. They take your registered domain name (e.g. www.foo.com) and define a host container in the web server's configuration files. Part of this configuration tells the web server where it should look for files for the host www.foo.com. That location is the real location of the files on the server. On a *nix server the path looks something like /home/servers/foo (on a Windows server it might look like e:\home\servers\foo).
Next, as part of defining your host (www.foo.com) in the web server's configuration file, the ISP also defines that the root path for your host is /home/servers/foo. Then a userid is created for you on the server, and it is usually set up so that when you log in you are automatically placed in your root directory, i.e. /home/servers/foo.
When you FTP to your web site to upload your new page, i.e. page1.html, you are really saving the page to the directory named foo, which is in a directory named servers, which is in a directory named home. In other words, it can be found on the server by following the path /home/servers/foo/page1.html.
If you were to create a directory within your web site, the exact same principle applies. Assume you create a directory, name it bar, and upload another page1.html into it. The path to that page would be /home/servers/foo/bar/page1.html. Still with me? Good.
Part of the built-in function of a web server is to associate these paths with the host name, so that it knows where to find the content of a page and can deliver it to a browser. Remember we said that the ISP associates a root path, /home/servers/foo, with the host www.foo.com in the web server's configuration file? When you tell someone to visit your page at the URL www.foo.com/page1.html, the web server takes the request and separates the host from the file that is requested. With the host removed from the URL, what is left over is /page1.html. The web server adds this to the root path /home/servers/foo and comes up with /home/servers/foo/page1.html. It now knows where to find the content on the server, so it gets that content and returns it to the browser or spider requesting it. The browser or spider getting the result of this request has no idea what path the content came from.
Had you given out the URL of www.foo.com/bar/page1.html instead, the same process would produce /bar/page1.html being added to the root path, yielding /home/servers/foo/bar/page1.html and the content to be displayed to the browser would come from that path.
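If it helps to see that lookup written down, here is a minimal sketch in Python of the "remove the host, add what is left to the root path" step described above. It is only an illustration of the idea, not how any particular web server is implemented. The DOCUMENT_ROOTS table and the resolve function are names I made up for the example; a real server keeps this mapping in its configuration files.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical stand-in for the web server's configuration:
# each host name is associated with the root path of its files.
DOCUMENT_ROOTS = {"www.foo.com": "/home/servers/foo"}


def resolve(url):
    """Map a URL to the path on the server where its content lives."""
    # Accept URLs written without a scheme, e.g. "www.foo.com/page1.html".
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Step 1: separate the host from the file that is requested.
    root = DOCUMENT_ROOTS[parts.hostname]
    leftover = posixpath.normpath(parts.path or "/")

    # Step 2: add what is left over to the root path for that host.
    return root + leftover


print(resolve("www.foo.com/page1.html"))
print(resolve("www.foo.com/bar/page1.html"))
```

The two calls at the bottom correspond to the two URLs used in the walkthrough above.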
The point is, this is no different than how you store files on your own computer, and when you want to find those files you go to the directory you put them in to find them.
This is why many types of scripts you may have installed have you specify the root path to your web space as part of the installation process. It is the only way the script can know where to find a file it will read from, or write to.
OK, now let's bring cloaking into the picture, but let's call it selective page serving based on language. Suppose you had three versions of page1.html, each with a different HTML structure, each written in a different language, and each stored in its own directory under your root, for example /home/servers/foo/england/page1.html and /home/servers/foo/dutch/page1.html.
The URL you have published is www.foo.com/page1.html
In your root directory, page1.html is replaced with a server side script (which just means it is executed on the server, not by the browser/machine requesting the URL). The script checks the language setting of the browser asking for www.foo.com/page1.html. If the language is English, it reads /home/servers/foo/england/page1.html and writes that out as the content of /home/servers/foo/page1.html. The web server does its thing and happily returns www.foo.com/page1.html, completely oblivious to the fact that a script read the content from /home/servers/foo/england/page1.html and returned it as the content to display.
If the language of the browser requesting the page is Dutch, the script reads /home/servers/foo/dutch/page1.html instead and writes that out as the content of /home/servers/foo/page1.html. Again the web server happily returns www.foo.com/page1.html, completely oblivious to the fact that the content came from /home/servers/foo/dutch/page1.html.
This of course could be repeated for French, and as many languages as you might want to selectively serve for, in exactly the same manner. The browser or spider getting the result of this request has no idea what path the content came from.
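For those who like to see it spelled out, the selection logic above might look something like the sketch below. Treat it as a rough illustration only. The directory names england and dutch come from the example; the language codes "en" and "nl", the fallback to the English version, and the function names are my own assumptions. It also simply takes the first language listed in the browser's Accept-Language header and ignores any weighting, which a real script would probably want to handle.

```python
import posixpath

ROOT = "/home/servers/foo"

# Assumed mapping from browser language code to the directory
# holding that language's version of the page.
LANGUAGE_DIRS = {"en": "england", "nl": "dutch"}
DEFAULT_DIR = "england"  # assumption: fall back to the English version


def preferred_language(accept_language):
    """Return the primary code of the first language in the header."""
    first = accept_language.split(",")[0].strip()  # "nl-NL;q=0.9"
    first = first.split(";")[0]                    # drop any ";q=..." part
    return first.split("-")[0].lower()             # "nl-NL" -> "nl"


def pick_source(accept_language, page="page1.html", root=ROOT):
    """Choose which file on the server the content will be read from."""
    code = preferred_language(accept_language)
    directory = LANGUAGE_DIRS.get(code, DEFAULT_DIR)
    return posixpath.join(root, directory, page)


def serve(accept_language, page="page1.html", root=ROOT):
    """Read the chosen version and return it as the content to display."""
    with open(pick_source(accept_language, page, root), encoding="utf-8") as f:
        return f.read()
```

Note that the requester only ever sees whatever serve() hands back for www.foo.com/page1.html; the path chosen by pick_source never leaves the server.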
That is all there is to it: basic reading and writing. I used language as an example, but IP addresses could be used just as easily if your intention was to serve different content based on the IP address of the requester.
A little long-winded (OK, a lot), but hopefully it helps to remove some of the mystery around selectively delivering content.
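To show how little changes when you key on IP address instead of language, here is the same kind of sketch with only the test swapped. Everything specific in it is hypothetical: the address ranges are the reserved documentation-only ranges, not anyone's real addresses, and reusing the england and dutch directories is just to keep the example familiar.

```python
import ipaddress
import posixpath

ROOT = "/home/servers/foo"

# Hypothetical table: documentation-only address ranges mapped to the
# directory whose version of the page should be served to them.
NETWORK_DIRS = [
    (ipaddress.ip_network("192.0.2.0/24"), "england"),
    (ipaddress.ip_network("198.51.100.0/24"), "dutch"),
]
DEFAULT_DIR = "england"  # assumption: what everyone else gets


def pick_source_by_ip(remote_addr, page="page1.html", root=ROOT):
    """Choose the file to read based on the requester's IP address."""
    address = ipaddress.ip_address(remote_addr)
    for network, directory in NETWORK_DIRS:
        if address in network:
            return posixpath.join(root, directory, page)
    return posixpath.join(root, DEFAULT_DIR, page)
```

The reading and returning of the content stays exactly as before; only the rule for choosing the directory is different.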