I want to cloak, but not from spiders
| 1:25 am on Mar 30, 2002 (gmt 0)|
I have never used a cloaking script, but I know how they work.
I want to serve custom pages only to visitors arriving from a few specific referring domains. But I don't want to worry about SE spiders thinking I am trying to fool them... I am not.
I have used the .htaccess file to specify the default page for my domain to be my CGI script. The script checks the HTTP_REFERER and serves the custom page if appropriate. Otherwise it serves the normal default page for the domain: the one that the SEs have in their index.
Type my domain name in your browser and you get the normal default page. Click on a link to my domain name and you get the default page... unless the link you click is on one of a small (less than 100) number of specific sites.
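For readers who want to see the shape of this setup, here is a minimal sketch of the referrer check, not the poster's actual script. The domain names, the file names custom.html and index.html, and the function names are all hypothetical, and it assumes the web server passes the Referer header to the script as the HTTP_REFERER environment variable (as CGI does).

```python
from urllib.parse import urlsplit

# Hypothetical list of referring domains that should get the custom page.
CUSTOM_REFERRERS = {"partner-one.example", "partner-two.example"}


def referrer_host(referer):
    """Return the lowercase host from a Referer value, or '' if missing/unparsable."""
    if not referer:
        return ""
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        return ""
    host = (host or "").lower()
    # Treat www.example.com and example.com as the same referring site.
    if host.startswith("www."):
        host = host[4:]
    return host


def choose_page(environ):
    """Pick the file to serve; everyone not on the list gets the normal page."""
    host = referrer_host(environ.get("HTTP_REFERER", ""))
    if host in CUSTOM_REFERRERS:
        return "custom.html"  # hypothetical custom page
    return "index.html"       # hypothetical normal default page
```

In a real CGI script you would call choose_page(os.environ), print a Content-Type header followed by a blank line, and then print the chosen file's contents. The URL the visitor sees stays the same either way, since only the body changes.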
This all works fine, but I am worried that there may be a way for a spider to detect that my default page is a script and not a static page. The spiders' normal tricks would not trip an alarm... they can try coming at me from any IP address they want and they will still get the normal default page, as will anyone else unless they arrive from one of the few specified domains.
Any help will certainly be appreciated!
| 1:54 am on Mar 30, 2002 (gmt 0)|
Since most spiders don't show up with referral info, you shouldn't have any problem. However, there are some link indexing spam programs that will pass the referral from the page they were previously on.
If one of those programs were to visit one of the pages on your referral list and then visit your site, you would end up giving them the customized page. Many of those programs are used to generate link lists which end up getting published somewhere.
Does your script serve the two different versions as the same file name or does it deliver two different file names?
| 9:06 pm on Mar 30, 2002 (gmt 0)|
WebGuerrilla - Thanks for your comments. The script only runs when the browser/spider requests the bare domain name (the root URL). So the "name" of the file is just the domain name.
Any additional thoughts?
| 9:27 pm on Mar 30, 2002 (gmt 0)|
The only SE bot that may foul you up is AskJeeves -- sometimes they crawl with a referrer back to them.
| 11:03 pm on Mar 30, 2002 (gmt 0)|
As long as the script isn't serving up two different file names, I don't think you will have a problem. As little mentioned, AJ does like to show up with a referring URL.
If AJ were to crawl a site on your list, and then show up at your site with the previous site's referral info, it would only cause a problem if your script ended up sending them to domain.com/index2.html. That would get indexed, and then potentially show up in search results, thereby allowing people other than your intended audience to see it. But if the script just serves up domain.com, then anyone clicking on a SERP link would get picked up by the script, so they would end up where you want them to be (but AJ would get a different page than most humans).
One idea that might make it a little more foolproof would be to check both the referral and the UA. That way a matching referral with a search engine UA wouldn't cause a customized page to be served.
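A rough sketch of that combined check, under stated assumptions: the crawler tokens below are illustrative guesses rather than a verified or complete list of spider user agents, and the domain and function names are hypothetical. It assumes the script sees the headers as the CGI variables HTTP_REFERER and HTTP_USER_AGENT.

```python
from urllib.parse import urlsplit

# Hypothetical list of referring domains that should get the custom page.
CUSTOM_REFERRERS = {"partner-one.example"}

# Illustrative substrings only; a real list would need to be maintained.
CRAWLER_TOKENS = ("googlebot", "slurp", "teoma", "ask jeeves")


def looks_like_crawler(user_agent):
    """True if the User-Agent contains any of the illustrative crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def should_serve_custom(environ):
    """Serve the custom page only for a listed referrer AND a non-crawler UA."""
    try:
        host = urlsplit(environ.get("HTTP_REFERER", "")).hostname or ""
    except ValueError:
        host = ""
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in CUSTOM_REFERRERS:
        return False
    return not looks_like_crawler(environ.get("HTTP_USER_AGENT", ""))
```

With this, a spider that carries a referrer from one of the listed sites still gets the normal default page, so what ends up in the index matches what most visitors see.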
| 12:10 am on Mar 31, 2002 (gmt 0)|
Excellent input everyone!
I am going to add WebGuerrilla's enhancement as well, just to keep it nice and tidy.