I have a Perl script that checks whether the IP is in a spider database:
if ($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/)
is the only check I make before delivering the page to the spider.
What I am thinking of doing is something like this:
if ($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && ($ENV{'HTTP_REFERRER'} =~ //)
{
open(FILE, "$cloak_dir/$file");
my(@lines) = <FILE>;
close(FILE);
my($line);
foreach $line (@lines)
{
print "$line";
}
exit;
}
The IP checking is working fine, but I'm trying to think of other variables to check to make the script more intelligent. Would this check of the HTTP_REFERRER variable work? Is the syntax right?
The whole condition has to be wrapped in parentheses, so you'd write
if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && ($ENV{'HTTP_REFERRER'} =~ //))
instead of
if ($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && ($ENV{'HTTP_REFERRER'} =~ //)
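One more thing worth checking in the referrer test: in Perl an empty pattern such as // does not test for a blank value. It reuses the last pattern that matched successfully, which here is the IP check. A small sketch, with made-up addresses from the 192.0.2.x documentation range and plain variables standing in for %ENV:

```perl
use strict;
use warnings;

# Hypothetical request data (192.0.2.x is a documentation-only range).
my $ipaddress   = '192.0.2.';
my $remote_addr = '192.0.2.10';
my $referer     = '';    # the visitor sent no referer

my ($empty_pattern_hit, $blank_referer) = (0, 0);
if ($remote_addr =~ /^$ipaddress/) {
    # An empty pattern // reuses the last successful pattern (/^192.0.2./ here),
    # so this tests the referer against the IP prefix, not for blankness.
    $empty_pattern_hit = ($referer =~ //) ? 1 : 0;
    # A blank-referer test only needs to check for undef or the empty string.
    $blank_referer = (!defined $referer || $referer eq '') ? 1 : 0;
}
print "empty pattern matched: $empty_pattern_hit\n";    # prints 0
print "referer is blank:      $blank_referer\n";        # prints 1
```

So with a blank referer the =~ // test comes out false, the opposite of what was intended; testing the value itself avoids the surprise.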
Also, instead of
foreach $line (@lines)
{
print "$line";
}
exit;
you can just use
print @lines;
Are you planning to use a script to cloak your entire site?
The important part to me is the blank referrer check, because to my knowledge only Alexa fakes its user agent. If, for example, a search engine employee is browsing the pages, I want to give them the real page and not the cloaked version.
So the example you gave, TPK, will serve the page only to a visitor whose IP is in the database and whose referrer is blank? Thanks for jumping in so quickly.
Also note that the environment variable is spelled HTTP_REFERER (with a single R in the middle), not the way you'd find 'referrer' in the dictionary.
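To see why the spelling matters, here is a simulated request in which a referer was sent. Browsers send a "Referer" header, so the CGI variable is HTTP_REFERER and the misspelled key is normally absent; a test on the misspelled name then treats every visitor as having a blank referrer. The values are made up for illustration:

```perl
use strict;
use warnings;

# Simulated request in which the browser DID send a referer (made-up URL).
$ENV{'HTTP_REFERER'} = 'http://www.example.com/links.html';
delete $ENV{'HTTP_REFERRER'};    # the standard header is 'Referer', so this key is normally absent

# With the misspelled key the test always sees "no referer".
my $misspelled_says_blank = !$ENV{'HTTP_REFERRER'} ? 1 : 0;
# With the CGI spelling the referer is seen, so this visitor would not be cloaked.
my $correct_says_blank    = !$ENV{'HTTP_REFERER'}  ? 1 : 0;

print "HTTP_REFERRER blank? $misspelled_says_blank\n";    # prints 1
print "HTTP_REFERER blank?  $correct_says_blank\n";       # prints 0
```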
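if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && !$ENV{'HTTP_REFERER'})
{
# IP is in the spider list and no referer was sent: serve the cloaked page
open(FILE, "<", "$cloak_dir/$file") || die "Cannot open $cloak_dir/$file: $!";
local $/ = undef; # slurp mode: read the whole file in one go
print <FILE>;
close FILE;
} else {
# do something else
}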
About the strategy: it's very hard to tell whether a visitor is a search engine from a list of "known IP addresses." The engines change their IPs a lot, and they have many banks of IPs that they rarely use, which you probably wouldn't think to code for. Traditionally the best way to check for a search engine was to look at the user agent: if it's Googlebot or EmailExtractor, for example, you know it's a spider.
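A minimal sketch of such a user-agent check. Only the two names come from the post; the list itself and the function name is_spider_agent are hypothetical. As mentioned earlier in the thread, user agents can be faked, so this would complement the IP test rather than replace it:

```perl
use strict;
use warnings;

# Hypothetical list of user-agent fragments; the two names come from the post.
my @spider_agents = ('Googlebot', 'EmailExtractor');

# Returns 1 if the user-agent string contains any listed name
# (case-insensitive substring match), 0 otherwise.
sub is_spider_agent {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    for my $name (@spider_agents) {
        return 1 if index(lc $ua, lc $name) >= 0;
    }
    return 0;
}

my $ua = $ENV{'HTTP_USER_AGENT'};
print is_spider_agent($ua) ? "spider\n" : "browser\n";
```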
Bolot
Your
$IP =~ /^$IPfromdatabase/
match won't work properly, because a robot IP of 122.122.122.122 would match the entry 122.122.122.12 in your database.
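One possible way around the prefix problem, sketched under an assumed database convention (hypothetical, not from the thread): a truncated class C entry always ends in a dot, and any other entry is a full address that must match exactly. The helper name ip_matches is also made up:

```perl
use strict;
use warnings;

# Hypothetical helper. Assumed convention: an entry ending in '.' is a prefix
# (e.g. a truncated class C like '122.122.122.'); any other entry must equal
# the visitor's IP exactly.
sub ip_matches {
    my ($ip, $entry) = @_;
    return 0 unless length $entry;
    if (substr($entry, -1) eq '.') {
        # Plain string prefix test: no regex, so dots need no escaping.
        return index($ip, $entry) == 0 ? 1 : 0;
    }
    return $ip eq $entry ? 1 : 0;
}

my $robot = '122.122.122.122';
my $entry = '122.122.122.12';
print 'naive regex: ', ($robot =~ /^$entry/ ? 1 : 0), "\n";    # prints 1 (false positive)
print 'ip_matches:  ', ip_matches($robot, $entry), "\n";        # prints 0
```

Because the prefix case uses index() rather than a regex, the dots are compared literally.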
You might also want to build wildcard recognition into it, so you could just put class C ranges into your database.
Otherwise, the above will work fine with class Cs if they are stored truncated, like this:
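For the wildcard idea, one possible approach is to turn an entry such as 64.127.23.* into an anchored regex. The '*' syntax (one whole octet) and the function name wildcard_to_regex are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical wildcard syntax: '*' stands for one whole octet, e.g. '64.127.23.*'.
# Builds an anchored regex; literal octets are escaped with quotemeta.
sub wildcard_to_regex {
    my ($entry) = @_;
    my @octets  = split /\./, $entry;
    my $pattern = join '\.', map { $_ eq '*' ? '\d{1,3}' : quotemeta($_) } @octets;
    return qr/^$pattern$/;
}

my $re = wildcard_to_regex('64.127.23.*');
for my $ip ('64.127.23.5', '64.127.235.1') {
    print "$ip: ", ($ip =~ $re ? "match" : "no match"), "\n";
}
```

Anchoring both ends means a full entry like 64.127.23.5 no longer matches 64.127.23.50.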
111.222.111.
223.223.22.
64.127.23.
The pattern matching could be tightened up a bit by escaping the dots with '\', since an unescaped '.' matches any character. So we could store:
111\.222\.111\.
223\.223\.22\.
64\.127\.23\.
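A short demonstration of why the escaping matters, with made-up addresses: unescaped, the truncated prefix 64.127.23. also matches 64.127.235.1. Perl's \Q...\E (quotemeta) escapes at match time, so as an alternative the database could keep the plain dotted form:

```perl
use strict;
use warnings;

my $prefix = '64.127.23.';      # class C stored truncated, as in the post
my $ip     = '64.127.235.1';    # a different network (made-up address)

# Unescaped, each '.' matches any character, so '23.' also matches '235'.
my $loose = ($ip =~ /^$prefix/) ? 1 : 0;
# \Q...\E escapes the dots when the pattern is built, so no backslashes
# need to be stored in the database.
my $tight = ($ip =~ /^\Q$prefix\E/) ? 1 : 0;

print "unescaped: $loose\n";    # prints 1
print "escaped:   $tight\n";    # prints 0
```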
This type of basic IP cloaking is the way to go if you are going to cloak on an NT box with Perl. On NT I've found that the fewer external calls, the better. In fact, I believe you would be better off with a completely self-contained script that stores the HTML internally instead of reading it from an external file. On *nix you can get away with a much bigger script; I think the biggest cloaking application I wrote was close to 50K of Perl.
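A rough sketch of the self-contained approach, with the HTML kept in a heredoc inside the script instead of in an external file. The page content and the sub name cloaked_response are placeholders, and the header assumes a plain CGI setup:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder page content stored inside the script itself, so serving it
# needs no open/read of an external file on each request.
my $cloaked_html = <<'END_HTML';
<html>
<head><title>Example page</title></head>
<body><p>Page content for spiders goes here.</p></body>
</html>
END_HTML

# Builds the full CGI response: header, blank line, then the stored HTML.
sub cloaked_response {
    return "Content-type: text/html\n\n" . $cloaked_html;
}

print cloaked_response();
```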