Forum Moderators: coopster & phranque


developing a cloaking system

nailing down the logic of ip delivering pages

         

jeremy goodrich

9:42 pm on Jun 28, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I couldn't decide whether this belonged in the cloaking forum or this one, but since I'm asking about the Perl programming aspect, here it goes:

Perl script that checks if the IP is in a spider database:

if ($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/)

is the only check variable I have, before delivering the page to the spider...

what I am thinking of doing is something like this:

if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && ($ENV{'HTTP_REFERRER'} =~ //))
{
open(FILE, "$cloak_dir/$file");
my(@lines) = <FILE>;
close(FILE);
my($line);
foreach $line (@lines)
{
print "$line";
}
exit;
}

The IP checking is working out fine, but I'm trying to think of other variables to check to make the script more intelligent. Would this check of the HTTP_REFERRER variable work? Is the syntax right?

theperlyking

10:22 pm on Jun 28, 2001 (gmt 0)

10+ Year Member



I'm not sure if that works as intended - you can just put

if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && ($ENV{'HTTP_REFERRER'}))

instead of

if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && ($ENV{'HTTP_REFERRER'} =~ //))

also for

foreach $line (@lines)
{
print "$line";
}
exit;

you can just use

print @lines;

Are you planning to use a script to cloak your entire site?

jeremy goodrich

11:56 pm on Jun 28, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just want to make sure that I am cloaking for anything that leaves an HTTP_REFERRER of nothing combined with known spider IP addresses. I've been working on a database of 9 months of data, and it's coming along. Together with that, we'll see if anything gets cloaked or not. It's more for research and programming knowledge than anything, call it academic if you will.

The important part to me is the blank referrer variable, because to my knowledge only alexa fakes its user agent. In case an employee of a search engine, for example, is browsing the pages, I want to give them the correct page, and not the cloaked version.

So that example you gave, TPK, will give out the info only to somebody from the IP range in a database, and a blank referrer? Thanks for jumping in so quickly.

theperlyking

12:14 am on Jun 29, 2001 (gmt 0)

10+ Year Member




will give out the info only to somebody from the IP range in a database, and a blank referrer?

I think that would be
if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && (! $ENV{'HTTP_REFERRER'}))
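
On the `=~ //` test from the original post: as far as I know, an empty pattern in Perl does not test for an empty string. It reuses the last pattern that matched successfully (or matches anything if nothing has matched yet). A minimal sketch of an explicit blank check follows; `referrer_is_blank` is a made-up helper name, not part of the script above, and you would pass it whichever environment value you are checking.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when the value is undefined or an empty
# string, false (0) otherwise. Unlike a bare "!$value", the string "0"
# is not treated as blank.
sub referrer_is_blank {
    my ($value) = @_;
    return (!defined $value || $value eq '') ? 1 : 0;
}

for my $value (undef, '', 'http://www.example.com/') {
    my $shown = defined $value ? "'$value'" : 'undef';
    print "$shown: ", (referrer_is_blank($value) ? 'blank' : 'set'), "\n";
}
```

Whether a blank referrer is a good spider signal is a separate question; the sketch only shows the test itself.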

Bolotomus

10:48 am on Jul 11, 2001 (gmt 0)

10+ Year Member



Since we're going to get into a contest to see who can print the file the easiest in Perl I'll throw in my two bits ;)

Also note that the environment variable is spelled HTTP_REFERER (a single R in the middle), not the way you'd find 'referrer' in the dictionary.

if (($ENV{'REMOTE_ADDR'} =~ /^$ipaddress/) && !$ENV{'HTTP_REFERER'})
{
open(FILE, "$cloak_dir/$file") || die;
local $/ = undef;
print <FILE>;
close FILE;
} else {
# do something else
}
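
For anyone adapting the snippet above, here is a hedged variation using a lexical filehandle and three-argument open, which is the style usually recommended in newer Perl. `slurp_file` is a hypothetical helper name, and the temporary file exists only so the sketch runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): return a file's contents
# as a single string.
sub slurp_file {
    my ($path) = @_;
    # Three-argument open with a lexical handle; report why it failed.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                 # undefined record separator: read to end of file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Demo with a throwaway file so the example is self-contained.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "<html>cloaked page</html>\n";
close($tmp_fh);

print slurp_file($tmp_path);
```

In the cloaking script the call would look something like `print slurp_file("$cloak_dir/$file");` inside the `if` block.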

About the strategy... it's very hard to tell if a search engine is among a list of "known IP addresses." They change their IPs a lot, and they have lots of banks of IPs that they don't use much which you probably wouldn't have thought to code for. Traditionally the best method to check for a search engine was to look at the user agent, e.g. if it's Googlebot or EmailExtractor you know it's a spider.
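
A rough sketch of the user agent check described above. The two names in the list are the ones mentioned in this post; the helper name `looks_like_spider` and the sample user agent strings are assumptions for illustration. Keep in mind the user agent is supplied by the client, so it can be faked.

```perl
use strict;
use warnings;

# Illustrative list only; a real one would be much longer.
my @spider_agents = ('Googlebot', 'EmailExtractor');

# Hypothetical helper: true (1) if the user agent contains any listed
# name, compared case-insensitively as a plain substring.
sub looks_like_spider {
    my ($user_agent) = @_;
    return 0 unless defined $user_agent;
    for my $name (@spider_agents) {
        return 1 if index(lc($user_agent), lc($name)) >= 0;
    }
    return 0;
}

for my $ua ('Googlebot/2.1', 'Mozilla/4.0 (compatible; MSIE 5.5)') {
    print "$ua: ", (looks_like_spider($ua) ? 'spider' : 'not a known spider'), "\n";
}
```

In a CGI script the value to pass in would normally be `$ENV{'HTTP_USER_AGENT'}`.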

Bolot

volatilegx

5:10 pm on Jul 12, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



be careful when pattern matching the IP to the IP in your database...

your $IP =~ /^$IPfromdatabase/

match won't work properly...

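because a robot IP of 122.122.122.122 would also match an entry of 122.122.122.12 from your database. The pattern is only anchored at the start, so it matches any address that begins with those characters.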

You might also want to build wildcard recognition into it so you could just put class Cs into your database.
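
To make the problem concrete, here is a small sketch. It assumes a made-up convention where a database entry ending in '.' means "anything in this block" and any other entry must match exactly; `ip_matches_entry` is a hypothetical name, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical matcher: entries ending in '.' are treated as prefixes
# (a truncated class C such as "122.122.122."); anything else must be
# an exact string match.
sub ip_matches_entry {
    my ($ip, $entry) = @_;
    return 0 unless defined $ip && defined $entry && length $entry;
    if (substr($entry, -1) eq '.') {
        return index($ip, $entry) == 0 ? 1 : 0;
    }
    return $ip eq $entry ? 1 : 0;
}

my $visitor = '122.122.122.122';

# The start-anchored regex accepts the longer address by accident.
print "regex prefix match: ", ($visitor =~ /^122.122.122.12/ ? 'yes' : 'no'), "\n";

# The exact comparison does not.
print "exact entry match:  ", (ip_matches_entry($visitor, '122.122.122.12') ? 'yes' : 'no'), "\n";

# A truncated entry still covers the whole block.
print "class C match:      ", (ip_matches_entry($visitor, '122.122.122.') ? 'yes' : 'no'), "\n";
```

Plain string comparison sidesteps regex metacharacters entirely, which is one reason to prefer it when the entries are literal addresses.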

littleman

12:43 am on Nov 8, 2001 (gmt 0)



If you wanted to keep it basic AND match specific IPs you could just slap a '$' into the script:
$IP =~ /^$IPfromdatabase$/

Otherwise the above will work fine with class Cs if they are stored truncated like so:
111.222.111.
223.223.22.
64.127.23.

The pattern matching could be sped up and tightened a bit by escaping the '.'s with a backslash, since an unescaped '.' matches any character. So we could go:
111\.222\.111\.
223\.223\.22\.
64\.127\.23\.
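
If you'd rather keep the plain truncated entries in the database, Perl can do the escaping at match time with `\Q...\E`. This is a sketch under that assumption; `entry_to_pattern` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: build a start-anchored pattern with the database
# entry escaped, so each '.' only matches a literal dot.
sub entry_to_pattern {
    my ($entry) = @_;
    return qr/^\Q$entry\E/;
}

my $pattern = entry_to_pattern('64.127.23.');

for my $ip ('64.127.23.9', '64.127.230.9') {
    my $escaped   = ($ip =~ $pattern)      ? 'match' : 'no match';
    my $unescaped = ($ip =~ /^64.127.23./) ? 'match' : 'no match';
    print "$ip escaped: $escaped, unescaped: $unescaped\n";
}
```

The second address shows why the escaping matters for correctness and not only speed: unescaped, the final '.' happily matches the '0' in 230.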

This type of basic IP cloaking is the way to go if you are going to cloak on an NT box with Perl. On NT I've found the fewer external calls the better. In fact I believe you would be better off with a completely self-contained script that stored the HTML internally instead of calling up an external file. On a *nix box you could get away with a lot of girth. I think the biggest cloaking application I wrote was close to 50K of Perl.
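
One way the self-contained idea could look, purely as a sketch: the page bodies live in the script as here-documents, so nothing is opened at request time. The hash keys, the `page_for` helper and the placeholder HTML are all made up for illustration.

```perl
use strict;
use warnings;

# Pages stored inside the script itself; no external file is read.
my %pages = (
    spider  => <<'HTML',
<html><body>Page for spiders</body></html>
HTML
    visitor => <<'HTML',
<html><body>Page for visitors</body></html>
HTML
);

# Hypothetical helper: pick the page body based on a true/false flag.
sub page_for {
    my ($is_spider) = @_;
    return $is_spider ? $pages{spider} : $pages{visitor};
}

print "Content-type: text/html\n\n";
print page_for(0);
```

The trade-off is that every page change means editing the script, so this probably only suits a handful of pages.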