Yahoo Publisher Contextual Advertising Network Forum

  posting off  
Perl CGI scripts/techniques to filter foreign visitors
my home-brewed solution...
berto




msg:1577465
 6:25 pm on Feb 17, 2006 (gmt 0)

Here is my approach to filtering foreign visitors. Make of it what you will.

STEP 1:

Visit the MaxMind website and download their MaxMind GeoLite Country Database (CSV format).

STEP 2:

Run the following Perl script, suitably modified to your (local path) situation:

findus.pl:


#!/usr/bin/perl
# findus.pl: list the a.b network prefixes that MaxMind assigns entirely to the U.S.
use strict;
use warnings;

my (%usnet, %nonusnet);

open(MAXMIND, "<", "GeoIPCountryWhois.csv") or die "Cannot open GeoIPCountryWhois.csv: $!";
while (<MAXMIND>) {
    # Fields: "start IP","end IP","start number","end number","country code","country name"
    if (/^"(\d+)\.(\d+)\.\d+\.\d+","(\d+)\.(\d+)\.\d+\.\d+","\d+","\d+","(\w+)","([^"]+)"/) {
        my ($neta1, $netb1, $neta2, $netb2, $cntry, $country) = ($1, $2, $3, $4, $5, $6);
        #printf "%s.%s %s.%s %s %s\n", $neta1, $netb1, $neta2, $netb2, $cntry, $country;
        my $isus = ($cntry =~ /^us$/i) && ($country =~ /united states/i);
        for (my $na = $neta1; $na <= $neta2; $na++) {
            # Only the first and last "a" values of a range are partial;
            # every "a" in between covers b = 0..255.
            my $lo = ($na == $neta1) ? $netb1 : 0;
            my $hi = ($na == $neta2) ? $netb2 : 255;
            for (my $nb = $lo; $nb <= $hi; $nb++) {
                if ($isus) {
                    $usnet{"$na.$nb"} = 1;
                } else {
                    $nonusnet{"$na.$nb"} = 1;
                }
            }
        }
    } else {
        #printf "BAD LINE: %s\n", $_;
    }
}
close(MAXMIND);

# Print the prefixes that appear in U.S. ranges and in no non-U.S. range.
foreach my $net (keys %usnet) {
    if (! $nonusnet{$net}) {
        printf "%s\n", $net;
    }
}

exit 0;

This script will output something like:


9.132
206.42
147.2
130.222
167.84
28.89
16.161
7.98
68.1
...

Save this output to a file, e.g., /var/www/cgi-bin/us.dat (or wherever you keep your cgi-bin scripts).
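To see how one address range turns into a list of a.b prefixes, here is a small stand-alone sketch. The helper name prefixes_in_range and the sample range are made up for illustration; it is not part of findus.pl, just the same idea in isolation, with the range clipped at both ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of findus.pl): expand one address range,
# given as the first two numbers of its start and end addresses,
# into the a.b prefixes it touches.
sub prefixes_in_range {
    my ($a1, $b1, $a2, $b2) = @_;
    my @prefixes;
    for my $na ($a1 .. $a2) {
        # Clip b at the start of the first "a" and at the end of the last one.
        my $lo = ($na == $a1) ? $b1 : 0;
        my $hi = ($na == $a2) ? $b2 : 255;
        for my $nb ($lo .. $hi) {
            push @prefixes, "$na.$nb";
        }
    }
    return @prefixes;
}

# A made-up range running from 12.254.x.x to 13.1.x.x
print join(" ", prefixes_in_range(12, 254, 13, 1)), "\n";
# prints: 12.254 12.255 13.0 13.1
```

A prefix that shows up for both a U.S. range and a non-U.S. range is a "mixed" network, which is why the final loop of findus.pl skips it.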

STEP 3:

Create these two Perl scripts in your cgi-bin directory:

/var/www/cgi-bin/isus.pl:


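# isus.pl: isus() returns 1 if the visitor's a.b network prefix is listed in us.dat, else 0.
use strict;
use warnings;

sub isus {
    my $isus = 0;
    my $user = $ENV{REMOTE_ADDR} || '';
    my %us;

    # If us.dat cannot be read, treat the visitor as non-U.S. (the safe default).
    open(US, "<", "/var/www/cgi-bin/us.dat") or return 0;
    while (<US>) {
        chomp;
        $us{$_}++;
    }
    close(US);
    # Compare only the first two numbers of the visitor's address.
    if ($user =~ /^(\d+\.\d+)\./) {
        my $nanb = $1;
        if ($us{$nanb}) {
            $isus = 1;
        }
    }
    return($isus);
}

1;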

/var/www/cgi-bin/ypn.pl:


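#!/usr/bin/perl
# ypn.pl: print the ad code named by the first argument for U.S. visitors,
# and the ad code named by the second argument for everyone else.
use strict;
use warnings;

my $cgidir = "/var/www/cgi-bin";

require "$cgidir/isus.pl";

my $input;
if (isus()) {
    $input = "$cgidir/affad.pl $ARGV[0]";
} else {
    $input = "$cgidir/affad.pl $ARGV[1]";
}

# The trailing "|" runs the command and lets us read its output.
open(AFFAD, "$input |") or die "Cannot run $input: $!";
my @AFFAD = <AFFAD>;
close(AFFAD);
my $affad = join('', @AFFAD);
print $affad;
exit;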

STEP 4:

You must create a third script, affad.pl. When invoked as

affad.pl ysw

it outputs the JavaScript code for a YPN wide skyscraper. When invoked as

affad.pl gsw

it outputs the JavaScript code for an Adsense wide skyscraper. Add additional cases (for other YPN and Adsense ad types) as necessary.

(Actually, my version of affad.pl is a very elaborate script that outputs code for YPN, Adsense, commercial affiliates, and other things besides.)
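For anyone starting from scratch, a bare-bones affad.pl might look something like the sketch below. The two snippets are placeholders, not real ad code, and the %adcode table and adcode_for function are names invented for this example; paste in whatever JavaScript each ad network gives you.

```perl
#!/usr/bin/perl
# Minimal affad.pl sketch. The snippets below are placeholders, NOT real ad code:
# replace each one with the JavaScript issued by the ad network.
use strict;
use warnings;

my %adcode = (
    ysw => "<!-- paste YPN wide skyscraper JavaScript here -->\n",
    gsw => "<!-- paste Adsense wide skyscraper JavaScript here -->\n",
);

# Return the snippet for an ad type, or an empty string for unknown types.
sub adcode_for {
    my ($type) = @_;
    return '' unless defined $type && exists $adcode{$type};
    return $adcode{$type};
}

print adcode_for($ARGV[0]);
```

Returning an empty string for an unknown ad type means a typo in the SSI directive yields a blank spot on the page rather than an error message.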

STEP 5:

Enable SSI on your web server, if you haven't already done so. (Sorry, you'll have to figure that one out by yourself.)
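As a rough starting point only: on an Apache 2.x server, SSI is typically switched on with directives along these lines. The directory path and the .shtml extension are assumptions; adjust them to your own setup. Note that "exec cmd" needs the full Includes option, since IncludesNOEXEC disables it.

```apache
# Allow SSI, including "exec cmd", for pages under this directory (path is an example)
<Directory "/var/www/html">
    Options +Includes
</Directory>

# Parse .shtml files for SSI directives
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
```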

STEP 6:

In your HTML code, add a directive like:


<!--#exec cmd="/var/www/cgi-bin/ypn.pl ysw gsw" -->

STEP 7:

To test the code, visit the page where you added the "exec cmd" directive. Hopefully, you will see a YPN wide skyscraper (or whatever ad you specified as the first argument to the ypn.pl program). This assumes you are in the U.S., and the first two numbers of your network address fall within the us.dat database. Then, temporarily delete your network address from the us.dat file, and do a page refresh. You should now see an Adsense wide skyscraper (or whatever). Remember to undo your previous edit, restoring your network address to the us.dat database.

STEP 8:

Start adding the SSI directive to other pages on your website(s).

STEP 9:

Start earning YPN revenue (hopefully).

NOTES:

--The findus.pl program, and the resulting us.dat file, include class B networks (that is, a.b.*.* address blocks) assigned entirely to the U.S. (according to MaxMind). They exclude class B networks assigned partly to the U.S. and partly to foreign countries. Yes, you will lose a considerable number of U.S. visitors this way. But it plays safe, follows the KISS principle, and speeds up execution (see below). Modify the code as necessary to deal with mixed-location class B networks, but this complicates things, increases the likelihood of false negatives, increases the size of your us.dat file, and slows down ad serving. (Using the isus.pl function, I wrote a separate script to analyze my webserver logs from the past two months. It shows that one of my sites has ~30+% U.S. visitors, and the other ~35+% U.S. visitors. The actual U.S. visitor counts/percentages are greater than this, for reasons stated above, but not by much. Given the nature of my sites, it seems entirely plausible that most of my visitor traffic is "foreign".)

--I have verified the correctness of the us.dat file by manually inspecting a hundred or more randomly selected cases, comparing the us.dat entries (or omissions) against the source GeoLite Country Database. Sort the us.dat file if you wish, but that has no bearing on its operation.

--I have verified operation of the ypn.pl and isus.pl tandem many, many times, by removing (then restoring) my network address(es) as described above.

--I was surprised by how fast this all executes, even on my underpowered web server (1.8 GHz Intel Celeron, 512 MB RAM, Apache, SUSE Linux 9.3). I expected to see a noticeable delay for ads appearing, but instead they show in a split second. Fear that my web server was not up to the task of real-time, on-the-fly processing was the biggest reason I had refrained from attempting foreign visitor filtering before now.
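The log analysis mentioned in the notes is not shown in the post. A sketch of the general idea, assuming Apache common log format (client address as the first field), could look like this. The prefixes, the sample log lines, and the count_us function are all invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up prefixes standing in for the contents of us.dat.
my %us = map { $_ => 1 } qw(68.1 206.42);

# Count requests whose client address (first field of each log line)
# has a listed a.b prefix. Returns (U.S. hits, total parsed lines).
sub count_us {
    my ($us, @lines) = @_;
    my ($hits, $total) = (0, 0);
    for my $line (@lines) {
        next unless $line =~ /^(\d+\.\d+)\.\d+\.\d+\s/;
        $total++;
        $hits++ if $us->{$1};
    }
    return ($hits, $total);
}

# Invented sample lines in Apache common log format.
my @log = (
    '68.1.20.30 - - [17/Feb/2006:18:25:00 +0000] "GET / HTTP/1.1" 200 5120',
    '81.2.69.160 - - [17/Feb/2006:18:26:10 +0000] "GET /a.html HTTP/1.1" 200 2048',
    '206.42.7.9 - - [17/Feb/2006:18:27:45 +0000] "GET /b.html HTTP/1.1" 200 1024',
);

my ($hits, $total) = count_us(\%us, @log);
printf "%d of %d requests (%.1f%%) from listed U.S. networks\n",
    $hits, $total, $total ? 100 * $hits / $total : 0;
```

This counts requests rather than unique visitors; deduplicating by address first would give a visitor count instead.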

This all works for me. Your Mileage May Vary. I make no claims, offer no warranties in your case. Make of this as you wish. And, please, I will not respond to any "stickies" asking for technical support. I will only clarify issues in this WW forum as needed.

 

freeflight2




msg:1577466
 8:07 pm on Feb 17, 2006 (gmt 0)

just use MaxMind's mod_geoip, then check that the environment variable GEOIP_COUNTRY_CODE == 'US'

resolves 10k+ IPs/sec
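In a Perl CGI script, that check could be sketched as below. It assumes mod_geoip is loaded and enabled (GeoIPEnable On) so that Apache exports GEOIP_COUNTRY_CODE to the script's environment; the function name isus_geoip is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumes mod_geoip is enabled, so Apache sets GEOIP_COUNTRY_CODE
# in the environment of the script it runs.
sub isus_geoip {
    my $code = $ENV{GEOIP_COUNTRY_CODE};
    # A missing variable (lookup failed or module off) counts as non-U.S.
    return (defined $code && $code eq 'US') ? 1 : 0;
}

print isus_geoip() ? "U.S. visitor\n" : "non-U.S. or unknown visitor\n";
```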

berto




msg:1577467
 3:12 pm on Feb 21, 2006 (gmt 0)

just use MAXMIND'S mod_geoip

I might do that someday, if speed and extreme accuracy ever become issues.

On the other hand, home brewing this was a fun coding exercise, and I like flexibility and having complete control, tinkering and customizing for my own needs.

I just did a quick analysis of the GeoLite Country Database. Considering only the first two address numbers, the breakdown is:

--U.S. networks, 20,346
--non U.S. networks, 14,462
--mixed (U.S. & non U.S.) networks, 2,374

With my approach, I get a false negative (is this a U.S. visitor?) ~6% (or less) of the time. (Given how I do it, there should be no false positives.) That is to say, my quick-and-dirty solution gets it right (at least) ~94% of the time. Importantly, the chance is near zero that I will incorrectly identify a foreign visitor as a U.S. visitor (then serve him/her YPN ads).
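The ~6% figure follows from the three counts above, as this short calculation shows. Keep in mind these are shares of a.b networks, not of actual visitors, so real traffic may split differently. The share function is just a helper for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts of a.b networks, taken from the breakdown above.
my %count = (us => 20_346, nonus => 14_462, mixed => 2_374);
my $total = $count{us} + $count{nonus} + $count{mixed};   # 37,182 networks

# Percentage of all networks, rounded to one decimal place.
sub share {
    my ($n) = @_;
    return sprintf("%.1f", 100 * $n / $total);
}

printf "U.S. only:     %s%%\n", share($count{us});      # 54.7%
printf "non-U.S. only: %s%%\n", share($count{nonus});   # 38.9%
printf "mixed:         %s%%\n", share($count{mixed});   # 6.4%
```

Only visitors from the mixed networks can be misclassified, and only the U.S. ones among them, which is where the "or less" comes from.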

So, for now, for me (Your Mileage Does Vary), my solution is accurate enough, fast enough--good enough.
