I have a "spider trap" on my site, a file that is excluded in the robots.txt, but crawlers that ignore the robots.txt will run... and if run, it logs their IP, and bans them...
So, look at this:
216.239.33.5 - - [07/Sep/2002:06:18:24 -0600] "GET / HTTP/1.0" 200 12614 "-" "SIE-C3I/3.0 UP/4.1.16m (Google WAP Proxy/1.0)"
216.239.33.5 - - [07/Sep/2002:06:19:12 -0600] "GET /secret_spider-trap.cgi HTTP/1.0" 200 152 "-" "SIE-C3I/3.0 UP/4.1.16m (Google WAP Proxy/1.0)"
That is all it got, but it was enough to ban him!
Should I unban this IP, contact Google, anything like that?
dave
[google.com...]
Probably not the place for Perl questions, so sorry if this is inappropriate...
I wrote this:
$visitor_ua = $ENV{'HTTP_USER_AGENT'};
# the Google WAP proxy's user-agent string contains "WAP"
if ($visitor_ua =~ /WAP/) {
    print "Content-type: text/html\n\n";
    print "<html>\n";
    print "<head>\n";
    print "<title>Forward On</title>\n";
    print "</head>\n";
    print "<body>\n";
    print "<p><b>Please <A HREF=\"http://www.mydomain.com/\">Click Here</A> to continue!</b></p>\n";
    print "</body>\n";
    print "</html>\n";
    exit;
}
else {
    CODE
}
and inserted above the logging part of the trap... look good?
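For clarity, a rough sketch of the intended ordering (the trailing comment stands in for the existing trap code, which isn't shown here):

# The WAP check comes first, so the Google proxy gets a plain link page
# and exits before the trap's logging/ban code ever runs.
my $visitor_ua = $ENV{'HTTP_USER_AGENT'} || '';
if ($visitor_ua =~ /WAP/) {
    print "Content-type: text/html\n\n";
    print qq{<p><b>Please <A HREF="http://www.mydomain.com/">Click Here</A> to continue!</b></p>\n};
    exit;    # stop here: no logging, no ban
}
# ...existing trap code (log the IP, append the ban) continues here...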
dave
Perfect- thanks!
I am a bit shaky on regular expressions; can you tell me if this is correct for matching all of those addresses:
if ($visitor_ua =~ '(216.239.33.5|216.239.35.4|216.239.37.5|216.239.39.5)' { code blah blah
Sorry, I get mixed up sometimes whether to use single or double quotes, or the ^ anchor...
Thank you!
Dave
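A sketch of how that match could be written, assuming the intent is to test the visitor's IP address (REMOTE_ADDR) rather than the user-agent string:

my $visitor_ip = $ENV{'REMOTE_ADDR'} || '';
# \. matches a literal dot; ^ and $ anchor the match to the whole address
if ($visitor_ip =~ /^(216\.239\.33\.5|216\.239\.35\.4|216\.239\.37\.5|216\.239\.39\.5)$/) {
    # code blah blah
}

Using slashes as the pattern delimiter sidesteps the single-versus-double-quote question entirely, and the anchors keep 216.239.33.5 from also matching something like 216.239.33.55.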