Forum Moderators: coopster & phranque

Global Sniffing

         

toolman

6:46 pm on Nov 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a site that is sitting on top of the search results, and for legal reasons the company can no longer use the domain name.

I'm wondering if there is a way to sniff globally for browser requests and present them a "site shut down" page, while still giving spiders free rein, so maybe I can get a few more months of pulling my buddies up from below.

Key_Master

10:09 pm on Nov 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



allowbots.pl

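#!/usr/local/bin/perl
use strict;
use warnings;

# Lowercased User-Agent of this request (empty string if none was sent).
my $agent = lc($ENV{'HTTP_USER_AGENT'} || '');
my @robot = ("googlebot", "scooter", "gulliver");    # Robots Party List

# Keep the result in a separate flag: a foreach loop variable is restored
# when the loop ends, so a value assigned to it inside the loop is lost.
my $is_robot = 0;
foreach my $bot (@robot) {
    if ($agent =~ /\Q$bot\E/) {
        $is_robot = 1;
        last;
    }
}

# Placeholder targets; the real URLs were cut off in the post.
my $url = $is_robot ? "http://send_spiders_here.com/"
                    : "http://send_browsers_here.com/";
print "Pragma: no-cache\n";
print "Location: $url\n\n";
exit;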

Here's something I threw together that you could use to replace the index page. It's not global, though. That would have to be done through .htaccess, if I understand your question correctly.

ggrot

10:21 pm on Nov 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That also redirects spiders. What you really need is .htaccess.

littleman

10:39 pm on Nov 25, 2001 (gmt 0)



Perhaps think of it as browser exclusion instead of bot inclusion. Maybe deny every request whose User-Agent contains "Mozilla" and that also sends a referer, a keep-alive connection, or gzip encoding. That will let virtually all the bots through but exclude all but the most esoteric browsers.
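A minimal Perl sketch of that browser-exclusion heuristic, assuming "keep-alive" means a Connection header containing keep-alive and "gzip encoding" means gzip listed in Accept-Encoding. The sub name `looks_like_browser` and its hash keys are hypothetical, chosen for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when a request looks like a browser under
# the heuristic above (a "Mozilla" User-Agent plus a referer, a keep-alive
# connection, or gzip in Accept-Encoding), otherwise 0.
sub looks_like_browser {
    my (%h) = @_;
    my $ua         = $h{user_agent}      // '';
    my $referer    = $h{referer}         // '';
    my $connection = $h{connection}      // '';
    my $encoding   = $h{accept_encoding} // '';

    return 0 unless $ua =~ /Mozilla/;            # no "Mozilla": treat as a bot
    return 1 if length $referer;                 # sent a referer
    return 1 if $connection =~ /keep-alive/i;    # asked for keep-alive
    return 1 if $encoding   =~ /\bgzip\b/i;      # accepts gzip
    return 0;                                    # Mozilla alone is not enough
}

# Example: a bot without those headers gets through, a browser does not.
my $bot     = looks_like_browser(user_agent => 'Googlebot/2.1');
my $browser = looks_like_browser(
    user_agent      => 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)',
    accept_encoding => 'gzip, deflate',
);
print "bot=$bot browser=$browser\n";
```

In a CGI script the values would typically come from `$ENV{HTTP_USER_AGENT}`, `$ENV{HTTP_REFERER}`, `$ENV{HTTP_CONNECTION}` and `$ENV{HTTP_ACCEPT_ENCODING}`; whether the server passes all of these through is an assumption to check on your own host.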

toolman

10:59 pm on Nov 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Key_Master.
Littleman, you know, really all I'm worried about is Google, so banning by UA shouldn't be a problem. Is there a regex that would drop everything but Googlebot?

littleman

11:03 pm on Nov 25, 2001 (gmt 0)



.htaccess or Perl? Also, Google is playing a few UA games, so you had better check both IP and UA.
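A rough Perl sketch of checking both IP and UA, assuming a hand-maintained list of dotted address prefixes. The prefixes below are documentation ranges standing in for real crawler addresses, and `ip_allowed` and `is_googlebot` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical allow-list: documentation ranges, NOT real Google addresses.
my @google_prefixes = ('192.0.2', '198.51.100');

# True if $ip starts with one of the prefixes on whole-octet boundaries,
# so '192.0.2' matches 192.0.2.7 but not 192.0.20.7.
sub ip_allowed {
    my ($ip, @prefixes) = @_;
    for my $p (@prefixes) {
        return 1 if index("$ip.", "$p.") == 0;
    }
    return 0;
}

# Require BOTH a Googlebot User-Agent (case-insensitive regex) and an
# address from the allow-list; everything else is treated as a browser.
sub is_googlebot {
    my ($ua, $ip) = @_;
    return 0 unless ($ua // '') =~ /googlebot/i;
    return ip_allowed($ip // '', @google_prefixes);
}

print is_googlebot('Googlebot/2.1', '192.0.2.7') ? "spider\n" : "browser\n";
```

From a CGI script this would be called as something like `is_googlebot($ENV{HTTP_USER_AGENT}, $ENV{REMOTE_ADDR})`. Requiring both tests means a spoofed UA from an unlisted address still gets the browser page.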

toolman

11:07 pm on Nov 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>>google is playing a bit of ua games

Oh yeah? I haven't been on the cutting edge for a month or so. I do have a list of IPs and UAs, though. Could it be done globally from .htaccess? I have mod_rewrite too.

littleman

11:47 pm on Nov 25, 2001 (gmt 0)



>.htaccess

SetEnvIfNoCase User-Agent "googlebot" googlebot=1
Order Allow,Deny
Allow from [google class C here]
Allow from [another google class C here]
Allow from env=googlebot

That will work. You can use truncated IPs to match whole class Cs, like this:
Allow from 222.111.222
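To make the access decision of that .htaccess explicit, here is a small Perl model of it, assuming the UA test is case-insensitive (SetEnvIfNoCase; plain SetEnvIf is case-sensitive). With `Order Allow,Deny` and no Deny lines, a request is let in if ANY Allow line matches and is denied by default otherwise, so the env=googlebot line alone admits a matching UA from any address. The prefixes and sub names are hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the "[google class C here]" lines.
my @allow_prefixes = ('192.0.2', '198.51.100');

# Case-insensitive UA test that sets the "googlebot" flag.
sub googlebot_env {
    my ($ua) = @_;
    return (($ua // '') =~ /googlebot/i) ? 1 : 0;
}

# Order Allow,Deny with only Allow lines: allow if ANY Allow matches,
# otherwise fall through to the default deny.
sub apache_allows {
    my ($ip, $ua) = @_;
    for my $p (@allow_prefixes) {
        # Partial IPs match on whole octets ("Allow from 192.0.2").
        return 1 if index("$ip.", "$p.") == 0;
    }
    return 1 if googlebot_env($ua);    # "Allow from env=googlebot"
    return 0;                          # default deny
}

print apache_allows('203.0.113.9', 'Googlebot/2.1'), "\n";
```

The example request comes from an unlisted address but is still allowed, because the Allow lines are ORed together. If you want the IP-and-UA combination suggested earlier, that has to be expressed some other way (for instance in a CGI check), not by stacking Allow lines.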

You know they are reading this, right?

rcjordan

11:53 pm on Nov 25, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>You know they are reading this right?

We're redirecting them, LM. Google gets served this thread as toolman's recipe for holiday eggnog. ;)

toolman

1:24 am on Nov 26, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OOOps. I guess I better not do this at toolman.com ;)