Zelig/0.4 alpha2

Yet a new bot?

         

FineWare

1:04 am on Jan 30, 2003 (gmt 0)

10+ Year Member



Received this one today from 216.95.210.2. It grabbed the index and went on its way.

pendanticist

11:22 pm on Jan 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to keep a close eye on your access_log files for these kinds of hits. In my experience, more often than not, robots that use this 'hit-and-run' robots.txt tactic will return and harvest everything you have, and do it very, very fast.
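A rough sketch of one way to watch for this, assuming the server writes Apache common or combined log format. The function name `hit_and_run_clients` and the set of "probe" paths are illustrative assumptions, not something from this thread:

```python
import re
from collections import defaultdict

# Matches the start of an Apache common/combined log line:
# host ident user [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)')

# Assumption: these paths count as a "probe" rather than a real visit.
PROBE_PATHS = {"/robots.txt", "/", "/index.html"}


def hit_and_run_clients(lines):
    """Return hosts whose only requests were for robots.txt or the index page."""
    paths_by_host = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            host, path = match.groups()
            paths_by_host[host].add(path)
    # A host qualifies when every path it requested is a probe path.
    return sorted(
        host for host, paths in paths_by_host.items() if paths <= PROBE_PATHS
    )
```

Feeding it the lines of an access_log (for example `hit_and_run_clients(open("access_log"))`) gives a list of addresses worth checking again on later days.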

Pendanticist.

FineWare

3:44 am on Feb 1, 2003 (gmt 0)

10+ Year Member



Yup. But not anymore if it has Zelig as an agent. I have a list of well over 200 agents. If they try to grab anything but robots.txt, they get this interesting HTML message:

"Access Denied

We're sorry. The software you are using to access our website is not allowed. Some examples of this are e-mail harvesting programs, web crawlers, spiders, and programs that will copy websites to your hard drive. If you feel you have received this message in error, please send an e-mail addressed to the webmaster. Your IP Address has been logged. Prepare to be Slashdotted..."
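The rule described above (listed agents may read robots.txt and nothing else) can be sketched roughly as follows. This is only an illustration of the logic, not the poster's actual setup; `BLOCKED_AGENT_PREFIXES` and `is_denied` are made-up names, and the real list of 200+ agents is not shown in this thread:

```python
# Hypothetical stand-in for the list of unwanted agents, lowercase prefixes.
BLOCKED_AGENT_PREFIXES = ("zelig",)


def is_denied(user_agent, path):
    """Return True when the request should get the 'Access Denied' page."""
    if path == "/robots.txt":
        return False  # every client may still read robots.txt
    agent = user_agent.lower()
    # Assumption: agents are matched by the start of the User-Agent string.
    return any(agent.startswith(prefix) for prefix in BLOCKED_AGENT_PREFIXES)
```

For instance, `is_denied("Zelig/0.4 alpha2", "/index.html")` is true, while the same agent asking for `/robots.txt` is let through.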

jdMorgan

4:26 am on Feb 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FineWare,

I like that approach, and use a similar one. All you'll get on my sites is:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-US">
<title>Access Denied</title>
</head>
<body>
<h1>Access denied</h1>
<!--#exec cmd="sleep 5" -->
<p>Please click <a href="/403i.html">here</a> for more information.</p>
</body>
</html>

The purpose of this minimal text is to keep the custom 403 document small, so that the really dumb 'bots don't suck up too much bandwidth if they're too stupid to take a hint, and many are. The SSI sleep command slows 'em down a little so they don't hammer the server despite the small file size.
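For anyone not running SSI, the same two ideas (a tiny body plus a short delay) might look something like this. It is a sketch under assumptions: `denied_response` and `DENIED_BODY` are hypothetical names, and the `sleep` parameter exists only so the delay can be swapped out when trying the function:

```python
import time

# Deliberately tiny page, so a blocked 'bot costs very little bandwidth.
DENIED_BODY = (
    b"<html><head><title>Access Denied</title></head>"
    b"<body><h1>Access denied</h1></body></html>"
)


def denied_response(delay_seconds=5, sleep=time.sleep):
    """Wait, then return (status, headers, body) for a minimal 403 page."""
    sleep(delay_seconds)  # same role as the SSI "sleep 5" above
    headers = {
        "Content-Type": "text/html; charset=iso-8859-1",
        "Content-Length": str(len(DENIED_BODY)),
    }
    return 403, headers, DENIED_BODY
```

Keep in mind that the delay also ties up whatever is serving the request for those seconds, so it is a trade-off rather than a free win.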

If the visitors follow the link, they get an explanation like yours, just in case a curious-but-not-malicious visitor gets trapped. Very few 'bots ever follow that link.

Jim