Unwanted Spiders - how to limit/kill access

simple ways to code, spiders list etc.

         

lak12

2:08 am on Jun 1, 2002 (gmt 0)




Hi Everybody!
I have a small issue (not a problem yet). I recently opened two new sites, and one day I got about 2000 hits within 20 minutes from a generic spider looking for the /system32/root.exe file.
It was looking for a WinNT file and I am on FreeBSD, so sure enough every request hit my 404 page.

It's not a problem, but I hate to give away my bandwidth. I'd rather serve users and known good spiders than totally stupid spiders.

So I wrote a small addition to missing.cgi in Perl (I handle all errors there).
One hit to get any M$ Windows file - I write an IP log.
Next time program sees a log - it adds a line to .htaccess file: deny from $ip
Problem solved.

Once a day I just replace .htaccess file with a fresh version with no "deny from.." lines...
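
The flow described above can be sketched roughly as follows. This is Python, not the Perl used in the actual missing.cgi, and it is only an illustration under assumptions: the probe pattern, the function names, and the file paths passed in are all hypothetical, and the real script may work differently.

```python
import re
import shutil
from pathlib import Path

# Assumption: these path fragments are treated as "Windows file" probes.
PROBE_PATTERN = re.compile(r"(root\.exe|cmd\.exe|/system32/)", re.IGNORECASE)


def is_windows_probe(request_path: str) -> bool:
    """Return True if the requested path looks like a Windows-only file."""
    return PROBE_PATTERN.search(request_path) is not None


def handle_missing(request_path: str, ip: str, log_file: Path, htaccess: Path) -> str:
    """Process one 404 hit; return 'ignored', 'logged', or 'denied'."""
    if not is_windows_probe(request_path):
        return "ignored"
    seen = set(log_file.read_text().split()) if log_file.exists() else set()
    if ip not in seen:
        # First probe from this address: only record it in the log.
        with log_file.open("a") as fh:
            fh.write(ip + "\n")
        return "logged"
    # Address is already in the log: add a deny line unless one exists.
    deny_line = f"deny from {ip}"
    text = htaccess.read_text() if htaccess.exists() else ""
    if deny_line not in text.splitlines():
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with htaccess.open("a") as fh:
            fh.write(prefix + deny_line + "\n")
    return "denied"


def reset_htaccess(template: Path, htaccess: Path) -> None:
    """Daily reset: overwrite .htaccess with a clean copy without deny lines."""
    shutil.copyfile(template, htaccess)
```

A scheduled job would call reset_htaccess once a day; note that this sketch leaves the IP log in place, so a returning offender is denied again on its first new probe.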

My question is: is there any place where I can get the identifiers of the "no good" spiders? I mean spiders that just collect e-mail addresses, just eat up traffic, etc.
I have 27 domains to maintain and hate to give away resources for nothing. I have content to show and wish to keep it that way.

Any thoughts?
Mark.

wilderness

5:07 am on Jun 1, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This thread will get you started:
[webmasterworld.com...]

jdMorgan

7:56 pm on Jun 3, 2002 (gmt 0)




Mark,

You stated:
"So I wrote a small addition to missing.cgi in Perl (I handle all errors there).
One hit to get any M$ Windows file - I write an IP log.
Next time program sees a log - it adds a line to .htaccess file: deny from $ip
Problem solved.

Once a day I just replace .htaccess file with a fresh version with no "deny from.." lines... "

Just wondering - Why do you clear your deny list out? Once these e-mail harvesters make a pass through your site, it's really too late. However, if you keep your deny lists active, you can use deny entries from one victim site to block access on your other sites *before* they get harvested. That will at least reduce the spam that results...
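
As a rough illustration of sharing deny entries between sites, here is a minimal Python sketch that gathers every denied host from several .htaccess files into one sorted list. The function name is hypothetical, and the sketch assumes the entries are plain "deny from ..." lines like the ones quoted above.

```python
def collect_denied(htaccess_texts):
    """Return the sorted union of hosts named in 'deny from' lines."""
    denied = set()
    for text in htaccess_texts:
        for line in text.splitlines():
            parts = line.split()
            # A deny line may list several hosts after "deny from".
            if len(parts) >= 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
                denied.update(parts[2:])
    return sorted(denied)
```

The combined list could then be written back as deny lines on each of the other sites, so an address caught on one site is blocked on the rest.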

I've received so much spam recently that my policy is, "once banned, banned forever." This makes for a pretty long list of banned agents and IPs, but with a little creative use of Regular Expression pattern-matching, the list can be compressed down to manageable size.
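
One possible reading of "compressing" the list, sketched in Python: fold many banned agent names into a single case-insensitive pattern instead of keeping one rule per agent. This is only an assumption about the general idea, not the actual rules in use, and the agent names in the usage note are made-up placeholders.

```python
import re


def build_agent_pattern(agents):
    """Combine banned user-agent substrings into one compiled regex."""
    unique = set(agents)
    if not unique:
        # An empty alternation would match every agent, so refuse it.
        raise ValueError("at least one agent name is required")
    escaped = sorted(re.escape(name) for name in unique)
    return re.compile("(?:" + "|".join(escaped) + ")", re.IGNORECASE)
```

For example, build_agent_pattern(["ExampleHarvester", "ExampleGrabber"]).search(agent_string) returns a match object for any agent string containing either placeholder name in any letter case, and None otherwise.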

Maybe I misunderstood your meaning, but I'm wondering why you'd want to clear your deny list.

Thanks,
Jim