ultimate ip deny list for email spam harvesters

looking to compile one

         

madeonmoon

2:30 pm on Oct 12, 2004 (gmt 0)

10+ Year Member



hello all

every once in a while i look through my log file and see that some email harvesters are trying to collect email addresses from my site to be used for spamming later.

so i go into my ip deny manager and add the ip there one by one (i have about 10 in there so far). however, is there an existing (and more comprehensive) list of ips that i can use as a reference and to borrow from?

thanks a lot!
james

Span

8:57 pm on Oct 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



madeonmoon,
if your site is on an Apache server and mod_rewrite is enabled, I really would use an .htaccess file and block email harvesters by user agent instead of by IP.
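To illustrate the matching logic behind a user-agent block, here is a minimal Python sketch. On Apache itself this would normally be expressed as mod_rewrite conditions on the User-Agent header rather than in Python. The pattern list and the function name `is_blocked_agent` are assumptions for illustration only, not a vetted or complete list.

```python
import re

# Illustrative sample of harvester user-agent substrings (an assumption,
# not a maintained list); extend it from your own logs.
BLOCKED_AGENT_PATTERNS = ["EmailSiphon", "EmailCollector", "EmailWolf", "ExtractorPro"]

# One case-insensitive regex that matches any of the substrings above.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(p) for p in BLOCKED_AGENT_PATTERNS),
    re.IGNORECASE,
)


def is_blocked_agent(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a blocked substring."""
    return bool(_BLOCKED_RE.search(user_agent or ""))
```

A request whose User-Agent header contains any listed substring would be refused; everything else passes through.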

Sanenet

9:04 pm on Oct 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Trouble with using user_agent is that some dastardly bots impersonate real user agents.

Best way is the two-pronged attack: ban by user_agent (there are lists around, search this forum) and then set up a bot trap to capture and ban the naughty.
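As a rough sketch of the bot-trap half, the Python below assumes a hidden URL that is disallowed in robots.txt, so only badly behaved bots ever request it. The path `/bot-trap/`, the function name `trap_hit`, and the in-memory `banned` set are all hypothetical. The returned line uses the classic Apache `Deny from` syntax; actually appending it to .htaccess is left out.

```python
import ipaddress
from typing import Optional, Set

# Hypothetical trap location: disallowed in robots.txt and linked invisibly,
# so well-behaved crawlers and human visitors should never request it.
TRAP_PATH = "/bot-trap/"


def trap_hit(request_path: str, client_ip: str, banned: Set[str]) -> Optional[str]:
    """Return a 'Deny from' line for a new offender, or None otherwise."""
    if not request_path.startswith(TRAP_PATH):
        return None  # ordinary request, nothing to do
    # Normalise and validate the address; raises ValueError if malformed.
    ip = str(ipaddress.ip_address(client_ip))
    if ip in banned:
        return None  # already banned, avoid duplicate lines
    banned.add(ip)
    return f"Deny from {ip}"
```

Validating the address before writing it anywhere matters here, since the value ends up in a server configuration file.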

Saltminer

10:41 pm on Oct 12, 2004 (gmt 0)

10+ Year Member



I prefer to encode the email address in some fashion so they can't read it, rather than trying to keep up with all the harvesters. It's easier to do and needs no maintenance. Plus, with blocking, you run the risk of occasionally shutting out a regular visitor to your site.

You can convert the text to character codes, or use a little javascript to do the job.
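A minimal sketch of the character-code conversion, written in Python and assuming the codes meant here are decimal HTML character references. The function name `to_char_codes` and the address are made up for illustration.

```python
def to_char_codes(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


# Example with a placeholder address; paste the result into the page source.
encoded = to_char_codes("me@example.com")
```

A browser renders the encoded string as the normal address, while the page source no longer contains it in plain text.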

Jimmy

g1smd

3:44 am on Oct 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some bots decode URL-encoded stuff already.

Writing the link out using JavaScript to build the HTML code fragments is best; or use an email form (but make sure the backend script is secure [Google for "Open Relay"]).
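The post does not say what "secure" involves. One common concern is a form script that lets the visitor choose recipients or inject extra mail headers. The Python sketch below assumes that reading: the recipient is fixed, and line breaks in header fields are rejected. `SITE_RECIPIENT` and `build_contact_message` are hypothetical names, and actually sending the message is omitted.

```python
from email.message import EmailMessage

# Hypothetical fixed recipient: the form can only ever mail the site owner.
SITE_RECIPIENT = "webmaster@example.com"


def build_contact_message(sender: str, subject: str, body: str) -> EmailMessage:
    """Build a contact-form email, refusing header-injection attempts."""
    for name, value in (("sender", sender), ("subject", subject)):
        # A CR or LF in a header field could smuggle in extra headers (e.g. Bcc).
        if "\r" in value or "\n" in value:
            raise ValueError(f"line break in {name} field")
    msg = EmailMessage()
    msg["From"] = SITE_RECIPIENT   # never trust the visitor's address as From
    msg["To"] = SITE_RECIPIENT     # recipient is not taken from form input
    msg["Reply-To"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

Because the To header never comes from the form, the script cannot be used to relay mail to arbitrary addresses.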

idoc

12:27 am on Oct 14, 2004 (gmt 0)

10+ Year Member



I also use the JavaScript-encoded email addresses, but I also ban trash bots and scraper IPs on a daily basis. I think in general the shared ban list is not a bad idea, though there are bound to be differences of opinion as to what constitutes a bannable offense. Myself, I tend to paint the RIPE and APNIC address ranges with a pretty broad brush; because of the type of sites I keep, I can do that.

I also ban the content scraper IPs and find that easier than keeping up with user agents, even for those that obey robots.txt files. Of course, I have even recently taken to banning CIDR blocks of many of the cheap web hosts and cheap server colocation facilities as well, because of content scrapers that run from these IPs. The biggest of the remaining nuisances are probably the home users with cable modem and DSL that run scrapers. You will probably benefit from a spider trap that modifies .htaccess for those.
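For anyone unsure how banning by CIDR block works, here is a small Python sketch using the standard `ipaddress` module. The ranges are reserved documentation networks standing in for real hosting ranges, and `BANNED_NETWORKS` and `is_banned` are illustrative names, not anything from an actual ban list.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks), not real offenders.
BANNED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/25")
]


def is_banned(ip: str) -> bool:
    """Return True if the address falls inside any banned CIDR block."""
    addr = ipaddress.ip_address(ip)  # raises ValueError if malformed
    return any(addr in net for net in BANNED_NETWORKS)
```

A /24 covers 256 addresses and a /25 covers 128, so one line can replace a long run of single-IP bans, at the cost of catching innocent neighbours in the same range.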