Forum Moderators: phranque

Message Too Old, No Replies

Controlling email spam with .htaccess

Powerful but dangerous tool

         

Reno

6:32 pm on Jul 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am posting this as a separate thread (rather than continuing the one below) because I do not pretend to understand Apache server coding, and I am hoping these .htaccess lines will come to the attention of those webmasters who really do grasp how to do this correctly.

There is an outstanding thread that deals with blocking email spambots at: (Perl, PHP, and Python Scripting)
[webmasterworld.com...]

After reading through all that, I combined the various lists into the one you see below. It seems like there are duplicates, but I left them in, because my sorting software made them separate. If any can be deleted, please indicate.

Mostly, I want to make sure this is done right. In that thread, I especially took note of Bird's cautionary advice (messages 29 + 36):
---------------------------------------
"...putting up incorrect rewriting rules may do your site more harm than putting up correct ones will help it..."

"I would strongly advise against using any mod_rewrite rules just because someone (including myself) told you they worked in a certain way. This thing is a swiss army knife on steroids and with nuclear power... What works on one site may render the next one inaccessible. You don't need to understand all its quirks or every regular expression. Just make sure that you really understand those you use."
---------------------------------------

In my current .htaccess file, the only line I have is this:
ErrorDocument 404 [mydomainname.com...]

If I decide to use the spam blocks, would I keep that current line on top?

And, would those of you who are highly evolved webmasters look at this long list of spambots, and also look at the code, and offer any advice about improvement? This strikes me as one way to control those insidious beasts, but as Bird implies, do it right or don't do it at all...

Thanks for all the help

==================================


ErrorDocument 404 http://www.mydomainname.com/404page.html

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\Image\Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot.com [L,R]

Brett_Tabke

11:07 pm on Jul 6, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Thanks. Mine:
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLSpiderPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkAgent [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline*Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy*Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} .*almaden.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper/2.09 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH*Agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector

Key_Master

11:35 pm on Jul 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Error documents are usually best placed at the end of the .htaccess file.

Here's my no mod_rewrite example:

SetEnvIf User-Agent ^([A-Z]+)$ ban
SetEnvIfNoCase User-Agent " obot" ban
SetEnvIfNoCase User-Agent "^dual proxy" ban
SetEnvIfNoCase User-Agent "^ie " ban
SetEnvIfNoCase User-Agent "^internet explorer [0-9]\.[0-9]$" ban
SetEnvIfNoCase User-Agent "^mozilla/[0-9]\.[0-9] \(compatible; msie [0-9]\.[0-9]\)$" ban
SetEnvIfNoCase User-Agent "indy library" ban
SetEnvIfNoCase User-Agent "newt activex" ban
SetEnvIfNoCase User-Agent ^([a-z]+)pro$ ban
SetEnvIfNoCase User-Agent ^([a-z]¦[0-9])$ ban
SetEnvIfNoCase User-Agent ^[a-z]surf ban
SetEnvIfNoCase User-Agent ^aspseek ban
SetEnvIfNoCase User-Agent ^bmclient ban
SetEnvIfNoCase User-Agent ^bumblebee ban
SetEnvIfNoCase User-Agent ^collage ban
SetEnvIfNoCase User-Agent ^crescent ban
SetEnvIfNoCase User-Agent ^deepindexer ban
SetEnvIfNoCase User-Agent ^diagem ban
SetEnvIfNoCase User-Agent ^download ban
SetEnvIfNoCase User-Agent ^easydl ban
SetEnvIfNoCase User-Agent ^email ban
SetEnvIfNoCase User-Agent ^hloader ban
SetEnvIfNoCase User-Agent ^html ban
SetEnvIfNoCase User-Agent ^http ban
SetEnvIfNoCase User-Agent ^informant ban
SetEnvIfNoCase User-Agent ^larbin ban
SetEnvIfNoCase User-Agent ^lexibot ban
SetEnvIfNoCase User-Agent ^link ban
SetEnvIfNoCase User-Agent ^lotus ban
SetEnvIfNoCase User-Agent ^lwp-trivial ban
SetEnvIfNoCase User-Agent ^microsoft ban
SetEnvIfNoCase User-Agent ^mozilla$ ban
SetEnvIfNoCase User-Agent ^msproxy ban
SetEnvIfNoCase User-Agent ^multithreaddb ban
SetEnvIfNoCase User-Agent ^nationaldirectory ban
SetEnvIfNoCase User-Agent ^netmechanic ban
SetEnvIfNoCase User-Agent ^pingalink ban
SetEnvIfNoCase User-Agent ^proxy ban
SetEnvIfNoCase User-Agent ^replacer ban
SetEnvIfNoCase User-Agent ^site ban
SetEnvIfNoCase User-Agent ^slysearch ban
SetEnvIfNoCase User-Agent ^surfcontrol ban
SetEnvIfNoCase User-Agent ^teleport ban
SetEnvIfNoCase User-Agent ^wget ban
SetEnvIfNoCase User-Agent ^xenu ban
SetEnvIfNoCase User-Agent ^zeus ban
SetEnvIfNoCase User-Agent fetch ban
SetEnvIfNoCase User-Agent frontpage ban
SetEnvIfNoCase User-Agent grub\.org ban
SetEnvIfNoCase User-Agent msiecrawler ban
SetEnvIfNoCase User-Agent strip ban
SetEnvIfNoCase User-Agent superbot ban
SetEnvIfNoCase User-Agent turingos ban
SetEnvIfNoCase User-Agent vagabondo ban
SetEnvIfNoCase User-Agent visibilitygap ban
SetEnvIfNoCase User-Agent watcher ban
SetEnvIfNoCase User-Agent web(capture¦clipping¦collage¦copier) ban
SetEnvIfNoCase User-Agent whizbang ban

# Prevent offsite image/video/script links
SetEnvIf Referer ^$ local
SetEnvIfNoCase Referer ^(http://www\.¦http://)mydomain\.com local

# Deny visitors using false or spammy referring URLs
SetEnvIfNoCase Referer \.iaea\.org ban
SetEnvIfNoCase Referer ^file:// ban
SetEnvIfNoCase Referer netfactual\.com ban

# Ban Formmail requests
SetEnvIfNoCase Request_URI formmail ban

# Ban .htaccess & .htpasswd requests
SetEnvIfNoCase Request_URI \.ht(access¦passwd)$ ban

<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>

<Files ~ "\.(gif¦jpg¦css¦js)$">
order allow,deny
allow from env=local
deny from env=ban
</Files>