It came without a referrer and first took one miscellaneous page without asking for robots.txt. It then immediately tried to grab two pages with the word "guestbook" in their path... Doh! spider trap. :)
Just thought I'd give you all a heads-up.
-Lars
Bots identifying as Educate Search V16B, Educate Search V24B, Educate Search V36B, Educate Search V38B and Educate Search V4B have been browsing my site over the past few days, with a very special interest in the guestbook pages. They used different IP addresses (including the reported 68.5.32.32).
Probably an email harvester.
I've just banned the bots with a
'SetEnvIfNoCase User-Agent "Educate Search" ban' line in .htaccess - just to be on the safe side.
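Note that a SetEnvIfNoCase line on its own only sets an environment variable; nothing is refused until a Deny directive tests that variable. A minimal sketch, assuming Apache 1.3/2.2-style access control (mod_setenvif plus mod_access/mod_authz_host) and that the host's AllowOverride permits FileInfo and Limit in .htaccess; the variable name `ban` is the one from the line above:

```apache
# Flag any request whose User-Agent contains "Educate Search" (case-insensitive)
SetEnvIfNoCase User-Agent "Educate Search" ban

# Allow everyone, then refuse flagged requests with 403 Forbidden
Order Allow,Deny
Allow from all
Deny from env=ban

# On Apache 2.4 without mod_access_compat, the equivalent would be:
# <RequireAll>
#     Require all granted
#     Require not env ban
# </RequireAll>
```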
I need to set up my .htaccess file for inclusions/exclusions;
I have been meaning to do this for a long time.
I know this is probably the wrong place to ask,
but could Scooter post the exact code in the robots.txt
file he used (if it is different from what he just put in here)?
And if somebody has a list of email harvester bots handy, I'd appreciate a pointer to it.
I'm sure you guys have been getting them too...
Thank you.
Chalupee
<Files .htaccess>
order allow,deny
deny from all
</Files>
order allow,deny
allow from all
deny from 80.201.211.221
deny from 193.165.185.50
deny from 213.35.182.113
deny from 232.80.35.168
SetEnvIfNoCase User-Agent "Indy Library" getout
SetEnvIfNoCase User-Agent "Full Web Bot" getout
SetEnvIfNoCase User-Agent ^.*Demon getout
SetEnvIfNoCase User-Agent ^About getout
SetEnvIfNoCase User-Agent ^Active getout
SetEnvIfNoCase User-Agent ^AnswerChase getout
SetEnvIfNoCase User-Agent ^Ants getout
SetEnvIfNoCase User-Agent ^Atom getout
SetEnvIfNoCase User-Agent ^attach getout
SetEnvIfNoCase User-Agent ^back getout
SetEnvIfNoCase User-Agent ^BatchFTP getout
SetEnvIfNoCase User-Agent ^BlitzBOT getout
SetEnvIfNoCase User-Agent ^bloodhound getout
SetEnvIfNoCase User-Agent ^brain getout
SetEnvIfNoCase User-Agent ^Buddy getout
SetEnvIfNoCase User-Agent ^Cartographer getout
SetEnvIfNoCase User-Agent ^CherryPicker getout
SetEnvIfNoCase User-Agent ^ChinaClaw getout
SetEnvIfNoCase User-Agent ^clickgarden getout
SetEnvIfNoCase User-Agent ^cosmos getout
SetEnvIfNoCase User-Agent ^Crawl_Application getout
SetEnvIfNoCase User-Agent ^Crawler getout
SetEnvIfNoCase User-Agent ^Crescent getout
SetEnvIfNoCase User-Agent HttpClient getout
SetEnvIfNoCase User-Agent ^curl getout
SetEnvIfNoCase User-Agent ^Custo getout
SetEnvIf User-Agent ^DA getout
SetEnvIfNoCase User-Agent ^DaviesBot getout
SetEnvIfNoCase User-Agent ^DISCo getout
SetEnvIfNoCase User-Agent ^DLExpert getout
SetEnvIfNoCase User-Agent ^dnloadmage getout
SetEnvIfNoCase User-Agent ^Drip getout
SetEnvIfNoCase User-Agent ^eCatch getout
SetEnvIfNoCase User-Agent ^Email getout
SetEnvIfNoCase User-Agent "^Express WebPictures" getout
SetEnvIfNoCase User-Agent ^Extractor getout
SetEnvIfNoCase User-Agent ^EyeNetIE getout
SetEnvIfNoCase User-Agent ^FileHound getout
SetEnvIfNoCase User-Agent ^FlashGet getout
SetEnvIfNoCase User-Agent ^flashsite getout
SetEnvIfNoCase User-Agent ^flunky getout
SetEnvIfNoCase User-Agent Frontpage getout
SetEnvIfNoCase User-Agent ^gazz getout
SetEnvIfNoCase User-Agent ^Genie getout
SetEnvIfNoCase User-Agent ^Get getout
SetEnvIfNoCase User-Agent ^Go!Zilla getout
SetEnvIfNoCase User-Agent ^Go-Ahead-Got-It getout
SetEnvIfNoCase User-Agent ^gotit getout
SetEnvIfNoCase User-Agent ^Grafula getout
SetEnvIfNoCase User-Agent ^gues getout
SetEnvIfNoCase User-Agent ^HMVie getout
SetEnvIfNoCase User-Agent ^htdig getout
SetEnvIfNoCase User-Agent ^ia_archiver getout
SetEnvIfNoCase User-Agent ^IBrowse getout
SetEnvIfNoCase User-Agent ^IncyWincy getout
SetEnvIfNoCase User-Agent ^ineta getout
SetEnvIfNoCase User-Agent ^infoGIST getout
SetEnvIfNoCase User-Agent ^InterGET getout
SetEnvIfNoCase User-Agent "^Internet Ninja" getout
SetEnvIfNoCase User-Agent ^IP?Works getout
SetEnvIfNoCase User-Agent ^Iria getout
SetEnvIfNoCase User-Agent ^iseeker getout
SetEnvIfNoCase User-Agent ^Jack getout
SetEnvIfNoCase User-Agent ^Java getout
SetEnvIfNoCase User-Agent ^JetCar getout
SetEnvIfNoCase User-Agent ^JoBo getout
SetEnvIfNoCase User-Agent ^JOC getout
SetEnvIfNoCase User-Agent ^JustView getout
SetEnvIfNoCase User-Agent ^larbin getout
SetEnvIfNoCase User-Agent ^leech getout
SetEnvIfNoCase User-Agent ^LexiBot getout
SetEnvIfNoCase User-Agent ^lftp getout
SetEnvIfNoCase User-Agent ^libW getout
SetEnvIfNoCase User-Agent ^Lifeboat getout
SetEnvIfNoCase User-Agent ^likse getout
SetEnvIfNoCase User-Agent ^Linkbot getout
SetEnvIfNoCase User-Agent "^links sql" getout
SetEnvIfNoCase User-Agent ^LncSoft* getout
SetEnvIfNoCase User-Agent ^Lockstep getout
SetEnvIfNoCase User-Agent ^lwp getout
SetEnvIfNoCase User-Agent ^Magnet getout
SetEnvIfNoCase User-Agent ^MARS getout
SetEnvIfNoCase User-Agent ^Marvin getout
SetEnvIfNoCase User-Agent ^Mass getout
SetEnvIfNoCase User-Agent ^Mata.*Hari.* getout
SetEnvIfNoCase User-Agent ^Memo getout
SetEnvIfNoCase User-Agent ^Microsoft getout
SetEnvIfNoCase User-Agent "^MFC Foundation" getout
SetEnvIfNoCase User-Agent ^MIDown getout
SetEnvIfNoCase User-Agent ^MIIxpc getout
SetEnvIfNoCase User-Agent ^MindSpider getout
SetEnvIfNoCase User-Agent ^Mirror getout
SetEnvIfNoCase User-Agent ^Mister getout
SetEnvIfNoCase User-Agent ^MOT-CF getout
SetEnvIfNoCase User-Agent ^Mozzila/4* getout
SetEnvIfNoCase User-Agent ^ms-catapult getout
SetEnvIfNoCase User-Agent ^msproxy getout
SetEnvIfNoCase User-Agent ^nabot getout
SetEnvIfNoCase User-Agent ^Navman getout
SetEnvIfNoCase User-Agent ^navroad getout
SetEnvIfNoCase User-Agent ^NearSite getout
SetEnvIfNoCase User-Agent ^Net getout
SetEnvIfNoCase User-Agent ^NICErsPRO getout
SetEnvIfNoCase User-Agent ^Nitro getout
SetEnvIfNoCase User-Agent ^oBot getout
SetEnvIfNoCase User-Agent ^Octopus getout
SetEnvIfNoCase User-Agent ^Papa getout
SetEnvIfNoCase User-Agent ^pc getout
SetEnvIfNoCase User-Agent ^PingALink getout
SetEnvIfNoCase User-Agent ^Pockey getout
SetEnvIfNoCase User-Agent ^psbot getout
SetEnvIfNoCase User-Agent ^Pump getout
SetEnvIfNoCase User-Agent ^Recorder getout
SetEnvIfNoCase User-Agent ^ReGet getout
SetEnvIfNoCase User-Agent ^RepoMonke getout
SetEnvIfNoCase User-Agent ^RMA getout
SetEnvIfNoCase User-Agent ^Siphon getout
SetEnvIfNoCase User-Agent ^site getout
SetEnvIfNoCase User-Agent ^SlySearch getout
SetEnvIfNoCase User-Agent ^Smart getout
SetEnvIfNoCase User-Agent ^Snagger getout
SetEnvIfNoCase User-Agent ^Snake getout
SetEnvIfNoCase User-Agent ^SpaceBison getout
SetEnvIfNoCase User-Agent ^Sqworm getout
SetEnvIfNoCase User-Agent ^SuperBot getout
SetEnvIfNoCase User-Agent ^SuperHTTP getout
SetEnvIfNoCase User-Agent ^Surfairy getout
SetEnvIfNoCase User-Agent ^Surfbot getout
SetEnvIfNoCase User-Agent ^suzuran getout
SetEnvIfNoCase User-Agent ^Szukacz getout
SetEnvIfNoCase User-Agent ^tAkeOut getout
SetEnvIfNoCase User-Agent ^Tateji getout
SetEnvIfNoCase User-Agent ^Tcl getout
SetEnvIfNoCase User-Agent ^Telesoft getout
SetEnvIfNoCase User-Agent ^templeton getout
SetEnvIfNoCase User-Agent ^test getout
SetEnvIfNoCase User-Agent ^utopy getout
SetEnvIfNoCase User-Agent ^Vacuum getout
SetEnvIfNoCase User-Agent ^VoidEYE getout
SetEnvIfNoCase User-Agent ^Web getout
SetEnvIfNoCase User-Agent ^Wget getout
SetEnvIfNoCase User-Agent ^Whacker getout
SetEnvIfNoCase User-Agent ^WPF getout
SetEnvIfNoCase User-Agent ^wwwhoosh getout
SetEnvIfNoCase User-Agent ^Xaldon getout
SetEnvIfNoCase User-Agent ^xget getout
SetEnvIfNoCase User-Agent ^ZBot getout
SetEnvIfNoCase User-Agent ^Zeus getout
SetEnvIfNoCase User-Agent Alligator getout
SetEnvIfNoCase User-Agent Bandit getout
SetEnvIfNoCase User-Agent Collector getout
SetEnvIfNoCase User-Agent Copier getout
SetEnvIfNoCase User-Agent Download getout
SetEnvIfNoCase User-Agent GetRight getout
SetEnvIfNoCase User-Agent grab getout
SetEnvIfNoCase User-Agent htmlgobble getout
SetEnvIfNoCase User-Agent HTTrack getout
SetEnvIf User-Agent iCab getout
SetEnvIfNoCase User-Agent MSIECrawler getout
SetEnvIfNoCase User-Agent naviscope getout
SetEnvIfNoCase User-Agent Ninja getout
SetEnvIfNoCase User-Agent Offline getout
SetEnvIfNoCase User-Agent peakjet getout
SetEnvIfNoCase User-Agent prozilla getout
SetEnvIfNoCase User-Agent rapidcache getout
SetEnvIfNoCase User-Agent realdownload getout
SetEnvIfNoCase User-Agent Reaper getout
SetEnvIfNoCase User-Agent robofox getout
SetEnvIfNoCase User-Agent saver getout
SetEnvIfNoCase User-Agent silentsurf getout
SetEnvIfNoCase User-Agent ^spiderbot getout
SetEnvIfNoCase User-Agent ^stamina getout
SetEnvIfNoCase User-Agent Stripper getout
SetEnvIfNoCase User-Agent Sucker getout
SetEnvIfNoCase User-Agent tarspider getout
SetEnvIfNoCase User-Agent Teleport getout
SetEnvIfNoCase User-Agent thumbnavigator getout
SetEnvIfNoCase User-Agent transsoft getout
SetEnvIfNoCase User-Agent udmsearch getout
SetEnvIfNoCase User-Agent utilmind getout
SetEnvIfNoCase User-Agent w3mir getout
SetEnvIfNoCase User-Agent weazel getout
SetEnvIfNoCase User-Agent Widow getout
SetEnvIfNoCase User-Agent www4mail getout
SetEnvIfNoCase User-Agent WWWOFFLE getout
SetEnvIfNoCase User-Agent hloader getout
SetEnvIfNoCase User-Agent WebCapture getout
SetEnvIfNoCase User-Agent EasyDL getout
SetEnvIfNoCase User-Agent dloader getout
SetEnvIfNoCase User-Agent "production bot" getout
SetEnvIfNoCase User-Agent "full web bot" getout
SetEnvIfNoCase User-Agent "demo bot" getout
SetEnvIfNoCase User-Agent TECOMAC getout
SetEnvIfNoCase User-Agent potbot getout
SetEnvIfNoCase User-Agent npbot getout
SetEnvIfNoCase User-Agent turnitinbot getout
SetEnvIfNoCase User-Agent anarchie getout
SetEnvIfNoCase User-Agent "Educate Search" getout
SetEnvIfNoCase Referer iaea\.org getout
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=getout
</Limit>
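Because SetEnvIf takes a regular expression, several of the patterns above could be folded into one directive using alternation. A sketch reusing the `getout` variable and a few patterns from the list; each line should behave the same as the individual lines it replaces, and the existing `Deny from env=getout` needs no change:

```apache
# Anchored patterns: User-Agents that *begin* with any of these names
SetEnvIfNoCase User-Agent "^(Wget|curl|lwp|larbin|Zeus)" getout

# Unanchored patterns: these names may appear anywhere in the User-Agent
SetEnvIfNoCase User-Agent "(HTTrack|Teleport|WebCapture)" getout
```

Shorter lists are easier to maintain, but one long alternation is harder to read, so grouping a handful of names per line is a reasonable middle ground.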
Options -Indexes
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://216\.239.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://images\.google.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www\.google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://translate\.google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://babel\.altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://babelfish\.altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://world\.altavista\.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www\.excite\.co.*$ [NC]
RewriteRule \.(jpg|JPG)$ [mydomain.com...] [R,L]
I recently posted an example on another section of this board:
[webmasterworld.com...]
Since last night I've noticed this guy/gal coming in:
rico/1.0
Searching here, I found the now-closed thread on rico/1.0.
Scoping him/her out shows:
69.3.78.160 | h-69-3-78-160.dnvtco56.covad.net | 176 | x-- | Covad Communications NETBLK-COVAD-IP-4-NET
So I'm guessing it's an email harvester starting up again,
and I figure adding this line to the list:
SetEnvIfNoCase User-Agent ^rico getout
will take care of the problem.
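For reference, a sketch of where the new entry fits: the SetEnvIfNoCase line can go anywhere among the others, because the `Deny from env=getout` test in the <Limit> block picks up the variable regardless of order. Blocking the address from the log above as well is optional and assumes the harvester keeps using that IP (the Educate Search bots earlier in the thread did not):

```apache
# Flag User-Agents starting with "rico" (e.g. "rico/1.0"), case-insensitive
SetEnvIfNoCase User-Agent ^rico getout

# Optional: also refuse the address seen in the log; place it with the
# other "deny from" lines that follow "order allow,deny"
deny from 69.3.78.160
```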
I should mention that I had to remove the bottom portion of the .htaccess, the image rewrite rules: with it in place, every page on my site returned a server error. mod_rewrite is supposed to be working on my servers, but I've never checked it, and that may be the problem.
Nonetheless, thanks again!
Change the Options line in the code above to
Options -Indexes +FollowSymLinks
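A server error from the rewrite section can also come from mod_rewrite not being loaded at all. A minimal sketch, using the placeholder domain and replacement image from this thread: wrapping the rules in <IfModule mod_rewrite.c> makes Apache skip them instead of failing when the module is absent. It cannot help if the host's AllowOverride does not permit Options in .htaccess; in that case the Options line itself causes the error and only the host can change it.

```apache
# FollowSymLinks (or SymLinksIfOwnerMatch) is required for RewriteRule in .htaccess
Options -Indexes +FollowSymLinks

# Only processed if mod_rewrite is loaded; otherwise the block is ignored
<IfModule mod_rewrite.c>
    RewriteEngine on
    # Note the space between "}" and "!"
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
    # Send hotlinked JPEGs to a replacement image (a .gif, so no redirect loop)
    RewriteRule \.(jpg|JPG)$ http://www.mydomain.com/images/replace.gif [R,L]
</IfModule>
```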
Remember that this forum drops the space required between "}" and "!" in the RewriteConds, and turns the pipe "|" in the RewriteRules into a broken bar. If you cut-n-pasted directly, you'll need to put them back. Example:
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com.*$ [NC]
RewriteRule \.(jpg|JPG)$ http://www.mydomain.com/images/replace.gif [R,L]
HTH,
Jim
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com.*$ [NC]
RewriteRule .*\.(gif|jpg)$ [blackhole.com...] [NC,R,L]
As Jim noted above, add the spaces and the pipe symbol as needed.
I like this one a bit better; it is a little simpler, too...
line one exempts any request that does not send a referer (just in case)
line two exempts requests whose referer is your own site
line three sends any remaining requests for .jpg, .JPG, .gif or .GIF (the NC flag makes the match case-insensitive) wherever you want. I have an IP that I use that is a black hole... I send absolutely NOTHING and hang their site for 20 seconds while they try to get it (I do not want to waste ANY of my bandwidth on this!) You can replace http://www.blackhole.com/ (which is not real, anyway!) with anything you want: a "denied" image or a 1x1 gif. Or, to send a 403, replace the target with "-" and change [NC,R,L] to [NC,F] (F implies L; keep NC so uppercase extensions are still caught).
dave
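For the 403 option, a sketch of dave's rules using the F flag. With F the substitution is not used, so it is conventionally written as "-"; F implies L; and NC stays so .JPG and .GIF are still caught. mydomain.com is the placeholder domain used throughout the thread.

```apache
RewriteEngine on
# Let requests with no referer through (some browsers and proxies strip it)
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred from your own pages through
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/ [NC]
# Everything else asking for a .gif or .jpg gets 403 Forbidden
RewriteRule \.(gif|jpg)$ - [NC,F]
```

A 403 costs a few hundred bytes per hit rather than the image itself, without needing a separate black-hole host.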