Educate Search V4B

possible email harvester

         

WitchLars

1:28 am on Mar 3, 2003 (gmt 0)

10+ Year Member



A visitor with the user agent "Educate Search V4B" hit my site from ip68-5-32-32.oc.oc.cox.net (68.5.32.32).

It came without a referrer and first took one miscellaneous page without asking for robots.txt. It then immediately tried to grab two pages with the word "guestbook" in their path... Doh! spider trap. :)

Just thought I'd give you all a heads-up.

-Lars

dropoffx

4:34 pm on Mar 3, 2003 (gmt 0)

10+ Year Member



I have seen this over the last few days. Educate Search V4B, V16B, VDemoB, V34B all from different IP addresses. Seems to only bother with guestbook pages. Most likely looking for email addresses.

I used mod_rewrite to block access to them.
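The actual rules were not posted, so the following is only a minimal sketch of what a mod_rewrite block on this user agent might look like. It assumes the rules live in .htaccess, that mod_rewrite is enabled, and that every variant of the bot starts its User-Agent with "Educate Search":

```apache
# Hypothetical sketch, not the rules used in the post above.
RewriteEngine On
# Match any User-Agent that begins with "Educate Search" (V4B, V16B, VDemoB, ...)
RewriteCond %{HTTP_USER_AGENT} "^Educate Search" [NC]
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule .* - [F]
```

Anchoring on the common prefix avoids having to list each version string separately.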

David

carfac

12:29 am on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



David:

Please clarify... is that:

Educate Search V4B, V16B, VDemoB, V34B all _separate_ UAs...
or are they:

Educate Search V4B, Educate Search V16B, Educate Search VDemoB, Educate Search V34B?

Thanks!

dave

Scooter24

1:28 pm on Mar 5, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi,

bots identifying as Educate Search V16B, Educate Search V24B, Educate Search V36B, Educate Search V38B, Educate Search V4B have been browsing my site over the past days with a very special interest for the guestbook pages. They used different IP addresses (including the 68.5.32.32 reported).

Probably an email harvester.

I've just banned the bots with a
'SetEnvIfNoCase User-Agent "Educate Search" ban' line in .htaccess - just to be on the safe side.
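On its own, that line only sets the "ban" environment variable; an access rule still has to act on it. A minimal sketch of the complete pair, assuming the Order/Allow/Deny access directives of the Apache versions current at the time of this thread:

```apache
# Flag any request whose User-Agent contains "Educate Search" (case-insensitive)
SetEnvIfNoCase User-Agent "Educate Search" ban

# Let everyone in, then refuse requests that carry the "ban" flag
Order Allow,Deny
Allow from all
Deny from env=ban
```

With Order Allow,Deny the Deny rules are evaluated last, so a flagged request gets a 403 even though "Allow from all" matches it first.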

carfac

3:47 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Scooter:

Thanks for clearing that up- (am I just a bit dense?)

dave

Chalupee

11:50 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Also coming in as Educate Search VDemoB, looking
for an addguest.html file.

I need to set up my .htaccess file for inclusions/exclusions;
I have been meaning to do this for a long time.
I know this is probably the wrong place to ask...
but could Scooter post the exact code in the robots.txt
file he used (if it is different than what he just put in here)?
And if somebody could direct me to a list of email harvester bots they have handy, I'd appreciate it.
I'm sure you guys have been getting them too...

Thank you.
Chalupee

Scooter24

5:25 pm on Mar 6, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



The robots.txt won't do much to stop an email harvester. The .htaccess file is more effective, although it won't stop an email harvester or other bad bot registering as Mozilla-something. Here's my .htaccess:
(domain name substituted with mydomain.com)
The deny from IP block is used for a bad bot trap. At the bottom of the .htaccess there is some code which prevents hotlinking to images, allowing it however from Google and other search engines:

<Files .htaccess>
order allow,deny
deny from all
</Files>

order allow,deny
allow from all
deny from 80.201.211.221
deny from 193.165.185.50
deny from 213.35.182.113
deny from 232.80.35.168

SetEnvIfNoCase User-Agent "Indy Library" getout
SetEnvIfNoCase User-Agent "Full Web Bot" getout
SetEnvIfNoCase User-Agent ^.*Demon getout
SetEnvIfNoCase User-Agent ^About getout
SetEnvIfNoCase User-Agent ^Active getout
SetEnvIfNoCase User-Agent ^AnswerChase getout
SetEnvIfNoCase User-Agent ^Ants getout
SetEnvIfNoCase User-Agent ^Atom getout
SetEnvIfNoCase User-Agent ^attach getout
SetEnvIfNoCase User-Agent ^back getout
SetEnvIfNoCase User-Agent ^BatchFTP getout
SetEnvIfNoCase User-Agent ^BlitzBOT getout
SetEnvIfNoCase User-Agent ^bloodhound getout
SetEnvIfNoCase User-Agent ^brain getout
SetEnvIfNoCase User-Agent ^Buddy getout
SetEnvIfNoCase User-Agent ^Cartographer getout
SetEnvIfNoCase User-Agent ^CherryPicker getout
SetEnvIfNoCase User-Agent ^ChinaClaw getout
SetEnvIfNoCase User-Agent ^clickgarden getout
SetEnvIfNoCase User-Agent ^cosmos getout
SetEnvIfNoCase User-Agent ^Crawl_Application getout
SetEnvIfNoCase User-Agent ^Crawler getout
SetEnvIfNoCase User-Agent ^Crescent getout
SetEnvIfNoCase User-Agent HttpClient getout
SetEnvIfNoCase User-Agent ^curl getout
SetEnvIfNoCase User-Agent ^Custo getout
SetEnvIf User-Agent ^DA getout
SetEnvIfNoCase User-Agent ^DaviesBot getout
SetEnvIfNoCase User-Agent ^DISCo getout
SetEnvIfNoCase User-Agent ^DLExpert getout
SetEnvIfNoCase User-Agent ^dnloadmage getout
SetEnvIfNoCase User-Agent ^Drip getout
SetEnvIfNoCase User-Agent ^eCatch getout
SetEnvIfNoCase User-Agent ^Email getout
SetEnvIfNoCase User-Agent "^Express WebPictures" getout
SetEnvIfNoCase User-Agent ^Extractor getout
SetEnvIfNoCase User-Agent ^EyeNetIE getout
SetEnvIfNoCase User-Agent ^FileHound getout
SetEnvIfNoCase User-Agent ^FlashGet getout
SetEnvIfNoCase User-Agent ^flashsite getout
SetEnvIfNoCase User-Agent ^flunky getout
SetEnvIfNoCase User-Agent Frontpage getout
SetEnvIfNoCase User-Agent ^gazz getout
SetEnvIfNoCase User-Agent ^Genie getout
SetEnvIfNoCase User-Agent ^Get getout
SetEnvIfNoCase User-Agent ^Go!Zilla getout
SetEnvIfNoCase User-Agent ^Go-Ahead-Got-It getout
SetEnvIfNoCase User-Agent ^gotit getout
SetEnvIfNoCase User-Agent ^Grafula getout
SetEnvIfNoCase User-Agent ^gues getout
SetEnvIfNoCase User-Agent ^HMVie getout
SetEnvIfNoCase User-Agent ^htdig getout
SetEnvIfNoCase User-Agent ^ia_archiver getout
SetEnvIfNoCase User-Agent ^IBrowse getout
SetEnvIfNoCase User-Agent ^IncyWincy getout
SetEnvIfNoCase User-Agent ^ineta getout
SetEnvIfNoCase User-Agent ^infoGIST getout
SetEnvIfNoCase User-Agent ^InterGET getout
SetEnvIfNoCase User-Agent "^Internet Ninja" getout
SetEnvIfNoCase User-Agent ^IP.Works getout
SetEnvIfNoCase User-Agent ^Iria getout
SetEnvIfNoCase User-Agent ^iseeker getout
SetEnvIfNoCase User-Agent ^Jack getout
SetEnvIfNoCase User-Agent ^Java getout
SetEnvIfNoCase User-Agent ^JetCar getout
SetEnvIfNoCase User-Agent ^JoBo getout
SetEnvIfNoCase User-Agent ^JOC getout
SetEnvIfNoCase User-Agent ^JustView getout
SetEnvIfNoCase User-Agent ^larbin getout
SetEnvIfNoCase User-Agent ^leech getout
SetEnvIfNoCase User-Agent ^LexiBot getout
SetEnvIfNoCase User-Agent ^lftp getout
SetEnvIfNoCase User-Agent ^libW getout
SetEnvIfNoCase User-Agent ^Lifeboat getout
SetEnvIfNoCase User-Agent ^likse getout
SetEnvIfNoCase User-Agent ^Linkbot getout
SetEnvIfNoCase User-Agent "^links sql" getout
SetEnvIfNoCase User-Agent ^LncSoft* getout
SetEnvIfNoCase User-Agent ^Lockstep getout
SetEnvIfNoCase User-Agent ^lwp getout
SetEnvIfNoCase User-Agent ^Magnet getout
SetEnvIfNoCase User-Agent ^MARS getout
SetEnvIfNoCase User-Agent ^Marvin getout
SetEnvIfNoCase User-Agent ^Mass getout
SetEnvIfNoCase User-Agent ^Mata.*Hari.* getout
SetEnvIfNoCase User-Agent ^Memo getout
SetEnvIfNoCase User-Agent ^Microsoft getout
SetEnvIfNoCase User-Agent "^MFC Foundation" getout
SetEnvIfNoCase User-Agent ^MIDown getout
SetEnvIfNoCase User-Agent ^MIIxpc getout
SetEnvIfNoCase User-Agent ^MindSpider getout
SetEnvIfNoCase User-Agent ^Mirror getout
SetEnvIfNoCase User-Agent ^Mister getout
SetEnvIfNoCase User-Agent ^MOT-CF getout
SetEnvIfNoCase User-Agent ^Mozzila/4* getout
SetEnvIfNoCase User-Agent ^ms-catapult getout
SetEnvIfNoCase User-Agent ^msproxy getout
SetEnvIfNoCase User-Agent ^nabot getout
SetEnvIfNoCase User-Agent ^Navman getout
SetEnvIfNoCase User-Agent ^navroad getout
SetEnvIfNoCase User-Agent ^NearSite getout
SetEnvIfNoCase User-Agent ^Net getout
SetEnvIfNoCase User-Agent ^NICErsPRO getout
SetEnvIfNoCase User-Agent ^Nitro getout
SetEnvIfNoCase User-Agent ^oBot getout
SetEnvIfNoCase User-Agent ^Octopus getout
SetEnvIfNoCase User-Agent ^Papa getout
SetEnvIfNoCase User-Agent ^pc getout
SetEnvIfNoCase User-Agent ^PingALink getout
SetEnvIfNoCase User-Agent ^Pockey getout
SetEnvIfNoCase User-Agent ^psbot getout
SetEnvIfNoCase User-Agent ^Pump getout
SetEnvIfNoCase User-Agent ^Recorder getout
SetEnvIfNoCase User-Agent ^ReGet getout
SetEnvIfNoCase User-Agent ^RepoMonke getout
SetEnvIfNoCase User-Agent ^RMA getout
SetEnvIfNoCase User-Agent ^Siphon getout
SetEnvIfNoCase User-Agent ^site getout
SetEnvIfNoCase User-Agent ^SlySearch getout
SetEnvIfNoCase User-Agent ^Smart getout
SetEnvIfNoCase User-Agent ^Snagger getout
SetEnvIfNoCase User-Agent ^Snake getout
SetEnvIfNoCase User-Agent ^SpaceBison getout
SetEnvIfNoCase User-Agent ^Sqworm getout
SetEnvIfNoCase User-Agent ^SuperBot getout
SetEnvIfNoCase User-Agent ^SuperHTTP getout
SetEnvIfNoCase User-Agent ^Surfairy getout
SetEnvIfNoCase User-Agent ^Surfbot getout
SetEnvIfNoCase User-Agent ^suzuran getout
SetEnvIfNoCase User-Agent ^Szukacz getout
SetEnvIfNoCase User-Agent ^tAkeOut getout
SetEnvIfNoCase User-Agent ^Tateji getout
SetEnvIfNoCase User-Agent ^Tcl getout
SetEnvIfNoCase User-Agent ^Telesoft getout
SetEnvIfNoCase User-Agent ^templeton getout
SetEnvIfNoCase User-Agent ^test getout
SetEnvIfNoCase User-Agent ^utopy getout
SetEnvIfNoCase User-Agent ^Vacuum getout
SetEnvIfNoCase User-Agent ^VoidEYE getout
SetEnvIfNoCase User-Agent ^Web getout
SetEnvIfNoCase User-Agent ^Wget getout
SetEnvIfNoCase User-Agent ^Whacker getout
SetEnvIfNoCase User-Agent ^WPF getout
SetEnvIfNoCase User-Agent ^wwwhoosh getout
SetEnvIfNoCase User-Agent ^Xaldon getout
SetEnvIfNoCase User-Agent ^xget getout
SetEnvIfNoCase User-Agent ^ZBot getout
SetEnvIfNoCase User-Agent ^Zeus getout
SetEnvIfNoCase User-Agent Alligator getout
SetEnvIfNoCase User-Agent Bandit getout
SetEnvIfNoCase User-Agent Collector getout
SetEnvIfNoCase User-Agent Copier getout
SetEnvIfNoCase User-Agent Download getout
SetEnvIfNoCase User-Agent GetRight getout
SetEnvIfNoCase User-Agent grab getout
SetEnvIfNoCase User-Agent htmlgobble getout
SetEnvIfNoCase User-Agent HTTrack getout
SetEnvIf User-Agent iCab getout
SetEnvIfNoCase User-Agent MSIECrawler getout
SetEnvIfNoCase User-Agent naviscope getout
SetEnvIfNoCase User-Agent Ninja getout
SetEnvIfNoCase User-Agent Offline getout
SetEnvIfNoCase User-Agent peakjet getout
SetEnvIfNoCase User-Agent prozilla getout
SetEnvIfNoCase User-Agent rapidcache getout
SetEnvIfNoCase User-Agent realdownload getout
SetEnvIfNoCase User-Agent Reaper getout
SetEnvIfNoCase User-Agent robofox getout
SetEnvIfNoCase User-Agent saver getout
SetEnvIfNoCase User-Agent silentsurf getout
SetEnvIfNoCase User-Agent ^spiderbot getout
SetEnvIfNoCase User-Agent ^stamina getout
SetEnvIfNoCase User-Agent Stripper getout
SetEnvIfNoCase User-Agent Sucker getout
SetEnvIfNoCase User-Agent tarspider getout
SetEnvIfNoCase User-Agent Teleport getout
SetEnvIfNoCase User-Agent thumbnavigator getout
SetEnvIfNoCase User-Agent transsoft getout
SetEnvIfNoCase User-Agent udmsearch getout
SetEnvIfNoCase User-Agent utilmind getout
SetEnvIfNoCase User-Agent w3mir getout
SetEnvIfNoCase User-Agent weazel getout
SetEnvIfNoCase User-Agent Widow getout
SetEnvIfNoCase User-Agent www4mail getout
SetEnvIfNoCase User-Agent WWWOFFLE getout
SetEnvIfNoCase User-Agent hloader getout
SetEnvIfNoCase User-Agent WebCapture getout
SetEnvIfNoCase User-Agent EasyDL getout
SetEnvIfNoCase User-Agent dloader getout
SetEnvIfNoCase User-Agent "production bot" getout
SetEnvIfNoCase User-Agent "full web bot" getout
SetEnvIfNoCase User-Agent "demo bot" getout
SetEnvIfNoCase User-Agent TECOMAC getout
SetEnvIfNoCase User-Agent potbot getout
SetEnvIfNoCase User-Agent npbot getout
SetEnvIfNoCase User-Agent turnitinbot getout
SetEnvIfNoCase User-Agent anarchie getout
SetEnvIfNoCase User-Agent "Educate Search" getout
SetEnvIfNoCase Referer iaea\.org getout

<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=getout
</Limit>

Options -Indexes

RewriteEngine on
RewriteCond %{HTTP_REFERER}!^$
RewriteCond %{HTTP_REFERER}!^http://(www\.)?mydomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://216\.239.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://images\.google.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://www\.google\..*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://translate\.google\..*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://babel\.altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://babelfish\.altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://world\.altavista\.com.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://www\.excite\.co.*$ [NC]
RewriteRule \.(jpg¦JPG)$ [mydomain.com...] [R,L]

weesnich

6:31 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



The anti-hotlink part can be written using SetEnvIf as well.

I recently posted an example on another section of this board:
[webmasterworld.com...]
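The linked example is not reproduced here, so what follows is only a rough sketch of the general idea and not the code from that post. It assumes the same mydomain.com placeholder used earlier in the thread, and "local_ref" is a made-up variable name:

```apache
# Flag requests that send no Referer at all, or one from our own site
SetEnvIfNoCase Referer "^$" local_ref
SetEnvIfNoCase Referer "^http://(www\.)?mydomain\.com" local_ref

# Serve image files only to flagged requests; everything else gets a 403
<FilesMatch "\.(jpe?g|JPE?G|gif|GIF)$">
Order Deny,Allow
Deny from all
Allow from env=local_ref
</FilesMatch>
```

Search engines or translation services that should be allowed to hotlink would each need one more SetEnvIfNoCase Referer line setting the same variable.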

Chalupee

7:35 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



thanks Scooter24 ;-)

I just noticed since last night this guy/gal coming in

rico/1.0

Doing a search here, I found the thread on the topic of rico/1.0, which is now closed.
Scoping him/her out shows
69.3.78.160 | h-69-3-78-160.dnvtco56.covad.net | 176 | x-- | Covad Communications NETBLK-COVAD-IP-4-NET

So I'm guessing this is an email harvester starting up again.

and I figure entering into the list:

SetEnvIfNoCase User-Agent ^rico getout

will take care of the problem.

I would like to just mention that I had to take out the bottom portion of the htaccess with the image rewrite mods... I was getting a total Server Error from any webpage on my site. I am supposed to have rewrite working on my servers but I've never checked it out and that may be the problem.

Nonetheless, thanks again!

WebMistress

4:31 am on Apr 1, 2003 (gmt 0)

10+ Year Member



oooh...I just did a search on google to find out what this educate search thing was...it only hits my page with a guestbook also...interesting. It's gonna be disappointed in mine since it's a sample guestbook page with only private@private.com for all the addresses...hahaha

wilderness

4:53 am on Apr 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>it only hits my page with a guestbook</snip>

Actually NO!
Nary a guest book for me and it visited last week.
I promptly added it to my never-ending list.

On a side note: I actually allowed a major backbone, which I've had denied for some time. As a test :-)

jdMorgan

5:08 am on Apr 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Chalupee,

Change the Options line in the code above to


Options -Indexes +FollowSymLinks

and see if that fixes your problem. Your host may not have SymLinks enabled by default, and mod_rewrite requires it.

Remember that this forum drops the space required between "}" and "!" in the RewriteConds and RewriteRules. If you cut-n-pasted directly, you'll need to add them back in. Example:


RewriteCond %{HTTP_REFERER} SPACE_REQUIRED_HERE !^http://(www\.)?mydomain.com.*$ [NC]

And in the following line, you must also replace the broken vertical pipe "¦" character with a solid vertical pipe character.

RewriteRule \.(jpg¦JPG)$ http://www.mydomain.com/images/replace.gif [R,L]

HTH,
Jim

carfac

5:44 pm on Apr 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is the rewrite I use.... in httpd.conf:

RewriteCond %{HTTP_REFERER}!^$
RewriteCond %{HTTP_REFERER}!^http://(www\.)?mydomain.com.*$ [NC]
RewriteRule .*\.(gif¦jpg)$ [blackhole.com...] [NC,R,L]

As Jim noted above- add the spaces and the pipe symbol as needed.

This one I like a bit better, it is a little simpler, too...

Line one exempts any request that does not send a referer (just in case).

Line two exempts any referer that is your own site.
Line three sends every remaining request for .jpg or .JPG or .gif or .GIF to wherever you want. I have an IP that I use as a black hole... I send absolutely NOTHING and hang their site for 20 seconds while they try to get it (I do not want to waste ANY of my bandwidth on this!). You can replace http://www.blackhole.com/ (which is not real, anyway!) with anything you want: a "denied" image or a 1x1 gif, or change [NC,R,L] to [NC,F,L] to send a 403!
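For reference, here is a sketch of the 403 variant mentioned above, with the spaces and solid pipe restored. It assumes the same mydomain.com placeholder and keeps the NC flag so that uppercase extensions still match:

```apache
RewriteEngine on
# Skip requests that send no Referer at all
RewriteCond %{HTTP_REFERER} !^$
# Skip requests referred from our own site
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
# Anything left asking for an image gets 403 Forbidden instead of a redirect
RewriteRule \.(gif|jpg)$ - [NC,F,L]
```

Because nothing is redirected, this version needs no external target and costs only the small 403 response.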

dave

Chalupee

6:51 pm on Apr 1, 2003 (gmt 0)

10+ Year Member



Thanks a bunch group! The fixes Jim mentioned, did
the trick... it was probably with the "pipe" and spacing
mentioned, and/or the options tag... I fixed them all at
the same time so not sure which one worked....
I use hostway... and although they have been great with
many issues, the response
from them is usually "we do not support third party scripts..."
when it comes to these problems.
I hate to take up message threads with a thank you, but felt
it necessary for the extra time many of you have spent on
this topic and I hope it also helps others too...
Thanks a bunch group!
Rob
getting fat from chalupees at Taco Bell.

ChiJazzMan

5:10 pm on Apr 22, 2003 (gmt 0)

10+ Year Member



BTW, blackhole.com is an actual ISP host. I just redirect harvesters and people trying to hack my computer (i.e. looking for things like owssvr.dll) to www.cert.org.
(I figure they can respond much more effectively to hacks).