Daumoa, Korean bot

disobedient and weird UA

         

Mokita

7:45 pm on Apr 18, 2009 (gmt 0)

10+ Year Member


User Agent: Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; +http://ws.daum.net/aboutWebSearch.html) Daumoa/2.0

Came from 222.231.64.nnn. Requested robots.txt and promptly ignored it by also taking the index page.

Daum has been discussed here before, back when they were using the "Edacious & Intelligent Web Robot" UA. They seem to specialise in weird UAs.

http://www.webmasterworld.com/search_engine_spiders/3100397.htm
http://www.webmasterworld.com/search_engine_spiders/3154391.htm

Samizdata

11:20 pm on Apr 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Requested robots.txt and promptly ignored it by also taking the index page"

There seems to be a fairly widespread interpretation of the Robots Exclusion Protocol that considers taking the root/home/index page acceptable in all circumstances on the grounds that it does not constitute "crawling".

Disallow: /

I believe that Google and the other majors treat the above as an instruction not to take anything at all, but I could be wrong. In the unlikely event that I did not want them to take the index page, I would add a "noindex" META tag.

Others such as Daum would get a 403 from me.
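
For reference, the strict reading of "Disallow: /" can be checked with the robots.txt parser in Python's standard library. This is only a sketch: the robots.txt content, the example.com URLs, and the bare "Daumoa" agent token are assumptions made for illustration. It shows how one compliant parser reads the rule, not how Daum's bot actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out every robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The root page is matched by "Disallow: /" just like any deeper path.
root_allowed = allowed("Daumoa", "http://example.com/")
page_allowed = allowed("Daumoa", "http://example.com/some/page.html")
print(root_allowed, page_allowed)
```

With this parser there is no special case for the home page, so a bot that takes it anyway is following its own interpretation and not the rule as written.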

...

GaryK

3:38 am on Apr 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've seen so much crap from them over the years that I just look for Daum and 403 them.

These are just from 2009:

EDI/1.6.5 (Edacious & Intelligent Web Robot, Daum Communications Corp.)
EDI/1.6.6 (Edacious & Intelligent Web Robot, Daum Communications Corp.)
Mozilla/4.0 (compatible; EDI/1.6.6; Edacious & Intelligent Web Robot; Daum Communications Corp., Korea)
Mozilla/4.0 (compatible; MSIE enviable; DAUMOA 2.0; DAUM Web Robot; Daum Communications Corp., Korea; [ws.daum.net...]
Mozilla/4.0 (compatible; MSIE is not me; DAUMOA/1.0.0; DAUM Web Robot; Daum Communications Corp., Korea)
Mozilla/4.0 (compatible; MSIE is not me; DAUMOA/1.0.1; DAUM Web Robot; Daum Communications Corp., Korea)
Mozilla/4.0 (compatible; MSIE is not me; EDI/1.6.6; Edacious & Intelligent Web Robot; Daum Communications Corp., Korea)
Mozilla/5.0 (compatible; Firefox compatible; MS IE compatible; [search.daum.net...] Daumoa-feedfetcher/2.0
Mozilla/5.0 (compatible; Firefox compatible; MS IE compatible; not on Windows server; [cs.daum.net...] Daumoa-feedfetcher/2.0
Mozilla/5.0 (compatible; Firefox or MSIE mutant; not on Windows server; [ws.daum.net...] Daumoa/2.0
Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; [ws.daum.net...] Daumoa-feedfetcher/2.0
Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; [ws.daum.net...] Daumoa/2.0
Talkro Web-Shot/1.0 (E-mail: webshot@daumsoft.com, Home: [222.122.15.nnn...]
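
The "look for Daum and 403 them" approach could be sketched roughly as below. The function name, the case-insensitive substring rule, and the sample browser UA are assumptions for illustration, not anyone's actual server configuration; in practice this check would normally live in the web server's own rules.

```python
import re

# Assumption: a case-insensitive match on "daum" is enough to cover
# the variants listed above (Daum, DAUM, Daumoa, daumsoft.com).
DAUM_PATTERN = re.compile(r"daum", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 for any UA string mentioning Daum, otherwise 200."""
    return 403 if DAUM_PATTERN.search(user_agent or "") else 200


samples = [
    # Two complete UA strings taken from this thread.
    "EDI/1.6.5 (Edacious & Intelligent Web Robot, Daum Communications Corp.)",
    "Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; "
    "+http://ws.daum.net/aboutWebSearch.html) Daumoa/2.0",
    # Hypothetical ordinary browser UA, for contrast.
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko/2009032711 Firefox/3.0.8",
]

statuses = [status_for(ua) for ua in samples]
print(statuses)
```

A plain substring match is blunt: it would also refuse any unrelated UA that happens to contain "daum", which may or may not matter for a given site.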

Mokita

3:52 am on Apr 19, 2009 (gmt 0)

10+ Year Member



I'll be 403ing it from now on.

I guess the reason I haven't seen them previously is that I have wide ranges of Korean (and Chinese, etc.) IPs blocked. The range this one belongs to is smallish, and I hadn't seen a need to block it until now.

222.231.0.0/18
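
To see exactly which addresses a CIDR block like this covers, Python's standard ipaddress module can do the arithmetic. This is a sketch only; the two sample addresses are hypothetical, since the last octet of the visiting IP was not given. By this arithmetic a /18 starting at 222.231.0.0 ends at 222.231.63.255, so an address whose third octet is 64 would fall just outside it, and it may be worth checking the whois record for the exact allocation before relying on the range.

```python
import ipaddress

# The range quoted above.
BLOCKED = ipaddress.ip_network("222.231.0.0/18")


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside the BLOCKED network."""
    return ipaddress.ip_address(addr) in BLOCKED


# First and last addresses covered by the /18.
first, last = BLOCKED[0], BLOCKED[-1]
print(first, last, BLOCKED.num_addresses)

# Hypothetical sample addresses, one on each side of the upper boundary.
inside = is_blocked("222.231.63.200")
outside = is_blocked("222.231.64.200")
print(inside, outside)
```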