Cowbot 0.1

         

BlueSky

2:28 am on Oct 30, 2003 (gmt 0)

10+ Year Member



There are several posts here on the forums about NaverRobot using UAs like minibot(NaverRobot) and dloader(naverRobot). My site has been visited monthly by the latter, with the bot asking for only half a dozen pages each time. Since he always behaved, I allowed him passage.

Today, I got a visit from Cowbot-0.1, who fully identified himself as coming from NHN Corp. According to ZinguGuy here [webmasterworld.com], this company runs the most popular search engine in Korea. The bot checked robots.txt first, then fetched 60 pages at a reasonable rate of one every 6-8 seconds, and never went into disallowed areas.

218.145.25.45 - - [29/Oct/2003:19:40:00 -0600] "GET /robots.txt HTTP/1.0" 200 1189 "-" "Cowbot-0.1 (NHN Corp. / 2-3011-1954 / nhnbot@naver.com)"

People mentioned before that the other UAs probably came from this same company. My guess is that they were beta versions being tested, and the company is now using its production model. Microsoft's beta version bot hasn't been too swift or well-identified either. Or the others got banned by so many sites that NHN is now giving contact info so people can report the bot if it misbehaves. Obviously, each person has to decide for himself whether to give this bot another chance or not. As for me, I'll continue to allow him on as long as he behaves.

pendanticist

6:58 am on Nov 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteCond %{HTTP_USER_AGENT} Cowbot [NC,OR]

Is this the best way to ban this one?

Being a very bad boy tonight...

Pendanticist.

BlueSky

7:15 am on Nov 4, 2003 (gmt 0)

10+ Year Member



Well, that didn't last long. lolol Yeah, that should work. You can add ^ as an anchor to it (^Cowbot), or use an unanchored "naver", which will catch the other versions too. I haven't seen them at all since Cowbot emerged. After that one indexing run, all he's done is ask for robots.txt.
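
To see why the anchored and unanchored patterns behave differently, here is a minimal sketch in Python. It only approximates how a RewriteCond pattern with the NC flag is matched (case-insensitive, matching anywhere in the string unless anchored); it is not Apache itself. The helper name is_blocked and the browser UA string are assumptions for illustration; the bot UA strings are the ones quoted in this thread.

```python
import re

# UA strings quoted in this thread.
UA_COWBOT = "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"
UA_MINIBOT = "minibot(NaverRobot)"
UA_DLOADER = "dloader(naverRobot)"
# Hypothetical ordinary browser UA, used only as a negative case.
UA_BROWSER = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def is_blocked(pattern, user_agent):
    """Approximate a RewriteCond pattern with [NC]: case-insensitive search."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# "Cowbot" and "^Cowbot" both match the Cowbot UA, because it starts with "Cowbot".
cowbot_unanchored = is_blocked("Cowbot", UA_COWBOT)
cowbot_anchored = is_blocked("^Cowbot", UA_COWBOT)

# Unanchored "naver" matches all three bot UAs: Cowbot via the
# nhnbot@naver.com contact address, the others via "NaverRobot".
naver_hits = [is_blocked("naver", ua) for ua in (UA_COWBOT, UA_MINIBOT, UA_DLOADER)]

# Anchoring "naver" would miss them, since none of these UAs start with it.
naver_anchored = is_blocked("^naver", UA_COWBOT)
```

The design point is that an anchor narrows the match to UAs that begin with the string, which is fine for Cowbot but would defeat the catch-all "naver" pattern.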

pendanticist

7:31 am on Nov 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, BlueSky.

('naver' got thru my trap somehow.)

This one...

220.73.165.78 - - [03/Nov/2003:14:44:39 -0800] "GET /robots.txt HTTP/1.0" 200 1524 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"
220.73.165.78 - - [03/Nov/2003:14:44:39 -0800] "GET / HTTP/1.0" 403 480 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"

...brought this one to the site and seems to have waited for him/her to be successful, before leaving. Check out the time stamp.

61.78.61.166 - - [03/Nov/2003:14:44:41 -0800] "GET /OLD-Blahblah.html HTTP/1.1" 404 2847 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"
61.78.61.166 - - [03/Nov/2003:14:44:43 -0800] "GET /Blahblah.html HTTP/1.1" 200 13835 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"

And then 'Grasshopper' went thru my site a bit on the fast side.

Nothing like having your site used as a teaching aid, eh!?!

<chuckle/sigh>

Pendanticist.
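
For anyone who wants to compare time stamps like these without eyeballing the log, here is a rough Python sketch. It assumes the log is in the combined format shown in the excerpts above; the names LOG_PATTERN, parse_line and request_gaps are made up for this example, and the sample lines are copied from this post.

```python
import re
from datetime import datetime

# Assumed layout: combined log format, as in the excerpts quoted in this thread.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

SAMPLE_LINES = [
    '220.73.165.78 - - [03/Nov/2003:14:44:39 -0800] "GET /robots.txt HTTP/1.0" 200 1524 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"',
    '220.73.165.78 - - [03/Nov/2003:14:44:39 -0800] "GET / HTTP/1.0" 403 480 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"',
    '61.78.61.166 - - [03/Nov/2003:14:44:41 -0800] "GET /OLD-Blahblah.html HTTP/1.1" 404 2847 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"',
    '61.78.61.166 - - [03/Nov/2003:14:44:43 -0800] "GET /Blahblah.html HTTP/1.1" 200 13835 "-" "Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)"',
]


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the layout."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def request_gaps(lines, agent_substring):
    """Seconds between consecutive requests whose UA contains agent_substring."""
    needle = agent_substring.lower()
    entries = [parse_line(line) for line in lines]
    times = sorted(e["time"] for e in entries if e and needle in e["agent"].lower())
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Calling request_gaps(SAMPLE_LINES, "cowbot") gives the spacing between hits across both IPs, which is a quick way to judge whether a bot is crawling "a bit on the fast side".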

BlueSky

8:21 am on Nov 4, 2003 (gmt 0)

10+ Year Member



Yeah, he's going a little fast there. He didn't do that on my site. An unanchored naver should have worked.

RewriteCond %{HTTP_USER_AGENT} naver [NC,OR]

Are you missing any ORs in your list? I did that once, and it took a while to figure out what the problem was. Since then, I've written a script to send fake UAs and referers so I can test my .htaccess after making any changes and before uploading it to a production site.
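
The script itself isn't posted here, so the following is only a rough sketch of the idea in Python, not the actual script: send requests with a chosen User-Agent and Referer and compare the status code against what the rules should return. The function names, the test URL and the expected codes are all assumptions for illustration.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent, referer=None):
    """Build a GET request carrying a fake User-Agent and optional Referer."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, user_agent, referer=None, timeout=10):
    """Return the HTTP status code the server sends for this UA/referer pair."""
    request = build_request(url, user_agent, referer)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403, 404 and similar arrive as exceptions; the code is what we want.
        return err.code


def run_cases(url, cases):
    """Check each (user_agent, referer, expected_status) case; return failures."""
    failures = []
    for user_agent, referer, expected in cases:
        status = fetch_status(url, user_agent, referer)
        if status != expected:
            failures.append((user_agent, referer, expected, status))
    return failures


# Hypothetical cases: a banned bot should get 403, a normal browser 200.
CASES = [
    ("Cowbot-0.1 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)", None, 403),
    ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", None, 200),
]
```

Pointing run_cases at a local test copy of the site (for example a hypothetical http://localhost:8080/) and getting back an empty list would suggest the rules behave as intended for those cases. Note that urllib follows redirects by default, so a rule that redirects instead of returning 403 will report the status of the final page.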