Did this bot change its user agent name to get around my .htaccess?

         

snooprock

1:50 pm on Sep 14, 2006 (gmt 0)

10+ Year Member



I was hoping someone could help me understand what is going on with this robot. I have blocked all forms of the Java bot via my .htaccess file. It looks like this bot started out with the Java user agent and was successfully blocked; then, once it noticed it was forbidden, it switched to some other user agent and proceeded to grab every single page of mine. I am unsure whether I should block this host via the IP deny manager, or even whether this bot is malicious. I have to believe it is, since it started out as Java and was then smart enough to find a way around the 403. Thanks in advance for any insight on what is going on here.

/robots.txt
Http Code: 403 Date: Sep 14 02:51:40 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: Java/1.6.0-beta2

/
Http Code: 200 Date: Sep 14 02:51:40 Http Version: HTTP/1.1 Size in Bytes: 8369
Referer: -
Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)

/sitemap.html
Http Code: 200 Date: Sep 14 02:51:41 Http Version: HTTP/1.1 Size in Bytes: 8423
Referer: -
Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)

/quality-sites-directories.html
Http Code: 200 Date: Sep 14 02:51:42 Http Version: HTTP/1.1 Size in Bytes: 4980
Referer: -
Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)

[edited by: jatar_k at 5:39 pm (utc) on Sep. 14, 2006]
[edit reason] no specific IPs thanks [/edit]

jdMorgan

8:38 pm on Sep 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Assuming that all of the requests came from the same IP address, and that the accesses made with the browser user-agent looked like a robot's (fast page fetches, usually without fetching CSS, external JS, or images), then I'd say yes: it switched user-agents and came back cloaked as a browser.
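The checks described above can be sketched as a small log filter. This is only a rough illustration in Python, not a tested tool: the record layout (`ip`, `time`, `path`, `agent`), the function name `suspicious_ips`, the thresholds, and the placeholder address `192.0.2.10` are all assumptions made for the example, and the year in the timestamps is assumed from the post date. The sample entries mirror the log excerpt posted above.

```python
from collections import defaultdict
from datetime import datetime

# File suffixes a real browser would normally request along with a page.
ASSET_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def suspicious_ips(records, max_avg_gap=2.0, min_requests=3):
    """Return the IPs whose requests match the robot-like pattern.

    An IP is flagged when it used more than one user agent, never fetched
    CSS/JS/images, and averaged at most max_avg_gap seconds between requests.
    All thresholds are illustrative guesses, not recommended values.
    """
    by_ip = defaultdict(list)
    for rec in records:
        by_ip[rec["ip"]].append(rec)

    flagged = set()
    for ip, recs in by_ip.items():
        if len(recs) < min_requests:
            continue  # too little traffic to judge
        recs.sort(key=lambda r: r["time"])
        agents = {r["agent"] for r in recs}
        fetched_assets = any(
            r["path"].lower().endswith(ASSET_SUFFIXES) for r in recs
        )
        span = (recs[-1]["time"] - recs[0]["time"]).total_seconds()
        avg_gap = span / (len(recs) - 1)
        if len(agents) > 1 and not fetched_assets and avg_gap <= max_avg_gap:
            flagged.add(ip)
    return flagged


# Sample data modeled on the log excerpt; the IP is a placeholder.
BROWSER_UA = "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)"
SAMPLE = [
    {"ip": "192.0.2.10", "time": datetime(2006, 9, 14, 2, 51, 40),
     "path": "/robots.txt", "agent": "Java/1.6.0-beta2"},
    {"ip": "192.0.2.10", "time": datetime(2006, 9, 14, 2, 51, 40),
     "path": "/", "agent": BROWSER_UA},
    {"ip": "192.0.2.10", "time": datetime(2006, 9, 14, 2, 51, 41),
     "path": "/sitemap.html", "agent": BROWSER_UA},
    {"ip": "192.0.2.10", "time": datetime(2006, 9, 14, 2, 51, 42),
     "path": "/quality-sites-directories.html", "agent": BROWSER_UA},
]

if __name__ == "__main__":
    print(suspicious_ips(SAMPLE))
```

A match here is a hint rather than proof: text-mode browsers and some proxies also skip images and CSS, so it is worth looking at the raw log before denying an address.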

key_master's bad-bot Perl script (or one of the several modified/enhanced versions) and the runaway 'bot PHP script by xlcus/alexk, both posted on WebmasterWorld, might help you avoid this kind of problem in the future.
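Those scripts are not reproduced here. As a generic illustration of the throttling idea that a runaway-bot limiter could be built on, the following Python sketch counts recent requests per IP in a sliding window. The class name `RequestThrottle`, its parameters, and the limits are hypothetical, and it is not a port of either script.

```python
from collections import defaultdict, deque


class RequestThrottle:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests=10, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        """Return True if the request may proceed, False if it should get a 403.

        `now` is a timestamp in seconds, e.g. time.time().
        """
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; refused requests are not recorded
        hits.append(now)
        return True


if __name__ == "__main__":
    throttle = RequestThrottle(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, throttle.allow("192.0.2.10", t))
```

Because the limit is keyed on the IP address rather than the user agent, a robot that changes its agent string mid-crawl is still counted as one client.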

Jim