Java.

Good ones and bad ones?

         

pendanticist

1:14 pm on Nov 7, 2003 (gmt 0)


Had a new bot hit me very lightly last night.
Java/1.4.1_02

81.5.***.25 - - [07/Nov/2003:04:30:46 -0800] "GET /robots.txt HTTP/1.1" 200 1524 "-" "Wotbox/alpha0.5.1 (bot'at'wot***.com; http://www.wot***.com) Java/1.4.1_02"
81.5.***.25 - - [07/Nov/2003:04:30:52 -0800] "GET / HTTP/1.1" 200 20624 "-" "Wotbox/alpha0.5.1 (bot'at'wot***.com; http://www.wot***.com) Java/1.4.1_02"
81.5.***.25 - - [07/Nov/2003:04:30:54 -0800] "GET / HTTP/1.1" 200 20624 "-" "Wotbox/alpha0.5.1 (bot'at'wot***.com; http://www.wot***.com) Java/1.4.1_02"

The site says they were forced to change their name from Wotbot to Wotbox.

I suppose any bot can be mainstreamed, but I'm thinking Java has some bad history?

In my .htaccess it reads:

RewriteCond %{HTTP_USER_AGENT} Java1 [NC,OR]

Somehow that no longer seems adequate.

Thanks.

Pendanticist.

[edited by: jdMorgan at 1:57 pm (utc) on Nov. 7, 2003]
[edit reason] Neutered URLs [/edit]

jdMorgan

2:00 pm on Nov 7, 2003 (gmt 0)


In this case, though, they have properly identified themselves, which lets you decide based on their specific behaviour. So you can ban this one by "Wot..." if necessary.
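
If it does come to that, a ban by name might look something like the sketch below. This is only an illustration, assuming mod_rewrite is available in your .htaccess; the anchored "^Wotbox" pattern is my guess at a reasonable match for the user-agent in the log lines above, and the robots.txt exclusion is there so the 'bot can still read any Disallow you set for it.

```apache
# Sketch only: refuse a single named robot by its user-agent string.
# Assumes mod_rewrite is enabled for this directory.
RewriteEngine On

# Match user-agents that start with "Wotbox" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} ^Wotbox [NC]
# Leave robots.txt reachable so the robot can still see its Disallow rules.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Anything else this robot requests gets 403 Forbidden.
RewriteRule .* - [F]
```

Because the pattern is anchored to the robot's own name, other Java-based clients are not affected by this rule.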

Java is just a 'generic' user-agent. It usually just reports the version of Java used to code the Web interface library that various 'bots are built on, some good and some bad. As a result, Java is one of those UAs you have to be careful with; in some cases you may want to allow it only when it comes from a 'good' IP address.
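
As a rough sketch of that idea, assuming mod_rewrite: the rule below refuses requests whose user-agent starts with "Java" followed by a version number, unless they come from an address you trust. The address 192.0.2.10 is only a placeholder for whatever 'good' IP you have in mind. Note that the "Java1" pattern quoted above would not match "Java/1.4.1_02" because of the slash, so the pattern here makes the slash optional.

```apache
# Sketch only: refuse generic Java user-agents except from a trusted address.
# Assumes mod_rewrite is enabled; 192.0.2.10 is a placeholder, not a real 'good' IP.
RewriteEngine On

# Matches "Java/1.4.1_02" as well as "Java1.3.1" when the UA starts with it.
RewriteCond %{HTTP_USER_AGENT} ^Java/?[0-9] [NC]
# Skip the block for the trusted address.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
# Keep robots.txt reachable for everyone.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule .* - [F]
```

Because the pattern is anchored at the start, a 'bot that names itself first and only appends the Java version at the end, as Wotbox does, is not caught by this rule and has to be handled by its own name.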

At least these guys are identifying themselves properly, and more thoroughly than half the big-name 'bots out there! I'll withhold judgement as long as they obey robots.txt, although the double-fetch of your index page within two seconds isn't exactly wonderful.

Jim