Msg#: 3097217 posted 9:51 am on Sep 26, 2006 (gmt 0)
I have my weblogging software set up to automatically ping three major blog-tracking sites whenever I update. I've noticed that whenever I ping these services, a flock of blog-tracking spiders descends on my RSS feeds, including ones from Google, Yahoo and MSN. Most of them are considerate enough to identify themselves in the user-agent string.
Some of them just send a generic Java version string, like this one, which tried to grab my index page without requesting robots.txt first: 66.96.216.### "Java/1.5.0_06"
Msg#: 3097217 posted 8:19 am on Sep 27, 2006 (gmt 0)
I honestly don't care which blog trackers I end up blocking if they can't figure out how to set a simple user-agent string. OH WAHHHHH!
If anyone is seriously upset, I assume they'll contact me and tell me their blog reader doesn't work. Then I can contact whoever wrote it, tell them what a useless pound of programming flesh they are, and include instructions on how to fix it.
Until then, "Java/anything" goes BOING! BOING! BOING!
On a serious note, unlike you, I don't even bother with the slash. I suppose a crawler named, say, Conjavabot (which I just made up) would be banned under my rules. So for webmasters who prefer a more conservative approach, your pattern is probably a better example than mine.
Then again, mine catches these nasty and persistent little user agents, so maybe it's not so bad after all:
Java(TM) 2 Runtime Environment
Java1.1.7
JPluck/2.0.9 (Java 1.4.2_03; Windows XP)
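To make the trade-off concrete, here's a rough sketch comparing the two approaches against the user agents quoted in this thread. It assumes the conservative rule is "starts with Java/" and the broad rule is "java anywhere, ignoring case"; neither poster shared their exact pattern, so both regexes, and the helper name is_blocked, are illustrative guesses rather than anyone's actual config.

```python
import re

# Assumed conservative rule: only user agents that begin with "Java/",
# e.g. "Java/1.5.0_06". The exact original pattern was not posted.
STRICT = re.compile(r"^Java/")

# Assumed broad rule: "java" anywhere in the string, case-insensitive.
# This also matches made-up names such as "Conjavabot".
LOOSE = re.compile(r"java", re.IGNORECASE)

USER_AGENTS = [
    "Java/1.5.0_06",
    "Java(TM) 2 Runtime Environment",
    "Java1.1.7",
    "JPluck/2.0.9 (Java 1.4.2_03; Windows XP)",
    "Conjavabot",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]


def is_blocked(user_agent: str, pattern: re.Pattern) -> bool:
    """Return True if the pattern matches the user-agent string (hypothetical helper)."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for ua in USER_AGENTS:
        print(f"strict={is_blocked(ua, STRICT)} loose={is_blocked(ua, LOOSE)}  {ua}")
```

Run against the list, the slash pattern only catches the plain "Java/1.5.0_06" string, while the broad pattern catches all the Java variants plus the imaginary Conjavabot, and neither touches Googlebot. That's the whole argument in a nutshell: more coverage for more collateral damage.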