Some of them just use a generic Java version string, like this one, which tried to grab my index page (without requesting robots.txt):
66.96.216.### "Java/1.5.0_06"
Anyone know who this spider belongs to?
Whitelisting is the key to the future. Start thinking about things you can do to only let in what you want to come in instead of trying to keep things out. :)
If anyone is seriously upset, I'm assuming they'll contact me and tell me their blog reader won't work, and then I can contact whoever wrote it and tell them what a useless pound of programming flesh they are, along with instructions to fix it.
Until then, "Java/anything" goes BOING! BOING! BOING!
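For anyone who wants to try the whitelisting idea, here is a minimal Python sketch of it. The allow-list entries and the function name `is_allowed` are made up for illustration and are not anyone's actual rules from this thread; a real list would have to be built from your own logs.

```python
import re

# Hypothetical allow-list: these patterns are illustrative assumptions,
# not rules taken from this thread.
ALLOWED_AGENTS = [
    re.compile(r"^Mozilla/\d"),  # mainstream browsers and most major crawlers
    re.compile(r"^Opera/\d"),
]


def is_allowed(user_agent: str) -> bool:
    """Return True only if the User-Agent matches an allow-list entry."""
    return any(pattern.search(user_agent) for pattern in ALLOWED_AGENTS)


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (Windows; U)", "Java/1.5.0_06", ""):
        print(repr(ua), "allowed" if is_allowed(ua) else "denied")
```

Anything not on the list is denied by default, so a generic "Java/..." agent never needs a rule of its own.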
BOING! BOING! BOING!
That's the name of the new BoingBoing podcast. :)
Your rants are always the best, Bill!
On a serious note, I don't even bother with the slash the way you did. I suppose if a crawler came around called, for example, Conjavabot, which I just made up, it'd be banned under my rules. So perhaps, for those webmasters who prefer a more conservative approach, your pattern is a better example than mine.
Then again, mine will catch these nasty and persistent little user agents, so maybe it's not so bad after all:
Java(TM) 2 Runtime Environment
Java1.1.7
JPluck/2.0.9 (Java 1.4.2_03; Windows XP)
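To see the difference between the two patterns, here is a rough Python sketch that runs both against the user agents mentioned in this thread. It assumes the "Java/" rule is anchored at the start of the string and that both rules ignore case (which is what would make the made-up Conjavabot match the bare rule); neither detail is confirmed by the posts above, and the names `SLASH_PATTERN`, `BARE_PATTERN`, and `blocked_by` are just for this example.

```python
import re

# Assumption: the stricter rule matches "Java/" at the start of the string.
SLASH_PATTERN = re.compile(r"^Java/", re.IGNORECASE)
# Assumption: the broader rule matches "java" anywhere, ignoring case.
BARE_PATTERN = re.compile(r"Java", re.IGNORECASE)

# User agents quoted in the thread, plus the made-up Conjavabot.
AGENTS = [
    "Java/1.5.0_06",
    "Java(TM) 2 Runtime Environment",
    "Java1.1.7",
    "JPluck/2.0.9 (Java 1.4.2_03; Windows XP)",
    "Conjavabot",
]


def blocked_by(pattern, agents):
    """Return the user agents that the given pattern would block."""
    return [ua for ua in agents if pattern.search(ua)]


if __name__ == "__main__":
    print("With slash:   ", blocked_by(SLASH_PATTERN, AGENTS))
    print("Without slash:", blocked_by(BARE_PATTERN, AGENTS))
```

Under those assumptions the slash version only stops the plain "Java/1.5.0_06" agent, while the bare version stops all five, Conjavabot included.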