


Hits from unidentified blog trackers

Where are these spiders from?

9:51 am on Sept 26, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Aug 22, 2003
posts:333
votes: 0


I have my weblogging software set up to ping three major blog-tracking sites automatically whenever I update. I've noticed that each time I ping these services, a flock of blog-tracking spiders descends on my RSS feeds, including those from Google, Yahoo and MSN. Most of them are good enough to identify themselves in the user-agent string.

Some of them just use a generic Java version string, like this one, which tried to grab my index page without requesting robots.txt:
66.96.216.### "Java/1.5.0_06"

Anyone know who this spider belongs to?

3:24 pm on Sept 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5460
votes: 3


NOC is a colocation provider that rents out rack space.
It could be anybody, including any of the many co-hosts that use NOC.

As for the UA, most everybody has it denied.

3:58 pm on Sept 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Yup. It's a decision each of us has to make, but I, for one, block everything that has "java" anywhere in the UA. They're usually nothing but trouble.

Whitelisting is the key to the future. Start thinking about things you can do to only let in what you want to come in instead of trying to keep things out. :)
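The whitelisting idea can be sketched in Python, assuming requests are screened only by user-agent substring. `ALLOWED_UA_TOKENS` and `is_allowed` are hypothetical names, and the token list is purely illustrative. Keep in mind that a UA string is trivially forged, so a check like this is only a first filter.

```python
# Hypothetical allowlist: substrings of user agents we choose to let in.
# These tokens are illustrative only, not a recommended or complete list.
ALLOWED_UA_TOKENS = (
    "Googlebot",
    "Yahoo! Slurp",
    "msnbot",
    "Mozilla/",  # most browsers send a UA beginning with "Mozilla/"
)


def is_allowed(user_agent: str) -> bool:
    """Return True only if the UA contains one of the allowlisted tokens.

    Anything not explicitly allowed (including an empty UA) is rejected,
    which is the reverse of a blocklist that rejects only known-bad UAs.
    """
    if not user_agent:
        return False
    return any(token in user_agent for token in ALLOWED_UA_TOKENS)


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (compatible; Googlebot/2.1)", "Java/1.5.0_06", ""):
        print(repr(ua), "->", "allow" if is_allowed(ua) else "deny")
```

The design choice is the default: an unknown UA is denied unless it earns its way onto the list, so a new "Java/..." variant needs no new rule.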

8:19 am on Sept 27, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14663
votes: 99


I honestly don't care which blog trackers I block if they can't figure out how to set a simple user agent string, OH WAHHHHH!

If anyone is seriously upset, I assume they'll contact me to say their blog reader doesn't work, and then I can contact whoever wrote it and tell them what a useless pound of programming flesh they are, along with instructions to fix it.

Until then, "Java/anything" goes BOING! BOING! BOING!
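A "Java/anything" deny rule might look like the following Python/WSGI sketch. It is not the poster's actual setup, which isn't shown. The assumption here is that the rule means "the UA begins with Java/" (case-sensitive), and `block_java_ua` and `hello_app` are hypothetical names.

```python
import re

# Assumed form of the rule: deny any UA that begins with the literal "Java/".
JAVA_SLASH = re.compile(r"Java/")


def block_java_ua(app):
    """Wrap a WSGI app and answer 403 Forbidden to "Java/..." user agents."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if JAVA_SLASH.match(ua):  # re.match anchors at the start of the UA
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site: always returns a plain-text page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_java_ua(hello_app)

if __name__ == "__main__":
    def show_status(status, headers, exc_info=None):
        print(status)

    application({"HTTP_USER_AGENT": "Java/1.5.0_06"}, show_status)
    application({"HTTP_USER_AGENT": "Mozilla/5.0"}, show_status)
```

The wrapped `application` can be handed to any WSGI server, for example `wsgiref.simple_server.make_server("", 8000, application)` from the standard library.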

4:04 pm on Sept 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


BOING! BOING! BOING!
That's the name of the new BoingBoing podcast. :)

Your rants are always the best, Bill!

On a serious note, I don't even bother with the slash like you did. I suppose if a crawler came along called, say, Conjavabot (which I just made up), it would be banned under my rules. So for webmasters who prefer a more conservative approach, your pattern is perhaps a better example than mine.

Then again, mine will catch these nasty and persistent little user agents, so maybe it's not so bad after all:

Java(TM) 2 Runtime Environment
Java1.1.7
JPluck/2.0.9 (Java 1.4.2_03; Windows XP)
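To see the trade-off between the two patterns, here is a small Python comparison over the user agents quoted in this thread, plus the made-up Conjavabot. It assumes the broad rule is a case-insensitive search for "java" anywhere in the UA and the slash rule is a case-sensitive "Java/" at the start of the UA; `broad_rule` and `slash_rule` are hypothetical names.

```python
import re

# Broad rule: "java" anywhere in the UA, in any letter case.
BROAD = re.compile(r"java", re.IGNORECASE)
# Narrower rule with the slash, assumed to mean a UA starting with "Java/".
WITH_SLASH = re.compile(r"Java/")

SAMPLE_UAS = [
    "Java/1.5.0_06",
    "Java(TM) 2 Runtime Environment",
    "Java1.1.7",
    "JPluck/2.0.9 (Java 1.4.2_03; Windows XP)",
    "Conjavabot",  # the made-up crawler name from the post
]


def broad_rule(ua: str) -> bool:
    """True if "java" appears anywhere in the UA, ignoring case."""
    return BROAD.search(ua) is not None


def slash_rule(ua: str) -> bool:
    """True if the UA begins with the literal, case-sensitive "Java/"."""
    return WITH_SLASH.match(ua) is not None


if __name__ == "__main__":
    for ua in SAMPLE_UAS:
        print(repr(ua), "broad:", broad_rule(ua), "slash:", slash_rule(ua))
```

On these samples the broad rule matches all five strings, while the slash rule matches only "Java/1.5.0_06" (the result is the same whether or not the slash pattern is anchored to the start). The broad rule's extra reach is what catches the three persistent UAs above, at the cost of also banning a harmless-looking name like Conjavabot.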