
Forum Moderators: Ocean10000 & incrediBILL

Hits from unidentified blog trackers

where are these spiders from?

   
9:51 am on Sep 26, 2006 (gmt 0)

10+ Year Member



I have my weblogging software set up to automatically ping three major blog-tracking sites whenever I update. I've noticed that whenever I ping these services, a flock of blog-tracking spiders will descend on my RSS feeds, including Google, Yahoo and MSN. Most of these are good enough to identify themselves in the user-agent.

Some of them just use a generic Java version string, like this one, which tried to grab my index page without fetching robots.txt:
66.96.216.### "Java/1.5.0_06"

Anyone know who this spider belongs to?

3:24 pm on Sep 26, 2006 (gmt 0)

wilderness (WebmasterWorld Senior Member)



NOC is a co-locator that offers rack space.
The request could be from anybody, including the many co-hosts that use NOC.

Regarding the UA, most everybody has it denied.

3:58 pm on Sep 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yup. It's a decision each of us has to make, but I for one block everything that has "java" anywhere in the UA. They're usually nothing but trouble.

Whitelisting is the key to the future. Start thinking about things you can do to only let in what you want to come in instead of trying to keep things out. :)
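A minimal Python sketch of the whitelisting idea, for illustration only. The token list and the `is_allowed` function are assumptions made up for this example, not anyone's actual rules; a real setup would more likely live in the web server configuration and would also check IP ranges, since user-agent strings are easy to fake.

```python
# Hypothetical allowlist of user-agent tokens; entries are illustrative only.
ALLOWED_TOKENS = ("googlebot", "slurp", "msnbot", "mozilla")


def is_allowed(user_agent):
    """Return True only if the user agent contains an allowlisted token."""
    ua = (user_agent or "").lower()  # treat a missing UA as an empty string
    return any(token in ua for token in ALLOWED_TOKENS)


if __name__ == "__main__":
    for ua in ("Java/1.5.0_06", "Mozilla/5.0 (compatible; Googlebot/2.1)", ""):
        print(repr(ua), "allowed" if is_allowed(ua) else "denied")
```

With a rule like this, a bare "Java/1.5.0_06" is denied by default without ever having to be listed as a bad agent.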

8:19 am on Sep 27, 2006 (gmt 0)

incrediBILL (WebmasterWorld Administrator)



I honestly don't care what blog trackers I block because they can't figure out how to set a simple user agent string, OH WAHHHHH!

If anyone is seriously upset I'm assuming they'll contact me and tell me their blog reader won't work and then I can contact whoever wrote it and tell them what a useless pound of programming flesh they are along with instructions to fix it.

Until then, "Java/anything" goes BOING! BOING! BOING!

4:04 pm on Sep 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BOING! BOING! BOING!
That's the name of the new BoingBoing podcast. :)

Your rants are always the best, Bill!

On a serious note, I don't even bother with the slash like you did. I suppose if a crawler came around like, for example, Conjavabot, which I just made up, they'd be banned under my rules. So perhaps for those webmasters who prefer a more conservative approach your pattern is a better example than mine.

Then again mine will catch these nasty and persistent little user agents so maybe it's not so bad after all:

Java(TM) 2 Runtime Environment
Java1.1.7
JPluck/2.0.9 (Java 1.4.2_03; Windows XP)
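To make the difference between the two rules concrete, here is a rough Python sketch that runs both against the user agents mentioned in this thread. The exact regular expressions are assumptions: the strict one reads "Java/anything" as a prefix match, and the broad one matches "java" anywhere, ignoring case. The names `STRICT`, `BROAD`, and `classify` are invented for the example.

```python
import re

# Assumed reading of the "Java/anything" rule: UA starts with "Java/".
STRICT = re.compile(r"^Java/")
# Assumed reading of the "java anywhere" rule: case-insensitive substring.
BROAD = re.compile(r"java", re.IGNORECASE)

USER_AGENTS = [
    "Java/1.5.0_06",
    "Java(TM) 2 Runtime Environment",
    "Java1.1.7",
    "JPluck/2.0.9 (Java 1.4.2_03; Windows XP)",
    "Conjavabot",  # the made-up crawler name from the post above
]


def classify(user_agent):
    """Report which of the two rules would block this user agent."""
    return {
        "strict": bool(STRICT.search(user_agent)),
        "broad": bool(BROAD.search(user_agent)),
    }


if __name__ == "__main__":
    for ua in USER_AGENTS:
        print(ua, classify(ua))
```

Under these assumptions, only the first string is caught by the strict rule, while the broad rule catches all five, including the hypothetical Conjavabot.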

 
