Jakarta HTTP Client/1.0

         

keyplyr

12:08 am on Feb 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is there any evidence that this UA is used for scraping? I see it coming from many IP addresses, some corporate and some ISP; it's hard to tell its purpose other than link checking.

wilderness

1:56 am on Feb 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are tons of old threads on this in the archives:
[google.com...]

This seems to be the most relevant:

[jakarta.apache.org...]

jdMorgan

2:17 am on Feb 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don's slackin' tonight... :)

It's not a browser.
It doesn't declare itself as a "useful" 'bot.

We hope you enjoy these most delicious 403's, Mr. Jakarta, sir...

Jim
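For readers who want to see what serving those 403s might look like, here is a minimal sketch in Python. It is not Jim's actual setup, which the thread does not show. The function name `status_for_user_agent` and the `BLOCKED_PREFIXES` tuple are hypothetical, and the check assumes the UA string starts with "Jakarta", as in the thread title.

```python
# Hypothetical sketch: decide an HTTP status from the User-Agent header.
# The prefix list is an assumption based on the UA named in this thread.
BLOCKED_PREFIXES = ("jakarta",)


def status_for_user_agent(user_agent):
    """Return 403 for a UA that starts with a blocked prefix, else 200."""
    # Treat a missing header as an empty string and ignore case.
    ua = (user_agent or "").strip().lower()
    if ua.startswith(BLOCKED_PREFIXES):
        return 403
    return 200
```

Calling `status_for_user_agent("Jakarta HTTP Client/1.0")` takes the 403 branch, while an ordinary browser UA falls through to 200. A real site would more likely do this in the web server configuration than in application code.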

keyplyr

3:14 am on Feb 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This seems to be the most relevant

[jakarta.apache.org...]

Thanks Don, I know what it is, having read that page many times over the years.

I know of one software company that uses it to check link validity in its directory for school libraries, so I've allowed it in the past. Lately it came from an IP address assigned to IBM, taking a couple dozen pages as a bot would.
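One way to keep allowing a known link checker while refusing the same UA from everywhere else is to pair the User-Agent test with an IP allowlist. The sketch below only illustrates that idea. The names `TRUSTED_NETWORKS` and `is_allowed` are hypothetical, and the network shown is a documentation-only placeholder range, not the address of any company mentioned in this thread.

```python
import ipaddress

# Placeholder allowlist (assumption): 192.0.2.0/24 is a reserved
# documentation range, standing in for a link checker you trust.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(user_agent, remote_ip):
    """Allow the Jakarta UA only when it comes from a trusted network."""
    ua = (user_agent or "").strip().lower()
    if not ua.startswith("jakarta"):
        # Other user agents are outside the scope of this rule.
        return True
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)
```

With this rule, a Jakarta request from inside the placeholder range is allowed, and the same UA from any other address is refused.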

wilderness

4:01 pm on Feb 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IP address assigned to IBM

keyplyr,
"IBM": there's an "icon" that potentially sends shivers of conspiracy through anyone over the age of 40 ;)

For the longest time they were crawling from that range in NJ, apparently for some third-party purpose in support of clients unknown (at least to webmasters).
For the most part, their reason for crawling remains a mystery today.

Don