Java/1.x.y

Bot or not?

         

WebJoe

6:21 am on Oct 3, 2003 (gmt 0)

10+ Year Member



I asked for help identifying this user agent string before [webmasterworld.com].
I assume that not many replies were posted because of my bad choice of title, so here we go again:

This time the "visitor" did not visit a page that is off-limits by robots.txt, but submitted a form twice with no data at all.
The IP, again, belongs to a big Swiss ISP, but not the same as last time.


inetnum: 80.218.0.0 - 80.218.107.255
netname: CABLECOM-MAIN-NET
descr: Cablecom GmbH
descr: Zuerich
country: CH

I assume that "Java/1.4.1_04" and the like are offline browsing/web-site grabbing tools rather than bots, and stupid ones at that (why would one post a form... to have the results page available offline?)

So I checked the logs and noticed that it grabbed pretty much every page, even ones that are just form targets and a couple of pages that don't exist. Ergo, it gets banned.
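The log check described above could be sketched roughly like this. It is only an illustration: it assumes the server writes Apache "combined" format logs and that the client identifies itself with a UA starting with "Java/". The function name and the counters are made up for the example. A standard access log does not show whether a POST body was empty, so this only counts POSTs and 404s per IP.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Matches user agents such as "Java/1.4.1_04".
JAVA_UA = re.compile(r'^Java/\d')


def summarize_java_clients(lines):
    """Return {ip: Counter} with request, POST and 404 counts for Java UAs."""
    summary = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not JAVA_UA.match(m.group("ua")):
            continue  # unparsable line or a different user agent
        stats = summary.setdefault(m.group("ip"), Counter())
        stats["requests"] += 1
        if m.group("method") == "POST":
            stats["posts"] += 1
        if m.group("status") == "404":
            stats["not_found"] += 1
    return summary
```

An IP with many requests, several 404s and POSTs to form targets would match the behaviour described in this post, but the thresholds are a judgment call.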

Romeo

8:58 am on Oct 3, 2003 (gmt 0)

10+ Year Member



I have also seen this Swiss Java bot several times; the last visits came from tourbilon38.212.98.46.104.adslpremium.ch and dclient80-218-79-184.hispeed.ch

They try to follow every link, but they fetch only the text pages and don't request any <img src=...>.
This looks more like a private bot than an offline browsing/web-site grabbing tool.
Before I could think about banning it, it had already banned itself by finding my bot-trap ... which easily happens if a rude bot thinks it can ignore robots.txt (in this case without even reading it) ... hehe

Regards,
R.
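For readers unfamiliar with the idea, a bot-trap can be sketched roughly as follows. This is not Romeo's actual setup, which the post does not describe. The path "/bot-trap/" and the class name are hypothetical, and the sketch assumes the same path is listed under Disallow in robots.txt and linked invisibly from normal pages, so only clients that ignore robots.txt ever request it.

```python
# Hypothetical trap path; it is assumed to be listed under Disallow in robots.txt.
TRAP_PREFIX = "/bot-trap/"


class BotTrap:
    """Bans any client IP that requests a path under the trap prefix."""

    def __init__(self, trap_prefix=TRAP_PREFIX):
        self.trap_prefix = trap_prefix
        self.banned = set()

    def allow(self, ip, path):
        """Return False if the request should be refused."""
        if ip in self.banned:
            return False  # already caught earlier
        if path.startswith(self.trap_prefix):
            self.banned.add(ip)  # the client ignored robots.txt
            return False
        return True
```

In practice the ban list would be persisted (for example in the server's deny rules) rather than kept in memory.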

bull

9:49 am on Oct 3, 2003 (gmt 0)

10+ Year Member



older discussion at
[webmasterworld.com...]

WebJoe

3:34 pm on Oct 3, 2003 (gmt 0)

10+ Year Member



Thanks for that, bull. A site search mostly turned up posts about Java in web design or about JavaScript.

Forgot to check Google...

I banned it anyway; I don't like "empty posters".

@romeo: same here, it fetched only the pages but no graphics.

dcrombie

4:22 pm on Oct 25, 2003 (gmt 0)



It's hitting a couple of my sites - getting the home page of one and two pages off another one. The requests on the latter are on the hour, five hours apart, which suggests an automated system. The other site is hit at more random times. Same pattern repeated daily for the last week (haven't checked further back).

I'm not blocking it (yet) but would like to know the source. IP addresses are 217.13.27.234 and 216.52.235.101.
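One way to test the "automated system" hunch is to look at the gaps between requests from the same client. The sketch below is a simple heuristic, not a definitive detector: the function name and the two-minute tolerance are assumptions, and it treats a near-constant gap between consecutive hits as a sign of a scheduler.

```python
from datetime import datetime, timedelta


def looks_scheduled(timestamps, tolerance=timedelta(minutes=2)):
    """True if consecutive requests are separated by a near-constant gap."""
    if len(timestamps) < 3:
        return False  # too few samples to call it a pattern
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Scheduled jobs produce gaps that barely differ from each other.
    return max(gaps) - min(gaps) <= tolerance
```

Requests at 00:00, 05:00, 10:00 and 15:00 would pass this check, while hits at scattered times would not.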

claus

4:29 pm on Oct 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The Java UA can be anything: good, innocent, bad, nasty. Bot or not.

It's "something" made with this tool: http*//java.sun.com/j2se/1.4.1/download.html

AFAIK, two clients don't even need to do the same thing just because they send the same version number.

/claus

WebJoe

9:58 am on Oct 26, 2003 (gmt 0)

10+ Year Member



Thanks for pointing that out, claus. I figured that if "Java" appears in the UA, it just means that someone used that library to build something that accesses the web, be it a bot or a browser. It's like "libwww", "Microsoft URL Control" and "Indy Library".
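To illustrate the point, a log script could tag requests whose UA only names an HTTP library. This is a minimal sketch: the token list contains just the four libraries named in this post, and the function name is made up. A match says which library made the request, not whether the client is a bot or a browser.

```python
# Library tokens taken from this thread; "Java/" includes the slash so that
# user agents mentioning "JavaScript" are not matched.
LIBRARY_UA_TOKENS = ("Java/", "libwww", "Microsoft URL Control", "Indy Library")


def http_library_in(user_agent):
    """Return the library token found in the user agent, or None."""
    ua = user_agent.lower()
    for token in LIBRARY_UA_TOKENS:
        if token.lower() in ua:
            return token
    return None
```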