Forum Moderators: DixonJones

Java/1.5.0_04

What the heck is it?

webdude

3:14 pm on Dec 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since it is fairly difficult to search WebmasterWorld right now, I couldn't find any reference to this, so sorry if this has come up before.

I've been getting slammed periodically by this on one of my higher-traffic sites. It comes in and hits every page, and hits them pretty fast too. It appears to come in on random IPs several times a day. Every check has shown US IPs, so I am not sure I want to block them. Does anyone know what this is? Is it a rogue bot or something else? It seems to be slurping my site fairly hard, and I am just trying to nail this down.

Thanks!

Edouard_H

3:40 pm on Dec 5, 2005 (gmt 0)

10+ Year Member



Message #6 in this thread [webmasterworld.com] pretty much sums up the Java user agent. I've seen both Google and Yahoo access a site with a variety of version numbers, so I usually ban on a case by case basis depending on behavior.

webdude

3:54 pm on Dec 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When you say "case by case basis depending on behavior," I am not sure I follow you. I get hit with this about 3 times a day, all from different IPs. A lookup shows different providers for these, usually in the US. It appears that it runs through the whole site, top to bottom. What is a concern is the site has a forum and the database gets hit pretty hard when this happens. One of the links you posted mentions a bot trap. Any chance of pointing me in the right direction for something like that? I know Brett has been having trouble with rogue bots lately, and I am wondering if anything can be done.

Another thing: do you think it is possible that the IPs could be spoofed on something like this? I would hate to start blocking IPs blindly unless I knew for sure where they were coming from.

Thanks for any help!

jdMorgan

4:28 pm on Dec 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Behaviour:
> it runs through the whole site, top to bottom. What is a concern is the site has a forum and the database gets hit pretty hard when this happens.

Is that good? Do you "like" this? Are there any signs that you get referrals to your site or any other benefit as a result of this?

I suspect the answer is, "No, No, and No." So if this behaviour is unwelcome, block it.

As to "What is this?", the only way to tell is to analyze the behaviour it shows. It's probably either an amateur 'bot, a site scraper, an e-mail address harvester, or a data miner, implemented using Java library functions.
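One rough way to analyze that behaviour is to group the "Java/" hits in the access log by IP and see how fast each client pulled pages. The sketch below assumes Apache's "combined" log format; `summarize_java_clients` is a hypothetical helper, not part of any existing tool, and IIS W3C extended logs use a different field layout that would need its own parser.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Apache "combined" log format. IIS W3C logs are laid out differently.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_java_clients(lines):
    """Group requests whose User-Agent starts with 'Java/' by client IP.

    Returns {ip: (request_count, requests_per_minute)}.
    """
    times = defaultdict(list)
    for line in lines:
        m = COMBINED.match(line)
        # Skip unparseable lines and anything not claiming to be a Java client.
        if not m or not m.group("agent").startswith("Java/"):
            continue
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        times[m.group("ip")].append(ts)

    summary = {}
    for ip, stamps in times.items():
        span_minutes = (max(stamps) - min(stamps)).total_seconds() / 60
        # If every hit landed in the same second, treat them as one minute's worth.
        rate = len(stamps) / span_minutes if span_minutes > 0 else float(len(stamps))
        summary[ip] = (len(stamps), rate)
    return summary
```

A client that fetches hundreds of forum pages at several requests per minute, ignoring robots.txt, looks like a scraper rather than a visitor.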

The easiest way to block it (on Apache) would be to use mod_access and mod_setenvif, or mod_rewrite, to test the User-Agent request header (the %{HTTP_USER_AGENT} server variable in mod_rewrite) and reject requests from User-agents beginning with "Java/".
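Whatever the server, the rule itself is just a prefix test on the User-Agent header. Purely as an illustration, here is that test written as Python WSGI middleware instead of Apache or IIS configuration; `block_java_agents` is a hypothetical name, and on Apache the modules named above do the same job.

```python
def block_java_agents(app):
    """WSGI middleware: answer 403 Forbidden to User-agents beginning with "Java/"."""

    def wrapper(environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        agent = environ.get("HTTP_USER_AGENT", "")
        if agent.startswith("Java/"):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else (including clients sending no User-Agent) passes through.
        return app(environ, start_response)

    return wrapper
```

An existing WSGI application would be wrapped as `application = block_java_agents(application)`. Note that the header is client-supplied, so this only stops bots that admit to being Java.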

For a more automated approach, see these bad-bot Scripts on WebmasterWorld:
Modified "bad-bot" script blocks site downloads [webmasterworld.com]
Blocking Badly Behaved Bots #3 [webmasterworld.com]

These scripts work in different ways, and can be used in combination if desired.

A similar approach could be used with IIS.
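For readers wondering how a bot trap works in general: the site publishes a link that humans never see, disallows that path in robots.txt (for example `User-agent: *` followed by `Disallow: /bot-trap/`), and bans any client that fetches it anyway, since well-behaved crawlers obey robots.txt. The sketch below is a generic Python WSGI illustration of that idea, not the code from the scripts linked above; the `/bot-trap/` path and the `BotTrap` class are hypothetical, and the ban list lives only in memory.

```python
TRAP_PATH = "/bot-trap/"  # hypothetical path; must also be Disallowed in robots.txt


class BotTrap:
    """WSGI middleware sketch: ban any IP that requests the trap path."""

    def __init__(self, app, trap_path=TRAP_PATH):
        self.app = app
        self.trap_path = trap_path
        self.banned = set()  # in-memory only; lost on restart

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "")
        # Only a client ignoring robots.txt (or following hidden links) gets here.
        if path.startswith(self.trap_path):
            self.banned.add(ip)
        if ip in self.banned:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

A real deployment would persist the ban list and expire entries after a while, because dynamic and shared IPs get reassigned to innocent visitors.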

Jim

webdude

4:47 pm on Dec 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, jdMorgan. Very informative. I am on IIS using ASP. Are there more specific instructions for this platform somewhere in the forum?