


New bot Java/1.5.0_06 grabs all pages

Grabbed all pages from 2 different domains

     

privacyman

12:58 am on Feb 3, 2006 (gmt 0)




Managing my own domains plus several domains which are independent of my own does give an advantage for spotting new, questionable, or bad bots.

Recently I found that my site and several other (isolated and independent) domains had the same log entries for IP address and user agent. On each site, this bot grabbed every page.

My research into the IP and its group (and provider) revealed what I consider a "concealed identity": the registrar did not give owner names, and a lookup by address did not return any company or individual name. I also went to the domain name(s) associated with the IP, and it had a Flash page with no alternative content (I deliberately have Flash uninstalled... I never use it, for many reasons).

Because of the lack of information on the owner of the IP's CIDR group (the provider serving the bot's IP), no reverse DNS on the individual IP, and not much else, plus the fact that it grabbed every page, I blocked the entire CIDR group as well as the user agent.

The IP address was 69.85.234.27 and the user agent was Java/1.5.0_06.

For the user agent, Google and other SEs showed it was a plugin for some browsers.

The CIDR range 69.85.192.0/18 (69.85.192.0 - 69.85.255.255) belongs to slfiber.com in Alabama. A Google search by address shows Harbor Communications LLC in Mobile, AL, and since I have found that a huge amount of spam originates from southern states, I would sooner block the entire group (16,384 addresses). It could be a valid new SE, but I did not submit to them, and I would sooner protect my site and those I manage.
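As a quick arithmetic check on the range above, here is a minimal Python sketch using the standard-library `ipaddress` module (a tool choice for illustration, not something used in the thread). It confirms that a /18 spans 69.85.192.0 through 69.85.255.255, i.e. 2^(32-18) = 16,384 addresses, and that the logged bot address falls inside it.

```python
import ipaddress

# The block quoted in the post.
block = ipaddress.ip_network("69.85.192.0/18")

first = block.network_address    # lowest address in the block
last = block.broadcast_address   # highest address in the block
size = block.num_addresses       # 2 ** (32 - 18)

# The bot address reported in the post.
bot_ip = ipaddress.ip_address("69.85.234.27")

print(f"{first} - {last}: {size} addresses")
print("bot inside block:", bot_ip in block)
```

Running it prints `69.85.192.0 - 69.85.255.255: 16384 addresses` followed by `bot inside block: True`.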

Every page grabbed from multiple independent domains is NOT right.

Just a heads-up to watch for the IP and UA.

thetrasher

4:38 pm on Feb 3, 2006 (gmt 0)




An FTP server replies to HTTP requests?! No Flash.

"M4cub3x (c) FTP Server (Version 6.5/OpenBSD) server ID."

Do you need visits from a server?

wilderness

5:26 pm on Feb 3, 2006 (gmt 0)




Almost everybody, and every UA deny list compiled, includes a denial of Java in all its forms.

If you skip the Flash intro on their site, it reads as follows:

"we deliver cutting edge services to carriers, business and goverment entities."

Not a mention of private internet services.
A sub page under "services" also offers co-location.

As a result, I agree with your decision to deny the entire range.

RewriteCond %{REMOTE_ADDR} ^69\.85\.(19[2-9]|2[0-5][0-9])\. [OR]
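The third-octet alternation `19[2-9]|2[0-5][0-9]` is meant to cover 192 through 255, which is exactly the /18 above. A small Python sketch can check that by comparing the pattern against real CIDR membership for every possible third octet. Python's `re` stands in for Apache's PCRE here, which behaves the same for a pattern this simple; the helper name `regex_matches_cidr` is hypothetical.

```python
import ipaddress
import re

def regex_matches_cidr(pattern: str, cidr: str) -> list:
    """Return the third octets where the regex and CIDR membership disagree.

    Only the third octet varies; the first two come from the CIDR and the
    last is fixed at .1, which suffices for patterns that stop at the third
    octet, like the RewriteCond above.
    """
    regex = re.compile(pattern)
    net = ipaddress.ip_network(cidr)
    prefix = ".".join(str(net.network_address).split(".")[:2])
    mismatches = []
    for octet in range(256):
        addr = f"{prefix}.{octet}.1"
        if bool(regex.match(addr)) != (ipaddress.ip_address(addr) in net):
            mismatches.append(octet)
    return mismatches

# The RewriteCond pattern above, checked against 69.85.192.0/18.
print(regex_matches_cidr(r"^69\.85\.(19[2-9]|2[0-5][0-9])\.", "69.85.192.0/18"))
```

An empty list means the regex and the /18 agree on every third octet; a pattern that stopped at `19[2-9]` would instead report 200 through 255 as mismatches.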

jdMorgan

5:01 pm on Feb 5, 2006 (gmt 0)




... As well as adding

# Block Java and Python urllib except from Google and Yahoo
RewriteCond %{HTTP_USER_AGENT} ^(Python[-.]?urllib|Java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^207\.126\.2(2[4-9]|3[0-9])\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteRule .* - [F]

so your sites can't get raided again by Java- or Python-based scrapers.

(Note that the IP ranges may need some expansion/updating; I haven't checked them in a while.)

Jim
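To see how Jim's four lines combine, here is a minimal Python emulation, meant for reasoning about the rule rather than as a model of Apache's internals. Consecutive `RewriteCond` lines without `[OR]` are ANDed, `[NC]` makes the user-agent match case-insensitive, and a leading `!` negates a pattern, so a request gets the 403 from `[F]` only when the user agent matches and the address matches neither whitelist pattern. Those whitelist patterns cover third octets 224-239 and 32-63, i.e. 207.126.224.0/20 and 216.239.32.0/19. The function name `is_forbidden` and the test addresses are illustrative.

```python
import re

# Patterns transcribed from the RewriteCond lines ("|" as alternation).
BAD_UA = re.compile(r"^(Python[-.]?urllib|Java/?[1-9]\.[0-9])", re.IGNORECASE)  # [NC]
WHITELIST = [
    re.compile(r"^207\.126\.2(2[4-9]|3[0-9])\."),          # 207.126.224.0/20
    re.compile(r"^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\."),  # 216.239.32.0/19
]

def is_forbidden(user_agent: str, remote_addr: str) -> bool:
    """True when all three conditions hold: the UA matches, and the
    address fails both negated (!) whitelist patterns."""
    if not BAD_UA.search(user_agent):
        return False
    return not any(p.search(remote_addr) for p in WHITELIST)

print(is_forbidden("Java/1.5.0_06", "69.85.234.27"))          # blocked
print(is_forbidden("Java/1.5.0_06", "216.239.45.4"))          # whitelisted range
print(is_forbidden("Python-urllib/2.4", "10.0.0.1"))          # blocked
print(is_forbidden("Mozilla/5.0 (Windows)", "69.85.234.27"))  # UA not matched
```

Note that the user agent alone decides whether the whitelist is even consulted, so a scraper that spoofs a browser UA passes this rule untouched; that is why the thread also blocks by address range.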

pocpocpoc

7:04 am on Feb 9, 2006 (gmt 0)




I've also encountered this bot. It first arrived on January 22, and my server gave it the 403 treatment. It keeps coming back from different IPs, presumably because of the 403.

I've logged 130 visits now, from 92 unique IP addresses, all with different user agents that look like Java versions.
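Tallies like these can be pulled straight from the access log. The Python sketch below assumes Apache's "combined" log format (the post does not say which format was used) and uses the Java half of jdMorgan's user-agent pattern to count matching requests, distinct source IPs, and distinct Java user-agent strings. The function name `java_hits` is hypothetical, and the sample lines are made up only to exercise the code.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
JAVA_UA = re.compile(r"^Java/?[1-9]\.[0-9]", re.IGNORECASE)

def java_hits(lines):
    """Return (total Java-UA requests, Counter per IP, Counter per UA string)."""
    per_ip, per_ua = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and JAVA_UA.match(m.group("ua")):
            per_ip[m.group("ip")] += 1
            per_ua[m.group("ua")] += 1
    return sum(per_ip.values()), per_ip, per_ua

# Hypothetical sample lines, for illustration only.
sample = [
    '69.85.234.27 - - [02/Feb/2006:10:00:00 +0000] "GET / HTTP/1.1" 403 210 "-" "Java/1.5.0_06"',
    '69.85.234.38 - - [29/Jan/2006:08:15:00 +0000] "GET /a.html HTTP/1.1" 403 210 "-" "Java/1.5.0_05"',
    '69.85.234.38 - - [29/Jan/2006:08:15:05 +0000] "GET /b.html HTTP/1.1" 403 210 "-" "Java/1.5.0_05"',
    '192.0.2.10 - - [29/Jan/2006:09:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
total, per_ip, per_ua = java_hits(sample)
print(total, len(per_ip), len(per_ua))
```

On the sample this prints `3 2 2`: three Java requests from two addresses using two distinct Java version strings.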

I saw it from 69.85.234.38 (close to your .27 sighting) on January 29. Most of the source IPs have generic or missing reverse DNS. A few are servers, all of which so far appear to be running Windows.

Yesterday, I started giving this bot a 301 to a nonexistent site. We'll see if that has any effect.
