VenusCrawler

         

keyplyr

3:48 am on Apr 23, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




UA: VenusCrawler/Nutch-1.12 (crawler@mycompany.com)
Protocol: HTTP/1.1
Robots.txt: Yes
Host: Private Customer
68.74.116.16 - 68.74.116.23
68.74.116.16/29
Parent: ATT (sbc.com) ISP
68.72.0.0 - 68.78.255.255
68.72.0.0/14, 68.76.0.0/15, 68.78.0.0/16
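
For anyone who wants to double-check the range-to-CIDR conversions above before pasting them into a block list, here is a minimal sketch using Python's standard ipaddress module. The helper name summarize is just an illustration, not part of any tool mentioned in this thread.

```python
import ipaddress


def summarize(first, last):
    """Collapse an inclusive IPv4 range into the smallest list of CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


# Ranges copied from the post above.
customer = summarize("68.74.116.16", "68.74.116.23")
parent = summarize("68.72.0.0", "68.78.255.255")

print(customer)  # ['68.74.116.16/29']
print(parent)    # ['68.72.0.0/14', '68.76.0.0/15', '68.78.0.0/16']
```

The parent range needs three blocks because 68.72.0.0 through 68.78.255.255 does not fall on a single power-of-two boundary.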

keyplyr

7:57 pm on Apr 23, 2018 (gmt 0)




Also coming from other ISP accounts, possibly compromised.

lucy24

8:40 pm on Apr 23, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yup, seen this. Like most Nutch-based robots, it is robots.txt compliant* and sends humanoid headers--meaning, for me, that it pops right in but it's no skin off my nose.

But, er, yeah, that email does look like someone didn't finish customizing the UA string, doesn't it?

I've seen it from two IPs: 68.74.116.abc and 76.14.26.abc (where each abc is always the same number). The latter is in my notes as Wave Broadband with no further information. Pattern of requests suggests they've been sent to get some specific file each time.


* I must have tested it, at least briefly, because my logs show one period when it requested nothing but robots.txt.

keyplyr

9:16 pm on Apr 23, 2018 (gmt 0)




"Like most Nutch-based robots, it is robots.txt compliant"
Not in my opinion. I disallow Nutch in robots.txt, yet every bot that adds its own prefix to the UA ignores the rule. That is basically why I then block at the server.
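
To illustrate the server-side block, here is a rough sketch of the check in Python. It assumes a plain case-insensitive substring match on the User-Agent header; the names NUTCH_PATTERN and should_block are made up for this example, and a real setup would more likely express the same rule in the web server's own config.

```python
import re

# Hypothetical rule: match "nutch" anywhere in the User-Agent,
# so a custom prefix such as "VenusCrawler/" does not get around it.
NUTCH_PATTERN = re.compile(r"nutch", re.IGNORECASE)


def should_block(user_agent):
    """Return True when the User-Agent string mentions Nutch."""
    return bool(NUTCH_PATTERN.search(user_agent or ""))


print(should_block("VenusCrawler/Nutch-1.12 (crawler@mycompany.com)"))  # True
print(should_block("Mozilla/5.0 (X11; Linux x86_64)"))                  # False
```

Matching on the substring rather than the leading product token is the point here: the prefix changes from bot to bot, but the Nutch part of the string stays.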

Nutch was once a purpose-built bot that was well behaved... then it became the starter kit for every CS class at every school on the planet.