Forum Moderators: Ocean10000 & keyplyr

zgrab

Researchscan

     
7:42 pm on Mar 16, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12334
votes: 805



UA: Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13a; +http://researchscan.comsys.rwth-aachen.de)
Protocol: HTTP/1.1
Robots.txt: No
Host: researchscan19.comsys.rwth-aachen.de
Parent: RWTH Aachen University
137.226.0.0 - 137.226.255.255
CIDR: 137.226.0.0/16

CS dept project

2:45 pm on May 23, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts:444
votes: 35


UA: Mozilla/5.0 zgrab/0.x
Protocol: HTTP/1.1
Robots.txt: No
Host: researchscan351.eecs.umich.edu
Parent: University of Michigan College of Engineering
141.212.122.0 - 141.212.122.255
CIDR: 141.212.122.0/24

It sent an interesting referer: my site's IP address rather than my domain name, and it also included a port number.
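For checking a logged IP against the two ranges reported so far in this thread, something like the following Python sketch may help. The function name `reported_parent` and the dictionary are illustrative assumptions; the only data in it are the CIDR blocks and parent names quoted above, and it says nothing about other hosts the scanner may use.

```python
import ipaddress
from typing import Optional

# CIDR blocks exactly as reported in the posts above; nothing else is assumed.
REPORTED_RANGES = {
    "RWTH Aachen University": ipaddress.ip_network("137.226.0.0/16"),
    "University of Michigan College of Engineering": ipaddress.ip_network("141.212.122.0/24"),
}


def reported_parent(ip: str) -> Optional[str]:
    """Return the reported parent for an IP, or None if it is in neither range."""
    addr = ipaddress.ip_address(ip)
    for parent, network in REPORTED_RANGES.items():
        if addr in network:
            return parent
    return None


if __name__ == "__main__":
    # Hypothetical log entries used only to exercise the lookup.
    for candidate in ("137.226.113.26", "141.212.122.7", "192.0.2.10"):
        print(candidate, "->", reported_parent(candidate))
```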


- - -

[edited by: keyplyr at 12:55 am (utc) on Aug 12, 2018]
[edit reason] removed active links [/edit]

5:50 pm on May 23, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15028
votes: 665


In addition to the 137 and 141 ranges given above, I've also seen them (just in the past year-plus) from
130.211.63.abc (Google Cloud)
63.251.232.abc (Internap)
13.57.10.111 (Amazon Australia)
But the most common by far is 137.226.113.26-28 [sic]. I guess that means it's one of those robots everybody and their brother can use.

7:35 pm on Aug 11, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15028
votes: 665


Looking specifically at “Researchscan” and not the “zgrab” name element:

This robot drew itself to my notice because it showed up in logs of a site that has only just gone HTTPS, with visits before I instituted a redirect.

UA: Mozilla/5.0 zgrab/0.x (compatible; Researchscan/foo-twiddle; +http://researchscan.comsys.rwth-aachen.de)
IP: 137.226.113.abc
robots.txt: no
First seen (by me): 27 January 2018
Protocol: HTTPS ONLY
where “foo-twiddle” * can be any of:
t12ca
t12l
t12sns
t13l
t13rl
This would be a nifty naming convention in a computer-science class where each student's robot had to have some unique identifier while being overall the same, but that doesn’t seem to apply here.
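For anyone sorting these out of their logs, the name elements can be pulled apart with a regular expression. This is a minimal Python sketch, assuming the UA always takes one of the two shapes quoted in this thread (bare `zgrab/0.x`, or `zgrab/0.x` followed by the `Researchscan/...` comment). The function name `parse_zgrab_ua` and the pattern are illustrative, not anything the scanner's operators publish.

```python
import re
from typing import Optional

# Matches "zgrab/<version>" and, when present, the "(compatible; Researchscan/<tag>; +<url>)" comment.
UA_PATTERN = re.compile(
    r"zgrab/(?P<version>\S+)"
    r"(?:\s+\(compatible;\s*Researchscan/(?P<tag>[^;\s]+);\s*\+(?P<url>[^)\s]+)\))?"
)


def parse_zgrab_ua(ua: str) -> Optional[dict]:
    """Return the version, tag, and URL from a zgrab UA, or None if it is not zgrab."""
    match = UA_PATTERN.search(ua)
    if match is None:
        return None
    # tag and url are None when the UA is the bare "zgrab/0.x" form.
    return {
        "version": match.group("version"),
        "tag": match.group("tag"),
        "url": match.group("url"),
    }


if __name__ == "__main__":
    sample = ("Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13rl; "
              "+http://researchscan.comsys.rwth-aachen.de)")
    print(parse_zgrab_ua(sample))
```

Grouping log lines by the extracted tag is one way to see whether the different tags behave differently.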

I took the time to look up “first seen”--delving into older, compressed logs--because I wondered if they would show up five minutes after my test site first went HTTPS. They didn’t; the research project seems to have started in late January 2018.

“Can I request that my server be excluded?”
“To have your host or network excluded from future scans conducted by RWTH Aachen University, please contact researchscan@comsys.rwth-aachen.de with your IP address or CIDR block. Alternatively, you can configure your firewall to drop traffic from the subnet we use for scanning: 137.226.113.0/26”
Or, Option C, you could instruct your robot to recognize the established mechanism by which a site conveys the request “Do not crawl here”. Or, Option D, I could continue blocking you on header grounds without having to take any action at all. (Is this one of those ventures where a 403 response actually conveys just as much information as a 200? Possibly.)
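Option D, plus the subnet they name, can be sketched as a single decision function. This is only a rough Python illustration, assuming the request's remote IP and User-Agent are already available as strings. The names `response_status`, `SCAN_SUBNET`, and `UA_MARKERS` are made up for the example, and a real server would do this in its own access-control configuration.

```python
import ipaddress

# The scanning subnet as quoted from the operators' own exclusion notice.
SCAN_SUBNET = ipaddress.ip_network("137.226.113.0/26")
# Name elements seen in the UA strings in this thread (compared case-insensitively).
UA_MARKERS = ("zgrab", "researchscan")


def response_status(remote_ip: str, user_agent: str) -> int:
    """Return 403 for requests matching the UA markers or the scan subnet, else 200."""
    ua = user_agent.lower()
    if any(marker in ua for marker in UA_MARKERS):
        return 403  # blocked on header grounds
    if ipaddress.ip_address(remote_ip) in SCAN_SUBNET:
        return 403  # blocked on address grounds
    return 200


if __name__ == "__main__":
    # Hypothetical request used only to exercise the function.
    print(response_status("137.226.113.27", "Mozilla/5.0 zgrab/0.x"))
```

Checking the header first means the block still applies when the same UA arrives from the other hosts mentioned earlier in the thread.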

The sad part is that what they're studying may really be a valid field of useful inquiry. Just, y'know, be polite about it.


* With hastily added hyphen after I found myself reanalyzing, or mis-analyzing, the name. Oops.

- - -

[edited by: keyplyr at 12:58 am (utc) on Aug 12, 2018]
[edit reason] splice clean-up [/edit]

1:06 am on Aug 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12334
votes: 805


This bot hits all the sites I watch dozens of times a day, all receiving 403s.

"This would be a nifty naming convention in a computer-science class where each student's robot had to have some unique identifier while being overall the same, but that doesn’t seem to apply here."
I think that's exactly what it is, but there appear to be several schools participating in the project.