Forum Moderators: Ocean10000 & keyplyr

zgrab

Researchscan

     
7:42 pm on Mar 16, 2018 (gmt 0)

Moderator This Forum from US 

keyplyr (WebmasterWorld Administrator)

joined:Sept 26, 2001
posts:12771
votes: 874



UA: Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13a; +http://researchscan.comsys.rwth-aachen.de)
Protocol: HTTP/1.1
Robots.txt: No
Host: researchscan19.comsys.rwth-aachen.de
Parent: RWTH Aachen University
137.226.0.0 - 137.226.255.255
137.226.0.0/16

CS dept project

2:45 pm on May 23, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts:483
votes: 43


UA: Mozilla/5.0 zgrab/0.x
Protocol: HTTP/1.1
Robots.txt: No
Host: researchscan351.eecs.umich.edu
Parent: University of Michigan College of Engineering
141.212.122.0 - 141.212.122.255
CIDR: 141.212.122.0/24
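
For anyone who wants to double-check that a start/end range and a CIDR block describe the same addresses, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `range_to_cidrs` is made up for illustration; the ranges are the two posted in this thread.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover the range first..last."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# Ranges as posted in this thread
rwth = range_to_cidrs("137.226.0.0", "137.226.255.255")
umich = range_to_cidrs("141.212.122.0", "141.212.122.255")

print(rwth)   # ['137.226.0.0/16']
print(umich)  # ['141.212.122.0/24']
```

Both ranges collapse to a single block, which matches the CIDR notation given in the posts.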

It sent an interesting referer: my site's IP address rather than my domain name, with a port number included.
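
A rough way to flag that kind of referer in log processing is sketched below. It assumes the value is a full URL; the function name `referer_is_bare_ip` and the sample addresses (taken from the documentation-only 203.0.113.0/24 range) are illustrative and are not values from my logs.

```python
import ipaddress
from urllib.parse import urlsplit

def referer_is_bare_ip(referer):
    """Return True if the referer's host is an IP literal rather than a hostname."""
    host = urlsplit(referer).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so treat it as a normal hostname
        return False
    return True

# Illustrative values only
print(referer_is_bare_ip("http://203.0.113.7:8080/"))    # True
print(referer_is_bare_ip("https://www.example.com/"))    # False
print(urlsplit("http://203.0.113.7:8080/").port)         # 8080
```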


- - -

[edited by: keyplyr at 12:55 am (utc) on Aug 12, 2018]
[edit reason] removed active links [/edit]

5:50 pm on May 23, 2018 (gmt 0)

Senior Member from US 

lucy24 (WebmasterWorld Senior Member)

joined:Apr 9, 2011
posts:15173
votes: 679


In addition to the 137 and 141 ranges given above, I've also seen them (just in the past year-plus) from
130.211.63.abc (Google Cloud)
63.251.232.abc (Internap)
13.57.10.111 (Amazon Australia)
But the most common by far is 137.226.113.26-28 [sic]. I guess that means it's one of those robots everybody and their brother can use.
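
To illustrate why the two university ranges given above do not catch everything, here is a small sketch with Python's `ipaddress` module. The list name `KNOWN_NETWORKS` and the helper `in_known_networks` are made up for the example; the addresses are ones mentioned in this thread.

```python
import ipaddress

# The two university ranges posted earlier in this thread
KNOWN_NETWORKS = [
    ipaddress.ip_network("137.226.0.0/16"),
    ipaddress.ip_network("141.212.122.0/24"),
]

def in_known_networks(ip):
    """Return True if ip falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_NETWORKS)

print(in_known_networks("137.226.113.26"))  # True
print(in_known_networks("13.57.10.111"))    # False
```

The cloud-hosted address falls outside both ranges, so an IP-only block built from the university CIDRs would miss it.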
7:35 pm on Aug 11, 2018 (gmt 0)

Senior Member from US 

lucy24 (WebmasterWorld Senior Member)

joined:Apr 9, 2011
posts:15173
votes: 679


Looking specifically at “Researchscan” and not the “zgrab” name element:

This robot drew itself to my notice because it showed up in logs of a site that has only just gone HTTPS, with visits before I instituted a redirect.

UA: Mozilla/5.0 zgrab/0.x (compatible; Researchscan/foo-twiddle; +http://researchscan.comsys.rwth-aachen.de)
IP: 137.226.113.abc
robots.txt: no
First seen (by me): 27 January 2018
Protocol: HTTPS ONLY
where “foo-twiddle” * can be any of:
t12ca
t12l
t12sns
t13l
t13rl

This would be a nifty naming convention in a computer-science class where each student's robot had to have some unique identifier while being overall the same, but that doesn’t seem to apply here.

I took the time to look up “first seen”--delving into older, compressed logs--because I wondered if they would show up five minutes after my test site first went HTTPS. They didn’t; the research project seems to have started in late January 2018.
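
For tallying which variants turn up in a log, something like the following sketch would pull the tag out of the UA string. The function name `researchscan_tag` and the regular expression are my own guess at a workable pattern, based only on the UA strings quoted in this thread.

```python
import re

# Matches the token after "Researchscan/" up to the next ';', ')' or whitespace
TAG_PATTERN = re.compile(r"Researchscan/([^;\s)]+)")

def researchscan_tag(user_agent):
    """Return the Researchscan variant tag from a UA string, or None if absent."""
    match = TAG_PATTERN.search(user_agent)
    return match.group(1) if match else None

ua_full = ("Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13a; "
           "+http://researchscan.comsys.rwth-aachen.de)")
ua_plain = "Mozilla/5.0 zgrab/0.x"

print(researchscan_tag(ua_full))   # t13a
print(researchscan_tag(ua_plain))  # None
```

The plain zgrab UA returns None, which keeps the two name elements separate when counting.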

“Can I request that my server be excluded?
To have your host or network excluded from future scans conducted by RWTH Aachen University, please contact researchscan@comsys.rwth-aachen.de with your IP address or CIDR block. Alternatively, you can configure your firewall to drop traffic from the subnet we use for scanning: 137.226.113.0/26”
Or, Option C, you could instruct your robot to recognize the established mechanism by which a site conveys the request “Do not crawl here”. Or, Option D, I could continue blocking you on header grounds without having to take any action at all. (Is this one of those ventures where a 403 response actually conveys just as much information as a 200? Possibly.)
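
For the record, Option C is not much work on the robot's side. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the product token "Researchscan", and the helper name `may_fetch` are all assumptions for illustration; as noted above, the robot does not currently request robots.txt, so nothing here describes its actual behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named robot, allow everyone else
ROBOTS_TXT = """\
User-agent: Researchscan
Disallow: /

User-agent: *
Disallow:
"""

def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_fetch("Researchscan", "https://www.example.com/page.html"))  # False
print(may_fetch("SomeOtherBot", "https://www.example.com/page.html"))  # True
```

Note that the parser compares against the text before the first slash in whatever string it is handed, so the sketch passes the bare product token rather than the full "Mozilla/5.0 ..." UA.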

The sad part is that what they're studying may really be a valid field of useful inquiry. Just, y'know, be polite about it.


* With hastily added hyphen after I found myself reanalyzing, or mis-analyzing, the name. Oops.

- - -

[edited by: keyplyr at 12:58 am (utc) on Aug 12, 2018]
[edit reason] splice clean-up [/edit]

1:06 am on Aug 12, 2018 (gmt 0)

Moderator This Forum from US 

keyplyr (WebmasterWorld Administrator)

joined:Sept 26, 2001
posts:12771
votes: 874


This bot hits all the sites I watch dozens of times a day, and every request receives a 403.

lucy24 wrote: “This would be a nifty naming convention in a computer-science class where each student's robot had to have some unique identifier while being overall the same, but that doesn’t seem to apply here.”
I think that's exactly what it is, but there appear to be several schools participating in the project.