Forum Moderators: Ocean10000 & incrediBILL & keyplyr

Spoofed robots.txt requester

Single IP range

     
3:30 pm on Jan 17, 2018 (gmt 0)

New User

joined:Jan 13, 2018
posts: 5
votes: 0


I've been doing some log crunching just to see which bot does what, and came across a curious one:

UAs: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Protocol: HTTP/1.1
Robots.txt: Yes
IP: 123.125.67.***

Every few days it requests robots.txt, which is nice, right? But it never does anything else. It mainly uses the first UA. The single time it used the second, it was also late; otherwise it's very punctual.
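For anyone who wants to look for the same pattern in their own logs, here is a rough Python sketch. It assumes Apache/nginx "combined" log format; the function name `robots_only_clients` and the sample lines (which use documentation-reserved IPs, not real traffic) are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed format: combined log. Captures the client IP and the requested path.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*"')

def robots_only_clients(lines):
    """Return the set of client IPs whose every request was for /robots.txt."""
    paths_by_ip = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ip, path = match.groups()
            # Drop any query string so "/robots.txt?x=1" still counts.
            paths_by_ip[ip].add(path.split("?", 1)[0])
    return {ip for ip, paths in paths_by_ip.items() if paths == {"/robots.txt"}}

# Hypothetical sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [17/Jan/2018:15:30:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [20/Jan/2018:15:30:02 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [17/Jan/2018:16:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "SomeBot/1.0"',
    '198.51.100.7 - - [17/Jan/2018:16:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "SomeBot/1.0"',
]

print(robots_only_clients(sample))
```

To run it on a real log, pass an open file object instead of `sample`. Lines that don't match the assumed format are silently skipped, so a custom log format would need a different regex.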

The IP range is Chinese. I blocked its access to robots.txt just to see if it does something new, and now I'm waiting for a visit. I can't fathom why anyone would be using browser UAs to request the robots file, but I have some theories:

1. It's looking for specific disallowed directories to direct attacks at, as part of a Chinese campaign that keeps requesting /ogShow.aspx?name=ogFoot&line=0&from=oGateeu. Those attempts come from multiple ranges, unlike the robots requester, and never from the robots range. I can count on my fingers how many visitors I get from China, and since both use fake browser UAs, are fond of Safari/537.36 and are, well, Chinese...

2. It's Baidu acting dumb. Baidu used to own (might still own) 123.125.68.***. Sooo... a stealthy robots.txt cross-checker?!

3. It's an unrelated attacker looking for something specific in my robots and leaving when it doesn't find it. That would make this bot slightly cleverer than your average bot, I guess. No resources wasted on incorrect targets and fewer flags raised.
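One way to check the "single range vs. multiple ranges" observation in the theories above is to bucket client IPs by /24. A minimal sketch, assuming IPv4 addresses only; the function name `count_by_slash24` and the example addresses (documentation-reserved IPs) are hypothetical.

```python
import ipaddress
from collections import Counter

def count_by_slash24(ips):
    """Count how many IPv4 addresses fall into each /24 network."""
    counts = Counter()
    for ip in ips:
        # strict=False accepts a host address and returns its enclosing network.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(network)] += 1
    return counts

# Hypothetical addresses, for illustration only.
counts = count_by_slash24(["192.0.2.10", "192.0.2.77", "198.51.100.5"])
for network, hits in counts.most_common():
    print(network, hits)
```

A requester that always lands in one bucket while the attack attempts spread over many buckets would match what the logs seem to show, though that alone doesn't say whether the two are related.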
7:10 pm on Jan 17, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr

joined:Sept 26, 2001
posts:10624
votes: 630


Very common. The IP requesting robots.txt may not be the same IP that another actor will use. Many sites allow *all* to access robots.txt, so it may be used to check server config as well as other pertinent data.