
nsrbot


TorontoBoy

1:40 pm on Jul 25, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)
Protocol: HTTP/1.1
Robots.txt: No
Host: Net Systems Research LLC
168.1.128.56 - 168.1.128.63 SOFTLAYER
196.52.43.0 - 196.52.43.255 LogicWeb Africa
Reference: http://netsystemsresearch.com/ Says they do IoT research and solutions

Request Headers: did not trigger request headers

Bot was not malicious, just poked me twice. The "&#43" (rather than a literal "+") did show up in my raw access log entry. The second range I had previously banned due to issues with LogicWeb.

168.1.128.* [24/Jul/2018:04:20:28 GET / HTTP/1.1 301 231 - Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)
196.52.43.* [24/Jul/2018:10:37:50 GET / HTTP/1.1 403 636 - Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)

[edited by: keyplyr at 12:21 am (utc) on Aug 5, 2018]
[edit reason] Delinked URL [/edit]
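As it happens, both ranges quoted above fall on clean CIDR boundaries (the SOFTLAYER slice is exactly a /29, the LogicWeb slice a /24), so a ban check is cheap to script. A minimal sketch in Python; the helper name is mine, not anything from the posts:

```python
import ipaddress

# The two ranges from the post, expressed as CIDR networks:
# 168.1.128.56 - 168.1.128.63 is exactly 168.1.128.56/29 (SOFTLAYER)
# 196.52.43.0  - 196.52.43.255 is exactly 196.52.43.0/24 (LogicWeb)
BANNED_NETS = [
    ipaddress.ip_network("168.1.128.56/29"),
    ipaddress.ip_network("196.52.43.0/24"),
]

def is_banned(ip: str) -> bool:
    """True if the address falls inside any banned range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BANNED_NETS)
```

In practice the same ranges would go straight into a `Deny from` or `Require not ip` block, but the script form is handy for batch-checking old logs.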

lucy24

4:15 pm on Jul 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to Character Viewer ::

43 = capital C

But-- but why?

:: shrug ::

Do you suppose the botrunner at this point attempted to hit ctrl-C (or cmd-C), missed the modifier key and somehow ended up preserving an obfuscated C for all posterity?

Request Headers: did not trigger request headers
The second request--the one that got as far as a 403--should have triggered header logging. (You've got it on your 403 page, right?) It's always frustrating to see a 301 followed by a 403 for the same underlying request. It makes sense in this rare case: where they happened to use different IPs, and only the second one is blocked. (Tangent: Currently I'm most likely to see the 301-to-403 sequence if the initial request was for a directory without final / slash, and then the redirected request runs into a narrowly constrained rule intended only for page requests.)

Edit: Oh, wait. Not &#x but simply &#, making it
:: counting on fingers ::
2b, aka + sign. That actually makes more sense, because +http in UA strings is pretty common.
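The finger-counting checks out; Python's html module will do the same arithmetic, including the hex reading that would have given a capital C:

```python
from html import unescape

# Decimal 43 and hex 2b are the same code point: the plus sign.
print(unescape("&#43;"))   # "+"
print(unescape("&#x2b;"))  # "+"
# Read as hex, 43 would indeed have been a capital C:
print(unescape("&#x43;"))  # "C"

# So the UA as logged decodes to the ordinary +http form:
raw = "Mozilla/5.0 (compatible; nsrbot/1.0; &#43;http://netsystemsresearch.com)"
print(unescape(raw))
```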

keyplyr

7:12 pm on Jul 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So the actual UA is: Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)

TorontoBoy

7:26 pm on Jul 25, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



My raw access log can and does properly render "+" for every other bot UA, just not this one. So the literal UA string is "Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)", unless someone can explain why the plus sign "+" renders properly throughout the rest of my raw access log except for this UA. The raw access log is not HTML output; it comes straight from the Apache web server.

For example, this snippet of just UAs, above nsrbot:
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 (Linux; Android 6.0.1; ASUS_Z00UD Build/MMB29P; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36 GSA/6.8.23.21.arm64
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
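One way to confirm it's the logged bytes and not a rendering quirk: pull the last quoted field out of each combined-format line and see whether HTML-unescaping changes it. A quick sketch (helper names are mine; the sample IP is a redacted stand-in like the ones above):

```python
import re
from html import unescape

def extract_ua(log_line: str) -> str:
    """The UA is the last double-quoted field in a combined log line."""
    return re.findall(r'"([^"]*)"', log_line)[-1]

def looks_entity_encoded(ua: str) -> bool:
    """True if the logged bytes contain an HTML character reference."""
    return unescape(ua) != ua

line = ('196.52.43.1 - - [24/Jul/2018:10:37:50 -0700] "GET / HTTP/1.1" '
        '403 636 "-" "Mozilla/5.0 (compatible; nsrbot/1.0; '
        '&#43;http://netsystemsresearch.com)"')

print(looks_entity_encoded(extract_ua(line)))  # True
```

Run over the snippet above, every other UA comes back unchanged; only the nsrbot line trips the check.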

keyplyr

7:29 pm on Jul 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Interesting... it's a downright anomaly :)

Google SERP also shows &#43 reported for this UA.

TorontoBoy

7:40 pm on Jul 25, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Is some evil genius just messin' with us? Who does that?

lucy24

8:08 pm on Jul 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is some evil genius just messin' with us?

No, it means the botrunner hasn’t fully mastered their software. So they may or may not be evil, but almost certainly not a genius.

keyplyr

8:15 pm on Jul 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The header field was likely a cut'n'paste from markup where the encoded character was needed, and the botrunner was oblivious to the fact because they never bothered to check how it displays.

lucy24

6:40 pm on Jul 26, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Meanwhile on my personal site...
196.52.43.abc - - [24/Jul/2018:13:39:16 -0700] "GET / HTTP/1.1" 400 1785 "-" "Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)" 
...
168.1.128.abc - - [24/Jul/2018:15:22:26 -0700] "GET / HTTP/1.1" 400 1785 "-" "Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)"
That’s on the HTTPS side. On the HTTP side they got a couple of 403s (both using the 196 address). Is it possible they blundered in some way when trying to make the HTTPS request? I don’t see a lot of 400s. Not that I'm complaining, mind you...

TorontoBoy

7:30 pm on Jul 26, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I have a secret admirer! I did not ban by UA or IP. They are a tad skimpy on the request headers...
196.52.43.* [26/Jul/2018:13:59:00 GET / HTTP/1.1 403 636 - Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)

2018-07-26:13:59:00
URL: /
IP: 196.52.43.*
Host: www.example.com
User-Agent: Mozilla/5.0 (compatible; nsrbot/1.0; +http://netsystemsresearch.com)

lucy24

9:37 pm on Jul 26, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They are a tad skimpy on the request headers...
Hm, now that’s interesting. Mine also say
Connection: close
which kinda reinforces the idea that some hosts supply a “Connection:” header if it isn’t sent. (All requests, without exception, have one--and its value is always, without exception, “close”.)

lucy24

7:01 pm on Jul 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Three days later, and they're still throwing consistent 400s on my HTTPS site. Anyone else noticing this?

:: wondering if it's worth the trouble to designate a 400 page, just so I can look at headers ::

keyplyr

7:13 pm on Jul 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Source code: none found
That is another forum.

lucy24

5:55 pm on Aug 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now, this is interesting. I moved another site to https, but waited a few days before instituting the redirect. In those intervening days, I found a 403 request on the http side, and a 400 request around the same time on the https side. This would seem to imply that nsrbot, with or without plus sign (looks as if they cleaned up the UA string a few days ago), is sending off parallel requests to http and https even though it doesn't know how to make an https request.

keyplyr, do you happen to know if a 400 response has some particular meaning in the context of https? It seems to be kind of a generic “sorry, I don’t understand what you’re asking” so I never know what’s up when I see it in logs.

:: wandering off to see if I can persuade the server to send out an error document on 400 responses ::

keyplyr

6:35 pm on Aug 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The trouble with understanding server response codes is that they can stray from the intended use described in the official docs.

Especially with the routed file-server systems used at shared hosting companies, response codes can be config'd to cover various scenarios.

If you are referring to the host I think you are, they're pretty good with response codes compared to other hosts I've done work at.

However, as you note, the response code only tells half the story, the server's side, and is far from explicit even about that.

Pity we don't get the full tale of what actually happens in that exchange.

lucy24

12:10 am on Aug 5, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you are referring to the host I think you are
Yup, the host who returns 418 Teapot Error responses to requests blocked by mod_security. (It was originally 403, but I think they figured out that it's better to use a different code.)

After poring over the php docs (I am decidedly Not Good At This) I figured out how to add HTTPS to header logging. It's obviously only relevant for 403 (and potentially 400) responses and the occasional robots.txt, since everyone else gets redirected. Since the nsrbot doesn't come by every day, and hardly anyone else draws a 400 response--they're vanishingly rare, though not nonexistent,* on HTTP GET requests--I will see if I succeed in logging those headers. If it turns out the 400 is intercepted at the server level, like 418s, I won't be able to get any further information.


* I found a couple that said GET /dirname%2 which, yeah, is pretty well a textbook “I have no idea what you mean so here, have a 400 instead”.
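For anyone doing the same outside PHP, the shape of the idea is simple: dump every request header plus the two pieces of information that are not headers, the requested URL and the HTTPS flag. A rough Python/WSGI-flavored sketch, function name my own:

```python
def format_header_log(environ: dict) -> str:
    """Render a CGI/WSGI-style environ into a header-log entry,
    including two things that are NOT request headers: the requested
    URL and whether the request arrived over HTTPS."""
    lines = [
        "URL: " + environ.get("REQUEST_URI", "-"),
        "HTTPS: " + ("on" if environ.get("HTTPS") == "on" else "off"),
    ]
    for key, value in sorted(environ.items()):
        if key.startswith("HTTP_"):
            # HTTP_USER_AGENT -> User-Agent, and so on.
            lines.append(key[5:].replace("_", "-").title() + ": " + value)
    return "\n".join(lines)

# A skimpy nsrbot-style request would log something like:
print(format_header_log({
    "REQUEST_URI": "/",
    "HTTPS": "off",
    "HTTP_HOST": "www.example.com",
    "HTTP_USER_AGENT": "Mozilla/5.0 (compatible; nsrbot/1.0; "
                       "+http://netsystemsresearch.com)",
    "HTTP_CONNECTION": "close",
}))
```

Since the logging happens while the error document is being prepared, the entry exists even if the visitor never manages to receive the document itself, which is the point made above.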

keyplyr

12:20 am on Aug 5, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The nsrbot may not support SNI*. It gets as far as the HTTP connection but cannot negotiate the HTTPS handshake, so the server ends up giving the 4xx response. In that case, an HTTPS header would not do anything and would likely be wasted effort, at least with this UA.


*SNI stands for Server Name Indication and is an extension of the TLS protocol. It indicates which hostname is being contacted by the browser at the beginning of the 'handshake' process.
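In Python terms, SNI is what goes out when a TLS client supplies a server_hostname; a client library too old (or too careless) to send one is in roughly nsrbot's position. A sketch of the well-behaved version, with the actual connection commented out so nothing touches the network:

```python
import socket
import ssl

# Any modern OpenSSL build advertises SNI support:
print(ssl.HAS_SNI)

context = ssl.create_default_context()

# server_hostname below is what travels in the SNI extension of the
# ClientHello, telling the server which certificate to present before
# the handshake completes. (Left commented out: no network access.)
# with socket.create_connection(("example.com", 443)) as sock:
#     with context.wrap_socket(sock, server_hostname="example.com") as tls:
#         print(tls.version())
```

A name-based virtual host that never learns which hostname the client wanted has little choice but to fail the request, which fits the pattern of consistent 400s seen above.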

lucy24

12:50 am on Aug 5, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In that case, an HTTPS header would not do anything
I made two concurrent changes: one, adding HTTPS information to all header logging, and two, designating an ErrorDocument for the 400 response. Since headers are all logged into the same directory, I would otherwise have to cross-check with access logs to see whether a given request came in on the HTTP side or the HTTPS side. (At a much earlier date, I added the requested filename to header logging because that, too, doesn't count as a header. Again, to prevent having to cross-check against access logs.) And then, since headers are logged as part of the error document preparation, the information should exist even if the unwanted visitor ended up being unable to receive the error document.

Oh, yeah, and I only changed it in my personal site. Now I have to go make the same changes in the site that just went HTTPS--the one that prompted the observation about nsrbot making “cold” requests. Still trying to figure out why it would try to do something it must know it isn't able to do. Is it probing for non-secure sites that happen to be listening on 443?