

Plainclothes Agent from 23.96/13

Possibly Bingbot

         

lucy24

8:46 pm on Jun 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Neither a new IP nor a new behavior--but for me it's a new combination.

Plainclothes bingbot, with the distinctive "  " (two blanks):
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0;  Trident/5.0)
and
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0;  Trident/5.0)

In years past they crawled from 65.55 and more recently from 131.253 (and also, in parallel, from the mysterious Drake Holdings range at 204.79.180-181). About a week ago they suddenly remembered that they also own
23.96.0.0/13
23.96-103
and are now sending the plainclothes bingbot consistently from
23.101.169.abc
Matter of fact, early notes suggest they are so excited about this newly rediscovered range, they've stopped using the Drake range. It may be too early to tell, though.
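
For anyone who wants to double-check the range arithmetic, here is a minimal sketch using Python's standard ipaddress module. It only confirms that 23.96.0.0/13 spans 23.96 through 23.103 and that an address in 23.101.169.x falls inside it. The final octet used in the usage note is a made-up placeholder, not an address from the logs.

```python
import ipaddress

# The /13 quoted in the post.
MS_RANGE = ipaddress.ip_network("23.96.0.0/13")


def in_range(ip, network=MS_RANGE):
    """Return True if the dotted-quad string `ip` lies inside `network`."""
    return ipaddress.ip_address(ip) in network


# First and last addresses covered by the /13.
FIRST = MS_RANGE.network_address
LAST = MS_RANGE.broadcast_address
```

With this, in_range("23.101.169.1") is true (the .1 is a placeholder octet), and FIRST and LAST show the span running from 23.96.0.0 to 23.103.255.255.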

In general I associate 23.blahblah with Skype; I'd forgotten it was an MSN property until my log-processing code (thank you, computer) pointed out the familiar NT UA.

Bing crawling is weird.

keyplyr

3:22 am on Jun 18, 2018 (gmt 0)

plainclothes bingbot,
Again (as in previous discussions), what makes you think this is Bingbot?

This agent is not from a Bingbot or other Microsoft-designated "crawl" range. Microsoft uses many ranges for many things, including letting many other companies use their ranges for many things. Just because this /13 is registered to Microsoft Corporation does not mean the agent is Bingbot.

I'm not saying this is *not* Bingbot. It could be, but I see no evidence that it is and several indications that it isn't.

lucy24

4:11 am on Jun 18, 2018 (gmt 0)

Well, you don’t expect plainclothes cops to come out and tell you they’re plainclothes cops, do you?

To me it simply strains credulity that this pair of UAs has never been seen from anywhere that doesn’t have some sort of Microsoft connection. Normally when robots pretend to be human, you can search back a year or two and find real humans from all over the map using the identical UA.

Some visits come with a referer citing bing search. Not necessarily a correct search, but at least “Yeah, I can see why they would think that”. I think they're spot-checking the accuracy of their searches. In addition to HTML, they pick up fonts, scripts, stylesheets--everything except images and favicon.

:: detour to check in the opposite direction ::

Full list of all times these two user-agents have ever picked up images:
-- two occasions in late 2017, from Drake Holdings range, at my test site. (The page requests happened to be refererless--but when humans do blunder into this site, it’s almost always from a bing search with the “a description of this site is not available” boilerplate.)
-- two occasions this past winter, again from Drake Holdings range, both at my personal site
-- one last month from two MSN ranges, again at my personal site. I say “two ranges” because it’s characteristic of this operator to toggle randomly between 131.253 and 65.55 on the same visit

Is it possible you are simply interpreting my “plainclothes bingbot” descriptor with excessive literalness?

keyplyr

4:29 am on Jun 18, 2018 (gmt 0)

lucy24, I agree with all you say about the history of these requests & Drake et al.; I just don't see the agent as being related to Bingbot.

It's likely MS has a couple hundred UAs crawling various documents for various reasons. However, Bingbot is a specific agent crawling for specific reasons, mostly relating to the Bing Search Index.

tangor

4:37 am on Jun 18, 2018 (gmt 0)

And all that data collected is probably put to good use for....




wait for it....

iamlost

4:59 pm on Jun 18, 2018 (gmt 0)

...finding out who's been paying attention and who's oblivious...

I block all 'plainclothes' SE bots if rDNS doesn't return an 'authorised' search domain or the IP hasn't been whitelisted ... I'm much more bot restrictive than most; of course, I'm much less third party ad/af reliant than most as well.
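
A rough sketch of the rDNS check described above, in Python: reverse-resolve the IP, require an allowed hostname suffix, then forward-resolve the hostname and confirm it maps back to the same IP. The suffix list, the function name and the injectable resolver arguments are all assumptions made for illustration, not anyone's actual setup, and the default resolvers only handle IPv4.

```python
import socket

# Assumption: hostname suffixes treated as 'authorised'. Adjust to taste.
ALLOWED_SUFFIXES = (".search.msn.com", ".googlebot.com")


def verify_crawler(ip, suffixes=ALLOWED_SUFFIXES, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a single IPv4 address.

    `reverse` maps an IP to a hostname and `forward` maps a hostname to a
    list of IPs. Both default to the system resolver, and can be swapped
    out for testing without touching the network.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, or the lookup failed
    if not host.endswith(tuple(suffixes)):
        return False  # hostname is not under an allowed search domain
    try:
        return ip in forward(host)  # hostname must map back to the same IP
    except OSError:
        return False
```

A whitelist of known-good IPs would sit in front of this, so that the DNS lookups only run for addresses not already trusted.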

lucy24

8:50 pm on Jun 18, 2018 (gmt 0)

I'm much more bot restrictive than most
And I'm much less bot restrictive than most ;)

In the specific case of the plainclothes bingbot--including the ones from assorted Drake Holdings properties--I redirect them to a rather generic page that is theoretically for humans. It says something like “I’m awfully sorry, but you have accidentally replicated the behavior of an unwelcome robot” with a link to continue to the originally requested page. This strikes me as more polite than saying outright “Look, you imbecile, if you had bothered to LOOK at the search snippet you would know this isn’t remotely the page you’re looking for.” Humans occasionally do follow the link; these quasi-robots never do.

At one time I blocked anything that came from a bing/MSN range and didn’t have “bingbot” in its name, but hey, MS employees are entitled to surf the web on their lunch break just like ordinary humans.

There’s a Yandex-related entity that behaves almost identically, but unfortunately they don’t arrive from Yandex ranges so I can’t do anything except recognize them after the fact.

keyplyr

8:07 pm on Jun 23, 2018 (gmt 0)

There were close to a thousand of these in yesterday's log. Obviously not Bingbot, but some bot-runner leasing M$ ranges looking for vulnerabilities:

23.100.91.** - - [22/Jun/2018:09:14:47 -0700] "GET /bbs/utility/convert/data/config.inc.php HTTP/1.1" 403 20067 "http://my-site.com/bbs/utility/convert/data/config.inc.php" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:48 -0700] "GET /utility/convert/data/config.inc.php HTTP/1.1" 403 20067 "http://my-site.com/utility/convert/data/config.inc.php" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:49 -0700] "GET /md5.asp HTTP/1.1" 404 20067 "http://my-site.com/md5.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:50 -0700] "GET /images/Sql.asp HTTP/1.1" 403 20067 "http://my-site.com/images/Sql.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:51 -0700] "GET /manage/Images/Sql.asp HTTP/1.1" 403 2106 "http://my-site.com/manage/Images/Sql.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:52 -0700] "GET /admin/images/Sql.asp HTTP/1.1" 403 20067 "http://my-site.com/admin/images/Sql.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:52 -0700] "GET /images/css/Thumb.asp HTTP/1.1" 403 20066 "http://my-site.com/images/css/Thumb.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:53 -0700] "GET /admin/sdfg.asp HTTP/1.1" 403 20066 "http://my-site.com/admin/sdfg.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

23.100.91.** - - [22/Jun/2018:09:14:54 -0700] "GET /Templates/test.asp HTTP/1.1" 403 2105 "http://my-site.com/Templates/test.asp" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

lucy24

8:45 pm on Jun 23, 2018 (gmt 0)

That's not the critter I call the plainclothes bingbot. It's got two and only two UAs, given in the OP. (Look for “Trident” and the highly distinctive double-space.) Referer is either blank or bing search, and page requests are accompanied by css, scripts and fonts but not images or favicon.
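
The signature just described could be checked mechanically along these lines. This is only a sketch: the function names, the list of image extensions and the way a "visit" is passed in (one UA, one referer, a list of requested paths) are assumptions, and the regex simply encodes the two UA strings from the OP, double space included.

```python
import re
from urllib.parse import urlsplit

# The two UAs from the OP: NT 6.0 or 6.1, with exactly two spaces
# between the repeated Trident tokens.
UA_RE = re.compile(
    r"^Mozilla/5\.0 \(compatible; MSIE 9\.0; Windows NT 6\.[01]; "
    r"Trident/5\.0;[ ]{2}Trident/5\.0\)$"
)

# Assumption: which extensions count as "images or favicon".
IMAGE_RE = re.compile(r"\.(?:png|jpe?g|gif|ico|webp|svg)$", re.IGNORECASE)


def referer_ok(referer):
    """Blank referer (empty or '-'), or a referer on a bing.com host."""
    if referer in ("", "-"):
        return True
    host = urlsplit(referer).hostname or ""
    return host == "bing.com" or host.endswith(".bing.com")


def looks_plainclothes(user_agent, referer, paths):
    """True if a visit matches the UA, referer and no-images pattern."""
    if not UA_RE.match(user_agent):
        return False
    if not referer_ok(referer):
        return False
    # Pages, css, scripts and fonts are fine; any image request breaks the pattern.
    return not any(IMAGE_RE.search(urlsplit(p).path) for p in paths)
```

Grouping log lines into visits is left out here, since that depends on how the logs are processed.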

Are those bona fide URLs on your site? Come to think of it, the entity I call the plainclothes bingbot only asks for real pages--unlike the “real” bingbot, which loves to convert CamelCase to lowercase. But then, we all know about bing's taste for 404s and 410s.

keyplyr

8:59 pm on Jun 23, 2018 (gmt 0)

That's not the critter I call the plainclothes bingbot.
Ha... I didn't intend to give you the impression this is your "plainclothes bingbot", only that it's an example of the many botrunners that lease M$ ranges (the same as in your OP) and run their campaigns.

Are those bona fide URLs on your site?
Of course not. I don't use CMSs or 3rd party software, I write everything myself. Plus, I'm on an Apache server... I thought you knew :)

wilderness

9:48 am on Jun 25, 2018 (gmt 0)

FWIW:
1) Range denied.
2) Requests are persistent but not overwhelming, and spaced far apart.
3) Some of the requests actually contain valid, non-encrypted referers from Bing.

keyplyr

10:10 am on Jun 25, 2018 (gmt 0)

A lot of bots will fake referrers, most often from:

• Search Engines, complete with query parameter.

• Pages from your own site, often the same page being requested.

• Some site being spammed to your logs in case you click on it in a stats program or in case your log is public.
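
The second pattern in that list (a referer pointing at the very page being requested) is easy to spot mechanically. Here is a hedged sketch that parses an Apache combined-format log line and flags it when the referer's path equals the requested path. The regex and function name are illustrative assumptions, and the check deliberately ignores the referer's hostname, so it is a heuristic and not proof of fakery.

```python
import re
from urllib.parse import urlsplit

# Assumption: logs are in Apache "combined" format, as in the samples above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def is_self_referer(line):
    """True if the referer is a full URL whose path equals the requested path."""
    m = LOG_RE.match(line)
    if not m:
        return False  # not a combined-format line
    ref = urlsplit(m["referer"])
    # A blank referer ("-" or empty) has no netloc and is never flagged.
    return bool(ref.netloc) and ref.path == urlsplit(m["path"]).path
```

Run against the sample lines quoted earlier in the thread, every one would be flagged, since each cites itself as the referring page.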

lucy24

6:06 pm on Jun 25, 2018 (gmt 0)

Plus, I'm on an Apache server... I thought you knew
Wouldn't you be able to tell from the log format anyway? Or do logs of IIS/nginx/{smaller-servers-nobody-has-heard-of} end up looking identical to Apache logs? (When all is said and sifted, a log is just a text file, after all.) No, I simply didn't look closely enough at the samples to register the highly non-Apache ".asp" extensions.

:: detour to check something ::

Huh. I guess I do get requests for \.[aj]spx? (the form I checked for). Most, of course, get a resounding 403. But it wouldn't hurt to add those extensions to the ones that get a manual 404 (like explicit requests for .php), and thus save the server the trouble of looking, because even if I had such files I wouldn't admit it.
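
For illustration only, here is what that extension test looks like outside the server config, as a small Python sketch. The pattern is the \.[aj]spx? form quoted above with .php added. The function name and the convention of returning 404 or None are assumptions, and this is not the actual Apache rule, which isn't shown in the thread.

```python
import re

# The \.[aj]spx? form from the post, plus explicit .php requests.
FOREIGN_EXT_RE = re.compile(r"\.(?:[aj]spx?|php)$", re.IGNORECASE)


def manual_status(path):
    """Return 404 for paths ending in .asp/.aspx/.jsp/.jspx/.php, else None.

    `path` is the URL path only, without any query string.
    None means "carry on with normal handling".
    """
    return 404 if FOREIGN_EXT_RE.search(path) else None
```

So /md5.asp and /admin/sdfg.asp from the earlier samples would short-circuit to 404, while an ordinary /index.html falls through untouched.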

keyplyr

7:14 pm on Jun 25, 2018 (gmt 0)

I also route some requests to a manual 404 (instead of 403) on the off chance that they will give up instead of becoming challenged and further motivated.

...Drat! Here I am personifying non-human entities like you do.

lucy24

8:28 pm on Jun 25, 2018 (gmt 0)

The solid argument in favor of a manual 404 is that it conveys no information. A 403 drops the hint of “We’re onto you”.

like you do
Mwa ha ha, it’s contagious.