Bingbot odd request

mrtonyg

12:28 am on May 5, 2015 (gmt 0)

Any comments about this malicious-looking request from bingbot?

207.46.13.2 - - [04/May/2015:19:46:29 -0400] "GET /... HTTP/1.1" 403 162 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Pfui

2:53 pm on May 5, 2015 (gmt 0)

The IP is a.k.a. --

msnbot-207-46-13-2.search.msn.com

-- so I'd worry more about it possibly being malicious if it didn't come from .search.msn.com
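Checking that by hand is easy enough, but it can be scripted. The usual method is forward-confirmed reverse DNS: look up the hostname for the IP, check that it ends in .search.msn.com (the domain Bing says bingbot crawls from), then resolve that hostname and make sure it maps back to the same IP. The forward step matters because whoever controls an IP block can set its PTR record to anything. A minimal Python sketch; the function name is made up, and the two resolvers are parameters so it can be tested without network access:

```python
import socket
from typing import Callable, Iterable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Hypothetical helper: forward-confirmed reverse DNS (FCrDNS) check.
# reverse/forward default to the stdlib resolvers but can be swapped for stubs.
def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str] = (".search.msn.com",),
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    try:
        hostname = reverse(ip)[0]            # PTR lookup
    except OSError:                          # socket.herror/gaierror subclass OSError
        return False                         # no PTR record: not verified
    if not hostname.lower().rstrip(".").endswith(tuple(allowed_suffixes)):
        return False                         # PTR points at some other domain
    try:
        _, _, addresses = forward(hostname)  # A lookup on that hostname
    except OSError:
        return False
    return ip in addresses                   # must round-trip to the same IP
```

With the defaults, is_verified_crawler("207.46.13.2") does live DNS lookups, so the answer depends on what Bing's DNS returns at the time.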

If there was a referrer, it could simply be a miscoded link the bot followed. Many sites' programs use ellipses to shorten long file names displayed in posts, etc., and the dots get caught up as part of the URL.
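To make that theory concrete, here is a hypothetical Python sketch. The function names and the 19-character limit are invented for illustration, not taken from any real forum software: a board shortens long link text with "...", and a sloppy robot harvests URL-looking strings from the visible text instead of the href. If the cut lands right after the domain's slash, the harvested path is exactly /...

```python
import re
from urllib.parse import urlsplit

# Hypothetical forum code: shorten the *displayed* text of a long link.
def shorten_for_display(url: str, limit: int) -> str:
    return url if len(url) <= limit else url[:limit] + "..."

# Hypothetical careless robot: pulls "URLs" out of visible text,
# so the ellipsis becomes part of the path it requests.
URL_IN_TEXT = re.compile(r"https?://\S+")

def harvest_paths(visible_text: str) -> list:
    return [urlsplit(u).path for u in URL_IN_TEXT.findall(visible_text)]

link = "http://example.com/some-very-long-file-name.html"
shown = shorten_for_display(link, 19)   # "http://example.com/..."
print(harvest_paths("See " + shown + " for details"))  # ['/...']
```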

lucy24

4:57 pm on May 5, 2015 (gmt 0)

Is "/..." with trailing dotdotdot the real, literal form of the request?

It makes me think of those occasional errors you find in Webmaster Tools (more often Google's, but the principle is identical) where it says it couldn't find such-and-such URL ... and you look at the URL in exasperation and think that honestly, couldn't any idiot tell that that's a hiccup in somebody's auto-linking function, or a misplaced punctuation mark? In some ways, a robot-- even a reputable search-engine spider-- is dumber than the dumbest human who ever lived.

mrtonyg

6:37 pm on May 5, 2015 (gmt 0)

@Pfui no referrer

@lucy24, yes that was a straight copy/paste from the logs.

This is not the first time I have seen bingbot with that same odd request.

The reason I say malicious is that in a 'nix OS, the command to back up one directory level is: cd /..

Odd that no one yet has seen it in their logs.

keyplyr

8:18 pm on May 5, 2015 (gmt 0)

Odd that no one yet has seen it in their logs.

No one?

aristotle

10:36 pm on May 5, 2015 (gmt 0)

Why did it return a 403? Especially for what looks like it could be a legitimate bingbot request?

lucy24

11:33 pm on May 5, 2015 (gmt 0)

in a 'nix OS, the command to back up one directory level is: cd /..

Nothing sinister there, since .. is also the standard way to say "go back one directory" in a relative link, as in href="../page.html". But it would never appear in a request sent to the server, because the browser or robot resolves the relative link into a full path before requesting it, unless either the robot itself had the hiccups, or the original link was so malformed that the search engine couldn't figure out what URL it was supposed to mean.
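Python's standard urllib.parse.urljoin resolves a relative reference against the page it appears on, following the RFC 3986 rules, so it's a handy way to see what a well-behaved client would actually request. The page URL and links below are made-up examples:

```python
from urllib.parse import urljoin

# A page at /games/index.html containing relative links
page = "http://example.com/games/index.html"

# ".." means "up one directory"; it is resolved before any request is sent
print(urljoin(page, "../fun/index.html"))  # http://example.com/fun/index.html

# "./" means "this directory"; it is resolved away as well
print(urljoin(page, "./scores.html"))      # http://example.com/games/scores.html
```

Either way the dot segments are gone before the request leaves the client, which is why a literal .. in an access log points at a broken client or a hand-built request.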

If you were instructing a malign robot to go somewhere it wasn't supposed to, like upstream from your /public/ directory, you wouldn't do it like that. You'd need some fancy PHP-or-equivalent command. And even then I wouldn't expect it to work in an HTTP request.

Pfui

1:52 am on May 6, 2015 (gmt 0)

Directory crawling is not uncommon among less-than-savory UAs/users, whether or not /.. or slash-dot patterns are hard-coded, imho. In fact, I've long blocked any URI resembling /.. because it's not okay to poke around. Plus, most of the time, the UAs (including scrapers) get the paths very, very wrong anyway.
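Pfui doesn't say how that block is set up. As an illustration only, here is the kind of test such a rule might apply, sketched in Python with a made-up function name. It percent-decodes first, since %2e%2e is a common way of disguising .., and it matches whole segments so that a file name that merely ends in dots isn't caught:

```python
import re
from urllib.parse import unquote

# Hypothetical filter: True if the request target (path and query) contains a
# segment made only of dots, such as ".." or "...", after percent-decoding.
def resembles_dotdot(target: str) -> bool:
    decoded = target
    for _ in range(3):                  # peel up to 3 layers, e.g. %252e -> %2e -> .
        nxt = unquote(decoded)
        if nxt == decoded:
            break
        decoded = nxt
    # Treat backslashes and query-string delimiters as separators too
    segments = re.split(r"[/\\?&=]", decoded)
    return any(len(s) >= 2 and set(s) == {"."} for s in segments)
```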

Nowadays, the worst, long-chronic offender is Yahoo. Here's a partial example from a few weeks ago. They always rack up 404s galore:

h044.crawl.yahoo.net - "GET /dir/fileA.html/../graphics/bttn.gif HTTP/1.1" 404 3340 "-"
h044.crawl.yahoo.net - "GET /dir/fileB.html/../graphics/logo.gif HTTP/1.1" 404 3340 "-"

UA:

"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

(To add insult to injury, Yahoo and all its UA permutations have been blocked from /graphics in robots.txt for YEARS.)
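Assuming those pages contain a relative link such as ../graphics/bttn.gif (a guess; the markup isn't shown), the log lines look like what you get when a crawler glues the relative link onto the full page URL as though fileA.html were a directory, instead of resolving it. A Python comparison, using the stdlib urljoin for the correct version:

```python
from urllib.parse import urljoin

page = "http://example.com/dir/fileA.html"
link = "../graphics/bttn.gif"   # assumed relative link on that page

# Correct: drop the last path segment (the file name) before applying ".."
print(urljoin(page, link))      # http://example.com/graphics/bttn.gif

# Naive: treat the page itself as a directory and just concatenate
print(page + "/" + link)        # http://example.com/dir/fileA.html/../graphics/bttn.gif
```

A server that collapses the naive version would presumably look for /dir/graphics/bttn.gif, which doesn't exist on a site whose /graphics directory sits at the top level, hence the 404s.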

Another UA I had to block because of /.. abuse:

"Instapaper/6.1 CFNetwork/711.1.16 Darwin/14.0.0"

Back to the OP, sorry, no new thoughts about the slash-triple-dot /... pattern you saw other than my prior musings.

lucy24

5:03 am on May 6, 2015 (gmt 0)

the worst, long-chronic offender is Yahoo.

That seems to me like a textbook case of "never attribute to malice that which can be adequately explained by stupidity".

:: detour to archived logs, because I don't see the ../ pattern often, and when I do it tends to point toward some monument of imbecility ::

Whoops! Let's try that again with a RegEx to filter out the query strings, as in

74.91.26.251 - - [30/Apr/2014:21:42:42 -0700] "GET /KikChat/private.php?name=../../../../../../../../../../etc/passwd%00 HTTP/1.0" 403 3269 "-" "-"

That's not even a record; elsewhere I found 15 nests of ../../ -- and no, the robot didn't systematically ask for one, then two, then three etc., so what was the point? It can't possibly have known that such-and-such file is always exactly fifteen directories upstream from its original request.
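The RegEx itself isn't posted. One way to do the same kind of filtering, sketched in Python under the assumption that the logs are in Apache's combined format like the lines quoted here: pull out the request target, split off the query string, and report .. in the path separately from .. in the query. Counting the ../ repetitions also gives the nesting depth. The function name is hypothetical:

```python
import re
from urllib.parse import unquote

# Matches the quoted request line in combined-format logs, e.g.
# "GET /path?query HTTP/1.0"
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')

def classify(line: str):
    """Return ('path' | 'query' | None, number of ../ repetitions)."""
    m = REQUEST.search(line)
    if not m:
        return None, 0
    path, _, query = m.group("target").partition("?")
    path, query = unquote(path), unquote(query)
    if ".." in path.split("/"):         # a real ".." segment in the path
        return "path", path.count("../")
    if "../" in query:                  # traversal attempt in a parameter
        return "query", query.count("../")
    return None, 0
```

Keeping only the lines where classify() returns "path" filters out the query-string attacks.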

OK, here we go, a few goodies at random:

79.112.109.72 - - [09/Apr/2011:05:44:40 -0700] "GET /games/../mailto:webmaster@example.com HTTP/1.0" 404 1496 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b5) Gecko/2008032620 Firefox/3.0b5"
...
38.101.148.126 - - [13/Aug/2011:03:38:18 -0700] "GET /ebooks/../fun/index.html HTTP/1.1" 200 1596 "-" "Mozilla/5.0 (compatible; discobot/1.1; +http://discoveryengine.com/discobot.html)"

See what I mean about malice and stupidity? It looks as if the discobot, in particular, hung around eating 403s for much of August 2011 (logs say it got 200s at the beginning of the month, but was blocked by month's end).

Those 200s suggest that the server actually didn't mind this format. My bad; I thought it would object. But honestly, you'd think there would be a status code for "look, dimwit, I'm not going to do your arithmetic for you, so figure out what directory you want, and ask for it instead of going around in circles".

And then there's

80.94.164.141 - - [03/Jul/2011:15:25:55 -0700] "GET /../index.html HTTP/1.0" 400 415 "http://www.example.ru/referer-spam-here" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.11) Firefox/2.0.0.11"

which was apparently too much even for the server. Can't backtrack from the root, no matter how much you'd like to.

218.75.27.72 - - [26/Apr/2011:20:39:34 -0700] "GET /.../duct_tape.html HTTP/1.1" 404 1035 "-" "Lotus-Notes/4.5 ( Windows-NT )"

I have no idea what that means-- but the server must have understood, or it wouldn't have known what to look for. ("I can't find it" is a different response from "I don't understand what you're asking for".)
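A simplified model of what the server seems to be doing with these requests, written in Python (a hypothetical sketch, not Apache's actual code): collapse . and .. segments with a stack, and give up, which is the 400 case, if a .. would climb above the document root. Anything else, including a segment of three dots, is treated as an ordinary name:

```python
from typing import Optional

# Hypothetical, simplified model of server-side path normalization.
# Returns the normalized path, or None where the server would answer 400.
def normalize(path: str) -> Optional[str]:
    segments = path.split("/")[1:]      # skip the empty string before the leading "/"
    stack = []
    for seg in segments:
        if seg in ("", "."):
            continue                    # "//" and "/./" change nothing
        if seg == "..":
            if not stack:
                return None             # would climb above the document root
            stack.pop()                 # back up one directory
        else:
            stack.append(seg)           # ordinary name, even "..."
    result = "/" + "/".join(stack)
    if stack and segments[-1] in ("", ".", ".."):
        result += "/"                   # keep a trailing slash for directories
    return result
```

Under that model /ebooks/../fun/index.html quietly becomes /fun/index.html, which fits the 200s; /../index.html has nowhere to go, which fits the 400; and /.../duct_tape.html passes through unchanged, so the server would look for a directory literally named "..." and return 404 when there isn't one.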

Odd that everything I turned up was from 2011. That includes the unambiguously malign ones, like

202.100.80.21 - - [11/Dec/2011:11:19:17 -0800] "GET /?file=../../../../../../proc/self/environ%00 HTTP/1.1" 403 1340 "-" "<?php system(\"id\"); ?>"

Well, I guess the UA string wasn't malign; that looks more like stupidity again. Elsewhere, a string of similar requests managed to net a 501 response ("sorry, but I don't know how to do that"). All of this was before my host started running mod_security, which explains the absence of 418 errors.

Dang, forgot to check for trailing dots. Different pattern. But most of them are "Oh, come on" requests, like:

66.249.67.208 - - [27/Feb/2012:06:50:42 -0800] "GET /hovercraft/h.. HTTP/1.1" 404 1032 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I think even Google must understand that these are nonsense requests, because it doesn't look like they've ever requested something of this kind more than once. But there have been a surprising number of different ones; it's got to be following malformed links from outside.

mrtonyg

11:44 pm on May 9, 2015 (gmt 0)

Here is another one that just puzzles me:

157.55.39.183 - - [09/May/2015:11:50:58 -0400] "GET /secure/ViewKeyboardShortcuts!default.jspa HTTP/1.1" 404 713 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Edit: I should add that this site is mostly static, with only some PHP thrown in for a contact page.
Absolutely no Java.

lucy24

2:24 am on May 10, 2015 (gmt 0)

Oh, I don't think they bother to study your page source to establish what CMS you are actually using. Faster just to run around with a shopping list of possible filenames. But from a reputable search engine it's a weird request; it does seem more a Ukrainian robot's kind of thing. Do you even have a /secure/ directory? Why would the bingbot think you do?

Is that a literal ! exclamation mark in the middle of the URL? Are those even legal?

:: detour to horse's mouth [w3.org] ::

The similarity to unix and other disk operating system filename conventions should be taken as purely coincidental, and should not be taken to indicate that URIs should be interpreted as file names.

That was about slashes and dots; file and forget.

The astersik [sic] ("*", ASCII 2A hex) and exclamation mark ("!" , ASCII 21 hex) are reserved for use as having special signifiance within specific schemes.

Well, don't leave us in suspense, w3. What specific schemes? Further attempts at enlightenment lead only to the hashbang #! which is a different matter.
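For what it's worth, the current URI standard, RFC 3986, puts ! (along with * and a few others) in its "sub-delims" set, and sub-delims are allowed as-is inside a path segment, so the bingbot request is at least syntactically legal. A rough Python check built from the RFC's pchar rule; the function name is made up, and it only looks at characters, not at whether the path means anything:

```python
import re

# RFC 3986 pchar: unreserved / pct-encoded / sub-delims / ":" / "@"
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# path-abempty: zero or more "/" + segment, each segment made of pchars
_PATH = re.compile(rf"(?:/{_PCHAR}*)*")

def is_legal_path(path: str) -> bool:
    return _PATH.fullmatch(path) is not None
```

Note that /... passes too: three dots is a perfectly legal segment, just not a meaningful one on most sites.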