Microsoft Office Protocol Discovery

         

lucy24

8:26 pm on Feb 7, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can someone explain in words of two syllables exactly what this UA does, and why it’s doing it? I have met it before, but never to such egregious lengths. The one thing I’m tolerably sure of is that there is no malign intent; the human may not even know what his computer is doing. (“Never attribute to malice that which can be adequately explained by stupidity.”)

This week’s adventure happened to involve a rarely-visited page, making it relatively easy to check. It starts with an ordinary human from an ordinary human ISP using MSIE 11, visiting
/ebooks/title/chapter.html
No referer; cursory research suggests that he’d bookmarked the page after an earlier visit that also involved /ebooks/title/ but nothing outside this directory.

But then, beginning about a minute after the human visit, I start seeing this 9-request pattern from the same IP:
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "OPTIONS / HTTP/1.1" 403 3545 "-" "Microsoft Office Protocol Discovery" 
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "OPTIONS / HTTP/1.1" 403 3546 "-" "Microsoft Office Protocol Discovery"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "GET /sharedstyles.css HTTP/1.1" 304 261 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; ARM; Trident/7.0; .NET4.0E; .NET4.0C; Tablet PC 2.0; ms-office; MSOffice 15)"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "OPTIONS /ebooks/ HTTP/1.1" 403 3545 "-" "Microsoft Office Protocol Discovery"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "OPTIONS /ebooks/ HTTP/1.1" 403 3546 "-" "Microsoft Office Protocol Discovery"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "GET /ebooks/ebookstyles.css HTTP/1.1" 304 261 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; ARM; Trident/7.0; .NET4.0E; .NET4.0C; Tablet PC 2.0; ms-office; MSOffice 15)"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "OPTIONS /ebooks/title/ HTTP/1.1" 403 3545 "-" "Microsoft Office Protocol Discovery"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "OPTIONS /ebooks/title/ HTTP/1.1" 403 3546 "-" "Microsoft Office Protocol Discovery"
24.36.aa.bb - - [05/Feb/2017:16:48:52 -0800] "GET /ebooks/title/titlestyles.css HTTP/1.1" 304 261 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; ARM; Trident/7.0; .NET4.0E; .NET4.0C; Tablet PC 2.0; ms-office; MSOffice 15)"
Always this set of exactly nine requests in the same order. (I picked one at random for posting. The very first set got 200 responses for the stylesheets; from then on it was the predictable 304.)

OPTIONS /
OPTIONS /
GET /sharedstyles.css
OPTIONS /ebooks/
OPTIONS /ebooks/
GET /ebooks/ebookstyles.css
OPTIONS /ebooks/title/
OPTIONS /ebooks/title/
GET /ebooks/title/titlestyles.css

The OPTIONS requests were blocked due to severely deficient headers, with the addition of a consistent
X-Idcrl-Accepted: t
(wtf? Is that the actual header, or did the machine break?) I don't know about the stylesheets, because those generally get a free pass anyway.

Multiply this by 106.

The next morning, our human apparently went to work; in the course of an hour and a half early in the day*, there were 120 further nine-request packages from a respectable university’s IP. And then he went home early, leading to a final 102 packages from the original IP.

I make this out to be 2952** requests, which is at least 2943 too many.


* I looked it up. The university is in the Eastern time zone, so it wasn’t as outrageously early as it looked in logs.
** 9 x (106 + 120 + 102) = 9 x 328 = 2952

keyplyr

10:04 pm on Feb 7, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They are saving your pages to their local machine: copying, downloading, scraping, stealing... whatever you want to call it.

I block the methods:
# Return 403 for OPTIONS and PROPFIND requests, except for the error page itself
RewriteCond %{REQUEST_METHOD} ^(OPTIONS|PROPFIND)$
RewriteRule !^forbidden\.html$ - [F]

I block the UAs:
MSOffice, Microsoft Office Word, Office, Microsoft Office PowerPoint, ms-office, and a bunch of others I can't think of right now.
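For anyone wanting to try the same thing, here is a rough sketch of how a few of those UA tokens might be written in the same mod_rewrite style as the method block. The alternation pattern is an assumption assembled from the names listed in this post, not a copy of a live rule set. It also assumes RewriteEngine is already on and that the 403 page is named forbidden.html, as in the rule above.

```apache
# Sketch only: the pattern is built from the UA tokens named in this post.
# A bare "Office" would already match every token listed, so the narrower
# strings are used here to keep false positives down.
# [NC] makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (MSOffice|ms-office|Microsoft\ Office) [NC]
# Exempt the error document so the 403 page itself can still be served.
RewriteRule !^forbidden\.html$ - [F]
```

Note that a pattern like this would also refuse the stylesheet requests in the logs above, since that UA string contains both "ms-office" and "MSOffice 15".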

Windows 10 came with an extended free trial of the Microsoft Office suite. Since then I've seen a significant increase in users attempting to scrape content using these UAs. They also come as botnets (infected machines) from multiple IPs.

lucy24

3:58 am on Feb 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is absolutely no skin off my nose if some Canadian academic wants to save /ebooks/title/ to their local machine, since the thing is public-domain-with-a-vengeance. But you'd think 328 tries would be enough to get the message that This Is Not Working. (I'm also fascinated by the html/css alternation, but you have to be familiar with the pages to really appreciate it.)

Incidentally: While opening this thread I remembered that I hadn't bothered to look up the “X-Idcrl-Accepted” header. I was bemused to find one of the top hits was this very thread ... stamped “20 hours ago” according to g###. Interesting math, there.

tangor

4:30 am on Feb 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The MS support document might explain what it does:

[support.microsoft.com...]

For most of us webmasters this would be undesired from the get-go.

keyplyr

8:02 am on Feb 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is absolutely no skin off my nose if some Canadian academic wants to save /ebooks/title/ to their local machine, since the thing is public-domain-with-a-vengeance. But you'd think 328 tries would be...
Well, that's just it. Some MSOffice hits are from individuals trying to save pages to their computers for whatever reason.

Other hits using these UAs are from actors either droning infected machines w/ MSOffice installed or they are spoofing the UA. Regardless, IMO this is not benign behavior.

From the look & frequency of those hits, I'd say the second explanation is more likely the case.