User Agent is Microsoft Office something or other

         

SumGuy

12:26 am on Apr 4, 2019 (gmt 0)

5+ Year Member Top Contributors Of The Month



I see there was a thread here a couple years ago asking about one aspect of these MS Office hits, but I thought I'd throw this out there because of some recent activity. Sometimes I see hits where the UA is the following:

Microsoft Office Excel 201X
Microsoft Office PowerPoint 201X
Microsoft Office Word 201X

I presume those hits mean someone clicked a link to my site from a Word document, Excel spreadsheet or PowerPoint presentation?

--------------
Yesterday I saw one such hit from an IP that at first looked like a normal browser visit, where the UA was:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/73.0.3683.86 Safari/537.36

But mixed in with those were HEAD requests where the UA was Microsoft Office Word 2014. Presumably that user was browsing the site, then copied a link (to a PDF) into an Office document, and Office was checking the link (using HEAD)?

And then I see a hit to the same PDF from the same user (this is all the same IP, same browsing session) where the UA is this:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; Tablet PC 2.0; Zoom 3.6.0; ms-office)

Tablet PC? Zoom? MS-Office? What is all that?
-----------------

Then there are these UA's:

Microsoft Office Mobile/15.0
Microsoft Office Existence Discovery
Microsoft Office Protocol Discovery
and I think I've seen Microsoft webdav?

I have no clue what is behind any of these...
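
For anyone wanting to pull these out of their logs, here is a rough sketch of a classifier. The function name `classify_office_ua` and the labels are made up for illustration, and the patterns cover only the UA strings quoted above. What each agent actually does is still a guess, so treat the labels as buckets rather than explanations.

```python
import re

# Patterns come from the UA strings quoted in this post. The labels on the
# right are hypothetical names, not anything Microsoft documents.
OFFICE_UA_PATTERNS = [
    (re.compile(r"^Microsoft Office (Existence|Protocol) Discovery$"), "office-discovery"),
    (re.compile(r"^Microsoft Office Mobile/"), "office-mobile"),
    (re.compile(r"^Microsoft Office (Excel|PowerPoint|Word) 20\d\d$"), "office-app"),
    # Browser-style UA that ends with an "ms-office" token inside the parentheses.
    (re.compile(r"; ms-office\)$"), "browser-like-with-ms-office-token"),
]


def classify_office_ua(ua):
    """Return a label for an Office-related user agent, or None otherwise."""
    ua = ua.strip()
    for pattern, label in OFFICE_UA_PATTERNS:
        if pattern.search(ua):
            return label
    return None
```

An ordinary Chrome UA falls through to `None`, so the function can be used as a simple filter when scanning a log.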

lucy24

4:07 am on Apr 4, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to logs because I’d forgotten all about Microsoft Office ::

Oh, yes, there they are. Mostly Microsoft Office Protocol Discovery; the occasional Microsoft Office Word 2014; a scattering of others. Always interlocking with to-all-appearances human visits. Happily they’re all getting blocked; a bit of spot-checking in headers tells me why. To make up for missing all the normal headers, there’s a slew of proprietary ones, notably X-Office-Major-Version--for some reason always 16.

Wonder what pleasure I've deprived some human of? (But I don’t wonder very hard.)
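
For the curious, the header check can be sketched in a couple of lines. This assumes the request headers are already available as a plain dict, and it assumes the other proprietary headers share the `X-Office-` prefix. Only X-Office-Major-Version is named above, so the prefix match and the function name `office_header_hints` are guesses for illustration.

```python
def office_header_hints(headers):
    """Return the names of request headers that look Office-specific.

    `headers` is assumed to be a dict of header name -> value. The
    "x-office-" prefix is an assumption based on X-Office-Major-Version.
    """
    return sorted(name for name in headers if name.lower().startswith("x-office-"))
```

An empty list means nothing Office-flavored was found; it says nothing about whether the normal browser headers are present.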

tangor

4:50 am on Apr 4, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do we worry about scrapers? (I think not.)

wilderness

9:53 am on Apr 4, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



SumGuy,
Similar variations of MS tools have been in use for decades. They are very easy to add to your UA list.
Either the user has saved your entire page to an MS product, or saved a link to your page, or (worst case) the user has put an inline link to images on your server into a document and then distributed it (web, email or otherwise) to other users, so that multiple IPs begin requesting the same images.
If the latter, my personal preference is to determine the original user (via logs) and add their IP (multiple line criteria) to the denies.
HEAD requests are very rare these days, at least compared to years ago.
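
One rough way to do the "determine the original user via logs" step is to look for the earliest request for the hotlinked image. The sketch below assumes Apache combined log format, and the function name `first_requester` is made up for illustration. The earliest requester is only a heuristic for who embedded the link, so check the surrounding hits before denying anyone.

```python
import re
from datetime import datetime

# Assumes Apache "combined" log format; adjust the regex for other layouts.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def first_requester(lines, path):
    """Return (ip, timestamp) of the earliest request for `path`, or None."""
    earliest = None
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("path") != path:
            continue  # unparseable line, or a request for something else
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if earliest is None or when < earliest[1]:
            earliest = (m.group("ip"), when)
    return earliest
```

Comparing parsed timestamps rather than relying on line order means the function still works when several log files are concatenated out of sequence.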

lucy24

8:29 pm on Apr 4, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



“HEAD requests are very rare these days, at least compared to years ago.”
I know a handful of authorized robots that always request in pairs: first HEAD, then GET. (Isn’t there a header that conveys the same information with a single request? “Tell me if the page exists, and if it does, get it.” I forget which one.) And, of course, link checkers normally content themselves with HEAD unless they need to check a fragment. Once in a blue moon the w3 link checker will tell me that suchandsuch site doesn’t allow HEAD requests.

And if you want real rarity, look at OPTIONS requests. That was a Microsoft-related thing too. Further quick detour to logs tells me they still exist, but around 2/3 of them are getting blocked.
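
If you want to repeat that detour on your own logs, a tally along these lines would do it. It is only a sketch: it assumes the request line is the first quoted field (as in Apache common or combined format) and that "blocked" means a 403 response, which may not match your setup. The name `method_summary` is invented for the example.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) \S+[^"]*" (?P<status>\d{3}) ')


def method_summary(lines, blocked_status="403"):
    """Return {method: (total_requests, blocked_requests)} for the given log lines."""
    totals, blocked = Counter(), Counter()
    for line in lines:
        m = REQUEST.search(line)
        if not m:
            continue  # skip lines that do not look like a request
        totals[m.group("method")] += 1
        if m.group("status") == blocked_status:
            blocked[m.group("method")] += 1
    return {method: (count, blocked[method]) for method, count in totals.items()}
```

Looking up `"OPTIONS"` or `"HEAD"` in the result gives the total and blocked counts for that method side by side.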