
MSOffice


keyplyr

8:27 am on Aug 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With the recent proliferation of Windows 10, I've noticed a sharp increase in Microsoft Outlook and MSOffice requests that try to save a webpage and all its associated files to the user's local machine.

I've always blocked this UA and its variants, but until now the profile has usually been a person at work where MSOffice is installed. Now, with Windows 10 and MSOffice being one of the apps offered, the couch-potato at home can scrape our sites with ease.

The typical UA string will contain one or more of these strings:
Microsoft Outlook
ms-office
MSOffice
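The post doesn't say how the blocking is actually done (server config, script, or otherwise), so here is only a minimal sketch of the matching logic in Python, assuming case-insensitive substring matching against the three markers listed above. The names BLOCKED_UA_SUBSTRINGS and is_office_fetcher are hypothetical, and the sample UA strings in the usage below are made up for illustration.

```python
from typing import Optional

# Hypothetical block list built from the three markers above; compared lowercase.
BLOCKED_UA_SUBSTRINGS = ("microsoft outlook", "ms-office", "msoffice")


def is_office_fetcher(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains any Outlook/Office marker."""
    if not user_agent:
        # Missing or empty UA: not this tool (handle elsewhere if you wish).
        return False
    ua = user_agent.lower()
    # Plain substring test, so "MSOffice 16" and "ms-office" both match.
    return any(marker in ua for marker in BLOCKED_UA_SUBSTRINGS)


if __name__ == "__main__":
    # Made-up example strings, not captured from real logs.
    print(is_office_fetcher("Mozilla/4.0 (compatible; ms-office; MSOffice 16)"))
    print(is_office_fetcher("Mozilla/5.0 (Windows NT 10.0; rv:40.0) Firefox/40.0"))
```

Substring matching is deliberately blunt: it will also catch someone who simply clicked a link in Outlook, which is the trade-off you accept with a UA ban.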

As always, it is not the tool that is to blame, it is the user that misuses it.

dstiles

6:08 pm on Aug 26, 2015 (gmt 0)


I've always banned those tools, too. And various other things MS-orientated.

Windows 10 is itself a scraper anyway, except it scrapes YOUR hard drive and sends it to MS rather than scraping MY web site. From the write-ups of Windows 10 I've seen, I'm VERY glad I no longer use MS except (unavoidably) as a web server, and I'm hanging on to Server 2012 as long as I can.

keyplyr

8:13 pm on Aug 26, 2015 (gmt 0)


RE: Windows 10 - All the reporting can be turned off, which was the first thing I did during and after a custom install. However, the default browser, Edge, has an easily accessible tool that strips away advertising and branding so the user can print or share the content of a web page. That seems to be the norm nowadays: Safari started it, then Firefox added it, and now Edge has it. I think we can assume Chrome won't, at least not in the current fashion.

dstiles

5:55 pm on Aug 28, 2015 (gmt 0)


Most of the reporting, anyway. If you use its cloud facilities, MS are legally bound to hand over whatever is in the cloud if US authorities demand it, even if you are not a US citizen. So be good, children! :)

I use Privacy Badger with Firefox to prevent third-party tracking, and it's thrown up a couple of oddities: a few of the web sites I use get their CSS, and sometimes their images, from a different domain, which Badger flags as a third-party tracker. No doubt they will overcome that.

Of course, it isn't only MS products that appear in user-agents. I see odd Apple and Linux tools mentioned from time to time. The problem is knowing what those items are and hence whether to block them. Is a user-agent containing a mail agent a nasty, or just someone clicking on a link in their email? Ditto Office (MS, Open or Libre). I block them but sometimes wonder if that's wise. On the other hand, wget, curl and a host of others are fair game, even though I use them myself. Hypocrite or what? :)

tangor

8:19 pm on Aug 29, 2015 (gmt 0)


The "home user" wanting to save a copy of a website is not a scraper. A leech, of course, but not a scraper, i.e. one who puts it back on the web under a different domain in an effort to monetize it.

The "new" MS Office makes this kind of one-person save-a-copy a bit easier. And while it is technically a scrape, they will never turn it against you.

keyplyr

12:09 am on Aug 30, 2015 (gmt 0)


they will never turn it against you.

Have a nice day

tangor

4:06 pm on Aug 30, 2015 (gmt 0)


I generally do! Thanks for the well-wishing!

I've just looked at my metrics for MSOFFICE and its ilk over the years and have never seen it used as a scraping tool by SCRAPERS. One page, two, maybe three. It's just not the right kind of tool to suck down an entire website. Small fry.