Eeuw, Piwik and Google. Been there. Done that.
My solution has two parts. Well, three if you count robots.txt:
User-agent: *
Disallow: /piwik
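A robots.txt Disallow line only counts inside a group that starts with a User-agent line. One way to check a rule is Python's standard-library urllib.robotparser. The sketch below assumes the file holds nothing but the Piwik rule; the helper name piwik_blocked and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def piwik_blocked(robots_lines, url, agent="Googlebot"):
    """Return True if a crawler honouring these robots.txt lines would skip url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return not parser.can_fetch(agent, url)


# With a User-agent group, the Disallow applies to every crawler.
with_group = ["User-agent: *", "Disallow: /piwik"]
# A bare Disallow belongs to no group, so Python's parser ignores it.
bare_rule = ["Disallow: /piwik"]

print(piwik_blocked(with_group, "https://example.com/piwik/piwik.js"))  # True
print(piwik_blocked(bare_rule, "https://example.com/piwik/piwik.js"))   # False
print(piwik_blocked(with_group, "https://example.com/index.html"))      # False
```

Note that "/piwik" is a prefix match, so it also covers a top-level /piwik.js.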
The other two are in .htaccess:
<FilesMatch "\.(js|txt|xml|php)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
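To see which files pick up the header, the FilesMatch pattern can be tried in Python; for an expression this simple, Python's re should agree with Apache's PCRE. The helper gets_noindex_header is made up for illustration and models only the file-name test, not mod_headers itself.

```python
import re

# Same pattern as the <FilesMatch> above. Apache tests it against the
# file's own name (its last path segment), not the whole URL.
NOINDEX_FILES = re.compile(r"\.(js|txt|xml|php)$")


def gets_noindex_header(path):
    """Hypothetical model: True if X-Robots-Tag: noindex would be sent."""
    basename = path.rsplit("/", 1)[-1]
    return NOINDEX_FILES.search(basename) is not None


for p in ["/piwik/piwik.js", "/robots.txt", "/sitemap.xml", "/index.html"]:
    print(p, gets_noindex_header(p))
```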
I got this from someone else on these forums. I forget who, but consider yourself thanked. ;) The .php element is specific to my site; if you've got regular pages with a .php extension, you would obviously leave it out.
And:
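# 403 for listed IP ranges, bots, previewers and cache referers requesting piwik (or the other listed paths)
RewriteCond %{REMOTE_ADDR} ^(131\.253\.[2-4]\d|157\.(5[4-9]|60)|207\.46|209\.8[45])\. [OR]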
RewriteCond %{HTTP_USER_AGENT} ([a-z]Bot|facebook|pinterest|Google|Seznam|Preview) [NC,OR]
RewriteCond %{HTTP_REFERER} cache
RewriteRule (piwik|dp/|nagvaarniq/) - [F]
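The first rule answers 403 Forbidden whenever the path contains piwik, dp/ or nagvaarniq/ and any one of the three conditions holds, since [OR] chains them. Below is a rough Python model of that logic, not a reproduction of mod_rewrite. The function name and sample addresses are hypothetical, and it assumes the rules sit in the root .htaccess, where the RewriteRule pattern is tested against the path without its leading slash.

```python
import re

# Patterns copied from the rules above. Only the User-Agent test has [NC].
LISTED_IPS = re.compile(r"^(131\.253\.[2-4]\d|157\.(5[4-9]|60)|207\.46|209\.8[45])\.")
BOT_UA = re.compile(r"([a-z]Bot|facebook|pinterest|Google|Seznam|Preview)", re.IGNORECASE)
CACHE_REFERER = re.compile(r"cache")
PROTECTED_PATH = re.compile(r"(piwik|dp/|nagvaarniq/)")


def bot_rule_forbids(path, remote_addr, user_agent, referer=""):
    """Hypothetical model: True if the first rule would answer 403."""
    # mod_rewrite tests the RewriteRule pattern first, then the conditions.
    if not PROTECTED_PATH.search(path.lstrip("/")):
        return False
    # The three RewriteConds are joined by [OR]: any one is enough.
    return bool(
        LISTED_IPS.search(remote_addr)
        or BOT_UA.search(user_agent)
        or CACHE_REFERER.search(referer)
    )
```

With [NC], "[a-z]Bot" catches names like bingbot or YandexBot, while the IP and referer tests stay case-sensitive.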
RewriteCond %{REQUEST_URI} !piwik
RewriteRule \.(php|pl)$ - [F,NS]
(This one is again specific to my site: robots that ask for nonexistent .php files are clearly up to no good, so it is more satisfying to thwack them with a 403 than with the 404 they would otherwise get.)
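A minimal sketch of which requests that rule turns into a 403, under the same root-.htaccess assumption as before. The function name is hypothetical, and the [NS] flag (which skips the rule for internal subrequests) has no counterpart here.

```python
import re

SCRIPT_EXTENSION = re.compile(r"\.(php|pl)$")


def script_probe_forbidden(path):
    """Hypothetical model of the .php/.pl rule: True means 403 instead of 404."""
    # RewriteCond %{REQUEST_URI} !piwik : anything mentioning piwik is exempt.
    if re.search(r"piwik", path):
        return False
    # RewriteRule \.(php|pl)$ : every other request ending in .php or .pl.
    return SCRIPT_EXTENSION.search(path.lstrip("/")) is not None
```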
# keep auto-referer bots out of piwik
RewriteCond %{HTTP_REFERER} piwik\.js
RewriteRule piwik\.js$ - [F]
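Presumably the "auto-referer" bots are ones that send the very URL they request as their referer; a real visitor's browser would send the page that loaded the script instead. A hypothetical model of the rule, with example.com standing in for the site:

```python
import re


def auto_referer_forbidden(path, referer):
    """Hypothetical model: 403 when piwik.js is fetched with piwik.js as its referer."""
    requests_script = re.search(r"piwik\.js$", path.lstrip("/")) is not None
    referer_is_script = re.search(r"piwik\.js", referer) is not None
    return requests_script and referer_is_script
```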
# refuse POST requests to pages and directories, except for piwik
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} !piwik
RewriteRule (\.html|/)$ - [F]
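The POST rule can be modelled the same way. This is a sketch with a hypothetical function name, again assuming the root .htaccess; each check mirrors one line of the rule.

```python
import re


def post_forbidden(method, path):
    """Hypothetical model: refuse POSTs to .html pages and directory URLs, except piwik."""
    # RewriteCond %{REQUEST_METHOD} POST
    if not re.search(r"POST", method):
        return False
    # RewriteCond %{REQUEST_URI} !piwik
    if re.search(r"piwik", path):
        return False
    # RewriteRule (\.html|/)$
    return re.search(r"(\.html|/)$", path.lstrip("/")) is not None
```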
Your exact rules will vary. This is the combination of rules I've arrived at (in my own file they're not really one after the other as shown here). The main issue is getting preview non-robots to keep the ### out of your analytics files. (For those who use it: does Google Preview show up in your Google Analytics records? I've occasionally wondered.)
> filesMatch "^thisfolder/"
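More "Been there. Done that." The <Files> and <FilesMatch> envelopes apply only to the names of physical files, not to requests: the pattern is tested against the file's own name (the last path segment), so anything containing a slash, like "^thisfolder/", can never match. Directory paths go in <Directory> and URL paths in <Location>, and both of those can only be used in the server config files, not in htaccess.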
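The effect is easy to see in a small Python model of <FilesMatch>, assuming (as the Apache documentation describes for <Files>) that the regex is tested against the file's basename. The helper is hypothetical and ignores everything else Apache does when mapping a request to a file.

```python
import re


def files_match(pattern, file_path):
    """Hypothetical model of <FilesMatch>: the regex only sees the basename."""
    basename = file_path.rsplit("/", 1)[-1]
    return re.search(pattern, basename) is not None


# The directory part never reaches the pattern, so this can never match.
print(files_match(r"^thisfolder/", "/var/www/thisfolder/page.html"))  # False
print(files_match(r"\.html$", "/var/www/thisfolder/page.html"))       # True
```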