X-Robots-Tag disallow a folder?
LunaC

msg:4543784 | 12:34 pm on Feb 8, 2013 (gmt 0)

Googlebot is having a bit too much fun in my Piwik folder, and I'm not having any luck finding out how to block a folder and all of its contents with X-Robots-Tag. I tried many variations of this, but I either get a 500 error, or no error and no change in the headers:

<filesMatch "^thisfolder/">
    Header set X-Robots-Tag "noindex"
</filesMatch>

Can someone please show me an example of how to do this?
wilderness

msg:4543828 | 3:35 pm on Feb 8, 2013 (gmt 0)

Google, in most instances, is robots.txt compliant. Just add the directory (i.e., folder) to your robots.txt, for example (following your User-agent line):

User-agent: *
Disallow: /MyFolder
lucy24

msg:4543955 | 8:43 pm on Feb 8, 2013 (gmt 0)

Eeuw, piwik and Google. Been there. Done that. My solution has two parts. Well, three if you count robots.txt:

Disallow: /piwik

The others are in htaccess:

<FilesMatch "\.(js|txt|xml|php)$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>

I got this from someone else on these forums. I forget who, but consider yourself thanked. ;) The .php element is specific to my site; if you have regular pages with a .php extension, you would obviously leave it out.

And:
RewriteCond %{REMOTE_ADDR} ^(131\.253\.[2-4]\d|157\.(5[4-9]|60)|207\.46|209\.8[45])\. [OR]
RewriteCond %{HTTP_USER_AGENT} ([a-z]Bot|facebook|pinterest|Google|Seznam|Preview) [NC,OR]
RewriteCond %{HTTP_REFERER} cache
RewriteRule (piwik|dp/|nagvaarniq/) - [F]

RewriteCond %{REQUEST_URI} !piwik
RewriteRule \.(php|pl)$ - [F,NS]
(This one again is specific to my site: robots that ask for nonexistent php files are clearly up to no good, so it is more satisfying to thwack them with a 403 instead of the 404 they would otherwise get.)
# keep auto-referer bots out of piwik
RewriteCond %{HTTP_REFERER} piwik\.js
RewriteRule piwik\.js$ - [F]

RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} !piwik
RewriteRule (\.html|/)$ - [F]
Your exact forms will vary. This is the combination of rules I've arrived at (they're not really one after the other as shown here). The main issue is getting preview non-robots to keep the ### out of your analytics files. (For those who use it: does Google Preview show up in your Google Analytics records? I've occasionally wondered.)
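If the goal is to put the header on everything inside one folder, whatever the extension, one option is sketched here. It assumes the folder is /piwik/ and that the host permits header directives in .htaccess (AllowOverride FileInfo). Because an .htaccess file applies to its own directory and everything beneath it, a file placed inside the folder needs no <FilesMatch> at all. The <IfModule mod_headers.c> wrapper is a precaution: if mod_headers is not loaded, a bare Header line in .htaccess makes Apache answer with a 500 error, which may be what some of the attempts in the question ran into.

```apache
# Hypothetical file: /piwik/.htaccess (substitute your own folder name).
# An .htaccess file applies to its own directory and all subdirectories,
# so no <Files> or <FilesMatch> container is needed here.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex"
</IfModule>
```

To check it, request any file in the folder and inspect the response headers, for example with curl -I http://example.com/piwik/piwik.js, where example.com is a placeholder for your own host. If the X-Robots-Tag line is missing and there is no 500 error, a likely reason is that mod_headers is not loaded (so the wrapper skipped the directive) or that .htaccess files are ignored for that folder.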
Quote: filesMatch "^thisfolder/"
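More "Been there. Done that." The <Files> and <FilesMatch> envelopes apply only to the names of physical files, not to request paths, and they match the file name alone, without any folder part, so a pattern containing a slash such as "^thisfolder/" can never match. Directory names can be used in <Directory> and <Location>, but both of those work only in the server config files, not in htaccess.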
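When the rule has to stay in the site's root .htaccess, one workaround is to test the request path instead of the file name. A minimal sketch, assuming mod_setenvif and mod_headers are both loaded and again using /piwik/ as a stand-in for the folder: SetEnvIf flags requests whose Request_URI begins with /piwik/, and Header acts only on flagged requests. NOINDEX_FOLDER is an arbitrary, hypothetical variable name.

```apache
# Hypothetical fragment for the root .htaccess; /piwik/ stands in for your folder.
# Set the NOINDEX_FOLDER variable for any request whose URL path starts with /piwik/.
SetEnvIf Request_URI "^/piwik/" NOINDEX_FOLDER
# Add the header only to requests that carry that variable.
Header set X-Robots-Tag "noindex" env=NOINDEX_FOLDER
```

In the main server config, where <Location> is allowed, no variable is needed: wrapping the Header line in <Location "/piwik/"> does the same job. On Apache 2.4, an <If "%{REQUEST_URI} =~ m#^/piwik/#"> block around the Header line also works in .htaccess. Either way, a crawler that obeys a robots.txt Disallow for the folder never requests those URLs, so it never sees the header.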
LunaC

msg:4544462 | 8:45 am on Feb 11, 2013 (gmt 0)

Thank you, that was a big help :) I had the piwik folder blocked with robots.txt, but it was like catnip to Google; they just couldn't resist some parts of it.