X-Robots-Tag disallow a folder?

   
12:34 pm on Feb 8, 2013 (gmt 0)

5+ Year Member




Googlebot is having a bit too much fun in my Piwik folder, and I'm not having any luck finding out how to block a folder and all of its contents with X-Robots-Tag.

I tried many variations of this, but I get either a 500 error, or no error and no change in the headers.

<filesMatch "^thisfolder/">
Header set X-Robots-Tag "noindex"
</filesMatch>

Can someone please show me an example of how to do this?
3:35 pm on Feb 8, 2013 (gmt 0)

wilderness (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)



Google, in most instances, is robots.txt compliant.

Just add the directory (i.e., folder) to your robots.txt.
Example (following your User Agents):

User-agent: *
Disallow: /MyFolder
8:43 pm on Feb 8, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member, Top Contributor of All Time)



Eeuw, piwik and google. Been there. Done that.

My solution has two parts. Well, three if you count robots.txt

Disallow: /piwik


The others are in htaccess:

<FilesMatch "\.(js|txt|xml|php)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>


I got this from someone else on these forums. I forget who, but consider yourself thanked. ;) The .php element is specific to my site; if you've got regular pages with a .php extension you obviously would leave it out.

and:

RewriteCond %{REMOTE_ADDR} ^(131\.253\.[2-4]\d|157\.(5[4-9]|60)|207\.46|209\.8[45])\. [OR]
RewriteCond %{HTTP_USER_AGENT} ([a-z]Bot|facebook|pinterest|Google|Seznam|Preview) [NC,OR]
RewriteCond %{HTTP_REFERER} cache
RewriteRule (piwik|dp/|nagvaarniq/) - [F]

RewriteCond %{REQUEST_URI} !piwik
RewriteRule \.(php|pl)$ - [F,NS]

(This one again is specific to my site: robots that ask for nonexistent php files are clearly up to no good, so it is more satisfying to thwack them with a 403 instead of the 404 they would otherwise get.)

# keep auto-referer bots out of piwik
RewriteCond %{HTTP_REFERER} piwik\.js
RewriteRule piwik\.js$ - [F]

RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} !piwik
RewriteRule (\.html|/)$ - [F]

Your exact forms will vary. This is the combination of rules --they're not really one-after-the-other as shown here-- that I've arrived at. The main issue is getting preview non-robots to keep the ### out of your analytics files. (For those who use it: does Google Preview show up in your GA records? I've occasionally wondered.)


filesMatch "^thisfolder/"

More "Been there. Done that." The <Files> and <FilesMatch> envelopes are matched against the name of the physical file only, with no directory path, and not against the request. A pattern like "^thisfolder/" can therefore never match. Directory names can be used in <Directory> and <Location>, but neither of those can be used in htaccess, only in config files.
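
For the original question (one header for one whole folder), here is a rough sketch of two ways that usually work from htaccess. Both assume mod_headers is enabled and that the server's AllowOverride setting permits these directives in htaccess; if either is missing, a 500 error is a likely result. The second also assumes mod_setenvif. The folder name /piwik/ is taken from this thread, and NOINDEX_FOLDER is just a made-up variable name.

```apache
# Option 1: an .htaccess file placed inside the folder itself,
# for example /piwik/.htaccess. Per-directory rules are inherited,
# so this covers everything in the folder and its subfolders.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex"
</IfModule>

# Option 2: the root .htaccess. Flag any request whose URL path
# starts with the folder, then send the header only when the flag is set.
# NOINDEX_FOLDER is a hypothetical environment variable name.
SetEnvIf Request_URI "^/piwik/" NOINDEX_FOLDER
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex" env=NOINDEX_FOLDER
</IfModule>
```

Use one or the other, not both. Keep in mind that a robot only sees this header on URLs it is allowed to fetch, so it does nothing for requests that robots.txt already keeps out or that a rewrite rule answers with a 403.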
8:45 am on Feb 11, 2013 (gmt 0)

5+ Year Member



Thank you, that was a big help :)

I had the piwik folder blocked with robots.txt, but it was like catnip to Google; they just couldn't resist some parts of it.
 
