
Apache Web Server Forum

X-Robots-Tag disallow a folder?

 12:34 pm on Feb 8, 2013 (gmt 0)

Googlebot is having a bit too much fun in my Piwik folder, and I'm not having any luck finding out how to block a folder and all of its contents with X-Robots-Tag.

I tried many variations of this but I either get a 500 error or no error but no change in the headers.

<filesMatch "^thisfolder/">
Header set X-Robots-Tag "noindex"

Can someone please show me an example of how to do this?



 3:35 pm on Feb 8, 2013 (gmt 0)

Google, in most instances, is robots.txt compliant.

Just add the directory (i.e., the folder) to your robots.txt.
For example (under your User-agent line):

User-agent: *
Disallow: /MyFolder


 8:43 pm on Feb 8, 2013 (gmt 0)

Eeuw, piwik and google. Been there. Done that.

My solution has two parts. Well, three if you count robots.txt

Disallow: /piwik

The others are in htaccess:

<FilesMatch "\.(js|txt|xml|php)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

I got this from someone else on these forums. I forget who, but consider yourself thanked. ;) The .php element is specific to my site; if you've got regular pages with a .php extension you obviously would leave it out.


RewriteCond %{REMOTE_ADDR} ^(131\.253\.[2-4]\d|157\.(5[4-9]|60)|207\.46|209\.8[45])\. [OR]
RewriteCond %{HTTP_USER_AGENT} ([a-z]Bot|facebook|pinterest|Google|Seznam|Preview) [NC,OR]
RewriteCond %{HTTP_REFERER} cache
RewriteRule (piwik|dp/|nagvaarniq/) - [F]

RewriteCond %{REQUEST_URI} !piwik
RewriteRule \.(php|pl)$ - [F,NS]

(This one again is specific to my site: robots that ask for nonexistent php files are clearly up to no good, so it is more satisfying to thwack them with a 403 instead of the 404 they would otherwise get.)

# keep auto-referer bots out of piwik
RewriteCond %{HTTP_REFERER} piwik\.js
RewriteRule piwik\.js$ - [F]

RewriteCond %{REQUEST_URI} !piwik
RewriteRule (\.html|/)$ - [F]

Your exact forms will vary. This is the combination of rules --they're not really one-after-the-other as shown here-- that I've arrived at. The main issue is getting preview non-robots to keep the ### out of your analytics files. (For those who use it: does Google Preview show up in your GA records? I've occasionally wondered.)

filesMatch "^thisfolder/"

More "Been there. Done that." The <Files> and <FilesMatch> envelopes apply only to names of physical files, not to requests. Directory names can be used in <Directory> and <Location> -- and both of those can only be used in config files, not in htaccess.
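For anyone who does have access to the server config, the <Directory> route mentioned above might look roughly like the sketch below. The filesystem path is a made-up placeholder, and the <IfModule> wrapper is only a precaution so that a missing mod_headers does not produce a 500 error.

```apache
# Server or virtual-host config only; <Directory> is not allowed in .htaccess.
# "/var/www/example/piwik" is a hypothetical path; use the real location of the folder.
<IfModule mod_headers.c>
    <Directory "/var/www/example/piwik">
        # Sent with every response served from this directory and its subdirectories
        Header set X-Robots-Tag "noindex"
    </Directory>
</IfModule>
```

Without config access, a separate .htaccess file placed inside the folder itself, containing just the Header line, should have the same scope, because .htaccess directives apply to their own directory and everything beneath it. This assumes AllowOverride permits FileInfo for that directory.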


 8:45 am on Feb 11, 2013 (gmt 0)

Thank you, that was a big help :)

I had the piwik folder blocked with robots.txt but it was like catnip to Google, they just couldn't resist some parts of it.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved