
Apache Web Server Forum

    
X-Robots-Tag disallow a folder?
LunaC




msg:4543784
 12:34 pm on Feb 8, 2013 (gmt 0)


Googlebot is having a bit too much fun in my Piwik folder, and I'm not having any luck finding out how to block a folder and all of its contents with X-Robots-Tag.

I tried many variations of this but I either get a 500 error or no error but no change in the headers.

<filesMatch "^thisfolder/">
Header set X-Robots-Tag "noindex"
</filesMatch>

Can someone please show me an example of how to do this?

 

wilderness




msg:4543828
 3:35 pm on Feb 8, 2013 (gmt 0)

Google, in most instances, is robots.txt compliant.

Just add the directory (i.e., folder) to your robots.txt.
Example (following your User Agents):

User-agent: *
Disallow: /MyFolder

lucy24




msg:4543955
 8:43 pm on Feb 8, 2013 (gmt 0)

Eeuw, piwik and google. Been there. Done that.

My solution has two parts. Well, three if you count robots.txt:

Disallow: /piwik

The others are in htaccess:

<FilesMatch "\.(js|txt|xml|php)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>


I got this from someone else on these forums. I forget who, but consider yourself thanked. ;) The .php element is specific to my site; if you've got regular pages with a .php extension you obviously would leave it out.

and:

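# refuse listed IP ranges, bot-like user agents, or "cache" referers
# when they request piwik or the other listed site-specific directories
RewriteCond %{REMOTE_ADDR} ^(131\.253\.[2-4]\d|157\.(5[4-9]|60)|207\.46|209\.8[45])\. [OR]
RewriteCond %{HTTP_USER_AGENT} ([a-z]Bot|facebook|pinterest|Google|Seznam|Preview) [NC,OR]
RewriteCond %{HTTP_REFERER} cache
RewriteRule (piwik|dp/|nagvaarniq/) - [F]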

RewriteCond %{REQUEST_URI} !piwik
RewriteRule \.(php|pl)$ - [F,NS]

(This one again is specific to my site: robots that ask for nonexistent php files are clearly up to no good, so it is more satisfying to thwack them with a 403 instead of the 404 they would otherwise get.)

# keep auto-referer bots out of piwik
RewriteCond %{HTTP_REFERER} piwik\.js
RewriteRule piwik\.js$ - [F]

# refuse POST requests for .html pages and URLs ending in a slash, except in piwik
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} !piwik
RewriteRule (\.html|/)$ - [F]

Your exact forms will vary. This is the combination of rules --they're not really one-after-the-other as shown here-- that I've arrived at. The main issue is getting preview non-robots to keep the ### out of your analytics files. (For those who use it: does Google Preview show up in your GA records? I've occasionally wondered.)


filesMatch "^thisfolder/"

More "Been there. Done that." The <Files> and <FilesMatch> envelopes match only the name of the physical file itself (the last part of the path), not the request or the directories above the file, so a pattern with a slash in it will never match. Directory names can be used in <Directory> and <Location>, but both of those can only be used in config files, not in htaccess.
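
For anyone who lands here wanting the whole-folder version, here is a rough sketch of two ways it might be done; treat it as a starting point rather than a tested recipe. It assumes mod_headers is loaded, that the folder is reachable as /piwik (a stand-in for the real path), and, for the htaccess option, that AllowOverride permits FileInfo in that directory. The <IfModule> wrapper is my addition so that a server without mod_headers skips the line instead of returning a 500.

```apache
# Option A: save as an .htaccess file INSIDE the folder to be covered
# (for example /piwik/.htaccess). With no <Files> envelope around it,
# the header applies to everything served from that folder and below.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex"
</IfModule>

# Option B: main server config or a <VirtualHost> block only;
# <Location> is not allowed in .htaccess.
# "/piwik" is an assumed URL path, adjust to the real one.
<Location "/piwik">
    Header set X-Robots-Tag "noindex"
</Location>
```

Only one of the two is needed. Option A is the one available on shared hosting, where the config files are usually out of reach.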

LunaC




msg:4544462
 8:45 am on Feb 11, 2013 (gmt 0)

Thank you, that was a big help :)

I had the piwik folder blocked with robots.txt, but it was like catnip to Google; they just couldn't resist some parts of it.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved