
Apache Web Server Forum

    
Robots.txt - grant access to all - how to?
BallroomDJ




msg:4589935
 4:25 pm on Jul 3, 2013 (gmt 0)

I would like everybody to be able to read my robots file, even if their user agent is blocked in my .htaccess.

Neither of the RewriteRules shown below allowed SkimBot (for example) to read robots.txt. The SkimBot request for robots resulted in a 403 Forbidden due to the "SetEnvIfNoCase User-Agent .*SkimBot.* bad_bot" line later in .htaccess.

RewriteCond %{REQUEST_URI} robots\.txt [NC]
#RewriteRule .* http://www.example.com/robots.txt [R=301,L]
RewriteRule .* http://www.example.com/robots.txt [R=200,L]

Can this be done? Any assistance is greatly appreciated.

[edited by: phranque at 4:45 pm (utc) on Jul 3, 2013]
[edit reason: please use example.com]

 

lucy24




msg:4590029
 8:01 pm on Jul 3, 2013 (gmt 0)

Quote: I would like everybody to be able to read my robots file, even if their user agent is blocked in my .htaccess.

The easiest way is:
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>
That takes care of anyone blocked via mod_authz_host, i.e. your ordinary "Deny from..." IP or UA blocks. (For UA blocks, that means mod_setenvif followed by "Deny from env=something-nasty".)
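For context, here is a minimal sketch of the kind of setup this covers, assuming Apache 2.2 syntax and a site-root .htaccess. The bad_bot variable and the SkimBot pattern come from the original post; everything else is illustrative. (On Apache 2.4 the same idea is normally written with Require directives instead; this sketch doesn't cover that.)

```apache
# Tag unwanted user agents (mod_setenvif). The pattern is unanchored,
# so plain "SkimBot" matches anywhere in the User-Agent string.
SetEnvIfNoCase User-Agent SkimBot bad_bot

# Site-wide rule (mod_authz_host, Apache 2.2): allow everyone,
# then deny anything tagged above
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Exception: this section's own Order/Allow replaces the site-wide
# rules for robots.txt, so tagged agents can still read it
<Files "robots.txt">
    Order Allow,Deny
    Allow from all
</Files>
```

With something like this in place, a SkimBot request for /robots.txt should get a 200 while its requests for anything else still get the 403.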

If any of your blocks are done in mod_rewrite, you will also need a line that says

RewriteRule ^robots\.txt - [L]

Put this at the very beginning of all RewriteRules.
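A sketch of where that line goes, assuming a site-root .htaccess where the UA block is done in mod_rewrite rather than with Deny. The SkimBot block rule is a hypothetical example.

```apache
RewriteEngine On

# Exemption first: "-" means no substitution, and [L] stops the rest
# of the rules in this pass, so robots.txt is served normally.
# In .htaccess the pattern sees the path without a leading slash.
RewriteRule ^robots\.txt$ - [L]

# Hypothetical mod_rewrite UA block: [F] returns 403 Forbidden
RewriteCond %{HTTP_USER_AGENT} SkimBot [NC]
RewriteRule .* - [F]
```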

But wait! You may not even need the mod_rewrite part. (The <Files> envelope is always necessary.) I don't know about other people, but all my access-control rules are constrained to requests for pages (URLs ending in / or .html), so the server doesn't have to evaluate the rules for other requests such as images. If your rules are set up the same way and you don't have any ordinary pages ending in .txt, a request for robots.txt will sail on through anyway.
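One way to write a page-only block, assuming pages are served at URLs ending in / or .html and the rules live in a site-root .htaccess. The pattern is a sketch of the idea, not necessarily the exact rule used in this post.

```apache
RewriteEngine On

# The pattern matches an empty path (the site root), a path ending
# in "/", or a path ending in ".html". Requests for robots.txt,
# images or stylesheets don't match, so this block never fires for them.
RewriteCond %{HTTP_USER_AGENT} SkimBot [NC]
RewriteRule (^|/|\.html)$ - [F]
```

This is cheap because mod_rewrite tests the RewriteRule pattern first and only evaluates the RewriteCond lines when the pattern matches.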

And wait a bit more, because you're not done yet.

On the <Files> side, you also need to allow everyone to see your 403 page, plus any styles it requires; otherwise the request for the error page is itself refused. If you're on shared hosting and you use the host's default filename for error documents, they've already taken care of this. Otherwise you'll need another <FilesMatch> envelope. Or put all your error documents in a separate directory and give that directory its own .htaccess with an "Allow from all" directive.
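A sketch of the <FilesMatch> route in Apache 2.2 syntax, assuming a custom 403 page at /forbidden.html with a stylesheet at /errors.css. Both filenames are hypothetical.

```apache
# Hypothetical custom 403 page
ErrorDocument 403 /forbidden.html

# Blocked visitors must be able to fetch the error page and its styles;
# otherwise Apache falls back to its built-in 403 message
<FilesMatch "^(forbidden\.html|errors\.css)$">
    Order Allow,Deny
    Allow from all
</FilesMatch>
```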

On the mod_rewrite side, you need another - [L] exception to cover requests for the error pages. You don't need to muck about with %{REQUEST_URI}; put the filename into the pattern of the rule itself. You hardly ever need %{REQUEST_URI} unless you're making a negative match.
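Using the same hypothetical filenames as above (/forbidden.html and /errors.css), a single exemption rule can cover robots.txt and the error files, with the matching done in the rule's own pattern. It would go before any blocking rules.

```apache
RewriteEngine On

# Not needed:
#   RewriteCond %{REQUEST_URI} robots\.txt [NC]
#   RewriteRule .* - [L]
# The rule's own pattern can do the matching (in .htaccess it sees the
# path relative to this directory, without a leading slash)
RewriteRule ^(robots\.txt|forbidden\.html|errors\.css)$ - [L]
```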

phranque




msg:4590092
 11:02 pm on Jul 3, 2013 (gmt 0)

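Welcome to WebmasterWorld, BallroomDJ!


Quote: The SkimBot request for robots resulted in a 403 Forbidden due to SetEnvIfNoCase

You should show your directives for this, but it's likely that your 403 is preventing anything from happening with your mod_rewrite directives: the access checks that issue the 403 run before per-directory (.htaccess) RewriteRules, so a denied request never reaches them.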

BallroomDJ




msg:4590847
 4:17 pm on Jul 6, 2013 (gmt 0)

Lucy24, the code worked great! Thanks so much, help is much appreciated.

Now, if I could just get msnbot-media to realize that my robots.txt file doesn't change every ten minutes I could reduce my server logs by 5%.

Thanks for the welcome. It's great to have folks willing to spend their time helping others.

lucy24




msg:4590880
 8:13 pm on Jul 6, 2013 (gmt 0)

Quote: Now, if I could just get msnbot-media to realize that my robots.txt file doesn't change every ten minutes I could reduce my server logs by 5%.

:) Trust me, you are not the only person to notice msn/bing's weird obsession with robots.txt.

golgesiz




msg:4602355
 8:34 am on Aug 16, 2013 (gmt 0)

Thank you, lucy24.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved