

Robots.txt - grant access to all - how to?

   
4:25 pm on Jul 3, 2013 (gmt 0)



I would like everybody to be able to read my robots file, even if their user agent is blocked in my .htaccess.

Neither of the RewriteRules shown below allowed SkimBot (for example) to read robots.txt. The SkimBot request for robots.txt resulted in a 403 Forbidden because of a "SetEnvIfNoCase User-Agent .*SkimBot.* bad_bot" line later in the .htaccess.

RewriteCond %{REQUEST_URI} robots\.txt [NC]
#RewriteRule .* http://www.example.com/robots.txt [R=301,L]
RewriteRule .* http://www.example.com/robots.txt [R=200,L]

Can this be done? Any assistance is greatly appreciated.

[edited by: phranque at 4:45 pm (utc) on Jul 3, 2013]
[edit reason] please use example.com [/edit]

8:01 pm on Jul 3, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



I would like everybody to be able to read my robots file, even if their user agent is blocked in my .htaccess.

The easiest way is:
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>
That takes care of anyone blocked via mod_authz, like your ordinary "Deny from..." IP or UA blocks. (That is, mod_setenvif followed by "Deny from env=something-nasty".)
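
For what it's worth, here's a minimal sketch of how the two pieces might sit together in one .htaccess, assuming a 2.2-style setup and the SkimBot line quoted above (the pattern and the env name are just illustrations):

SetEnvIfNoCase User-Agent "SkimBot" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

# robots.txt is exempt: the Allow inside <Files> wins for this one file
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>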

If any of your blocks are done in mod_rewrite, you will also need a line that says

RewriteRule ^robots\.txt - [L]

Put this at the very beginning, ahead of all your other RewriteRules.
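
So, if (hypothetically) you were blocking SkimBot in mod_rewrite rather than mod_setenvif, the order would look something like this:

RewriteEngine On

# exception first, so robots.txt is never evaluated against the block
RewriteRule ^robots\.txt - [L]

# hypothetical UA block done in mod_rewrite
RewriteCond %{HTTP_USER_AGENT} SkimBot [NC]
RewriteRule .* - [F]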

But wait! You may not even need this mod_rewrite part. (The <Files> envelope is always necessary.) I don't know about other people, but all my access-control rules are constrained to requests for pages (final / or .html), so the server doesn't have to evaluate the rules for other requests, like images. So if you don't have any ordinary pages ending in .txt, a request for robots.txt will sail on through anyway.
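
A sketch of what I mean by constraining a block to page requests (the pattern and the UA are placeholders):

# rule only fires for the root, directory requests and .html pages,
# so robots.txt and other non-page files sail through untouched
RewriteCond %{HTTP_USER_AGENT} SkimBot [NC]
RewriteRule ^$|/$|\.html$ - [F]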

And wait a bit more, because you're not done yet.

On the <Files> side, you also need to allow everyone to see your 403 page, plus any styles it requires. If you're on shared hosting and you use their default filename for error documents, they've already taken care of this. Otherwise you'll need another <FilesMatch> envelope. Or put all your error documents in a separate directory and give it a separate .htaccess with an "Allow from all" directive of its own.
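
For instance, a <FilesMatch> envelope along these lines (filenames are hypothetical; use whatever your error documents and stylesheet are actually called):

<FilesMatch "^(403|404)\.html$|^errorstyle\.css$">
Order Allow,Deny
Allow from all
</FilesMatch>

Or, for the separate-directory option, the .htaccess inside that directory needs nothing more than

Order Allow,Deny
Allow from all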

On the mod_rewrite side, you need another - [L] exception to cover any requests for the error pages. You don't need to muck about with %{REQUEST_URI}; put that part into the body of the rule. You hardly ever need to test REQUEST_URI in a RewriteCond unless you're making a negative match.
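
Something like this, assuming (hypothetically) that your error documents live in an /errors/ directory:

# let the error documents through, same idea as the robots.txt exception
RewriteRule ^errors/ - [L]

Note that the path is matched in the rule's own pattern, not in a %{REQUEST_URI} condition.
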
11:02 pm on Jul 3, 2013 (gmt 0)

phranque (WebmasterWorld Administrator)



welcome to WebmasterWorld, BallroomDJ!


The SkimBot request for robots.txt resulted in a 403 Forbidden due to SetEnvIfNoCase

you should show your directives for this, but it's likely that your 403 is preventing your mod_rewrite directives from ever taking effect.
4:17 pm on Jul 6, 2013 (gmt 0)



Lucy24, the code worked great! Thanks so much, help is much appreciated.

Now, if I could just get msnbot-media to realize that my robots.txt file doesn't change every ten minutes I could reduce my server logs by 5%.

Thanks for the welcome. It's great to have folks willing to spend their time helping others.
8:13 pm on Jul 6, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



Now, if I could just get msnbot-media to realize that my robots.txt file doesn't change every ten minutes I could reduce my server logs by 5%.

:) Trust me, you are not the only person to notice msn/bing's weird obsession with robots.txt.
8:34 am on Aug 16, 2013 (gmt 0)



Thank you, lucy24.
 
