
Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

Hiding robots.txt from browsers
getting 501 error

 6:47 pm on Nov 7, 2007 (gmt 0)

Hello,

I am trying to hide robots.txt from browsers but not from robots.

I am using this code in my .htaccess file:

RewriteCond {HTTP_USER_AGENT} ^Mozilla
RewriteCond %{HTTP_USER_AGENT} !(Slurp|surfsafely)
RewriteRule ^robots\.txt$ /someotherfile [L]

But then I get a 501 error.

Something wrong with the code?




 3:24 pm on Nov 9, 2007 (gmt 0)

I'm sorry, I have to ask what you are trying to accomplish here.

Anyone who knows what a robots.txt file is and wants to read yours will be able to just spoof the user agent. I don't think that's any kind of secret hacker knowledge. And regular users will never see the robots.txt file.

If you really want to lock your robots.txt file down, you may have better luck allowing access only from the IP addresses of known spiders.
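A minimal sketch of that approach, using the Apache 1.3/2.x Order/Allow/Deny directives (the IP range shown is only an illustrative Googlebot range; verify current spider ranges yourself before relying on anything like this):

```apache
# Sketch: serve robots.txt only to known spider IP ranges.
# 66.249.64.0/19 is an example range -- check the engines'
# published ranges, and keep this list up to date.
<Files "robots.txt">
  Order Deny,Allow
  Deny from all
  Allow from 66.249.64.0/19
</Files>
```

Everyone outside the listed ranges gets a 403 for robots.txt, which has its own downsides, as noted above.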


 3:55 pm on Nov 9, 2007 (gmt 0)

I agree. robots.txt and your custom 403 error page are two files that should be universally accessible, even to banned IP addresses and user-agents. The reasoning here is that if robots.txt is inaccessible, robots are likely to interpret that as meaning that the entire site may be spidered. And of course, if you return a 403 response when a banned user-agent tries to access your custom 403 page, then your server ends up in a loop.
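The usual way to avoid that loop is to exempt the error page itself from the ban rule. A sketch (the /403.html path and the "BadBot" user-agent are made up for illustration):

```apache
RewriteEngine On
ErrorDocument 403 /403.html

# Ban an example bad user-agent, but never block the 403 page
# itself, so serving the error cannot trigger another 403.
RewriteCond %{REQUEST_URI} !^/403\.html$
RewriteCond %{HTTP_USER_AGENT} BadBot
RewriteRule .* - [F]
```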

That said, I'm not sure about a 501-Not Implemented error, but the missing "%" in your first RewriteCond may cause a 500-Server Error.
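For reference, with the missing "%" restored (and real pipe characters in the pattern), the original rules would read:

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Mozilla
RewriteCond %{HTTP_USER_AGENT} !(Slurp|surfsafely)
RewriteRule ^robots\.txt$ /someotherfile [L]
```

That fixes the syntax error; whether the approach itself is worthwhile is another matter.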



 5:42 pm on Nov 9, 2007 (gmt 0)

In fact, I want to hide it from my competitors.

I got the idea of hiding robots.txt from here:



 9:35 pm on Nov 9, 2007 (gmt 0)

Be prepared to monitor it closely then, and do not make any mistakes with user-agents or IP address ranges. Check the major search engine spider IP address ranges at least once every week so you don't block them and lose your rankings. This is cloaking, and cloaking successfully is a full-time job.

Just my opinion, but I think there are better things to spend your time on. Why not let your competitors see your robots.txt, but put a few extra Disallow entries in there that don't really exist, and are not linked from anywhere on the Web? Then if you ever see an access to one of those Disallowed fake URL-prefixes, you can rewrite it to a script that bans the IP address. ;)
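A sketch of that trap (the /bait-directory/ prefix and the ban script name are hypothetical; pick paths that appear nowhere else on your site):

In robots.txt:

```
User-agent: *
Disallow: /bait-directory/
```

Then in .htaccess:

```apache
RewriteEngine On
# Anything requesting the fake disallowed prefix either ignored
# robots.txt or is a human who read it -- hand the request to a
# script that logs or bans the IP (script name is made up here).
RewriteRule ^bait-directory/ /ban-ip.php [L]
```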



 9:39 pm on Nov 9, 2007 (gmt 0)

Cool trick.

I will try it.


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved