
Forum Moderators: goodroi


Security Challenges

4:00 pm on Feb 22, 2008 (gmt 0)

Senior Member from US 

pageoneresults

joined:Apr 27, 2001
posts: 12166
votes: 51


I have a question. Why would someone publish a robots.txt file that contains a complete Disallow: list of all secure login areas of the site? Or, why would someone list out an entire directory structure of sub-folders that they don't want the bots to traverse when the content in those folders is behind a login? Doesn't that present major security challenges?
4:23 pm on Feb 22, 2008 (gmt 0)

Senior Member

jimbeetle

joined:Oct 26, 2002
posts:3292
votes: 6


Sure, and even the not-quite-official robotstxt.org [robotstxt.org] says the same thing:

the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.


Just a matter of folks not really knowing the mechanics.
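To make that concrete, here is a minimal sketch (in Python; the function and variable names are my own) of how trivially anyone can pull the "hidden" paths out of a site's robots.txt:

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Extract every Disallow path from a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "allow everything"
                paths.append(path)
    return paths

# Hypothetical robots.txt of the kind being discussed
example = """User-agent: *
Disallow: /admin/
Disallow: /members/private/
"""
print(disallowed_paths(example))  # ['/admin/', '/members/private/']
```

Every path you list is handed, in plain text, to anyone who asks for /robots.txt.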
6:44 am on Mar 31, 2008 (gmt 0)

Administrator from US 

incredibill

joined:Jan 25, 2005
posts:14622
votes: 87


Well, yes and no, as I could see reasons why you might want some content for registered members indexed in order to bring in more members, yet disallow other restricted areas from being indexed.

For instance, a publisher that gives full access only to registered members may let Google bypass the login to index partial teaser content and generate traffic, yet block indexing of the membership roster (assuming it were visible) or other areas.

Without seeing the site it would be hard to judge what they did, but the fact that they don't cloak the robots.txt (serving the full version only to the search engine requesting it) and instead let everyone else see these paths is definitely a security risk.
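A rough sketch of that cloaking idea, assuming a simple user-agent check (the bot list and messages here are placeholders, and a real setup should verify crawlers by reverse DNS, since user-agent strings are trivially spoofed):

```python
# Detailed rules, served only to recognized crawlers
FULL_ROBOTS = """User-agent: *
Disallow: /members/roster/
Disallow: /admin/
"""

# What everyone else sees: no paths revealed
MINIMAL_ROBOTS = """User-agent: *
Disallow:
"""

KNOWN_BOTS = ("googlebot", "bingbot")  # hypothetical allow-list

def robots_for(user_agent: str) -> str:
    """Pick which robots.txt body to serve for a given User-Agent."""
    ua = user_agent.lower()
    if any(bot in ua for bot in KNOWN_BOTS):
        return FULL_ROBOTS
    return MINIMAL_ROBOTS
```

The trade-off is that the search engines still learn your restricted paths, but casual snoopers fetching /robots.txt in a browser see nothing useful.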

3:49 pm on Apr 2, 2008 (gmt 0)

Administrator from US 

goodroi

joined:June 21, 2004
posts:3080
votes: 67


robots.txt is best used for dealing with duplicate content and other search engine issues (which is why it's called robots.txt and not security.txt :) If you want to secure information, you should use a stronger solution like .htaccess.
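As a sketch of that stronger solution, here is the classic .htaccess password-protection block for Apache (the realm name and the .htpasswd path are placeholders you would replace with your own):

```apache
# Require a valid login for everything in this directory
AuthType Basic
AuthName "Members Only"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Unlike a Disallow line, this actually refuses the request until credentials are supplied, whether the visitor is a bot or a human.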

Since robots.txt is publicly available, you can use it to lay a bot trap to identify the bad bots and competitors looking to reverse engineer your site.
Here is the robots.txt file that I use:

User-agent: *
Disallow: /tracking/
Disallow: /system/
Disallow: /people-that-spy-on-robotstxt/
Disallow: /customerdata/

/customerdata/ is a fake folder. It helps me identify computers that I want to block from accessing my site. If anyone tries to peek into that folder, which is not referenced from anywhere but robots.txt, I don't want them on my site.