Msg#: 3582123 posted 4:00 pm on Feb 22, 2008 (gmt 0)
I have a question. Why would someone publish a robots.txt file that contains a complete Disallow: list of all secure login areas of the site? Or, why would someone list out an entire directory structure of sub-folders that they don't want the bots to traverse when the content in those folders is behind a login? Doesn't that present major security challenges?
Msg#: 3582123 posted 6:44 am on Mar 31, 2008 (gmt 0)
Well, yes and no. I can see reasons why you might want some registered-members content indexed in order to bring in more members, yet disallow other restricted areas from being indexed.
For instance, a publisher that gives full access only to registered members may let Google bypass the login to index partial teaser content and generate traffic, yet block indexing of the membership roster (assuming it were visible) or other things.
Without seeing the site it would be hard to justify what they did. The fact that they don't cloak the robots.txt so that only the search engine requesting it sees these paths, and instead let everyone else see them too, is definitely a security risk.
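As a rough illustration of the teaser scenario above, per-crawler rules in robots.txt might look like the sketch below, checked with Python's standard urllib.robotparser module. The /members/ paths and the bot name SomeOtherBot are made up for illustration, and Allow is an extension honored by Google rather than part of the original robots.txt convention. Keep in mind that robots.txt only advises crawlers which paths to fetch; actually letting Googlebot past the login would have to be handled by the server itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the paths are illustrative, not taken from any real site.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /members/teasers/
Disallow: /members/

User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules above let `agent` crawl `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "/members/teasers/article-1.html"),
        ("Googlebot", "/members/roster/"),
        ("SomeOtherBot", "/members/teasers/article-1.html"),
        ("SomeOtherBot", "/index.html"),
    ]
    for agent, path in checks:
        print(agent, path, allowed(agent, path))
```

Anyone who fetches this file still learns that /members/ exists, which is exactly the exposure being discussed.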
robots.txt is best used for dealing with duplicate content and other search engine issues (which is why it's called robots.txt and not security.txt :) If you want to secure information you should use a stronger solution like .htaccess.
Since robots.txt is publicly available, you can use it to lay a bot trap to identify the bad bots and competitors looking to reverse engineer your site. Here is the robots.txt file that I use:
/customerdata/ is a fake folder. It helps me identify computers that I want to block from accessing my site. If anyone tries to peek into that folder, which is not referenced from anywhere but robots.txt, I don't want them on my site.
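One way to act on a trap like this is to scan the web server's access log for requests under the fake folder and collect the client addresses. The sketch below assumes Apache's common/combined log format; the function name find_trap_hits, the TRAP_ROBOTS_TXT stand-in, and the sample log lines are all hypothetical and not the poster's actual setup.

```python
import re

# Hypothetical stand-in for a trap entry; not the poster's actual file.
TRAP_ROBOTS_TXT = "User-agent: *\nDisallow: /customerdata/\n"
TRAP_PREFIX = "/customerdata/"

# Matches the start of an Apache common/combined log line:
# client IP, two identity fields, [timestamp], then "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')


def find_trap_hits(log_lines, trap_prefix=TRAP_PREFIX):
    """Return a sorted list of client IPs that requested a path under the trap folder."""
    hits = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_prefix):
            hits.add(match.group(1))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up log lines using documentation-reserved IP ranges.
    sample = [
        '203.0.113.7 - - [22/Feb/2008:16:00:00 +0000] "GET /customerdata/ HTTP/1.1" 404 209',
        '198.51.100.4 - - [22/Feb/2008:16:01:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    ]
    print(find_trap_hits(sample))
```

The resulting addresses could then be fed into whatever blocking mechanism the site already uses. Since the folder is linked from nowhere except robots.txt, a hit strongly suggests the client read the Disallow list and went looking anyway.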