
Forum Moderators: goodroi

Security Challenges


pageoneresults

4:00 pm on Feb 22, 2008 (gmt 0)

I have a question. Why would someone publish a robots.txt file that contains a complete Disallow: list of all secure login areas of the site? Or, why would someone list out an entire directory structure of sub-folders that they don't want the bots to traverse when the content in those folders is behind a login? Doesn't that present major security challenges?

jimbeetle

4:23 pm on Feb 22, 2008 (gmt 0)

Sure, and even the semi-official robotstxt.org site [robotstxt.org] states the same thing:

the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.


Just a matter of folks not really knowing the mechanics.

incrediBILL

6:44 am on Mar 31, 2008 (gmt 0)

Well, yes and no. I could see reasons why you might want some registered-members content indexed in order to bring in more members, yet keep other restricted areas out of the index.

For instance, a publisher that gives full access only to registered members may let Google bypass the login to index partial teaser content and generate traffic, yet block indexing of the membership roster (assuming it were visible) or other areas.

Without seeing the site it would be hard to judge what they did, but the fact that they don't cloak the robots.txt (serving the full disallow list only to the search engine requesting it) and instead let everyone see these paths is definitely a security risk.
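For what it's worth, that kind of cloaking is simple enough to sketch with Apache mod_rewrite in an .htaccess file. This is just a minimal illustration: robots-public.txt is a placeholder name for a trimmed file you'd maintain yourself, and matching on the user-agent string is assumed to be good enough here, even though user-agents can be spoofed (a serious setup would also verify the crawler by reverse DNS).

RewriteEngine On
# If the requester does not claim to be Googlebot, silently serve the
# trimmed public file instead of the real disallow list.
# (robots-public.txt is a hypothetical name; the real rules stay in robots.txt)
RewriteCond %{HTTP_USER_AGENT} !Googlebot [NC]
RewriteRule ^robots\.txt$ /robots-public.txt [L]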

goodroi

3:49 pm on Apr 2, 2008 (gmt 0)

robots.txt is best used for dealing with duplicate content and other search engine issues (which is why it's called robots.txt and not security.txt :). If you want to secure information, you should use a stronger solution like .htaccess.
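For anyone who hasn't set that up before, here is a minimal .htaccess sketch using Basic Auth, placed in the directory you actually want protected. The .htpasswd path is a placeholder; create that file with Apache's htpasswd utility.

# Unlike a robots.txt Disallow, this refuses the request outright
# unless the visitor supplies valid credentials.
AuthType Basic
AuthName "Members Only"
AuthUserFile /path/to/.htpasswd
Require valid-user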

Since robots.txt is publicly available, you can use it to lay a bot trap to identify bad bots and competitors looking to reverse engineer your site.
Here is the robots.txt file that I use:

User-agent: *
Disallow: /tracking/
Disallow: /system/
Disallow: /people-that-spy-on-robotstxt/
Disallow: /customerdata/

/customerdata/ is a fake folder. It helps me identify computers that I want to block from accessing my site. If anyone tries to peek into that folder, which is not referenced from anywhere but robots.txt, I don't want them on my site.
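If anyone wants to copy the idea, one way to wire the trap up on Apache is to have the fake folder return a 403 and then harvest the offending addresses from the access log. A minimal .htaccess sketch (the folder name matches the example above and never needs to exist on disk):

# Any request into the trap folder is refused with 403 Forbidden.
RedirectMatch 403 ^/customerdata/

Grepping the access log for /customerdata/ then gives you the list of IPs to add to your deny rules.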
