Forum Moderators: goodroi

robots.txt Security

Can be read publicly

         

kkonline

6:03 am on Aug 18, 2007 (gmt 0)

10+ Year Member



As you may know, robots.txt is a file in the root of the server that tells bots what they need to know about the site, e.g. which private directories they should not index.

Now the situation...

I can simply type the robots.txt URL into a browser and learn the names of all the private/protected directories and files that the site administrator doesn't want publicly accessed (since it is a plain text file).

Now the site actually becomes more vulnerable to attack, because the names of its internal protected directories and files are known and can be used by any hacker trying to break in...
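To make the concern concrete, here is a minimal Python sketch of how little effort it takes to pull the listed paths out of a robots.txt file. The sample file contents and the function name `disallowed_paths` are made up for illustration; they are not from any real site.

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Return every path named in a Disallow line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/db.sql  # hypothetical path
Disallow:
"""

print(disallowed_paths(SAMPLE))
```

Anyone who can fetch the file can run the equivalent of this, which is why the names alone should never be what protects those directories.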

So what is the solution that lets bots index the site while skipping the protected files/directories, without making the site vulnerable to attack?

Tastatura

8:07 am on Aug 18, 2007 (gmt 0)

10+ Year Member



Just for starters:
- One of the places listed in the robots.txt file can be 'bait' (if a bot goes there, you ban it, etc.)
- Cloak the robots.txt file (serve the 'real deal' only to bots... go to the robots.txt file of WebmasterWorld and see what happens)
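The 'bait' idea could be sketched roughly like this in Python. Everything here is an assumption for illustration: the `/trap/` path, the in-memory `banned_ips` set, and the `handle_request` function are hypothetical, and a real site would do this in its web server or application framework with persistent storage.

```python
# Hypothetical bait path: listed under Disallow in robots.txt and linked
# nowhere, so only clients that ignore (or mine) robots.txt request it.
BAIT_PREFIX = "/trap/"

# In-memory ban list, for illustration only.
banned_ips = set()


def handle_request(ip: str, path: str) -> int:
    """Return the HTTP status code to send for this request."""
    if ip in banned_ips:
        return 403  # already banned
    if path.startswith(BAIT_PREFIX):
        banned_ips.add(ip)  # touched the bait: ban from now on
        return 403
    return 200
```

A well-behaved bot never requests the bait path, so it is never banned; a client that reads robots.txt to find "interesting" directories bans itself on the first try.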

goodroi

2:51 am on Aug 19, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



robots.txt is intended to help you manage well-behaved bots. It is not supposed to provide proper security. For example, robots.txt is good for dealing with duplicate content, not for protecting billing information.

If you have sensitive information, it should be blocked with htaccess or ISAPI. Put it behind password protection. If possible, don't even put sensitive information online.
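As a rough illustration of what "behind password protection" means at the HTTP level, here is a Python sketch that checks an HTTP Basic `Authorization` header. This is not how htaccess or ISAPI is configured; it only shows the kind of check the server performs. The credentials and the `is_authorized` function are hypothetical, and a real setup would store hashed passwords and run over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "admin"
EXPECTED_PASSWORD = "change-me"


def is_authorized(authorization_header):
    """Return True if the header carries the expected Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(authorization_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons avoid leaking how much of a guess matched.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok
```

With a check like this in front of a directory, it no longer matters whether its name shows up in robots.txt: knowing the path does not get anyone in.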

g1smd

10:08 am on Aug 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robots.txt is not there to stop URLs appearing in the index. It is there to stop the content at that URL being fetched.

The page can still appear as a URL-only listing in the SERPs.

Use <meta name="robots" content="noindex"> if you want the page to not appear in the index at all.
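The difference between the two mechanisms can be sketched with Python's standard library. The robots.txt rules, the example.com URLs, and the `has_noindex` helper below are made up for illustration; `urllib.robotparser` answers "may this URL be fetched?", while the meta tag can only be seen by a bot that actually fetches the page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


class NoindexDetector(HTMLParser):
    """Sets self.noindex when a robots meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True


def has_noindex(html):
    """Return True if the page asks not to be indexed."""
    detector = NoindexDetector()
    detector.feed(html)
    return detector.noindex


# robots.txt controls fetching only.
print(rules.can_fetch("*", "http://example.com/private/page.html"))
print(rules.can_fetch("*", "http://example.com/public/page.html"))
# The noindex instruction lives in the page itself.
print(has_noindex('<html><head><meta name="robots" content="noindex"></head></html>'))
```

Note that a bot blocked by robots.txt never downloads the page, so it never sees a noindex tag on it; for the tag to work, the URL has to remain fetchable.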