
Forum Moderators: goodroi


Should this file be "Public"?

4:04 pm on Sep 25, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 8, 2004
posts: 563
votes: 1


Is it good practice to have the robots.txt file available for everyone on the web to see?
If it is public, less scrupulous people can see which directories, if any, you don't want robots to crawl.

If it's possible to only allow robots to see this file, then what is the best method for doing this? :)

5:22 pm on Sept 25, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 11, 2004
posts:1014
votes: 0


You seem to misunderstand the robots.txt file. It is a purely voluntary request to robots. Many robots ignore it, and hackers certainly aren't going to worry about it. If a directory has links pointing into it, anyone who wants to will find it, irrespective of any robots.txt file.
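To illustrate the "voluntary" part, here is a minimal sketch of what a polite crawler does, using Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` directory, the `ExampleBot` name and the `polite_fetch_allowed` helper are all made up for the example. The check runs on the crawler's side, so nothing on the server forces a client to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory name is invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """A well-behaved crawler asks this before requesting a URL.

    A badly behaved client simply never calls it and requests the URL anyway.
    """
    return parser.can_fetch(user_agent, url)


print(polite_fetch_allowed("ExampleBot", "https://example.com/private/report.html"))
print(polite_fetch_allowed("ExampleBot", "https://example.com/index.html"))
```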

Matt

5:24 pm on Sept 25, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 8, 2004
posts: 563
votes: 1


No, I understand that it's a voluntary file. I would just like to voluntarily put it out there but also be able to limit who sees it. ;)

11:58 am on Sept 26, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3264
votes: 217


You can use IP delivery (aka cloaking) to serve robots.txt to bots coming from Google/Yahoo/MSN IP addresses and show all other IP addresses a 404 error. This is a little tricky, since the IP addresses search engines use change over time and you need to keep your list up to date.
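A rough sketch of the IP delivery idea in Python, just to show the decision logic. The network ranges below are placeholders from the reserved documentation blocks, not real search engine addresses, and `robots_response` is a hypothetical helper. In practice the range list is the part you would have to look up and maintain.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real crawler ranges.
# The real list has to be researched and kept current.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical robots.txt body.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


def robots_response(client_ip: str) -> tuple[int, str]:
    """Return (status code, body) for a request to /robots.txt."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in CRAWLER_NETWORKS):
        return 200, ROBOTS_TXT
    # Everyone else is told the file does not exist.
    return 404, "Not Found"


print(robots_response("192.0.2.10")[0])
print(robots_response("203.0.113.5")[0])
```

You would call something like this from whatever script or handler answers requests for /robots.txt, passing in the visitor's address.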

A simpler solution, which I prefer myself, is to use .htaccess to block access to sensitive areas of my website. I use robots.txt more to deal with duplicate content issues.

You can also make a bot trap. Add a folder to the robots.txt file as a disallowed entry and do not list or link it anywhere else. Then wait and see what hits that folder, and ban those IP addresses. Since the only way to find that folder is from robots.txt, you know it is a misbehaving bot or a hacker. Either way, you don't want it on your site.
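The "wait and see what hits that folder" step could look something like the Python sketch below. It assumes the trap is a hypothetical `/trap-folder/` path listed only in robots.txt as `Disallow: /trap-folder/`, and that the server writes access logs in the common log format. The sample log lines are invented for illustration, and how you actually ban the addresses afterwards depends on your setup.

```python
import re

# Hypothetical trap path, mentioned only in robots.txt.
TRAP_PATH = "/trap-folder/"

# Captures the client IP and the requested path from a common-log-format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)')


def trapped_ips(log_lines):
    """Return the set of client IPs that requested anything under the trap path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(TRAP_PATH):
            offenders.add(match.group(1))
    return offenders


# Invented sample data, not real traffic.
SAMPLE_LOG = [
    '203.0.113.7 - - [26/Sep/2007:11:58:00 +0000] "GET /trap-folder/ HTTP/1.1" 404 209',
    '198.51.100.4 - - [26/Sep/2007:11:59:12 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.7 - - [26/Sep/2007:12:00:03 +0000] "GET /trap-folder/secret.html HTTP/1.1" 404 209',
]

print(sorted(trapped_ips(SAMPLE_LOG)))
```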