Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Should this file be "Public"?
Is it good practice to have the robots.txt file.......
Propools




msg:3460407
 4:04 pm on Sep 25, 2007 (gmt 0)

Is it good practice to have the robots.txt file available for everyone on the web to see?
If it is public, less scrupulous people can see which directories, if any, you don't want the robots to crawl.

If it's possible to allow only robots to see this file, what is the best method for doing this? :)

 

Matt Probert




msg:3460506
 5:22 pm on Sep 25, 2007 (gmt 0)

You seem to misunderstand the "robots.txt" file. This file is a purely voluntary request to robots. Many robots ignore it, and hackers certainly aren't going to worry about it. If a directory has links pointing into it, it will be found by anyone who wants to find it, irrespective of any robots.txt file.
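
To illustrate what "voluntary" means here, this is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path, the `ExampleBot` name and the helper `polite_can_fetch` are all made up for the example. Nothing on the server enforces the answer: a client that skips this check can request the disallowed URL anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler would treat the URL as allowed."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant bot asks first and then decides for itself whether to obey.
    for url in ("http://example.com/private/a.html", "http://example.com/index.html"):
        print(url, polite_can_fetch(ROBOTS_TXT, "ExampleBot", url))
```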

Matt

Propools




msg:3460509
 5:24 pm on Sep 25, 2007 (gmt 0)

No, I understand that it's a voluntary file. I would just like to voluntarily put it out there but also be able to limit who sees it. ;)

goodroi




msg:3461316
 11:58 am on Sep 26, 2007 (gmt 0)

You can use IP delivery (also known as cloaking) to serve robots.txt to bots coming from Google/Yahoo/MSN IP addresses and show all other IP addresses a 404 error. This is a little tricky, since the IP addresses that search engines use change over time and you need to maintain the list.
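
A rough sketch of that IP delivery decision, written in Python with only the standard `ipaddress` module. The network ranges below are placeholders taken from the reserved documentation blocks, not real search engine ranges, and the names `CRAWLER_NETWORKS`, `is_known_crawler` and `robots_response` are invented for this example. A real setup would have to keep the list current, as noted above.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real crawler ranges.
# A real list would have to be maintained as search engines change addresses.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical robots.txt body served only to recognised crawlers.
ROBOTS_TXT = "User-agent: *\nDisallow: /duplicate/\n"


def is_known_crawler(remote_ip: str) -> bool:
    """Return True if the address falls inside one of the listed networks."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Anything that is not a valid IP address is treated as unknown.
        return False
    return any(addr in net for net in CRAWLER_NETWORKS)


def robots_response(remote_ip: str) -> tuple[int, str]:
    """Return (status, body) for a robots.txt request from remote_ip."""
    if is_known_crawler(remote_ip):
        return 200, ROBOTS_TXT
    return 404, "Not Found"


if __name__ == "__main__":
    print(robots_response("192.0.2.10"))   # inside a listed range
    print(robots_response("203.0.113.5"))  # outside every listed range
```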

A simpler solution, which I prefer to use myself, is to use .htaccess to block sensitive areas of my website. I use robots.txt more to deal with duplicate content issues.

You can also make a bot trap. Add a folder to the robots.txt file and do not list it anywhere else. Then wait and see what hits that folder, and ban that IP address. Since the only way to find that folder is from robots.txt, you know it is a misbehaving bot or a hacker. Either way, you don't want it on your site.
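
One way to spot what hits the trap is to scan the access log for requests to the trap folder. The sketch below assumes log lines in the common Apache-style format (client address first, request in double quotes). The folder name `/trap-folder/`, the sample log lines and the function `trapped_ips` are hypothetical. It only collects the offending addresses; actually banning them is left to whatever blocking mechanism the server already uses.

```python
# Hypothetical trap folder: listed as Disallow in robots.txt and linked nowhere.
TRAP_PREFIX = "/trap-folder/"

# Made-up sample lines in the common log format, for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [26/Sep/2007:11:58:00 +0000] "GET /trap-folder/ HTTP/1.1" 200 512',
    '198.51.100.4 - - [26/Sep/2007:11:59:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    'garbled line without a quoted request',
]


def trapped_ips(log_lines, trap_prefix=TRAP_PREFIX):
    """Return the set of client addresses that requested a path under the trap."""
    offenders = set()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request section, skip the line
        client_fields = parts[0].split()
        request_fields = parts[1].split()  # e.g. ['GET', '/path', 'HTTP/1.1']
        if not client_fields or len(request_fields) < 2:
            continue
        if request_fields[1].startswith(trap_prefix):
            offenders.add(client_fields[0])
    return offenders


if __name__ == "__main__":
    for ip in sorted(trapped_ips(SAMPLE_LOG)):
        print("candidate for banning:", ip)
```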

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved