Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Security Challenges
pageoneresults




msg:3582125
 4:00 pm on Feb 22, 2008 (gmt 0)

I have a question. Why would someone publish a robots.txt file that contains a complete Disallow: list of all secure login areas of the site? Or, why would someone list out an entire directory structure of sub-folders that they don't want the bots to traverse when the content in those folders is behind a login? Doesn't that present major security challenges?

 

jimbeetle




msg:3582148
 4:23 pm on Feb 22, 2008 (gmt 0)

Sure, and even the not quite really official robotstxt [robotstxt.org] states the same thing:

the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.


Just a matter of folks not really knowing the mechanics.
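To make the quoted warning concrete, here is a minimal Python sketch of how little effort it takes to pull every Disallow path out of a robots.txt file once it has been fetched. The function name `disallowed_paths` and the sample paths are made up for illustration; they are not taken from any site discussed in this thread.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow: line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value blocks nothing, so skip it
                paths.append(value)
    return paths


# Hypothetical robots.txt content, for illustration only.
sample = """\
User-agent: *
Disallow: /admin/login/
Disallow: /members/roster/   # not linked from anywhere
"""

print(disallowed_paths(sample))
```

Anything a visitor can do by eye, a script like this can do across thousands of sites, which is why listing login areas in the file amounts to publishing a map of them.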

incrediBILL




msg:3615100
 6:44 am on Mar 31, 2008 (gmt 0)

Well, yes and no, as I could see reasons why you might want some content for registered members indexed in order to bring in more members, yet disallow other restricted areas from being indexed.

For instance, a publisher that gives full access only to registered members may let Google bypass the login to index partial teaser content and generate traffic, yet block indexing of the membership roster (assuming it were visible) or other things.

Without seeing the site it would be hard to say whether what they did is justified. But the fact that they don't cloak the robots.txt so that only the SE requesting it gets those paths, and instead let everyone else see them, is definitely a security risk.
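As a rough illustration of what cloaking the robots.txt could look like, the Python sketch below picks a response body based on the User-Agent header. Everything in it is an assumption made for the example: the function name `robots_body_for`, the crawler tokens, and both file bodies. A User-Agent string is trivial to spoof, so a real setup would also need to verify that the request really comes from the search engine; this sketch does not do that.

```python
# Hypothetical file bodies, for illustration only.
PUBLIC_ROBOTS = "User-agent: *\nDisallow: /\n"  # reveals no specific paths
CRAWLER_ROBOTS = "User-agent: *\nDisallow: /members/roster/\n"

# Illustrative substrings to look for in the User-Agent header.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot")


def robots_body_for(user_agent):
    """Choose which robots.txt body to serve for a given User-Agent value."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in KNOWN_CRAWLER_TOKENS):
        return CRAWLER_ROBOTS
    return PUBLIC_ROBOTS


print(robots_body_for("curl/8.0"))
```

With this arrangement a casual visitor only ever sees the blanket Disallow, while a request that identifies itself as a known crawler gets the detailed list.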

goodroi




msg:3617265
 3:49 pm on Apr 2, 2008 (gmt 0)

robots.txt is best used for dealing with duplicate content and other search engine issues (which is why it's called robots.txt and not security.txt :). if you want to secure information you should use a stronger solution like .htaccess.

since robots.txt is publicly available you can use it to lay a bot trap to identify the bad bots and competitors looking to reverse engineer your site.
here is the robots.txt file that i use:

User-agent: *
Disallow: /tracking/
Disallow: /system/
Disallow: /people-that-spy-on-robotstxt/
Disallow: /customerdata/

/customerdata/ is a fake folder. it helps me identify computers that i want to block from accessing my site. if anyone tries to peek into that folder, which is not referenced from anywhere but robots.txt, i don't want them on my site.
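One way to act on a trap like this is to scan the web server's access log for requests under the fake folder. The Python sketch below assumes log lines in the common Apache-style format with the request in double quotes; the function name `trap_hits`, the sample log lines, and the IP addresses (taken from the reserved documentation ranges) are all made up for illustration and are not the poster's actual setup.

```python
TRAP_PREFIX = "/customerdata/"  # the fake folder listed in robots.txt


def trap_hits(log_lines, trap_prefix=TRAP_PREFIX):
    """Return the set of client IPs that requested a path under the trap folder."""
    offenders = set()
    for line in log_lines:
        # Assumed layout: IP ... "METHOD /path PROTOCOL" status size
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request section, skip the line
        ip = parts[0].split(" ", 1)[0]
        request = parts[1].split()  # e.g. ['GET', '/customerdata/', 'HTTP/1.1']
        if ip and len(request) >= 2 and request[1].startswith(trap_prefix):
            offenders.add(ip)
    return offenders


# Hypothetical log lines, for illustration only.
sample_log = [
    '192.0.2.10 - - [02/Apr/2008:15:49:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.7 - - [02/Apr/2008:15:50:12 +0000] "GET /customerdata/ HTTP/1.1" 404 209',
    '203.0.113.7 - - [02/Apr/2008:15:50:15 +0000] "GET /customerdata/list.csv HTTP/1.1" 404 209',
]

print(trap_hits(sample_log))
```

The resulting set of addresses could then feed whatever blocking mechanism the site already uses.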

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved