
Apache Web Server Forum

    
robots.txt VS .htaccess
Is there any difference?
needinfo

10+ Year Member



 
Msg#: 201 posted 1:08 pm on May 16, 2003 (gmt 0)


I need to create a couple of sites that must be off-limits to every search engine robot except one. Does anybody have comments on which method would be best? I would personally prefer the robots.txt method because I already know how to do that.
Can I be 100% sure that it would work this way?

 

rogerd

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 201 posted 1:13 pm on May 16, 2003 (gmt 0)

Consider robots.txt a suggestion that will be ignored by rogue bots, and may be ignored even by normally benign bots.
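
To make the "suggestion" point concrete, here is a minimal sketch of a robots.txt that admits one robot and turns every other one away, checked with Python's standard urllib.robotparser. The name GoodBot is a placeholder, not a real crawler. The check only reports what a compliant robot would decide for itself; nothing in it stops a robot that never reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: "GoodBot" (a made-up name) may crawl everything,
# every other robot is asked to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: GoodBot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url="/index.html"):
    """Return what a robots.txt-compliant crawler would conclude for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GoodBot", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(agent) else "asked to stay out")
```

An empty Disallow line means "nothing is disallowed" for that robot, while "Disallow: /" covers the whole site. Both are requests, which is why the file alone cannot give a 100% guarantee.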

ukgimp

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 201 posted 1:17 pm on May 16, 2003 (gmt 0)

robots.txt tells well-behaved spiders which areas they should and should not crawl, so you probably need one. The key phrase here is "well-behaved": a bad bot will ignore your robots.txt and go for it anyway. If a spider misbehaves (poking around in private areas, overloading the server with downloads, etc.), use your .htaccess to block it.

So you need both.

Cheers

rogerd

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 201 posted 1:20 pm on May 16, 2003 (gmt 0)

Needinfo, the key word in your post is "must" - if keeping the content out of public search engines is important, then you should do this at the server level. I'd also add a ROBOTS NOINDEX meta tag to the pages in question, but this, too, is a suggestion to the bot.
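
For reference, the meta tag in question looks like the one in the sample page below. The Python sketch shows roughly how a cooperating robot might read it, using only the standard html.parser module; the page markup and the helper names are illustrative assumptions, and a robot that ignores the tag is unaffected by it.

```python
from html.parser import HTMLParser

# Illustrative page carrying the tag under discussion.
PAGE = (
    '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    "<body>private</body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(has_noindex(PAGE))
```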

carfac

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 201 posted 10:16 pm on May 17, 2003 (gmt 0)

If you need to be 100% sure, your best method is to block at the server level with mod_rewrite, or something like Apache::Block_IP... then, .htaccess. Robots.txt will not ENSURE they stay away.
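
As a rough sketch of the mod_rewrite approach, the Python helper below renders .htaccess rules that answer 403 Forbidden to any request whose User-Agent matches a listed name. The function name and the robot names BadBot and EvilScraper are made up for illustration. A robot can send any User-Agent string it likes, so this raises the bar well above robots.txt but is still not an absolute guarantee.

```python
import re
from typing import Sequence


def build_htaccess_block(blocked_agents: Sequence[str]) -> str:
    """Render mod_rewrite rules that return 403 to matching User-Agents."""
    if not blocked_agents:
        raise ValueError("need at least one User-Agent name")
    lines = ["RewriteEngine On"]
    last = len(blocked_agents) - 1
    for index, agent in enumerate(blocked_agents):
        # Escape regex metacharacters so the name is matched literally.
        pattern = re.escape(agent)
        # NC = case-insensitive match; OR chains every condition but the last.
        flags = "[NC,OR]" if index < last else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {pattern} {flags}")
    # "-" leaves the URL untouched; F sends 403 Forbidden.
    lines.append("RewriteRule ^ - [F]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical robot names. The rendered text is:
    #   RewriteEngine On
    #   RewriteCond %{HTTP_USER_AGENT} BadBot [NC,OR]
    #   RewriteCond %{HTTP_USER_AGENT} EvilScraper [NC]
    #   RewriteRule ^ - [F]
    print(build_htaccess_block(["BadBot", "EvilScraper"]), end="")
```

The output can be pasted into the site's .htaccess; mod_rewrite has to be enabled and .htaccess overrides permitted in the main server configuration for the rules to take effect.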

dave

