robots.txt VS .htaccess

Is there any difference?

     
1:08 pm on May 16, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 13, 2003
posts:192
votes: 0



I need to create a couple of sites that must not be accessed by any search engine robot except one. Does anybody have any comments on which method would be best? Personally, I would prefer the robots.txt method because I already know how to do that.
Can I be 100% sure that it would work this way?
1:13 pm on May 16, 2003 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9685
votes: 0


Consider robots.txt a suggestion that will be ignored by rogue bots, and may be ignored even by normally benign bots.
1:17 pm on May 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 6, 2001
posts:2213
votes: 0


The robots.txt file tells well-behaved spiders what they should and should not index, so you probably need one of those. The key here is "well-behaved": a bad bot will ignore your robots.txt and go for it anyway. If you have problems with a spider that is misbehaving (looking in secret areas, overloading the server with downloads, etc.), you would use your .htaccess to block it.

So you need both.
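
For the robots.txt half, a minimal sketch of an "everyone out except one" file might look like the following. The crawler name Googlebot is only a stand-in for whichever robot you want to let in; it is an assumption, not something from the original question. As noted above, only well-behaved spiders will honor it.

```
# robots.txt, placed at the root of the site
# Assumption: "Googlebot" stands in for the one robot you want to allow.

# An empty Disallow means this robot may fetch everything
User-agent: Googlebot
Disallow:

# Every other robot is asked to stay out of the whole site
User-agent: *
Disallow: /
```

A robot follows the record that names it specifically and falls back to the `*` record only when no record does, so the order of the two records should not matter.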

Cheers

1:20 pm on May 16, 2003 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9685
votes: 0


Needinfo, the key word in your post is "must" - if keeping the content out of public search engines is important, then you should do this at the server level. I'd also add a ROBOTS NOINDEX meta tag to the pages in question, but this, too, is a suggestion to the bot.
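
For reference, the meta tag usually takes roughly this form, placed inside the `<head>` of each page you want kept out of the index. Like robots.txt, it only works on bots that choose to obey it.

```html
<!-- Asks compliant robots not to index this page -->
<meta name="robots" content="noindex">
```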
10:16 pm on May 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 1, 2002
posts:774
votes: 0


If you need to be 100% sure, your best method is to block at the server level with mod_rewrite, or something like Apache::Block_IP... then, .htaccess. Robots.txt will not ENSURE they do not come.
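
A rough sketch of the mod_rewrite approach in an .htaccess file, assuming mod_rewrite is enabled and your host allows overrides. The bot names BadBot and OtherBot and the 192.0.2.x address range are made-up placeholders, not real offenders; substitute whatever shows up in your own logs.

```apache
# .htaccess sketch: refuse requests from unwanted robots
RewriteEngine On

# Placeholder user-agent substrings, matched case-insensitively
RewriteCond %{HTTP_USER_AGENT} (BadBot|OtherBot) [NC,OR]
# Placeholder address range (192.0.2.x is reserved for documentation)
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
# Answer any matching request with 403 Forbidden
RewriteRule .* - [F]
```

Keep in mind that a user-agent string is easy to fake, so the IP condition is the harder one for a rogue bot to get around.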

dave