Forum Moderators: goodroi

robots4.txt -- I'm confused

Confused about what is allowed in this file

         

Goose_68

5:15 pm on May 24, 2005 (gmt 0)

10+ Year Member



Hello all,

This is my first post as a new user, so bear with me. I imagine this has been answered before, but I've searched around here and on Google and can't seem to find an answer to my question.

I was looking at the suggested robots4.txt file and its list of 'nice-guy' spiders. I saw the other thread from the guy who thought it was blocking Googlebot, but if 'Disallow: ' without any parameters means allow all, are programs like 'EmailSiphon' really good bots? I'm not familiar with most of the bots on this list, so I'm leery about putting this in place on my site while items like this are allowed. Can someone assuage my fears?

Thanks much,
Goose_68
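
For anyone wanting to check the 'Disallow: ' reading from the post above, here is a minimal sketch using Python's standard urllib.robotparser. The rules in ROBOTS_TXT are hypothetical and are not the actual contents of the suggested file, and is_allowed is just an illustrative helper name. It only shows how a parser that follows the convention treats an empty Disallow line, not how any particular bot behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the contents of the
# suggested robots4.txt file discussed in this thread.
ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # An empty "Disallow:" places no restriction on the named agent.
    print(is_allowed(ROBOTS_TXT, "EmailSiphon", "/index.html"))
    # Any other agent falls through to the catch-all "Disallow: /" record.
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "/index.html"))
```

With rules shaped like this, a bot listed with an empty Disallow line is permitted everywhere, which is why the presence of a harvester on such a list looks suspicious.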

ThomasB

8:39 pm on May 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Goose_68, first of all welcome to WebmasterWorld!

I guess you mean the robots.txt file in your post. When talking about nice and bad bots, it's important to know that most bad bots (email/content harvesters ...) don't obey robots.txt anyway. If you want to block them, you should look into other ways to keep them out of your site (.htaccess, for example).
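
To illustrate the idea of blocking on the server instead of relying on robots.txt, here is a rough sketch in Python. It is a stand-in for the .htaccess approach mentioned above, not an .htaccess rule itself. The names BLOCKED_AGENT_SUBSTRINGS, is_blocked and block_bad_bots are made up for this example, and the deny list is an assumption. Keep in mind that a User-Agent header can be faked, so this is only a first screen.

```python
# Hypothetical deny list; the entries are assumptions for illustration.
BLOCKED_AGENT_SUBSTRINGS = ("emailsiphon",)


def is_blocked(user_agent: str, blocked=BLOCKED_AGENT_SUBSTRINGS) -> bool:
    """Return True if the User-Agent contains any blocked substring."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked)


def block_bad_bots(app):
    """Wrap a WSGI app so blocked user agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Unlike robots.txt, a check like this is enforced by the server, so it does not depend on the bot choosing to cooperate.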

Goose_68

8:46 pm on May 24, 2005 (gmt 0)

10+ Year Member



Thanks ThomasB,

Yes, I realize it has limited use for keeping bad bots out, and I am reading up on bad-bot traps, honeypots and such. I will definitely do those too, but do you typically put this in place anyway as a minimal first screen?

I was mainly double-checking that items on this list are really good bots... something like 'EmailSiphon', which is on this list, makes me wonder:
[searchengineworld.com...]

Goose