Forum Moderators: goodroi


Block all spiders, but only allow well known spiders?


foxfox

2:37 pm on Dec 2, 2006 (gmt 0)

10+ Year Member



Is it good practice to have a robots.txt that blocks all spiders in the first record, but then allows a few well-known spiders one by one afterward?

jdMorgan

3:19 pm on Dec 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, reverse the order: many spiders will obey the first record that matches their name or "*" -- whichever comes first. So list the good ones by name first, then deny the rest with a catch-all record.

Google and Yahoo seem to look for the 'best match' rather than the first match, so ordering is not a problem for them. But for other, less-sophisticated spiders, it's best to take a lowest-common-denominator approach.

Example:

User-agent: googlebot
Disallow: /cgi-bin

User-agent: slurp
Disallow: /cgi-bin

User-agent: msnbot
Disallow: /cgi-bin

User-agent: teoma
Disallow: /cgi-bin

User-agent: *
Disallow: /
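You can sanity-check how rules like these are interpreted with Python's standard urllib.robotparser. This is just a sketch -- it shows how one common parser resolves the example above, and real crawlers may implement matching differently:

```python
from urllib import robotparser

# The example robots.txt from above, trimmed to two records.
rules = """\
User-agent: googlebot
Disallow: /cgi-bin

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Named spider: only /cgi-bin is off-limits.
print(rp.can_fetch("googlebot", "https://example.com/page.html"))   # allowed
print(rp.can_fetch("googlebot", "https://example.com/cgi-bin/x"))   # blocked

# Unlisted spider falls through to the catch-all record and is blocked.
print(rp.can_fetch("somebot", "https://example.com/page.html"))
```

If the catch-all record were listed first, a first-match parser would stop there and block even the named spiders -- which is exactly why the named records must come first.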

Jim