They will never trip a robots.txt trapdoor :( The only way I can think of is to keep an exclusive list of bots that should be allowed on the site, hide a link that is NOT protected by robots.txt, and evaluate who trips it.
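Here is a rough sketch of that idea in Python, just to make it concrete. The trap path, the allowlist token, and the shape of the log records are all made up for illustration; swap in whatever your own server logs actually give you.

```python
# Hypothetical trap URL: linked invisibly from your pages and deliberately
# NOT mentioned in robots.txt, so any crawler that follows links can hit it.
TRAP_PATH = "/hidden-trap.html"

# Placeholder allowlist of user-agent substrings (lowercase). "goodbot" is
# an invented name, not a real crawler.
ALLOWED_TOKENS = ("goodbot",)


def is_allowed(user_agent):
    """True if the user-agent contains one of the allowlisted tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in ALLOWED_TOKENS)


def review_trap_hits(hits):
    """Return sorted (ip, user_agent) pairs that tripped the trap and are
    not on the allowlist. `hits` is an iterable of (ip, user_agent, path)."""
    suspects = set()
    for ip, user_agent, path in hits:
        if path == TRAP_PATH and not is_allowed(user_agent):
            suspects.add((ip, user_agent))
    return sorted(suspects)
```

Anything that comes back from review_trap_hits is only a candidate for a closer look, not an automatic ban, since user-agent strings are trivial to fake.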
On that note, is there maybe a top 10 good bot list around here with IP + user-agent?
Would it be bad practice, or difficult, to allow only the following bots/spiders? (in no particular order)
InfoSeek
Excite
Fast/AllTheWeb
Alta Vista
Lycos
Inktomi
WiseNut
Ask Jeeves/Teoma
Northern Light
Alexa
Gigablast
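For what an "IP + user-agent" allowlist check might look like, here is a minimal Python sketch. The user-agent tokens are ones I believe these spiders have used (Slurp for Inktomi, Scooter for Alta Vista, ia_archiver for Alexa, Gigabot for Gigablast), but verify them against your own logs. The IP ranges are documentation placeholders, NOT the real crawler addresses.

```python
import ipaddress

# Assumed mapping: lowercase user-agent token -> networks that bot may
# crawl from. The ranges below are reserved documentation ranges used as
# placeholders; replace them with addresses confirmed from your own logs.
ALLOWED_BOTS = {
    "slurp": [ipaddress.ip_network("192.0.2.0/24")],
    "scooter": [ipaddress.ip_network("198.51.100.0/24")],
    "ia_archiver": [ipaddress.ip_network("192.0.2.0/24")],
    "gigabot": [ipaddress.ip_network("198.51.100.0/24")],
}


def bot_is_allowed(ip, user_agent):
    """Allow a request only if BOTH the user-agent token and the source IP
    match the same allowlist entry."""
    addr = ipaddress.ip_address(ip)
    ua = user_agent.lower()
    for token, networks in ALLOWED_BOTS.items():
        if token in ua and any(addr in net for net in networks):
            return True
    return False
```

Requiring both parts to match means a scraper that merely copies a good bot's user-agent string still fails the check, at the cost of having to keep the IP list current.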
A little bit of creative site searching pulled up this gem:
Spider Trap needs GOOD Spider list [webmasterworld.com].
Pendanticist.
Am I the only one who wishes Brett's site search was a little more powerful and flexible?
I often find myself using the Google toolbar's "search site" feature instead :(