First things first, unless you want your e-mail flooded... :)
What is the correct User-agent string for use in robots.txt?
And where is this information available on the AJ site?
How about Teoma? Is the User-agent "Teoma-agent" retired now?
I too have had disallowed pages crawled, but I haven't yet assumed that I've got everything right on my end. I strongly suggest you add pages to your AJ and Teoma Web sites that cover the robots exclusion protocol and list the User-agent names your crawlers obey. If such pages exist, I certainly can't find them. Webmasters cannot assume that the UA string used for robots exclusion is the same as the UA string in their access logs. Too many search providers have broken that rule for it to count as a rule, and it demonstrably doesn't hold for AJ.
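To illustrate why the logged UA string is not a safe guide, here is a minimal sketch using Python's standard `urllib.robotparser`. It is only one parser's interpretation, not Ask Jeeves' or Teoma's crawler. That crawler's actual matching rules are exactly what is unknown here. The robots.txt content and the example.com URL are hypothetical. The agent strings are the ones mentioned in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group aimed at a crawler token, plus a catch-all group.
ROBOTS_TXT = """\
User-agent: Teoma
Disallow: /private/

User-agent: *
Disallow:
"""

def check(agent_strings, url="http://www.example.com/private/page.html"):
    """Return {agent string: allowed?} as decided by Python's stdlib parser."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the file as read,
    # so can_fetch() can be called without fetching anything over the network.
    parser.parse(ROBOTS_TXT.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agent_strings}

if __name__ == "__main__":
    for agent, allowed in check(["Teoma", "Teoma-agent", "Ask Jeeves/Teoma"]).items():
        print(f"{agent!r}: {'allowed' if allowed else 'disallowed'}")
```

With this parser, "Teoma" and "Teoma-agent" both hit the `Disallow: /private/` group. "Ask Jeeves/Teoma" does not. The stdlib parser keeps only the text before the first "/" (here "ask jeeves") and checks whether the group name appears inside it, so that string falls through to the catch-all group and is allowed. A crawler that derives its robots.txt token differently from its logged UA string can slip past a group the webmaster thought applied to it.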
The only way I've been able to keep "Ask Jeeves/Teoma" from getting caught in my bad-bot traps has been to feed that User-agent a blank link-to-home page instead of disallowed content. It adds complication to my sites, it is technically cloaking, it slows down the servers for all requests, and I don't like doing it.
For User-agent strings in robots.txt, I've tried:
I've had no problems with other legitimate robots, and my robots.txt files validate with every validation tool I can find. However, they are non-trivial files, because search engines treat robots.txt disallows, on-page meta-robots exclusions, and combinations of the two differently. So the problem may well be on my end, and the UA string is the prime suspect.
Please publish this info!
Thanks for your time,