Let's say I find this user agent in my Win2K/IIS website logs: Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)
Further, after some research let's say I come to the conclusion this is probably Webinator or another of Thunderstone's products.
So now I want to add them to my robots.txt file for a little while to see if they'll respect it; otherwise I'll just ban their IP or domain.
What do I use for a user agent? Based on what I've seen of Brett's robots.txt file and the reading I've done about robots.txt, I don't think it's the entire user agent string above, as it would be in a browscap.ini file. Or is it? How do you determine the user agent for robots.txt unless, as with some websites, there is a page about their robots that gives the user agent?
I appreciate your reply. And you're probably correct about this particular bot. Putting that aside for the moment, here's what I really want to know.
How do you determine the user agent for robots.txt unless, as with some websites, there is a page about their robots that gives the user agent?
I read in the tutorial on SearchEngineWorld that I should look in my logs for GET requests for robots.txt and use the user agent shown there. But that doesn't hold up in all cases because, for example, "Googlebot-Image" is what's needed for the robots.txt file but that isn't what the actual user agent is in my logs.
I don't understand why robots.txt does not seem to use the actual user agent as found in my logs
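For reference, the log check the tutorial describes could be sketched roughly like this in Python. It assumes the IIS logs are in W3C extended format with a `#Fields:` header line and that spaces inside the user agent are written as `+`. The function name `robots_txt_agents` and the sample log lines (dates, IP addresses) are made up for illustration and are not taken from real logs.

```python
from collections import Counter

def robots_txt_agents(log_lines):
    """Count the user agents that requested /robots.txt.

    Assumes W3C extended log lines whose column layout is announced
    by a preceding '#Fields:' directive.
    """
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Remember the column names for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if row.get("cs-method", "GET").upper() != "GET":
            continue
        if row.get("cs-uri-stem", "").lower() == "/robots.txt":
            # Undo the '+' that stands in for spaces (a literal '+'
            # in a user agent would be changed too).
            agent = row.get("cs(User-Agent)", "-").replace("+", " ")
            counts[agent] += 1
    return counts

# Invented sample data, for illustration only.
sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)",
    "2002-05-01 10:00:00 192.0.2.10 GET /robots.txt 200 "
    "Mozilla/2.0+(compatible;+T-H-U-N-D-E-R-S-T-O-N-E)",
    "2002-05-01 10:00:05 192.0.2.11 GET /index.htm 200 "
    "Mozilla/4.0+(compatible;+MSIE+5.5)",
]

for agent, hits in robots_txt_agents(sample).most_common():
    print(hits, agent)
```

This only lists the full user agent strings of whatever fetched robots.txt; it does not reveal the short name a bot expects to see in the file, which is the gap described above.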
Because you'd have to update robots.txt every time a new version of a robot was released. robots.txt is supposed to be a simple protocol based on cooperation and communication. Nicknames are simple and clear: if a bot doesn't have a nickname, we know its operators aren't cooperating or communicating.
and how one goes about determining what user agent name to use in robots.txt.
Research, guesswork, and the counsel of your peers.
The "Ask Jeeves" bot is coming from directhit.com, so it might be a replacement for DirectHit Grabber. Grabber's exclusion name was "grabber". Try that and see what happens.
Likewise, the exclusion name for Webinator is/was just "webinator". Give it a shot.
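A rough way to check that the records say what you intend before uploading them is Python's standard `urllib.robotparser`. This is only a sketch: the `Disallow: /` rules and the catch-all `*` record are assumptions about what you would want, and the result shows how one parser reads the file, not whether either bot actually honours it.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt using the two exclusion names suggested above.
ROBOTS_TXT = """\
User-agent: webinator
Disallow: /

User-agent: grabber
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "webinator/1.0" uses a hypothetical version suffix; "SomeOtherBot"
# is a made-up name standing in for any other robot.
for name in ("webinator", "webinator/1.0", "grabber", "SomeOtherBot"):
    print(name, parser.can_fetch(name, "/index.htm"))
```

The parser matches on the short name and ignores anything after a slash, which is the point of nicknames: the record keeps working when the robot's version changes.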
If they don't work, all you can do is contact the bot owners and/or employ non-cooperative measures.