Further, after some research let's say I come to the conclusion this is probably Webinator or another of Thunderstone's products.
So now I want to add them to my robots.txt file for a little while to see if they'll respect it; otherwise I'll just ban their IP or domain.
What do I use for a user agent? Based on what I've seen of Brett's robots.txt file and the reading I've done about robots.txt, I don't think it's the whole entire user agent above like it would be in a browscap.ini file. Or is it? How do you determine the user agent for robots.txt unless, like some websites, there is a page about their robots including the user agent?
I read in the tutorial on SearchEngineWorld that I should look in my logs for GET requests for robots.txt and use the user agent shown there. But that doesn't hold up in all cases because, for example, "Googlebot-Image" is what's needed for the robots.txt file, but that isn't what the actual user agent is in my logs.
18.104.22.168 - - [30/May/2002:03:57:20 -0700] "GET /mysite/mypgae.htm HTTP/1.0" 200 19516 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
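There are nine fields in each line of this log: host, identity, user, timestamp, request, status, bytes sent, referrer, and user agent.
The last field, contained in quotes, is the UA used by your visitor, or at least it is in most instances.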
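For anyone who wants to pull that last field out of a log automatically, here is a rough Python sketch. It assumes the log is in the Apache "combined" format, as the sample line above appears to be; the pattern name LOG_LINE and the function user_agent are made up for this example, and other log formats would need a different pattern.

```python
import re

# Pattern for one line of the Apache "combined" log format. This is an
# assumption about the server configuration; adjust it for other formats.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def user_agent(line):
    """Return the user agent field of a log line, or None if it does not parse."""
    match = LOG_LINE.match(line.strip())
    return match.group("agent") if match else None


# The sample line quoted in the thread, split only to keep the source readable.
sample = ('18.104.22.168 - - [30/May/2002:03:57:20 -0700] '
          '"GET /mysite/mypgae.htm HTTP/1.0" 200 19516 "-" '
          '"Mozilla/2.0 (compatible; Ask Jeeves)"')

print(user_agent(sample))  # Mozilla/2.0 (compatible; Ask Jeeves)
```

Matching on the "request" group instead (for example, requests that start with "GET /robots.txt") would narrow the list down to visitors that actually asked for robots.txt.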
This will help you with some UA's
This will help some more
Try a search at Google on user agent
In my browscap.ini file the user agent for Ask Jeeves is just what you cited above, "Mozilla/2.0 (compatible; Ask Jeeves)".
But if I wanted to disallow part of my site to Ask Jeeves in robots.txt it's my understanding I would simply use "Ask Jeeves" as the user agent instead of the full user agent name as found in my logs.
I don't understand why robots.txt does not seem to use the actual user agent as found in my logs and how one goes about determining what user agent name to use in robots.txt.
I don't understand why robots.txt does not seem to use the actual user agent as found in my logs
Because you'd have to update robots.txt every time a new version of a robot was released. robots.txt is supposed to be a simple protocol based on cooperation and communication. Nicknames are simple and clear: if a bot doesn't have a nickname, we know its operators aren't cooperating or communicating.
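To make the nickname idea concrete: the original robots exclusion standard recommends that a robot do a case-insensitive substring match of the name, without version information. The Python sketch below shows that kind of match. The function name rule_applies is invented for this example, and it only illustrates the recommended matching; each robot decides for itself how to compare names, so it says nothing about which name a particular bot actually honors.

```python
def rule_applies(robots_name, full_user_agent):
    """Case-insensitive substring match of a robots.txt name against a full UA.

    This mirrors the matching recommended by the original robots exclusion
    standard; real robots may implement something different.
    """
    if robots_name == "*":
        return True  # the wildcard record applies to every robot
    return robots_name.lower() in full_user_agent.lower()


# Full user agent string as it appears in the access log.
logged = "Mozilla/2.0 (compatible; Ask Jeeves)"

print(rule_applies("Ask Jeeves", logged))  # True
print(rule_applies("ask jeeves", logged))  # True
print(rule_applies("webinator", logged))   # False
```

Under this kind of match the short name keeps working when the version number in the full string changes, which is the point of using a nickname.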
and how one goes about determining what user agent name to use in robots.txt.
Research, guesswork, and the counsel of your peers.
The "Ask Jeeves" bot is coming from directhit.com, so it might be a replacement for DirectHit Grabber. Grabber's exclusion name was "grabber". Try that and see what happens.
Likewise, the exclusion name for Webinator is/was just "webinator". Give it a shot.
If they don't work, all you can do is contact the bot owners and/or employ non-cooperative measures.
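Before waiting to see what the bot does, it can help to check that the record you wrote says what you meant. The sketch below feeds a robots.txt record to Python's standard urllib.robotparser module. The path "/private/" is a placeholder, and "webinator" is only the exclusion name suggested above, not a confirmed value. The check shows how Python's parser reads the file; it cannot tell you whether the real bot will obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/private/" is a placeholder path, and "webinator"
# is the exclusion name suggested in the thread, not a confirmed value.
ROBOTS_TXT = """\
User-agent: webinator
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named robot is kept out of the disallowed path only.
print(parser.can_fetch("webinator", "/private/page.htm"))     # False
print(parser.can_fetch("webinator", "/index.htm"))            # True

# A robot with a different name is not affected by this record.
print(parser.can_fetch("SomeOtherBot", "/private/page.htm"))  # True
```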