| 12:01 am on May 30, 2002 (gmt 0)|
I wouldn't waste your time trying to tune your robots for
IMO the easiest and best method (at least for me) is:
deny from 64.208.
deny from 64.209.
deny from 64.210.
deny from 64.211.
deny from 64.212.
deny from 64.213.
deny from 64.214.
deny from 64.215.
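For anyone who wants to double-check which addresses those eight lines cover, here is a minimal Python sketch. It assumes the prefixes are meant to block everything from 64.208.x.x through 64.215.x.x, which happens to be the single CIDR block 64.208.0.0/13. The function names are made up for illustration.

```python
import ipaddress

# Second octets covered by the eight "deny from" prefixes: 64.208. to 64.215.
DENIED_SECOND_OCTETS = range(208, 216)

# Assumption: the same range expressed as one CIDR block.
BLOCK = ipaddress.ip_network("64.208.0.0/13")


def is_denied_by_prefix(ip: str) -> bool:
    """Return True if ip begins with one of the denied 64.x. prefixes."""
    first, second, *_ = ip.split(".")
    return first == "64" and int(second) in DENIED_SECOND_OCTETS


def is_in_block(ip: str) -> bool:
    """Return True if ip falls inside 64.208.0.0/13."""
    return ipaddress.ip_address(ip) in BLOCK


if __name__ == "__main__":
    for candidate in ("64.208.0.1", "64.215.255.255", "64.216.0.1"):
        print(candidate, is_denied_by_prefix(candidate), is_in_block(candidate))
```

Both checks should agree for any address, so the prefix list and the CIDR form describe the same range.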
| 5:12 pm on May 30, 2002 (gmt 0)|
I appreciate your reply. And you're probably correct about this particular bot. Putting that aside for the moment, here's what I really want to know.
How do you determine the user agent to use in robots.txt unless, as on some websites, there is a page about their robot that states the user agent?
I read in the tutorial on SearchEngineWorld that I should look in my logs for GETs of robots.txt and use the user agent shown there. But that doesn't hold up in all cases because, for example, "Googlebot-Image" is what's needed in the robots.txt file, yet that isn't the actual user agent that appears in my logs.
| 10:05 pm on May 30, 2002 (gmt 0)|
Below is a log line. There are various types of logs presented by hosts and servers.
18.104.22.168 - - [30/May/2002:03:57:20 -0700] "GET /mysite/mypgae.htm HTTP/1.0" 200 19516 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
There are nine fields in this line, counting the two "-" placeholders.
The last field, enclosed in double quotes, is the UA used by your visitor, or at least it is in most instances.
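If you'd rather pull the UA out of the logs with a script than by eye, something like the following Python sketch would do it. It assumes your log uses the same layout as the line above (host, two placeholder fields, timestamp, request, status, size, referer, user agent); other log formats would need a different pattern. The names LOG_PATTERN and user_agent are my own.

```python
import re

# Assumed layout, matching the sample line above:
# host ident user [time] "request" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def user_agent(line: str):
    """Return the quoted user-agent field, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.group("agent") if match else None


SAMPLE = (
    '18.104.22.168 - - [30/May/2002:03:57:20 -0700] '
    '"GET /mysite/mypgae.htm HTTP/1.0" 200 19516 "-" '
    '"Mozilla/2.0 (compatible; Ask Jeeves)"'
)

if __name__ == "__main__":
    print(user_agent(SAMPLE))
```

Filtering on requests for /robots.txt first would narrow the output to visitors that at least look at the file.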
This will help you with some UA's
This will help some more
Try a search at Google on user agent
| 11:01 pm on May 30, 2002 (gmt 0)|
The user agent you cited, "Mozilla/2.0 (compatible; Ask Jeeves)", is a perfect example of what I'm trying to figure out about the differences between a user agent in browscap.ini and robots.txt.
In my browscap.ini file the user agent for Ask Jeeves is just what you cited above, "Mozilla/2.0 (compatible; Ask Jeeves)".
But if I wanted to disallow part of my site to Ask Jeeves in robots.txt it's my understanding I would simply use "Ask Jeeves" as the user agent instead of the full user agent name as found in my logs.
I don't understand why robots.txt does not seem to use the actual user agent as found in my logs, or how one goes about determining what user agent name to use in robots.txt.
| 11:39 pm on May 30, 2002 (gmt 0)|
|I don't understand why robots.txt does not seem to use the actual user agent as found in my logs |
Because you'd have to update robots.txt every time a new version of a robot was released. robots.txt is supposed to be a simple protocol based on cooperation and communication. Nicknames are simple and clear: if a bot doesn't have a nickname, we know its operators aren't cooperating or communicating.
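To illustrate why a short nickname survives version changes, here is a rough Python sketch. The original robots exclusion standard recommends a case-insensitive substring match of the name without version information; whether any particular bot actually matches this way is up to its operators, so treat this as an assumption. The second UA string is a made-up later version, and nickname_matches is a hypothetical helper.

```python
def nickname_matches(robots_name: str, full_user_agent: str) -> bool:
    """Case-insensitive substring test of a robots.txt name against a full UA."""
    return robots_name.strip().lower() in full_user_agent.lower()


LOGGED_AGENTS = [
    "Mozilla/2.0 (compatible; Ask Jeeves)",   # as seen in the log line above
    "Mozilla/3.0 (compatible; Ask Jeeves)",   # hypothetical later version
]

if __name__ == "__main__":
    for agent in LOGGED_AGENTS:
        # The nickname keeps matching even though the version prefix changed.
        print(agent, nickname_matches("Ask Jeeves", agent))
        # The full original string only matches the exact version it names.
        print(agent, nickname_matches(LOGGED_AGENTS[0], agent))
```

A name taken verbatim from the log stops matching as soon as the version changes, while the nickname keeps working.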
|and how one goes about determining what user agent name to use in robots.txt.|
Research, guesswork, and the counsel of your peers.
The "Ask Jeeves" bot is coming from directhit.com, so it might be a replacement for DirectHit Grabber. Grabber's exclusion name was "grabber". Try that and see what happens.
Likewise, the exclusion name for Webinator is/was just "webinator". Give it a shot.
If they don't work, all you can do is contact the bot owners and/or employ non-cooperative measures.