Sitemaps, Meta Data, and robots.txt Forum

    
User agent in robots.txt and browscap.ini...
Why is the format different?
Pushycat




msg:1526513
 8:54 pm on May 29, 2002 (gmt 0)

Let's say I find this user agent in my Win2K/IIS website logs:
Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)

Further, after some research let's say I come to the conclusion this is probably Webinator or another of Thunderstone's products.

So now I want to add them to my robots.txt file for a little while to see if they'll respect it; otherwise I'll just ban their IP or domain.

What do I use for a user agent? Based on what I've seen of Brett's robots.txt file and the reading I've done about robots.txt, I don't think it's the entire user agent string above, as it would be in a browscap.ini file. Or is it? How do you determine the user agent for robots.txt unless, like some websites, there is a page about their robots including the user agent?

 

wilderness




msg:1526514
 12:01 am on May 30, 2002 (gmt 0)

I wouldn't waste your time trying to tune your robots.txt for T-H-U-N-D-E-R-S-T-O-N-E.

IMO the easiest and best method (at least for me) is:
deny from 64.208.
deny from 64.209.
deny from 64.210.
deny from 64.211.
deny from 64.212.
deny from 64.213.
deny from 64.214.
deny from 64.215.
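
To see how much address space a list like this covers, here is a minimal Python sketch. It assumes the lines above are Apache-style partial-IP deny rules, so each two-octet prefix stands for a whole /16 block. The helper names and the test address 64.210.5.9 are made up for illustration; 216.200.130.204 is the Ask Jeeves address from the log line quoted elsewhere in this thread.

```python
import ipaddress

# The eight two-octet prefixes from the deny list above.
DENY_PREFIXES = ["64.208.", "64.209.", "64.210.", "64.211.",
                 "64.212.", "64.213.", "64.214.", "64.215."]

def prefix_to_network(prefix: str):
    """Turn a partial address such as '64.208.' into the network it covers."""
    octets = prefix.rstrip(".").split(".")
    padded = octets + ["0"] * (4 - len(octets))
    # Two octets given means a 16-bit network prefix.
    return ipaddress.ip_network(f"{'.'.join(padded)}/{8 * len(octets)}")

networks = [prefix_to_network(p) for p in DENY_PREFIXES]

# Adjacent blocks merge into the smallest equivalent set of networks.
collapsed = list(ipaddress.collapse_addresses(networks))

def is_denied(address: str) -> bool:
    """Return True if the address falls inside any denied block."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in collapsed)

print(collapsed)
print(is_denied("64.210.5.9"))       # hypothetical address inside the list
print(is_denied("216.200.130.204"))  # the Ask Jeeves address from this thread
```

The eight blocks collapse into the single network 64.208.0.0/13, which is roughly half a million addresses, so this is a broad ban rather than a targeted one.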

Pushycat




msg:1526515
 5:12 pm on May 30, 2002 (gmt 0)

I appreciate your reply. And you're probably correct about this particular bot. Putting that aside for the moment, here's what I really want to know.

How do you determine the user agent for robots.txt unless, like some websites, there is a page about their robots including the user agent?

I read in the tutorial on SearchEngineWorld that I should look in my logs for GET requests for robots.txt and use the user agent shown there. But that doesn't hold up in all cases because, for example, "Googlebot-Image" is what's needed for the robots.txt file but that isn't what the actual user agent is in my logs.

wilderness




msg:1526516
 10:05 pm on May 30, 2002 (gmt 0)

Below is a log line. There are various types of logs presented by hosts and servers.

216.200.130.204 - - [30/May/2002:03:57:20 -0700] "GET /mysite/mypgae.htm HTTP/1.0" 200 19516 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"

There are 7 populated fields in this line (9 if you count the two "-" placeholders after the IP address).
The last field, contained in quotes (""), is the UA used by your visitor, or at least it is in most instances.

This will help you with some UAs:
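
If you would rather pull the UA out with a script than by eye, something like the following Python sketch would do it. It assumes the log is in the Apache-style combined format shown above; IIS logs are laid out differently and would need a different pattern. The regular expression and the function name are my own, not from any particular log tool.

```python
import re

# One named group per part of the line, counting the two "-" placeholders.
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

def parse_line(line: str) -> dict:
    """Split one combined-format log line into a dict of named fields."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        raise ValueError("line is not in the assumed combined log format")
    return match.groupdict()

# The log line quoted above, copied as-is.
sample = ('216.200.130.204 - - [30/May/2002:03:57:20 -0700] '
          '"GET /mysite/mypgae.htm HTTP/1.0" 200 19516 "-" '
          '"Mozilla/2.0 (compatible; Ask Jeeves)"')

fields = parse_line(sample)
print(fields["host"])
print(fields["request"])
print(fields["user_agent"])
```

Filtering on a request that contains /robots.txt and collecting the user_agent values would give a list of the robots that at least asked for the file.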
[jafsoft.com...]

This will help some more
[robotstxt.org...]

Try a search at Google on user agent

[google.netscape.com...]

Pushycat




msg:1526517
 11:01 pm on May 30, 2002 (gmt 0)

The user agent you cited, "Mozilla/2.0 (compatible; Ask Jeeves)", is a perfect example of what I'm trying to figure out about the differences between a user agent in browscap.ini and robots.txt.

In my browscap.ini file the user agent for Ask Jeeves is just what you cited above, "Mozilla/2.0 (compatible; Ask Jeeves)".

But if I wanted to disallow part of my site to Ask Jeeves in robots.txt it's my understanding I would simply use "Ask Jeeves" as the user agent instead of the full user agent name as found in my logs.

I don't understand why robots.txt does not seem to use the actual user agent as found in my logs and how one goes about determining what user agent name to use in robots.txt.

mbauser2




msg:1526518
 11:39 pm on May 30, 2002 (gmt 0)

I don't understand why robots.txt does not seem to use the actual user agent as found in my logs

Because you'd have to update robots.txt every time a new version of a robot was released. robots.txt is supposed to be a simple protocol based on cooperation and communication. Nicknames are simple and clear: if a bot doesn't have a nickname, we know its operators aren't cooperating or communicating.
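
To make the difference concrete, here is a rough Python sketch of the two kinds of lookup. The robots.txt side follows the original robots exclusion standard's recommendation that a robot do a case-insensitive substring match on its name without version information. The browscap side is simplified to an exact string comparison (real browscap.ini entries can also use wildcards). The function names and the "Mozilla/3.0" string are invented for illustration, and how any particular robot really matches is up to its authors.

```python
def robots_record_applies(robots_token: str, robot_user_agent: str) -> bool:
    """Liberal robots.txt-style match: '*' or a case-insensitive substring."""
    token = robots_token.strip().lower()
    if token == "*":
        return True
    return token != "" and token in robot_user_agent.lower()

def browscap_entry_applies(entry_name: str, logged_user_agent: str) -> bool:
    """Simplified browscap-style match: the whole string has to agree."""
    return entry_name == logged_user_agent

logged_ua = "Mozilla/2.0 (compatible; Ask Jeeves)"
# Hypothetical later release of the same robot, invented for this example.
newer_ua = "Mozilla/3.0 (compatible; Ask Jeeves)"

# The short nickname keeps matching when the version changes.
print(robots_record_applies("Ask Jeeves", logged_ua))
print(robots_record_applies("ask jeeves", newer_ua))

# The full string stops matching as soon as anything in it changes.
print(browscap_entry_applies(logged_ua, logged_ua))
print(browscap_entry_applies(logged_ua, newer_ua))
```

That is the trade-off: browscap.ini wants to identify exactly which client it is looking at, while robots.txt only needs a stable name that a robot can recognize as its own.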


and how one goes about determining what user agent name to use in robots.txt.

Research, guesswork, and the counsel of your peers.

The "Ask Jeeves" bot is coming from directhit.com, so it might be a replacement for DirectHit Grabber. Grabber's exclusion name was "grabber". Try that and see what happens.

Likewise, the exclusion name for Webinator is/was just "webinator". Give it a shot.
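
Before uploading the file you can check that the records say what you mean. This is a small sketch using Python's standard urllib.robotparser module. The robots.txt content and the paths are invented for illustration, and the names "webinator" and "grabber" are the guesses given above, not confirmed values. It only shows how Python's parser reads the file; whether the real robots honor it is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the guessed exclusion names from this thread.
ROBOTS_TXT = """\
User-agent: webinator
Disallow: /

User-agent: grabber
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Each call asks: may a robot with this name fetch this path?
print(parser.can_fetch("webinator", "/index.htm"))
print(parser.can_fetch("grabber", "/index.htm"))
print(parser.can_fetch("grabber", "/private/notes.htm"))
print(parser.can_fetch("SomeOtherBot", "/index.htm"))
print(parser.can_fetch("SomeOtherBot", "/cgi-bin/form.pl"))
```

With this file, "webinator" is shut out of everything, "grabber" is kept out of /private/ only, and any other robot falls through to the "*" record.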

If they don't work, all you can do is contact the bot owners and/or employ non-cooperative measures.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved