Robot question

Forum Moderators: phranque

TemiTheOne

5:42 pm on May 29, 2003 (gmt 0)

Hello Gurus,
Could you please help me out with the following questions:

1. si6001.inktomisearch.com - - [28/May/2003:02:43:11 +0100] "GET /robots.txt HTTP/1.0" 404 1524 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]

Does this mean Inktomi could not find my robot.txt file? (I have a robot.txt file in the root folder of my site.)

2. Is there a URL where I can find the meaning of robot codes?

3. If there is a robot.txt file in the root of a site and a robot tag in an HTML file, does robot.txt override the robot tag?

Thank you in advance.

jdMorgan

5:54 pm on May 29, 2003 (gmt 0)

TemiTheOne,

1) Yes, there is a problem there. It's looking for robots.txt, not robot.txt as you stated.

2) A standard for Robots Exclusion [robotstxt.org]

3) Assuming that the robot obeys robots.txt, a robots.txt file in the site root will override the HTML tags if the page is excluded in robots.txt. Since the page is excluded, the robot will not fetch it, and therefore cannot "see" the HTML robots tag. If the page is not disallowed in robots.txt, the robot can then fetch the page and read the HTML robots tag.
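To make the order of checks in point 3 concrete, here is a minimal Python sketch of how a well-behaved robot might decide. It is an illustration only, not how any particular crawler is implemented. The `decide` helper, the `ExampleBot` agent name, the example.com URLs, and the sample pages are all made up for this example; only `urllib.robotparser` and `html.parser` come from the Python standard library.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def decide(robots_txt, agent, url, fetch_page):
    """Return (fetched, meta_directives) for a robot that obeys robots.txt.

    fetch_page is a hypothetical callable that returns the HTML for a URL.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Disallowed: the page is never requested, so its meta tag is never seen.
        return False, None
    parser = MetaRobotsParser()
    parser.feed(fetch_page(url))
    return True, parser.directives


# Hypothetical site: /private/ is disallowed, everything else is allowed.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""
PAGES = {
    "http://example.com/private/a.html":
        '<html><head><meta name="robots" content="noindex, nofollow"></head></html>',
    "http://example.com/public/b.html":
        '<html><head><meta name="robots" content="noindex"></head></html>',
}

if __name__ == "__main__":
    for page_url in PAGES:
        print(page_url, decide(ROBOTS_TXT, "ExampleBot", page_url, PAGES.__getitem__))
```

For the page under /private/ the fetch function is never called, so the meta tag on that page has no effect; for the public page the robot fetches it and only then reads the tag.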

Note that robots.txt is often ignored by "bad" robots, so there is no use in trying to block non-compliant robots with robots.txt; other means must be used to stop them.
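Coming back to point 1: the requested path and the status code can be read straight out of the log line quoted in the question. The sketch below assumes the common/combined access log layout (host, two placeholder fields, timestamp, request, status, size); the pattern and the `parse_log_line` name are just for illustration, and the user-agent field is left truncated exactly as it was posted.

```python
import re

# Assumed layout: host ident user [time] "METHOD path protocol" status size ...
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return the leading fields of an access log line as a dict, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# The line from the question, with the user agent truncated as posted.
LOG_LINE = (
    'si6001.inktomisearch.com - - [28/May/2003:02:43:11 +0100] '
    '"GET /robots.txt HTTP/1.0" 404 1524 "-" '
    '"Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]'
)

if __name__ == "__main__":
    entry = parse_log_line(LOG_LINE)
    # A 404 on /robots.txt means the server found no file with that exact name.
    print(entry["path"], entry["status"])
```

The robot asked for /robots.txt (plural) and the server answered 404, which fits a file that is actually named robot.txt.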

Jim

TemiTheOne

6:31 pm on May 29, 2003 (gmt 0)

Jim,
Thanks very much for your prompt reply; it's very helpful.

Temi
