1. si6001.inktomisearch.com - - [28/May/2003:02:43:11 +0100] "GET /robots.txt HTTP/1.0" 404 1524 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
Does this mean Inktomi could not find my robot.txt file? (I have a robot.txt file in the root folder of my site.)
2. Is there a URL where I can find the meaning of robot codes?
3. If there is a robot.txt file in the root of a site and a robots tag in an HTML file, does robot.txt override the robots tag?
Thank you in advance
1) Yes, there is a problem there. It's looking for robots.txt, not robot.txt as you stated.
2) A standard for Robots Exclusion [robotstxt.org]
3) Assuming that the robot obeys, a robots.txt file in the site root will override the html tags if the page is excluded in robots.txt. Since the page is excluded, the robot will not fetch it, and therefore cannot "see" the html robots tag. If the page is not disallowed in robots.txt, the robot can then fetch the page and read the html robots tag.
Note that robots.txt is often ignored by "bad" robots, so there is no use in trying to block non-compliant robots with robots.txt. Other means must be used to stop them.
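For anyone who wants to see the order of checks from point 3 spelled out, here is a rough sketch in Python using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs, the "ExampleBot" agent name, and the function name compliant_robot_decision are all made up for illustration. A real robot may well behave differently, and a non-compliant one skips the first check altogether.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def compliant_robot_decision(robots_txt: str, agent: str, url: str,
                             meta_robots: Optional[str]) -> str:
    """Mimic the order of checks a well-behaved robot would follow.

    meta_robots stands in for the content of the page's html robots tag
    (e.g. "noindex,nofollow"); it is only consulted if the page may be fetched.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Step 1: robots.txt is checked first. If the URL is disallowed,
    # the page is never fetched, so the html robots tag is never seen.
    if not parser.can_fetch(agent, url):
        return "not fetched"

    # Step 2: the page was fetched, so the html robots tag can be read.
    if meta_robots and "noindex" in meta_robots.lower():
        return "fetched, not indexed"
    return "fetched, indexed"


if __name__ == "__main__":
    agent = "ExampleBot"  # hypothetical robot name
    # Disallowed in robots.txt: the tag on the page is irrelevant.
    print(compliant_robot_decision(
        ROBOTS_TXT, agent, "http://example.com/private/a.html", "index,follow"))
    # Allowed in robots.txt: the tag decides.
    print(compliant_robot_decision(
        ROBOTS_TXT, agent, "http://example.com/public/b.html", "noindex"))
```

Note that the function only models a robot that chooses to obey. Nothing here stops a robot that ignores robots.txt.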
Jim