Forum Moderators: open
It's my understanding that robots.txt is used to stop some bots from crawling all or part of your site. Since I want everything to be crawled I have never made one, and it hasn't done my sites any harm at all.
I'd like to hear other reasons for needing one, i.e. whether it actually helps to get a site crawled.
1. The first thing a robot requests is indeed that file, and if it does not find it you will see a 404 message in your logs.
2. Badly configured web servers sometimes send a 403 code (access forbidden) when there is no robots.txt file instead of the correct 404 (file not found). The standard "suggests" (it does not require) that a site not be visited under those conditions, so you might be accidentally keeping robots out.
About a year ago Google decided to treat 403 like 404 and spider the site anyway.
All in all I see no good reason for telling a robot that it can do what it planned to do. The robots.txt file is designed to keep robots out :)
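For what it's worth, the status-code behaviour described above can be sketched in a few lines. This is just an illustration of the convention (403 = assume forbidden, other 4xx = assume everything allowed), not any particular spider's actual code:

```python
# Sketch of how a polite crawler might map the HTTP status of a
# /robots.txt request to crawl permissions, per the convention above.
def permissions_for_status(status: int) -> str:
    if status in (401, 403):
        return "deny all"       # access to robots.txt forbidden: stay out
    if 400 <= status < 500:
        return "allow all"      # no robots.txt: nothing is excluded
    return "parse the file"     # 2xx: apply whatever rules it contains

print(permissions_for_status(403))  # deny all
print(permissions_for_status(404))  # allow all
```

Note that a server sending 403 instead of 404 for a missing file would, under this mapping, keep a well-behaved robot out entirely.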
just my observations!
also tested with three other websites!
humpingdan-
> Can somebody tell me what that is as I might as well put one up.
robots.txt to allow all robots to access all files:
# Robots exclusion file for widgets.com (comment lines start with "#" character)
User-agent: *
Disallow:
Ref: A Standard for Robot Exclusion [robotstxt.org]
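If you want to sanity-check that file, Python's standard-library urllib.robotparser will parse it directly. A quick sketch (the bot name and widgets.com URL are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# Parse the allow-all robots.txt shown above, fed in as lines.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",  # empty Disallow = nothing is off-limits
])

# Any bot may fetch any path.
print(rp.can_fetch("Googlebot", "http://widgets.com/anything.html"))  # True
```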
Jim
Based on experience across a number of domains, I've noticed that I'm only getting listed once a robots.txt is in place!
Some sites are configured to not return an error page but redirect to the home page when a nonexistent page is requested (not a good idea IMO but apparently done by some webmasters in order to retain traffic from broken links).
On these servers a request to /robots.txt is answered with a redirect to an HTML page. What the spider's robots.txt parser makes of this unexpected input is left to the spider's implementation - it might default to 'everything allowed' or to 'everything denied', or if badly programmed just crash.
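To illustrate one of those outcomes, here is roughly what happens when a parser is handed HTML instead of directives, using Python's urllib.robotparser as one concrete implementation (the URL is made up); this particular parser happens to fall back to 'everything allowed':

```python
from urllib.robotparser import RobotFileParser

# Simulate a server that redirects /robots.txt to its HTML home page:
# the parser receives markup instead of robots.txt records.
rp = RobotFileParser()
rp.parse([
    "<html><body>",
    "<h1>Welcome to widgets.com</h1>",
    "</body></html>",
])

# No User-agent/Disallow records were found, so this implementation
# defaults to allowing everything.
print(rp.can_fetch("Googlebot", "http://widgets.com/private/page.html"))  # True
```

A different parser could just as legitimately treat the garbage as 'everything denied', which is exactly why relying on this behaviour is risky.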