Forum Moderators: goodroi
The simplest robots.txt is:
User-agent: *
Disallow:
I would just like to use something like "index" or "archive all pages", but I can find no valid positive values for this file; all the standards I have read deal only with exclusion.
I simply wish to return a robots.txt that tells all bots, especially Googlebot, that they are welcome to spider my entire site, in preference to returning my custom 404 page.
[edited by: Woz at 3:06 am (utc) on Oct. 3, 2004]
[edit reason] No URLs please, see TOS#13 [/edit]
User-agent: *
Disallow:
While you say you don't want to exclude anything, it is a good idea to exclude a load of bots that will come along and cause pain. You can use this site's robots.txt: delete any bots you do want to crawl, and delete the end section, which is site-specific.
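As a sketch of that idea (the bot names below are just placeholders for whatever misbehaving crawlers you want to shut out, not a recommended list):

User-agent: BadBot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: *
Disallow:

A robot obeys the first record whose User-agent line matches its own name, so the named bots get the full Disallow: / while everything else falls through to the open record at the bottom.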
You are right, there are no positive values such as "Allow". Here are two ways to get around it.
To exclude all files except one
The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/docs/
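With that arrangement (the paths here are just for illustration), the layout would be something like:

/~joe/index.html        <- still crawlable
/~joe/docs/private.html <- blocked by the rule above
/~joe/docs/foo.html     <- blocked by the rule above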
Alternatively, you can explicitly list every page to be disallowed:
User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
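One thing to keep in mind with this approach: Disallow values are simple path prefixes, so a line like

Disallow: /~joe/foo

would block /~joe/foo.html and /~joe/foobar.html alike. List the full filename if you only mean one page.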
User-agent: *
Disallow:
If you wanted to allow all filetypes to be served ... the robots.txt ... would be:
User-Agent: *
Allow: /
[google.com...]
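Google also understands mixing Allow with Disallow in one record (a Google extension, not part of the original standard; the paths below are hypothetical), which lets you open up a single file inside an otherwise blocked directory:

User-agent: Googlebot
Disallow: /docs/
Allow: /docs/public.html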
In case such things happen elsewhere ... better to have it than to have not.