Having to choose between allowing and denying a robot access outright seems restrictive. For example, a site owner may want to be indexed by, say, the MSN search facility, but limit its crawler to no more than 1000 HTTP requests a week, 10 MB a day, or 500 files a week. Presumably the robots.txt syntax could be extended to allow entries like:
User-agent: msnbot
Disallow: more_than 1000 HTTP per week
Disallow: more_than 10 MB per day
Disallow: more_than 500 files per week
This would give webmasters finer-grained control over the load that particular crawlers place on their sites.
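To make the idea concrete, here is a rough Python sketch of how a cooperating crawler might read such entries and check its own usage against them. Everything in it is an assumption for illustration: the `more_than N UNIT per PERIOD` grammar is only the proposal above and is not part of the robots.txt standard, and the names `Quota`, `parse_quotas` and `exceeded_quotas` are made up. Units are matched case-insensitively and stored in lower case.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar for the proposed extension (not standard robots.txt):
#   Disallow: more_than <number> <unit> per <day|week>
QUOTA_RE = re.compile(
    r"^Disallow:\s*more_than\s+(\d+)\s+(\w+)\s+per\s+(day|week)\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Quota:
    limit: int
    unit: str    # lower-cased unit token, e.g. "http", "mb", "files"
    period: str  # "day" or "week"


def parse_quotas(robots_txt: str, agent: str) -> list[Quota]:
    """Collect quota rules from records whose User-agent matches `agent`."""
    quotas = []
    in_record = False
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if line.lower().startswith("user-agent:"):
            # A new record starts; note whether it applies to this crawler.
            name = line.split(":", 1)[1].strip()
            in_record = name.lower() == agent.lower()
            continue
        match = QUOTA_RE.match(line)
        if in_record and match:
            limit, unit, period = match.groups()
            quotas.append(Quota(int(limit), unit.lower(), period.lower()))
    return quotas


def exceeded_quotas(quotas: list[Quota], usage: dict) -> list[Quota]:
    """Return the quotas whose limit has been passed.

    `usage` maps (unit, period) to the amount consumed so far in the
    current period; missing keys count as zero.
    """
    return [q for q in quotas if usage.get((q.unit, q.period), 0) > q.limit]
```

Given the msnbot record above, `parse_quotas` would return three `Quota` entries, and a crawler could call `exceeded_quotas` with its running totals before each fetch and back off when the list is non-empty. Like the rest of robots.txt, this would only work for crawlers that choose to honour it; the site itself would still need server-side throttling to enforce anything.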