A suggestion for a new warning message: "Duplicate partial URLs detected; perhaps a newline is missing between record groups?" (or something like that).
An example file that should display this warning: http://www.mnot.net/robots.txt
The problem is that the file in question, according to the specification, should be interpreted as only two groups, one for "*" and one for "ia_archiver", but it is evident that the author intended otherwise.
The above warning should be triggered by duplicate occurrences of the same partial URL within one group.
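A minimal sketch of how such a check might look, assuming groups are separated by blank lines and that comment-only lines do not end a group. The function name `find_duplicate_paths` and the sample file are hypothetical, made up for illustration; the sample is not the content of the file linked above.

```python
def find_duplicate_paths(robots_txt):
    """Return (line_number, path) for each Disallow path repeated within one group.

    Assumption: a blank line ends the current group; comment-only lines do not.
    """
    warnings = []
    seen = set()  # Disallow paths seen so far in the current group
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            seen.clear()  # blank line: a new group starts
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep:
            continue  # comment-only or malformed line
        if field.strip().lower() == "disallow":
            path = value.strip()
            if path in seen:
                warnings.append((lineno, path))
            seen.add(path)
    return warnings


# Hypothetical sample: the blank line between the two groups is missing,
# so both Disallow lines fall into a single group.
sample = (
    "User-agent: *\n"
    "Disallow: /tmp/\n"
    "User-agent: ia_archiver\n"
    "Disallow: /tmp/\n"
)
print(find_duplicate_paths(sample))
```

With the sample above, the second "Disallow: /tmp/" on line 4 is reported; adding a blank line before the second User-agent line makes the warning go away.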
Another option would be to flag alternating lines of User-Agent and Disallow in one group, but that may be a personal style issue.
Regards,
Morten Frederiksen
Well-spotted - That robots.txt is a disaster!
Another problem I've seen is that some 'bots require a newline at the END of the file - A member here reported this problem a while ago (maybe last year).
To simplify parsing, the script could simply check for a required blank line between any occurrence of "User-agent:" and a preceding "Disallow:". This eliminates the need to do any partial URL-matching, and would catch most errors like the ones in the page you cited.
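A rough sketch of that simpler check, again assuming that only a truly blank line separates groups and that comment-only lines are ignored. The name `find_missing_blank_lines` and the sample text are hypothetical and only there for illustration.

```python
def find_missing_blank_lines(robots_txt):
    """Return line numbers of User-agent lines that directly follow a Disallow line.

    Assumption: a blank line is required between a Disallow line and the
    next User-agent line; comment-only lines in between are skipped.
    """
    flagged = []
    previous_field = None  # last field name seen since the last blank line
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            previous_field = None  # blank line: group boundary is present
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, _ = line.partition(":")
        if not sep:
            continue  # comment-only or malformed line
        field = field.strip().lower()
        if field == "user-agent" and previous_field == "disallow":
            flagged.append(lineno)
        previous_field = field
    return flagged


# Hypothetical sample with the blank line between groups left out.
missing = (
    "User-agent: *\n"
    "Disallow: /tmp/\n"
    "User-agent: ia_archiver\n"
    "Disallow: /\n"
)
print(find_missing_blank_lines(missing))
```

For this sample the User-agent line on line 3 is flagged. No path comparison is needed, so the check also catches files where the merged groups happen to disallow different paths.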
Jim