
Forum Moderators: goodroi

robots.txt validator warning suggestion

Missing newlines can cause problems, but in some cases can be detected.

   
12:02 pm on Apr 25, 2003 (gmt 0)

10+ Year Member


Hi there,

A suggestion for a new warning message: "Duplicate partial URLs detected; perhaps a newline is missing between record groups?" (or something like that).

An example file that should display this warning: http://www.mnot.net/robots.txt

The problem is that the file in question, according to the specification, should be interpreted as only two groups, one for "*" and one for "ia_archiver", but it is evident that the author intended otherwise.

The above warning should be triggered by duplicate occurrences of the same partial URL within one group.

Another option would be to flag alternating lines of User-Agent and Disallow in one group, but that may be a personal style issue.
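
A rough sketch of the duplicate check, in Python. It assumes record groups are separated by blank lines and that comment-only lines do not end a group; the function names `split_groups` and `duplicate_disallows` are made up for illustration and are not part of any existing validator.

```python
from collections import Counter


def split_groups(robots_txt):
    """Split robots.txt text into groups of (field, value) pairs.

    Assumption: a blank line ends the current group; comment-only
    lines are skipped and do not count as a group boundary.
    """
    groups, current = [], []
    for raw in robots_txt.splitlines():
        if not raw.strip():
            # Blank line: close the current group, if any.
            if current:
                groups.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue  # comment-only or unrecognized line
        field, value = line.split(":", 1)
        current.append((field.strip().lower(), value.strip()))
    if current:
        groups.append(current)
    return groups


def duplicate_disallows(robots_txt):
    """Return (group_number, path) for each Disallow path repeated within one group."""
    warnings = []
    for number, group in enumerate(split_groups(robots_txt), start=1):
        counts = Counter(v for f, v in group if f == "disallow" and v)
        warnings.extend((number, path) for path, n in counts.items() if n > 1)
    return warnings
```

On a file where two intended groups have run together because the blank line is missing, both copies of a shared path land in the same group and get reported. With the blank line in place, each group holds the path once and nothing is flagged.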

Regards,
Morten Frederiksen

4:42 pm on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Morten,

Well-spotted - That robots.txt is a disaster!

Another problem I've seen is that some 'bots require a newline at the END of the file - A member here reported this problem a while ago (maybe last year).

To simplify parsing, the script could just check for the required newline between any occurrence of "User-agent:" and a preceding "Disallow:". This eliminates the need to do any partial URL-matching, and would catch most errors like the ones in the page you cited.
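
Something like the following Python sketch would do it. It assumes the "required newline" means a blank line between records; the function name `missing_separators` is hypothetical.

```python
def missing_separators(robots_txt):
    """Return 1-based line numbers of User-agent lines that follow a
    Disallow line with no blank line in between.

    Assumption: a blank line is the record separator; comment-only
    lines are ignored and do not count as a separator.
    """
    flagged = []
    previous_field = None
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            previous_field = None  # blank line: record boundary seen
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue  # comment-only or unrecognized line
        field = line.split(":", 1)[0].strip().lower()
        if field == "user-agent" and previous_field == "disallow":
            flagged.append(number)
        previous_field = field
    return flagged
```

Several "User-agent:" lines stacked at the top of one record are not flagged, since only a "User-agent:" that comes right after a "Disallow:" counts. No URL comparison is involved at all.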

Jim
