Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt validator warning suggestion
Missing newlines can cause problems, but in some cases can be detected.
mortenf




msg:1527692
 12:02 pm on Apr 25, 2003 (gmt 0)
Hi there,

A suggestion regarding a new warning message: "Duplicate partial URLs detected; perhaps a newline is missing between different record groups?" (or something like that).

An example file that should display this warning: http://www.mnot.net/robots.txt

The problem is that the file in question, according to the specification, should be interpreted as only two groups, one for "*" and one for "ia_archiver", but it is evident that the author intended otherwise.

The above warning should be triggered by duplicate occurrences of the same partial URL within one group.
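
A minimal sketch of that duplicate check, in Python. It assumes records are separated by blank lines and that comment-only lines do not end a record, as in the original robots.txt specification. The function name `find_duplicate_paths` is hypothetical and not part of any existing validator.

```python
def find_duplicate_paths(robots_txt):
    """Return (record_index, path) for each Disallow value repeated within one record."""
    duplicates = []
    record = 0          # zero-based index of the current record (group)
    seen = set()        # Disallow values seen so far in this record
    reported = set()    # values already flagged, so each is reported once
    in_record = False

    for raw in robots_txt.splitlines():
        if not raw.strip():
            # A blank line closes the current record, if one is open.
            if in_record:
                record += 1
                seen, reported, in_record = set(), set(), False
            continue

        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line: ignored, does not end the record

        in_record = True
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow":
            path = value.strip()
            if path in seen and path not in reported:
                duplicates.append((record, path))
                reported.add(path)
            seen.add(path)

    return duplicates
```

Any entry in the returned list would be a candidate for the warning suggested above, since a repeated path inside one record often means a blank line was left out.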

Another option would be to flag alternating lines of User-Agent and Disallow in one group, but that may be a personal style issue.

Regards,
Morten Frederiksen

 

jdMorgan




msg:1527693
 4:42 pm on Apr 25, 2003 (gmt 0)

Morten,

Well-spotted - That robots.txt is a disaster!

Another problem I've seen is that some 'bots require a newline at the END of the file - A member here reported this problem a while ago (maybe last year).
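
A validator could test for that condition with something like the following Python sketch. It assumes the file is available as raw bytes and treats a final CR or LF as a valid line ending. The function name `ends_with_newline` is made up for illustration.

```python
def ends_with_newline(content: bytes) -> bool:
    """Return True if the raw robots.txt content ends with a line terminator."""
    # Covers "\n", "\r\n" and a bare "\r"; an empty file returns False.
    return content.endswith((b"\n", b"\r"))
```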

To simplify parsing, the script could simply check for a required newline between any occurrence of "User-agent:" and a preceding "Disallow:". This eliminates the need to do any partial URL-matching, and would catch most errors like the ones in the page you cited.
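
Roughly, that check could look like the Python sketch below. It assumes the "required newline" means a blank line separating records and that comment-only lines are skipped. The name `find_missing_newlines` is hypothetical.

```python
def find_missing_newlines(robots_txt):
    """Return 1-based line numbers of User-agent lines that directly follow a Disallow line."""
    flagged = []
    previous_field = None  # field name of the last non-blank, non-comment line

    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            previous_field = None  # blank line: record boundary, reset state
            continue

        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line: ignore, keep previous state

        field = line.partition(":")[0].strip().lower()
        if field == "user-agent" and previous_field == "disallow":
            flagged.append(number)
        previous_field = field

    return flagged
```

Each flagged line number marks a spot where a new record probably should have started, which is the same mistake the duplicate-URL idea tries to catch, without comparing any paths.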

Jim

