Forum Moderators: goodroi

Message Too Old, No Replies

Validation of robots.txt

         

designei

10:29 pm on Apr 16, 2005 (gmt 0)



How can I validate the following HTML pages in my root HTML directory?

designandpermittools, publicsectorexperience, publicschools, CONTACTUS, contactus?

I also have another robots.txt file with the following:

User-agent: *
Disallow:
[sitename.com...] [sitename.com...] [sitename.com...] [sitename.com...]

According to searchengineworld.com/cgi-bin/robotcheck.cgi

There are errors in my robots.txt as follows:

Syntax check robots.txt on [sitename.com...] (197 bytes)

Line  Severity  Code
3     ERROR     Invalid fieldname:
                [sitename.com...] [sitename.com...] [sitename.com...] [sitename.com...]
3     Warning   Field names of robots.txt may be case insensitive, but do capitalize field names to account for challenged robots.
                [sitename.com...] [sitename.com...] [sitename.com...] [sitename.com...]

We're sorry, this robots.txt does NOT validate.
Errors Detected: 1
Warnings Detected: 1

Note: Our website is not called sitename.com. I hid it by replacing the real URL with sitename.com.

robots.txt source code for [example.com...]
Line Code
1 User-agent: *
2 Disallow:
3 [example.com...] [example.com...] [example.com...] [example.com...]

[edited by: ThomasB at 11:23 am (utc) on April 18, 2005]
[edit reason] exemplified [/edit]

Span

10:18 am on Apr 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi designei, welcome to the forums.

If you don't want those pages spidered (by robots that read and obey robots.txt), you need to write your robots.txt like this:


User-agent: *
Disallow: designandpermittools.html
Disallow: publicsectorexperience.html
Disallow: publicschools.html
Disallow: contact.html

Here is a tutorial: [searchengineworld.com]

To validate your html use the W3 validator [validator.w3.org]

Robert Thivierge

10:50 am on Apr 17, 2005 (gmt 0)

10+ Year Member



I think you have a little typo: you should prepend a slash to have it work for all "standard" robots:

User-agent: *
Disallow: /designandpermittools.html
Disallow: /publicsectorexperience.html
Disallow: /publicschools.html
Disallow: /contact.html

Side note: think about putting these files in their own directory, and excluding the directory. Some "bad" robots will take your list of files as an "invitation".
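
If you want to confirm that the leading slash matters before uploading, Python's standard urllib.robotparser implements the same matching rules, so you can check a rule set locally. A minimal sketch, assuming the corrected rules from the post above; the /private/ directory is only an illustration of the side note about excluding a whole directory, not a real path on the site:

```python
from urllib.robotparser import RobotFileParser

# Corrected rules from the thread, plus a hypothetical /private/ directory
# to illustrate excluding a whole directory instead of listing files.
rules = """\
User-agent: *
Disallow: /designandpermittools.html
Disallow: /publicsectorexperience.html
Disallow: /publicschools.html
Disallow: /contact.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Listed pages are blocked for all compliant user agents...
print(parser.can_fetch("*", "/publicschools.html"))     # False
# ...as is anything under the excluded directory...
print(parser.can_fetch("*", "/private/anything.html"))  # False
# ...while other pages remain crawlable.
print(parser.can_fetch("*", "/index.html"))             # True

# Without the leading slash the rule never matches a request path,
# so the file silently blocks nothing.
broken = RobotFileParser()
broken.parse(["User-agent: *", "Disallow: publicschools.html"])
print(broken.can_fetch("*", "/publicschools.html"))     # True
```

Note that can_fetch only tells you what an obedient crawler would do; as the side note says, robots.txt is public, so a list of sensitive filenames is visible to everyone.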