Robots.txt code sensitive to missed spaces?

9:37 am on Jul 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Quick question about robots.txt files: are these two lines treated the same, or is the space after the colon required?

(1) disallow: /folder/
(2) disallow:/folder/

I know that number 1 is the correct way to present the code, but my question is: would number 2 be ignored, or would it do the same job?

Cheers
MG

12:22 pm on Jul 11, 2008 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



When dealing with robots.txt, there is no room for error. I don't mean to scare you, but I have seen some huge websites that earn millions become deindexed because of a typo in their robots.txt.

It is true that some search engine bots do a better job than others with error handling and can accommodate minor typos with no damage to your site. Why take the risk? Be careful and make sure your robots.txt validates 100% properly.

12:57 pm on Jul 11, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The only way to find out is to test... You first. :)

I once had a major problem with a third-tier search robot, because of a missing blank line at the end of the file. Since the definition of a "record" in robots.txt is that it ends with a blank line, it was understandable -- the robot considered that record to be "unclosed." But it came as a shock, nonetheless.

Jim
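
To guard against the "unclosed record" case described above, one option is to normalize the file before uploading it. The following is only a minimal sketch: the helper name `ensure_trailing_blank_line` is made up for illustration, and it assumes plain `\n` line endings.

```python
def ensure_trailing_blank_line(robots_txt: str) -> str:
    """Return the text with its last record closed by exactly one blank line.

    Hypothetical helper, not part of any standard tool. It strips whatever
    whitespace currently trails the file, then appends a newline to end the
    last directive plus one empty line to close the record.
    """
    return robots_txt.rstrip() + "\n\n"


if __name__ == "__main__":
    unclosed = "User-agent: *\nDisallow: /folder/"
    # repr() makes the trailing newlines visible
    print(repr(ensure_trailing_blank_line(unclosed)))
```

Running the helper on a file that is already closed leaves it unchanged, so it should be safe to apply every time the file is regenerated.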

2:17 pm on Jul 11, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Looking at the repository of robots.txt files (and especially at the summary data of user-agents that people are blocking in their robots files) over at the BotSeer project, it is clear that a large percentage, certainly into double-digits, of robots.txt files are hosed in one way or another; often in multiple ways.

I would think that most bots do follow the old mantra of "Be liberal in what you accept, and conservative in what you send", but I would never like to test it out. The problem is obviously troubling to Google, as they have a whole section of Webmaster Tools dedicated to verifying and checking your robots.txt file.

Google tripped me up a few years ago, when I tried something new at the time: [webmasterworld.com...]

2:26 pm on Jul 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A well-programmed bot won't require that space to be present (and I believe the robots.txt standard does not require it either). However, it is best to include the space so you don't take chances: it's not hard to add a space and sleep well at night.

2:48 pm on Jul 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Lord Majestic is correct:
[robotstxt.org...]
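
For anyone who wants to see how one real parser handles the two forms from the original question, here is a small sketch using Python's standard-library `urllib.robotparser`. The `is_blocked` helper and the example.com URLs are assumptions made up for this test. It only shows what this one parser does; it says nothing about how any particular search engine bot behaves.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` is denied `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Form (1) from the question: space after the colon
WITH_SPACE = "User-agent: *\nDisallow: /folder/\n"
# Form (2) from the question: no space after the colon
WITHOUT_SPACE = "User-agent: *\nDisallow:/folder/\n"

# Placeholder URL, used only for illustration
URL = "http://www.example.com/folder/page.html"

if __name__ == "__main__":
    print("with space:   ", is_blocked(WITH_SPACE, URL))
    print("without space:", is_blocked(WITHOUT_SPACE, URL))
```

This parser splits each line at the first colon and strips surrounding whitespace from both halves, so both forms block the URL. Other bots may not be as forgiving, so keeping the space is still the safer habit.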