robots.txt and google

When I removed it, I got my deep crawl

EAHunt

12:42 am on Oct 3, 2002 (gmt 0)


User-agent: *
Disallow: /manual

is what was in my robots.txt. I haven't seen a deep crawl since I put it in. I removed it today and got my whole site crawled. Is there something wrong with my robots.txt?

I don't want them to go to www.mysite.com/manual/

So what have I done wrong?

Kerrin

1:36 am on Oct 3, 2002 (gmt 0)


Your robots.txt is only blocking a file called "manual" in your root directory. If you want to block the directory, add a trailing slash so it becomes:

User-agent: *
Disallow: /manual/

This shouldn't have stopped Googlebot from crawling your site.

jdMorgan

3:20 am on Oct 3, 2002 (gmt 0)


EAHunt,

The first deep crawls were reported yesterday [webmasterworld.com], so it may just be a coincidence.

If your robots.txt validates using Brett's robots.txt validator [searchengineworld.com], then you should be OK. Two non-obvious things I know of that can cause problems are CR/LF at the end of each line instead of a newline (LF), and a missing terminal newline. Most robots will ignore these errors, but some don't.

Brett's robots.txt checker catches everything except the missing terminal newline. In case that's not clear, the last line in the robots.txt file should be blank except for a newline character.

Jim
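
For anyone who wants to check the two line-ending problems Jim describes without a validator, here is a minimal Python sketch. The function name line_ending_problems and the sample bytes are made up for illustration; it only looks at raw bytes and does not validate the robots.txt syntax itself.

```python
def line_ending_problems(data):
    """Return a list of line-ending problems found in raw robots.txt bytes.

    Hypothetical helper: it only checks the two issues discussed above,
    not whether the directives themselves are valid.
    """
    problems = []
    # Any carriage return means CR/LF (or bare CR) line endings are present.
    if b"\r" in data:
        problems.append("contains CR characters (CR/LF or bare CR line endings)")
    # A non-empty file should end with a single LF after the last directive.
    if data and not data.endswith(b"\n"):
        problems.append("missing terminal newline")
    return problems


# Illustrative input: CR/LF after the first line, no newline after the last.
sample = b"User-agent: *\r\nDisallow: /manual/"
for problem in line_ending_problems(sample):
    print(problem)
```

Read the file in binary mode (open("robots.txt", "rb")) before passing its contents in, since text mode may translate line endings and hide the problem.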

ciml

6:03 pm on Oct 3, 2002 (gmt 0)


Welcome to WebmasterWorld [webmasterworld.com], Kerrin.

"Disallow: /manual" disallows anything beginning with /manual, so it does forbid crawling of /manual/, /manual/whatever and even /manualwhatever.

EAHunt, I agree with Jim. It's bound to be coincidence.
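
The prefix rule ciml describes can be sketched in a few lines of Python. This is only an illustration of plain prefix matching as stated above; is_disallowed is a made-up helper, not any crawler's actual code, and it ignores User-agent grouping and any pattern extensions individual robots may support.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path starts with any non-empty Disallow value.

    Hypothetical helper showing simple prefix matching only.
    """
    # An empty Disallow value blocks nothing, so empty rules are skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


paths = ["/manual", "/manual/", "/manual/whatever", "/manualwhatever", "/index.html"]
for path in paths:
    # Compare the rule without a trailing slash against the rule with one.
    print(path, is_disallowed(path, ["/manual"]), is_disallowed(path, ["/manual/"]))
```

Under this rule, "Disallow: /manual" covers everything that "Disallow: /manual/" does, plus /manual itself and /manualwhatever, so the trailing slash narrows the block rather than widening it.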