

Letting just Google crawl


xlcus

1:03 pm on Dec 29, 2002 (gmt 0)

10+ Year Member



I have a large site that I want Google to crawl, but I don't want any other search engine to index it. I have the following robots.txt file in place...

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:

Is this correct? I'm paranoid that I've got it wrong somehow and that Google will miss me on the next crawl. I suddenly thought that the second (googlebot) section should come first, as the bot might see the wildcard section first and ignore the rest of the file. Should I put the googlebot section first? Or am I just getting pre-update jitters? ;-)
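As a sanity check (not from the original thread), Python's standard urllib.robotparser can evaluate a robots.txt file like this one. This is only a sketch of how one library interprets the rules; real crawlers may differ in how they handle record order and matching.

```python
import urllib.robotparser

# The robots.txt from the post above
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own record (an empty Disallow means allow
# everything); any other robot falls through to the '*' record and
# is blocked from the whole site.
print(rp.can_fetch("Googlebot", "/page.html"))     # True
print(rp.can_fetch("SomeOtherBot", "/page.html"))  # False
```

Note that user-agent matching here is case-insensitive, so "googlebot" in the file matches the crawler's "Googlebot" token.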

hetzeld

1:18 pm on Dec 29, 2002 (gmt 0)




Hi xlcus,

I'm pretty sure that it won't achieve what you're expecting.

According to [robotstxt.org...], the wildcard '*' is a special token that matches any robot that has not matched any of the other records.

My understanding is that it should come at the end.

Dan
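For reference, the file reordered as Dan suggests (the same two records, with the googlebot section first and the wildcard last) would read:

```
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
```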

xlcus

4:40 pm on Dec 30, 2002 (gmt 0)




I swapped them round just in case.
Looks like Googlebot's happy with it too... Phew! :)

64.68.82.38 - [30/Dec/2002:16:16:24] "GET /robots.txt HTTP/1.0"
64.68.82.38 - [30/Dec/2002:16:16:25] "GET / HTTP/1.0"