Forum Moderators: goodroi


Question about blocking 1 robot, but allow all others

         

capsmaster

5:57 am on Jun 16, 2005 (gmt 0)

10+ Year Member



Just so I don't screw this up, if I wanted to block yahoo but allow all others would this be the right code?

User-agent: Slurp
Disallow: /

Do I need to add anything else so all other robots still crawl my site?

Thanks in advance.

Sanenet

11:53 am on Jun 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, that looks fine.

Try adding

User-agent: *
Disallow:

at the top; this explicitly allows all the other bots to go anywhere (although most assume this by default?)
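
Putting the two records together, the whole file would read something like this (just a sketch of what's described above):

User-agent: *
Disallow:

User-agent: Slurp
Disallow: /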

[robotstxt.org...]

capsmaster

5:31 pm on Jun 16, 2005 (gmt 0)

10+ Year Member



Thanks a bunch Sanenet!

Clint

1:56 pm on Jun 17, 2005 (gmt 0)



I was just about to ask this EXACT same question for SINGLE pages. ;)

Does anyone know the correct meta tag to stop the Yahoo bot from indexing/crawling SINGLE pages? Their site mentions <META NAME="robots" CONTENT="noindex"> but that of course won't work, since it will block *ALL* bots. So, I need to know the meta tag JUST for Yahoo that I can add to certain pages. Or, can this/should this be done in the robots.txt file? I already have this in my robots file:

User-agent: *
Disallow:

Thanks.

Sanenet

2:36 pm on Jun 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do it in robots.txt

UserAgent:*
Disallow:

UserAgent: Slurp
Disallow: page1.htm
Disallow: page2.htm
Disallow: dir/dir2/page1.htm

Clint

5:48 pm on Jun 17, 2005 (gmt 0)



Hey thanks a bunch. ;)

Leave a blank line below this?
UserAgent:*
Disallow:

jdMorgan

3:48 am on Jun 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most robots will accept the first record that contains their robot name or the first record that contains "*", whichever comes first. Some are smarter than that, but don't count on it.

Therefore you will usually want to put the robot-specific records first, and follow them with the wild-card robot record:

User-agent: Slurp
Disallow: /page1.htm
Disallow: /page2.htm
Disallow: /dir/dir2/page1.htm

User-agent: *
Disallow:


This disallows Slurp from fetching the indicated resources and allows all other robots unrestricted access. If you reverse the order, Slurp may stop at the wild-card record and assume that it, too, is free to fetch all resources.

All files should be specified starting with a slash, since robots.txt uses prefix-matching.
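
For example (hypothetical path), a line such as

Disallow: /dir

would block anything whose path begins with "/dir": /dir/page.htm, /dir2/index.htm, and /directory.htm alike.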

Each record should end with a blank line, including the last record, as shown.

"User-agent" and "Disallow" should be spelled and captialized exactly as shown to avoid problems.

After making any changes, validate your robots.txt file here [searchengineworld.com].
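
If you want to sanity-check the rules locally as well, Python's standard urllib.robotparser can parse the same file. A rough sketch (example.com and the page names are just placeholders):

from urllib.robotparser import RobotFileParser

# The rules from the example above, as one string.
rules = """\
User-agent: Slurp
Disallow: /page1.htm
Disallow: /page2.htm
Disallow: /dir/dir2/page1.htm

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Slurp should be blocked from the listed pages...
print(rp.can_fetch("Slurp", "http://example.com/page1.htm"))      # False
# ...but still allowed elsewhere, and other bots allowed everywhere.
print(rp.can_fetch("Slurp", "http://example.com/other.htm"))      # True
print(rp.can_fetch("Googlebot", "http://example.com/page1.htm"))  # True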

Jim

Clint

3:26 pm on Jun 18, 2005 (gmt 0)



Thanks a lot Jim for the info. :)