
Forum Moderators: goodroi


how to disallow all robots except Googlebot, Yahoo Slurp and MSNBot?

     
10:29 am on Apr 13, 2007 (gmt 0)

5+ Year Member



I only want Googlebot, Yahoo Slurp and MSNBot to crawl my site. How do I do that?
12:51 am on Apr 15, 2007 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



robots.txt is for blacklisting bots you don't want, and as such it can't do whitelisting for just the bots you choose.

The only way to do it is to have a dynamic robots.txt file which displays a disallow to any request not from those you want whitelisted. This thread [webmasterworld.com] explains the basic idea.
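For illustration, a minimal sketch of that idea, assuming a standalone Python http.server with placeholder whitelist tokens and port (adjust to the bots and setup you actually use):

# Minimal sketch: serve a permissive robots.txt to whitelisted crawlers
# and a blocking one to everyone else, keyed on the User-Agent header.
from http.server import BaseHTTPRequestHandler, HTTPServer

WHITELIST = ("googlebot", "slurp", "msnbot")   # matched case-insensitively

ALLOW_ALL = b"User-agent: *\nDisallow:\n"      # empty Disallow = crawl everything
BLOCK_ALL = b"User-agent: *\nDisallow: /\n"    # block everything

class RobotsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        ua = self.headers.get("User-Agent", "").lower()
        body = ALLOW_ALL if any(token in ua for token in WHITELIST) else BLOCK_ALL
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), RobotsHandler).serve_forever()

In practice the same check is usually done in whatever server-side language the site already uses (PHP, mod_rewrite rules, etc.); the point is only that the response to /robots.txt is keyed on the requesting User-Agent.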

2:17 am on Apr 15, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Maybe I missed something here, but...

# robots.txt - Disallow Googlebot, msnbot, and slurp for NO files, and disallow all others for ALL files.
#
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /

# end robots.txt


Jim
6:50 pm on May 1, 2007 (gmt 0)

5+ Year Member



Sorry to revive this topic, but I have a similar question.

My current robots.txt is this.

User-agent: *
Disallow: /

It has removed all my problems with robots but given me another one: my website traffic has dropped considerably.

I noticed that Google and Yahoo account for about 90% of my search engine traffic, and I really only need to stop a total of 4 or 5 robots. (It does not seem worth the trouble of writing a dynamic script.)

Is it possible to do this in a robots.txt?

User-agent: alexa, askjeeves, etc, etc
Disallow: /

or

User-agent: alexa,
User-agent: askjeeves,
User-agent: msnbot
User-agent: another bot.
Disallow. /

I.e., directly disallow the robots that I do not want and let the others come in.

7:27 pm on May 1, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



My current robots.txt is this.

User-agent: *
Disallow: /


That tells all obedient robots that they are not allowed to index anything.


Is it possible to do this in a robots.txt

User-agent: alexa
User-agent: askjeeves
User-agent: msnbot
User-agent: another-bot
Disallow: /


This might work -- try it and see. It depends on whether each of the robots can understand the multiple-User-agent record format. While this format is valid according to the original Standard for Robot Exclusion, and all robots should support it, it is in fact not supported by all robots. Also, note that the character after "Disallow" in your example was a period, not a colon. I have corrected that here.


The most universal/bullet-proof method would probably be something like this:
User-agent: alexa
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: another-bot
Disallow: /

User-agent: *
Disallow:


Note the blank line at the end -- At least one EU robot requires it. "Teoma" is Ask's (formerly Ask Jeeves) spider name.

Jim
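
A quick way to sanity-check how a standards-following parser reads those records is Python's urllib.robotparser; a minimal sketch using the rules above:

# Sketch: feed the whitelist-style rules to Python's standard robots.txt parser
# and confirm that the named bots are blocked while everyone else is allowed.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: alexa
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for agent in ("alexa", "Teoma", "msnbot", "Googlebot", "Slurp"):
    print(agent, parser.can_fetch(agent, "/any/page.html"))
# Expected: False for the three named bots, True for Googlebot and Slurp.

Bear in mind this only shows how a compliant parser interprets the file; it says nothing about whether a given crawler actually obeys it.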

11:41 pm on May 1, 2007 (gmt 0)

5+ Year Member



Just one question.
How would you know if the file is working or not?
4:35 am on May 2, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



How would you know if the file is working or not?

you can use the robots.txt analysis feature in Google Webmaster Tools [google.com] to see how it works for googlebot...
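
If you prefer to check a live file yourself, the same standard-library parser shown earlier can fetch it; a minimal sketch, with www.example.com standing in for your own domain:

# Sketch: download a deployed robots.txt and see what each crawler token may fetch.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://www.example.com/robots.txt")
rp.read()  # downloads and parses the live file

for agent in ("Googlebot", "Slurp", "msnbot", "ia_archiver"):
    print(agent, rp.can_fetch(agent, "http://www.example.com/"))

Again, this shows how the rules parse, not what any particular crawler will actually do; the Webmaster Tools report remains the more authoritative check for Googlebot.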

5:48 am on May 2, 2007 (gmt 0)

5+ Year Member



Thank you,

I did not want to just install the new file and wait for my stats to show hits on the robots.txt file.

--------
Update

I tested the file by excluding Googlebot.
Google still has knowledge of my site, but it also reports that the robots.txt file is blocking it.

Just for my own testing, I placed the googlebot entry last.

User-agent: ia_archiver
Disallow: /
User-agent: Slurp
Disallow: /
User-agent: googlebot
Disallow: /

So, for the original poster's question: it is possible to exclude the robots you do not want and simply leave the others unlisted so they still have permission to crawl.
-------
Additional question.
Is ia_archiver the only bot Alexa uses?
I did not see any others listed on robotstxt.org and I want to be certain.

[edited by: Michel_Samuel at 6:09 am (utc) on May 2, 2007]

 
