
Forum Moderators: goodroi


how to disallow all robots except Googlebot, Yahoo Slurp and MSNBot?

     
10:29 am on Apr 13, 2007 (gmt 0)

New User

10+ Year Member

joined:Sept 21, 2006
posts:7
votes: 0


I only want Googlebot, Yahoo Slurp and MSNBot to crawl my site. How do I do that?
12:51 am on Apr 15, 2007 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9074
votes: 6


robots.txt is designed for blacklisting bots you don't want, and as such it can't whitelist just the bots you choose.

The only way to do it is to have a dynamic robots.txt file which displays a disallow to any request not from those you want whitelisted. This thread [webmasterworld.com] explains the basic idea.
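The gist of that dynamic approach can be sketched in a few lines. This is a minimal illustration, not production code: it assumes a WSGI-capable server maps requests for /robots.txt to this handler, and the handler name and whitelist substrings are made up for the example.

```python
# Sketch of a dynamic robots.txt (WSGI). Assumes the server routes the
# /robots.txt URL to this app; names below are illustrative only.
ALLOWED_SUBSTRINGS = ("googlebot", "slurp", "msnbot")

ALLOW_ALL = b"User-agent: *\nDisallow:\n"    # empty Disallow = no restriction
BLOCK_ALL = b"User-agent: *\nDisallow: /\n"  # disallow everything

def robots_app(environ, start_response):
    # Decide which robots.txt to serve based on the requesting User-Agent.
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    body = ALLOW_ALL if any(s in ua for s in ALLOWED_SUBSTRINGS) else BLOCK_ALL
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Note that this trusts the User-Agent header, which is trivially spoofed, so it only affects well-behaved bots -- the same ones that obey robots.txt anyway.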

2:17 am on Apr 15, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Maybe I missed something here, but...

# robots.txt - Disallow Googlebot, msnbot, and slurp for NO files, and disallow all others for ALL files.
#
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /

# end robots.txt


Jim
6:50 pm on May 1, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 22, 2006
posts:77
votes: 0


Sorry to revive this topic, but I have a similar question.

My current robots.txt is this.

User-agent: *
Disallow: /

It has removed all my problems with robots, but it has given me a new one: my website traffic has dropped considerably.

I noticed that Google and Yahoo accounted for 90% of my search engine traffic, and I really only need to stop a total of 4 or 5 robots. (It does not seem worth the trouble to make a dynamic script.)

Is it possible to do this in a robots.txt?

User-agent: alexa, askjeeves, etc, etc
Disallow: /

or

User-agent: alexa,
User-agent: askjeeves,
User-agent: msnbot
User-agent: another bot.
Disallow. /

I.e., directly disallow the robots that I do not want, and let the others come in.

7:27 pm on May 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


My current robots.txt is this.

User-agent: *
Disallow: /


That tells all obedient robots that they are not allowed to crawl anything.


Is it possible to do this in a robots.txt

User-agent: alexa
User-agent: askjeeves
User-agent: msnbot
User-agent: another-bot
Disallow: /


This might work -- try it and see. It depends on whether each of the robots understands the multiple-User-agent record format: although that format is valid according to the original Standard for Robot Exclusion, and all robots should support it, in practice not all of them do. Also, note that the character after "Disallow" in your example was a period, not a colon; I have corrected that here.
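For what it's worth, one way to sanity-check the syntax locally is Python's standard-library robots.txt parser, which is one implementation that does accept the multiple-User-agent record (the agent names are just the ones from the example above; real crawlers may still parse the file differently):

```python
from urllib.robotparser import RobotFileParser

# A grouped record: several User-agent lines sharing one Disallow rule.
rules = """\
User-agent: alexa
User-agent: askjeeves
User-agent: msnbot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
print(rp.can_fetch("msnbot", "/page.html"))     # False: blocked by the group
print(rp.can_fetch("Googlebot", "/page.html"))  # True: no record matches it
```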


The most universal/bullet-proof method would probably be something like this:
User-agent: alexa
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: another-bot
Disallow: /

User-agent: *
Disallow:


Note the blank line at the end -- at least one EU robot requires it. "Teoma" is Ask's (formerly Ask Jeeves) spider name.

Jim

11:41 pm on May 1, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 22, 2006
posts:77
votes: 0


Just one question.
How would you know if the file is working or not?
4:35 am on May 2, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10847
votes: 61


How would you know if the file is working or not?

you can use the robots.txt analysis feature in google webmaster tools [google.com] to see how it works for googlebot...
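Another option, if you have Python handy, is to feed a copy of the file to the standard library's robots.txt parser and check each bot by name before deploying. A sketch, using the whitelist file from earlier in this thread (note that this only tells you how Python's parser reads the file; individual crawlers may interpret it differently):

```python
from urllib.robotparser import RobotFileParser

# The whitelist-style robots.txt discussed above: empty Disallow lines
# permit the named bots; the * record blocks everyone else.
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
for bot in ("Googlebot", "msnbot", "Slurp", "ia_archiver"):
    print(bot, rp.can_fetch(bot, "/"))  # the first three True, ia_archiver False
```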

5:48 am on May 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 22, 2006
posts:77
votes: 0


Thank you,

I did not want to just install the new file and wait for my stats to show hits on the robots.txt file.

--------
update

I tested the file by excluding Googlebot. Google still has knowledge of my site, but it also reports that the robots.txt file is blocking it.

Just for my own testing, I placed the Googlebot record last.

User-agent: ia_archiver
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: googlebot
Disallow: /

So, for the original poster: it is possible to exclude the robots you do not want while leaving the robots you do want unrestricted.
-------
Additional question:
Is ia_archiver the only bot Alexa uses?
I did not see any others listed on robotstxt.org, and I want to be certain.

[edited by: Michel_Samuel at 6:09 am (utc) on May 2, 2007]
