
Sitemaps, Meta Data, and robots.txt Forum

    
how to disallow all robots except Googlebot, Yahoo Slurp and MSNBot?
d6rth7ader

5+ Year Member



 
Msg#: 3310124 posted 10:29 am on Apr 13, 2007 (gmt 0)

I only want Googlebot, Yahoo Slurp and MSNBot to crawl my site. How do I do that?

 

encyclo

WebmasterWorld Senior Member encyclo us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3310124 posted 12:51 am on Apr 15, 2007 (gmt 0)

robots.txt is for blacklisting bots you don't want, and as such it can't do whitelisting for just the bots you choose.

The only way to do it is to serve a dynamic robots.txt file which returns a disallow-all response to any request that is not from one of the bots you want whitelisted. This thread [webmasterworld.com] explains the basic idea.
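A minimal sketch of such a dynamic robots.txt, keyed on the request's User-Agent header. The whitelist substrings and the function name here are illustrative, not from the linked thread:

```python
# Sketch of a dynamic robots.txt: permissive file for whitelisted bots,
# disallow-all for everyone else. The whitelist is an assumption.

WHITELIST = ("googlebot", "slurp", "msnbot")  # substrings of allowed bots' UA strings

def robots_txt_for(user_agent):
    """Return the robots.txt body to serve for the given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in WHITELIST):
        return "User-agent: *\nDisallow:\n"    # empty Disallow = crawl everything
    return "User-agent: *\nDisallow: /\n"      # disallow everything
```

You would wire this into whatever serves /robots.txt on your site -- a CGI script or a framework route that reads the incoming User-Agent header.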

jdMorgan

WebmasterWorld Senior Member jdmorgan us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3310124 posted 2:17 am on Apr 15, 2007 (gmt 0)

Maybe I missed something here, but...

# robots.txt - Disallow Googlebot, msnbot, and slurp for NO files, and disallow all others for ALL files.
#
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /

# end robots.txt
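For what it's worth, a file like this can be sanity-checked offline with Python's standard urllib.robotparser (an editorial sketch; the test path is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The whitelist-style robots.txt from above.
robots = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

# Whitelisted bots may fetch anything; everyone else hits the * record.
print(rp.can_fetch("Googlebot", "/any/page.html"))    # True
print(rp.can_fetch("Slurp", "/any/page.html"))        # True
print(rp.can_fetch("SomeOtherBot", "/any/page.html")) # False
```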

Jim

Michel Samuel

5+ Year Member



 
Msg#: 3310124 posted 6:50 pm on May 1, 2007 (gmt 0)

Sorry to revive this topic, but I have a similar question.

My current robots.txt is this.

User-agent: *
Disallow: /

It has removed all my problems with robots but given me another one: my website traffic has dropped sharply.

I noticed that Google and Yahoo accounted for 90% of my search engine traffic. And I really only need to stop a total of 4 or 5 robots. (It does not seem worth the trouble to make a dynamic script.)

Is it possible to do this in a robots.txt?

User-agent: alexa, askjeeves, etc, etc
Disallow: /

or

User-agent: alexa,
User-agent: askjeeves,
User-agent: msnbot
User-agent: another bot.
Disallow. /

I.e., directly disallow the robots that I do not want and let the others come in.

jdMorgan

WebmasterWorld Senior Member jdmorgan us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3310124 posted 7:27 pm on May 1, 2007 (gmt 0)

My current robots.txt is this.

User-agent: *
Disallow: /


That tells all obedient robots that they are not allowed to index anything.


Is it possible to do this in a robots.txt

User-agent: alexa
User-agent: askjeeves
User-agent: msnbot
User-agent: another-bot
Disallow: /


This might work -- try it and see. It depends on whether each of the robots can understand the multiple-User-agent record format. While this format is valid according to the original Standard for Robot Exclusion, and all robots should support it, it is in fact not supported by all robots. Also, note that the character after "Disallow" in your example was a period, not a colon. I have corrected that here.


The most universal/bullet-proof method would probably be something like this:
User-agent: alexa
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: another-bot
Disallow: /

User-agent: *
Disallow:


Note the blank line at the end -- at least one EU robot requires it. "Teoma" is Ask's (formerly Ask Jeeves) spider name.

Jim

Michel Samuel

5+ Year Member



 
Msg#: 3310124 posted 11:41 pm on May 1, 2007 (gmt 0)

Just one question.
How would you know if the file is working or not?

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3310124 posted 4:35 am on May 2, 2007 (gmt 0)

How would you know if the file is working or not?

you can use the robots.txt analysis feature in google webmaster tools [google.com] to see how it works for googlebot...

Michel Samuel

5+ Year Member



 
Msg#: 3310124 posted 5:48 am on May 2, 2007 (gmt 0)

Thank you,

I did not want to just install the new file and wait for my stats to show hits on the robots.txt file.

--------
Update

I tested the file by excluding the google bot.
Google still has knowledge of my site, but it also reports that the robots.txt file is blocking it.

Just to check my efforts, I placed the googlebot record last.

User-agent: ia_archiver
Disallow: /
User-agent: Slurp
Disallow: /
User-agent: googlebot
Disallow: /

So for the first person with the question: it is possible to exclude the robots you do not want and simply leave out the robots you do want.
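This blacklist-style file can be checked the same way with Python's standard urllib.robotparser (a sketch; a blank line is added between records, as the robots.txt standard expects):

```python
from urllib.robotparser import RobotFileParser

# The test file above, with blank lines separating the records.
robots = """\
User-agent: ia_archiver
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: googlebot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

# The three listed bots are blocked; with no catch-all * record,
# every unlisted bot is allowed by default.
print(rp.can_fetch("googlebot", "/index.html"))   # False
print(rp.can_fetch("UnlistedBot", "/index.html")) # True
```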
-------
An additional question: is ia_archiver the only bot Alexa uses?
I did not see any others listed on robotstxt.org and I want to be certain.

[edited by: Michel_Samuel at 6:09 am (utc) on May 2, 2007]
