
Sitemaps, Meta Data, and robots.txt Forum

    
how to disallow all robots except Googlebot, Yahoo Slurp and MSNBot?
d6rth7ader
msg:3310126
10:29 am on Apr 13, 2007 (gmt 0)

I only want Googlebot, Yahoo Slurp and MSNBot to crawl my site. How do I do that?

 

encyclo
msg:3311427
12:51 am on Apr 15, 2007 (gmt 0)

robots.txt is for blacklisting bots you don't want, and as such it can't do whitelisting just for bots you choose.

The only way to do it is to have a dynamic robots.txt file which displays a disallow to any request not from those you want whitelisted. This thread [webmasterworld.com] explains the basic idea.
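
As a rough illustration of that approach, here is a minimal sketch in Python as a WSGI app (the whitelisted user-agent substrings, the host/port, and the WSGI setup are assumptions for the example, not taken from the linked thread; a real deployment would map this handler to /robots.txt in the web server):

# Dynamic robots.txt sketch: whitelisted bots get an open file,
# everyone else gets a blocking one. Illustrative only.
from wsgiref.simple_server import make_server

ALLOWED_BOTS = ("googlebot", "slurp", "msnbot")  # assumed user-agent substrings

ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

def robots_app(environ, start_response):
    # Compare the requesting User-Agent (lowercased) against the whitelist.
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    body = ALLOW_ALL if any(bot in ua for bot in ALLOWED_BOTS) else BLOCK_ALL
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]

if __name__ == "__main__":
    # Stand-alone demo server; in practice the app would serve only /robots.txt.
    make_server("localhost", 8000, robots_app).serve_forever()

Note that identifying crawlers purely by User-Agent string is trivially spoofable; this only filters bots that announce themselves honestly.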

jdMorgan
msg:3311472
2:17 am on Apr 15, 2007 (gmt 0)

Maybe I missed something here, but...

# robots.txt - Disallow Googlebot, msnbot, and slurp for NO files, and disallow all others for ALL files.
#
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /

# end robots.txt

Jim

Michel Samuel
msg:3327732
6:50 pm on May 1, 2007 (gmt 0)

Sorry to revive this topic, but I have a similar question.

My current robots.txt is this.

User-agent: *
Disallow: /

It has solved all my problems with robots but created another one: my website traffic has dropped sharply.

I noticed that Google and Yahoo account for 90% of my search engine traffic, and I really only need to stop a total of 4 or 5 robots. (It does not seem worth the trouble to make a dynamic script.)

Is it possible to do this in robots.txt?

User-agent: alexa, askjeeves, etc, etc
Disallow: /

or

User-agent: alexa,
User-agent: askjeeves,
User-agent: msnbot
User-agent: another bot.
Disallow. /

I.e., disallow only the robots that I do not want and let the others come in.

jdMorgan
msg:3327791
7:27 pm on May 1, 2007 (gmt 0)

My current robots.txt is this.

User-agent: *
Disallow: /


That tells all obedient robots that they are not allowed to crawl anything.


Is it possible to do this in robots.txt?

User-agent: alexa
User-agent: askjeeves
User-agent: msnbot
User-agent: another-bot
Disallow: /


This might work -- try it and see. It depends on whether each of the robots can understand the multiple-User-agent record format. While this format is valid according to the original Standard for Robot Exclusion, and all robots should support it, it is in fact not supported by all robots. Also, note that the character after "Disallow" in your example was a period, not a colon. I have corrected that here.


The most universal/bullet-proof method would probably be something like this:
User-agent: alexa
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: another-bot
Disallow: /

User-agent: *
Disallow:


Note the blank line at the end -- At least one EU robot requires it. "Teoma" is Ask's (formerly Ask Jeeves) spider name.

Jim

Michel Samuel
msg:3328029
11:41 pm on May 1, 2007 (gmt 0)

Just one question.
How would you know if the file is working or not?

phranque
msg:3328206
4:35 am on May 2, 2007 (gmt 0)

How would you know if the file is working or not?

you can use the robots.txt analysis feature in google webmaster tools [google.com] to see how it works for googlebot...
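
For a rough local check, Python's standard-library robots.txt parser can also be pointed at the file to see which user-agents a generic parser would allow (a sketch, assuming the whitelist-style file from earlier in the thread and a placeholder example.com URL; real crawlers may interpret records slightly differently than this parser does):

# Check which user-agents a generic parser would allow for this robots.txt.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("Googlebot", "Slurp", "msnbot", "SomeOtherBot"):
    allowed = parser.can_fetch(bot, "http://www.example.com/")
    print(bot, "allowed" if allowed else "blocked")

This only tells you how one parser reads the file; watching the live crawlers in your server logs (or the webmaster tools check above) remains the real test.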

Michel Samuel
msg:3328240
5:48 am on May 2, 2007 (gmt 0)

Thank you,

I did not want to just install the new file and wait for my stats to show hits on the robots.txt file.

Update:

I tested the file by excluding Googlebot.
Google still has knowledge of my site, but it also reports that the robots.txt file is blocking it.

Just for my own testing, I placed the Googlebot record last.

User-agent: ia_archiver
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: googlebot
Disallow: /

So, to answer the original poster's question: it is possible to exclude only the robots you do not want and leave the ones you do want free to crawl.
-------
Additional question:
Is ia_archiver the only bot Alexa uses?
I did not see any others listed on robotstxt.org and I want to be certain.

[edited by: Michel_Samuel at 6:09 am (utc) on May 2, 2007]
