encyclo

msg:3311427 | 12:51 am on Apr 15, 2007 (gmt 0) |
robots.txt is for blacklisting bots you don't want, and as such it can't whitelist just the bots you choose. The only way to do that is to have a dynamic robots.txt file which serves a Disallow to any request that does not come from a bot you want whitelisted. This thread [webmasterworld.com] explains the basic idea.
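The dynamic approach described above might be sketched as follows. This is only an illustration, not the method from the linked thread: the function name `robots_txt_for`, the `WHITELIST` tokens, and the choice to match on the User-Agent request header are all assumptions, and User-Agent strings can be spoofed.

```python
# Hypothetical sketch: choose a robots.txt body per request, based on the
# User-Agent header. The names and tokens below are illustrative assumptions.

WHITELIST = ("googlebot", "msnbot", "slurp")  # assumed tokens for wanted bots

ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for this User-Agent header."""
    ua = (user_agent or "").lower()
    # Case-insensitive substring match against the whitelisted tokens.
    if any(token in ua for token in WHITELIST):
        return ALLOW_ALL
    return DENY_ALL


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("SomeUnknownBot/1.0"))
```

A web handler for `/robots.txt` would call this function with the incoming header and return the result as `text/plain`.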
|
jdMorgan

msg:3311472 | 2:17 am on Apr 15, 2007 (gmt 0) |
Maybe I missed something here, but...
# robots.txt - Disallow Googlebot, msnbot, and Slurp for NO files, and disallow all others for ALL files.
#
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
# end robots.txt
Jim
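One way to sanity-check the static file above offline is Python's standard `urllib.robotparser`. This is a minimal sketch: the test path `/page.html` and the agent name `SomeOtherBot` are made up for illustration, and this parser only approximates how any particular real robot interprets the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow means "nothing is disallowed" for that robot;
# every other robot falls through to the catch-all "*" record.
for agent in ("Googlebot", "msnbot", "Slurp", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "/page.html"))
```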
|
Michel Samuel

msg:3327732 | 6:50 pm on May 1, 2007 (gmt 0) |
Sorry to reopen this thread, but I have a similar question. My current robots.txt is this:

User-agent: *
Disallow: /

It has removed all my problems with robots but given me another one: my website traffic has dropped sharply. I noticed that Google and Yahoo had accounted for 90% of my search engine traffic, and I really only need to stop a total of 4 or 5 robots. (It does not seem worth the trouble to write a dynamic script.)

Is it possible to do this in a robots.txt?

User-agent: alexa, askjeeves, etc, etc
Disallow: /

or

User-agent: alexa,
User-agent: askjeeves,
User-agent: msnbot
User-agent: another bot.
Disallow. /

I.e., directly disallow the robots that I do not want and let the others come in.
|
jdMorgan

msg:3327791 | 7:27 pm on May 1, 2007 (gmt 0) |
| My current robots.txt is this.
| User-agent: *
| Disallow: /

That tells all obedient robots that they are not allowed to crawl anything.

| Is it possible to do this in a robots.txt
| User-agent: alexa
| User-agent: askjeeves
| User-agent: msnbot
| User-agent: another-bot
| Disallow: /

This might work -- try it and see. It depends on whether each of the robots can understand the multiple-User-agent record format. While this format is valid according to the original Standard for Robot Exclusion, and all robots should support it, it is in fact not supported by all robots. Also, note that the character after "Disallow" in your example was a period, not a colon. I have corrected that here.

The most universal/bullet-proof method would probably be something like this:

User-agent: alexa
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: another-bot
Disallow: /

User-agent: *
Disallow:

Note the blank line at the end -- at least one EU robot requires it. "Teoma" is the name of Ask's (formerly Ask Jeeves) spider.

Jim
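For a longer block list, the one-record-per-robot layout recommended above could be generated rather than typed by hand. A rough sketch, assuming a hypothetical helper named `build_robots_txt`; the bot names passed in are just the ones from this thread and are not verified spider names.

```python
def build_robots_txt(blocked_bots):
    """Build a robots.txt with one record per blocked bot, then allow the rest."""
    records = [f"User-agent: {name}\nDisallow: /\n" for name in blocked_bots]
    # Catch-all record with an empty Disallow: everyone else may crawl.
    records.append("User-agent: *\nDisallow:\n")
    # Records are separated by blank lines; the file also ends with one.
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["alexa", "Teoma", "msnbot", "another-bot"]), end="")
```

Each blocked robot gets its own record, so the output does not rely on robots supporting several User-agent lines in one record.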
|
Michel Samuel

msg:3328029 | 11:41 pm on May 1, 2007 (gmt 0) |
Just one question. How would you know if the file is working or not?
|
phranque

msg:3328206 | 4:35 am on May 2, 2007 (gmt 0) |
| How would you know if the file is working or not?

You can use the robots.txt analysis feature in Google Webmaster Tools [google.com] to see how it works for Googlebot...
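A complementary offline check, which does not replace the Google tool and cannot tell you how any specific robot really behaves, is to run the file through Python's standard `urllib.robotparser`. The helper name `crawl_report` and the sample rules are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser


def crawl_report(robots_text, agents, path="/"):
    """Map each agent name to True (may fetch path) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    # Sample rules for illustration; substitute the contents of your own file.
    sample = "User-agent: msnbot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
    print(crawl_report(sample, ["msnbot", "Googlebot"]))
```

This lets you test a new file before uploading it, rather than waiting for the server stats to update.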
|
Michel Samuel

msg:3328240 | 5:48 am on May 2, 2007 (gmt 0) |
Thank you. I did not want to just install the new file and wait for my stats to update the hits on the robots.txt file.

--------
Update: I tested the file by excluding Googlebot. Google knows about my site, but it also reports that the robots.txt file is blocking it. For this test I placed Googlebot last.

User-agent: ia_archiver
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: googlebot
Disallow: /

So, for the original poster: it is possible to exclude the robots you do not want while leaving permission open for the robots you do want.

-------
Additional question: is ia_archiver the only bot Alexa uses? I did not see any others listed on robotstxt.org and I want to be certain.

[edited by: Michel_Samuel at 6:09 am (utc) on May 2, 2007]
|