Allow Googlebot - disallow all other spiders

4:22 pm on Jul 19, 2004 (gmt 0)

New User

10+ Year Member

joined:July 19, 2004
posts:2
votes: 0


Does anyone have a sample of how I would set up a robots.txt file to allow Googlebot but disallow all other spiders from the whole website?

Thanks for the help!

:)

2:07 pm on July 29, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:May 14, 2003
posts:376
votes: 0


Surely you've found the answer to this by now? ;)

2:11 pm on July 29, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member sem4u is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 18, 2002
posts:3067
votes: 1


You can only disallow spiders if you know their names, and each one has to be entered into the robots.txt file.

There is a good list here though:

[webmasterworld.com...]

;)

2:41 pm on July 29, 2004 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9068
votes: 4


Welcome to WebmasterWorld, jimmy19.

The safest way would be to cloak robots.txt so that it delivers a disallow-all to anything other than Googlebot.

The basic process is to use mod_rewrite to rewrite requests for robots.txt to, say, robots.php. In that file, check the IP address and user-agent string to identify Googlebot, then print the appropriate robots.txt declarations. You could even place every IP other than Googlebot that requests robots.txt on a banned list to ensure that it can't spider the site.

Rather complicated, but the only sure way I know of!
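
For anyone who wants to see the decision logic spelled out, here is a rough sketch in Python rather than the PHP mentioned above. Everything named here is hypothetical (`robots_txt_for`, `is_googlebot_ip`, the two constants), and the reverse-then-forward DNS check is one common way to confirm a Googlebot IP, assumed here rather than taken from this thread. The mod_rewrite rule and the banned list are left out.

```python
import socket

# Hypothetical response bodies: one allows everything, one blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def is_googlebot_ip(ip,
                    reverse_lookup=socket.gethostbyaddr,
                    forward_lookup=socket.gethostbyname):
    """Reverse-then-forward DNS check (an assumed verification method).

    The lookups are parameters so they can be replaced in tests.
    """
    try:
        hostname = reverse_lookup(ip)[0]
        # Assumption: genuine crawler hosts resolve under these domains.
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the same IP.
        return forward_lookup(hostname) == ip
    except OSError:
        # Covers failed lookups (socket.herror, socket.gaierror).
        return False


def robots_txt_for(user_agent, ip, ip_check=is_googlebot_ip):
    """Return the robots.txt body to serve for this request."""
    if "googlebot" in user_agent.lower() and ip_check(ip):
        return ALLOW_ALL
    return DISALLOW_ALL
```

Checking both the user-agent string and the IP matters because the user agent alone is trivial to fake.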

3:01 pm on July 29, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 26, 2002
posts:3295
votes: 6


For "well-behaved" robots, those that obey robots.txt, this is the syntax recommended by robotstxt.org:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
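
If you want to sanity-check those rules before uploading them, a quick sketch using Python's standard-library `urllib.robotparser` might look like this. The bot name `SomeOtherBot` and the example.com URL are placeholders, and this only shows how one compliant parser reads the file, not how any particular spider will behave.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "http://www.example.com/page.html"  # placeholder URL

# An empty Disallow line means nothing is blocked for Googlebot.
googlebot_allowed = parser.can_fetch("Googlebot", url)
# Every other agent falls through to the "*" record and is blocked.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```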

7:20 am on July 30, 2004 (gmt 0)

New User

10+ Year Member

joined:July 19, 2004
posts:2
votes: 0


I will take a look at these. Thank you for the replies...

Jim :)