
allow major engines, block the fleas

and to add further control

     
9:01 pm on Oct 25, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 28, 2006
posts:1043
votes: 1


I want to block my site from being spidered by everything but Google, Yahoo, MSN, and ia_archiver.

I also want to block Google, Yahoo, MSN, and ia_archiver from spidering the forum. How does this look for syntax?

User-agent: *
Disallow: /forum/

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow:

User-agent: *
Disallow: /

1:11 am on Oct 26, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 17, 2005
posts:44
votes: 0


I can't speak for all bots, but Googlebot follows the record aimed specifically at it, if there is one, and ignores the rest. So in this case it would use only the group that names Googlebot, and since that group's Disallow line is empty, it would interpret the file as allowing it access to everything, including /forum/. I would recommend something like this:

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow: /forum/

User-agent: *
Disallow: /

You can always verify how Googlebot will interpret a robots.txt file using the robots.txt analysis tool in Google Webmaster Tools. Just add the site you're interested in to your account, paste the test file into the tool, and check specific URLs to see whether the test file would block or allow them.

1:23 am on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 28, 2006
posts:1043
votes: 1


It's so simple. :)
Thanks, Vanessa. Hope you have a good night.
5:18 am on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan, WebmasterWorld Top Contributor of All Time, 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Be prepared also for ancient bots, quasi-bots, and broken bots that can't handle multiple-user-agent records (valid according to the Standard, but not universally implemented). I suggest backing up your robots.txt with 'stronger stuff', such as mod_rewrite user-agent checks, if possible.
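
For example, a minimal .htaccess sketch of such a check (the user-agent substrings here are placeholders, not a vetted blocklist):

RewriteEngine On
# Return 403 Forbidden to any client whose User-Agent header
# contains one of these substrings (case-insensitive match)
RewriteCond %{HTTP_USER_AGENT} (BadBot|EmailSiphon|WebCopier) [NC]
RewriteRule .* - [F]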

There are plenty of badly-coded 'bots out there that are not really malicious, just incompetent...

Jim

5:16 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 28, 2006
posts:1043
votes: 1


Thanks JD, looking into various "bot traps" to resolve that. Lots of threads to comb through...
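
One common trap pattern, sketched here with an illustrative path: disallow a decoy directory in robots.txt, link to it somewhere human visitors won't click, and treat any client that requests it anyway as a bot that ignores robots.txt.

In robots.txt:

User-agent: *
# no compliant crawler should ever request this decoy path
Disallow: /bot-trap/

In .htaccess:

RewriteEngine On
# anything that requests the decoy gets a 403, and its IP and
# user-agent appear in the access log for later blocking
RewriteRule ^bot-trap/ - [F]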
 
