allow major engines, block the fleas

and to add further control

youfoundjake

9:01 pm on Oct 25, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I want to block my site from being spidered by everything but Google, Yahoo, MSN, and ia_archiver.

I also want to block Google, Yahoo, MSN, and ia_archiver from spidering the forum. How does this look for syntax?

User-agent: *
Disallow: /forum/

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow:

User-agent: *
Disallow: /

vanessafox

1:11 am on Oct 26, 2006 (gmt 0)

5+ Year Member



I can't speak for all bots, but Googlebot follows the line aimed at it, if there is one. So, in this case, it would interpret the file as allowing it access to everything. I would recommend something like this:

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow: /forum/

User-agent: *
Disallow: /

You can always verify how Googlebot will interpret a robots.txt file using the robots.txt analysis tool in Google Webmaster Tools. You can just add the site you're interested in to your account, paste the test file into the tool, and check specific URLs to see if the test file would block or allow them.

youfoundjake

1:23 am on Oct 26, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



It's so simple. :)
Thanks Vanessa, hope you have a good night.

jdMorgan

5:18 am on Oct 26, 2006 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Be prepared also for ancient bots, quasi-bots, and broken bots that can't handle multiple user-agent lines per record, even though those are valid according to the Standard. I suggest backing up your robots.txt with 'stronger stuff,' such as mod_rewrite user-agent checks, if possible.

There are plenty of badly-coded 'bots out there that are not really malicious, just incompetent...
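
Something along these lines in an .htaccess file can back the robots.txt up. This is only a rough sketch: the user-agent substrings in the second block are placeholder examples, and you'd build your own deny list from your server logs.

RewriteEngine On

# Enforce the /forum/ disallow for the big engines, in case one misreads robots.txt
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot|Mediapartners-Google|Adsbot-Google|ia_archiver) [NC]
RewriteRule ^forum/ - [F]

# Return 403 Forbidden everywhere to known misbehaving bots (example user-agents only)
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebCopier|EmailCollector) [NC]
RewriteRule .* - [F]

The [F] flag sends a 403 Forbidden, so even a bot that ignores robots.txt gets nothing back.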

Jim

youfoundjake

5:16 pm on Oct 26, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Thanks, JD. I'm looking into various "bot traps" to resolve that. Lots of threads to comb through...