| allow major engines, block the fleas, and add further control |
youfoundjake

msg:3134881 | 9:01 pm on Oct 25, 2006 (gmt 0) | I want to block my site from being spidered by everything but google, yahoo, msn, and ia_archiver. I want to block google, yahoo, msn and ia_archiver from spidering the forum. How does this look for syntax?

User-agent: *
Disallow: /forum/

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow:

User-agent: *
Disallow: /
|
vanessafox

msg:3135112 | 1:11 am on Oct 26, 2006 (gmt 0) | I can't speak for all bots, but Googlebot follows the record aimed at it, if there is one. So, in this case, it would interpret the file as allowing it access to everything, including the forum. I would recommend something like this:

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow: /forum/

User-agent: *
Disallow: /

You can always verify how Googlebot will interpret a robots.txt file using the robots.txt analysis tool in Google webmaster tools. You can just add the site you're interested in to your account, paste the test file into the tool, and check specific URLs to see if the test file would block or allow them.
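
For a quick offline check of the same idea, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not Googlebot's parser and may disagree with Google's tool in edge cases (its user-agent matching is a simple substring test). The helper name `can_fetch`, the agent `SomeOtherBot`, and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The file as first posted in the thread.
ORIGINAL = """\
User-agent: *
Disallow: /forum/

User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow:

User-agent: *
Disallow: /
"""

# The file recommended in the reply.
RECOMMENDED = """\
User-agent: Slurp
User-agent: Googlebot
User-agent: msnbot
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
User-agent: ia_archiver-web.archive.org
Disallow: /forum/

User-agent: *
Disallow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Print the verdict for each file, agent, and sample path.
    for name, text in (("original", ORIGINAL), ("recommended", RECOMMENDED)):
        for agent in ("Googlebot", "SomeOtherBot"):
            for url in ("/index.html", "/forum/thread.html"):
                print(name, agent, url, can_fetch(text, agent, url))
```

Under this parser, the original file lets Googlebot fetch forum URLs, while the recommended file keeps Googlebot out of /forum/ and keeps unnamed bots out of the whole site. Treat it as a sanity check only; the tool in Google webmaster tools remains the authority on Googlebot's behavior.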
|
youfoundjake

msg:3135122 | 1:23 am on Oct 26, 2006 (gmt 0) | It's so simple. :) Thanks vanessa, hope you have a good night.
|
jdMorgan

msg:3135242 | 5:18 am on Oct 26, 2006 (gmt 0) | Be prepared also for ancient-bots, quasi-bots, and broken-bots that can't handle the (valid according to the Standard) multiple-user-agent records. I suggest backing up your robots.txt with 'stronger stuff,' such as mod_rewrite user-agent checks, if possible. There are plenty of badly-coded 'bots out there that are not really malicious, just incompetent... Jim
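
As a rough illustration of a server-side user-agent check, here is a minimal Python WSGI sketch. It is an analogue of the idea, not mod_rewrite itself, and it assumes the site runs behind a WSGI server. The blocked substrings and the names `is_blocked` and `BlockBotsMiddleware` are hypothetical placeholders.

```python
# Hypothetical substrings of unwanted crawler user-agent strings (placeholders).
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "examplescraper")


def is_blocked(user_agent):
    """Return True if the user-agent string contains a blocked substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


class BlockBotsMiddleware:
    """WSGI middleware that answers 403 Forbidden to blocked user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application.
        return self.app(environ, start_response)
```

Unlike robots.txt, a check like this does not rely on the bot cooperating, though a bot that fakes its user-agent string will still get through.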
|
youfoundjake

msg:3135796 | 5:16 pm on Oct 26, 2006 (gmt 0) | thanks JD, looking into various "bot traps" to resolve that. Lots of threads to comb through....
|