lexipixel - 2:38 am on Nov 7, 2006 (gmt 0)
If all we are interested in is these three bots (and for most of us that may be the case) then using;
User-agent: *
should be enough now? Yes? No?
-bouncybunny

No. For larger sites with mixed dynamic and static content, user/member login areas, subscription-only content, etc., keeping the bots out of certain areas is needed, and being able to wildcard-match partial strings will go a long way towards cleaning dynamic URLs out of the SERPs (on Yahoo!, if they are the only ones to adopt these ROBOTS.TXT operators).

Boiled down, it looks like they added the use of two special characters for pattern matching in Disallow (and 'Allow') statements.

They also mention and demonstrate how they allow the [...]

I've always thought of it like a filter.

Disallow: /pattern/ (defined, true, "on")
- or -
default (not defined, not "true", "off")

A defined state for 'Disallow' is sort of a double negative, where "allow" is the same as "not disallow".

I wonder if Slurp would obey:

User-Agent: Yahoo! Slurp
Disallow: /calendar/archive
Allow: /calendar/archive/2006/11/*.htm

Meaning "don't crawl anything in the calendar archives, except this month's static (.htm) event files"... Something like that could be useful when tied to a content management system that auto-updates ROBOTS.TXT (so long as other bots obey or ignore the same syntax).
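
For anyone who wants to play with the idea, here is a rough Python sketch of how a bot might evaluate rules like the calendar ones. It is not Slurp's actual logic. It assumes the two special characters are `*` (match any run of characters) and a trailing `$` (anchor the end of the URL), and it assumes the longest matching pattern wins, with ties going to Allow. The names `pattern_to_regex`, `is_allowed` and `calendar_rules` are made up for illustration, and `calendar_rules` only stands in for whatever a CMS would do when it regenerates the file each month.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and otherwise the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, pattern) tuples for one user-agent.
    ASSUMPTION: the longest matching pattern wins and ties go to Allow;
    a real crawler may resolve conflicts differently.
    """
    best = None  # (pattern length, is_allow) of the best match so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule at all means the default ("off") state: crawlable.
    return True if best is None else best[1]


def calendar_rules(year, month):
    """Hypothetical CMS hook: rebuild the calendar rules for one month."""
    return [
        ("Disallow", "/calendar/archive"),
        ("Allow", "/calendar/archive/%04d/%02d/*.htm" % (year, month)),
    ]


if __name__ == "__main__":
    rules = calendar_rules(2006, 11)
    for path in (
        "/calendar/archive/2006/11/show.htm",
        "/calendar/archive/2006/10/show.htm",
        "/about.htm",
    ):
        print(path, is_allowed(path, rules))
```

Under those assumptions the November .htm file is crawlable and the October one is not. Note that without a trailing `$` the Allow line is still a prefix match, so it would also let through something like show.html. A bot that ignores Allow entirely would just see the Disallow line and skip the whole archive, which is the safe failure mode.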