I have a site that publishes client pages to its root directory on a daily basis. We want to disallow these pages to robots, but we do not want to block out the instructional/informational pages. The first thought was to create a sub-directory (i.e., .com/clients/business1) for these pages, then disallow the /clients/ sub-directory. Well, that won't work with this site because of some jacked up CMS issues.
So my other thought was to disallow every page of the site, but then allow individual instructional/info pages.
Something along the lines of:

User-agent: *
Disallow: /

User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Is this feasible? Will this confuse the crawlers? Any help would be greatly appreciated.
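One way to check that without waiting on the crawlers is to feed the candidate file to a robots.txt parser locally. Below is a minimal sketch using Python's standard-library urllib.robotparser, which follows the original first-match convention; the example.com host and the /clients/foo path are just placeholders.

from urllib.robotparser import RobotFileParser

# The layout proposed above: two separate "User-agent: *" groups.
rules = """\
User-agent: *
Disallow: /

User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A first-match parser honors only the first "User-agent: *" group,
# so even the info pages come back blocked.
print(rp.can_fetch("*", "http://example.com/info/step1"))   # False
print(rp.can_fetch("*", "http://example.com/clients/foo"))  # False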
You should use the robots.txt checker [google.com] available in your Google Webmaster Tools dashboard.
If I recall correctly, that robots.txt will exclude everything for all robots, since a crawler uses the first match in the order specified, and the first "User-agent: *" group it hits is the blanket Disallow. You might want to try a single group with the Allow lines ahead of the Disallow:

User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
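Plugging that single-group version into the same urllib.robotparser check (again a sketch with placeholder URLs; note that Google's own parser prefers the most specific matching rule rather than strict order, but for a simple Allow-before-Disallow file the outcome should be the same):

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Rules inside the group are checked top to bottom, first match wins,
# so the Allow lines fire before the catch-all Disallow.
print(rp.can_fetch("*", "http://example.com/info/step2"))   # True
print(rp.can_fetch("*", "http://example.com/clients/foo"))  # False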