
Forum Moderators: goodroi


robots.txt: Disallow every subdirectory? Allow certain subdirectories?


SwipeTheMagnets

5:54 pm on Jul 1, 2009 (gmt 0)

I have a site that publishes client pages to the root directory on a daily basis. We want to disallow these pages to the robots, but we do not want to block the instructional/informational pages. The first thought was to create a subdirectory (i.e. .com/clients/business1) for these pages and then disallow the /clients/ subdirectory. That won't work on this site because of some jacked-up CMS issues.

So my other thought was to disallow every page of the site, but then allow individual instructional/info pages.

Something along the lines of:
User-agent: *
Disallow: /
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3

Is this feasible? Will this confuse the crawlers? Any help would be greatly appreciated.

-swipe

[edited by: SwipeTheMagnets at 5:57 pm (utc) on July 1, 2009]

goodroi

1:53 pm on Jul 2, 2009 (gmt 0)

Your idea will work with the big search engines. Be careful, though: make sure your robots.txt has no typos and is formatted according to the robots.txt protocol.

Whenever you make changes to your robots.txt file it is wise to monitor crawler behavior for a few days to make sure everything is fine. One typo can potentially cause nightmares.
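One lightweight way to monitor crawler behavior after a robots.txt change is to pull the bot's requested paths out of the access log and look for anything that should have been blocked. A quick sketch; the log location, the combined-log format, and the sample entries here are all assumptions, not from the thread:

```shell
# Build a tiny sample access log (hypothetical combined-log-style lines;
# in practice you would point the pipeline at your real server log)
cat > /tmp/sample_access.log <<'EOF'
66.249.66.1 - - [02/Jul/2009:14:00:00 +0000] "GET /info/step1 HTTP/1.1" 200 512 "-" "Googlebot/2.1"
66.249.66.1 - - [02/Jul/2009:14:00:05 +0000] "GET /clients/biz1 HTTP/1.1" 200 512 "-" "Googlebot/2.1"
66.249.66.1 - - [02/Jul/2009:14:00:09 +0000] "GET /clients/biz2 HTTP/1.1" 200 512 "-" "Googlebot/2.1"
EOF

# Paths Googlebot requested, most-hit first. After disallowing /clients/,
# any /clients/ entries still showing up here would be a red flag.
grep -i "googlebot" /tmp/sample_access.log | awk '{print $7}' | sort | uniq -c | sort -rn
```

The same one-liner works against a live log; just swap the sample file for the real access log path.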

phranque

2:24 am on Jul 20, 2009 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], swipe!
(thanks for delurking)
=8)

you should use the robots.txt checker [google.com] available in your Google Webmaster Tools dashboard.

if i recall correctly, your robots.txt file will exclude everything for all robots, since a crawler uses the first rule that matches, in the order specified.
you might want to try something like this:
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
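For a first-match parser, the ordering above can be sanity-checked before deploying. Python's standard-library robots.txt parser happens to apply rules first-match in order, so it can simulate the decision; a quick sketch using the paths from this thread (the /clients/biz path is a made-up example):

```python
from urllib.robotparser import RobotFileParser

# phranque's suggested rules: Allow lines first, blanket Disallow last
rules = """\
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A first-match parser hits the Allow lines before reaching Disallow: /
print(rp.can_fetch("Googlebot", "/info/step1"))   # allowed
print(rp.can_fetch("Googlebot", "/clients/biz"))  # blocked
```

Note this only models first-match behavior; Google's own parser matches the most specific (longest) path, which is one reason the Allow-first ordering is the safe choice either way. The robots.txt checker in Webmaster Tools remains the authoritative test for Googlebot.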

 
