Forum Moderators: goodroi

robots.txt: Disallow every subdirectory? Allow certain subdirectories?

     
5:54 pm on Jul 1, 2009 (gmt 0)


I have a site that publishes client pages to its root directory on a daily basis. We want to disallow these pages for robots, but we do not want to block the instructional/informational pages. The first thought was to create a subdirectory for these pages (e.g. .com/clients/business1) and then disallow the /clients/ subdirectory. That won't work on this site because of some jacked-up CMS issues.

So my other thought was to disallow every page of the site, but then allow individual instructional/info pages.

Something along the lines of:
User-agent: *
Disallow: /
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3

Is this feasible? Will this confuse the crawlers? Any help would be greatly appreciated.

-swipe

[edited by: SwipeTheMagnets at 5:57 pm (utc) on July 1, 2009]

1:53 pm on July 2, 2009 (gmt 0)

goodroi (WebmasterWorld Administrator)


Your idea will work with the big search engines. Be careful and make sure your robots.txt does not have any typos and is formatted according to the robots.txt protocol.

Whenever you make changes to your robots.txt file it is wise to monitor crawler behavior for a few days to make sure everything is fine. One typo can potentially cause nightmares.
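
A typo check of the kind described above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the `lint_robots` helper and the `KNOWN_FIELDS` list are hypothetical names made up for illustration, and the list of accepted field names is a guess at the commonly used ones, not an official set. It does not replace testing with the search engines' own tools.

```python
# Hypothetical helper: flags lines in a robots.txt that look like typos.
# KNOWN_FIELDS is an assumed list of commonly used field names.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow") and not seen_user_agent:
            problems.append((number, f"'{field}' before any User-agent line"))
    return problems


if __name__ == "__main__":
    sample = "User-agent: *\nDissallow: /\n"
    for number, message in lint_robots(sample):
        print(f"line {number}: {message}")
```

A misspelled field such as `Dissallow` is reported by line number, so it can be fixed before a crawler ever fetches the file.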

2:24 am on July 20, 2009 (gmt 0)

phranque (WebmasterWorld Administrator)


welcome to WebmasterWorld [webmasterworld.com], swipe!
(thanks for delurking)
=8)

you should use the robots.txt checker [google.com] available in your Google Webmaster Tools dashboard.

if i recall correctly, your robots.txt file will exclude everything for all robots, since a crawler uses the first matching rule in the order specified.
you might want to try something like this:
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
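
One way to sanity-check the two versions locally is Python's standard-library `urllib.robotparser`, which applies the rules of a group in file order. That makes it a reasonable stand-in for a first-match crawler, but it is only a sketch: real search engine crawlers may resolve Allow/Disallow conflicts differently (for example by the most specific rule), so their own testing tools remain the authority. The `allowed` helper and the `ExampleBot` agent name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The file from the original question: two separate "User-agent: *" groups.
ORIGINAL = """\
User-agent: *
Disallow: /
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
"""

# The suggested version: one group, Allow lines before the catch-all Disallow.
REORDERED = """\
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
"""


def allowed(robots_txt, path, agent="ExampleBot"):
    """Hypothetical helper: parse robots_txt and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, text in (("original", ORIGINAL), ("reordered", REORDERED)):
        for path in ("/info/step1", "/clients/business1"):
            print(name, path, allowed(text, path))
```

With this parser on a recent Python 3, the reordered file lets /info/step1 through and blocks everything else, while the original file blocks /info/step1 as well, because only the first `User-agent: *` group is used.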