
Forum Moderators: goodroi


robots.txt: Disallow every subdirectory? Allow certain subdirectories?


SwipeTheMagnets

5:54 pm on Jul 1, 2009 (gmt 0)

I have a site that publishes client pages to the root directory on a daily basis. We want to disallow these pages to the robots, but we do not want to block the instructional/informational pages. The first thought was to create a subdirectory (i.e. .com/clients/business1) for these pages and then disallow the /clients/ subdirectory. That won't work on this site because of some jacked-up CMS issues.

So my other thought was to disallow every page of the site, but then allow individual instructional/info pages.

Something along the lines of:
User-agent: *
Disallow: /
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3

Is this feasible? Will this confuse the crawlers? Any help would be greatly appreciated.

-swipe

[edited by: SwipeTheMagnets at 5:57 pm (utc) on July 1, 2009]

goodroi

1:53 pm on Jul 2, 2009 (gmt 0)

Your idea will work with the big search engines. Be careful, though: make sure your robots.txt has no typos and is formatted according to the robots.txt protocol.

Whenever you make changes to your robots.txt file it is wise to monitor crawler behavior for a few days to make sure everything is fine. One typo can potentially cause nightmares.
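One lightweight way to monitor crawler behavior after a robots.txt change is to pull the bot's requested paths out of the access log and look for anything that should have been blocked. A quick sketch; the log location, the combined-log format, and the sample entries here are all assumptions, not from the thread:

```shell
# Build a tiny sample access log (hypothetical combined-log-style lines;
# in practice you would point the pipeline at your real server log)
cat > /tmp/sample_access.log <<'EOF'
66.249.66.1 - - [02/Jul/2009:14:00:00 +0000] "GET /info/step1 HTTP/1.1" 200 512 "-" "Googlebot/2.1"
66.249.66.1 - - [02/Jul/2009:14:00:05 +0000] "GET /clients/biz1 HTTP/1.1" 200 512 "-" "Googlebot/2.1"
66.249.66.1 - - [02/Jul/2009:14:00:09 +0000] "GET /clients/biz2 HTTP/1.1" 200 512 "-" "Googlebot/2.1"
EOF

# Paths Googlebot requested, most-hit first. After disallowing /clients/,
# any /clients/ entries still showing up here would be a red flag.
grep -i "googlebot" /tmp/sample_access.log | awk '{print $7}' | sort | uniq -c | sort -rn
```

The same one-liner works against a live log; just swap the sample file for the real access log path.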

phranque

2:24 am on Jul 20, 2009 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], swipe!
(thanks for delurking)
=8)

you should use the robots.txt checker [google.com] available in your Google Webmaster Tools dashboard.

if i recall correctly, your robots.txt file will exclude everything for all robots, since a crawler uses the first rule that matches, in the order specified.
you might want to try something like this:
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
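For a first-match parser, the ordering above can be sanity-checked before deploying. Python's standard-library robots.txt parser happens to apply rules first-match in order, so it can simulate the decision; a quick sketch using the paths from this thread (the /clients/biz path is a made-up example):

```python
from urllib.robotparser import RobotFileParser

# phranque's suggested rules: Allow lines first, blanket Disallow last
rules = """\
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A first-match parser hits the Allow lines before reaching Disallow: /
print(rp.can_fetch("Googlebot", "/info/step1"))   # allowed
print(rp.can_fetch("Googlebot", "/clients/biz"))  # blocked
```

Note this only models first-match behavior; Google's own parser matches the most specific (longest) path, which is one reason the Allow-first ordering is the safe choice either way. The robots.txt checker in Webmaster Tools remains the authoritative test for Googlebot.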

 
