
Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt: Disallow every subdirectory? Allow certain subdirectories?
SwipeTheMagnets
5:54 pm on Jul 1, 2009 (gmt 0)


I have a site that publishes updated client pages to the root directory on a daily basis. We want to disallow these pages for the robots, but we do not want to block out the instructional/informational pages. The first thought was to create a subdirectory (i.e. .com/clients/business1) for the client pages and then disallow the /clients/ subdirectory. Well, that won't work with this site because of some jacked up CMS issues.
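
For reference, that abandoned subdirectory approach would have been a single rule. A sketch only, assuming the /clients/ path above:

User-agent: *
Disallow: /clients/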

So my other thought was to disallow every page of the site, but then allow individual instructional/info pages.

Something along the lines of:
User-agent: *
Disallow: /
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3

Is this feasible? Will this confuse the crawlers? Any help would be greatly appreciated.

-swipe

[edited by: SwipeTheMagnets at 5:57 pm (utc) on July 1, 2009]

 

goodroi
1:53 pm on Jul 2, 2009 (gmt 0)

Your idea will work with the big search engines. Be careful and make sure your robots.txt does not have any typos and is formatted according to the robots.txt protocol.

Whenever you make changes to your robots.txt file, it is wise to monitor crawler behavior for a few days to make sure everything is fine. One typo can cause nightmares.
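
A low-tech way to do that monitoring is to watch crawler hits in your server access logs. Here is a rough sketch in Python; the log path, the combined log format, and the "Googlebot" match are assumptions, so adjust them for your own setup:

# rough sketch: count Googlebot requests per day in an Apache-style access
# log so a sudden drop after a robots.txt change stands out.
import collections
import re

hits = collections.Counter()
with open("/var/log/apache2/access.log") as log:  # assumed path
    for line in log:
        if "Googlebot" not in line:
            continue
        # combined log format puts the date inside brackets, e.g. [01/Jul/2009:17:54:12 ...
        m = re.search(r"\[(\d{2}/\w{3}/\d{4})", line)
        if m:
            hits[m.group(1)] += 1

for day, count in hits.items():
    print(day, count)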

phranque
2:24 am on Jul 20, 2009 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], swipe!
(thanks for delurking)
=8)

you should use the robots.txt checker [google.com] available in your Google Webmaster Tools dashboard.

if i recall correctly, your robots.txt file will exclude everything for all robots since it uses the first match in the order specified.
you might want to try something like this:
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
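
if you want to double-check a draft like that before you upload it, python's standard-library parser gives a quick local answer. a sketch only; the paths are just the examples from this thread, and this parser resolves rules in the order they appear, so treat it as a sanity check rather than a guarantee of how any particular engine will behave:

from urllib.robotparser import RobotFileParser

# the draft from above, parsed locally instead of fetched from a server
draft = """\
User-agent: *
Allow: /info/step1
Allow: /info/step2
Allow: /info/step3
Disallow: /
"""

rp = RobotFileParser()
rp.parse(draft.splitlines())

# the /info/ pages should come back allowed, everything else blocked
for path in ("/info/step1", "/info/step2", "/clients/business1", "/"):
    verdict = "allowed" if rp.can_fetch("*", path) else "blocked"
    print(path, "->", verdict)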
