Sitemaps, Meta Data, and robots.txt Forum

What's the Proper Way to Block Sub Folders
Looking for the best way to block 2nd level sub folders
graywolf · msg:3295269 · 12:12 pm on Mar 28, 2007 (gmt 0)

I'm looking at a new site where I need to block the 2nd level of subfolders but still keep the first level indexed. I've never actually tried it this way, so I'm looking for some guidance. Here's the structure:

example.com/username1/foo/
example.com/username2/foo/
example.com/username3/foo/
...
example.com/username5000/foo/

There are going to be thousands of "username" folders, and I want them indexed. However, I don't want any of the "foo" folders indexed. Which is the best way to block those folders?

User-agent: *
Disallow: /foo/

or

User-agent: *
Disallow: /*/foo/

We'll be using meta tags as well; I just want to keep the robots.txt file in order.
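A minimal sketch of that page-level tag, assuming the aim is to keep each foo/ page out of the index while still letting its links be followed:

<meta name="robots" content="noindex,follow">

This would go in the <head> of every page under a foo/ folder.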

 

phranque · msg:3295307 · 12:52 pm on Mar 28, 2007 (gmt 0)

according to the Web Server Administrator's Guide to the Robots Exclusion Protocol [robotstxt.org]:

Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".

in other words, they so don't want to support wildcarding that they mistakenly refer to it as a regular expression, which they also don't want to support.

note also that grammar are not important to robots...
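In practical terms, since the original protocol supports neither wildcards nor regular expressions, the only strictly compliant way to block those folders is to enumerate every path, which plainly doesn't scale to thousands of usernames. A sketch:

User-agent: *
Disallow: /username1/foo/
Disallow: /username2/foo/
Disallow: /username3/foo/
...

Hence the appeal of the wildcard extensions discussed in this thread.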

graywolf · msg:3295596 · 6:07 pm on Mar 28, 2007 (gmt 0)

heh, well I'm not so much concerned about the robots protocol as I am about Google, who seems to be supporting it:

[google.com...]

jimbeetle · msg:3295630 · 6:43 pm on Mar 28, 2007 (gmt 0)

G and Y! support wildcards; I don't think Ask does as yet, nor do most of all those other little pesky critters that flit around all over the net. So, anything blocked with wildcards is going to wind up in the wild eventually anyway. I think the best bet is to use the robots meta tag. Just be sure not to block the subfolders in robots.txt, so bots will be able to read and obey the instruction.

<added>
And of course, if there are links to the pages in the subfolders, the meta robots tag will ensure that those pages don't wind up as a URL-only listing in the SERPs.
</added>
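A sketch of the combination described here, assuming you rely entirely on the page-level tag. robots.txt deliberately leaves the foo/ folders crawlable:

User-agent: *
Disallow:

and every page under a foo/ folder carries the instruction in its <head>:

<meta name="robots" content="noindex">

A bot can only obey the meta tag if it's allowed to fetch the page in the first place.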

goodroi · msg:3295671 · 7:43 pm on Mar 28, 2007 (gmt 0)

Google Pattern Matching Instructions [google.com]
Yahoo Wildcards Instructions [ysearchblog.com]
MSN Instructions [search.msn.com] (<cough> careful with MSN)
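Going by those docs, a sketch of wildcard rules scoped only to the engines that support them, assuming Googlebot and Slurp as the user-agent tokens (check each engine's documentation for the current names):

User-agent: Googlebot
Disallow: /*/foo/

User-agent: Slurp
Disallow: /*/foo/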

also, you may want to use .htaccess
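One way to read the .htaccess suggestion: have Apache refuse crawler requests for anything under a second-level foo/ folder. A sketch, assuming mod_rewrite is available and the file sits in the document root; the user-agent tokens are illustrative, not a complete list:

RewriteEngine On
# match known crawler user-agents (illustrative names only)
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
# in a docroot .htaccess the leading slash is stripped, so this
# matches username1/foo/..., username2/foo/..., and so on
RewriteRule ^[^/]+/foo/ - [F]

Note that a 403 keeps the content out, but as mentioned above, linked URLs can still surface as URL-only listings; the meta robots tag is what prevents those.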
