
Forum Moderators: goodroi

What's the Proper Way to Block Sub Folders

Looking for the best way to block 2nd-level subfolders

   
12:12 pm on Mar 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm looking at a new site where I need to block the 2nd level of subfolders while keeping the first level indexed. I've never actually tried it this way, so I'm looking for some guidance. Here's the structure:

example.com/username1/foo/
example.com/username2/foo/
example.com/username3/foo/
...
example.com/username5000/foo/

There are going to be thousands of "username" folders and I want them indexed. However, I don't want any of the "foo" folders indexed. Which is the best way to block those folders?

User-agent: *
Disallow: /foo/

or

User-agent: *
Disallow: /*/foo/

We'll be using meta tags as well; I just want to keep the robots file in order.

12:52 pm on Mar 28, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



according to the Web Server Administrator's Guide to the Robots Exclusion Protocol [robotstxt.org]:
Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".

in other words, they so don't want to support wildcarding that they mistakenly refer to it as a regular expression, which they also don't want to support.

note also that grammar are not important to robots...
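
to illustrate, a strictly protocol-compliant robots.txt for this structure would need one literal Disallow line per username folder (a sketch, and obviously not something you'd want to maintain for thousands of folders):

User-agent: *
Disallow: /username1/foo/
Disallow: /username2/foo/
Disallow: /username3/foo/
# ...and so on, one line per username folder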

6:07 pm on Mar 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



heh, well, I'm not so much concerned about the robots protocol as I am with Google, which does seem to support it:

[google.com...]
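
For Googlebot specifically, then, a sketch along the lines of the second form should do it (this assumes Google's documented pattern matching; bots that don't support wildcards will simply ignore a record addressed to Googlebot):

User-agent: Googlebot
Disallow: /*/foo/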

6:43 pm on Mar 28, 2007 (gmt 0)

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member



G and Y! support wildcards; I don't think Ask does as yet, nor do most of those other pesky little critters that flit around all over the net. So anything blocked only with wildcards is going to wind up in the wild eventually anyway. I think the best bet is to use the robots meta tag. Just be sure not to block the subfolders in robots.txt, so bots will be able to read and obey the instruction.

<added>
And of course, if there are links to the pages in the subfolders, the meta robots tag will ensure that those pages don't wind up as a URL-only listing in the SERPs.
</added>
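
For reference, a minimal sketch of that meta tag, placed in the <head> of every page under a /foo/ folder (noindex keeps the page out of the index; follow, the default anyway, lets its links still be crawled):

<meta name="robots" content="noindex,follow">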

7:43 pm on Mar 28, 2007 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Google Pattern Matching Instructions [google.com]
Yahoo Wildcards Instructions [ysearchblog.com]
MSN Instructions [search.msn.com] <cough> careful with msn

also, you may want to use .htaccess
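
For example, a minimal .htaccess sketch (Apache mod_rewrite; the user-agent names below are assumptions covering the major bots) that returns a 403 to named crawlers for any second-level /foo/ request:

# Refuse crawler requests for any /username/foo/ path outright
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
RewriteRule ^[^/]+/foo/ - [F]

Note that a 403 blocks the fetch entirely, so the bot never sees a meta tag; treat this as an alternative to the meta robots approach rather than a complement.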

 
