I have about 10 brand pages like this, and it's very important that they remain indexed:

domain.com/brand1/
domain.com/brand2/
domain.com/brand3/
etc.
Next, we have a lot of clickouts that need to be blocked by robots.txt. These clickouts live as IDs under the brands:

domain.com/brand1/123/
domain.com/brand1/456/
domain.com/brand1/789/
domain.com/brand2/010/
domain.com/brand2/111/
domain.com/brand3/213/
etc.
How do we block the latter links without disallowing the brand pages?
There's no good way to do this that will work for all robots. You should really put URLs you don't want spidered into a separate directory, or divide the brands directory into spiderable and non-spiderable directories, such as
/brands/public/brand/ and /brands/private/brand/, or /brands-public/brand/ and /brands-private/brand/, or /brands/brand-public/ and /brands/brand-private/, etc.
That is, spidering should be considered in the design of the directory layout.
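For example, if all the clickouts lived under one non-spiderable directory (the /brands/private/ layout above is just an illustration, not your current structure), the robots.txt stays simple and works with every robot that honors the original exclusion standard:

User-agent: *
Disallow: /brands/private/

Everything under /brands/public/ remains crawlable by default, since any URL not matched by a Disallow rule is allowed.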
For Google and some other major search engines, you can use the "Allow:" directive and/or wild-card paths in robots.txt. But many search engines don't support "Allow:" and wild-card paths because they are not part of the original Standard for Robot Exclusion. That leaves you with the on-page (HTML meta-tag) robots control method, which may or may not be applicable to your situation. Or look into the X-Robots-Tag HTTP header -- but again, this is not supported by all robots.
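To illustrate, with your current structure and a crawler that supports wildcards (this is a sketch of the Google-style extensions, not something every robot will honor), you could block the numeric clickout URLs while leaving the brand pages crawlable:

User-agent: Googlebot
Disallow: /brand1/*/
Disallow: /brand2/*/
Disallow: /brand3/*/

Here /brand1/*/ matches /brand1/123/ but not /brand1/ itself, because the pattern requires a second path segment. The on-page alternative would be a meta tag on each clickout page:

<meta name="robots" content="noindex, nofollow">

and the equivalent HTTP header is X-Robots-Tag: noindex, nofollow.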
Really, the best approach is to consider file organization, spiderability, access-control, and cacheability as a fundamental part of directory-layout design...
Ok, thanks for the information. So the best way is to move the clickouts to a subfolder, say:

domain.com/brand1/go/123/
domain.com/brand1/go/456/
domain.com/brand1/go/789/
domain.com/brand2/go/010/
domain.com/brand2/go/111/
domain.com/brand3/go/213/
And then:

User-agent: *
Disallow: /brand1/go
Disallow: /brand2/go
Disallow: /brand3/go
That wouldn't hurt the brand pages themselves, would it?