I have about 10 brand pages like this, and it is very important that they stay indexed.
Next, we have a lot of clickouts that need to be blocked by robots.txt. These clickouts sit under the brand paths as IDs:
How do we block the latter links without disallowing the brand pages?
/brands/public/brand/ and /brands/private/brand/
/brands-public/brand/ and /brands-private/brand/
/brands/brand-public/ and /brands/brand-private/
That is, spidering should be considered in the design of the directory layout.
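As a sketch of why the first layout helps, the snippet below feeds a one-line robots.txt to Python's standard `urllib.robotparser` and checks two URLs. The brand name `acme` and the clickout ID `12345` are hypothetical placeholders, and `/brands/private/` is assumed to hold only the clickouts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the "/brands/public/..." vs "/brands/private/..."
# layout: only the private (clickout) branch is disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /brands/private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: "acme" stands in for a brand, 12345 for a clickout ID.
for url in ("/brands/public/acme/", "/brands/private/acme/12345"):
    print(url, parser.can_fetch("*", url))
```

The second and third layouts work the same way, with `Disallow: /brands-private/` or with one `Disallow: /brands/<brand>-private/` line per brand.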
For Google and some other major search engines, you can use the "Allow:" directive and/or wildcard paths in robots.txt. But many search engines don't support "Allow:" or wildcard paths, because they are not part of the original Standard for Robot Exclusion. That leaves you with the on-page (HTML meta tag) robots control method, which may or may not be applicable to your situation. Or look into the X-Robots-Tag HTTP header, but again, this is not supported by all robots.
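For crawlers that do support them, "Allow:" and wildcards let the brand pages and the clickouts share one tree. RFC 9309 resolves conflicting rules by using the longest matching rule, with Allow preferred on a tie. The function below is a simplified, hypothetical sketch of that matching logic (it skips percent-encoding normalization and is not any crawler's actual code); the brand page `/brands/acme/` and the clickout `/brands/acme/12345` are assumed examples.

```python
import re
from typing import List, Tuple


def _compile(pattern: str) -> "re.Pattern[str]":
    """Turn a robots.txt path pattern into a regex: '*' matches any run of
    characters and a trailing '$' anchors the end of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; on a tie, 'allow' beats 'disallow'.
    A path that matches no rule is allowed."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        # re.match anchors at the start of the path, giving prefix semantics.
        if _compile(pattern).match(path):
            key = (len(pattern), directive == "allow")
            if best is None or key > best:
                best = key
    return best is None or best[1]


# Hypothetical group: block everything under /brands/ except the brand page
# itself ('$' keeps the Allow from covering deeper paths such as clickouts).
RULES = [
    ("disallow", "/brands/"),
    ("allow", "/brands/acme/$"),
]

for p in ("/brands/acme/", "/brands/acme/12345", "/about/"):
    print(p, is_allowed(RULES, p))
```

With about 10 brands this needs one Allow line per brand. A crawler that ignores "Allow:" and wildcards sees only `Disallow: /brands/` and drops the brand pages as well.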
Really, the best approach is to consider file organization, spiderability, access-control, and cacheability as a fundamental part of directory-layout design...
That wouldn't hurt the brand pages themselves, would it?
Robots.txt uses prefix matching: any URL path that begins with the specified string is affected.
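To see what prefix matching means in practice, this sketch uses Python's `urllib.robotparser`, which applies Disallow paths as plain prefixes. The rules and URLs are hypothetical: a rule aimed at the clickouts under `/brands/acme/` also covers the brand page itself, and a rule written without a trailing slash catches more than one directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one aimed at the clickouts under a brand, one written
# without a trailing slash.
ROBOTS_TXT = """\
User-agent: *
Disallow: /brands/acme/
Disallow: /brands/private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    "/brands/acme/",           # the brand page: starts with "/brands/acme/", so blocked
    "/brands/acme/12345",      # a clickout: blocked, as intended
    "/brands/private-label/",  # starts with "/brands/private", so blocked as well
    "/brands/other/",          # matches neither prefix, so allowed
]
for url in checks:
    print(url, parser.can_fetch("*", url))
```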