Excluding a stub page with robots.txt

I have a CMS which generates pages without filename extensions. That's OK with me.

So in the following example 'page' is a file, not a directory. If I'm making sense... :/

http://www.example.com/directory/page

However, there are 'stubs' of information, or details from that page that I want to exclude from search engines with robots.txt.

e.g. in the following url, I want to exclude everything after 'page' (which I believe is now treated as a directory by SEs). However, I DO want to include the above URL, where 'page' is treated as a file.

http://www.example.com/directory/page/i-want-to-exclude-this.html

Am I right in saying that the following will do what I want? Or do I need to put some 'allow:' instructions in there?


User-agent: *
Disallow: /example.com/directory/page/


User-agent: Googlebot
Disallow: /example.com/directory/page/*

Once again, I want the following to be indexed by search engines;

http://www.example.com/directory/page

But I want the following to be excluded by search engines;

http://www.example.com/directory/page/anything-after-this...

Thanks.

Excluding a stub page with robots.txt

Dealing with trailing forward slash

bouncybunny

Samizdata

bouncybunny

bouncybunny

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week