Forum Moderators: goodroi

Excluding a stub page with robots.txt

Dealing with trailing forward slash


bouncybunny

9:30 am on Feb 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a CMS which generates pages without filename extensions. That's OK with me.

So in the following example 'page' is a file, not a directory. If I'm making sense... :/

http://www.example.com/directory/page

However, there are 'stubs' of information (details from that page) that I want to exclude from search engines with robots.txt.

e.g. in the following URL, I want to exclude everything after 'page' (which I believe is now treated as a directory by SEs). However, I DO want the above URL, where 'page' is treated as a file, to be included.

http://www.example.com/directory/page/i-want-to-exclude-this.html

Am I right in saying that the following will do what I want? Or do I need to put some 'allow:' instructions in there?


User-agent: *
Disallow: /example.com/directory/page/


User-agent: Googlebot
Disallow: /example.com/directory/page/*

Once again, I want the following to be indexed by search engines:

http://www.example.com/directory/page

But I want the following to be excluded by search engines:

http://www.example.com/directory/page/anything-after-this...

Thanks.

Samizdata

12:41 pm on Feb 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My understanding is that the directive in this case is always "starts with", so:

User-agent: *
Disallow: /directory/page/

should do exactly what you want for all compliant robots.
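That "starts with" behaviour is easy to sanity-check with Python's standard-library robots.txt parser (a quick test harness, not part of the thread; Python's parser applies plain prefix matching, which is all that matters here):

```python
from urllib.robotparser import RobotFileParser

# The corrected rule from the reply above, fed straight to the parser.
rules = """\
User-agent: *
Disallow: /directory/page/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The page itself does not start with the trailing-slash prefix,
# so it stays fetchable...
print(rp.can_fetch("*", "http://www.example.com/directory/page"))  # True

# ...while anything beneath it is blocked.
print(rp.can_fetch(
    "*", "http://www.example.com/directory/page/i-want-to-exclude-this.html"
))  # False
```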

bouncybunny

2:27 pm on Feb 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, of course I shouldn't have included the domain. Was half asleep when I posted this.

I just wondered whether I need to allow anything, such as:

User-agent: *
Allow: /directory/page
Disallow: /directory/page/
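Modelling Google's documented precedence suggests the extra Allow line is harmless but unnecessary: `Disallow: /directory/page/` never matches `/directory/page` in the first place, and for Googlebot the longest (most specific) matching rule wins anyway. A small sketch of that longest-match rule (`googlebot_allowed` is a made-up helper for illustration, not a real API, and `stub.html` is an illustrative filename):

```python
# Illustrative model of Google's documented Allow/Disallow precedence:
# the longest matching prefix wins. (Ties, which Google resolves in
# favour of Allow, are ignored here for brevity.)

def googlebot_allowed(path, rules):
    """rules: list of ('allow' | 'disallow', prefix) pairs."""
    verdict, matched = "allow", ""  # no matching rule at all means allowed
    for kind, prefix in rules:
        if path.startswith(prefix) and len(prefix) > len(matched):
            verdict, matched = kind, prefix
    return verdict == "allow"

RULES = [("allow", "/directory/page"), ("disallow", "/directory/page/")]

# Only the Allow prefix matches the page itself...
print(googlebot_allowed("/directory/page", RULES))            # True
# ...but the longer Disallow prefix wins for anything beneath it.
print(googlebot_allowed("/directory/page/stub.html", RULES))  # False
```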

bouncybunny

12:41 am on Feb 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found a problem with this.

In cases where I had a subdirectory, I was banning everything below the first directory.

For example:

http://www.example.com/directory/page/page2/anything-after-this...

I essentially need to disallow every URL in this section of my site where the page name is followed by a slash, and allow all the others.
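For Googlebot specifically, Google documents two pattern extensions that might help here: the `*` wildcard and the `$` end-of-URL anchor, combined with the longest-match precedence above. A sketch along these lines (untested, Googlebot-only; `page2` is the sub-page from the example URL above):

```
User-agent: Googlebot
# $ anchors the match at the end of the URL, so the sub-page
# itself is allowed...
Allow: /directory/page/page2$
# ...while everything else under the page stays blocked.
Disallow: /directory/page/
```

Robots that only implement the original exclusion standard ignore wildcards and Allow lines, so they would need a plain per-path Disallow list instead.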