Forum Moderators: goodroi

Block sub directory

         

maxbear

7:42 pm on Jul 27, 2007 (gmt 0)

10+ Year Member



I want to use robots.txt to block a subdirectory, but I am confused by the "/".

If I want to block "welcome-to-my-site": [site.com...]

Which disallow should I use?

1. Disallow: /news/welcome-to-my-site/ I know this one works 100%.

2. Disallow: /welcome-to-my-site I am not sure about this one. As far as I know, without the trailing "/", everything that starts with /welcome-to-my-site will be blocked.

3. Disallow: /welcome-to-my-site/ I am not sure about this one.

4. Disallow: /*/welcome-* I think this one works for the Yahoo and Google bots only.

Any suggestions or ideas? Thanks.

Marshall

8:01 pm on Jul 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



User-agent: *
Disallow: /news/welcome-to-my-site/

That should block all robots from that directory. If you want to block the whole "news" folder, use
User-agent: *
Disallow: /news/

It is best to start with the top folder and work down, excluding the root folder that is.

Marshall

maxbear

4:03 am on Jul 28, 2007 (gmt 0)

10+ Year Member



Thanks.

But in some situations I don't want to put the full path, so I am wondering whether there is any other way to do it besides the full path.

Marshall

4:52 am on Jul 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This would tell Google not to crawl any URLs containing "news".

User-agent: *
Disallow: /*news*
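To see what a wildcard rule like this would catch, here is a rough Python sketch. It assumes Google-style semantics, where "*" matches any run of characters and a trailing "$" anchors the end of the URL. The function name wildcard_rule_matches and the test paths are made up for illustration, and robots that only follow the original standard may not interpret "*" this way at all.

```python
import re


def wildcard_rule_matches(pattern: str, path: str) -> bool:
    """Return True if `path` matches a wildcard Disallow pattern.

    Assumed semantics (Google-style, not the original standard):
    '*' matches any sequence of characters, a trailing '$' anchors the
    end of the path, and the pattern always matches from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# Hypothetical paths, chosen only to illustrate the matching.
for path in ("/news/welcome-to-my-site/", "/blog/newsletter.html", "/about.html"):
    print(path, wildcard_rule_matches("/*news*", path))
```

Note that under these assumptions the rule also catches paths such as /blog/newsletter.html, since "news" only has to appear somewhere in the URL.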

Marshall

maxbear

5:24 am on Jul 28, 2007 (gmt 0)

10+ Year Member



Thanks. There is a reason I can't put the real path "news" in the Disallow line. I will explain why.

I use one WordPress installation to host many sites. All the sites point to the same root directory (e.g. /home/user/docs). WordPress can handle many sites without a problem if you modify a few things in config.php.

So the paths might look like this (they all share the same real path, e.g. /home/user/docs):

[site1.com...]
[site2.com...]
[site3.com...]
....

So it's not practical for me to list all the real paths in robots.txt.

That's why I am wondering whether there is any other way to block that directory.

I checked [robotstxt.org...]

Disallow :

The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved.

For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
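The "starts with" rule quoted above can be written out as a small Python sketch. This only models the plain prefix matching described in that text, with no wildcards; the function name is_disallowed is just an illustrative helper, not part of any library.

```python
def is_disallowed(rule_path: str, url_path: str) -> bool:
    """Plain prefix matching: a URL is blocked if it starts with the rule value.

    An empty Disallow value blocks nothing, so it is treated as no match.
    """
    return bool(rule_path) and url_path.startswith(rule_path)


# The two rules and two URLs from the quoted example.
for rule in ("/help", "/help/"):
    for path in ("/help.html", "/help/index.html"):
        verdict = "blocked" if is_disallowed(rule, path) else "allowed"
        print(f"Disallow: {rule:<7} {path:<17} -> {verdict}")
```

The only difference between the two rules is /help.html: it starts with "/help" but not with "/help/".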

So if I apply the above principle and put:

Disallow: /welcome-to-my-site I don't know whether it will block the subdirectory.
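One way to check this locally is with Python's standard urllib.robotparser module, which as far as I know does plain prefix matching and does not expand wildcards. The sketch below is only a rough test under that assumption; the three paths are examples I made up, and real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The rule in question, applied to every robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /welcome-to-my-site
"""


def can_crawl(path: str) -> bool:
    """Parse the rules above and ask whether a generic robot may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)


# Hypothetical paths: a top-level match, a look-alike name, and the nested directory.
for path in (
    "/welcome-to-my-site/",
    "/welcome-to-my-site-old/",
    "/news/welcome-to-my-site/",
):
    print(path, "->", "allowed" if can_crawl(path) else "blocked")
```

With prefix matching, the rule only applies to URLs whose path begins with /welcome-to-my-site, so a directory nested under /news/ would not be covered by it.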