Forum Moderators: goodroi

Correct way to ban a directory

         

palmpal

9:54 pm on Jan 22, 2006 (gmt 0)

10+ Year Member



Hello,

I have the following in my robots.txt file:

User-agent: *
Disallow: /coppermine/

I've noticed that pages from my Coppermine Gallery are being indexed by Google. Is my directive incorrect?

Thanks!

Dijkgraaf

10:13 pm on Jan 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is the coppermine directory in the root of your web site?

If not, you need to give the full path from the site root, for example:
Disallow: /dir1/dir2/coppermine

Note that I dropped the trailing slash as well;
see this thread [webmasterworld.com] as to why I did that.

palmpal

10:41 pm on Jan 22, 2006 (gmt 0)

10+ Year Member



Hello and thanks for replying.

My Coppermine directory is located here:

http:// www.domain.com/coppermine

I set the disallow the same as I did for my images directory.

Thanks

maccas

12:22 am on Jan 23, 2006 (gmt 0)

10+ Year Member



I always thought it was

Disallow: /directory/

[searchengineworld.com...]

If it is, then Google is completely ignoring it. According to them, it is

Disallow: /directory

[google.com...]

Dijkgraaf

3:18 am on Jan 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, it certainly looks right the way you have it.
Have you run the robots.txt validator across it?
(Link at the top of the forum.)

The Disallow line tells spiders/bots not to request any URL path beginning with the value you specify.

A URL can point at www.example.com/directory without a trailing slash.
Some bots do not match that URL against a Disallow: /directory/ instruction, which can cause issues.
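
To make the "beginning with" rule concrete, here is a minimal sketch in Python of plain prefix matching. The function name is_disallowed is made up for illustration, and this is only an assumption about how a simple bot might match; it is not any particular search engine's code.

```python
def is_disallowed(path, disallow_rules):
    """Return True if the path begins with any non-empty Disallow value.

    Hypothetical helper: plain string-prefix matching, with no wildcard
    or Allow handling. An empty Disallow value blocks nothing.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


with_slash = ["/directory/"]     # Disallow: /directory/
without_slash = ["/directory"]   # Disallow: /directory

# A page inside the directory is covered by the rule with the trailing slash.
print(is_disallowed("/directory/page.htm", with_slash))

# The bare directory URL (no trailing slash) does not begin with "/directory/".
print(is_disallowed("/directory", with_slash))

# Without the trailing slash in the rule, the bare URL is covered as well.
print(is_disallowed("/directory", without_slash))
```

Under this matching, the second check is the case that slips through: the URL without the trailing slash is not a prefix match for the rule with it.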

Some other bots are known to ignore the trailing / in the Disallow value when matching, which can cause a different issue if you have a directory and another resource whose name starts with the same string,
e.g.
disallow: /myinfo
would disallow both a directory called myinfo and a page called myinfo.htm.
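
If you want to check this locally, the sketch below uses Python's standard-library urllib.robotparser to compare the two forms of the rule. The helper name allowed_paths and the test paths are made up for illustration. It shows how Python's parser matches, which is not necessarily how Googlebot or any other bot behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt, paths, agent="*"):
    """Map each path to True (may be fetched) or False (disallowed).

    Hypothetical helper around the standard-library parser; it parses the
    robots.txt text directly instead of fetching it from a site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


paths = ["/myinfo", "/myinfo/", "/myinfo/page.htm", "/myinfo.htm"]

# Rule without the trailing slash: also catches myinfo.htm.
broad = allowed_paths("User-agent: *\nDisallow: /myinfo\n", paths)

# Rule with the trailing slash: only paths inside the directory.
narrow = allowed_paths("User-agent: *\nDisallow: /myinfo/\n", paths)

for path in paths:
    print(path, "broad:", broad[path], "narrow:", narrow[path])
```

With this parser, the rule without the slash blocks all four paths, while the rule with the slash leaves /myinfo and /myinfo.htm fetchable.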