Forum Moderators: goodroi


Disallow /file.php same as Disallow file.php?

         

One_on_One

4:52 am on May 26, 2005 (gmt 0)

10+ Year Member



I have a network of sites where I want to disallow a file called buy.php, but that file lives in a different directory depending on the domain. For instance, I might want to disallow /abc/buy.php, or just /buy.php, or /def/buy.php.

Can I just use:

Disallow: buy.php

Or do I have to create different robots.txt files for the files in different subdirectories?

Reid

10:13 am on May 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[webmasterworld.com...]

Look at msg #10 in this thread for a classic explanation of robots.txt posted by jdMorgan.

There is only one robots.txt, and it is located in the root directory.

Try:
Disallow: /*buy.php

Each construct needs to start with the root /, but the wildcard * will match any directories between the root "/" and buy.php. Note the footnote on jdMorgan's post, though: you should not use the wildcard in a user-agent: * construct, because it is not widely supported. So instead you should do:


user-agent: googlebot
disallow: /*buy.php

user-agent: otherbotswhosupportthis
disallow: /*buy.php

user-agent: *
disallow:

Or else you need to list each path explicitly:


user-agent: *
disallow: /buy.php
disallow: /directory/buy.php
disallow: /dir/buy.php

And if buy.php is the only .php file, then there is another way:

disallow: /*.php
# this will disallow all .php files in any directory

The same caveat about user-agent: * applies here, though.
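The wildcard rules above can be sanity-checked with a short script. This is only a sketch of the matching behaviour described in this thread, not any bot's actual code: it assumes Googlebot-style semantics where * matches any run of characters and the pattern is anchored at the start of the path (the function name is made up for illustration).

```python
import re

def googlebot_style_match(pattern, path):
    """Sketch of wildcard Disallow matching: '*' matches any characters,
    and the pattern is anchored at the start of the URL path."""
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None

# /*buy.php catches buy.php in the root and in any subdirectory...
print(googlebot_style_match("/*buy.php", "/buy.php"))       # True
print(googlebot_style_match("/*buy.php", "/abc/buy.php"))   # True
# ...but not other files
print(googlebot_style_match("/*buy.php", "/buy.html"))      # False
# /*.php catches every .php file in any directory
print(googlebot_style_match("/*.php", "/def/index.php"))    # True
```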

Dijkgraaf

10:24 pm on Jun 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



According to the Robots Exclusion Protocol standard posted at [robotstxt.org...], wildcards are not allowed (and hence not supported by most bots).

"Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif"."

So you will have to have:

disallow: /abc/buy.php
disallow: /buy.php
disallow: /def/buy.php
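For bots that follow the original standard, Python's stdlib robotparser is a quick way to confirm that literal-prefix rules like these do what you expect, and that wildcards are not honoured (example.com and SomeBot are placeholder names):

```python
from urllib import robotparser

# the literal, no-wildcard robots.txt suggested above
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /abc/buy.php",
    "Disallow: /buy.php",
    "Disallow: /def/buy.php",
])

# listed paths are blocked by literal prefix match...
print(rp.can_fetch("SomeBot", "http://example.com/buy.php"))      # False
print(rp.can_fetch("SomeBot", "http://example.com/def/buy.php"))  # False
# ...but an unlisted directory slips through: no wildcard support
print(rp.can_fetch("SomeBot", "http://example.com/xyz/buy.php"))  # True
```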

Dijkgraaf

11:11 pm on Jun 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot does support one wildcard feature:

[google.com...]
How do I block all crawlers except Googlebot from my site?
The following robots.txt file will achieve this

User-agent: Googlebot
Disallow: /*?

Reid

3:03 am on Jun 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How do I block all crawlers except Googlebot from my site?
The following robots.txt file will achieve this

User-agent: Googlebot
Disallow: /*?

Why would other bots use the directives under user-agent: Googlebot?
This robots.txt would just cause all other bots to assume that everything is allowed. Same as:

User-agent: Googlebot
Disallow: /*?

User-agent: *
disallow:

You just have to remember that wildcards in the disallow line are only supported by a few bots, Googlebot included.
So if you want to stop Google from listing your dynamic pages, while Yahoo and MSN are already OK, then

user-agent: googlebot
disallow: /*?

would have the desired effect.
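The /*? rule can be checked the same way. The sketch below is a made-up helper mirroring the wildcard behaviour discussed in this thread; note it must be tested against the path plus query string, since the ? only appears there:

```python
import re

def wildcard_disallowed(pattern, path_and_query):
    # hypothetical helper: '*' matches any characters, prefix-anchored
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path_and_query) is not None

# /*? blocks any URL containing a '?', i.e. dynamic pages
print(wildcard_disallowed("/*?", "/page.php?id=1"))  # True
print(wildcard_disallowed("/*?", "/page.php"))       # False
```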

Dijkgraaf

3:43 am on Jun 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Reid

That question was part of text I copied from Google's FAQ, not mine; maybe I made an error when I copied it. It didn't actually add anything to answering One_on_One's original question, so I probably shouldn't have tried clarifying my point about Google and wildcards :-)