robots.txt wildcard matching

How wild is it?

   
7:38 pm on Feb 19, 2005 (gmt 0)

10+ Year Member



In this robots.txt example...

User-agent: *
Disallow: /123

This would DISallow any folder named 123, and any file named 123.

The question:

Would it also disallow the folder named 12345 and the files named 1234, 12345, etc.?

Thanks!

--Kenn

7:40 pm on Feb 19, 2005 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



User-agent: *
Disallow: /123

this actually says

disallow all user agents from indexing the folder 123, or a file of the same name, located in the root directory and anything below it

so, no, it won't affect any other folder that might have 123 in the beginning of the directory name

7:52 pm on Feb 19, 2005 (gmt 0)

10+ Year Member



Leaving off the trailing slash is supposed to give it a wildcard nature. (I think I read that on a cereal box somewhere <grin>) So you are saying it's not really so 'wild'?

-Kenn (who sometimes mixes up the rules and might have been thinking 'htaccess style' as in <Files .htaccess*>)

7:57 pm on Feb 19, 2005 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I believe, and I could be wrong, that the wild nature means it will match a directory or a file of that name regardless of its extension, so it is wild, but not in the way you were thinking.

I did check here just to be sure
[searchengineworld.com...]

8:04 pm on Feb 19, 2005 (gmt 0)

10+ Year Member



This place

[thesitewizard.com...]

implies that files will be matched (though it does not explicitly say that folders will be matched.)


Remember to add the trailing slash ("/") if you are indicating a directory. If you simply add
User-agent: *
Disallow: /privatedata

the robots will be disallowed from accessing privatedata.html as well as privatedataandstuff.html as well as the directory tree beginning from /privatedata/ (and so on). In other words, there is an implied wildcard character following whatever you list in the Disallow line.

--Kenn
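The "implied wildcard" described in that quote amounts to a plain prefix comparison on the URL path. Here is a minimal Python sketch under that assumption (the `is_disallowed` helper is hypothetical, not part of any library) showing why the trailing slash matters:

```python
# Minimal sketch of the classic robots.txt rule: a Disallow value matches
# any URL path that begins with it (plain prefix comparison, no globbing).


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow value."""
    # An empty "Disallow:" line means "allow everything", so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


if __name__ == "__main__":
    paths = [
        "/privatedata.html",
        "/privatedataandstuff.html",
        "/privatedata/index.html",
        "/public/privatedata.html",
    ]
    # Compare the rule without and with the trailing slash.
    for rules in (["/privatedata"], ["/privatedata/"]):
        print(rules)
        for p in paths:
            print(f"  {p:28} blocked={is_disallowed(p, rules)}")
```

With `/privatedata` the first three paths come back blocked; with `/privatedata/` only the file inside the directory does. A path that merely contains the text further along (`/public/privatedata.html`) is never matched.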

8:11 pm on Feb 19, 2005 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I have always done all files and directories explicitly so I have never worried about it.

so for your original question the answer would then be it will wildcard match files but, as stated in that link, not directories.

8:22 pm on Feb 19, 2005 (gmt 0)

10+ Year Member



Yes. I'm still hmmmming about the directories part (Unix rules?)....


I have always done all files and directories explicitly so I have never worried about it.

Me neither, but since robots.txt is human readable, it's a good way to tell hacker kiddies which folders you don't want them in.

Thanks for replying!

--Kenn (and then there is the argument about using * in the disallow... some places say yes, some say no....)
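On the `*` question: the original robots.txt convention has no pattern syntax in Disallow values, so a crawler following only that convention may treat `*` there as an ordinary character. Google and some other crawlers accept `*` (any run of characters) and a trailing `$` (end of path), and RFC 9309 later standardized both. Below is a rough Python sketch of that extended matching. It assumes only those two special characters. The helper names are hypothetical, and Allow/Disallow precedence is left out.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path value into a regex matched from the path start.

    Assumed extension syntax: '*' matches any run of characters and a
    trailing '$' anchors the end of the path. Everything else is literal,
    so a rule without '$' still behaves as a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal chunk, then rejoin the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(chunk) for chunk in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(path: str, rule: str) -> bool:
    # re.match only tries position 0, which gives the implied prefix behaviour.
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    for path, rule in [
        ("/12345", "/123"),
        ("/docs/report.pdf", "/*.pdf$"),
        ("/docs/report.pdf?page=2", "/*.pdf$"),
        ("/archive/123", "/*123"),
    ]:
        print(f"{rule:10} {path:26} {matches(path, rule)}")
```

A crawler that ignores the extension would instead look for a path literally beginning with `/*.pdf`, which is one reason the advice on `*` differs from place to place.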

12:51 am on Feb 21, 2005 (gmt 0)

10+ Year Member



"Disallow: /123" will disallow any file or directory whose path starts with the text "/123", so /1234.html and /12345/ are blocked along with /123 itself.
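One way to check this against an actual parser is Python's standard-library `urllib.robotparser`, which compares Disallow values as simple path prefixes. Individual search-engine crawlers may behave differently. `example.com` and `AnyBot` below are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the original question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /123
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical host; only the path part is compared against the rules.
for path in ["/123", "/123/", "/12345/", "/1234.html", "/12345", "/12", "/abc/123"]:
    allowed = parser.can_fetch("AnyBot", "https://example.com" + path)
    print(f"{path:12} {'allowed' if allowed else 'blocked'}")
```

Every path that begins with /123 comes back blocked, including /12345 and /1234.html, while /12 and /abc/123 stay allowed.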
 
