Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt wildcard matching
How wild is it?
kwasher

10+ Year Member



 
Msg#: 553 posted 7:38 pm on Feb 19, 2005 (gmt 0)

In this robots.txt example...

User-agent: *
Disallow: /123

This would DISallow any folder named 123, and any file named 123.

The question:

Would it also disallow a folder named 12345 and files named 1234, 12345, etc.?

Thanks!

--Kenn

 

jatar_k

WebmasterWorld Administrator. jatar_k is a WebmasterWorld Top Contributor of All Time. 10+ Year Member



 
Msg#: 553 posted 7:40 pm on Feb 19, 2005 (gmt 0)

User-agent: *
Disallow: /123

This actually says:

Disallow all user agents from indexing the folder 123, or a file of the same name, located in the root directory, and anything below it.

So, no, it won't affect any other folder that might have 123 at the beginning of the directory name.

kwasher

10+ Year Member



 
Msg#: 553 posted 7:52 pm on Feb 19, 2005 (gmt 0)

Leaving off the trailing slash induces a wildcard nature. (I think I read that on a cereal box somewhere <grin>) So you are saying that pretty much it's not so 'wild'?

-Kenn (who sometimes mixes up the rules and might have been thinking 'htaccess style' as in <Files .htaccess*>)

jatar_k

WebmasterWorld Administrator. jatar_k is a WebmasterWorld Top Contributor of All Time. 10+ Year Member



 
Msg#: 553 posted 7:57 pm on Feb 19, 2005 (gmt 0)

I believe, and I could be wrong, that the wild nature means it will match a directory or a file of that name regardless of its extension, so it is wild, but not in the way you were thinking.

I did check here just to be sure
[searchengineworld.com...]

kwasher

10+ Year Member



 
Msg#: 553 posted 8:04 pm on Feb 19, 2005 (gmt 0)

This place

[thesitewizard.com...]

implies that files will be matched (though it does not explicitly say that folders will be matched).


Remember to add the trailing slash ("/") if you are indicating a directory. If you simply add
User-agent: *
Disallow: /privatedata

the robots will be disallowed from accessing privatedata.html as well as privatedataandstuff.html as well as the directory tree beginning from /privatedata/ (and so on). In other words, there is an implied wildcard character following whatever you list in the Disallow line.

--Kenn
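To make the trailing-slash point in the quoted passage concrete, here is a minimal sketch using Python's standard urllib.robotparser module, which compares Disallow paths by plain prefix. The ExampleBot user agent and the make_parser and blocked helpers are made up for illustration, and real crawlers are not guaranteed to behave the same way as this parser.

```python
from urllib.robotparser import RobotFileParser


def make_parser(disallow_path):
    """Build a parser for a one-rule robots.txt (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return parser


def blocked(parser, path):
    """Return True if the made-up ExampleBot may NOT fetch `path`."""
    return not parser.can_fetch("ExampleBot", path)


without_slash = make_parser("/privatedata")   # no trailing slash
with_slash = make_parser("/privatedata/")     # trailing slash

paths = [
    "/privatedata.html",
    "/privatedataandstuff.html",
    "/privatedata/secret.html",
]

for path in paths:
    print(path,
          "| without slash blocked:", blocked(without_slash, path),
          "| with slash blocked:", blocked(with_slash, path))
```

Under this parser, the rule without the slash blocks all three paths, while the rule with the slash blocks only the path inside the /privatedata/ directory.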

jatar_k

WebmasterWorld Administrator. jatar_k is a WebmasterWorld Top Contributor of All Time. 10+ Year Member



 
Msg#: 553 posted 8:11 pm on Feb 19, 2005 (gmt 0)

I have always done all files and directories explicitly so I have never worried about it.

So, for your original question, the answer would then be that it will wildcard-match files but, as stated in that link, not directories.

kwasher

10+ Year Member



 
Msg#: 553 posted 8:22 pm on Feb 19, 2005 (gmt 0)

Yes. I'm still hmmmming about the directories part (unix rules?)....


"I have always done all files and directories explicitly so I have never worried about it."

Me neither, but since robots.txt is human readable, it's a good way to tell hacker kiddies what folders you don't want them in.

Thanks for replying!

--Kenn (and then there is the argument about using * in the disallow... some places say yes, some say no....)
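On the question of * inside a Disallow line: the original robots.txt convention treats the path as a plain prefix, while several major crawlers document an extension in which * matches any run of characters and a trailing $ anchors the end of the URL (this was later written into RFC 9309). The following is only a rough sketch of that extended matching, not any crawler's actual code; pattern_to_regex and is_blocked are hypothetical names.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Disallow pattern into a compiled regex (hypothetical helper).

    '*' matches any sequence of characters; a trailing '$' anchors the end.
    Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if `path` matches the Disallow `pattern`."""
    # re.match anchors at the start of the path, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


for pattern, path in [
    ("/123", "/12345"),
    ("/*.pdf$", "/docs/a.pdf"),
    ("/*.pdf$", "/docs/a.pdf.html"),
    ("/private*/", "/private-stuff/x"),
]:
    print(pattern, path, is_blocked(pattern, path))
```

A crawler that does not implement the extension may read the * literally or ignore the line, which is probably why some places say yes and some say no.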

LowLevel

10+ Year Member



 
Msg#: 553 posted 12:51 am on Feb 21, 2005 (gmt 0)

"Disallow: /123" will disallow any file or directory whose URL path starts with "/123".
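One way to check the prefix behaviour for the rule in the original question is Python's standard urllib.robotparser module. This is a minimal sketch: the ExampleBot user agent, the is_allowed helper, and the sample paths are assumptions for illustration, and it shows how this one parser reads the rule rather than how every crawler does.

```python
from urllib.robotparser import RobotFileParser

# The rules from the original question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /123
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if the (made-up) agent may fetch `path`."""
    return parser.can_fetch(agent, path)


sample_paths = [
    "/123",
    "/123/",
    "/123/page.html",
    "/123.html",
    "/1234",
    "/12345/index.html",
    "/12",
    "/abc/123",
]

for path in sample_paths:
    print(path, "allowed" if is_allowed(path) else "disallowed")
```

With this parser, /1234 and /12345/index.html come back disallowed along with /123 itself, while /12 and /abc/123 stay allowed, since only the start of the path is compared.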

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved