Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Can I disallow an entire directory but one file?

walkman

msg:1526243
 5:06 am on Feb 24, 2004 (gmt 0)

There are too many files to list, plus I'd be advertising them, and some are best left unlisted. However, I'd like to have search engines follow the links in one file from that directory.

Is there a way, other than listing them one by one?

thanks,

closed




msg:1526244
 8:14 pm on Feb 25, 2004 (gmt 0)

Yeah. You could have a Disallow line for the directory, then an Allow line right after it for the file. I'm fairly sure Googlebot supports that. You'd have to do your research to see which other robots support Allow statements that way, though.
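The Disallow-plus-Allow idea can be sanity-checked with Python's standard-library `robotparser` (a sketch; the file names are the ones used later in this thread). Note that Python's parser applies rules in file order (first match wins), so the Allow line is placed before the Disallow line here; Google instead picks the longest matching rule, so for Googlebot the order would not matter.

```python
from urllib.robotparser import RobotFileParser

# Allow before Disallow: required for first-match parsers like Python's;
# harmless for longest-match parsers like Google's.
robots_txt = """\
User-agent: *
Allow: /directory/b2433zz.html
Disallow: /directory/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
rp.modified()  # mark the rules as "fetched" so can_fetch() gives real answers

print(rp.can_fetch("*", "/directory/b2433zz.html"))  # the one kept file
print(rp.can_fetch("*", "/directory/a1234xx.html"))  # everything else in /directory/
```

The one file comes back fetchable and the rest of the directory does not, at least under this parser's semantics.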

tschild




msg:1526245
 8:59 pm on Feb 25, 2004 (gmt 0)

Another way (one that does not use the nonstandard Allow directive) is to exploit the fact that a Disallow directive matches any file whose path begins with the specified string.

Example: your files are

/directory/a1234xx.html
/directory/b2345yy.html
/directory/b2433zz.html
/directory/c8768aa.html

You want to block spidering of all those files except for /directory/b2433zz.html

Solution:


User-agent: *
Disallow: /directory/a
Disallow: /directory/b23
Disallow: /directory/c

This disallows all other files without specifying their full path.
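The prefix trick above can be verified the same way with Python's standard-library `robotparser` (a sketch; paths are tschild's example files):

```python
from urllib.robotparser import RobotFileParser

# Disallow values are plain prefixes: each rule blocks every path
# that starts with it.
robots_txt = """\
User-agent: *
Disallow: /directory/a
Disallow: /directory/b23
Disallow: /directory/c
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
rp.modified()  # mark the rules as "fetched" so can_fetch() gives real answers

for path in ("/directory/a1234xx.html", "/directory/b2345yy.html",
             "/directory/b2433zz.html", "/directory/c8768aa.html"):
    print(path, rp.can_fetch("*", path))
```

Only /directory/b2433zz.html survives: it starts with /directory/b24, which no Disallow prefix matches.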

walkman




msg:1526246
 9:21 pm on Feb 25, 2004 (gmt 0)

No need for a * at the end? Just the first letter or two, and every file under /directory is excluded?
I'll do a through z and just leave the o out; that's the letter my file starts with.

This seems like a nice workaround. The Allow directive didn't pass the robots.txt validator.

thanks,

BarkerJr




msg:1526247
 3:43 am on Feb 26, 2004 (gmt 0)
No, don't use an asterisk anywhere unless the asterisk character is actually in the filename. Most spiders do not support wildcards anywhere in robots.txt (too expensive?). The asterisk in the User-agent line is not a wildcard; it's just a character that represents all spiders.
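The matching rule BarkerJr describes is simple enough to write out; a minimal sketch of classic (pre-wildcard) robots.txt matching, where `is_allowed` is a hypothetical helper, not part of any library:

```python
def is_allowed(path, disallow_prefixes):
    """Classic robots.txt matching: a URL path is blocked iff it
    starts with any Disallow value. A '*' inside a rule would be
    a literal character, not a wildcard."""
    return not any(path.startswith(prefix) for prefix in disallow_prefixes)

print(is_allowed("/directory/b2433zz.html", ["/directory/a", "/directory/b23"]))
print(is_allowed("/directory/a1234xx.html", ["/directory/a"]))
print(is_allowed("/directory/a1234xx.html", ["/directory/*"]))  # '*' matches nothing here
```

The last line shows why the asterisk buys nothing under prefix matching: no real path starts with a literal `*`.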
walkman




msg:1526248
 4:13 am on Feb 26, 2004 (gmt 0)

Thank you. Done. I have blocked
/dir/a through /dir/z except the one I need.
All in the correct format, of course... validated it and everything. Thanks again for your help.
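Walkman's a-through-z block is easy to generate rather than type by hand. A sketch, assuming (per the thread) the directory is /dir/ and the one file to keep crawlable starts with "o":

```python
import string

keep_letter = "o"  # first letter of the one file to keep crawlable (assumption from the thread)

lines = ["User-agent: *"]
lines += [f"Disallow: /dir/{c}" for c in string.ascii_lowercase if c != keep_letter]
robots_txt = "\n".join(lines)
print(robots_txt)
```

One caveat: this also leaves every *other* file beginning with "o" in that directory crawlable, so it only works cleanly when the kept file's first letter is unique there; otherwise use a longer prefix, as tschild's /directory/b23 example does.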

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved