
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt disallow searching recurring folders
jchristi




msg:4356402
 8:10 pm on Aug 29, 2011 (gmt 0)

I'm searching my heart out, but must not be using the correct terms to get to the following answer.

We have a large site that has plenty of subdirectories of similar construction.

Within each subdir, there are specific folders that we want to disallow searches of.

I'm trying to ascertain how little I can get by with in the robots.txt file to achieve the desired results.

If the server has the four following structures:

www.server.com/Images/picture.gif
www.server.com/subdirectory1/Images/picture.gif
www.server.com/subdirectory2/Images/picture.gif
www.server.com/subdirectory3/andevenmorefolderstructure/Images/picture.gif

what disallow language would tell the search engines not to search any "Images" folder?

Would */Images/ do it?

or /Images/
*/Images/
to catch the root, and then all deeper buried folders?


or, must I specify entire subdir paths to the folders I want blocked?
/Images/
/subdirectory1/Images/
/subdirectory2/Images/
/subdirectory3/andevenmorefolderstructure/Images/

 

g1smd




msg:4356408
 8:30 pm on Aug 29, 2011 (gmt 0)

The pattern matching is "from the left".

So /images disallows any URL path "beginning" with /images

And /*images disallows any URL path "containing" images
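
To make the "from the left" behaviour concrete, here is a minimal Python sketch of how a wildcard-aware crawler might test a path against a Disallow rule. It is an illustration only, not any search engine's actual code: the function name rule_matches and the PATHS list are made up for this example, and the sketch assumes the commonly documented extensions where * matches any run of characters and a trailing $ anchors the end of the path. It uses the capitalised "Images" from the original question because major crawlers are documented as treating paths case-sensitively.

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt path rule matches the URL path.

    Hypothetical helper: '*' matches any run of characters, a trailing
    '$' anchors the end, and matching always starts at the left edge.
    """
    anchored_end = rule.endswith("$")
    if anchored_end:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored_end:
        pattern += "$"
    # re.match only matches at the start of the string ("from the left").
    return re.match(pattern, path) is not None

# The four structures from the original question, as URL paths.
PATHS = [
    "/Images/picture.gif",
    "/subdirectory1/Images/picture.gif",
    "/subdirectory2/Images/picture.gif",
    "/subdirectory3/andevenmorefolderstructure/Images/picture.gif",
]

for rule in ("/Images", "/*Images"):
    blocked = [p for p in PATHS if rule_matches(rule, p)]
    print(rule, "blocks", len(blocked), "of", len(PATHS), "paths")
```

Under these assumptions, /Images only catches the root folder, while /*Images catches all four paths.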

jchristi




msg:4356414
 8:41 pm on Aug 29, 2011 (gmt 0)

Thanks.

And do I need to include the trailing / (as in /*images/) if I want to be sure I match only "images" folders and not "imageofsomething" folders?

g1smd




msg:4356417
 8:55 pm on Aug 29, 2011 (gmt 0)

Yes.
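
A quick way to see what the trailing slash changes is to emulate the wildcard rule with Python's fnmatch module. This is only a rough stand-in, under the assumption that appending * to the rule approximates the crawler's prefix matching for simple paths; the folder names ImagesArchive and OldImages are invented for the example.

```python
from fnmatch import fnmatchcase

def blocked(rule: str, path: str) -> bool:
    # Hypothetical helper: appending '*' turns fnmatch's whole-string
    # match into a prefix match, similar to how Disallow rules work.
    # Only suitable for paths without '?' or '[' characters.
    return fnmatchcase(path, rule + "*")

paths = [
    "/subdirectory1/Images/picture.gif",         # the folder we want blocked
    "/subdirectory1/ImagesArchive/picture.gif",  # made-up lookalike folder
    "/subdirectory1/OldImages/picture.gif",      # made-up folder ending in "Images"
]

for rule in ("/*Images", "/*Images/"):
    for path in paths:
        print(rule, path, blocked(rule, path))
```

With the trailing slash, the ImagesArchive path is no longer caught. Note that /*Images/ still catches a folder whose name merely ends in "Images", such as OldImages; a rule like /*/Images/ alongside /Images/ would be stricter.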

phranque




msg:4356446
 10:34 pm on Aug 29, 2011 (gmt 0)

Also note that the wildcard/file-globbing patterns are not specified in the original robots exclusion protocol; they are extensions supported by most of the big players, including Google.
In other words, don't expect EVERY "well-behaved" bot to understand and respect your wildcard exclusions.
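
One concrete case of a parser that does not understand the wildcard extension is the robots.txt parser in Python's standard library, urllib.robotparser, which, as far as I know, only does plain prefix matching. The sketch below compares a fully spelled-out rule with the wildcard rule; the helper build_parser and the user-agent name ExampleBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def build_parser(disallow_rule: str) -> RobotFileParser:
    # Hypothetical helper: parse a one-rule robots.txt held in memory.
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return parser

URL = "http://www.server.com/subdirectory1/Images/picture.gif"

literal = build_parser("/subdirectory1/Images/")
wildcard = build_parser("/*Images/")

# can_fetch() returns True when the parser thinks the URL may be crawled.
print("literal rule allows fetch: ", literal.can_fetch("ExampleBot", URL))
print("wildcard rule allows fetch:", wildcard.can_fetch("ExampleBot", URL))
```

On the Python versions I am familiar with, the literal rule blocks the URL and the wildcard rule does not, so a crawler built on this parser would fetch the image despite the /*Images/ line.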

jchristi




msg:4356510
 2:14 am on Aug 30, 2011 (gmt 0)

I understand it's not honored by everything.

Basically, we have a SharePoint deployment, and every site below the domain has a similar structure. I do not want to spell out every disallow for every site.

If this gets me past the big dogs, that will be enough for me.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved