
robots.txt: disallowing crawling of recurring folders

8:10 pm on Aug 29, 2011 (gmt 0)



I'm searching my heart out, but must not be using the correct terms to get to the following answer.

We have a large site that has plenty of subdirectories of similar construction.

Within each subdirectory there are specific folders that we want to block crawlers from.

I'm trying to ascertain how little I can get by with in the robots.txt file to achieve the desired results.

If the server has the four following structures:

www.server.com/Images/picture.gif
www.server.com/subdirectory1/Images/picture.gif
www.server.com/subdirectory2/Images/picture.gif
www.server.com/subdirectory3/andevenmorefolderstructure/Images/picture.gif

what disallow rule would tell the search engines not to crawl any "Images" folder?

Would */Images/ do it?

Or do I need both:

/Images/
*/Images/

to catch the root folder as well as all the more deeply buried ones?


Or must I specify the entire path to each folder I want blocked?
/Images/
/subdirectory1/Images/
/subdirectory2/Images/
/subdirectory3/andevenmorefolderstructure/Images/
8:30 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd



The pattern matching is "from the left".

So /images disallows any URL path "beginning with" /images

And /*images disallows any URL path "containing" images
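
For example, assuming Google-style wildcard support (and noting that robots.txt path matching is case-sensitive, so /images will not block your capitalized /Images/ folders), the two forms behave like this:

# blocks any path that begins with /images
Disallow: /images
# matches: /images/picture.gif, or a hypothetical /imagesbackup/file.gif
# does not match: /subdirectory1/images/picture.gif

# blocks any path that contains images anywhere
Disallow: /*images
# matches: /images/picture.gif AND /subdirectory1/images/picture.gif
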
8:41 pm on Aug 29, 2011 (gmt 0)



Thanks...

And I need to include the trailing slash ( /*images/ ) if I want to be sure I match only "images" folders and not "imagesofsomething" folders?
8:55 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd



Yes.
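
To illustrate, again assuming Google-style wildcards (the folder names here are hypothetical):

Disallow: /*images
# matches /images/pic.gif and /sub/images/pic.gif,
# but also /imagesofsomething/pic.gif

Disallow: /*images/
# matches /images/pic.gif and /sub/images/pic.gif,
# but not /imagesofsomething/pic.gif (no "images/" in that path)

One caveat: a folder whose name merely ends in "images", say a hypothetical /myimages/, still matches /*images/ because the path contains "images/".
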
10:34 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Administrator phranque



Also note that the wildcard/file-globbing patterns are not specified in the original Robots Exclusion Protocol; they are extensions supported by most of the big players, including Google.

In other words, don't expect every "well-behaved" bot to necessarily understand and respect your exclusions.
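
If that matters to you, one cautious approach is to list the literal paths as well, since a parser that only implements the original protocol does plain prefix matching and would treat the * literally:

User-agent: *
# literal prefix rules: understood by any spec-compliant parser
Disallow: /Images/
Disallow: /subdirectory1/Images/
Disallow: /subdirectory2/Images/
Disallow: /subdirectory3/andevenmorefolderstructure/Images/
# wildcard shorthand: honored by Google, Bing, and the other major engines
Disallow: /*Images/
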
2:14 am on Aug 30, 2011 (gmt 0)



I understand it's not honored by everything.

Basically, we have a SharePoint deployment, and every site below the domain has a similar structure. I do not want to spell out a disallow for every site.

If this gets me through the big dogs, that will be enough for me.
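
So if I've followed correctly, a minimal file like this should cover the major engines (assuming our "Images" folders are consistently capitalized, since robots.txt matching is case-sensitive):

User-agent: *
Disallow: /*Images/

# If some folder names merely end in "Images", the stricter pair
#   Disallow: /Images/
#   Disallow: /*/Images/
# limits the match to path segments named exactly "Images".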