Forum Moderators: goodroi


Prohibiting folders by name regardless of root relation

Is this possible or not?

         

Jeremy_H

8:01 am on Jan 13, 2006 (gmt 0)

10+ Year Member



Is it possible to prohibit access to all folders with a specific name, regardless of relation to the root?

Example, if all folders named "app" were off limits, but the "apples" folder would still be fair game?

/
/app/
/apples/
/apples/app/
/folder1/app/
/folder2/app/
/folder3/subfolder/app/

Thanks

Lord Majestic

1:39 pm on Jan 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Only URLs that start with one of the Disallow statements should be ignored.

In your example you disallowed the /apples/ folder in addition to the others; if you removed that line, your robots.txt would work as intended.
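To illustrate the prefix matching being described, here is a minimal robots.txt sketch using the paths from the example above:

```text
User-agent: *
# Blocks /app/ and everything under it.
Disallow: /app/
# The trailing slash matters: "Disallow: /app" (no slash) would
# also block /apples/, because /apples/ starts with /app.
# Note this rule does NOT block /folder1/app/ or /apples/app/ --
# matching is against the start of the URL path only.
```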

Jeremy_H

5:58 pm on Jan 13, 2006 (gmt 0)

10+ Year Member



Thanks Lord Majestic,

So are you saying it is possible to do this, but I need to structure my folders so the prohibited word doesn't come up anywhere else? Like renaming folders called "apples", because "apples" starts with "app"?

Does anybody know how I could go about writing this?

What if part of the prohibited word was in the domain name? Like "ExampleApps.com" or "ApplesAppsExample.com"? If the domain contains a prohibited string, does that mean there's a risk of blocking robots from the whole domain?

Thanks

Lord Majestic

6:25 pm on Jan 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Only the URI part after the domain name gets checked against Disallow directives, but you do indeed need to be careful about how you structure your directories. You can just create one directory (say, /norobots/), disallow it, and put everything you don't want crawled into that directory. This way your robots.txt stays very clear and needs less maintenance.
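Under that single-directory scheme (the /norobots/ name is just the suggestion above; any name works), the entire robots.txt stays this small no matter how much content you add:

```text
User-agent: *
# All assets that should stay uncrawled live under this one directory.
Disallow: /norobots/
```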

Jeremy_H

12:01 am on Jan 14, 2006 (gmt 0)

10+ Year Member



Thanks for your help.

The thing about my website is that it's broken down into different folders, and each folder has its own set of "assets", each in its own sub-folder.

Right now I have to go into the robots.txt file and create several entries each time I add more content.

I was hoping to just be able to create a set of rules, such as: if the folder is called "img", regardless of which folder it's in, then prohibit it.

Looks like doing this is either impossible or messy.

Which makes me wonder: is it possible to just prohibit all instances of certain types of files, like .jpg, .jpeg, .gif and .pdf files, regardless of their location?

Lord Majestic

2:38 pm on Jan 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it possible to just prohibit all instances of certain types of files, like .jpg, .jpeg, .gif and .pdf files, regardless of their location?

No - or to be exact, not for all bots: you would need wildcards for what you want, and they are explicitly not supported in the Disallow statement under the original standard, though some bots have extended the standard and do support wildcards.
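For the crawlers that do support the wildcard extension (Google's crawler is one documented example, with * matching any characters and $ anchoring the end of the URL), the rules would look like this; bots that only implement the original standard will ignore or misread these lines:

```text
User-agent: Googlebot
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.pdf$
```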

Write a small script that iterates through the directories holding your assets and generates robots.txt from the pattern you specify - just don't make the file too big; anything above 100k is certainly out of the question.
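A minimal sketch of such a generator script, in Python. The function name, the blocked directory name, and the output path are all placeholders for illustration; it walks the site root and emits one Disallow line per matching directory, which is the approach suggested above:

```python
import os


def generate_robots(root, blocked_name, out_path):
    """Walk the directory tree under `root` and write a robots.txt
    that disallows every directory named `blocked_name`,
    regardless of where it sits in the tree."""
    lines = ["User-agent: *"]
    for dirpath, dirnames, _ in os.walk(root):
        for d in dirnames:
            if d == blocked_name:
                rel = os.path.relpath(os.path.join(dirpath, d), root)
                # robots.txt paths use forward slashes; the trailing
                # slash keeps e.g. /apples/ from matching /app
                lines.append("Disallow: /" + rel.replace(os.sep, "/") + "/")
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Run it from a cron job or a post-publish hook so the file is regenerated whenever content is added, instead of editing robots.txt by hand each time.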