Read much, understood most, but one issue never seems to be adequately addressed in robots.txt docs - the trailing slash delimiter and its full power.
Everything I read says it's risky to omit it because
Disallow: /example
will also prevent robots from accessing my theoretical file under root called example.html - adding the trailing slash avoids that side effect.
------
Well, I'm unlikely to have an example.html file under root. However under /example I have more folders that I wish to keep private:
/example/assets
/example/liabilities
----
Which of the following statements or groups of statements will keep everything under /example from being indexed?
a) /example
b) /example/
c) /example/assets
/example/liabilities
d) or something different
Will the successful example also keep anything under assets & liabilities off limits too? Seems like inheritance should handle this, no?
Thank you so much for the expert answer I know is shortly forthcoming!
Israel
But yes, you are right that Disallow: /example would also disallow GET /example.html or anything else starting with /example.
I'd advise against giving directories the same name as other resources, as bots vary in how they handle this. I know of at least one search engine that will strip the trailing slashes from Disallow lines just to be safe.
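To make the "anything else starting with /example" point concrete, here is a minimal sketch of plain prefix matching. It assumes the original robots.txt convention (no wildcards, no Allow lines); the function name is_disallowed and the sample paths are made up for illustration, and individual bots may behave differently.

```python
def is_disallowed(path, disallow_prefixes):
    """Return True if path starts with any non-empty Disallow value.

    Simplified model: plain prefix comparison only. An empty Disallow
    value means "nothing is disallowed", so empty prefixes are skipped.
    """
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)


# Without the trailing slash, the rule also catches a same-named file.
print(is_disallowed("/example.html", ["/example"]))
# With the trailing slash, only paths inside the directory match.
print(is_disallowed("/example.html", ["/example/"]))
# Either form covers anything deeper in the directory.
print(is_disallowed("/example/assets/a.pdf", ["/example/"]))
```

Under this model the only practical difference between the two forms is whether things like /example.html (or the bare /example URL) get caught as well.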
Dijkgraaf,
I'm very careful to not name a directory the same as a page name, etc. However, I guess my real concern, which I didn't make too clear, is this:
Disallow: /example
Will the above prevent most robots from going any deeper into folders below example like:
/example/assets
/example/liabilities
-- or --
must those deeper folders also be mentioned in robots.txt?
---
Funny how I look at all the different robots.txt files on "authority" sites, which you'd think would be using the best method, and I see a variety of syntaxes in use!
Sometimes, the trailing slash
Sometimes, no trailing slash
Sometimes, the higher level folder as well as the folders beneath it are on their own lines
----
All I'm concerned with is this issue though:
I'm inclined to believe that Disallow: /example will prevent most robots from going any deeper into folders under /example. Would you agree?
Thanks,
Israel
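One way to check the three candidates from the original question is to run them through a parser. The sketch below uses Python's standard urllib.robotparser, which is only one implementation and not necessarily how any given search engine behaves; the bot name "ExampleBot" and the file names under /example are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The three candidate rule sets (a), (b) and (c) from the question.
CANDIDATES = {
    "a": ["Disallow: /example"],
    "b": ["Disallow: /example/"],
    "c": ["Disallow: /example/assets", "Disallow: /example/liabilities"],
}

# Hypothetical URLs at different depths under /example.
PATHS = [
    "/example/index.html",
    "/example/assets/2023/report.pdf",
    "/example/liabilities/loans.html",
]


def blocked_paths(rule_lines, paths, agent="ExampleBot"):
    """Return the subset of paths that the given rules disallow for agent."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + rule_lines)
    return [p for p in paths if not parser.can_fetch(agent, p)]


for label, rules in CANDIDATES.items():
    print(label, blocked_paths(rules, PATHS))
```

With this parser, (a) and (b) both block all three paths, deeper folders included, while (c) leaves /example/index.html open because only the two named subfolders are listed.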
User-agent: SomeAgent
Disallow: /
User-agent: *
Disallow: /
will cause Google to spider the entire site.
There MUST be a blank line BEFORE the SECOND User-agent declaration; otherwise, to Google at least, it looks like there is a Disallow rule only for "SomeAgent".
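If you want to guard against this, a small check along the following lines can flag records that run together. It is only a sketch: the helper name find_missing_separators is made up, and it simply reports any User-agent line that directly follows an Allow or Disallow line with no blank line in between. It says nothing about how a particular crawler actually parses the file.

```python
def find_missing_separators(robots_txt):
    """Return 1-based line numbers of User-agent lines that directly
    follow a rule line (Allow/Disallow) without a blank line between."""
    problems = []
    previous_was_rule = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            previous_was_rule = False  # a blank line closes the record
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line: not a separator, not a rule
        field = line.split(":", 1)[0].strip().lower()
        if field == "user-agent" and previous_was_rule:
            problems.append(number)
        previous_was_rule = field in ("allow", "disallow")
    return problems


run_together = "User-agent: SomeAgent\nDisallow: /\nUser-agent: *\nDisallow: /\n"
separated = "User-agent: SomeAgent\nDisallow: /\n\nUser-agent: *\nDisallow: /\n"

print(find_missing_separators(run_together))
print(find_missing_separators(separated))
```

For the example above, the check flags line 3 (the second User-agent line) and reports nothing once the blank line is added.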