Need Trailing Slash Clarification

         

Israel

7:01 pm on Mar 20, 2006 (gmt 0)

10+ Year Member



Hi Ladies & Gents,

Read much, understood most, but one issue never seems to be adequately addressed in robots.txt docs - the trailing slash delimiter and its full power.

Everything I read says it's risky to omit it because

Disallow: /example

will also prevent robots from accessing my theoretical file under root called example.html - adding the trailing slash prevents that.

------

Well, I'm unlikely to have an example.html file under root. However under /example I have more folders that I wish to keep private:

/example/assets
/example/liabilities

----

Which of the following statements or groups of statements will keep everything under /example from being indexed?

a) /example

b) /example/

c) /example/assets
/example/liabilities

d) or something different

Will the successful example also keep anything under assets & liabilities off limits too? Seems like inheritance should handle this, no?

Thank you so much for the expert answer I know is shortly forthcoming!

Israel

Dijkgraaf

9:11 pm on Mar 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The reason to avoid the trailing slash is that, for a request like GET /example where example is a directory, some web servers will serve up the default page of that directory (rather than a redirect). As /example does not begin with /example/, it would not match the rule Disallow: /example/ and so would not be disallowed.

But yes you are right in that disallow: /example would also disallow GET /example.html or anything else starting with /example

I'd advise against giving directories the same name as other resources, as bots vary in how they handle this. I know of at least one search engine that strips the trailing slashes from Disallow rules just to be safe.

Israel

12:25 pm on Mar 24, 2006 (gmt 0)

10+ Year Member



I'd advise against giving directories the same name as other resources, as bots vary in how they handle this.

Dijkgraaf,

I'm very careful to not name a directory the same as a page name, etc. However, I guess my real concern, which I didn't make too clear, is this:

Disallow: /example

Will the above prevent most robots from going any deeper into folders below example like:

/example/assets
/example/liabilities

-- or --

must those deeper folders also be mentioned in robots.txt?

---

Funny how I look at the robots.txt files on "authority" sites, which you'd think would be using the best method, and I see a variety of syntaxes in use!

Sometimes, the trailing slash

Sometimes, no trailing slash

Sometimes, the higher level folder as well as the folders beneath it are on their own lines

----
All I'm concerned with is this issue though:

I'm inclined to believe that Disallow: /example will prevent most robots from going any deeper into folders under /example. Would you agree?

Thanks,

Israel

Dijkgraaf

6:06 am on Mar 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Disallow: /example
will stop the robot requesting anything beginning with
/example
So yes, it would also stop it requesting sub directories.

g1smd

8:00 pm on Apr 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any URL that starts with exactly these characters will not be accessed: /example

Reid

11:44 pm on Apr 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



robots.txt is based on -prefix matching- meaning that any URL with a prefix matching the string you give (starting with the root) will be disallowed.
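The prefix matching described above can be sketched in a few lines of Python. This is only an illustration of plain prefix matching as discussed in this thread, not any particular robot's implementation; the function name is_disallowed is made up for the example, and real crawlers may add their own extensions.

```python
def is_disallowed(path, disallow_rules):
    """Return True if the URL path starts with any non-empty Disallow prefix.

    Hypothetical helper: models only the plain prefix matching discussed in
    this thread. An empty Disallow value is treated as "nothing disallowed".
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Rule without the trailing slash: matches the directory, everything below
# it, and also any other resource whose path starts with the same characters.
no_slash = ["/example"]
print(is_disallowed("/example", no_slash))                  # True
print(is_disallowed("/example/assets/", no_slash))          # True
print(is_disallowed("/example/liabilities/q1", no_slash))   # True
print(is_disallowed("/example.html", no_slash))             # True

# Rule with the trailing slash: still covers everything below the directory,
# but not /example itself and not /example.html.
with_slash = ["/example/"]
print(is_disallowed("/example", with_slash))                # False
print(is_disallowed("/example/assets/", with_slash))        # True
print(is_disallowed("/example.html", with_slash))           # False
```

Either form covers /example/assets and /example/liabilities, so the subfolders do not need their own lines; the two forms differ only for /example itself and for siblings such as /example.html.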

g1smd

11:59 pm on Apr 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have today found out that a file like:

User-agent: SomeAgent
Disallow: /
User-agent: *
Disallow: /

will cause Google to spider the entire site.

There MUST be a blank line BEFORE the SECOND user-agent declaration; otherwise, to Google at least, it looks like there is a disallow rule only for "SomeAgent".
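One way to catch this before a robot does is a small check over the file. The Python below is a rough sketch under the assumption that records should be separated by a blank line, as described in this post; the function name missing_blank_lines is invented for the example, and it says nothing about how any specific robot actually parses the file.

```python
def missing_blank_lines(robots_txt):
    """Return 1-based line numbers of User-agent lines that directly follow
    an Allow/Disallow line with no blank line in between.

    Hypothetical lint helper, not part of any robots.txt library.
    """
    flagged = []
    previous_was_rule = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped:
            # A blank line ends the current record.
            previous_was_rule = False
            continue
        line = stripped.split("#", 1)[0].strip()
        if not line:
            # Comment-only line: ignore it and keep the current state.
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field == "user-agent" and previous_was_rule:
            flagged.append(number)
        previous_was_rule = field in ("allow", "disallow")
    return flagged


bad = "User-agent: SomeAgent\nDisallow: /\nUser-agent: *\nDisallow: /\n"
good = "User-agent: SomeAgent\nDisallow: /\n\nUser-agent: *\nDisallow: /\n"

print(missing_blank_lines(bad))   # [3]
print(missing_blank_lines(good))  # []
```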

Reid

11:17 pm on Apr 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's interesting, g1smd. True, you must have a blank line between records. I didn't know Googlebot acted that way though.