how2 disallow includes

royalelephant

12:12 am on Aug 11, 2003 (gmt 0)

10+ Year Member



What's the correct way to tell search engines NOT to spider and index files such as SSI includes? I'm thinking I should use a line in robots.txt, but can't figure out how it would look. Anyone know the elegant way to solve this?

jdMorgan

3:42 am on Aug 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



royalelephant,

You can't do it with robots.txt. SSI includes are inserted into Web pages *before* they are served, and appear as part of the page.
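For example, a page might contain a directive like this (the file name /includes/header.inc is just a hypothetical example here):

<!--#include virtual="/includes/header.inc" -->

The server replaces that comment with the contents of header.inc before the page ever leaves the server, so a spider fetching the page just sees one finished HTML document, and there is nothing separate for robots.txt to block.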

You can use SSI itself to test the requestor's IP address or user-agent, and then conditionally include contents... but that is cloaking, and search engines frown on it.
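A minimal sketch of what that conditional include would look like with Apache's mod_include (the Googlebot test and the include file names here are only an illustration of the technique, not a recommendation):

<!--#if expr="$HTTP_USER_AGENT = /Googlebot/" -->
<!--#include virtual="/includes/spider-version.inc" -->
<!--#else -->
<!--#include virtual="/includes/normal-version.inc" -->
<!--#endif -->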

Jim

DaveAtIFG

5:00 am on Aug 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When I use SSI, I usually park the includes in a subdirectory named "/Includes", and use robots.txt to disallow spidering of that subdirectory.
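The robots.txt entry for that would be something like:

User-agent: *
Disallow: /Includes/

That covers the case where a spider finds and requests the raw include files directly by their own URLs, rather than as part of a served page.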

royalelephant

12:35 pm on Aug 11, 2003 (gmt 0)

10+ Year Member



Over on Google's page for webmasters I found a note about disallowing their agent from .gif files,

[google.com...]

so I was thinking I could just do the same for .inc files...

User-agent: Googlebot
Disallow: /*.gif$

(I think I could substitute inc for gif, but I hate blundering into the unknown where the googlebot is concerned.)
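Presumably that substitution would give something like this (assuming Googlebot handles the .inc extension the same way it handles .gif in its wildcard matching):

User-agent: Googlebot
Disallow: /*.inc$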

... But (having just thought over JD's post), if search engines only spider the .*htm* pages, then they never "see" the includes at all, since, as JD reminded me, the includes are merged into the .*htm* pages before they're shown to the public. In which case, my worry/idea is groundless?