Denying access to everything but the home page?

Using robots.txt to only allow robot access to the home page

         

monkeylytics

5:12 am on Jan 15, 2008 (gmt 0)

10+ Year Member



We're putting up a temporary partner site that mimics our site's functionality and content. To avoid duplicate content issues and reduce our server load, we don't want the site crawled, except for the home page, which will be unique.

Can anyone suggest a good way of doing it?

Right now, we're thinking about denying access to all pages and then granting access to the home page.

User-agent: *
Disallow: /
Allow: /index.html

[en.wikipedia.org...]

Google, Yahoo, and I think Ask support the Allow directive (MSN may not). But then we have the issue of how to get the home page indexed when the request is for www.widgets.com rather than www.widgets.com/index.html. Is it possible to 301 (permanently) redirect requests for www.widgets.com to www.widgets.com/index.html?

Is there a better way that isn't as icky?

Thanks,

Steve
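
To make the question about www.widgets.com concrete, here is a minimal Python sketch of how the rules above might be evaluated. It assumes Google-style matching ('*' wildcard, trailing '$' end anchor, longest matching rule wins, Allow wins a tie); other crawlers may behave differently. The function names and the extra "Allow: /$" rule are illustrative assumptions, not something from the original post.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of ("allow" | "disallow", pattern) pairs. The longest
    matching pattern wins; on a tie, allow wins. No match means allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The rules from the post, and a variant with an end-anchored Allow for "/".
original = [("disallow", "/"), ("allow", "/index.html")]
anchored = [("disallow", "/"), ("allow", "/$"), ("allow", "/index.html")]

for path in ["/", "/index.html", "/products/widget.html"]:
    print(path, is_allowed(original, path), is_allowed(anchored, path))
```

Under these assumptions the original rules block a request for "/" itself, while adding "Allow: /$" lets the bare home page through without a redirect. Whether a given crawler honours '$' should be checked against its own documentation.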

Receptional Andy

8:59 am on Jan 15, 2008 (gmt 0)



Another way would be to use a robots meta tag on every page except the homepage. How difficult that is depends on how your site is set up; with a small number of server-side templates it shouldn't be too onerous. It wouldn't reduce server load the way robots.txt does, since crawlers still have to fetch each page to see the tag, although I don't know how much of a concern that is for you.
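
As a rough illustration of the template approach, a shared server-side template could call a small helper like the one below. This is only a sketch: robots_meta_tag and HOME_PATHS are hypothetical names, and the choice of "noindex" as the tag content is an assumption about what is wanted here.

```python
# Hypothetical helper: decide which robots meta tag a template should emit.
HOME_PATHS = {"/", "/index.html"}  # assumption: both URLs serve the home page

def robots_meta_tag(path):
    """Return the robots meta tag for `path`, or "" for the home page."""
    if path in HOME_PATHS:
        return ""  # no tag: the home page stays indexable
    # "noindex" asks engines to keep the page out of their index.
    return '<meta name="robots" content="noindex">'

for path in ["/", "/index.html", "/products/widget.html"]:
    print(path, "->", robots_meta_tag(path) or "(no tag)")
```

The returned string would go inside the page's head element. Pages carrying the tag must stay crawlable, otherwise the crawler never sees it.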

monkeylytics

8:00 pm on Jan 28, 2008 (gmt 0)




What about this?

User-agent: *
Disallow: /*/

This apparently works in the Google Webmaster Tools robots.txt validator, which reports that www.example.com/ is allowed but that www.example.com/blah will not be crawled. I think Yahoo will accept it too, given their documentation at

[help.yahoo.com...]

One problem is that we haven't been able to figure out whether MSN supports this. The other is that, technically speaking, the formal robots.txt specification does not support wildcards in Disallow, even though Google and Yahoo do. :/

I guess we'll go back to denying specific directories instead. Anybody have any experience with /*/ in Disallow?

Steve
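
For anyone wanting to check /*/ by hand, here is a small Python sketch of wildcard matching. It assumes '*' matches any run of characters and that a rule matches as a prefix of the path, which is how Google documents its matching; the matches helper is a hypothetical name and does not handle a trailing '$'.

```python
import re

def matches(pattern, path):
    """Return True if the robots.txt `pattern` matches the start of `path`.

    Assumed semantics: '*' stands for any run of characters (including
    none) and the pattern only has to match a prefix of the path.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None

for path in ["/", "/blah", "/blah/", "/blah/page.html"]:
    print(path, "blocked" if matches("/*/", path) else "allowed")
```

Under these assumptions the pattern only catches paths containing a second slash, so "/blah/" and anything below it is blocked but a top-level URL such as "/blah" is not. That differs from the validator result reported above for /blah, so it may be worth retesting with real top-level page URLs before relying on this rule.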