Howdy, I have a site that happens to be very large (70+ MB, 8000+ pages). Because of this, I have "index" pages that link to the individual pages. Would it be wise to exclude either the "index" pages or the individual pages using robots.txt? I was thinking of excluding the individual pages, since I doubt many spiders will really bother to crawl the whole site. The site is also highly interrelated; an individual page often has dozens of links to other pages within the site.
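For reference, what I had in mind is something along these lines, assuming the individual pages all sat under one directory (mine don't literally use this path, it's just for illustration):

User-agent: *
Disallow: /individual-pages/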
It's a good question. In my case, the only content I disallow is content that isn't relevant to search (contact us, feedback, "buy this" pages, that sort of thing).
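For example, something like this in robots.txt covers that sort of thing (the paths here are just placeholders, swap in your own):

User-agent: *
Disallow: /contact/
Disallow: /feedback/
Disallow: /buy/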
You have to consider that if you exclude certain parts of your site, you will stop the bots from crawling your whole site, and that can affect how search engines see your internal navigation (and your PageRank on certain engines).
I'm sure one of the more experienced folks in here will be able to chip in. My largest site is about a quarter of the size of yours, and it's pretty much left open to spidering.