pageoneresults - 5:23 pm on Dec 6, 2010 (gmt 0)
Are there really that many differences when dealing with 1,000 documents or 1,000,000,000?
The solutions at the 1,000 level are not much different from those at the 1,000,000,000 level. Or are they? I think the same concepts apply to any number of documents.
A place for everything, everything in its place.
I focus mainly on the "final destination" documents. Anything in between is subject to noindex or noindex, nofollow. Using just noindex allows me to direct the flow of equity between entry and final destination, since the page stays out of the index but its links are still followed. I want the bot to follow all internal links (where applicable), but I only want a specific path indexed to the final destination.
Think of it like a map. You have your "shortest route", and that is the way I view this. There may be plenty of "other routes" that lead you to the same destination, and those are "alternatives" to the "shortest route". But the shortest is always the one that leads the bot to the final destination, hence the use of noindex in many areas.
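To make that concrete, here's a minimal sketch of keeping the in-between pages out of the index at the server level. It assumes Apache with mod_setenvif and mod_headers, and the /browse/ path is a made-up example of an in-between route:

# Flag the hypothetical "/browse/" in-between pages, then noindex them
# (links on these pages are still followed, so equity keeps flowing)
SetEnvIf Request_URI "^/browse/" inbetween
Header set X-Robots-Tag "noindex" env=inbetween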
Typically, the most efficient structures are going to be somewhat shallow and very broad horizontally. Each entrance into a horizontal structure stands on its own. Think of it like having thousands of websites that all make up the "pyramid". Not only do you go horizontally, you also travel upwards and work at the host name level.
This is not a one-size-fits-all solution, although the basic concepts are the same. I'm looking at it from an ecommerce perspective, and for me it all comes down to how you manage the "equity" within the site. Heck, if you've got it just right, you can put a link on one of the top-level horizontal (or host name) documents and it would probably carry just as much importance as most external links, if not more.
<meta name="robots" content="noarchive">
The above is mandatory for all sites that we do. We typically serve it as an X-Robots-Tag in the server headers.
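For those doing the same, a minimal sketch, assuming Apache with mod_headers loaded; it's the server-header equivalent of the meta tag, applied to every response:

# Send "noarchive" on every response instead of a per-document meta tag
Header set X-Robots-Tag "noarchive"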
<meta name="robots" content="noindex">
<meta name="robots" content="noindex, nofollow">
The above are used judiciously throughout all the documents we work with. Most of the documents have default indexing directives. Some are controlled by the user based on the content being published.
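One hedged illustration of applying those at the server level rather than in the markup, again assuming Apache with mod_headers; the file extensions are made-up examples of content you might not want indexed:

# Keep hypothetical downloadable copies out of the index entirely
<FilesMatch "\.(pdf|doc)$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>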
This is all "basic stuff" too. There's much more at the "leeching level" that needs to be addressed, and that's one thing, I guess, that stands out between the 1,000 and 1,000,000,000 levels. You really have to control bot access to the site. Error reporting routines become much more robust at the larger document levels. But these are things you also need to worry about if you only have 100 documents. You need to protect and control access to everything. Ask IncrediBILL, he's the Master of grabbing a bot by the balls and telling it what it can and cannot do. :)
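A bare-bones sketch of that kind of bot control, assuming Apache 2.2 with mod_setenvif; the user-agent strings here are just placeholders for whatever you've decided to block:

# Flag unwanted user-agents, then deny them across the board
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot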