pageoneresults - 8:21 pm on Mar 17, 2010 (gmt 0)
My strategy is to provide only one URI per product. We do this by using NoIndex at the intermediary page levels. I've found that most carts are breadcrumb-based, so you end up with a product URI for every category path it lives under, which is not optimal. In fact, it will drag you down in the SERPs if you're not careful.
^ That is the final destination URI. Don't give me any crap about not having keywords in the URI either. The breadcrumb leading up to the final destination had all the keywords I was targeting. ;)
We'll NoIndex all those category levels in between and send the bot directly to the final destination URIs. Oh, the bots do an excellent job of obeying the protocol: they won't index the document, but they will still Follow its links, as that is the default behavior when you use just NoIndex.
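For anyone who wants to see it, here's a minimal sketch of the tag we drop on the intermediary pages (the second line just makes the Follow default explicit):

    <meta name="robots" content="noindex">
    <!-- "follow" is the default, so the above is equivalent to: -->
    <meta name="robots" content="noindex, follow">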
One of my peers on Twitter referred to it as a Reverse Pyramid. Note, there are other elements at play here, like link rel with next, prev, start, etc. There are ways to create a taxonomy for the bots and not have them bounce around all the intermediary levels.
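A sketch of what those link types look like in the head of a middle page in a drilldown series; the URIs here are made up for illustration:

    <link rel="start" href="http://www.example.com/widgets/">
    <link rel="prev" href="http://www.example.com/widgets/page-2">
    <link rel="next" href="http://www.example.com/widgets/page-4">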
You also need to make sure that any sitemaps are feeding only those final destination URIs. There is no need to waste crawl time on documents that serve no other purpose than to take the visitor one step further into the drilldown.
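So the sitemap ends up listing nothing but the products themselves. A minimal sketch, with a made-up URI:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- final destination URIs only; no category/drilldown pages -->
      <url>
        <loc>http://www.example.com/blue-widget</loc>
      </url>
    </urlset>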
If you have 100k documents and you only have 50k crawl time credit, you surely don't want that bot wasting resources on intermediary (taxonomy drilldown) documents, do you?
Oh, we also Ajax the heck out of everything we don't want the bot getting into. We know from experience that if bots can get into dynamic content, they will do harm; they will find some flaw in your rewrites or whatever. I never thought I'd hear myself say that we block the bots from content most others would think they want indexed. Think really hard: do those documents really serve a purpose? :)
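If you're wondering what "Ajaxing it" means in practice, here's a rough sketch; the endpoint and element names are hypothetical, not our actual setup. The drilldown markup never exists in the static HTML the bot fetches, it only arrives when a visitor clicks:

    <div id="drilldown"></div>
    <script type="text/javascript">
    // Fetch the category drilldown on demand; a bot reading the static
    // page never sees these links. The endpoint is a made-up example.
    function loadDrilldown(catId) {
        var xhr = new XMLHttpRequest();
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                document.getElementById('drilldown').innerHTML = xhr.responseText;
            }
        };
        xhr.open('GET', '/ajax/drilldown?cat=' + catId, true);
        xhr.send(null);
    }
    </script>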
Ya, I know, y'all are going to raise a big stink over the above. There is a method to my madness. And no, we are not adding NoIndex to the upper-level categories; those are your power docs. :)
Note: I think using robots.txt to block indexing of content is the Kiss of Crawl Death if the right environment is present. A Disallow stops the bot from fetching the page at all, so it never sees the page's links or its NoIndex, and the blocked URLs can still end up indexed URL-only from external links.
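For contrast, this is the robots.txt pattern I'm warning against (the path is hypothetical):

    User-agent: *
    # Blocks crawling, not indexing: the bot never fetches these pages,
    # never sees their links or any NoIndex on them, and the URLs can
    # still get indexed URL-only from external links.
    Disallow: /category/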