I know the theory behind the different methods of controlling how a crawler accesses a site, and I just wanted to sanity-check my approach. We run an e-commerce site with pages generated dynamically by our cart software. Some pages, such as categories, have options that let you refine a list of items (by price or manufacturer, for example). These refinements use the query string and, as far as the search engines are concerned, produce a page with no unique content and nothing added over the un-refined page. With so many dynamic pages, I think it is important that we guide crawlers towards the important pages and away from the unimportant ones. I see a few 'tools' available for tackling this:
1) Use a canonical link on the refined pages pointing to the un-refined version.
2) Use a noindex meta tag to tell crawlers not to index the refined page.
3) Block the refined pages in robots.txt, telling crawlers not to crawl them in the first place.
4) Add a nofollow attribute to all links pointing to the refined pages.
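As a rough illustration of option 1, here is a minimal Python sketch of how a page template might compute the un-refined URL and emit the canonical link. It assumes (hypothetically) that the refinements arrive as `price` and `manufacturer` query parameters and that any other parameter, such as a category id, should be kept; `www.example.com` is a placeholder domain.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of the query-string parameters the cart software uses
# to refine a category listing; substitute the real ones.
REFINEMENT_PARAMS = {"price", "manufacturer"}


def canonical_url(url: str) -> str:
    """Return the un-refined version of a URL by dropping refinement parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in REFINEMENT_PARAMS
    ]
    # An empty query string means urlunsplit emits no trailing "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the refined page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


if __name__ == "__main__":
    refined = "https://www.example.com/category/widgets?price=10-20&manufacturer=acme"
    print(canonical_link_tag(refined))
```

Stripping only the known refinement parameters, rather than the whole query string, keeps the sketch safe for pages where the query string also identifies the category itself.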
What I am trying to achieve is helping the SEs crawl our site efficiently. At present I use a combination of 2 and 4. My thinking is that nofollow tells the SEs not to follow links to these refined pages from within our navigation, and if they reach a refined page some other way (i.e. via a link from somewhere other than our navigation), they see the noindex meta tag.
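The combination of 2 and 4 described above could be sketched in Python as two small template helpers: one adds the noindex meta tag to refined pages, the other adds rel="nofollow" to navigation links that point at refined pages. As before, the `price` and `manufacturer` parameter names are hypothetical stand-ins for whatever the cart software actually uses, and the helper names are illustrative only.

```python
from html import escape
from urllib.parse import parse_qsl, urlsplit

# Hypothetical refinement parameter names; replace with the real ones.
REFINEMENT_PARAMS = {"price", "manufacturer"}


def is_refined(url: str) -> bool:
    """True if the URL's query string carries any refinement parameter."""
    query = urlsplit(url).query
    return any(key in REFINEMENT_PARAMS for key, _ in parse_qsl(query, keep_blank_values=True))


def robots_meta(url: str) -> str:
    """Option 2: a noindex meta tag on refined pages, nothing on the un-refined page."""
    return '<meta name="robots" content="noindex">' if is_refined(url) else ""


def nav_link(href: str, text: str) -> str:
    """Option 4: navigation links that point at refined pages get rel="nofollow"."""
    rel = ' rel="nofollow"' if is_refined(href) else ""
    return '<a href="%s"%s>%s</a>' % (escape(href, quote=True), rel, escape(text))


if __name__ == "__main__":
    print(robots_meta("https://www.example.com/category/widgets?price=10-20"))
    print(nav_link("/category/widgets?manufacturer=acme", "Acme"))
    print(nav_link("/category/widgets", "All widgets"))
```

Both helpers share the same `is_refined` test, so the page that carries the noindex tag and the links that carry nofollow are always decided by one rule.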