TheOptimizationIdiot - 6:32 pm on Mar 3, 2013 (gmt 0)
That's something you'll have to test, because canonical really says "B is the preferred version". But if you're Google and conservative, and the canonical points to a noindex version of the page, would you be more likely to think the webmaster is "shooting themselves in the foot" and show A, or to "trust the canonical"?
Personally, I think in that situation I'd err on the side of caution and show A.
The sites I work on are tens of thousands of pages too, so I definitely understand the challenges of doing it another way and redirecting or noindexing specific pages, but I also understand there is almost always another way.
You could even do something based on page length: "compile the page" with concatenation ($page.="") and a robots noindex in place, then check the length of the compiled page prior to output. If it's over N characters, str_replace() the robots meta tag with "", or even str_replace() the "noindex" with "index". That would be a more sure way to "pull" short pages from the index but leave longer pages in.
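Here's a rough sketch of that page-length idea in PHP, just to show the shape of it. The threshold of 2000, the exact meta tag string, and the finalize_page() name are all made up for illustration, so swap in whatever fits your templates. Note that strlen() counts bytes and includes the markup, not just the visible text, so you'd probably want to tune N against real pages.

```php
<?php
// Hypothetical threshold: pages at or below this many bytes stay noindexed.
const MIN_INDEXABLE_LENGTH = 2000;

// The robots tag every page carries while it is being compiled (assumed markup).
const ROBOTS_NOINDEX = '<meta name="robots" content="noindex">';

/**
 * Remove the noindex tag from a compiled page if the page is long enough.
 * Short pages are returned unchanged, so they keep the noindex.
 */
function finalize_page(string $page, int $minLength = MIN_INDEXABLE_LENGTH): string
{
    if (strlen($page) > $minLength) {
        // Long enough: strip the tag so the page can be indexed.
        return str_replace(ROBOTS_NOINDEX, '', $page);
    }
    // Short page: leave the noindex in place.
    return $page;
}

// Compile the page by concatenation, with the noindex tag in place from the start.
$page  = '<html><head><title>Example</title>';
$page .= ROBOTS_NOINDEX;
$page .= '</head><body>';
$page .= '<p>' . str_repeat('Body text. ', 300) . '</p>';
$page .= '</body></html>';

// Check the length just before output and send the result.
echo finalize_page($page);
```

Starting every page with noindex and only removing it when the page passes the check means a bug in the check fails "safe": the worst case is a long page that stays out of the index, not a thin page that slips in.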
Trust me, I know it might be a PITA to figure out a different way, but there's almost always another way (actually always, in my experience), and I wouldn't trust the canonical to solve the issue permanently. Even if you test and it works today, the way Google handles it could change tomorrow, and you'd be back in the same situation until you find another way. So I'd start with "another way" personally. Of course, I am crazy about only wanting to "fix" things once and be done with it.