Sgt_Kickaxe - 4:08 am on Dec 7, 2012 (gmt 0)
Way, way, way worse TMS.
Ultimate Solution: GWT says "hey, here's a complete list of all urls we know about on your site that do not return 404/410 codes, yo" combined with "hey again, here are pages on your site with internal links that seem to be broken". Unfortunately that's not likely to happen; some sites are way too big.
Cutts Confusion: Matt, and several others, suggest that if someone links to a url on your site but gets it wrong, misses a character or whatever, it's ok to redirect the broken url to the good one. Matt says this is the easiest way to capture incoming "juice" and plugs GWT as a great tool for finding such broken incoming links. (This was in a youtube video from a couple years ago; I don't know if his position has changed since.)
On the flip side, spammy webmasters rather enjoy redirecting tons of pages to a url in order to get it to rank more highly and Google takes action against them. GWT now also reports them as "not selected" which they describe as "URLs from your site that redirect to other pages...".
So which is it? Personally I feel that a redirect now tells Google "hey, our site really does have both of these urls" and that it's not as benign as it once was, if you have too many of them.
Example of a problem: Many webmasters running WordPress don't know that if you add ".." or "--" to the end of a url the page will render just fine at that url unless you block this in .htaccess. WordPress also redirects partially formed urls to the correct page on a best guess basis. While this is standard behavior, you can expect your friendly neighborhood spammer to know this too and kindly create a bazillion duplicate urls for you intentionally, simply by linking to those pages from his spam-city-network.
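If you want to see whether your own site answers on those junk urls, here is a minimal Python sketch. It assumes the variants are made by simply appending ".." or "--" to the url as given, and it sends HEAD requests without following redirects so you see the raw status code. The names url_variants, head_status and find_live_variants are made up for this example, and example.com is a placeholder.

```python
import urllib.error
import urllib.request

# Suffixes mentioned above; extend the tuple if you spot other patterns.
SUFFIXES = ("..", "--")


def url_variants(url, suffixes=SUFFIXES):
    """Return the url with each junk suffix appended (assumed variant scheme)."""
    return [url + suffix for suffix in suffixes]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so the raw code is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def head_status(url, timeout=10):
    """Send a HEAD request and return the status code, including 3xx/4xx/5xx."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def find_live_variants(url, fetch=head_status):
    """Return {variant: status} for variants that do NOT answer 404 or 410."""
    results = {variant: fetch(variant) for variant in url_variants(url)}
    return {v: code for v, code in results.items() if code not in (404, 410)}


if __name__ == "__main__":
    # Placeholder url; swap in a real page from your own site.
    for variant, code in find_live_variants("http://example.com/some-post").items():
        print(code, variant)
```

Anything it prints is a url that a crawler could treat as existing, whether it renders (200) or redirects (301/302).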
Canonical is great but doesn't solve the problem that Google now thinks you really do have a lot of extra urls, since they aren't returning 404 or 410. Is that a big deal? Probably not, in fact almost certainly not, unless you're the incredibly unlucky webmaster among us who gets devalued by this type of thing, which *some* undoubtedly will. The algo is a machine after all.
My experience: I had a small 300 page blog that Google thought had over 8000 pages, 7700 of those "not selected". I made a 100% static copy of the site, ditched the CMS, ditched the affiliate redirects I had on 5-6 pages, made a sitemap detailing which 300 pages actually exist and reduced my .htaccess file to near nothing so that very little could be redirected.
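For the sitemap step, a minimal Python sketch of building one from a plain list of the pages that actually exist might look like the following. This is not the tool used above, just an illustration: build_sitemap is a hypothetical helper, the urls are placeholders, and it writes only the required loc element of the sitemaps.org format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each distinct url once, in sorted order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special chars like & are escaped
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder pages; in practice this would be the full list of real urls.
    pages = [
        "http://example.com/",
        "http://example.com/about.html",
    ]
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(pages))
```

Feeding it a hand-checked list means the sitemap only ever names pages you intend to exist.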
My result: Thousands of error messages in GWT for pages that I didn't want in existence anyway but, eventually, no "not selected" either.
Note: I don't think Google visits all of the urls in question, so on-page elements like canonical may not work. They seem to just ping for a header response without requesting the page. Fix those headers too.
Panda Concern: Several top SEO sites recommend removing low quality and duplicate content in order to minimize your exposure to Panda. Your CMS is likely not helping you: it creates pages you don't even know about and that Google doesn't report unless they disappear. Without seeing them in a 404 report or watching your server logs closely you can't possibly minimize how many you have. I now avoid redirects if at all possible. If a page isn't meant to exist I want a 404, and if Google has found it and I want it gone I prefer 410.
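Watching the logs can be partly automated. The Python sketch below assumes an Apache-style combined log format and a hand-kept set of paths that really exist. It lists paths a crawler requested that are not in that set and did not get a 404 or 410. The function name unexpected_hits and the sample line are illustrative only, and matching on the "Googlebot" user-agent string is a rough filter since anyone can fake it.

```python
import re

# Pulls the request path and status code out of a combined-format log line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def unexpected_hits(log_lines, known_paths, agent_hint="Googlebot"):
    """Map each unknown path the crawler fetched to the status codes it got.

    Paths in known_paths and responses of 404/410 are ignored, so whatever
    is left is a url the server is still treating as existing.
    """
    hits = {}
    for line in log_lines:
        if agent_hint not in line:
            continue
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match.group("path")
        status = int(match.group("status"))
        if path in known_paths or status in (404, 410):
            continue
        hits.setdefault(path, set()).add(status)
    return hits


if __name__ == "__main__":
    # Made-up sample line in combined log format.
    sample = [
        '66.249.66.1 - - [07/Dec/2012:04:08:00 +0000] "GET /some-post-- HTTP/1.1" '
        '200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    ]
    for path, codes in unexpected_hits(sample, {"/some-post"}).items():
        print(path, sorted(codes))
```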
I won't even get into Google's testing of fictitious urls at random just to see what they get.