I work mostly on the AdWords side of the business at my company, but the team is discussing how best to handle duplicate content pages, and I thought what better place to ask than WebmasterWorld. Because of poor site architecture, each page on the site exists at 10 identical URLs; the only difference is a slight variation in the URL based on how the user clicked through to the page. 301 redirects seem to be the way to fix this issue on the existing, and indexed, pages. But my concern is that we are seriously talking about 100,000+ 301 redirects, and will Google not look favorably on that many 301s? The concern is both the number of 301s and the timing of implementing them. We could do them in batches, but even so I am worried this could be very damaging.
Anyone with experience handling 301's at very high volume?
I went through this with a very large site with millions of pages. Most pages had duplicate URLs. We had a www vs non-www issue. We had issues with the CMS generating new URLs when titles changed, like /article-40932-title-of-the-article vs /article-40932-new-title. We had tons of different parameters stuck on URLs for tracking purposes: both internal "return to" parameters and external parameters from ad campaigns.
We implemented canonicalization using 301 redirects. Googlebot started doing less crawling. Our traffic stayed steady during this process and for six months afterwards. Then we lost lots of rankings, due to some sort of algorithm or penalty, for about a month, then recovered. Some on our staff thought that the traffic loss could have been caused by the massive number of 301 redirects we introduced six months earlier. I have since come to believe that the loss was due to an algorithm update that Google launched changing the way they handled long lists of internal links. I believe that we recovered from that algorithm because Google manually put our site in an exception list such that the algorithm wouldn't apply to us. So, I believe that you can launch massive numbers of redirects without a problem, but my experience lends some doubt. This was also more than five years ago, so the situation may have changed.
There are other ways to handle this type of situation today.
If your problems are mainly driven by tracking parameters, you can use your webmaster tools account to tell Googlebot to ignore them. Under "Configuration" -> "URL Parameters", set the parameters that are used only for tracking users to "Representative URL".
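To see what "Representative URL" handling amounts to, here is a minimal sketch in Python of collapsing tracking-only parameters down to one canonical URL. The parameter names are hypothetical examples; substitute whatever your site actually uses:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking-only parameters; replace with your site's real ones.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "return_to"}

def representative_url(url):
    """Strip tracking-only query parameters, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    # Rebuild the URL with only the parameters that change page content.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

Every URL variant that differs only in tracking parameters maps to the same output, which is effectively what you are asking Googlebot to do on its end.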
Another way would be to implement the canonical tag rather than use 301 redirects. This has the advantage of not breaking any of the functionality of the tracking parameters on your URLs. With 301 redirects we had to cookie all the tracking info from the URLs before redirecting; you don't need to do that with the canonical tag.
Less than a year ago I folded 80,000 duplicate content URLs into a new URL format and structure on a site with 1,200 pages, with redirects for all non-canonical URL formats. It took Google several months to figure things out, and WMT reports were erratic for quite a while. There were times when traffic fell, but the long term saw quite reasonable growth.