I have a website that provides listings throughout the US. When I first built the site about three years ago, I created separate category pages for categories that were very similar to each other.
For every city and zip code in the US, we provide listings under five different categories, and three of those categories are very similar. Essentially the three categories are the same thing; only the term used for it varies greatly depending on what region of the country you're from.
Until recently, we were getting a high volume of traffic from each category. Now, though, I think these three similar categories are being treated as duplicate content, or at minimum are cannibalizing each other in Google's SERPs.
I believe this because, besides the major loss of traffic, when I now search for any one of the three category keywords, Google highlights the other two whenever they appear in the title or description of a displayed result.
My question: what's the best strategy for removing two of the categories from the index and making the third the version that gets indexed? I still want the two removed categories to stay available on the site to preserve the user experience.
I'm thinking the best strategy is to add a canonical tag to the pages of the two categories being removed, pointing to the corresponding page of the one category that remains in the index.
Should I also set the robots meta tag to NOINDEX, or to 'NOINDEX, NOFOLLOW'?
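To make the proposal concrete, here is a minimal sketch of the head markup each category page would get under the canonical approach. The slugs ("term-a" as the kept category, "term-b" and "term-c" as the two being removed) and the URL scheme under example.com are hypothetical placeholders, not the site's real ones. The robots tag is left optional because whether to add it is exactly the open question above.

```python
from html import escape

# Hypothetical slugs for the three regional names of the same category.
# "term-a" is assumed to be the version kept in the index.
PREFERRED = "term-a"
DUPLICATES = {"term-b", "term-c"}

# Hypothetical URL scheme; substitute the site's real one.
BASE_URL = "https://www.example.com"


def head_tags(category: str, location: str, add_noindex: bool = False) -> str:
    """Return the <head> tags for one category/location listing page.

    Pages in a duplicate category get a rel="canonical" link pointing at the
    same location under the preferred category. Every other page points its
    canonical at itself. The robots tag is only emitted when asked for.
    """
    if category not in DUPLICATES:
        # Preferred (and unrelated) categories canonicalize to themselves.
        href = f"{BASE_URL}/{category}/{location}/"
        return f'<link rel="canonical" href="{escape(href)}">'

    # Duplicate category: point at the matching page of the preferred category.
    href = f"{BASE_URL}/{PREFERRED}/{location}/"
    tags = [f'<link rel="canonical" href="{escape(href)}">']
    if add_noindex:
        tags.append('<meta name="robots" content="noindex">')
    return "\n".join(tags)
```

For example, head_tags("term-b", "90210") would return a canonical link to https://www.example.com/term-a/90210/, so the page stays reachable for users while pointing search engines at the kept category.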
Also, there are about 70,000 pages on the site per category, so this update could mean noindexing or otherwise changing roughly 140,000 pages (two categories × 70,000 pages). With my site already down in the SERPs, it's vital that I don't screw this up!
Would you roll this out on a small test section first? (As I ask this, I'm now thinking yes.)
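If a test section is used, one way to pick it is deterministic hash bucketing: hash each location key (city or zip) and put a fixed percentage of buckets in the test group. This is a sketch under assumptions; the function name and the choice to key on location are hypothetical. Hashing the location rather than the full URL keeps all category pages for one city or zip in the same group, and hashing (instead of random sampling) keeps the same pages in the test group across deploys and recrawls.

```python
import hashlib


def in_test_group(location: str, percent: float) -> bool:
    """Deterministically decide whether a location is in the test group.

    `location` is a hypothetical key such as a zip code or city slug.
    `percent` is the share of locations to include, from 0 to 100.
    """
    digest = hashlib.sha256(location.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent * 100 buckets out of 10,000 fall in the test group.
    return bucket < percent * 100
```

For instance, filtering 70,000 zip codes with in_test_group(zip_code, 5) selects about 5% of them (around 3,500), and the same zips are selected every time the filter runs.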