Let's say I have a UGC-heavy site (over 1 million pages) with a lot of links to sign up, add content, etc., all of which carry a "return to" parameter, e.g.:
bluewidgets.com/widget1
has links to
bluewidgets.com/widget1?action=add_comment&return_to=widget1
bluewidgets.com/signup?return_to=widget1
etc.
What is the best way to handle this (assuming there are 5 of these per page, i.e. 5 million links)? There seem to be a plethora of options:
a) Open up all these pages and canonicalize them (e.g. bluewidgets.com/signup?return_to=widget1 → bluewidgets.com/signup). The downside is that there are millions of these URLs, and letting Googlebot crawl them all seems like a pretty big waste of its time.
b) Block them in robots.txt. On the upside, this should improve Googlebot's crawl efficiency. On the downside, these pages can still get indexed (Google indexes blocked URLs it finds links to), and any PageRank flowing into them is trapped there.
c) Open them up but noindex them. Upside: this lets PageRank flow on to other pages (rather than hitting a dead end as with robots.txt). Downside: again, allowing them to be crawled seems like a pretty big waste of Googlebot's time.
d) Nofollow the links. Downside: the PageRank those links would carry is "wasted" (at least based on my understanding). Upside: better crawl efficiency.
e) Turn them into buttons (so there's no crawlable link at all)...
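For reference, here's roughly what each of these options looks like in markup (just a sketch, using the same URLs and the return_to parameter from the examples above):

```html
<!-- (a) canonical: on /signup?return_to=widget1, point at the clean URL -->
<link rel="canonical" href="https://bluewidgets.com/signup">

<!-- (c) noindex via meta robots; the page stays crawlable and its links followed -->
<meta name="robots" content="noindex, follow">

<!-- (d) nofollow on the individual link -->
<a href="/signup?return_to=widget1" rel="nofollow">Sign up</a>

<!-- (e) a button that submits a form (typically POST), which Googlebot
     won't treat as a plain link -->
<form action="/signup" method="post">
  <input type="hidden" name="return_to" value="widget1">
  <button type="submit">Sign up</button>
</form>
```

And option (b) in robots.txt (note the `*` wildcard is supported by Googlebot but isn't part of the original robots.txt spec, so other crawlers may ignore it):

```
# (b) block any URL whose query string contains return_to
User-agent: *
Disallow: /*return_to=
```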
So far I have opted for the robots.txt block, but I suspect that's not the optimal choice. I'm leaning towards removing the "return_to" parameter and opening the pages up, although I think that might be pretty hard for the dev team to implement.
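For what it's worth, stripping the parameter from the URLs themselves is the easy part; the real dev work is preserving the "return to" behaviour some other way (session, cookie, or Referer header). A minimal sketch of the URL-cleaning side, using Python's standard library (the STRIP_PARAMS set is my own naming, not anything from the site):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Navigation-only parameters that should never appear in crawlable URLs.
STRIP_PARAMS = {"return_to"}

def clean_url(url: str) -> str:
    """Remove navigation-only query parameters, keeping everything else."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in STRIP_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(clean_url("https://bluewidgets.com/signup?return_to=widget1"))
# -> https://bluewidgets.com/signup
print(clean_url("https://bluewidgets.com/widget1?action=add_comment&return_to=widget1"))
# -> https://bluewidgets.com/widget1?action=add_comment
```

Run at link-generation time (not as a redirect layer), this would mean the clean URLs are the only ones Googlebot ever sees.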
Would appreciate any thoughts.
Cheers,
Dan