enigma1 - 3:45 pm on Apr 28, 2011 (gmt 0)
For the initial request I will always do a redirect on a mismatch. It's just where to redirect but that depends on the query. I think a 404 passes no juice in any case and for retired items, I would prefer the visitor to go to a similar one if possible. If it's not possible redirect to the home page.
Linking of products into multiple categories or articles into topics etc I found it harder to manage in various cases. That is if the requirement is to expose both categories and products, or brands and products with the product url. I would have to pick up the first brand or category and prefix the url one way or another to get around the dup generation. Using ids the challenge is going to be the same.
And yes, is way simpler to generate the urls for a single entity vs combining parameters.
Adding a hash would mean another encoder/decoder although tiny I try to simplify the code as much as possible. You also need a prefix for the ids in order to differentiate the entity type.
Back to the OP's post a bit, robots.txt I rarely used it to filter urls because it doesn't work. Google will still access them because it won't read the robots.txt before every single request. If for some reason you need to modify the url structure or add/remove parameters make sure the old structure is not regenerated in some way.
I think lots of the canonical problems because of it. In some cases hard-coded links are forgotten, hidden with content and the store owner tries to block the old url but Google still sees it. There is no detailed information about it in WMT, in other words all steps the search engine followed to get to the duplicate page. That would be really helpful.
Years ago when shared hosting was popular, I remember seeing cases where urls will be inserted with the content and will include the session id. Needless to say what the consequences are. I am sure a similar thing happens today with tracking ids, referrers etc and just one mistake in the content, can cause a great number of duplicated pages and security issues.