deadsea - 2:49 pm on Apr 26, 2011 (gmt 0)
Parameter order can be affected by the order of elements in forms. Googlebot is now submitting and crawling simple forms. Even short of that, if users can get to the url, Google may crawl it based on toolbar, analytics, or adsense visibility.
Extra tracking parameters often come from external links, but sessionids in particular are often a problem on internal links. Some forum software is configured to put the session id in url params when cookies are disabled, and it does so for googlebot.
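To make the parameter points concrete, here is a rough Python sketch of normalizing a query string so that parameter order and session/tracking params stop producing distinct urls. The names in IGNORED_PARAMS are only my guesses at common ones, not a list from any particular software, and canonical_query is a made-up helper name. Adjust both for your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: parameters that never change page content on this site.
# These names are illustrative; replace them with the ones your software emits.
IGNORED_PARAMS = {"sid", "sessionid", "phpsessid",
                  "utm_source", "utm_medium", "utm_campaign"}

def canonical_query(url):
    """Drop ignored params and sort the rest so parameter order no longer matters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = sorted((k, v) for k, v in pairs if k.lower() not in IGNORED_PARAMS)
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

a = canonical_query("http://example.com/list?sort=asc&sid=abc123&cat=2")
b = canonical_query("http://example.com/list?cat=2&sort=asc")
print(a == b)
```

Both inputs come out as the same url, which is the one you would want in internal links.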
One of my websites has mixed case canonical urls. A bunch of people on the web assume that all urls look better lower case. I've seen inbound links that lowercase entire urls. Some crawlers (not googlebot) lowercase every url before requesting it, so they only ever ask for the lower case versions.
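For the mixed case problem, one way to handle lowercased inbound links is a case-insensitive lookup back to the canonical path. This is only a sketch: the paths and the canonical_case name are invented for illustration, and a real site would build the table from its own list of pages.

```python
# Assumption: a known list of canonical mixed-case paths (made-up examples).
CANONICAL_PATHS = ["/Products/BlueWidget.html", "/About/ContactUs.html"]

# Index the canonical paths by their lowercased form.
_BY_LOWER = {p.lower(): p for p in CANONICAL_PATHS}

def canonical_case(path):
    """Return the canonical mixed-case path, or None if the path is unknown."""
    return _BY_LOWER.get(path.lower())

print(canonical_case("/products/bluewidget.html"))
```

If the result differs from the requested path, that is the case where you would send the visitor to the canonical version instead of serving a duplicate.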
For extra slashes, I've seen malformed links like href=".//page.html" that end up making a spider trap. Every time that link is clicked it gives back the same page with an extra slash in the url.
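A minimal sketch of collapsing the extra slashes so every variant maps back to a single url. collapse_slashes is a hypothetical helper, and it assumes that repeated slashes in the path are never meaningful on your site.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url):
    """Collapse runs of slashes in the path only; scheme, host and query are untouched."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(collapse_slashes("http://example.com/dir///page.html"))
```

If the cleaned url differs from the one requested, the request was for a non-canonical variant.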
Once googlebot finds a non-canonical version of a url, it will continue to crawl it forever, even if you fix the bug on your site that made the link visible in the first place. And unfortunately, there are a large number of possible ways to create non-canonical urls. Even savvy, well-meaning developers create them from time to time.