This is me being ignorant, but are there not standardised character conversion tables/tools that can be used to produce, or assist, rewrite rules to produce the URLs?
I'm sure there are, but I haven't personally used them :( There's no intrinsic relationship between a "base" letter and its modified forms; any given application has to be told that, for example, "ô" belongs with "o" rather than "a". (If I were giving the long version of this answer-- yes, this is the short version-- you would here get a disquisition on precombined vs. combining forms and the historical reasons for using one or the other. Well, it's not the Romans' fault that their language had fewer than 30 phonemes. We should all be so lucky.)
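For what it's worth, the Unicode character database is one such standardised table, and Python's standard `unicodedata` module exposes it. The sketch below is only an illustration of precombined vs. combining forms; the helper names `decompose` and `strip_marks` are made up for this post, and stripping marks is a blunt instrument, not a recommendation.

```python
import unicodedata

def decompose(ch: str) -> list[str]:
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

def strip_marks(text: str) -> str:
    """Decompose text, then drop every combining mark (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# The precombined "ô" splits into a base letter plus a combining accent.
print(decompose("ô"))   # ['LATIN SMALL LETTER O', 'COMBINING CIRCUMFLEX ACCENT']
print(strip_marks("ô"))  # o

# "æ" has no decomposition in the table, so it passes through unchanged.
print(strip_marks("æ"))  # æ
```

Note that the table only says what a character is built from. Whether dropping the accent is acceptable is still a decision the application has to make.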
The problem, of course, is that one man's diacritic is another man's entirely different letter. For example æ began historically as a fusion of "a" and "e", but today it's a full-fledged independent letter in some languages' alphabets. Similarly, it's no use saying that å is a modification of "a" and people will know what you mean if you just use "a". (You might convince a French person that in some circumstances é, è and ê can all be expressed as "e", but you obviously can't reduce ç to c.)
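To make the "one man's diacritic" point concrete, here is a minimal sketch of per-language conversion tables. Everything in it is an assumption for illustration: the `TABLES` dictionary, the `simplify` function, and the particular substitutions (å to "aa", æ to "ae", ø to "oe" for Danish) are conventions I've picked as examples, not an authoritative standard.

```python
import unicodedata

# Hypothetical per-language tables; entries are illustrative only.
TABLES = {
    "da": {"æ": "ae", "ø": "oe", "å": "aa"},
    # French accents on "e" may be dropped; ç is deliberately left alone.
    "fr": {"é": "e", "è": "e", "ê": "e"},
}

def simplify(text: str, lang: str) -> str:
    """Apply the table for lang; characters not in the table are kept as-is."""
    table = TABLES.get(lang, {})
    # Normalise to precombined form so "e" + combining accent matches "é".
    text = unicodedata.normalize("NFC", text)
    return "".join(table.get(ch, ch) for ch in text)

print(simplify("århus", "da"))   # aarhus
print(simplify("crème", "fr"))   # creme
print(simplify("garçon", "fr"))  # garçon
```

The same input gives different output depending on the language you claim it is in, which is exactly why a single universal table doesn't exist.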
I've got a feeling this comes down to a database question. I find it hard to believe that the database itself is currently ASCII-encoded; probably it could accept some non-ASCII letters, though not necessarily all of them. The question is which ones.
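One rough way to answer "which ones" is to try encoding the string in whatever character set the database column uses. This is a sketch only: `is_storable` is a hypothetical helper, and the encoding names are Python codec names, which may not match what your database calls its character sets.

```python
def is_storable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(is_storable("montréal", "ascii"))    # False
print(is_storable("montréal", "latin-1"))  # True
print(is_storable("Łódź", "latin-1"))      # False
print(is_storable("Łódź", "utf-8"))        # True
```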
When I asked about URLs, I meant, for example,
example.com/hotels/montréal
as a real-life URL that someone wants to link to. That's é in the actual URL, as opposed to a simplified URL using
example.com/hotels/montreal
Or you could have
example.com/hotels/düsseldorf
vs.
example.com/hotels/duesseldorf
(I think Germans generally throw up their hands and do it this way for safety, but then German is an exceptionally easy language to work with in this respect.)
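The two kinds of URL above can be sketched in Python as follows. `quote` and `unquote` are real functions in the standard `urllib.parse` module; the `GERMAN` table and the `german_ascii` helper are hypothetical names for this post, and the table only covers lowercase letters.

```python
from urllib.parse import quote, unquote

# Hypothetical table for the usual German ASCII spellings (lowercase only).
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def literal_path(path: str) -> str:
    """Percent-encode the UTF-8 bytes of a path that keeps its accents."""
    return quote(path)  # "/" is left unescaped by default

def german_ascii(text: str) -> str:
    """Replace umlauts and ß with their conventional ASCII spellings."""
    return "".join(GERMAN.get(ch, ch) for ch in text)

# The "real-life" URL: é survives, but travels as percent-encoded bytes.
print(literal_path("/hotels/montréal"))          # /hotels/montr%C3%A9al
print(unquote("/hotels/montr%C3%A9al"))          # /hotels/montréal

# The simplified URL, done the German way.
print("/hotels/" + german_ascii("düsseldorf"))   # /hotels/duesseldorf
```

Either way the server ends up matching plain ASCII; the difference is whether the accent is preserved in encoded form or spelled out of existence.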
Throughout this post I've intentionally limited examples to characters in the Latin-1 (but beyond ASCII) character set, because I know they will display correctly in the present forums.