Forum Moderators: phranque
Does anyone know how I would go about spotting such accents and converting them to the correct ASCII, short of just looking out for é and all the other such characters that come through?
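A sequence like é is typically what you get when the UTF-8 bytes for é are read as Windows-1252 (or ISO-8859-1). If that is what is happening here (an assumption, since the thread does not say how the text arrives), a minimal sketch of reversing it in Python might look like this. The function name `repair_mojibake` is made up for illustration.

```python
def repair_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was mistakenly decoded as Windows-1252.

    Assumption: the garbling came from exactly one wrong decode step.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        # Turn the wrongly decoded characters back into the original bytes,
        # then decode those bytes as the UTF-8 they really were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text was not garbled in this particular way; leave it alone.
        return text


if __name__ == "__main__":
    print(repair_mojibake("cafÃ©"))   # garbled input
    print(repair_mojibake("café"))    # already correct, left untouched
```

This only covers the one-step case; text that has been garbled twice, or that mixes encodings, would need more care.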
Using a French-language keyboard, as I do, I get all my typing "corrected" at source.
Most of the sites I run or watch over are bilingual or trilingual, and the logs sometimes show that user searches are not correctly encoded. For French, I would expect this to be a problem more frequently encountered by Canadian forum owners. As for other languages ...
You could make a script to catch non-English letters and convert them, but if they are in the middle of non-English words, what good would that do? If you have a multilingual audience, split your forums, or ask posters to translate their own posts into a common language as much as they can (a common language of your choice, otherwise modding it will be a touch hit and miss :).
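For what it's worth, a script that catches accented letters and converts them to their plain equivalents can be quite short. The sketch below is one possible approach, not something anyone in this thread is running: it uses Unicode decomposition from Python's standard `unicodedata` module, and the name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    # NFD splits a letter like "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else as it was.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("c'était un vrai bordel"))
```

As noted above, this loses information: the words stay French, only without their accents, so it is mainly useful for things like normalising search terms rather than for display.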
I did run/admin/mod a bilingual posting site (not mine, a client's). C'était un vrai bordel (a real mess). The actual translation of "c'était un vrai bordel" wouldn't get past the WebmasterWorld bad-word filters, which shows the problems of cultural usage, modding, filters, etc.
Posters would always be trying to see if I knew the latest street bad words in their language, plus Creole, plus buerre, Wolof, etc.
Life was too short to play their games, so I shut the board.
Bonne chance (good luck) :)
If you are dealing with western European languages like French, you could choose ISO-8859-1, or, if you want to be more forward-looking, you can encode everything in UTF-8. In both cases, characters such as éèàçû etc. do not need to be converted into HTML entities.
You might want to check out this recent thread [webmasterworld.com] about character encoding.
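To illustrate the difference between the two choices, here is a small sketch in Python (the helper name `can_encode` is invented for this example). It checks whether a piece of text can be represented in a given encoding. The French accented letters fit in both, while a character such as the euro sign fits only in UTF-8, which is one reason UTF-8 is the more forward-looking option.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for sample in ("éèàçû", "€"):
        for encoding in ("iso-8859-1", "utf-8"):
            print(repr(sample), encoding, can_encode(sample, encoding))

    # The same letter is stored as different bytes in each encoding,
    # which is why the page and the data must agree on one of them.
    print("é".encode("iso-8859-1"), "é".encode("utf-8"))
```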
<added> You mention "uploading" - are your users sending text documents which are stored on your server? If so, is the server running Linux/Unix? </added>
I read the excellent post you linked to, thanks for that. As I understand it, I should probably put ISO-8859-1 in the meta tag to solve the issue (does it need to be there for both the page they use to input the data and the display page?). What I'm less clear about is what happens on the fonts side: will my standard English setup have the right version of Arial to display those European characters?