People write their content in MS Word, copy and paste it into a textarea field, and submit it to the database. I process feeds from this data. The problem is that fancy quotes, ellipses, hyphens, and a few other characters are breaking the feeds.
I'm using PHP. Is there a regular expression that will match all of these characters? I don't need to translate them; I'm happy to just delete them.
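To make the question concrete, here is a rough sketch of the kind of deletion I have in mind. It assumes PHP 7+ and that the stored text is valid UTF-8. The function name `strip_word_characters` and the exact character ranges are my own guesses, not a complete list of what Word can produce.

```php
<?php
// Hypothetical helper: deletes common "smart" punctuation pasted from Word.
// Assumes $text is valid UTF-8; the character list is illustrative, not exhaustive.
function strip_word_characters(string $text): string
{
    // U+2010-U+2015: Unicode hyphens and dashes (en dash, em dash, ...)
    // U+2018-U+201F: curly single and double quotes
    // U+2022: bullet
    // U+2026: horizontal ellipsis
    $pattern = '/[\x{2010}-\x{2015}\x{2018}-\x{201F}\x{2022}\x{2026}]/u';

    // With the /u modifier, preg_replace returns null if $text is not valid
    // UTF-8; in that case the input is returned unchanged.
    return preg_replace($pattern, '', $text) ?? $text;
}

// Curly quotes and the ellipsis are removed; plain ASCII is left alone.
echo strip_word_characters("\u{201C}Hi\u{201D} there\u{2026}"), "\n"; // prints: Hi there
```

The ASCII hyphen and straight quotes are deliberately not in the character class, so ordinary punctuation survives.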
The real problem is that browsers POST those characters even when the form is served as UTF-8. If a form is served as UTF-8, the browser should only allow UTF-8 content in it.
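One way to see what the browser actually sent is to check whether the submitted value is valid UTF-8 before touching it. This is only a sketch, assuming PHP 7+. Both function names are hypothetical, and treating anything that fails the check as Windows-1252 is a guess about the input, not something I have confirmed.

```php
<?php
// Hypothetical helpers; the Windows-1252 fallback is an assumption.
function is_valid_utf8(string $text): bool
{
    // With /u, PCRE validates the subject string first. The empty pattern
    // matches any valid string; preg_match returns false on invalid UTF-8.
    return preg_match('//u', $text) === 1;
}

function strip_cp1252_punctuation(string $bytes): string
{
    // In Windows-1252, bytes 0x80-0x9F include the curly quotes (0x91-0x94),
    // the ellipsis (0x85), and the en/em dashes (0x96, 0x97). Other symbols
    // in that range, such as the euro sign (0x80), are deleted as well.
    // No /u modifier here, so the pattern works on raw bytes.
    return preg_replace('/[\x80-\x9F]/', '', $bytes) ?? $bytes;
}
```

The byte-level strip should only be applied to values that fail the UTF-8 check, because in valid UTF-8 those same bytes appear as parts of multi-byte characters.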