
cleanup cp1252 chars between 127 and 160

How best to clean up cp1252 characters between 128 (0x80) and 159 (0x9F)


bwakkie

10:14 am on Apr 1, 2009 (gmt 0)

10+ Year Member



I have the following function:

function clean_up_cp1252($str) {
    // Map every cp1252 byte in 0x80-0x9F to its character.
    // Bytes that cp1252 leaves undefined (81, 8D, 8F, 90, 9D) become '?'.
    $badlatin1_cp1252_to_htmlent = array(
        "\x80"=>'€', "\x81"=>'?', "\x82"=>'‚', "\x83"=>'ƒ',
        "\x84"=>'„', "\x85"=>'…', "\x86"=>'†', "\x87"=>'‡',
        "\x88"=>'ˆ', "\x89"=>'‰', "\x8A"=>'Š', "\x8B"=>'‹',
        "\x8C"=>'Œ', "\x8D"=>'?', "\x8E"=>'Ž', "\x8F"=>'?',
        "\x90"=>'?', "\x91"=>'‘', "\x92"=>'’', "\x93"=>'“',
        "\x94"=>'”', "\x95"=>'•', "\x96"=>'–', "\x97"=>'—',
        "\x98"=>'˜', "\x99"=>'™', "\x9A"=>'š', "\x9B"=>'›',
        "\x9C"=>'œ', "\x9D"=>'?', "\x9E"=>'ž', "\x9F"=>'Ÿ',
    );
    // strtr() works on bytes, so $str is expected to be single-byte cp1252 text.
    return strtr($str, $badlatin1_cp1252_to_htmlent);
}

It is supposed to clean up text pasted from Word into a form.
It almost works...
When someone pastes a € (Euro) sign, it gets replaced in a strange way: € with an extra â and ¬ around it.
How can I solve this? What am I missing?
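
One likely explanation, assuming the form is submitted as UTF-8: a UTF-8 Euro sign is the three bytes E2 82 AC, and strtr() replaces the middle byte 0x82 on its own, which leaves E2 (â in Latin-1) and AC (¬ in Latin-1) around the replacement. A minimal sketch of a guard is below. The names looks_like_utf8() and clean_up_cp1252_safe() are hypothetical, the map is shortened for illustration, and the entity values are an assumption based on the "to_htmlent" array name.

```php
<?php
// Hypothetical helper: true when $str is well-formed UTF-8.
// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function looks_like_utf8($str) {
    return preg_match('//u', $str) === 1;
}

// Sketch of a guarded cleanup: only replace bytes when the input is NOT UTF-8.
function clean_up_cp1252_safe($str) {
    if (looks_like_utf8($str)) {
        return $str; // already UTF-8: byte-wise replacement would corrupt it
    }
    // Shortened map for illustration; the full 0x80-0x9F table would go here.
    $map = array(
        "\x80" => '&euro;',  "\x82" => '&sbquo;', "\x85" => '&hellip;',
        "\x91" => '&lsquo;', "\x92" => '&rsquo;',
        "\x93" => '&ldquo;', "\x94" => '&rdquo;',
    );
    return strtr($str, $map);
}

// The failure mode: the middle byte of a UTF-8 Euro sign gets replaced.
$utf8_euro = "\xE2\x82\xAC";
$broken = strtr($utf8_euro, array("\x82" => '&sbquo;'));
// $broken is now "\xE2&sbquo;\xAC", which shows as â&sbquo;¬ in Latin-1.

// With the guard, UTF-8 input passes through and raw cp1252 is still cleaned.
$kept    = clean_up_cp1252_safe($utf8_euro); // unchanged
$cleaned = clean_up_cp1252_safe("5 \x80");   // "5 &euro;"
```

The check is a heuristic: a short cp1252 string can happen to be valid UTF-8 as well, so knowing the charset the form actually submits is the more reliable fix.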

[edited by: eelixduppy at 1:47 pm (utc) on April 1, 2009]
[edit reason] fixed side scroll [/edit]

bwakkie

10:43 am on Apr 1, 2009 (gmt 0)

10+ Year Member



In vim it does work like this:
(I used it to clean up a MySQL dump)

:%s:\%x80:\€:g
:%s:\%x82:\‚:g
:%s:\%x83:\ƒ:g
:%s:\%x84:\„:g
:%s:\%x85:\…:g
:%s:\%x86:\†:g
:%s:\%x87:\‡:g
:%s:\%x88:\ˆ:g
:%s:\%x89:\‰:g
:%s:\%x91:\‘:g
:%s:\%x92:\’:g
:%s:\%x93:\“:g
:%s:\%x94:\”:g
:%s:\%x95:\•:g
:%s:\%x96:\–:g
:%s:\%x97:\—:g
:%s:\%x98:\˜:g
:%s:\%x99:\™:g
:%s:\%x8B:\‹:g
:%s:\%x8C:\Œ:g
:%s:\%x8E:\Ž:g
:%s:\%x8A:\Š:g
:%s:\%x9A:\š:g
:%s:\%x9B:\›:g
:%s:\%x9C:\œ:g
:%s:\%x9E:\ž:g
:%s:\%x9F:\Ÿ:g
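
For comparison, the same 27 substitutions could be sketched in PHP. This assumes the dump really is single-byte cp1252 (not UTF-8) and writes numeric HTML entities instead of literal characters. The function name cp1252_bytes_to_entities() and the file names in the usage note are placeholders, not anything from the posts above.

```php
<?php
// cp1252 byte => Unicode code point, for the 27 bytes the vim commands cover.
$cp1252_to_codepoint = array(
    0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
    0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
    0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
    0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
    0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
    0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
    0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
);

// Hypothetical helper: replace each listed byte with a numeric HTML entity.
function cp1252_bytes_to_entities($data, $table) {
    $map = array();
    foreach ($table as $byte => $codepoint) {
        // chr() gives the single raw byte; the entity uses the decimal code point.
        $map[chr($byte)] = '&#' . $codepoint . ';';
    }
    return strtr($data, $map);
}

// Usage sketch; both file names are placeholders:
// $dump = file_get_contents('dump.sql');
// file_put_contents('dump.clean.sql',
//     cp1252_bytes_to_entities($dump, $cp1252_to_codepoint));
```

Like the vim commands, this leaves the five undefined bytes (81, 8D, 8F, 90, 9D) untouched.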

bwakkie

10:48 am on Apr 1, 2009 (gmt 0)

10+ Year Member



Let's see how webmasterworld.com handles them btw ;-)

€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ