stripping characters

     

camilord

9:03 pm on Nov 24, 2013 (gmt 0)



Hi,

do you have any idea how to strip non-ASCII characters?


all i need to use are characters visible in the keyboard.

A-Z
a-z
0-9
~!@#$%^&*()-=_+{}[]:";'<>,./?
space and tab

Can anybody help me, please?

penders

11:56 am on Nov 25, 2013 (gmt 0)



If you know the precise subset of characters you wish to allow (as you seem to do) then you are probably better doing this with a regex:

$str = 'String to sanitize'; 
$str = preg_replace('/[^A-Za-z0-9~!@#$%^&*()=_+{}[\]:";\'<>,.\/? \t-]/','',$str);


(Disclaimer: I think I've escaped the necessary characters in the regex, but need to test!)

The regex consists of one big negated (^ prefix) character class containing the chars you wish to allow. Any chars not belonging to this class are replaced with an empty string.

The hyphen (-) is placed at the end of the char class to remove its special meaning and match a literal hyphen.

\] - The closing square bracket is escaped to match a literal square bracket
\' - The apostrophe/single quote is escaped since I've used single quoted strings in PHP.
\/ - the slash is escaped since that is our regex delimiter.
\t - tab

Note that this regex omits the backslash (\) and pipe (|) characters (omitted from your list of chars).
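A minimal sketch of how the regex above could be wrapped and tried out. The function name strip_disallowed() is hypothetical (not from this thread), and the type hints and ?? operator assume PHP 7 or later. Without the /u modifier the pattern works byte by byte, so each byte of a multi-byte UTF-8 character is removed on its own:

```php
<?php
// Hypothetical helper name; the pattern is the one from the post above.
function strip_disallowed(string $str): string
{
    // Negated character class: anything NOT listed is replaced with ''.
    // The hyphen sits last so it is literal; \] \' and \/ are escaped as explained above.
    $pattern = '/[^A-Za-z0-9~!@#$%^&*()=_+{}[\]:";\'<>,.\/? \t-]/';
    $clean = preg_replace($pattern, '', $str);
    // preg_replace() returns null on error; fall back to an empty string.
    return $clean ?? '';
}

// "\xC3\xA9" is the UTF-8 encoding of e-acute; both bytes get stripped.
echo strip_disallowed("Caf\xC3\xA9 #1: a-b\t[ok]"), "\n";
```

Backslash, pipe and backtick are not in the class, so they are stripped as well. Add them to the class if you need them.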

swa66

3:43 pm on Nov 25, 2013 (gmt 0)



all i need to use are characters visible in the keyboard.

Take care: there are keyboards out there with a lot more characters visible ... E.g. in Germany the QWERTZU layout has characters with umlauts on it (ä, ö, ü).
The French (and Belgians) use an AZERTY layout that has accented characters on it ...

And I'm sure in Japan or other countries where the Roman alphabet isn't used all that much there will be even weirder stuff on their keyboards.

IMHO: it's best to assume anything that's valid as UTF-8 will be valid input and keep a clear channel as much as possible.
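A rough sketch of that approach, assuming the input is meant to be UTF-8: rather than stripping characters, just check that the bytes are well-formed UTF-8 and pass them through untouched. The function name is_valid_utf8() is hypothetical. It relies on the /u pattern modifier, which makes preg_match() return false when the subject is not valid UTF-8:

```php
<?php
// Hypothetical helper name, not from this thread.
function is_valid_utf8(string $str): bool
{
    // With /u, PCRE validates the subject as UTF-8 before matching.
    // The empty pattern matches any valid string (result 1);
    // malformed input makes preg_match() return false instead.
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8("Gr\xC3\xBC\xC3\x9Fe")); // well-formed UTF-8 bytes
var_dump(is_valid_utf8("Gr\xFC\xDFe"));         // Latin-1 bytes, malformed as UTF-8
```

If the mbstring extension is available, mb_check_encoding($str, 'UTF-8') does the same job.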