
PHP Server Side Scripting Forum

    
stripping characters
camilord




msg:4625699
 9:03 pm on Nov 24, 2013 (gmt 0)

Hi,

Do you have any idea how to strip non-ASCII characters?

All I need to keep are the characters visible on the keyboard:

A-Z
a-z
0-9
~!@#$%^&*()-=_+{}[]:";'<>,./?
space and tab

Can anybody help me, please?

 

penders




msg:4625776
 11:56 am on Nov 25, 2013 (gmt 0)

If you know the precise subset of characters you wish to allow (as you seem to), then you are probably better off doing this with a regex:

$str = 'String to sanitize'; 
$str = preg_replace('/[^A-Za-z0-9~!@#$%^&*()=_+{}[\]:";\'<>,.\/? \t-]/','',$str);


(Disclaimer: I think I've escaped the necessary characters in the regex, but need to test!)

The regex consists of one big negated (^ prefix) character class containing the chars you wish to allow. Any chars not belonging to this class are replaced with an empty string.

The hyphen (-) is placed at the end of the char class to remove its special meaning and match a literal hyphen.

\] - The closing square bracket is escaped to match a literal square bracket.
\' - The apostrophe/single quote is escaped since I've used single-quoted strings in PHP.
\/ - The slash is escaped since that is our regex delimiter.
\t - Tab.

Note that this regex omits the backslash (\) and pipe (|) characters, since both are missing from your list of chars.
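
Given the disclaimer above, a quick way to try the pattern is to wrap it in a small function and feed it a few strings. This is only a minimal sketch: the function name strip_to_keyboard_chars is made up for illustration, and the sample inputs are assumptions, not anything from the original question.

```php
<?php
// Hypothetical helper name; the pattern is the one from the post above.
function strip_to_keyboard_chars($str)
{
    // Negated character class: anything NOT in the allowed set is removed.
    return preg_replace('/[^A-Za-z0-9~!@#$%^&*()=_+{}[\]:";\'<>,.\/? \t-]/', '', $str);
}

// "\xC3\xA9" is a UTF-8 encoded e-acute. Without the u modifier the regex
// works byte by byte, so both bytes are stripped.
echo strip_to_keyboard_chars("Caf\xC3\xA9 [ok] a-b\tc"), "\n";

// Backslash and pipe are not in the allowed set, so they are stripped too.
echo strip_to_keyboard_chars('a\\b|c'), "\n";
```

The first call should leave everything except the two non-ASCII bytes; the second should drop the backslash and the pipe. If you do want to keep those two, they would need adding to the class (the backslash as \\\\ inside a single-quoted PHP string).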

swa66




msg:4625820
 3:43 pm on Nov 25, 2013 (gmt 0)

all i need to use are characters visible in the keyboard.

Take care: there are keyboards out there with a lot more characters visible. E.g. in Germany the QWERTZU layout has characters with umlauts on it, e.g. ä, ö, ü (if that survives this board ...)
The French (and Belgians) use an AZERTY layout that has characters like é, è, ç, à, etc. on it ...

And I'm sure in Japan or other countries where the Roman alphabet isn't used all that much, there will be even weirder stuff on their keyboards.

IMHO: it's best to assume anything that's valid as UTF-8 will be valid input and keep a clear channel as much as possible.
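
One way to act on that, sketched under assumptions: instead of stripping characters, check that the input is well-formed UTF-8 and reject it if not. The function name is_valid_utf8 is made up for illustration. It relies on the PCRE u modifier, under which preg_match() returns false when the subject string is not valid UTF-8.

```php
<?php
// Hypothetical helper name, for illustration only.
function is_valid_utf8($str)
{
    // An empty pattern matches any string. With the u modifier,
    // preg_match() returns false if the subject is malformed UTF-8.
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8("Gr\xC3\xBC\xC3\x9Fe")); // UTF-8 bytes for "Grüße"
var_dump(is_valid_utf8("Gr\xFC\xDFe"));         // same word as Latin-1 bytes
```

If the mbstring extension is loaded, mb_check_encoding($str, 'UTF-8') is another option for the same check.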

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved