Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting

PHP Server Side Scripting Forum

    
Cyrillic strlen
fm86

5+ Year Member



 
Msg#: 4339482 posted 2:05 pm on Jul 14, 2011 (gmt 0)

Good day everybody!

I have a problem validating a form whose textarea contains Cyrillic characters. I want every text longer than 450 characters to be rejected.
In JavaScript the string has length 440, so it passes the pre-submit check. But in PHP, strlen(stripslashes($_POST['text'])) returns 792, so the text is rejected. How can I solve this?

Thanks a lot!

This is my test string:
ет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им откр

 

penders

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4339482 posted 3:07 pm on Jul 14, 2011 (gmt 0)

It sounds like your $_POST['text'] contains Unicode characters in a multi-byte encoding (most likely UTF-8). strlen() returns the number of bytes, not the number of characters, so the figure is too high: in UTF-8 each Cyrillic letter takes two bytes, while a space takes one. You probably need to call mb_strlen() [uk3.php.net] instead, which counts the number of characters in a given encoding, for example mb_strlen($text, 'UTF-8').
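A minimal sketch of the difference, assuming the text arrives as UTF-8 and the mbstring extension is loaded. The sample string is an assumption: 44 repetitions of the unit "ет им откр", which matches the lengths reported in the question. The helper name isWithinLimit() is hypothetical.

```php
<?php
// Assumption: this file is saved as UTF-8 and the form data is UTF-8 too.
// In the real script the value would come from $_POST['text'].
$sample = str_repeat('ет им откр', 44);

// strlen() counts bytes: 2 per Cyrillic letter, 1 per space.
echo strlen($sample), "\n";             // 792

// mb_strlen() counts characters in the named encoding.
echo mb_strlen($sample, 'UTF-8'), "\n"; // 440

// Hypothetical helper: reject anything longer than $maxChars characters.
function isWithinLimit(string $text, int $maxChars = 450): bool
{
    return mb_strlen($text, 'UTF-8') <= $maxChars;
}

var_dump(isWithinLimit($sample)); // bool(true)
```

Passing the encoding explicitly avoids depending on the server's default mbstring encoding, which may not be UTF-8 on older setups.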

fm86

5+ Year Member



 
Msg#: 4339482 posted 3:32 pm on Jul 14, 2011 (gmt 0)

You are right, it works. Thanks a lot!

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4339482 posted 7:59 pm on Jul 14, 2011 (gmt 0)


Many string-length functions in many languages (computer languages, not human ones) run into the same problem. In UTF-8, the more common non-Roman scripts, including the non-ASCII half of Latin-1, all fall in the two-byte range, so each of their letters gets counted double. Spaces and "vanilla" punctuation are in the one-byte range and are counted once. But if they've used anything fancy like curly quotes you are in still deeper trouble, because now you're in the three-byte range.
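To make the byte ranges concrete, here is a small sketch that prints how many bytes a few sample characters occupy in UTF-8. The choice of samples is mine, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Sample characters (assumed for illustration), written as code point
// escapes so the result does not depend on the source file's encoding.
$samples = [
    'space'             => ' ',
    'comma'             => ',',
    'e-acute (Latin-1)' => "\u{00E9}",
    'Cyrillic te'       => "\u{0442}",
    'right curly quote' => "\u{2019}",
];

foreach ($samples as $label => $char) {
    // strlen() returns the byte count, i.e. the UTF-8 width of one character.
    printf("%-20s %d byte(s)\n", $label, strlen($char));
}
```

The space and comma come out as one byte, the accented Latin letter and the Cyrillic letter as two, and the curly quote as three.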

When necessary you can take advantage of the fact that, in UTF-8, certain bytes only occur as the first byte of a multi-byte character: D0 - D4 for Cyrillic (two-byte characters), or E2 for fancy punctuation such as curly quotes (three-byte characters).
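One way to sketch the lead-byte idea without mbstring, assuming the input is valid UTF-8. Both function names are hypothetical. The second function is a generalization rather than exactly what is described above: it counts every byte that is not a continuation byte (80 - BF), which gives the character count for any script.

```php
<?php
// Count Cyrillic letters by their lead bytes (0xD0-0xD4).
// Without the /u modifier, PCRE matches byte by byte.
function countCyrillicLeadBytes(string $utf8): int
{
    return (int) preg_match_all('/[\xD0-\xD4]/', $utf8);
}

// Count characters by skipping continuation bytes (0x80-0xBF):
// every character contributes exactly one non-continuation byte.
function utf8LengthByLeadBytes(string $utf8): int
{
    return (int) preg_match_all('/[^\x80-\xBF]/', $utf8);
}

// Assumption: this file is saved as UTF-8.
$unit = 'ет им откр';
echo countCyrillicLeadBytes($unit), "\n"; // 8 letters
echo utf8LengthByLeadBytes($unit), "\n";  // 10 characters (8 letters + 2 spaces)
```

Where mbstring is available, mb_strlen() is the simpler choice; this byte-level approach is mainly useful as a fallback.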

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved