Cyrillic strlen

     
2:05 pm on Jul 14, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 30, 2008
posts: 42
votes: 0


Good day everybody!

I have a problem validating a form whose textarea contains Cyrillic characters. I want any text longer than 450 characters to be rejected.
In JavaScript the string has length 440, so it passes the pre-submit check. But in PHP, strlen(stripslashes($_POST['text'])) returns 792, so the text is rejected. How can I solve this?

Thanks a lot!

This is my test string:
ет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им открет им откр
3:07 pm on July 14, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3123
votes: 0


It sounds like your $_POST['text'] is a multi-byte string containing Unicode (multi-byte) characters. strlen() returns the number of bytes, not the number of characters, so for UTF-8 Cyrillic text, where each letter takes two bytes, the figure is too high. You probably need to call mb_strlen() [uk3.php.net] instead, passing the encoding (for example 'UTF-8'), which counts the number of characters in that encoding.
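For illustration, here is a minimal sketch of the difference. It assumes the script file is saved as UTF-8 and that the mbstring extension is enabled; the string is built from the same repeating pattern as the test string in the question, and the variable names are only for this example.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
// 'ет им откр' is 10 characters: 8 Cyrillic letters (2 bytes each) + 2 spaces.
$text = str_repeat('ет им откр', 44);

$bytes = strlen($text);              // counts bytes
$chars = mb_strlen($text, 'UTF-8');  // counts characters in the given encoding

echo "bytes: $bytes, characters: $chars\n";

// The limit check from the question, done on characters instead of bytes.
$tooLong = $chars > 450;
```

Passing the encoding explicitly avoids depending on whatever mbstring's internal encoding happens to be set to on the server.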
3:32 pm on July 14, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 30, 2008
posts: 42
votes: 0


You are right, it works. Thanks a lot!
7:59 pm on July 14, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13436
votes: 389


It sounds like your $_POST['text'] is a multi-byte string containing Unicode (multi-byte) characters. strlen() returns the number of bytes, not the number of characters, so for UTF-8 Cyrillic text the figure is too high.

Many string-length functions in many languages (computer languages, not human ones) run into the same problem. In UTF-8, the more common non-Roman scripts, along with the non-ASCII half of Latin-1, all fall in the two-byte range, so each letter gets counted double. Spaces and "vanilla" punctuation stay in the one-byte range and are counted once, but if anything fancy like curly quotes has been used you are in still deeper trouble, because now you're in the three-byte range.
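A quick way to see those ranges is to print the byte length of a few sample characters. This is only a sketch: the sample set and array names are made up for the example, and the characters are written as UTF-8 byte escapes so the result does not depend on how the file is saved.

```php
<?php
// Sample characters given as explicit UTF-8 byte sequences.
$samples = array(
    'space'       => ' ',
    'comma'       => ',',
    'e-acute'     => "\xC3\xA9",      // U+00E9, non-ASCII half of Latin-1
    'cyrillic ie' => "\xD0\xB5",      // U+0435
    'curly quote' => "\xE2\x80\x9C",  // U+201C
);

$lengths = array();
foreach ($samples as $name => $char) {
    $lengths[$name] = strlen($char);  // strlen() reports bytes
    echo $name, ': ', $lengths[$name], ' byte(s), hex ', bin2hex($char), "\n";
}
```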

When necessary you can take advantage of the fact that certain bytes only occur as the first byte of a multi-byte character: in the case of Cyrillic, D0 - D4 (two-byte characters), or E2 for fancy punctuation such as curly quotes (three-byte characters).
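If mbstring is not available, the same idea can be sketched by looking at bytes directly. The version below is a variation on the lead-byte trick rather than exactly what is described above: instead of counting lead bytes, it subtracts the continuation bytes (80 - BF), which never start a character. It assumes the input is valid UTF-8, and utf8_char_count() is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: counts characters in a string assumed to be valid UTF-8.
function utf8_char_count($s) {
    // Without the /u modifier PCRE matches byte by byte, so this counts
    // continuation bytes (0x80-0xBF), i.e. bytes that do not start a character.
    $continuation = preg_match_all('/[\x80-\xBF]/', $s, $m);
    return strlen($s) - $continuation;
}

// Counting lead bytes D0-D4 instead gives the number of Cyrillic letters.
function cyrillic_letter_count($s) {
    return preg_match_all('/[\xD0-\xD4]/', $s, $m);
}

$text = str_repeat("\xD0\xB5\xD1\x82 ", 3);  // "ет " three times
echo utf8_char_count($text), ' characters, ',
     cyrillic_letter_count($text), " Cyrillic letters\n";
```

Malformed input would throw the count off, so this is best treated as a fallback rather than a replacement for mb_strlen().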
 
