Question about regular expression (unicode)

tntpower

12:39 pm on May 19, 2005 (gmt 0)

I want to match two words in a file (Asian language). I use =~ m/\u4E0A\u4E0B/ to try to match these two words, but it does not work. Can any guru give me some ideas? Thanks.

moltar

12:26 am on May 21, 2005 (gmt 0)

First of all, check your Perl version.
  • Perl version 5.6 introduced partial Unicode support.
  • Perl 5.6.1 fixed some of its issues.
  • Perl 5.8 has comprehensive support for it.
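
One way to check the version from inside a script is sketched below. It relies on the built-in `$]` variable; the name `$has_full_unicode` is only an illustrative choice, and the 5.008 threshold simply mirrors the list above.

```perl
use strict;
use warnings;

# $] holds the interpreter version as a number, e.g. 5.008008 for Perl 5.8.8.
# $has_full_unicode is a hypothetical name used only for this sketch.
my $has_full_unicode = $] >= 5.008 ? 1 : 0;

printf "Running Perl %s; 5.8-level Unicode support: %s\n",
    $], $has_full_unicode ? "yes" : "no";
```

From the command line, `perl -v` reports the same information.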

If your Perl is up to date, then make sure you do one of the following:

Convert a string of UTF-8 bytes into a Unicode character string:

$ustring = pack "U0C*", unpack "C*", $ustring;
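
As an alternative to the pack/unpack idiom, the core Encode module (shipped with Perl 5.8) can do the same conversion. The sketch below assumes the data is UTF-8 encoded; the byte values are the UTF-8 encoding of the two code points from the question. It also writes the pattern with `\x{...}`, because in a Perl regex `\u` means "uppercase the next character" and is not a code point escape, which is a likely reason the original pattern never matched.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for U+4E0A U+4E0B (assumed input encoding: UTF-8).
my $bytes = "\xE4\xB8\x8A\xE4\xB8\x8B";

# decode() turns the byte string into a character string.
my $ustring = decode('UTF-8', $bytes);

# \x{4E0A} names a code point; \u4E0A would only uppercase the "4".
my $matched = $ustring =~ m/\x{4E0A}\x{4E0B}/ ? 1 : 0;
my $chars   = length $ustring;

print "matched: $matched, characters: $chars\n";
```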

Open a file in Unicode:

open( FILE, "<:utf8", $fname ) or die "Cannot open $fname: $!";
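
Putting the file layer and the match together, a minimal sketch might look like this. The temporary file and the variable names (`$fname`, `$hits`) are made up so the example is self-contained; in practice `$fname` would be your own file, assumed here to be UTF-8 encoded.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file: write the two characters out as UTF-8.
my ($tmp_fh, $fname) = tempfile(UNLINK => 1);
binmode $tmp_fh, ":utf8";
print {$tmp_fh} "line one\n", "\x{4E0A}\x{4E0B}\n";
close $tmp_fh or die "Cannot close $fname: $!";

# Read it back through the :utf8 layer so each line is a character string.
open(my $in, "<:utf8", $fname) or die "Cannot open $fname: $!";
my $hits = 0;
while (my $line = <$in>) {
    # Count lines containing U+4E0A immediately followed by U+4E0B.
    $hits++ if $line =~ m/\x{4E0A}\x{4E0B}/;
}
close $in;

print "matching lines: $hits\n";
```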

Switch an existing filehandle to UTF-8 after it has already been opened:

binmode DATA, ":utf8";
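
The same idea on an ordinary filehandle, as a rough sketch: the file is opened without any layer, then switched with binmode before the first read. The temporary file and its contents are invented for the example, and the data is again assumed to be UTF-8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file containing the raw UTF-8 bytes for U+4E0A U+4E0B.
my ($tmp_fh, $fname) = tempfile(UNLINK => 1);
print {$tmp_fh} "\xE4\xB8\x8A\xE4\xB8\x8B\n";
close $tmp_fh or die "Cannot close $fname: $!";

# Open with no layer, then switch the already-open handle to UTF-8.
open(my $fh, "<", $fname) or die "Cannot open $fname: $!";
binmode $fh, ":utf8";

my $line = <$fh>;
close $fh;
chomp $line;

my $len     = length $line;   # counts characters, not bytes
my $matched = $line =~ m/\x{4E0A}\x{4E0B}/ ? 1 : 0;

print "length: $len, matched: $matched\n";
```

Calling binmode before any data has been read keeps the whole stream consistently decoded.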