Forum Moderators: coopster & phranque


Regex failing in Unicode

Patterns fail on a Unicode text file


timster

1:51 pm on Jul 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was trying to do some pretty basic regular expression matches, such as:

/COLLATE/

but they weren't matching. When I noticed the input file was Unicode, I saved it as "non-Unicode" and tried again, and the patterns matched.

But I need to be able to parse the original Unicode.

I'm running Perl 5.8.1 on Mac OS X.

I reckon this is an easy one, but Googling has not helped. Any help is appreciated.
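One likely explanation, offered as a guess since the thread never shows the file itself: many editors use "Unicode" to mean UTF-16, where every ASCII letter is stored as two bytes, one of them a NUL. If Perl reads those raw bytes without decoding them, the letters of COLLATE are never adjacent and the pattern cannot match. The sketch below reproduces that under stated assumptions: the sample text is made up, and UTF-16LE is assumed as the on-disk encoding.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample line; the real file's contents are not shown in the thread.
my $text = "ALTER TABLE t COLLATE latin1\n";

# Assumption: "Unicode" in the save dialog means UTF-16 (little-endian here).
my $bytes = encode('UTF-16LE', $text);

# Raw bytes: each ASCII letter is followed by a NUL byte, so the pattern fails.
my $raw_match = ($bytes =~ /COLLATE/) ? 1 : 0;

# Decoded characters: the same pattern now matches.
my $decoded       = decode('UTF-16LE', $bytes);
my $decoded_match = ($decoded =~ /COLLATE/) ? 1 : 0;

print "raw bytes: $raw_match, decoded: $decoded_match\n";
# prints "raw bytes: 0, decoded: 1"
```

If this is what is happening, the regex is fine and the data just needs to be decoded before matching.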

timster

12:42 pm on Jul 30, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone?

Perl 5.8 reads a Unicode file correctly, except that regexes don't work.

coopster

4:10 pm on Jul 30, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Don't have an answer for you, but did you read this already?

[perl.com...]

"Unfortunately, there is currently no way to tell Perl that incoming data from an external file is Unicode; while you can write Unicode data out to a file, you cannot read Unicode data back in again. While you can work around this with tr///CU, it's obviously a serious shortcoming, which we hope will be addressed soon."
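For what it's worth, the article excerpt above may not apply to Perl 5.8, which lets you attach an encoding layer when opening a file so that lines arrive already decoded. Here is a minimal sketch, assuming the input is UTF-16 with a byte order mark. The helper name count_collate_lines and the sample text are made up for illustration, and the temporary file stands in for the real input.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: count lines matching /COLLATE/ in a file that is
# decoded on the fly with the given encoding.
sub count_collate_lines {
    my ($path, $encoding) = @_;
    open(my $in, "<:encoding($encoding)", $path)
        or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$in>) {
        $count++ if $line =~ /COLLATE/;
    }
    close($in);
    return $count;
}

# Build a small UTF-16 sample file (made-up content) to try the helper on.
# encode('UTF-16', ...) produces big-endian bytes with a BOM prepended.
my ($fh, $sample) = tempfile(UNLINK => 1);
binmode($fh, ':raw');
print {$fh} encode('UTF-16',
    "CREATE TABLE t (\n  name VARCHAR(10) COLLATE latin1_bin\n);\n");
close($fh);

# Plain "UTF-16" on input relies on the BOM to detect byte order.
my $hits = count_collate_lines($sample, 'UTF-16');
print "lines matching COLLATE: $hits\n";
# prints "lines matching COLLATE: 1"
```

If the real file has no byte order mark, naming the byte order explicitly (UTF-16LE or UTF-16BE) in place of UTF-16 would be the thing to try.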