Re: Perl in a UTF-8 locale
On Mon, Nov 10, 2003 at 05:20:59PM +0000, Edmund GRIMLEY EVANS wrote:
> I have a problem here with Perl v5.8.0 on Red Hat 9. Simplified, my
> script looks like this:
>
> while (<>) {
> s/ĉ/cx/g;
> print;
> }
>
> This works with older versions of Perl, and it works in the C locale,
> but it doesn't work here in a UTF-8 locale. I tried putting stuff like
> "use bytes" or "no utf8" or "no locale", but it didn't help.
As long as the Perl script and the input are in the same encoding, it
works for me (Debian unstable).
This is perl, v5.8.0 built for i386-linux-thread-multi
10:14am glenn@zewt/2 [~] cat testing.txt; file testing.txt
abĉd
testing.txt: UTF-8 Unicode text
10:17am glenn@zewt/2 [~] LANG=en_US.UTF-8 ./xxx.pl < testing.txt
abcxd
10:14am glenn@zewt/2 [~] LANG=C ./xxx.pl < testing.txt
abcxd
10:14am glenn@zewt/2 [~] LANG=en_US.ISO-8859-3 ./xxx.pl < testing.txt
abcxd
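If the script and its input are both UTF-8, one way to keep the result from depending on LANG at all is to declare the encodings explicitly instead of leaving them to the locale. What follows is a minimal sketch under that assumption, not a diagnosis of the Red Hat 9 behaviour. `cx_filter` is a hypothetical name. `use utf8` tells Perl that the literal ĉ in the source is UTF-8, and the `:encoding(UTF-8)` layers decode the input and encode the output.

```perl
use strict;
use warnings;
use utf8;    # the literal "ĉ" below is UTF-8 source text

# Hypothetical filter: copy lines from $in to $out, replacing ĉ with "cx".
# The UTF-8 layers are set explicitly on both handles, so the substitution
# works on characters and does not rely on LANG/LC_ALL or on whatever
# default I/O layers the interpreter picked.
sub cx_filter {
    my ($in, $out) = @_;
    binmode($in,  ':encoding(UTF-8)');   # decode input bytes to characters
    binmode($out, ':encoding(UTF-8)');   # encode characters back to UTF-8
    while (my $line = <$in>) {
        $line =~ s/ĉ/cx/g;               # matches the character U+0109
        print {$out} $line;
    }
}

# As a standalone script: cx_filter(\*STDIN, \*STDOUT);
```

Passing the handles in, rather than hard-coding STDIN and STDOUT, makes it easy to try the filter on in-memory data.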
The same runs, with the script and input saved as ISO-8859-3 (xxx3.pl and testing-3.txt):
10:17am glenn@zewt/2 [~] LANG=en_US.UTF-8 ./xxx3.pl < testing-3.txt
abcxd
10:18am glenn@zewt/2 [~] LANG=C ./xxx3.pl < testing-3.txt
abcxd
10:18am glenn@zewt/2 [~] LANG=en_US.ISO-8859-3 ./xxx3.pl < testing-3.txt
abcxd
(Of course, no locale works if I mix encodings.)
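If the script and the input really do use different encodings, one option is to stop relying on the locale and decode each input with the encoding it was actually written in. This is a minimal sketch, assuming the caller knows that encoding. `cx_bytes` is a hypothetical name, while `decode` and `encode` come from Perl's core Encode module. The pattern is written as the code point `\x{109}` (ĉ), so it works the same however the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take one chunk of raw bytes plus the name of the
# encoding they are in (e.g. "UTF-8" or "iso-8859-3"), replace ĉ with "cx",
# and return bytes in that same encoding.
sub cx_bytes {
    my ($bytes, $encoding) = @_;
    my $text = decode($encoding, $bytes);   # bytes -> characters
    $text =~ s/\x{109}/cx/g;                # U+0109 (ĉ) -> "cx"
    return encode($encoding, $text);        # characters -> bytes, same encoding
}
```

For example, `cx_bytes("ab\xE6d\n", 'iso-8859-3')` handles the Latin-3 file, where ĉ is the single byte 0xE6, and `cx_bytes($line, 'UTF-8')` handles the UTF-8 one, with no change to the script.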
> exec("/path/to/this/script", @ARGV);
> }
> .)??D??-|??ˊ{??v??W?z[
Hmm. What's this garbage at the end of the message? Oh. Poking at the
raw message body, I see it's the stupid footer that the mailing list blindly
tacks onto every message (despite this being a base64 message, so the
appended plain text gets decoded as base64 into junk).
--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/