Re: Perl & unicode weirdness.
(To avoid confusion, we don't call our encoding UTF-8. We tend to
say UTF-8 when we mean UTF-8, and "utf8" when we mean the more general
not-necessarily-Unicode encoding.
This is an insane way to make a distinction, just as silly as trying to
differentiate between "kilobits" and "kilobytes" with "kb" and "kB".
Changing hyphens and case doesn't make distinctions or avoid confusion.
I think he meant that Perl's UTF-8 implementation wasn't excessively
restrictive, not so much that it used a unique or incompatible
encoding.
I personally think filtering the code-point range is a separate concern
from the encoding itself. I don't think you would want a UTF-32 input stream
to start dropping words just because they exceed 0x10FFFF.
So, imho, wrt the terms "UTF-8" and "utf8", there is no difference in
"encoding", and hence no confusion.
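To make the "separate concern" point concrete, here is a rough sketch in Python (not Perl's actual implementation, and the names encode_generalized and is_unicode_scalar are made up for illustration). The first function only applies the UTF-8 bit layout; the second is the range filter, which a caller may or may not choose to apply. I am assuming a four-byte limit here just to keep it short.

```python
MAX_UNICODE = 0x10FFFF  # highest code point defined by Unicode


def encode_generalized(cp: int) -> bytes:
    """Apply the UTF-8 bit layout to an integer, with no Unicode range check.

    Hypothetical helper: handles values that fit in one to four bytes
    (below 0x200000) and knows nothing about which values are "legal".
    """
    if cp < 0:
        raise ValueError("negative value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    if cp < 0x200000:
        return bytes([
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    raise ValueError("value needs more than four bytes in this sketch")


def is_unicode_scalar(cp: int) -> bool:
    """The separate filtering step: in range and not a surrogate."""
    return 0 <= cp <= MAX_UNICODE and not (0xD800 <= cp <= 0xDFFF)


if __name__ == "__main__":
    for cp in (0x20AC, 0x10FFFF, 0x110000):
        print(hex(cp), encode_generalized(cp).hex(), is_unicode_scalar(cp))
```

For every value that passes is_unicode_scalar, the bytes are the same as what a strict UTF-8 encoder produces; the only difference between the strict and the lax variant is whether the filter is applied.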
(It's a shame that Perl doesn't behave like everyone else and obey
locale settings correctly; I thought we were finally getting away
from having to tell each program individually to use UTF-8. I don't
understand the logic of "RedHat set the locale to UTF-8 prematurely,
so Perl shouldn't obey the locale".)
I think it is a practical setting, because most programmers and existing
code tend to expect binary I/O.
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/