Re: UTF-16
I also believe that most everyone agrees that if Unicode had had
available in 1988 or even 1993 the current level of sophistication in
fonts and layout engines and the experience with character encoding
(including IDS and variation selectors), then it could have stayed
with a fixed-width 16-bit form.
Combining characters and context-sensitive characters somewhat diminish
the value of a fixed width per code point.
(For example, you cannot assume that it is safe to break a string at an
arbitrary position, even at a code point boundary.)
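To illustrate the point about breaking strings, here is a rough Python
sketch. The helper name safe_break_points is made up for this example,
and it only checks the canonical combining class; full grapheme cluster
segmentation has more rules than this, so treat it as a simplification.

```python
import unicodedata

def safe_break_points(text):
    """Return indices where text can be split without detaching a
    combining mark from its base character (simplified check only)."""
    return [i for i in range(1, len(text))
            if unicodedata.combining(text[i]) == 0]

# 'e' + COMBINING ACUTE ACCENT + 'a': three code points, two visible characters
s = "e\u0301a"
points = safe_break_points(s)
# Index 1 is a code point boundary, but splitting there strips the accent from 'e'.
```

Every index is a valid code point boundary here, yet only index 2 is
reported as a safe place to cut.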
UTF-16 may be "ugly" to some, but it works. (Before someone jumps in
here: I am not saying UTF-8 doesn't! All of UTF-8/16/32 "work".)
For processing, it is easier to deal with one-or-two units per code
point than one-or-two-or-three-or-four of them, and single-unit
performance optimizations are very useful for UTF-16.
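As a sketch of what "one-or-two units" means in practice, assuming the
input is already a list of 16-bit integers (the function name
decode_utf16_units and the U+FFFD substitution policy are choices made
for this example, not anything mandated here):

```python
def decode_utf16_units(units):
    """Decode a sequence of 16-bit code units into code points."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if u < 0xD800 or u > 0xDFFF:
            # Fast path: a single unit is the code point itself.
            out.append(u)
            i += 1
        elif u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: combine the pair.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # Unpaired surrogate: substitute U+FFFD (one possible policy).
            out.append(0xFFFD)
            i += 1
    return out

code_points = decode_utf16_units([0x0041, 0xD83D, 0xDE00])
```

The first branch is the single-unit case that the optimizations target;
only the second branch has to look at a neighbouring unit.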
Hardly. UTF-16 combines the worst aspects of UTF-32 and UTF-8 into one
congealed cluster.
It cannot recover from a single missed byte, it is sensitive to machine
byte order, it cannot be sorted naively as a binary object, it cannot be
embedded into source code as literals (at best you get ugly escape
sequences), it cannot be used for web pages or any existing common wire
protocols, it cannot be sanely recommended for any future wire
protocols, and it's the only one of the three with no room for
expansion. It also impinges on a good chunk of the BMP that would
otherwise be useful.
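Two of these points (byte order and naive binary sorting) are easy to
check with a few lines of Python. The sample characters U+FF5E and
U+10000 are picked for this example only; any BMP character above the
surrogate range compared against a supplementary character should
behave the same way.

```python
# Two code points: one high in the BMP, one just beyond it.
a = "\uff5e"      # U+FF5E
b = "\U00010000"  # U+10000
code_point_order = ord(a) < ord(b)

# Bytewise comparison of the encoded forms.
utf8_preserves = a.encode("utf-8") < b.encode("utf-8")
utf16_preserves = a.encode("utf-16-be") < b.encode("utf-16-be")

# The same character serializes differently depending on byte order.
le = "A".encode("utf-16-le")
be = "A".encode("utf-16-be")
```

Here utf8_preserves comes out True and utf16_preserves comes out False,
because the surrogate units (D800..DFFF) compare lower than FF5E even
though they stand for a higher code point. The two serializations of "A"
are the same two bytes in opposite order.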
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/