RE: Encoding conversions
On Mon, 10 Sep 2001, Oyvind Holm wrote:
> > UTC is also working on restricting UTF-8 to something equivalent to
> > RFC 2279's definition (well, for the range U+0000 to U+10FFFF) in
> > Unicode 3.2. That's very good news I think.
>
> What will these restrictions be? Big changes?
Well, UTF-8 will be made simpler. Currently, Unicode-conformant UTF-8
decoders should accept 'irregular' UTF-8 (that is, a code point above
U+FFFF coded as a UTF-16 surrogate pair, with each surrogate then
re-encoded as its own three-byte UTF-8 sequence). With the change, there
will be no need for that anymore, and decoders will be allowed to reject
irregular sequences, or even forget about their existence.
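
To make the irregular form concrete, here is a rough Python sketch (not
taken from any standard or library; the function names are made up for
illustration). It assumes Python 3, whose "surrogatepass" error handler
lets lone surrogates be written out as three-byte sequences, and whose
default strict UTF-8 decoder already rejects them:

```python
def utf8_regular(cp: int) -> bytes:
    """Encode a code point as ordinary (shortest-form) UTF-8."""
    return chr(cp).encode("utf-8")


def utf8_irregular(cp: int) -> bytes:
    """Encode a supplementary code point the 'irregular' way:
    split it into a UTF-16 surrogate pair, then encode each
    surrogate as its own three-byte UTF-8 sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF have a surrogate pair")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # high (leading) surrogate
    low = 0xDC00 + (v & 0x3FF)   # low (trailing) surrogate
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


def strict_decoder_accepts(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    cp = 0x10000
    print(utf8_regular(cp).hex(" "))    # four-byte form
    print(utf8_irregular(cp).hex(" "))  # six-byte irregular form
    print(strict_decoder_accepts(utf8_irregular(cp)))
```

For U+10000 the regular form is F0 90 80 80 and the irregular form is
ED A0 80 ED B0 80; a decoder that follows the stricter definition simply
rejects the latter.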
roozbeh
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/