Re: Linux console internationalization
Keld Jørn Simonsen wrote on 2003-08-10:
> If we stick to ISO 10646 then you need to generate the fully
> composed characters to get the characters. Of course these characters
> are needed, you cannot live without them in most languages in the
> Latin script.
>
Why? What does ISO 10646 lack that Unicode has? I thought they are
pretty much the same... Doesn't ISO 10646 define combining
characters?
> In Unicode parlance that would mean that you use NFC for the input.
>
> For rendering you just output the fully composed characters. You
> should of course also output the combining characters but that would
> mean further processing in the rendering engine. Some scripts do
> need this further processing in the rendering engine, such as the
> Indic scripts and Hangul Jamo.
>
Since NFC is merely recommended but not required, it is quite possible
for me to have a UTF-8 text file, even in simple European languages,
that has decomposed characters. If I want to be able to ``cat file``
onto the console, it's absolutely required that the console can handle
normalization. If it works in one form but not the other, users are
going to be *very* surprised.
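
To make the point concrete, here is a rough sketch in Python (not console
code, just an illustration using the standard ``unicodedata`` module).
The helper name ``to_console_form`` is made up for this example, and
normalizing to NFC before rendering is only one possible approach:

```python
import unicodedata

# The same visible text in two canonically equivalent forms.
composed = "\u00e9"      # U+00E9, precomposed "e with acute"
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

def to_console_form(text: str) -> str:
    """Hypothetical helper: normalize to NFC before handing text to a renderer."""
    return unicodedata.normalize("NFC", text)

# The raw strings (and their UTF-8 bytes) differ...
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# ...but after normalization they compare equal.
print(to_console_form(decomposed) == composed)

# Conjoining Hangul jamo also compose to a single syllable under NFC.
jamo = "\u1112\u1161\u11ab"   # HIEUH + A + final NIEUN
print(to_console_form(jamo) == "\ud55c")
```

A file containing the decomposed form is perfectly valid UTF-8, so a
console that only draws the precomposed form would show the two files
differently even though they are canonically equivalent.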
--
Beni Cherniavsky <cben@xxxxxxxxxxxxxxxxx>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/