Re: questions with combining characters [was: Unicode: endpoint of evolution of encodings?]
On Thu, Nov 18, 2004 at 11:44:09AM -0500, Edward H. Trager wrote:
> On Thursday 2004.11.18 01:44:07 +0000, Christopher Fynn wrote:
>
> Hmmm, I'll have to read that document again and think about this one.
> One of the problems with Unicode is that it is, in many ways, such a mess.
> Based on first principles, people wanted Unicode to use a "character"
> model, not a "glyph" model. But it seems that what has really happened
> is that we've basically ended up with a "glyph" model for all of those scripts
> that already had legacy computer encodings at the time that Unicode came into existence:
> This includes Latin, Cyrillic, Greek, and Arabic among others.
> Only scripts that had never (or barely) had the fortune --or misfortune, depending on how
> you look at it-- to be encoded for use on computers have ended up in Unicode
> using a "character" rather than "glyph" based model. These would include
> scripts like Thaana, Devanagari, and Burmese. For those scripts, there are
> no "precomposed" forms -- and thus no difference between NFC versus NFD "normalizations".
> So, although it is more of a burden to display Burmese correctly, it might be
> easier to collate Burmese than it is to collate some European language texts where
> the text could be in NFC, NFD, or even some combination thereof ...
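(To make the NFC versus NFD point above concrete, here is a minimal
sketch using Python's standard unicodedata module. The two sample
strings and the codepoints() helper are assumptions chosen for
illustration and are not taken from the thread. It only shows that this
Latin sample has two normalized spellings while this Devanagari sample
has one.)

```python
import unicodedata


def codepoints(s):
    """Hypothetical helper: render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Sample strings (assumed for illustration only).
latin = "\u00e9"             # LATIN SMALL LETTER E WITH ACUTE, precomposed
devanagari = "\u0915\u093f"  # DEVANAGARI LETTER KA + VOWEL SIGN I

latin_nfc = unicodedata.normalize("NFC", latin)
latin_nfd = unicodedata.normalize("NFD", latin)
deva_nfc = unicodedata.normalize("NFC", devanagari)
deva_nfd = unicodedata.normalize("NFD", devanagari)

# The Latin sample has two different normalized spellings.
print("Latin NFC:     ", codepoints(latin_nfc))
print("Latin NFD:     ", codepoints(latin_nfd))

# This Devanagari sample is spelled the same way in both forms.
print("Devanagari NFC:", codepoints(deva_nfc))
print("Devanagari NFD:", codepoints(deva_nfd))
```
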
Hmm, I see it differently. All the "fully composed" characters are
indeed full characters in their own right, and Unicode is now adopting a
policy of no longer encoding such full characters; you need to
construct many Latin letters out of a number of characters. So Unicode
has left the principle of encoding characters - symbols with distinct
meaning - and is now a kind of glyph registry.
This makes sorting harder to do, although it is not unfeasible to sort
e.g. Latin letters in their full encoding together with decomposed
approximations in a convenient way, as demonstrated by ISO 14651.
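
As a rough illustration of sorting fully encoded and decomposed letters
together, here is a minimal Python sketch. It is not an implementation
of ISO 14651: it swaps in a much cruder two-level key (base letters
first, then the NFD spelling as a tie-breaker), and the word list and
the sort_key() name are assumptions made up for the example.

```python
import unicodedata

# Sample words (assumed): the same accented word appears once decomposed
# and once precomposed, next to unaccented neighbours.
words = ["zebra", "e\u0301cole", "\u00e9cole", "ecole", "eagle"]


def sort_key(s):
    """Hypothetical two-level key: base letters, then the NFD spelling."""
    nfd = unicodedata.normalize("NFD", s)
    # Drop combining marks to get the primary (base letter) level.
    base = "".join(ch for ch in nfd if not unicodedata.combining(ch))
    return (base, nfd)


naive = sorted(words)                 # plain code point order
keyed = sorted(words, key=sort_key)   # precomposed and decomposed together

print([unicodedata.normalize("NFC", w) for w in naive])
print([unicodedata.normalize("NFC", w) for w in keyed])
```

In plain code point order the precomposed spelling lands after "zebra",
far from its decomposed twin; with the key above both spellings get the
same key and sort next to "ecole".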
Best regards
Keld
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/