RE: filename and normalization (was gcc identifiers)
>> For that reason, I don't like form D at all. I wonder how much space
>> it would take to represent every possible Jamo combination, and then just
>> do away with combining characters altogether...
> No way!! The biggest blunder ever made by the Korean national standards body
> was to insist that 11,172 modern precomposed syllables be encoded
> in Unicode/10646. The next biggest blunder was to encode tens
> of totally unnecessary cluster Jamos when only 17+11+17 plus a few more
> would have been more than sufficient. The next stupid thing they did was
> to remove the compatibility decomposition between cluster Jamos and basic
> Jamo sequences, although they should be canonically (not just compatibly)
> equivalent. Now you're saying that all possible combinations of them
> be encoded. How many? In theory, it's __infinite__. In practice, it could
> be around 1.5 million. That's more than the total number of code points
> available in the 20.1-bit coded character set that is ISO 10646/Unicode.
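To make the arithmetic above concrete, here is a minimal Python sketch, assuming the Hangul syllable constants from the algorithm published in the Unicode Standard. The 11,172 modern syllables are exactly 19 leading consonants × 21 vowels × 28 trailing slots (including "no trailing consonant"), and each one decomposes into conjoining jamo by pure arithmetic. The helper name `decompose_syllable` and the constant names are illustrative, not taken from any library.

```python
import math
import unicodedata

# Constants from the Hangul syllable algorithm in the Unicode Standard.
S_BASE = 0xAC00  # U+AC00, first precomposed Hangul syllable
L_BASE = 0x1100  # first leading-consonant (choseong) jamo
V_BASE = 0x1161  # first vowel (jungseong) jamo
T_BASE = 0x11A7  # one below the first trailing-consonant (jongseong) jamo
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT also counts "no trailing consonant"
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 precomposed syllables

CODESPACE = 0x110000  # U+0000..U+10FFFF, the whole Unicode/10646 code space


def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo (hypothetical helper)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed modern syllable; leave it unchanged
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    jamo = chr(lead) + chr(vowel)
    if trail != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamo += chr(trail)
    return jamo


print(S_COUNT)  # 11172
print(CODESPACE, round(math.log2(CODESPACE), 1))  # 1114112 20.1
# The arithmetic split agrees with Python's own NFD normalization.
print(decompose_syllable("\uD55C") == unicodedata.normalize("NFD", "\uD55C"))  # True
```

This covers only the modern syllables; a combination outside this set (for example, one using archaic jamo) has no precomposed code point and can only be written as a jamo sequence.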
Wow, OK, I guess that idea won't work for Korean.
Also, since glyph swapping has to be done for merely adjacent characters,
doing it for combining ones must be a relatively minor concern.
Out of curiosity, how many of those Korean letters does the language actually
use? 1.5 million sounds higher than the number of phonemes a human can
produce... (What if the cluster Jamos were dropped?)
Are we heading toward a long-run scenario where Form D becomes the preferred
form and all the old precomposed code points are deprecated? NFC seems
to be getting more and more entrenched, from what I can tell...
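The difference between the two forms is easy to see with Python's standard `unicodedata` module. The sketch below (the `forms` helper is a hypothetical name) lists the code points of a string under NFC and NFD; it only illustrates what each form looks like, not which one will win out.

```python
import unicodedata


def forms(text: str) -> dict:
    """Map "NFC" and "NFD" to the code points of text in that form."""
    return {
        form: [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, text)]
        for form in ("NFC", "NFD")
    }


# A Latin letter with an accent, and a precomposed Hangul syllable.
print(forms("\u00E9"))  # {'NFC': ['U+00E9'], 'NFD': ['U+0065', 'U+0301']}
print(forms("\uD55C"))  # {'NFC': ['U+D55C'], 'NFD': ['U+1112', 'U+1161', 'U+11AB']}

# Canonical equivalence: composing the jamo sequence gives back the syllable.
print(unicodedata.normalize("NFC", "\u1112\u1161\u11AB") == "\uD55C")  # True
```

On a file system that stores names byte for byte, the NFC and NFD spellings are different file names even though they are canonically equivalent, which is the heart of the filename question in the subject line.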
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/