Re: filename and normalization (was gcc identifiers)
> No way!! The biggest blunder ever made by Korean nat'l standard body
> is to insist that 11,172 modern precomposed syllables be encoded
> in Unicode/10646. Next biggest blunder they made is to encode tens
> of totally unnecessary cluster-Jamos when only 17+11+17+ a few more
> would have been more than sufficient. Next stupid thing they did is
> to remove compatibility decomposition between cluster Jamos and basic
> Jamo sequences although they should be canonically (not just compatibly)
> equivalent. Now, you're saying that all possible combinations of them
> be encoded. How many? It's __infinite__ in theory. In practice, it could
> be around 1.5 million. That's more than the total number of codepoints
> available in 20.1 bit coded character set which is ISO 10646/Unicode.
Would Chinese be in a similar situation if the radicals were
combining characters, and any combination of them could in theory be
a valid character? In practice, of course, a normal person would use
far fewer than 10,000 distinct characters.
Have you ever needed a character that wasn't among the 11,172 precomposed
ones?
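
For reference, the 11,172 figure is the product of 19 leading consonants,
21 vowels, and 28 trailing options (27 consonants plus "none"). Below is a
minimal Python sketch, assuming only the standard unicodedata module, that
checks this arithmetic and shows how a precomposed syllable canonically
decomposes into conjoining Jamo. The helper names (precomposed_count,
decompose, is_precomposed_syllable) are made up for illustration.

```python
import unicodedata

# Parameters of the modern Hangul syllable block, as used by the
# Unicode Hangul composition/decomposition algorithm.
S_BASE = 0xAC00   # first precomposed syllable, U+AC00
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants (27) plus "no trailing consonant"

# Size of the Unicode code space: 17 planes of 65,536 code points each.
UNICODE_CODE_POINTS = 17 * 0x10000


def precomposed_count():
    """Number of precomposed modern Hangul syllables."""
    return L_COUNT * V_COUNT * T_COUNT


def is_precomposed_syllable(ch):
    """True if ch lies in the precomposed Hangul syllable block."""
    return S_BASE <= ord(ch) < S_BASE + precomposed_count()


def decompose(syllable):
    """Canonical decomposition (NFD) of a syllable into conjoining Jamo."""
    return unicodedata.normalize("NFD", syllable)


if __name__ == "__main__":
    print(precomposed_count())                      # syllable count
    print(UNICODE_CODE_POINTS)                      # total code points
    print([hex(ord(c)) for c in decompose("\ud55c")])  # Jamo for U+D55C
```

With these numbers, an estimate of 1.5 million combinations would indeed
not fit in the 1,114,112 code points of the whole code space.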
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/