
Re: Hangul Jamo and wcwidth()



Markus Kuhn wrote:
 > 
 > Actually, I believe that X11 fonts should remain completely free of any
 > conjoining Jamo (U+1100 - U+11F9). This way, the question whether they
 > are half-width or full-width will not matter in practice.
 > 
 > Is anyone seriously planning to use ISO 10646-1 Level 2 for encoding
 > Korean? It is rather storage-space inefficient to decompose the Hangul
 > syllables (factor 3 blow-up).
 > 
 > I had always assumed that all Hangul that finds its way into POSIX
 > files, filenames, etc. will be exclusively from the precomposed
 > syllables in the U+AC00-U+D7A3 range. I thought of the conjoining Jamo
 > more as things that you might use temporarily inside input methods, but
 > nothing that should ever be printed to the screen.

Well, personally I think it was a big waste of the BMP to encode the
enormous Hangul block introduced in Unicode 2.0.  Granted, it is
elegant to be able to compose syllables algorithmically so easily, but
the truth is that the current Hangul block is full of "syllables" that
have never been and will never be used in Korean writing.
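
To make the "algorithmic" part concrete, here is a minimal sketch of the
arithmetic behind the block, assuming the usual layout of 19 leading
consonants, 21 vowels and 28 trailing slots (27 consonants plus "none").
The function names are mine and purely illustrative.

```python
# Sketch of the arithmetic layout of the Hangul syllable block
# (U+AC00-U+D7A3). Names are illustrative, not taken from any library.
S_BASE = 0xAC00
L_COUNT = 19   # leading consonants
V_COUNT = 21   # vowels
T_COUNT = 28   # trailing consonants, index 0 meaning "none"


def compose_syllable(l_index, v_index, t_index=0):
    """Return the precomposed syllable for zero-based jamo indices."""
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("index outside the modern jamo ranges")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def block_size():
    """Number of code points the scheme occupies, used or not."""
    return L_COUNT * V_COUNT * T_COUNT
```

Every combination gets a code point, whether or not it occurs in real
text, which is where the 11172 code points come from.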

I would have preferred to stick to the 2350 syllables encodable in
EUC-KR and defined in Unicode 1.0, and to encode everything else using
conjoining Jamo. Since those 2350 syllables account for 99.9% of all
characters appearing in a modern Korean text, the fact that you need
three Jamo to encode the rest doesn't matter.
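
For anyone who wants to measure the size difference on their own text,
here is a rough sketch.  It assumes UTF-8 as the storage encoding and
uses Unicode normalization (NFC/NFD) as a stand-in for "precomposed"
versus "conjoining Jamo"; utf8_sizes is a name I made up.

```python
import unicodedata


def utf8_sizes(text):
    """Return (precomposed_bytes, decomposed_bytes) for text in UTF-8.

    NFC stands in for the precomposed form and NFD for the conjoining
    Jamo form; this is an illustration, not a statement about what any
    particular application stores.
    """
    composed = unicodedata.normalize("NFC", text)
    decomposed = unicodedata.normalize("NFD", text)
    return len(composed.encode("utf-8")), len(decomposed.encode("utf-8"))
```

A syllable with a trailing consonant grows from 3 to 9 bytes, one
without from 3 to 6, so the factor is between 2 and 3 for the
syllables that have to be decomposed.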

Not everything that can possibly be written in Hangul is encoded in
the U+AC00-U+D7A3 range, so conjoining Jamo are necessary to give
truth to the claim that Unicode can encode Korean writing.  For
practical purposes, Jamo are far less relevant, as the encodings
currently in use in Korea cannot encode anything outside the Hangul
block either (with the exception of HWP, a very widely used word
processor in Korea, which has been able to represent archaic Korean
for many years already; I don't know what encoding it uses).

Archaic Korean cannot be written using the Hangul block alone (and to
encode all archaic syllables in precomposed form would require
something like 400 000 more code points).  Granted, there is not such
a big market for writing medieval Korean, although I do know people
who need it.  I have also seen the arae-a (U+119E) used for writing
the dialect of Cheju island in the south of Korea.
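
A small sketch of the difference, assuming Unicode NFC composition as
the test for "has a precomposed syllable".  The helper name and the
two sample sequences are my own illustration.

```python
import unicodedata


def has_precomposed_form(jamo_sequence):
    """True if NFC folds the conjoining Jamo sequence into one code point."""
    return len(unicodedata.normalize("NFC", jamo_sequence)) == 1


# Leading HIEUH + vowel A + trailing NIEUN: composes to U+D55C.
MODERN = "\u1112\u1161\u11ab"
# Same consonants around the vowel U+119E: no precomposed syllable
# exists, so the sequence stays as three conjoining Jamo.
ARCHAIC = "\u1112\u119e\u11ab"
```

has_precomposed_form(MODERN) is true, has_precomposed_form(ARCHAIC) is
false, so text containing U+119E can only be carried as Jamo.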

I don't see a good reason why one shouldn't allow the possibility of
displaying syllables in a non-conjoined form.  After all, you also
include some representation for combining marks.  On the other hand,
if you consistently want to exclude scripts with a complex
char-to-glyph relationship such as the Indic scripts or Arabic, you
might as well leave out the Jamo.
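
If the Jamo were to be displayed, one possible width convention (purely
my assumption for the sake of illustration, not something that has been
agreed on here) would be to treat vowels and trailing consonants like
combining marks: the leading consonant takes two columns and the rest
take none.  The code point ranges below are rough column boundaries of
the Jamo block, also an assumption.

```python
def jamo_column_width(ch):
    """Column width of one conjoining Jamo under the assumed convention."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:   # leading consonants (and filler)
        return 2
    if 0x1160 <= cp <= 0x11FF:   # vowels and trailing consonants
        return 0
    raise ValueError("not a conjoining Jamo: U+%04X" % cp)


def jamo_sequence_width(text):
    """Total columns for a string consisting only of conjoining Jamo."""
    return sum(jamo_column_width(ch) for ch in text)
```

With this, a leading consonant followed by a vowel and a trailing
consonant occupies two columns, the same as a precomposed syllable,
even if the font can only overstrike the pieces.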

 > Have a look at the xterm keysym2ucs.c mapping on
 > 
 >   http://www.cl.cam.ac.uk/~mgk25/ucs/keysym2ucs.c
 > 
 > and let me know, how you would like the Korean part to be mapped (if
 > mapping makes sense at all, since Korean keysyms should probably go
 > directly into a proper Hangul input method).
 > 
 > Is there already a Korean input method available that could easily be
 > modified to output UTF-8 instead of the old KS C 5601? If yes, then I'd
 > rather prefer to remove the entire Hangul part of xterm's keysym -> UCS
 > mapping as it is probably completely useless.

I think the latter is probably true.  Linux is quite popular in Korea
and there's more than one free X-IM available.  I've quickly looked at
the sources for hanIM (ftp://ftp.mizi.com/pub/hanIM) and it seems to
use ordinary 'English' keysyms to feed its IM.  I've never seen these
Korean keysyms, and don't really know how they are supposed to be
used.

Otfried


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/