XTerm char-width handling
The Plan 9 paper describes a font-loading mechanism in which people can
specify a set of fonts for subranges of Unicode in a simple text file.
Meanwhile, thanks to the great work of the UTF-8 conspirators on this list,
I am using UTF-8 Emacs, UTF-8 XTerm patchlevel 117, UTF-8 Pine and much
more, but I still have one small but annoying problem with my CJK texts,
for which Markus Kuhn already suggested a solution elsewhere a month ago:
> > When I compiled it and used it with your Japanese ISO-10646 fontset, I got
> > all Latin characters in double width. This may be one of the first
> > things that need to be changed.
>
> Agreed. xterm needs to be able to load two fonts, one for half-width
> and one for full-width (e.g., 9x18 and 18x18). The following function
> should decide which Unicode characters are normal and which are wide:
>
> /* This function tests whether the ISO 10646/Unicode character code
>  * ucs belongs to the East Asian Wide (W) or East Asian FullWidth
>  * (F) category as defined in Unicode Technical Report #11. In this
>  * case, the terminal emulator should represent the character using
>  * a glyph from a double-wide font that covers two normal (Latin)
>  * character cells. */
>
> int iswide(int ucs)
> {
>   if (ucs < 0x1100)
>     return 0;
>
>   return
>     (ucs >= 0x1100 && ucs <= 0x115f) || /* Hangul Jamo */
>     (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs != 0x303f) || /* CJK ... Yi */
>     (ucs >= 0xac00 && ucs <= 0xd7a3) || /* Hangul Syllables */
>     (ucs >= 0xf900 && ucs <= 0xfaff) || /* CJK Compatibility Ideographs */
>     (ucs >= 0xfe30 && ucs <= 0xfe6f) || /* CJK Compatibility Forms */
>     (ucs >= 0xff00 && ucs <= 0xff5f) || /* Fullwidth Forms */
>     (ucs >= 0xffe0 && ucs <= 0xffe6);
> }
>
> Note that the widths are not those of JIS X0208. JIS X0208 is a poor
> guide here, because its Greek and Cyrillic letters would also show up
> as wide characters, while Unicode gets this right, because cell width
> and bytes per character are fortunately independent in UTF-8.
Can this go into the next patchlevel of XTerm?
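
To make the request a bit more concrete, here is a rough sketch of how a
terminal emulator might use the test above, assuming the two loaded fonts
Markus describes. The names font_slot, font_for, cells_for and line_cells
are made up for illustration and are not XTerm internals; iswide() is
repeated only so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Same ranges as the iswide() quoted above, repeated for completeness. */
int iswide(int ucs)
{
    if (ucs < 0x1100)
        return 0;

    return (ucs >= 0x1100 && ucs <= 0x115f) ||                   /* Hangul Jamo */
           (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs != 0x303f) ||  /* CJK ... Yi */
           (ucs >= 0xac00 && ucs <= 0xd7a3) ||                   /* Hangul Syllables */
           (ucs >= 0xf900 && ucs <= 0xfaff) ||                   /* CJK Compatibility Ideographs */
           (ucs >= 0xfe30 && ucs <= 0xfe6f) ||                   /* CJK Compatibility Forms */
           (ucs >= 0xff00 && ucs <= 0xff5f) ||                   /* Fullwidth Forms */
           (ucs >= 0xffe0 && ucs <= 0xffe6);
}

/* Hypothetical font slots: one half-width font (e.g. 9x18) and one
 * full-width font (e.g. 18x18). */
enum font_slot { FONT_NORMAL = 0, FONT_WIDE = 1 };

/* Pick the font a character would be drawn with. */
enum font_slot font_for(int ucs)
{
    return iswide(ucs) ? FONT_WIDE : FONT_NORMAL;
}

/* Number of character cells one character occupies. */
int cells_for(int ucs)
{
    return iswide(ucs) ? 2 : 1;
}

/* Total cells needed for an array of n already-decoded UCS values.
 * Combining and control characters are ignored in this sketch. */
int line_cells(const int *ucs, size_t n)
{
    int cells = 0;
    size_t i;

    for (i = 0; i < n; i++)
        cells += cells_for(ucs[i]);
    return cells;
}
```

With this, a line holding 'a', U+65E5, U+672C, 'b' would need six cells
instead of four, and only the two ideographs would come from the wide font.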
-phm
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/