
XTerm char-width handling



The Plan 9 paper describes a font-loading mechanism in which a simple
text file specifies a set of fonts for subranges of UTF-8.

Meanwhile, thanks to the great work of the UTF-8 conspirators on this list,
I am using UTF-8 Emacs, UTF-8 XTerm patchlevel 117, UTF-8 Pine and much
more, but I still have one little big problem with my CJK texts, for which
Markus Kuhn already suggested a solution elsewhere a month ago:

> > When I compiled it and used it with your Japanese ISO-10646 fontset, I got
> > all latin characters in double width.  This may be one of the first
> > things that need to be changed.
> 
> Agreed. xterm needs to be able to load two fonts, one for half-width
> and one for full-width (e.g., 9x18 and 18x18). The following function
> should decide which Unicode characters are normal and which are wide:
> 
>   /* This function tests whether the ISO 10646/Unicode character code
>    * ucs belongs to the East Asian Wide (W) or East Asian FullWidth
>    * (F) category as defined in Unicode Technical Report #11. In this
>    * case, the terminal emulator should represent the character using
>    * a glyph from a double-wide font that covers two normal (Latin)
>    * character cells. */
> 
>   int iswide(int ucs)
>   {
>     if (ucs < 0x1100)
>       return 0;
> 
>     return
>       (ucs >= 0x1100 && ucs <= 0x115f) || /* Hangul Jamo */
>       (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs != 0x303f) || /* CJK ... Yi */
>       (ucs >= 0xac00 && ucs <= 0xd7a3) || /* Hangul Syllables */
>       (ucs >= 0xf900 && ucs <= 0xfaff) || /* CJK Compatibility Ideographs */
>       (ucs >= 0xfe30 && ucs <= 0xfe6f) || /* CJK Compatibility Forms */
>       (ucs >= 0xff00 && ucs <= 0xff5f) || /* Fullwidth Forms */
>       (ucs >= 0xffe0 && ucs <= 0xffe6);
>   }
> 
> Note that the widths are not those of JIS X0208. JIS X0208 is bad here,
> because Greek and Cyrillic letters will also show up as wide characters,
> while Unicode gets this right, because cell-width and
> bytes-per-character are fortunately independent in UTF-8.

Can this go into the next patchlevel of XTerm?
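
To make the request concrete, here is a minimal sketch of how a terminal
might use such a test. It is not taken from the XTerm sources. iswide() is
Markus's function from the quote above with the logic unchanged;
font_for() and cells_needed() are hypothetical helpers invented for this
illustration, and the font names are simply the 9x18/18x18 pair from his
example.

```c
#include <assert.h>
#include <stddef.h>

/* Markus Kuhn's width test from the quoted message (logic unchanged). */
int iswide(int ucs)
{
  if (ucs < 0x1100)
    return 0;

  return
    (ucs >= 0x1100 && ucs <= 0x115f) || /* Hangul Jamo */
    (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs != 0x303f) || /* CJK ... Yi */
    (ucs >= 0xac00 && ucs <= 0xd7a3) || /* Hangul Syllables */
    (ucs >= 0xf900 && ucs <= 0xfaff) || /* CJK Compatibility Ideographs */
    (ucs >= 0xfe30 && ucs <= 0xfe6f) || /* CJK Compatibility Forms */
    (ucs >= 0xff00 && ucs <= 0xff5f) || /* Fullwidth Forms */
    (ucs >= 0xffe0 && ucs <= 0xffe6);
}

/* Hypothetical helper: choose between the two fonts named in the
 * example, the wide one for W/F characters, the normal one otherwise. */
const char *font_for(int ucs)
{
  return iswide(ucs) ? "18x18" : "9x18";
}

/* Hypothetical helper: count the character cells a run of n code
 * points occupies, two for each wide character and one for the rest. */
int cells_needed(const int *ucs, size_t n)
{
  int cells = 0;
  size_t i;

  assert(ucs != NULL || n == 0);
  for (i = 0; i < n; i++)
    cells += iswide(ucs[i]) ? 2 : 1;
  return cells;
}
```

With this, a Latin, Greek or Cyrillic letter takes one cell and the normal
font, while a CJK ideograph or Hangul syllable takes two cells and the
wide font. Combining characters are not handled in this sketch.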

-phm



-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/