
Re: Progress on xterm with combining characters, wcwidth




Robert Brady announced his xterm with support for doublewidth characters
and combining characters three weeks ago. [1]

I've modified CLISP Common Lisp [2] to take advantage of this feature:

- Added two functions
    (char-width character) -> integer
    (string-width string)  -> integer
  which return the number of screen columns needed for a character or string.

- Modified the pretty printer and formatted I/O system to use
  (string-width string) instead of (length string) in the right places.
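
To illustrate why (string-width string) and (length string) differ, here is
a minimal sketch in Python rather than Lisp. It is not the CLISP code: the
names char_width and string_width are hypothetical, and the width rules are
approximated from Python's unicodedata tables (general category and East
Asian Width) instead of a real `wcwidth'. Control characters are not
handled.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Approximate number of screen columns for one character (0, 1 or 2)."""
    # Assumption: non-spacing (Mn) and enclosing (Me) marks are drawn over
    # the preceding cell and therefore occupy no column of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # Assumption: East Asian Wide (W) and Fullwidth (F) characters occupy
    # two columns on a terminal.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def string_width(s: str) -> int:
    """Sum of the per-character widths; this is what a column counter needs."""
    return sum(char_width(ch) for ch in s)
```

For a string of three CJK ideographs the length is 3 but the width is 6,
and for "e" followed by U+0301 the length is 2 but the width is 1, which is
the difference the pretty printer has to account for.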

A screenshot is available in [3].

This is all based on `wcwidth'. The pretty printer uses `wcwidth'
extensively, to keep track of the screen columns. For speed, I chose a
`wcwidth' implementation based on table lookup [4], not binary search,
and even so the pretty printer got a 30% slowdown.
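
The two lookup strategies mentioned above can be sketched as follows, in
Python for brevity. This is only an illustration: the interval list is a
tiny hypothetical sample, not the real width data, and the paged table
layout is an assumption, not a description of how the `wcwidth' in [4] is
organized.

```python
from bisect import bisect_right

# Hypothetical sample data: (first, last, width), sorted and non-overlapping.
INTERVALS = [
    (0x0300, 0x036F, 0),  # combining diacritical marks
    (0x1100, 0x115F, 2),  # Hangul Jamo initial consonants
    (0x4E00, 0x9FFF, 2),  # CJK unified ideographs
    (0xFF01, 0xFF60, 2),  # fullwidth forms
]
STARTS = [first for first, _, _ in INTERVALS]

def width_bsearch(cp: int) -> int:
    """Binary search: O(log n) comparisons per character."""
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _, last, width = INTERVALS[i]
        if cp <= last:
            return width
    return 1  # everything not listed is treated as width 1

PAGE_BITS = 8
PAGE_SIZE = 1 << PAGE_BITS
DEFAULT_PAGE = bytes([1]) * PAGE_SIZE  # shared page for "all width 1"

def build_pages(intervals):
    """Expand the intervals into 256-entry pages, built once at startup."""
    pages = {}
    for first, last, width in intervals:
        for cp in range(first, last + 1):
            page = pages.setdefault(cp >> PAGE_BITS, bytearray(DEFAULT_PAGE))
            page[cp & (PAGE_SIZE - 1)] = width
    return pages

PAGES = build_pages(INTERVALS)

def width_table(cp: int) -> int:
    """Table lookup: one page fetch and one index, no search loop."""
    page = PAGES.get(cp >> PAGE_BITS, DEFAULT_PAGE)
    return page[cp & (PAGE_SIZE - 1)]
```

Both functions return the same answers; the table variant trades memory
for a constant number of operations per character.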

Thomas Wolff wrote:
> One thing I'd need is a function to tell me which characters have which 
> sort of behaviour.

Markus posted such a function, except that it should be called "isnonspacing",
not "iscombining": it covers the "Non-spacing" property of PropList.txt [5].
Note that there are also combining characters with a width of 1. The first
one is U+0903. All of them are in Indic scripts. How are they supposed to be
rendered by a simple rendering engine such as xterm?
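
The naming distinction can be sketched like this, in Python. The two
predicates are hypothetical and approximate the properties by general
category (Mn for non-spacing; Mn, Mc and Me for combining), which is an
assumption and not the exact PropList.txt definition.

```python
import unicodedata

def is_nonspacing(ch: str) -> bool:
    # Assumption: "non-spacing" is approximated by general category Mn.
    return unicodedata.category(ch) == "Mn"

def is_combining(ch: str) -> bool:
    # Assumption: "combining" is approximated by any mark category,
    # including spacing combining marks (Mc), which do take up a column.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")
```

U+0301 satisfies both predicates, while U+0903 is combining but not
non-spacing, so a width function keyed on "iscombining" would wrongly give
it zero columns.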

Bruno

[1] http://mail.nl.linux.org/linux-utf8/1999-11/msg00069.html
[2] ftp://cellar.goems.com/pub/clisp/clisp-1999-11-30.tgz
[3] http://clisp.cons.org/~haible/fibjap-xterm.gif
[4] ftp://ftp.ilog.fr/pub/Users/haible/utf8/libutf8-0.6.1.tar.gz
    file libutf8-0.6.1/extras/wcwidth.c
[5] ftp://ftp.unicode.org/Public/3.0-Update/PropList-3.0.0.txt
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/