
UTF-8 curses



I now have mutt and slang more or less working in UTF-8, but I want to
get the interface right between the two.

How should curses be extended to Unicode?

Mutt uses slang's curses-compatible functions. I changed none of the
function prototypes: addch and addstr and friends all take UTF-8. (It
would have been harder to modify mutt if I hadn't allowed a single
character to be delivered by multiple calls to addch.) You could
presumably add further functions such as addwch and addwstr for wide
characters, if you wanted.
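
To illustrate what "a single character delivered by multiple calls to
addch" implies for the library, here is a minimal sketch of a decoder
that is fed one octet at a time. The names utf8_state and utf8_feed are
hypothetical and are not taken from slang or curses. The sketch handles
sequences of up to four octets and does not reject overlong forms.

```c
#include <stdint.h>

/* Hypothetical decoder state that the library would keep between calls. */
typedef struct {
    uint32_t partial;   /* code point bits collected so far */
    int      remaining; /* continuation octets still expected */
} utf8_state;

/* Feed one octet, as addch might receive it.
 * Returns 1 and stores a code point in *out when a character completes,
 * 0 when more octets are needed, -1 on a malformed sequence. */
static int utf8_feed(utf8_state *st, unsigned char byte, uint32_t *out)
{
    if (st->remaining == 0) {
        if (byte < 0x80) {                 /* plain ASCII */
            *out = byte;
            return 1;
        }
        if ((byte & 0xE0) == 0xC0) {       /* lead octet of a 2-octet sequence */
            st->partial = byte & 0x1F;
            st->remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) { /* 3-octet sequence */
            st->partial = byte & 0x0F;
            st->remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) { /* 4-octet sequence */
            st->partial = byte & 0x07;
            st->remaining = 3;
        } else {
            return -1;                     /* stray continuation or invalid lead */
        }
        return 0;
    }
    if ((byte & 0xC0) != 0x80) {           /* expected a continuation octet */
        st->remaining = 0;
        return -1;
    }
    st->partial = (st->partial << 6) | (byte & 0x3F);
    if (--st->remaining > 0)
        return 0;
    *out = st->partial;
    return 1;
}
```

With this kind of state kept per window, nothing is drawn until
utf8_feed returns 1, so the caller is free to pass octets singly or in
a string.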

But how should one switch the library into UTF-8 mode? You could have
an additional function for this, but is it possible or desirable to
avoid having an extra function? Without an additional function, a
program compiled for UTF-8-curses could still run, in non-UTF-8-mode,
with an older version of curses. Or is this easy to achieve with weak
symbols anyway?
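
For reference, the weak-symbol approach might look roughly like the
sketch below. It relies on the GCC/Clang "weak" attribute on ELF
systems. The function hypothetical_curses_set_utf8 is an invented name
for the extra entry point, and its "0 means success" convention is an
assumption. Neither exists in slang or curses.

```c
#include <stddef.h>

/* Invented entry point that a UTF-8-aware library might export.
 * Declared weak so that the program still links and loads when the
 * library does not provide it; the symbol then resolves to NULL. */
extern int hypothetical_curses_set_utf8(int enable) __attribute__((weak));

/* Returns 1 if UTF-8 mode was switched on, 0 if the call is missing
 * or reported failure (assumed convention: 0 means success). */
static int try_enable_utf8(void)
{
    if (hypothetical_curses_set_utf8 == NULL)
        return 0;   /* older library: carry on in non-UTF-8 mode */
    return hypothetical_curses_set_utf8(1) == 0;
}
```

A program built this way would call try_enable_utf8 once at start-up
and fall back to 8-bit behaviour when it returns 0.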

Double-width chars: I think it's clear that these fill two character
cells, and if you overwrite one of the cells, then the other should be
replaced by a space in the same colour as the double-width character
just destroyed. A really nasty case is when you receive one of these
characters when the cursor is in the last column.

This case is nasty, because a program might want to avoid wrapping
onto the next line, and perhaps even causing the screen to scroll, by
outputting UTF-8 octets while watching which column the cursor is in.
If you're on the last column, you think it's safe to continue, but
then you suddenly find you've trashed the next line, and perhaps the
whole screen because of scrolling.
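
The two rules can be sketched on a toy model of one screen row. All the
names here (cell, put_char, COLS and so on) are hypothetical, and
refusing a double-width character in the last column is only one
possible policy, chosen to keep the sketch short.

```c
#include <stdint.h>

#define COLS 8   /* hypothetical, deliberately tiny row */

enum { SINGLE, WIDE_LEFT, WIDE_RIGHT };

typedef struct {
    uint32_t ch;      /* code point (0 in the right half of a wide pair) */
    uint8_t  colour;  /* both halves of a pair carry the pair's colour */
    uint8_t  part;    /* SINGLE, WIDE_LEFT or WIDE_RIGHT */
} cell;

static void clear_row(cell *row)
{
    for (int i = 0; i < COLS; i++) {
        row[i].ch = ' ';
        row[i].colour = 0;
        row[i].part = SINGLE;
    }
}

/* Turn a cell into a space, keeping the colour it already has. */
static void blank(cell *c)
{
    c->ch = ' ';
    c->part = SINGLE;
}

/* If the cell at col is half of a wide pair, blank the other half. */
static void break_pair(cell *row, int col)
{
    if (row[col].part == WIDE_LEFT)
        blank(&row[col + 1]);
    else if (row[col].part == WIDE_RIGHT)
        blank(&row[col - 1]);
}

/* Write a character of width 1 or 2 at col.
 * Returns 0 on success, -1 if it does not fit on the row, for example
 * a double-width character arriving in the last column. */
static int put_char(cell *row, int col, uint32_t ch, int width, uint8_t colour)
{
    if (col < 0 || width < 1 || width > 2 || col + width > COLS)
        return -1;
    for (int i = 0; i < width; i++)
        break_pair(row, col + i);
    row[col].ch = ch;
    row[col].colour = colour;
    row[col].part = (width == 2) ? WIDE_LEFT : SINGLE;
    if (width == 2) {
        row[col + 1].ch = 0;
        row[col + 1].colour = colour;
        row[col + 1].part = WIDE_RIGHT;
    }
    return 0;
}
```

Because put_char reports failure instead of wrapping, the caller gets
to decide what happens at the right margin, and the next line is never
touched by surprise.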

Both slang and curses allow you to adjust the line-wrapping and
scrolling behaviour, but I haven't yet investigated in detail ...

Last question: how useful is it to allow characters with more than 16
bits?

It's easiest to change slang by storing the character plus attributes
(colour, etc) in a single integer, which is unsigned short at present
and can easily be extended to unsigned long. Then you have the choice
of either 24 bits of character plus 8 bits of colour, or perhaps 16
bits of colour and 16 bits of character. If you think you might one
day want 32
bit characters, it would be wise to provide for that in the API, even
if you don't want to implement it internally immediately.
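
As a sketch of the 24+8 layout, assuming a 32-bit cell: the names
cell_t, cell_pack, cell_char and cell_colour are hypothetical, and
uint32_t is used in place of unsigned long only to pin the width down.

```c
#include <stdint.h>

typedef uint32_t cell_t;   /* hypothetical name for the widened cell type */

#define CELL_CHAR_BITS 24
#define CELL_CHAR_MASK ((UINT32_C(1) << CELL_CHAR_BITS) - 1)

/* Pack a character (low 24 bits) and a colour (high 8 bits) into one cell.
 * Character bits above bit 23 are silently dropped. */
static cell_t cell_pack(uint32_t ch, uint8_t colour)
{
    return ((cell_t)colour << CELL_CHAR_BITS) | (ch & CELL_CHAR_MASK);
}

static uint32_t cell_char(cell_t c)
{
    return c & CELL_CHAR_MASK;
}

static uint8_t cell_colour(cell_t c)
{
    return (uint8_t)(c >> CELL_CHAR_BITS);
}
```

If callers only ever go through accessors like these, the split between
character and colour bits can be changed later without touching the
API.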

So it would be useful for me to have a better idea of when and with
what probability characters with more than 16/24 bits might be useful
in the context of curses. Thanks for any clues.

Edmund
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/