Re: linux-utf8 terminfo description
Klaus Weide wrote on 1999-11-08 01:02 UTC:
> Bruno Haible's terminfo description source file, in
>
> <ftp://ftp.ilog.fr/pub/Users/haible/utf8/linux-utf8.terminfo>,
>
> has the following contents:
>
> linux-utf8|linux in Unicode (UTF-8) mode,
> use=linux,
>
> in other words, no change from the regular "linux" terminal type except
> for the name.
Is it really necessary to signal the character encoding via TERM
conventions? Isn't that what LC_CTYPE is there for? Termcap/terminfo
have so far remained ignorant of the character encoding, and I am not
convinced that we have to change this now.
> 1)
> acs_chars    acsc    ac    graphics charset pairs, based on vt100
>
> Afaik these don't work in UTF-8 mode at all.
Correct. Switching half of ASCII to mean something else is ISO 2022
stuff, which is mutually exclusive with UTF-8. The DEC graphics
characters have Unicode codes in the U+2500 range, and that is what
ncurses should use when it is told (e.g., via having the substring
"UTF-8" in LC_CTYPE) that input/output is done in UTF-8 now. Unicode has
many more and nicer block graphics capabilities, so ncurses can be
extended to show the full graphical richness that we were used to from
MS-DOS text mode programs. In XFree86 4.0, all fonts with
-Misc-Fixed-Medium-*-ISO10646-1 will be supersets of CP437 and WGL4, so
there is no need for ncurses to restrict itself to just the few DEC
graphics characters for the screen layout that it offers in UTF-8 mode.
Similarly, telnet/ssh clients on other platforms with UTF-8 support
(Kermit95, etc.) can also be expected to cover the entire WGL4
repertoire, including all DOS characters.
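
To illustrate the mapping described above, here is a minimal sketch in
Python (not ncurses code). The table covers only the VT100 line-drawing
subset of the DEC graphics set; the names DEC_TO_UNICODE,
dec_graphics_to_unicode and box_frame are hypothetical and chosen just
for this example.

```python
# Partial table: DEC Special Graphics (VT100 line-drawing) character -> Unicode.
# Only the box-drawing subset is listed; this is an illustration, not a full table.
DEC_TO_UNICODE = {
    "j": "\u2518",  # lower-right corner
    "k": "\u2510",  # upper-right corner
    "l": "\u250c",  # upper-left corner
    "m": "\u2514",  # lower-left corner
    "n": "\u253c",  # crossing lines
    "q": "\u2500",  # horizontal line
    "t": "\u251c",  # left tee
    "u": "\u2524",  # right tee
    "v": "\u2534",  # bottom tee
    "w": "\u252c",  # top tee
    "x": "\u2502",  # vertical line
}


def dec_graphics_to_unicode(text):
    """Translate characters meant for the DEC graphics set into Unicode.

    Characters without an entry in the table are passed through unchanged.
    """
    return "".join(DEC_TO_UNICODE.get(ch, ch) for ch in text)


def box_frame(width):
    """Return the top and bottom edge of a box, as Unicode strings."""
    top = dec_graphics_to_unicode("l" + "q" * width + "k")
    bottom = dec_graphics_to_unicode("m" + "q" * width + "j")
    return top, bottom
```

A UTF-8 terminal would then receive, for example,
dec_graphics_to_unicode("lqk").encode("utf-8") instead of the bytes
"lqk" wrapped in a character-set switch.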
> Finally, a general observation: deducing the UTF-8 state of the terminal
> environment from the name of the $TERM is an ugly trick... All necessary
> information an application should need should be in the *contents* of the
> terminal description, not in its name. The same goes for attempts to
> get this info from LC_ALL/LC_CTYPE/LANG environment variables (Bruno's
> utf8locale.c). The info should be *in* the description, the name should
> not matter at all.
Basically agreed. For LC_ALL/LC_CTYPE/LANG, however, since not all
application developers want to use the C locale functions, a check for
the substring "UTF-8" seems to be a justifiable hack occasionally, at
least until UTF-8 support in C libraries has reached a high standard and
deployment.
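
The substring check could look roughly like the following Python
sketch. It assumes the usual precedence (LC_ALL overrides LC_CTYPE,
which overrides LANG) and matches only the exact spelling "UTF-8"; the
function name locale_wants_utf8 is hypothetical.

```python
import os


def locale_wants_utf8(env=None):
    """Guess whether the locale environment asks for UTF-8.

    Assumption: the first of LC_ALL, LC_CTYPE, LANG that is set and
    non-empty decides, and only the exact substring "UTF-8" counts.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # The highest-priority variable that is set wins, even if it
            # does not mention UTF-8.
            return "UTF-8" in value
    return False
```

Passing a dictionary instead of the real environment makes the check
easy to try out, e.g. locale_wants_utf8({"LANG": "en_GB.UTF-8"}).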
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/