linux-utf8 terminfo description
From: Klaus Weide <kweide@enteract.com>
: Bruno Haible's terminfo description source file, in
:
: <ftp://ftp.ilog.fr/pub/Users/haible/utf8/linux-utf8.terminfo>,
:
: has the following contents:
:
: linux-utf8|linux in Unicode (UTF-8) mode,
: use=linux,
:
: in other words, no change from the regular "linux" terminal type except
: for the name. This should be adapted to better describe the capabilities
: that are actually there when the console is in UTF-8 mode. Maybe we can
: come up with the necessary changes here, and "linux-utf8" can become
: somewhat more "official" (as far as that goes; I guess it would mean it
: becomes part of the ncurses distribution and/or esr's distribution).
: ...
I have also seen such a thing for "xterm-utf8" already, so I introduced
this as one of the detection methods for my editor. It is by no means
reliable, however...
: Finally, a general observation: deducing the UTF-8 state of the terminal
: environment from the name of the $TERM is an ugly trick... All necessary
: information an application should need should be in the *contents* of the
: terminal description, not in its name. The same goes for attempts to
: get this info from LC_ALL/LC_CTYPE/LANG environment variables (Bruno's
: utf8locale.c). The info should be *in* the description, the name should
: not matter at all. Any disagreements?
Whether in the name or the contents - I think another aspect is more
important here:
The termcap and terminfo entries normally have to be installed by the
system administrator. So you cannot rely on such entries to be available.
Of course, if you regularly work on a system and have a cooperative
system administrator, you can get that done, but this is not always the case.
I often have to access rarely used machines (e.g. in a lab) quickly and
want to get a minimal usable personal environment out of the box. Also
system administrators are sometimes quite neglectful of any system
configuration that they consider non-standard or fancy...
To summarize my argument briefly: there are many situations in which
normal users need a way to achieve configuration setup themselves and
cannot spend time on negotiations with some administrator first. That's
why TERM is not practical as a standard means of indicating a UTF-8
environment.
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>
: Klaus Weide wrote on 1999-11-08 01:02 UTC:
: > Bruno Haible's terminfo description source file, in
: >
: > <ftp://ftp.ilog.fr/pub/Users/haible/utf8/linux-utf8.terminfo>,
: >
: > has the following contents:
: >
: > linux-utf8|linux in Unicode (UTF-8) mode,
: > use=linux,
: >
: > in other words, no change from the regular "linux" terminal type except
: > for the name.
:
: Is it really necessary to signal the character encoding via TERM
: conventions? Isn't that what LC_CTYPE is there for? Termcap/terminfo
: have so far remained ignorant about the character encoding, and I am not
: convinced that we have to change this now.
In contrast to what I said above, there is one thing in favour of the
TERM mechanism:
TERM is the only information that is widely and almost reliably passed
over telnet and rlogin connections. This is a major deficiency of Unix
networking (from the user interface perspective). For years now, I
have used a trick to stuff at least the X DISPLAY reference into the TERM
variable so that it gets passed along, because it's a real nuisance to
lose this information every time. I have extended the trick now; the
script "rl" contains:
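# Qualify a local display such as ":0" with this host's name
case "$DISPLAY" in
  :*) DISPLAY=${HOSTNAME-`hostname`}$DISPLAY;;
esac
# Pack DISPLAY and LC_CTYPE into TERM, which rlogin passes on
TERM="$TERM@$DISPLAY@$LC_CTYPE"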
.profile contains:
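# Split TERM at "@" (or "|") back into TERM, DISPLAY and LC_CTYPE.
# (An empty field collapses here, shifting the later fields down.)
set -- `echo "$TERM" | sed -e "s,[@|], ,g"`
TERM="$1"
DISPLAY=${DISPLAY-$2}
# A UTF-8 value from the calling side wins; otherwise keep any local setting
case "$3" in
  *UTF*|*utf*) LC_CTYPE=$3;;
  *) LC_CTYPE=${LC_CTYPE-$3};;
esac
export TERM DISPLAY LC_CTYPE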
(Maybe "LC_CTYPE=$3" would be sufficient.)
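One weak spot of the sed-based split in .profile: when a field is empty, for example TERM="linux@@en_US.UTF-8" because no DISPLAY was set, the doubled space collapses during word splitting and the locale ends up in $2. A minimal sketch of a split that keeps empty fields, assuming a POSIX sh: setting IFS to the non-blank separators makes each "@" or "|" delimit exactly one field. The helper names pack_term and unpack_term and the t_* variables are hypothetical.

```sh
# Hypothetical helpers; the "@" separator and the field order follow
# the "rl" script above.
pack_term() {
    # $1 = terminal type, $2 = X display, $3 = LC_CTYPE value
    printf '%s@%s@%s\n' "$1" "$2" "$3"
}

unpack_term() {
    # Split $1 on "@" or "|" into t_term, t_display and t_ctype.
    # Non-blank IFS characters delimit fields one by one, so "a@@b"
    # yields three fields with an empty middle one.
    old_ifs=$IFS
    set -f                     # do not glob-expand the fields
    IFS='@|'
    set -- $1
    IFS=$old_ifs
    set +f
    t_term=$1 t_display=$2 t_ctype=$3
}
```

In .profile, unpack_term "$TERM" would then replace the set/sed line, with $t_term, $t_display and $t_ctype used where $1, $2 and $3 appear.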
An attempt to establish a more generic mechanism (arbitrary variables whose
names are also included in the string) failed due to some weird length
limitation for TERM in the Linux rlogin daemon.
For the Linux version of telnet, UTF-8 connections would only be handled
reasonably anyway if someone finally made it 8-bit clean in a decent way
(i.e. by default, and without the silly side effect of disabling CR->LF
adaptation).
: > Finally, a general observation: deducing the UTF-8 state of the terminal
: > environment from the name of the $TERM is an ugly trick... All necessary
: > information an application should need should be in the *contents* of the
: > terminal description, not in its name. The same goes for attempts to
: > get this info from LC_ALL/LC_CTYPE/LANG environment variables (Bruno's
: > utf8locale.c). The info should be *in* the description, the name should
: > not matter at all.
:
: Basically agreed. For LC_ALL/LC_CTYPE/LANG however, since not all
: application developers want to use the C locale functions, a check on
: the substring "UTF-8" seems to be a justifiable hack occasionally, at
: least until UTF-8 support in C libraries has reached a high standard and
: deployment.
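A minimal sketch of that substring check, assuming the usual POSIX precedence in which a non-empty LC_ALL overrides LC_CTYPE, which in turn overrides LANG (the same variables setlocale(LC_CTYPE, "") consults). The function name is_utf8_env is hypothetical, and accepting "utf8" as well as "UTF-8" in any letter case is an assumption, since locale names are spelled both ways.

```sh
# Hypothetical helper: succeed if the effective LC_CTYPE setting names UTF-8.
is_utf8_env() {
    # Same precedence as setlocale(): a non-empty LC_ALL wins,
    # then LC_CTYPE, then LANG.
    v=${LC_ALL:-${LC_CTYPE:-${LANG-}}}
    case "$v" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Unlike a query through the C library, this check does not need the named locale to be installed on the machine.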
As there are good reasons (the same ones that apply against TERM) not to
rely on the locale mechanism, I would not call the use of these variables
a hack - it's the only mechanism freely available to users and thus very
important.
Also, I don't see how that locale library stuff could make the environment
variables obsolete. The man page says that programs start up in the "C"
locale by default and have to call something like
setlocale(LC_ALL, ""), so what's the advantage of using it at all?
From: Bruno Haible <haible@ilog.fr>
Message-Id: <199911081128.MAA17407@jaures.ilog.fr>
To: linux-utf8@nl.linux.org
Subject: Re: linux-utf8 terminfo description
In-Reply-To: <Pine.NEB.3.96.991107183615.24148A-100000@shell-2.enteract.com>
References: <Pine.NEB.3.96.991107183615.24148A-100000@shell-2.enteract.com>
: Klaus Weide writes:
:
: > Finally, a general observation: deducing the UTF-8 state of the terminal
: > environment from the name of the $TERM is an ugly trick... All necessary
: > information an application should need should be in the *contents* of the
: > terminal description, not in its name.
:
: I entirely agree with you. But before deciding something here, let's get
: the "big picture".
:
: 1. A kernel tty must be informed about UTF-8. The line editing behaviour
: (tab and backspace) depends on it. In fact, for pressing Tab in an
: xterm with a wide font to work, the kernel needs a full wcwidth
: function in kernel space (1.5 KB of data).
:
: If that gets accepted:
:
: 2. telnet and rlogin must be modified to pass the UTF-8 state from one
: machine to the other. telnet already passes the DISPLAY environment
: variable, so this is nothing insurmountable. But rlogin is a bit harder.
: I introduced "linux-utf8" and "xterm-utf8" terminfos to get this done
: with minimum impact on rlogin, but we can probably get away without it:
: "rlogin" could add the "-utf8" suffix itself, and "rlogind" would then
: remove it and set the remote tty into UTF-8 mode.
This would again introduce incompatibility with any legacy system -
not an acceptable solution for many applications or environments.
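The rlogind half of Bruno's proposal can be approximated in a login script. A minimal sketch: strip a trailing "-utf8" from TERM and, if one was there, put the tty into UTF-8 mode. This assumes the "iutf8" stty flag, i.e. the IUTF8 termios flag that Linux provides since kernel 2.6.4, which lets the kernel's line editing erase a whole multibyte character (item 1 above); the helper name strip_utf8_suffix is hypothetical. On a remote system without such a script the suffixed name simply stays in TERM, which is the incompatibility raised above.

```sh
# Hypothetical helper: print TERM without a trailing "-utf8" and
# succeed only if the suffix was present.
strip_utf8_suffix() {
    case "$1" in
        *-utf8) printf '%s\n' "${1%-utf8}"; return 0 ;;
        *)      printf '%s\n' "$1";         return 1 ;;
    esac
}

if base=$(strip_utf8_suffix "${TERM-}"); then
    TERM=$base
    # IUTF8 (Linux 2.6.4+) lets the tty's line editing erase whole
    # multibyte characters; only attempt it on a terminal, and ignore
    # systems whose stty lacks the flag.
    if [ -t 0 ]; then
        stty iutf8 2>/dev/null || :
    fi
fi
```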
: I'm not convinced "linux-utf8" and "xterm-utf8" are a good idea, because
: they are redundant with LC_CTYPE. If your LC_CTYPE is iso-8859-1 and your
: xterm is UTF-8, or vice versa, applications that are not screen-aware
: (like "cat") will do the wrong thing.
:
: > The same goes for attempts to get this info from LC_ALL/LC_CTYPE/LANG
: > environment variables (Bruno's utf8locale.c).
:
: If nl_langinfo(CODESET) worked on all systems, we wouldn't need code
: which peeks at the environment variables.
Yes, but it doesn't, and it will not work for as long as it would be
needed... (I assume you mean the locale stuff.)
That's why I think it's pretty useless to rely on it or to spend any effort
on improving it for UTF-8 handling - what's wrong with environment
variables?
Thomas Wolff
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/