
Re: Testing for UTF-8 tty mode



    Date:   Thu, 16 Sep 1999 00:09:56 +0100
    From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>

    My big vision is that /etc/profile just has to contain the line

    export LC_CTYPE=UTF-8

    and suddenly my system behaves on all levels like Plan9, i.e. ISO 8859-1
    is replaced absolutely everywhere with UTF-8. UTF-8 is used in
    filenames, environment variables, config files, C source code (relevant
    for the interpretation of L"..." strings), as the multi-byte encoding by
    the C library, in standard input/output, etc., mount passes it down to
    the foreign file system drivers, stty passes it down to the ttys (where
    it might resurface on the other side in an xterm or in the console/
    keyboard kernel driver), etc. Ext2fs treats filenames only as byte
    sequences and remains fully ignorant of the character encoding.

A good plan.
(But note that various other filesystems have built-in ideas
about the character set of the filenames.)

    > Today we have two distinct Unicode modes. Bruno adds a third one.
    > Is that a good idea? Maybe.

    I never liked the idea that we have two different mechanisms in the
    Linux console to activate the UTF-8 mode, and I still consider this just
    to be a historic accident:

      - ESC % G to activate it in the part that sends characters to the screen
      - ioctl() to activate it in the part that processes the keystrokes

Probably Bruno's IUTF8 bit can replace the ioctl for the keyboard.
That would reduce us to two again. That is reasonable enough -
one should be able to control input and output separately.
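
For reference, the two existing mechanisms can be sketched roughly as
follows. This is only an illustration: the function names are made up,
the constants are the ones I believe <linux/kd.h> defines, and the
ioctl only succeeds on a file descriptor that refers to a Linux
virtual console.

```python
import fcntl

# Constants as defined in <linux/kd.h> (assumed values, check your headers).
KDSKBMODE = 0x4B45   # ioctl request: set the keyboard mode
K_XLATE = 0x01       # keymap translation, 8-bit output
K_UNICODE = 0x03     # keymap translation, UTF-8 output


def console_utf8_sequence(enable=True):
    """Return the escape sequence that switches console output mode.

    ESC % G selects UTF-8; ESC % @ returns to the default (ISO 8859-1).
    """
    return b"\x1b%G" if enable else b"\x1b%@"


def set_console_output_utf8(stream, enable=True):
    """Output side: write the escape sequence to a binary stream."""
    stream.write(console_utf8_sequence(enable))
    stream.flush()


def set_keyboard_utf8(fd, enable=True):
    """Input side: ask the keyboard driver for UTF-8 or 8-bit output.

    Raises OSError if fd is not a virtual console.
    """
    fcntl.ioctl(fd, KDSKBMODE, K_UNICODE if enable else K_XLATE)


# Typical use on a console (not executed here):
#   set_console_output_utf8(sys.stdout.buffer)
#   set_keyboard_utf8(sys.stdin.fileno())
```

The two calls are independent of each other, which is the point made
above about controlling input and output separately.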

Now the IUTF8 bit has properties very different from those of the ioctl:
The ioctl sets properties of the keyboard driver, while the bit
is set by some application programs and not by others.
Thus, the keyboard driver cannot do the conversion before
it is known who will read the data; that is, it must produce
16-bit values as found in the keymap, and leave it to the tty driver
to decide whether conversion to UTF8 is desired.

However, there are technical difficulties, since the tty driver
expects a byte stream. So, perhaps the keyboard driver should
always produce UTF8, a byte stream, and the tty driver should,
if the IUTF8 bit is not set, convert this back (and hope that
conversion back yields an 8-bit character).


Note that UTF8 is used here in the proper meaning of the word:
a transformation format, without any implication that Unicode is
involved. For example, a user works in ISO-8859-2, not UTF8,
and has a keymap containing ISO-8859-2 values - single bytes,
unrelated to Unicode. The keyboard driver uses the transformation
format to encode these as bytes or byte pairs (not knowing anything
about the character set the keymap is supposed to be in) and feeds
them to the tty driver, which converts them back to the single
ISO-8859-2 bytes. Awkward, but I suppose the code will be simplest
this way.
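
A rough sketch of this round trip, to make the idea concrete. This is
not kernel code: the function names are hypothetical, and replacing a
value that does not fit in 8 bits with '?' is just one possible policy
that I assume here for illustration.

```python
def utf8_transform(value):
    """Encode one 16-bit keymap value as 1 to 3 bytes.

    This is the UTF-8 bit layout applied to a plain number; nothing is
    assumed about which character set the value belongs to.
    """
    if value < 0x80:
        return bytes([value])
    if value < 0x800:
        return bytes([0xC0 | (value >> 6), 0x80 | (value & 0x3F)])
    if value < 0x10000:
        return bytes([0xE0 | (value >> 12),
                      0x80 | ((value >> 6) & 0x3F),
                      0x80 | (value & 0x3F)])
    raise ValueError("keymap values are at most 16 bits")


def utf8_untransform(data):
    """Decode a byte stream produced by utf8_transform into values."""
    values = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            value, length = lead, 1
        elif lead & 0xE0 == 0xC0:
            value, length = lead & 0x1F, 2
        elif lead & 0xF0 == 0xE0:
            value, length = lead & 0x0F, 3
        else:
            raise ValueError("unexpected lead byte")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1:i + length]:
            value = (value << 6) | (cont & 0x3F)
        values.append(value)
        i += length
    return values


def tty_deliver(data, iutf8):
    """What the tty driver would hand to the reading program.

    With IUTF8 set the byte stream passes through unchanged. Without
    it, each value is converted back to a single byte; values above
    0xFF become '?' (an assumed policy, see above).
    """
    if iutf8:
        return data
    return bytes(v if v < 0x100 else ord("?")
                 for v in utf8_untransform(data))
```

With an ISO-8859-2 keymap entry of 0xB1, utf8_transform(0xB1) gives
the byte pair C2 B1, and tty_deliver() with the bit cleared turns
that back into the single byte B1.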


Comments?


Andries
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/