
To Do List for 2001



I visited SuSE headquarters in Nürnberg over the holidays. We played
around a bit with their latest beta release and discussed which major
milestones UTF-8 locales still have to reach before we can recommend
them for general use. With the first glibc 2.2 Linux distributions now
shipping, the UTF-8 infrastructure (glibc, X11) has reached a level
where we can more easily approach a large number of developers and get
them interested in adding UTF-8 support to their applications.

Quite a number of things already work very nicely out of the box, for
example in the SuSE 7.1 beta. You can run

  LANG=ja_JP.UTF-8 xterm

and then simply type in the new window

  date

to see nice Kanji month names etc.
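
For readers wondering where the localized month names come from, here is a
minimal C sketch of the mechanism, assuming the program adopts the locale
from its environment and formats the date with strftime(). The helper names
init_locale() and format_month_name() are made up for this illustration;
this is not the actual source of date.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: adopt the locale named by LANG / LC_* in the
 * environment, e.g. ja_JP.UTF-8. Returns 1 on success, 0 if the
 * requested locale is not installed. */
int init_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: write the full month name for tm->tm_mon into buf,
 * using the LC_TIME category of the current locale. Returns the number of
 * bytes written (excluding the terminating NUL), or 0 if buf is too small. */
size_t format_month_name(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%B", tm);
}
```

If init_locale() is never called, the program stays in the default "C"
locale and gets English month names. After a successful init_locale() under
LANG=ja_JP.UTF-8, the same strftime() call should return the Japanese name
as a UTF-8 byte string.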

The current top open problems that we identified were:

  - readline and the bash command line editor still live in a 1 byte =
    1 character = 1 terminal column world, and could do well with a little
    bit of added mbrtowc() and wcwidth() logic to get this right.

    http://x-lt.richard.eu.org/me/rch/ll.html#bash
    http://kki.net.pl/qrczak/bash-2.02-utf8.patch.gz

    Is this already on its way into the next bash release?

  - Emacs - there is the mule-ucs package, which is an acceptable temporary
    solution, but UTF-8 is not yet properly integrated into the Emacs
    architecture, and a lot of small details fail (search, etc.). It seems,
    though, that proper native UTF-8 support is on the agenda for after
    the Emacs 21 release.

  - groff still lives in a 1 character = 1 terminal column world,
    with the expected horrible paragraph reformatting results if you
    pursue adventures such as Japanese UTF-8 man pages.

  - less still assumes wcwidth() == 1 for all characters and therefore
    fails to wrap UTF-8 lines with ideographs correctly.

  - Many popular email packages still need to be updated (pine, nmh, etc.).
    Increasing use of UTF-8 in email will no doubt act as a big catalyst
    for improving UTF-8 support elsewhere.

  - Tk 8.4 still only patches 8-bit fonts together (incredibly slowly)
    and can't use ISO10646-1 fonts directly.
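
The mbrtowc() and wcwidth() logic mentioned for readline and bash applies
equally to the column-counting problems in groff and less. A rough sketch of
the idea in C follows. The function name display_width() is invented for
this example, and the policy of returning -1 on any undecodable or
non-printable character is an assumption, not what any of these programs do.

```c
#define _XOPEN_SOURCE 700   /* asks <wchar.h> to declare wcwidth() */
#include <string.h>
#include <wchar.h>

/* Harmless redeclaration, in case the feature-test macro above is seen
 * too late to take effect. */
int wcwidth(wchar_t wc);

/* Hypothetical helper: number of terminal columns occupied by the
 * multibyte string s in the current LC_CTYPE locale. Returns -1 if s
 * contains an invalid or truncated sequence, or a character for which
 * wcwidth() reports no printable width (e.g. a control character). */
int display_width(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    int columns = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                 /* invalid or incomplete sequence */
        if (n == 0)
            break;                     /* embedded NUL; not expected here */

        int w = wcwidth(wc);
        if (w < 0)
            return -1;                 /* non-printable character */

        columns += w;                  /* 0 for combining, 2 for wide chars */
        s += n;                        /* advance by bytes, not by columns */
        remaining -= n;
    }
    return columns;
}
```

The caller has to run setlocale(LC_CTYPE, "") first, otherwise mbrtowc()
decodes according to the "C" locale rather than UTF-8. The point of the
sketch is that bytes, characters and columns are counted separately: the
loop advances by n bytes, consumes one character, and adds whatever width
wcwidth() reports for it.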

What else?

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/