To Do List for 2001
I visited SuSE headquarters in Nürnberg over the holidays, and we played
around a bit with their latest beta release and discussed UTF-8 locales
to see which major milestones remain before we can recommend
them for general use. With the first glibc 2.2 Linux distributions being
shipped, the UTF-8 infrastructure (glibc, X11) has now reached a level
where we can more easily approach a large number of developers to get
them interested in adding UTF-8 support to their applications.
Quite a number of things already work very nicely out of the box, for
example in the SuSE 7.1 beta. You can run
LANG=ja_JP.UTF-8 xterm
and then simply type in the new window
date
to see nice Kanji month names etc.
The current top open problems that we identified were:
- readline and the bash command line editor still live in a 1 byte =
1 character = 1 terminal column world, and could do well with a little
bit of added mbrtowc() and wcwidth() logic to get this right.
http://x-lt.richard.eu.org/me/rch/ll.html#bash
http://kki.net.pl/qrczak/bash-2.02-utf8.patch.gz
Is this already on its way into the next bash release?
- Emacs - there is the mule-ucs package, which is an acceptable temporary
solution, but UTF-8 is not yet really properly integrated in the Emacs
architecture, and a lot of tiny details fail (search, etc.). It seems
though that proper native UTF-8 support is on the agenda for after
Emacs 21 is released.
- groff still lives in a 1 character = 1 terminal column world,
with the expected horrible paragraph reformatting results if you
pursue adventures such as Japanese UTF-8 man pages.
- less still assumes wcwidth() == 1 for all characters and therefore
fails to wrap UTF-8 lines with ideographs correctly.
- Many popular email packages still need to be updated (pine, nmh, etc.).
Increasing use of UTF-8 in email will no doubt act as a big catalyst
for improving UTF-8 support elsewhere.
- Tk 8.4 still patches only 8-bit fonts together (incredibly slowly)
and can't use ISO10646-1 fonts directly.
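
To illustrate the kind of mbrtowc() and wcwidth() logic that the readline,
groff and less items above are missing, here is a minimal sketch in C. The
function name display_width() is made up for this example and is not taken
from any of the packages or patches mentioned. It assumes the caller has
already selected the user's locale with setlocale(LC_CTYPE, ""), and it
simply gives up on invalid or non-printable input.

```c
/* wcwidth() is an X/Open function; on glibc it is declared only when
 * an X/Open feature-test macro is set before the first system header. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Redundant but harmless prototype, in case this file is pasted below
 * other #include lines and the macro above comes too late. */
int wcwidth(wchar_t wc);

/* Hypothetical helper: number of terminal columns that the multibyte
 * string s occupies in the current LC_CTYPE locale, or -1 if s contains
 * an invalid sequence or a non-printable character. */
int display_width(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    int cols = 0;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        int w;

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                  /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* NUL character: end of string */
        w = wcwidth(wc);                /* 0, 1 or 2 columns; -1 if unprintable */
        if (w < 0)
            return -1;
        cols += w;
        s += n;                         /* advance by bytes consumed, not by 1 */
        left -= n;
    }
    return cols;
}
```

The point is that bytes, characters and columns are counted separately: the
loop advances by the n bytes that mbrtowc() consumed and adds whatever
wcwidth() reports, instead of assuming 1 for both. In a UTF-8 locale I would
expect an ideograph to be consumed as three bytes and counted as two columns,
which is exactly the case that line wrapping gets wrong today.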
What else?
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/