Re: Transliteration for use in UTF-8 locales
Edmund GRIMLEY EVANS wrote on 2000-10-13 13:58 UTC:
> The sort of cases I'm worrying about are:
>
> - sending data to a child process through command-line arguments and
> pipes, e.g. mutt talking to gpg
Pick a suitable locale for communicating with the child process (e.g.,
C.UTF-8), set that locale before you send out the data, and pass it to
the child process in the LC_CTYPE environment variable that you provide
in the execle() call.
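
As a minimal sketch of the execle() approach described above: the helper
name run_with_ctype(), the fixed PATH value, and the use of /bin/sh to
run the command are assumptions made for illustration only, not
something mutt or gpg actually does. The parent would additionally call
setlocale() itself before writing data to the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` via /bin/sh in a child process whose
 * environment contains only PATH and LC_CTYPE=<ctype_locale>.
 * Returns the child's exit status, or -1 on failure. */
int run_with_ctype(const char *command, const char *ctype_locale)
{
    char lc_ctype[128];
    int n;
    pid_t pid;
    int status;

    assert(command != NULL && ctype_locale != NULL);

    /* Build the "LC_CTYPE=..." entry for the child's environment. */
    n = snprintf(lc_ctype, sizeof lc_ctype, "LC_CTYPE=%s", ctype_locale);
    if (n < 0 || (size_t)n >= sizeof lc_ctype)
        return -1;

    /* The PATH value is an assumption; adjust it for the target system. */
    char *const envp[] = { "PATH=/usr/bin:/bin", lc_ctype, NULL };

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace the process image, passing the explicit environment. */
        execle("/bin/sh", "sh", "-c", command, (char *)NULL, envp);
        _exit(127);  /* reached only if execle() failed */
    }

    /* Parent: wait for the child and report its exit status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_with_ctype("test \"$LC_CTYPE\" = C.UTF-8", "C.UTF-8")
should return 0, because the child sees exactly the LC_CTYPE value the
parent chose rather than whatever the user's environment contained.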
I suggest the (I think obvious and sensible) convention that locales
whose names end in ".UTF-8" shall never use transliteration. But it
would be nice to have additional locales such as de_DE.UTF-8@romanized
that do transliterate, with an additional qualifier appended to the
locale name as a warning of the transliteration. If the encoding does
not cover all of Unicode (say de_DE.ISO-8859-15), then you have to
expect transliteration to take place as soon as you output a wide
character that is not covered by, say, ISO 8859-15.
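
The naming convention suggested above could be checked by an application
roughly as follows. This is only a sketch of the proposed convention,
not an existing libc rule, and the function name
locale_may_transliterate() is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check for the proposed convention: a locale whose name
 * ends in ".UTF-8" never transliterates; any other name (a further
 * qualifier such as "@romanized", or an encoding that does not cover
 * all of Unicode) may do so.
 * Returns 0 if transliteration is ruled out, 1 if it must be expected. */
int locale_may_transliterate(const char *name)
{
    static const char suffix[] = ".UTF-8";
    size_t name_len, suffix_len;

    assert(name != NULL);
    name_len = strlen(name);
    suffix_len = sizeof suffix - 1;

    /* Compare only the tail of the name against ".UTF-8". */
    if (name_len >= suffix_len &&
        strcmp(name + name_len - suffix_len, suffix) == 0)
        return 0;
    return 1;
}
```

Under this sketch, "de_DE.UTF-8" and "C.UTF-8" yield 0, while
"de_DE.UTF-8@romanized" and "de_DE.ISO-8859-15" yield 1.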
You want to control the locale for communication with your subprocesses
anyway for a variety of other reasons, so in practice that shouldn't be
an additional burden.
> - sending strings to a library, e.g. mutt talking to curses
Curses will know about the locale and should act accordingly, just like
a normal application.
> I hope at least that wcwidth(wc) gives the appropriate result, i.e.
> wcwidth(L'å') is 2 if 'å' is going to be transcribed as "aa" by
> wctomb.
I certainly hope so as well, and I shall file loads of bug reports and
patches if it does not on platforms that I run into.
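
One way to test for this on a given platform might look like the sketch
below. It assumes that the column width of the output can be measured by
decoding the bytes produced by wctomb() again with mbrtowc() and summing
wcwidth() over the result; the function name width_matches_output() is
invented for this example, and the caller is expected to have selected
the locale under test with setlocale() beforehand.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical consistency check for the current locale.
 * Returns 1 if wcwidth(wc) equals the total column width of whatever
 * wctomb() emits for wc, 0 if the two differ, and -1 if wc is not
 * printable or cannot be converted. */
int width_matches_output(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;
    const char *p = buf;
    size_t left;
    int expected, len, actual = 0;

    expected = wcwidth(wc);
    wctomb(NULL, 0);              /* reset any shift state */
    len = wctomb(buf, wc);
    if (expected < 0 || len < 0)
        return -1;
    assert(len <= (int)sizeof buf);

    /* Decode the emitted bytes again and add up their column widths. */
    memset(&st, 0, sizeof st);
    left = (size_t)len;
    while (left > 0) {
        wchar_t out;
        int w;
        size_t n = mbrtowc(&out, p, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;
        if (n == 0)
            n = 1;                /* a decoded L'\0' consumes one byte */
        w = wcwidth(out);
        if (w < 0)
            return -1;
        actual += w;
        p += n;
        left -= n;
    }
    return actual == expected;
}
```

A result of 0 for some character in some locale would be the kind of
mismatch worth a bug report.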
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/