
Re: Transliteration for use in UTF-8 locales



Edmund GRIMLEY EVANS wrote on 2000-10-13 13:58 UTC:
> The sort of cases I'm worrying about are:
> 
>  - sending data to a child process through command-line arguments and
>    pipes, e.g. mutt talking to gpg

Pick a suitable locale for communicating with the child process (e.g.,
C.UTF-8), set that locale before you send out the data, and pass it to
the child process in the LC_CTYPE variable of the environment that
you provide in the execle() call.
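
A minimal sketch of that last step, assuming a POSIX system. The
function name run_sh_with_ctype() is made up for illustration, and
/bin/sh merely stands in for the real child (gpg, say). The child sees
an environment that contains nothing but the LC_CTYPE setting chosen
by the parent:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `script` via /bin/sh with an environment
 * that contains only LC_CTYPE=<ctype>.  Returns the child's exit
 * status, or -1 if the child could not be started or did not exit
 * normally. */
int run_sh_with_ctype(const char *script, const char *ctype)
{
    char lc[128];
    int n;
    pid_t pid;
    int status;

    assert(script != NULL && ctype != NULL);

    n = snprintf(lc, sizeof lc, "LC_CTYPE=%s", ctype);
    if (n < 0 || (size_t)n >= sizeof lc)
        return -1;                      /* locale name too long */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: execle() takes the new environment as the argument
         * that follows the NULL terminating the argv list. */
        char *const envp[] = { lc, NULL };
        execle("/bin/sh", "sh", "-c", script, (char *)NULL, envp);
        _exit(127);                     /* reached only if exec failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real application would also call setlocale(LC_CTYPE, ...) with the
same name in the parent before encoding the data it writes to the
pipe, and would probably want to pass on PATH, HOME and friends as
well.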

I suggest the (I think obvious and sensible) convention that locales
whose names end in ".UTF-8" shall never use transliteration. But
it would be nice to have additional locales such as
de_DE.UTF-8@romanized that do transliteration, and as a warning of the
transliteration, an additional qualifier should be appended to the
locale name. If the encoding does not cover all of Unicode (say
de_DE.ISO-8859-15), then you have to expect transliteration to take
place as soon as you output a wide character that is not covered by say
ISO 8859-15.
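
If that convention were adopted, an application could tell from the
locale name alone whether it has to expect transliteration. A rough
sketch; locale_may_transliterate() is a made-up name, and the rule it
encodes is only the proposal above, not anything an existing C library
guarantees:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check for the proposed naming convention: a locale
 * whose name ends in ".UTF-8" never transliterates; anything else
 * (a modifier such as "@romanized", or a smaller charset) might. */
int locale_may_transliterate(const char *name)
{
    static const char suffix[] = ".UTF-8";
    size_t slen = sizeof suffix - 1;
    size_t nlen;

    assert(name != NULL);
    nlen = strlen(name);

    if (nlen >= slen && strcmp(name + nlen - slen, suffix) == 0)
        return 0;   /* plain UTF-8 locale: no transliteration */
    return 1;       /* qualifier or limited charset: be prepared */
}
```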

You want to control the locale for communication with your subprocesses
anyway for a variety of other reasons, so in practice that shouldn't be
an additional burden.

>  - sending strings to a library, e.g. mutt talking to curses

Curses will know about the locale and should act accordingly,
just like the normal application.

> I hope at least that wcwidth(wc) gives the appropriate result, i.e.
> wcwidth(L'å') is 2 if 'å' is going to be transcribed as "aa" by
> wctomb.

I certainly hope so as well, and I shall file loads of bug reports and
patches for any platform I run into on which it does not.
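
The property is easy enough to test for. The sketch below (the name
width_matches_output() is made up) converts one wide character with
wctomb(), decodes the resulting bytes again in the same locale, and
compares the summed column widths with wcwidth() of the original
character. It assumes a platform that declares wcwidth() under
_XOPEN_SOURCE and a locale without a stateful encoding:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* needed for wcwidth() on many systems */
#endif
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if wcwidth(wc) equals the total width of whatever
 * wctomb() emits for wc in the current locale, 0 if the two differ,
 * and -1 if wc cannot be converted or has no defined width. */
int width_matches_output(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    const char *p = buf;
    mbstate_t st;
    size_t left;
    int expected, len, total = 0;

    expected = wcwidth(wc);
    len = wctomb(buf, wc);
    if (expected < 0 || len < 0)
        return -1;

    /* Decode the emitted bytes again and add up the column widths. */
    memset(&st, 0, sizeof st);
    left = (size_t)len;
    while (left > 0) {
        wchar_t out;
        size_t k = mbrtowc(&out, p, left, &st);
        int w;

        if (k == (size_t)-1 || k == (size_t)-2)
            return -1;          /* invalid or incomplete sequence */
        if (k == 0)
            k = 1;              /* a null character occupies one byte */
        w = wcwidth(out);
        if (w < 0)
            return -1;
        total += w;
        p += k;
        left -= k;
    }
    return total == expected;
}
```

Call setlocale(LC_CTYPE, "") first and run it over the characters you
care about; a result of 0 for, say, L'å' in a transliterating locale
would be exactly the kind of bug worth reporting.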

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/