
Re: wcwidth and glibc 2.2



Can anybody give me a "what's up" on which LC_ variables I should define to do
Unicode? I'm running Red Hat 7.0 and want to be able to display CJK characters
easily. Are there certain fonts I need installed, and how do I check which
ones I have installed? Do I use wprintf, or what? If I do a setlocale(LC_ALL,
"Japanese"), MB_CUR_MAX is 3. What do I need to pass to setlocale to get
MB_CUR_MAX to 2? Any info would be helpful. Thanks.

Dennis
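
A note on the MB_CUR_MAX part of the question: MB_CUR_MAX is the maximum
number of bytes one character can occupy in the locale's encoding, not a
display width. The sketch below (Python, used here only for illustration)
counts the bytes that one CJK ideograph takes in three encodings. It assumes
that the "Japanese" locale alias selects an EUC-JP locale, which would explain
the value 3, since EUC-JP also has three-byte sequences. The helper name
encoded_lengths and the sample character are made up for this example.

```python
# Hypothetical sample character; any common kanji would do.
SAMPLE = "\u65e5"  # U+65E5


def encoded_lengths(ch, encodings=("euc_jp", "shift_jis", "utf-8")):
    """Return {encoding name: number of bytes needed to encode ch}."""
    return {enc: len(ch.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for enc, nbytes in encoded_lengths(SAMPLE).items():
        print(f"{enc}: {nbytes} byte(s)")
```

The per-character byte count differs between encodings, while the number of
terminal columns the character occupies does not. Column width is what
wcwidth() reports, and it is a separate matter from MB_CUR_MAX.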



Markus Kuhn wrote:

> Pablo Saratxaga wrote on 2001-01-12 12:07 UTC:
> > glibc 2.2 locales should do it; at least the source files for the locales
> > have character width information; eg for TIS-620 charset definition:
> >
> > WIDTH
> > <U0E31>               0
> > <U0E34>..<U0E3A>      0
> > <U0E47>..<U0E4E>      0
> > END_WIDTH
> >
> > and from EUC-JP definition:
> >
> > WIDTH
> > <U3000>..<U7199>      2
> > <U02D8>..<U9FA5>      2
> > END_WIDTH
> >
> > uh? isn't the first line completely useless in such case?
>
> Yes, obviously.
>
> > Isn't it even very wrong? (that could perhaps explain why all programs
> > launched with LC_CTYPE=ja are segfaulting on my computer...)
> >
> > UTF-8 definition has a much longer and complex list; with, among other
> > values:
> >
> > <U0E31>               0
> > <U0E34>..<U0E3A>      0
> > <U0E47>..<U0E4E>      0
> > ...
>
> All this seems like a quick HACK, perhaps motivated by the idea of
> keeping the locale files small (even though the difference is negligible
> and full wcwidth() can be implemented highly efficiently). I would much
> prefer it if glibc had only two wcwidth definitions and used these in *all*
> locales, irrespective of the encoding. The first should be the one used
> by xterm (EastAsian Ambiguous character -> 1); the other should
> accommodate EUC-CJK/kterm etc. backwards compatibility (EastAsian
> Ambiguous character -> 2). There is no reason why wcwidth() should
> depend on the selected multi-byte encoding. I expect the CJK legacy
> standards to be extended with all Unicode characters in the foreseeable
> future (China has already done exactly that with GB18030), so simplified
> wcwidth() hacks for legacy encodings are a dead end anyway.
>
> Please try to investigate your "all programs launched with LC_CTYPE=ja
> are segfaulting on my computer" before glibc 2.2.1 is released (2nd
> pre-release already out).
>
> Markus
>
> --
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
>
> -
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/lists/
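
The idea in the quoted message of an encoding-independent width function with
two variants can be sketched roughly as follows. This is a simplified
illustration in Python built on the standard unicodedata module, not glibc's
or xterm's actual implementation. The names char_width, string_width and
ambiguous_wide are invented for this example, and the zero-width rule
(general categories Mn, Me, Cf) is an assumption that glosses over special
cases such as the soft hyphen.

```python
import unicodedata


def char_width(ch, ambiguous_wide=False):
    """Column width of one character, independent of any byte encoding.

    ambiguous_wide=False treats East Asian Ambiguous characters as width 1;
    ambiguous_wide=True treats them as width 2 (legacy CJK terminals).
    """
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 32 or 0x7F <= cp < 0xA0:
        return -1  # control characters have no defined width
    # Simplifying assumption: non-spacing, enclosing and format
    # characters take no column of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A" and ambiguous_wide:
        return 2
    return 1


def string_width(s, ambiguous_wide=False):
    """Sum of character widths, or -1 if any character is a control."""
    widths = [char_width(c, ambiguous_wide) for c in s]
    return -1 if any(w < 0 for w in widths) else sum(widths)


if __name__ == "__main__":
    for ch in ("a", "\u0e31", "\u65e5", "\u00b1"):
        print(f"U+{ord(ch):04X}", char_width(ch), char_width(ch, True))
```

The Thai combining marks from the TIS-620 WIDTH table (for example U+0E31)
come out as 0 and CJK ideographs as 2 in both variants. Only the ambiguous
class, such as U+00B1, changes between the two, so the selected multi-byte
encoding never enters into the calculation.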
