Re: wcwidth and glibc 2.2
Pablo Saratxaga wrote on 2001-01-12 12:07 UTC:
> glibc 2.2 locales should do it; at least the source files for the locales
> have character width information; eg for TIS-620 charset definition:
>
> WIDTH
> <U0E31> 0
> <U0E34>..<U0E3A> 0
> <U0E47>..<U0E4E> 0
> END_WIDTH
>
> and from EUC-JP definition:
>
> WIDTH
> <U3000>..<U7199> 2
> <U02D8>..<U9FA5> 2
> END_WIDTH
>
> uh? isn't the first line completely useless in such a case?
Yes, obviously.
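The redundancy can be checked mechanically. The following is only a minimal sketch: the names width_range and range_is_redundant are hypothetical and are not part of glibc or localedef. It tests whether one WIDTH entry is fully covered by another entry that assigns the same width.

```c
/* Hypothetical representation of one line of a WIDTH section. */
struct width_range {
    unsigned long first;  /* first code point of the range */
    unsigned long last;   /* last code point of the range (inclusive) */
    int width;            /* column width assigned to the range */
};

/* Returns 1 if entry a adds nothing beyond entry b, i.e. a lies
 * entirely inside b and both assign the same width; 0 otherwise. */
int range_is_redundant(struct width_range a, struct width_range b)
{
    return a.width == b.width && b.first <= a.first && a.last <= b.last;
}
```

With the two EUC-JP entries quoted above, <U3000>..<U7199> lies entirely inside <U02D8>..<U9FA5> and both assign width 2, so the first entry is reported as redundant.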
> Isn't it even very wrong? (that could maybe explain why all programs launched
> with LC_CTYPE=ja are segfaulting on my computer...)
>
> UTF-8 definition has a much longer and complex list; with, among other
> values:
>
> <U0E31> 0
> <U0E34>..<U0E3A> 0
> <U0E47>..<U0E4E> 0
> ...
All this looks like a quick hack, perhaps motivated by the idea of
keeping the locale files small (even though the difference is negligible
and a full wcwidth() can be implemented highly efficiently). I would much
prefer it if glibc had only two wcwidth definitions and used these in *all*
locales, irrespective of the encoding. The first should be the one used
by xterm (East Asian Ambiguous character -> 1), the other one should
provide EUC-CJK/kterm etc. backwards compatibility (East Asian
Ambiguous character -> 2). There is no reason why wcwidth() should
depend on the selected multi-byte encoding. I expect the CJK legacy
standards to be extended with all Unicode characters in the foreseeable
future (China has already done exactly that with GB18030), so simplified
wcwidth() hacks for legacy encodings are a dead end anyway.
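As an illustration of the two-definition idea, here is a minimal sketch of an encoding-independent width lookup using a binary search over sorted interval tables. It is not glibc's implementation. The function name sketch_width, the ambiguous_is_wide flag and the tiny tables are assumptions made for this example, and the tables are deliberately incomplete (the zero-width entries are the Thai ranges quoted above).

```c
#include <stddef.h>

struct interval { unsigned long first, last; };

/* Illustrative, incomplete tables. Each must be sorted and
 * non-overlapping for the binary search below to work. */
static const struct interval zero_width[] = {
    { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E }
};
static const struct interval wide[] = {
    { 0x4E00, 0x9FA5 }          /* CJK unified ideographs */
};
static const struct interval ambiguous[] = {
    { 0x00A1, 0x00A1 }, { 0x0391, 0x03A1 }, { 0x03A3, 0x03A9 }
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Binary search: returns 1 if ucs falls inside any interval of t. */
static int in_table(unsigned long ucs, const struct interval *t, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > t[mid].last)
            lo = mid + 1;
        else if (ucs < t[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical width function. ambiguous_is_wide == 0 gives the
 * xterm-style behaviour, nonzero gives the CJK legacy behaviour.
 * The result depends only on the code point, never on the encoding. */
int sketch_width(unsigned long ucs, int ambiguous_is_wide)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;              /* control characters */
    if (in_table(ucs, zero_width, COUNT(zero_width)))
        return 0;
    if (in_table(ucs, wide, COUNT(wide)))
        return 2;
    if (ambiguous_is_wide && in_table(ucs, ambiguous, COUNT(ambiguous)))
        return 2;
    return 1;
}
```

Under these assumptions, a single flag selects between the two behaviours, and a lookup costs O(log n) comparisons per table, so complete tables would not have to be trimmed per locale.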
Please try to investigate your "all programs launched with LC_CTYPE=ja
are segfaulting on my computer" before glibc 2.2.1 is released (2nd
pre-release already out).
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/