Re: wcwidth and glibc 2.2
Bruno Haible wrote on 2001-01-16 15:14 UTC:
> > http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>
> It is indeed possible to get away with two different tables: one for
> the UTF-8 and GB18030 locales, and one for BIG5, CP949, EUC-JP,
> EUC-KR, EUC-TW, GB2312, GBK, JOHAB locales.
Shouldn't GB18030 be upwards compatible with GBK and hence also use the
CJK variant of wcwidth()?
> But your wcwidth_cjk() function needs more modifications. It differs
> from the EUC-JP wcwidth in more than 200 values.
Thanks for your findings! This needs more investigation to distinguish
the following three cases:
a) http://www.unicode.org/Public/3.1-Update/EastAsianWidth-4d3.beta.txt
needs to be modified
b) glibc EUC-JP wcwidth() needs to be modified
c) the rules according to which I derived my wcwidth[_cjk]() from
the unicode.org data need to be modified
I derived my wcwidth() directly from the unicode.org data, by assigning all "Ambiguous"
characters a width of 2, and it would be nice if I could keep my
wcwidth() definition in a form such that it can be directly derived from
unicode.org databases using the steps specified in the comments in
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
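A rough sketch of the derivation rule described above, not the code in
wcwidth.c itself. The function name width_from_eaw and the cjk flag are
hypothetical, chosen only for illustration. It assumes the East Asian
Width property of a character has already been looked up as one of the
strings "N", "Na", "H", "W", "F" or "A", and it leaves out control
characters and zero-width combining characters, which a real wcwidth()
has to treat separately.

```c
#include <string.h>

/* Hypothetical helper: map an East Asian Width property value
 * ("N", "Na", "H", "W", "F" or "A") to a column width.
 * A non-zero cjk selects the legacy CJK rule, under which
 * "A" (Ambiguous) characters are treated as wide. */
static int width_from_eaw(const char *eaw, int cjk)
{
    /* Wide and Fullwidth characters occupy two columns in both modes. */
    if (strcmp(eaw, "W") == 0 || strcmp(eaw, "F") == 0)
        return 2;

    /* Ambiguous characters occupy two columns only in CJK mode. */
    if (cjk && strcmp(eaw, "A") == 0)
        return 2;

    /* Neutral, Narrow and Halfwidth characters occupy one column. */
    return 1;
}
```

With this rule, width_from_eaw("A", 0) gives 1 and width_from_eaw("A", 1)
gives 2, which is the only place where the two tables would differ.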
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/