
Re: wcwidth and glibc 2.2



Bruno Haible wrote on 2001-01-16 18:01 UTC:
> Markus Kuhn writes:
> > Shouldn't GB18030 be upwards compatible with GBK and hence also use the
> > CJK variant of wcwidth()?
> 
> The wcwidth of GB18030 cannot be compatible with GB2312 and UTF-8
> simultaneously. Given the structure of the encoding, I assumed
> compatibility with UTF-8 is more important.

Given the sole purpose of GB18030, I assumed GBK compatibility is more
important. Maybe we need GB18030 locales with both wcwidth notions
here ... (or maybe we don't need GB18030 ;-)

> Does anyone have a copy of the GB18030 standard?

I would like one too, in English please.

> >   a) http://www.unicode.org/Public/3.1-Update/EastAsianWidth-4d3.beta.txt
> >      needs to be modified
> > 
> >   b) glibc EUC-JP wcwidth() needs to be modified
> > 
> > >   c) the rules according to which I derived my wcwidth[_cjk]() from
> >      the unicode.org data need to be modified
> 
> It's (c). CJK wcwidth must implement legacy behaviour. You cannot
> assume that the unicode.org material will give the right answer here.

But the Unicode EastAsianWidth Annex was specifically written for the
purpose of documenting the width semantics of Unicode characters in CJK
legacy sets. Please read

  http://www.unicode.org/unicode/reports/tr11/

in particular section 6, with which my implementation tries to comply.
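
To make the two wcwidth notions concrete, here is a rough sketch of how
widths could be derived from the East Asian Width property. It is not
the glibc code and not my actual wcwidth[_cjk]() implementation; the
function name sketch_wcwidth and its cjk flag are made up for
illustration, and it relies on whatever Unicode version Python's
unicodedata module ships. The only difference between the two modes is
that "Ambiguous" characters become double-width in the CJK variant.

```python
import unicodedata


def sketch_wcwidth(ch: str, cjk: bool = False) -> int:
    """Approximate column width of one character (hypothetical helper).

    cjk=False: East Asian Ambiguous characters are single-width.
    cjk=True:  they are double-width, as in CJK legacy encodings.
    """
    if ch == "\0":
        return 0
    category = unicodedata.category(ch)
    if category == "Cc":
        # Control characters have no printable width.
        return -1
    if category in ("Mn", "Me"):
        # Non-spacing and enclosing combining marks take no column.
        return 0
    width_class = unicodedata.east_asian_width(ch)
    if width_class in ("W", "F"):
        # Wide and Fullwidth are double-width in both modes.
        return 2
    if cjk and width_class == "A":
        # Ambiguous: double-width only in the CJK legacy context.
        return 2
    return 1
```

With this sketch, GREEK SMALL LETTER ALPHA (U+03B1) comes out as one
column by default and two columns with cjk=True, which is exactly the
kind of character where GBK and UTF-8 expectations diverge.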

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/