
Re: wcwidth and glibc 2.2



> Pablo Saratxaga wrote on 2001-01-12 12:07 UTC:
> > glibc 2.2 locales should do it; at least the source files for the locales
> > have character width information; eg for TIS-620 charset definition:
> > 
> > WIDTH
> > <U0E31>		0
> > <U0E34>..<U0E3A>	0
> > <U0E47>..<U0E4E>	0
> > END_WIDTH
> > 
> > and from EUC-JP definition:
> > 
> > WIDTH
> > <U3000>..<U7199>	2
> > <U02D8>..<U9FA5>	2
> > END_WIDTH

Where did you get this from? In the glibc-2.2 sources a
three-dot ellipsis is used, not a two-dot ellipsis.

Markus Kuhn writes:
> > uh? isn't the first line completely useless in such a case?
> 
> Yes, obviously.

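No, it isn't. Read about the "absolute ellipsis" in the ISO 14652 draft
(n801.pdf). The absolute ellipsis ranges over the charmap's encoding
order, not over Unicode code points. In the EUC-JP charmap, <U3000> and
<U7199> are the first and last JIS X 0208 characters, and <U02D8> and
<U9FA5> are the first and last JIS X 0212 characters. So in the EUC-JP
definition, the first line says that all JIS X 0208 characters shall
have width 2, and the second line says that all JIS X 0212 characters
shall have width 2. There is no redundancy.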
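A minimal sketch of why the two WIDTH lines do not overlap, assuming that an absolute ellipsis covers every character whose encoded byte sequence lies between the two endpoints (compared byte by byte) rather than a range of Unicode code points. The struct width_rule, the table eucjp_rules and the function rule_width are hypothetical illustrations, not glibc's localedef code. The endpoint bytes are the EUC-JP encodings of <U3000>, <U7199>, <U02D8> and <U9FA5>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one WIDTH line using an absolute ellipsis:
   every character whose encoded bytes lie between `from` and `to`
   (inclusive, compared as unsigned bytes) gets `width`. */
struct width_rule {
    const unsigned char *from;
    const unsigned char *to;
    size_t len;   /* both endpoints are encoded in this many bytes */
    int width;
};

/* EUC-JP encodings of the endpoints in the quoted WIDTH section. */
static const unsigned char jisx0208_first[] = {0xA1, 0xA1};       /* <U3000> */
static const unsigned char jisx0208_last[]  = {0xF4, 0xA6};       /* <U7199> */
static const unsigned char jisx0212_first[] = {0x8F, 0xA2, 0xAF}; /* <U02D8>, via SS3 */
static const unsigned char jisx0212_last[]  = {0x8F, 0xED, 0xE3}; /* <U9FA5>, via SS3 */

static const struct width_rule eucjp_rules[] = {
    {jisx0208_first, jisx0208_last, 2, 2},  /* all of JIS X 0208 */
    {jisx0212_first, jisx0212_last, 3, 2},  /* all of JIS X 0212 */
};

/* Returns the width of the first rule whose byte range contains `seq`,
   or `default_width` if no rule matches. memcmp compares as unsigned
   char, which gives exactly the encoding order. */
int rule_width(const unsigned char *seq, size_t len, int default_width)
{
    for (size_t i = 0; i < sizeof eucjp_rules / sizeof eucjp_rules[0]; i++) {
        const struct width_rule *r = &eucjp_rules[i];
        if (len == r->len
            && memcmp(seq, r->from, len) >= 0
            && memcmp(seq, r->to, len) <= 0)
            return r->width;
    }
    return default_width;
}
```

Read as Unicode ranges, U+3000..U+7199 and U+02D8..U+9FA5 would overlap; read in EUC-JP byte order, the first range holds only two-byte JIS X 0208 sequences and the second only three-byte SS3 sequences, so each line covers a different character set.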

> I would much favour if glibc had only two wcwidth definitions and
> used these in *all* locales, irrespective of the encoding.

The wcwidth definition in the EUC-JP locale is dictated by the fact that
outputting a JIS X 0208 or JIS X 0212 character in kterm or similar
terminal emulators takes two columns. You can't change that just
because of xterm; it's xterm that has to adapt to these locales.
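A rough sketch of the column accounting such a terminal performs on an EUC-JP byte stream, assuming a kterm-like terminal that shows ASCII and half-width katakana (SS2) in one column and JIS X 0208 and JIS X 0212 (SS3) characters in two. The function eucjp_columns is hypothetical; it does not treat control characters specially and only checks byte ranges, not whether a code point is actually assigned.

```c
#include <stddef.h>

/* Hypothetical helper: number of screen columns an EUC-JP byte string
   occupies on a kterm-like terminal, or -1 on an invalid or truncated
   sequence. */
int eucjp_columns(const unsigned char *s, size_t n)
{
    int cols = 0;
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        if (b < 0x80) {
            /* ASCII: one byte, one column */
            cols += 1;
            i += 1;
        } else if (b == 0x8E) {
            /* SS2 + one byte: half-width katakana, one column */
            if (i + 1 >= n || s[i + 1] < 0xA1 || s[i + 1] > 0xDF)
                return -1;
            cols += 1;
            i += 2;
        } else if (b == 0x8F) {
            /* SS3 + two bytes: JIS X 0212, two columns */
            if (i + 2 >= n
                || s[i + 1] < 0xA1 || s[i + 1] > 0xFE
                || s[i + 2] < 0xA1 || s[i + 2] > 0xFE)
                return -1;
            cols += 2;
            i += 3;
        } else if (b >= 0xA1 && b <= 0xFE) {
            /* two bytes: JIS X 0208, two columns */
            if (i + 1 >= n || s[i + 1] < 0xA1 || s[i + 1] > 0xFE)
                return -1;
            cols += 2;
            i += 2;
        } else {
            return -1;
        }
    }
    return cols;
}
```

For example, "A" followed by the JIS X 0208 Cyrillic letter A7 A1, the half-width katakana 8E B1 and the JIS X 0212 kanji 8F B0 A1 comes to 1 + 2 + 1 + 2 = 6 columns. A locale's wcwidth has to agree with this for cursor positioning to work.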

> There is no reason, why wcwidth() should depend on the selected
> multi-byte encoding.

The reason is legacy. In EUC-JP, Cyrillic letters are JIS X 0208
characters and therefore have width 2. People who don't want this legacy
(double-width Cyrillic etc.) will use UTF-8 locales.
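A small sketch for observing this locale dependence, assuming glibc, where wchar_t holds ISO 10646 code points in every locale, so the same wchar_t value can be passed to wcwidth under both locales. The helper wcwidth_in_locale is hypothetical; it uses the POSIX.1-2008 newlocale/uselocale interfaces, and locale names such as "ja_JP.eucJP" or "C.UTF-8" are assumptions about what is installed.

```c
#define _XOPEN_SOURCE 700
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: wcwidth(wc) evaluated under the LC_CTYPE of
   `locale_name`, or -2 if that locale cannot be loaded. Only the
   calling thread's locale is switched, and it is restored afterwards. */
int wcwidth_in_locale(const char *locale_name, wchar_t wc)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -2;                   /* locale not installed */
    locale_t old = uselocale(loc);   /* per-thread switch */
    int w = wcwidth(wc);             /* width per that locale's charmap */
    uselocale(old);
    freelocale(loc);
    return w;
}
```

Calling it with 0x0416 (CYRILLIC CAPITAL LETTER ZHE) under a UTF-8 locale should give 1. Going by the EUC-JP charmap quoted above, the same call under an EUC-JP locale would give 2, though the exact result depends on the installed locale data.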

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/