Re: wcwidth and glibc 2.2
> Pablo Saratxaga wrote on 2001-01-12 12:07 UTC:
> > glibc 2.2 locales should do it; at least the source files for the locales
> > have character width information; eg for TIS-620 charset definition:
> >
> > WIDTH
> > <U0E31> 0
> > <U0E34>..<U0E3A> 0
> > <U0E47>..<U0E4E> 0
> > END_WIDTH
> >
> > and from EUC-JP definition:
> >
> > WIDTH
> > <U3000>..<U7199> 2
> > <U02D8>..<U9FA5> 2
> > END_WIDTH
Where did you get this from? In the glibc-2.2 sources a
three-dot ellipsis is used, not a two-dot ellipsis.
Markus Kuhn writes:
> > uh? isn't the first line completely useless in such a case?
>
> Yes, obviously.
No, it isn't. Read about "absolute ellipsis" in the ISO 14652 draft (n801.pdf).
In the EUC-JP definition, the first line says that all JIS X 0208
characters shall have width 2, and the second line says that all JIS X
0212 characters shall have width 2. There is no redundancy.
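
To make the absolute-ellipsis reading concrete, here is a minimal sketch in C. It assumes that the endpoints of each WIDTH line are first converted to their EUC-JP encoded values and that a character belongs to a range when its own encoded value lies between them. The struct, the function name and the packed byte values (A1A1..F4A6 for JIS X 0208, 8FA2AF..8FEDE3 for JIS X 0212) are illustrative assumptions, not taken from the glibc sources.

```c
#include <stddef.h>
#include <stdint.h>

/* One WIDTH entry whose endpoints are already converted to encoded values.
   Hypothetical type, used only for this illustration. */
struct width_range {
    uint32_t lo;    /* encoded value of the character before the ellipsis */
    uint32_t hi;    /* encoded value of the character after the ellipsis */
    int width;      /* column width assigned to the whole range */
};

/* EUC-JP byte sequences packed big-endian into one integer.
   The concrete byte values are assumptions made for this sketch. */
static const struct width_range eucjp_widths[] = {
    { 0xA1A1u,   0xF4A6u,   2 },   /* <U3000>...<U7199>: JIS X 0208 */
    { 0x8FA2AFu, 0x8FEDE3u, 2 },   /* <U02D8>...<U9FA5>: JIS X 0212 */
};

/* Look up a width by encoded value, not by Unicode code point.
   Simplification: any integer between the endpoints matches, even if it
   is not a valid EUC-JP byte sequence. */
static int width_of_encoded(uint32_t enc, int default_width)
{
    size_t n = sizeof eucjp_widths / sizeof eucjp_widths[0];
    for (size_t i = 0; i < n; i++) {
        if (enc >= eucjp_widths[i].lo && enc <= eucjp_widths[i].hi)
            return eucjp_widths[i].width;
    }
    return default_width;
}
```

Compared by encoded value, the two ranges are disjoint (two-byte sequences versus three-byte sequences starting with 8F), so neither line makes the other superfluous. Compared by Unicode code point, the second range would contain the first.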
> I would much favour if glibc had only two wcwidth definitions and
> used these in *all* locales, irrespective of the encoding.
The wcwidth definition in the EUC-JP locale is dictated by the fact that
a JIS X 0208 or JIS X 0212 character takes two columns when output in
kterm or similar terminal emulators. You can't change that just
because of xterm. It is xterm that has to adapt to these locales.
> There is no reason why wcwidth() should depend on the selected
> multi-byte encoding.
The reason is legacy. People who don't want this legacy (double-width
Cyrillic etc.) will use UTF-8 locales.
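
A small sketch for checking this on a given system: it selects a locale and asks wcwidth() for the width of one character. The helper name width_in_locale and the -2 sentinel are made up for this example. Which locales are installed, and what widths they report, depends on the system.

```c
#define _XOPEN_SOURCE 700   /* needed for the wcwidth() declaration */
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to the named locale and return
   wcwidth(wc) under it. Returns -2 if the locale cannot be selected.
   wcwidth() itself returns -1 for non-printable characters.
   Note: this changes the LC_CTYPE locale of the whole process. */
static int width_in_locale(const char *name, wchar_t wc)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -2;
    return wcwidth(wc);
}
```

Calling it for a Cyrillic letter such as (wchar_t)0x0410, once with an EUC-JP locale and once with a UTF-8 locale (for example "ja_JP.EUC-JP" and "en_US.UTF-8", if those names exist on the system), shows whether the two locales disagree about the width.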
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/