Re: wcwidth() implementation
Ulrich Drepper wrote on 2000-02-08 16:55 UTC:
> Markus Kuhn <Markus.Kuhn@xxxxxxxxxxxx> writes:
>
> > Attached is my public domain implementation of the wcwidth() and
> > wcswidth() functions. I hope you will find it useful for inclusion into
> > glibc,
>
> Why? I have a correct implementation in glibc.
Great! Have you already released a beta test version that we can test
thoroughly? Where?
> It is not based on
> some obscure hardcoded algorithm but instead allows locale-dependent
> information.
Good! Where?
> Hardcoding is really unsuitable since not all character
> sets agree with UCS about the width.
I wrote this code for several reasons:
- In the context of Robert's work on xterm, I wanted to propose a standard
of how charcell terminal emulators should interpret UCS with regard
to the width of characters. Neither UCS nor Unicode provide a useful
guideline here, therefore I decided to propose an unambiguous one.
I hope that you find it practical and acceptable and that in the
usual UTF-8 locales, your wcwidth() and mine will behave *exactly*
identically.
- I do fully understand that for legacy applications some CJK users
might want to use locales in which the width is compatible with
JIS X 201 or similar standards. However, this will cause many Greek and
Cyrillic letters to become wide, as will some math characters.
This is generally *not* acceptable for non-CJK users and therefore
we have to agree on a practical unambiguous wcwidth() standard
for locales such as en.UTF-8.
- Once you have announced glibc 2.2beta here, I will conduct a thorough
stress test of all Unicode-related functions. For that it is useful
to have the precise functionality that I expect them to provide for
certain locales documented in the form of a sample implementation.
- Many application authors feel very uncomfortable about using functions
that will not be available on many non-GNU libc systems for quite some
time and that are still vapourware even in the current glibc releases.
We want clear semantics for terminal emulators now, on which
application developers can build, and these semantics should also
be easy to implement on systems where either wcwidth() or a suitable
locale is not available. That is also what Bruno Haible's UTF-8
and iconv() libraries are all about. If in five years C libraries
everywhere have fully implemented these functions, then the
portable, locale-independent implementations might become obsolete.
My implementation is a bit too simple and straightforward to deserve the
attribute "obscure", especially as it contains the precise documentation
of how its output can be derived from the Unicode 3.0 database. It also
performs rather well in both space and time.
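To make the "simple and straightforward" claim concrete, here is a minimal
sketch of how a locale-independent, table-driven width function could look.
It is not the attached implementation: the names sketch_wcwidth(),
in_table(), sketch_zero_width and sketch_wide are hypothetical, the binary
search is an assumed technique, and the two tables are deliberately short and
incomplete illustrations rather than data derived from the Unicode 3.0
database. The return values for NUL (0) and control characters (-1) follow
the usual wcwidth() convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Closed interval [first, last] of UCS code points. */
struct interval {
    uint32_t first;
    uint32_t last;
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* ILLUSTRATIVE AND INCOMPLETE: a few combining ranges only.
 * Each table must be sorted ascending and non-overlapping. */
static const struct interval sketch_zero_width[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks block */
    { 0x302A, 0x302F },   /* wide combining tone marks */
    { 0x3099, 0x309A },   /* wide combining kana voicing marks */
};

/* ILLUSTRATIVE AND INCOMPLETE: a few double-width ranges only. */
static const struct interval sketch_wide[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
    { 0x3041, 0x3094 },   /* Hiragana letters */
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs block */
    { 0xAC00, 0xD7A3 },   /* Hangul syllables */
    { 0xFF01, 0xFF60 },   /* fullwidth ASCII variants */
};

/* Binary search: returns 1 if ucs lies in one of the n intervals. */
static int in_table(uint32_t ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > table[mid].last)
            lo = mid + 1;
        else if (ucs < table[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Column width of one UCS character: -1, 0, 1 or 2. */
int sketch_wcwidth(uint32_t ucs)
{
    if (ucs == 0)
        return 0;                       /* NUL occupies no cell */
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;                      /* C0/C1 control characters */
    if (in_table(ucs, sketch_zero_width, COUNT(sketch_zero_width)))
        return 0;                       /* combining character */
    if (in_table(ucs, sketch_wide, COUNT(sketch_wide)))
        return 2;                       /* double-width character */
    return 1;                           /* everything else, incl. Greek and Cyrillic */
}
```

With tables of this shape a lookup costs O(log n) comparisons and the data
are a few hundred bytes, which is one way to get good behaviour in both
space and time. Note that Greek and Cyrillic letters fall through to width 1
here, as wanted for non-CJK UTF-8 locales.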
A warning to people working on terminal emulators: Unicode also
contains a few wide combining characters (U+302A .. U+302F,
U+3099, U+309A). wcwidth() will return 0 for these, so you will have to
look at something else to determine whether to take such a character from
the narrow or the wide font. I recommend always taking combining characters
from the same font as their base character. I have just added
U+3099 and U+309A to 18x18ja and 12x13ja to allow you to test this case
as well.
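The font recommendation above can be sketched as follows. This is only an
illustration under stated assumptions: the enum sketch_font and the function
sketch_pick_fonts() are hypothetical names, the caller is assumed to have
already computed one wcwidth() result per character, and a combining
character that appears before any base character is arbitrarily given the
narrow font.

```c
#include <stddef.h>

/* Hypothetical font classes of a charcell terminal emulator. */
enum sketch_font { FONT_NARROW, FONT_WIDE };

/*
 * widths[i] is the wcwidth() result for the i-th character of a line.
 * out[i] receives the font to draw that character with.
 *
 * A character of width 1 or 2 is a base character and selects the narrow
 * or wide font. A character of width 0 (a combining character) inherits
 * the font of the most recent base character. Any other value, such as
 * -1, also leaves the current font unchanged.
 */
void sketch_pick_fonts(const int *widths, size_t n, enum sketch_font *out)
{
    /* Assumption: default to narrow if the line starts with a combining char. */
    enum sketch_font base = FONT_NARROW;

    for (size_t i = 0; i < n; i++) {
        if (widths[i] == 2)
            base = FONT_WIDE;
        else if (widths[i] == 1)
            base = FONT_NARROW;
        out[i] = base;
    }
}
```

For example, for a wide kana followed by U+3099 and then a Latin letter
followed by a combining accent, the widths are 2, 0, 1, 0, and the function
assigns the fonts wide, wide, narrow, narrow.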
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/