Re: Linux console UTF-8 by default
Roozbeh Pournader wrote on 2004-01-11 14:15 UTC:
> On Sat, 2004-01-10 at 23:51, Edward H. Trager wrote:
> > I guess I was recalling (from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> > that six bytes allows encoding all possible
> > 2^31 UCS code points, although
> > I suppose nothing above plane 1 has been defined.
>
> 1. That page is a little out of date (although a wonderful resource).
I don't think there is anything out of date:
"The definitions of UTF-8 in UCS and Unicode differed originally
slightly, because in UCS, up to 6-byte long UTF-8 sequences were
possible to represent characters up to U-7FFFFFFF, while in Unicode only
up to 4-byte long UTF-8 sequences are defined to represent characters up
to U-0010FFFF."
The page does describe the 21-bit limit; it just does so after first
introducing UTF-8 in a way that reflects its original ISO definition.
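
To make the two limits concrete, here is a rough sketch of how the
sequence length follows from the code point value. It is not taken from
the page: the function name utf8_length and the max_code_point parameter
are made up for this illustration, and it ignores details such as the
surrogate range.

```python
def utf8_length(code_point: int, max_code_point: int = 0x7FFFFFFF) -> int:
    """Return how many bytes UTF-8 needs to encode code_point.

    The default limit is the original ISO 10646 (UCS) range, which allows
    sequences of up to 6 bytes. Pass max_code_point=0x10FFFF to apply the
    Unicode restriction, under which no sequence exceeds 4 bytes.
    (Hypothetical helper for illustration only.)
    """
    if not 0 <= code_point <= max_code_point:
        raise ValueError(f"code point {code_point:#x} is out of range")

    # Largest code point representable by a sequence of 1, 2, ... 6 bytes
    # in the original scheme (7, 11, 16, 21, 26 and 31 payload bits).
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for length, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return length
    raise AssertionError("unreachable: range was checked above")
```

With the default limit, utf8_length(0x7FFFFFFF) gives 6, while
utf8_length(0x10FFFF, max_code_point=0x10FFFF) gives 4, and anything
above U-0010FFFF is rejected under the Unicode limit.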
Markus
--
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/