Re: Linux console UTF-8 by default
On Sun, Jan 11, 2004 at 05:45:19PM +0330, Roozbeh Pournader wrote:
> On Sat, 2004-01-10 at 23:51, Edward H. Trager wrote:
> > I guess I was recalling (from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> > that six bytes allows encoding all possible
> > 2^31 UCS code points, although
> > I suppose nothing above plane 1 has been defined.
>
> 1. That page is a little out of date (although a wonderful resource).
>
> 2. Although UCS theoretically allows 2^31 code points, it will never
> encode any character higher than U+10FFFF.
Well, you can never tell. I know that SC2/WG2 has said that they will
never allocate anything above the 21st bit, but then again they said
they would never reallocate characters, and then they did it anyway.
I would say: "be liberal in what you accept, and conservative in what
you generate", and thus accept valid UTF-8 up to the full 31 bits.
I also think there is code around that handles full UTF-8, so doing
this is not an extra burden.
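To make the suggestion concrete, here is a minimal Python sketch of that
policy. It is only an illustration under stated assumptions, not code taken
from any existing library: the function names decode_utf8_liberal and
encode_utf8_conservative are hypothetical, the decoder follows the original
1-to-6-byte UTF-8 layout (6 bytes carry 1 + 5*6 = 31 payload bits), and it
rejects overlong forms but, for brevity, does not filter surrogates on input.

```python
def decode_utf8_liberal(data: bytes) -> list:
    """Decode 1- to 6-byte UTF-8 sequences (values up to 2**31 - 1)."""
    # (lead-byte mask, lead-byte pattern, sequence length, smallest value
    #  that legitimately needs that length)
    forms = [
        (0x80, 0x00, 1, 0x0),
        (0xE0, 0xC0, 2, 0x80),
        (0xF0, 0xE0, 3, 0x800),
        (0xF8, 0xF0, 4, 0x10000),
        (0xFC, 0xF8, 5, 0x200000),
        (0xFE, 0xFC, 6, 0x4000000),
    ]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, pattern, length, minimum in forms:
            if (lead & mask) == pattern:
                break
        else:
            # Stray continuation byte, or 0xFE / 0xFF.
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        tail = data[i + 1 : i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        cp = lead & ~mask & 0xFF          # payload bits of the lead byte
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)   # six payload bits per continuation byte
        if cp < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        out.append(cp)
        i += length
    return out


def encode_utf8_conservative(code_points) -> bytes:
    """Encode code points, refusing anything outside U+0000..U+10FFFF."""
    out = bytearray()
    for cp in code_points:
        if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"refusing to generate U+{cp:X}")
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

With this sketch, decode_utf8_liberal(b"\xfd\xbf\xbf\xbf\xbf\xbf") returns
the single value 0x7FFFFFFF, while encode_utf8_conservative([0x110000])
raises ValueError.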
Best regards
Keld
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/