Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
- To: Andrew Cunningham <andjc@xxxxxxxxxxxxxx>
- Subject: Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
- From: Jamie Lokier <egcs@xxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 4 Aug 2000 15:20:56 +0200
- Cc: linux-utf8@xxxxxxxxxxxx, sap-list@xxxxxxxxxx, gcc@xxxxxxxxxxx, libc-hacker@xxxxxxxxxxxxxxxxxx, "Nuesser, Wilhelm" <wilhelm.nuesser@xxxxxxx>, "Rohland, Hans-Christoph" <hans-christoph.rohland@xxxxxxx>
- In-reply-to: <000e01bffe12$b0075440$7dd2223f@libadmin>; from andjc@ozemail.com.au on Fri, Aug 04, 2000 at 10:51:18PM +1000
- References: <816D93CCC927D31188570008C75D1DE1011A0BDF@dbwdfx1a.wdf.sap-ag.de> <000e01bffe12$b0075440$7dd2223f@libadmin>
- Reply-to: linux-utf8@xxxxxxxxxxxx
- Sender: owner-linux-utf8@xxxxxxxxxxxx
Andrew Cunningham wrote:
> any implementation of utf-16 must include the capacity to correctly handle
> valid surrogate pairs. You can't restrict utf-16 characters to 2 bytes.
That's why conversion from utf-16 to utf-32 should be analogous to
conversion from utf-8 to wchar_t, à la mbtowcs, and so on. The same rules
about character-by-character processing apply. You may wish to use utf32_t
for the intermediate characters, e.g. in a simple parser.
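
To make that concrete, here is a rough sketch of what such a conversion
could look like, modelled loosely on the standard C mbstowcs interface.
The function name utf16_to_utf32 and the utf16_t/utf32_t typedefs are
assumptions made up for illustration; nothing like this is claimed to
exist in gcc or glibc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedefs, assumed for this sketch only. */
typedef uint16_t utf16_t;
typedef uint32_t utf32_t;

/* Convert a zero-terminated UTF-16 string to UTF-32, in the spirit of
   mbstowcs: stores at most n code points in dst and returns the number
   stored, not counting the terminator. Returns (size_t)-1 if an
   unpaired surrogate is found. */
size_t utf16_to_utf32(utf32_t *dst, const utf16_t *src, size_t n)
{
    size_t count = 0;

    while (count < n) {
        utf32_t c = *src++;

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            utf32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            src++;
            /* Combine the two 10-bit halves and add the 0x10000 offset. */
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return (size_t)-1;
        }

        dst[count] = c;
        if (c == 0)
            return count;   /* terminator is stored but not counted */
        count++;
    }
    return count;           /* buffer full, no terminator stored */
}
```

With this, the two utf-16 units 0xD800 0xDC00 come out as the single
utf32_t value 0x10000, so a parser working on the utf32_t values can
treat every element as one whole character.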
-- Jamie
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/