Re: Revision of UTF-8 history in draft-yergeau-rfc2279bis-05.txt
Followup to: <E37E01957949D611A4C30008C7E691E2E64B42@xxxxxxxxxxxxxxxxxxxxxx>
By author: "Hart, Edwin F." <Edwin.Hart@xxxxxxxxxx>
In newsgroup: linux.utf8
>
> I'd like to make an observation. According to Markus Kuhn, Ken Thompson
> designed UTF-8. This is not quite true. Ken Thompson (according to Markus)
> designed FSS-UTF. Although the 10646 Working Group based the design of
> UTF-8 on FSS-UTF, the two are very similar but not the same.
>
> As I recall, the ISO/IEC 10646 Working Group was aware of X/Open's
> FSS-UTF. UTF-8 is a variation of FSS-UTF but not the exact algorithm of
> FSS-UTF. UTF-8 accounted for the surrogates of UTF-16 by forcing a
> conversion of any text encoded with UTF-16 to UCS-4 (32-bit form) and then
> converting text encoded in UCS-4 to UTF-8. This modification made it
> illegal to convert the 2048 surrogate code points of 10646/Unicode to UTF-8.
> Part of the confusion today is that some vendors implemented FSS-UTF but
> called it UTF-8. UTF-8 is not FSS-UTF, and FSS-UTF is not UTF-8.
>
There is no difference. This is just codifying the braindamage of
UTF-16 and its impact on other encoding forms, in particular making it
clear that surrogates are not to be recursively encoded: a surrogate
pair is combined into one code point first, and that code point is
what gets encoded.
-hpa
--
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/