Re: Fw: Document Action: UTF-16, an encoding of ISO 10646 to Informational
> The IETF policy on character sets and languages [CHARPOLICY] says that
> IETF protocols MUST be able to use the UTF-8 character encoding scheme
> [UTF-8]. Although UTF-8 has many beneficial properties, such as the
> direct encoding of US-ASCII characters, re-synchronization after loss
> of octets and immunity to the byte-order issue (see 3.1 below), it is
> less dense than UTF-16 for characters whose values are between 0x0800
  ^^^^^^^^^^
UTF-16 wastes less space on Han characters, but more on Latin characters.
> and 0xFFFF. Some products and network standards already specify
                                                  ^^^^^^^
Why does the IETF insinuate that there is a trend in this direction?
> UTF-16, making it an important encoding for the Internet.
What is the ratio of Latin to Han characters on the Internet? And isn't
the Internet the place where the byte-order issue becomes most painful?
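
To make the two points above concrete, here is a rough Python sketch.
The sample strings and the helper name encoded_sizes are my own
illustrative choices, not anything taken from the draft. It counts the
octets each encoding needs for a Latin and a Han sample, then shows how
one Han character serializes differently depending on byte order in
UTF-16 but not in UTF-8.

```python
# Hypothetical sample strings, chosen only for illustration.
SAMPLES = {
    "latin": "Internet",
    "han": "\u6f22\u5b57",  # two Han characters in the 0x0800-0xFFFF range
}


def encoded_sizes(text):
    """Return (UTF-8 octets, UTF-16 octets) for text, with no BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


for name, text in SAMPLES.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} octets, UTF-16 = {utf16_len} octets")

# Byte order: the same character has two possible UTF-16 serializations,
# while its UTF-8 form is a single fixed octet sequence.
han = "\u6f22"
print("UTF-16BE:", han.encode("utf-16-be").hex())
print("UTF-16LE:", han.encode("utf-16-le").hex())
print("UTF-8:   ", han.encode("utf-8").hex())
```

For these samples the Latin string doubles in size under UTF-16, while
the Han string shrinks by a third, so which encoding is denser overall
depends entirely on the mix of text being sent.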
--
phm
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/