
Re: Fw: Document Action: UTF-16, an encoding of ISO 10646 to Informational



> The IETF policy on character sets and languages [CHARPOLICY] says that
> IETF protocols MUST be able to use the UTF-8 character encoding scheme
> [UTF-8]. Although UTF-8 has many beneficial properties, such as the
> direct encoding of US-ASCII characters, re-synchronization after loss
> of octets and immunity to the byte-order issue (see 3.1 below), it is
> less dense than UTF-16 for characters whose values are between 0x0800
  ^^^^^^^^^^

UTF-16 wastes less space on Han characters, but more on Latin characters.
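The size trade-off can be checked directly. This is a minimal sketch using Python's built-in codecs. The helper name `encoded_sizes` and the sample strings are made up for illustration, and UTF-16 is encoded without a byte-order mark so that only the character data is counted.

```python
def encoded_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in octets for the given text."""
    # "utf-16-be" is used so that no byte-order mark is added to the count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


latin_sample = "hello"        # five US-ASCII characters
han_sample = "\u6f22\u5b57"   # two Han characters, both between 0x0800 and 0xFFFF

latin_utf8, latin_utf16 = encoded_sizes(latin_sample)
han_utf8, han_utf16 = encoded_sizes(han_sample)

print("Latin: UTF-8 =", latin_utf8, "octets, UTF-16 =", latin_utf16, "octets")
print("Han:   UTF-8 =", han_utf8, "octets, UTF-16 =", han_utf16, "octets")
```

For these samples UTF-8 takes one octet per Latin character and three per Han character, while UTF-16 takes two octets for either.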

> and 0xFFFF. Some products and network standards already specify
                                                  ^^^^^^^

Why does the IETF insinuate that there is a trend in this direction?

> UTF-16, making it an important encoding for the Internet.

What is the ratio of Latin to Han characters on the Internet?  And isn't
the Internet the place where the byte-order issue becomes most painful?
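To illustrate the byte-order issue, here is a small sketch, again with Python's built-in codecs. The variable names are illustrative only. It shows that the same character has two possible UTF-16 octet sequences, that a receiver assuming the wrong order silently decodes a different character, and that UTF-8 has only one serialization.

```python
char = "\u6f22"  # a single Han character, U+6F22

big_endian = char.encode("utf-16-be")     # high-order octet first
little_endian = char.encode("utf-16-le")  # low-order octet first
utf8 = char.encode("utf-8")               # only one possible octet order

# A receiver that assumes the wrong byte order gets a different,
# perfectly valid character instead of an error.
misread = big_endian.decode("utf-16-le")

print("UTF-16BE:", big_endian.hex())
print("UTF-16LE:", little_endian.hex())
print("UTF-8:   ", utf8.hex())
print("BE octets read as LE: U+%04X" % ord(misread))
```

Without a byte-order mark or an out-of-band label, the receiver has no reliable way to tell the two UTF-16 forms apart.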

--
phm

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/