Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
> > For a number of languages, the UTF-8 representation saves some
> > storage when compared with UTF-16, but for Asian characters UTF-8
> > requires 50% more storage than UTF-16.
>
> Yes, it does. And for English and German UTF-16 requires 100% more
> storage than UTF-8.
You can use SCSU (the Standard Compression Scheme for Unicode) to
compress your data. It also works on short strings, which is not true
of general-purpose compression algorithms such as LZW.
Unicode Technical Report #6 (http://www.unicode.org/unicode/reports/tr6/)
gives the following examples:
Text       UTF-16                    SCSU
German       9 chars (18 bytes)  ->    9 bytes
Russian      6 chars (12 bytes)  ->    7 bytes
Japanese   116 chars (232 bytes) ->  178 bytes
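
To make the German and Russian figures concrete, here is a rough Python
sketch of the simplest part of SCSU: single-byte mode with one of the
default dynamic windows. It is not a full encoder (no window
redefinition, no Unicode mode, no quoting), and the function name
scsu_single_window is made up for illustration. The two sample strings
are what I believe TR6 uses for its German and Russian examples; treat
them as assumptions.

```python
# Start offsets of the eight SCSU dynamic windows in the initial state.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SC0 = 0x10  # tags SC0..SC7 (0x10..0x17) select dynamic window 0..7


def scsu_single_window(text):
    """Encode text in SCSU single-byte mode using one default window.

    Only handles ASCII from U+0020 upward plus non-ASCII characters that
    all fit into a single 128-character default window. Returns None
    for anything else (a real encoder would switch or define windows).
    """
    if any(ord(c) < 0x20 for c in text):
        return None  # control characters are not handled by this sketch
    non_ascii = [ord(c) for c in text if ord(c) > 0x7F]
    for n, start in enumerate(DEFAULT_WINDOWS):
        if all(start <= cp < start + 0x80 for cp in non_ascii):
            break
    else:
        return None  # no single default window covers the text
    out = bytearray()
    if n != 0:
        out.append(SC0 + n)  # window 0 is already active at the start
    for c in text:
        cp = ord(c)
        # ASCII passes through; window characters map to 0x80..0xFF.
        out.append(cp if cp < 0x80 else cp - start + 0x80)
    return bytes(out)


if __name__ == "__main__":
    for sample in ("Öl fließt", "Москва"):
        print(sample,
              "UTF-8:", len(sample.encode("utf-8")),
              "UTF-16:", len(sample.encode("utf-16-le")),
              "SCSU:", len(scsu_single_window(sample)))
```

The German string fits the initially active Latin-1 window, so it costs
one byte per character. The Russian string needs one extra tag byte to
select the Cyrillic window, which is where the seventh byte comes from.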
Werner
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/