
Re: Unicode, character ambiguities



>What, exactly, needs to be done by an application (or rather, its data
>formats) to accommodate CJK in Unicode (and other languages with similar
>ambiguities)?

The point of Unicode is to avoid such ambiguities, although some archaic
scripts (Coptic, Old Italic) have been unified anyway.

>Is it generally important or useful to be able to change language mid-
>sentence?  (It's much simpler to store a single language for a whole data
>element, and it's much easier to render.)

Depends on what you're doing. It can be useful in a word processor, but
in something like Ogg tags it's generally not necessary.
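A minimal sketch of the two storage models being contrasted, assuming a Python data model with IETF-style language tags such as "ja" or "zh-CN". `Run`, `single_language`, `merge_runs` and `plain_text` are hypothetical names for illustration, not part of Ogg or any other existing format. The point it shows: the Unicode text is identical either way, and only the tag tells a renderer whether a unified Han character should get a Japanese or a Chinese glyph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A stretch of text in a single language (hypothetical type)."""
    text: str
    lang: str  # IETF-style language tag, e.g. "ja" or "zh-CN" (assumption)


def single_language(text: str, lang: str) -> list[Run]:
    """Whole-element tagging: one tag for the entire field, which is
    usually enough for something like a metadata tag."""
    return [Run(text, lang)]


def merge_runs(runs: list[Run]) -> list[Run]:
    """Mid-sentence tagging: drop empty runs and coalesce adjacent runs
    that share a tag, so a renderer only switches fonts where the
    language actually changes."""
    merged: list[Run] = []
    for run in runs:
        if not run.text:
            continue
        if merged and merged[-1].lang == run.lang:
            merged[-1] = Run(merged[-1].text + run.text, run.lang)
        else:
            merged.append(run)
    return merged


def plain_text(runs: list[Run]) -> str:
    """The bare Unicode text; the language tags are lost here."""
    return "".join(run.text for run in runs)


if __name__ == "__main__":
    # U+76F4 is one code point, but Japanese and Chinese fonts draw it
    # differently; only the tag says which glyph is wanted.
    sentence = merge_runs([
        Run("Compare ", "en"),
        Run("\u76f4", "ja"),
        Run(" with ", "en"),
        Run("", "en"),
        Run("\u76f4", "zh-CN"),
    ])
    for run in sentence:
        print(run.lang, repr(run.text))
```

The single-tag form is what a word processor would not be able to live with, but it is far simpler to store and render, which is the trade-off the question raises.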

>One of them appears to consider Unicode
>currently useless for real-world data exchange in CJK, and believes this
>to be a consensus among Asian users.  

A lot of Japanese users believe this, as do a few Chinese users. Most
Chinese seem to be happy, and I've never heard a Korean complaint.

>What other languages have similar problems?  Something was mentioned
>about Russian, as well.  What fixes do they need?

There's at least one character whose italic form differs between Russian
and (I think) Serbian.

The problem with RFC 2047 is that 70% of implementations are going to
recode everything into UTF-8 anyway, 5% are going to recognize a handful
of charsets and display them, and 25% are going to just display it as
ASCII. What's the win?
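To make the "recode everything" case concrete, here is a minimal sketch using Python's standard `email.header` module (an assumption; the post names no language or library). `header_to_unicode` and `charset_labels` are hypothetical helper names. In plain RFC 2047 the charset label is the only per-word hint about, say, Japanese versus Chinese text, and the one-line recoding path discards it.

```python
from email.header import decode_header, make_header


def header_to_unicode(raw: str) -> str:
    """Decode every RFC 2047 encoded-word and join the pieces into one
    Unicode string; the per-word charset labels are not kept."""
    return str(make_header(decode_header(raw)))


def charset_labels(raw: str) -> list[str]:
    """Return the (lower-cased) charset label of each encoded-word,
    i.e. the information the recoding path above throws away."""
    return [charset for _, charset in decode_header(raw) if charset is not None]


if __name__ == "__main__":
    # The Japanese word for "Japan" (two kanji), encoded as ISO-2022-JP
    # and then base64 inside an RFC 2047 encoded-word.
    subject = "=?iso-2022-jp?b?GyRCRnxLXBsoQg==?="
    print(charset_labels(subject))
    print(header_to_unicode(subject).encode("utf-8"))
```

Calling `.encode("utf-8")` on the decoded string gives the UTF-8 bytes such an implementation would end up storing or displaying.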
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/